python - Convert HTML table to dictionary without losing structure


I'm new to Python (and programming), and I'm using BeautifulSoup for the first time.

I'm trying to find the best way to parse the contents of a table in HTML and convert it to a dictionary, ideally in the least brittle way possible.

Here is an example of the HTML I'm trying to parse (I've put key and value numbers in the text I'm trying to pick up).

<div class="tablename">
  <table border="0" cellpadding="0" cellspacing="0" style="border: 1px solid #dddddd;  border-collapse: collapse; font-family: arial, helvetica, sans-serif; font-size: 14px; margin: 0; padding: 0; width: 100%">
    <thead>
      <tr>
        <th colspan="4" style="background-color: #000; border: 1px solid #616161; color: #ffffff; font-size: 14px; font-weight: bold; line-height: 20px; padding: 14px 20px 12px 20px; text-align: left">some text not needed</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td style="width: 20px"> </td>
        <td style="border-bottom: 1px solid #dddddd; color: #666666; font-size: 14px; line-height: 20px; padding: 11px 20px 10px 0; text-align: left; width: 42.5%; vertical-align: middle">key 1</td>
        <td style="border-bottom: 1px solid #dddddd; color: #000; font-size: 14px; line-height: 20px; padding: 11px 0 10px 0; text-align: left; vertical-align: middle">value 1</td>
        <td style="width: 20px"> </td>
      </tr>
      <tr>
        <td> </td>
        <td style="border-bottom: 1px solid #dddddd; color: #666666; font-size: 14px; line-height: 20px; padding: 11px 20px 10px 0; text-align: left; vertical-align: middle">key 2</td>
        <td style="border-bottom: 1px solid #dddddd; color: #000; font-size: 14px; line-height: 20px; padding: 11px 0 10px 0; text-align: left; vertical-align: middle">value 2</td>
        <td> </td>
      </tr>
      <tr>
        <td> </td>
        <td style="border-bottom: 1px solid #dddddd; color: #666666; font-size: 14px; line-height: 20px; padding: 11px 20px 10px 0; text-align: left; vertical-align: middle">key 3</td>
        <td style="border-bottom: 1px solid #dddddd; color: #000; font-size: 14px; line-height: 20px; padding: 11px 0 10px 0; text-align: left; vertical-align: middle">value 3</td>
        <td> </td>
      </tr>
      <tr>

And the code I'm using:

import requests
from bs4 import BeautifulSoup

html = requests.get('https://examplewebaddress.com')
soup = BeautifulSoup(html.text, 'html.parser')
# prints all the text of the table body as one flat string
print(soup.tbody.text)

I then loop over the soup.tbody.text string and split it into key/value pairs. This doesn't seem like the right way to do it: I'm losing the structure of the table by converting it to a string and then building it back up again as a dictionary.

Is there a more direct way to parse the table with BeautifulSoup (or something more suitable) into a dictionary I can use?

The idea is to iterate over the table rows and, for each row, extract the text of the second and third cells, which represent the key and the value of the future dictionary:

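# html is the response returned by requests.get() in the question
soup = BeautifulSoup(html.text, 'html.parser')

# for every row after the header, take the text of cells 2 and 3 as a [key, value] pair
result = dict([[item.get_text(strip=True) for item in row.find_all('td')[1:3]]
               for row in soup.select("div.tablename table tr")[1:]])

print(result)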

For the provided sample data, it prints:

{u'key 1': u'value 1', u'key 2': u'value 2', u'key 3': u'value 3'} 

div.tablename table tr is a CSS selector that matches tr elements inside a table element which is itself inside a div with class="tablename". Slicing the result of select() with [1:] skips the first (header) row.
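If the page layout may change a little, a somewhat less brittle variant of the same idea might look like the following. It is a minimal sketch, assuming the key and value always sit in the second and third td of each row, and that the markup contains an explicit tbody as the sample does ('html.parser' does not insert one if it is missing). The function name table_to_dict and the trimmed SAMPLE_HTML string are hypothetical, for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed copy of the sample table (inline styles removed).
SAMPLE_HTML = """
<div class="tablename">
  <table>
    <thead>
      <tr><th colspan="4">some text not needed</th></tr>
    </thead>
    <tbody>
      <tr><td> </td><td>key 1</td><td>value 1</td><td> </td></tr>
      <tr><td> </td><td>key 2</td><td>value 2</td><td> </td></tr>
      <tr><td> </td><td>key 3</td><td>value 3</td><td> </td></tr>
    </tbody>
  </table>
</div>
"""


def table_to_dict(html, selector="div.tablename tbody tr"):
    """Build {key: value} from rows whose 2nd and 3rd cells hold the pair.

    Hypothetical helper: assumes the key/value layout of the sample table.
    """
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    # Selecting only tbody rows excludes the header without positional slicing.
    for row in soup.select(selector):
        cells = row.find_all("td")
        if len(cells) < 3:
            continue  # spacer or incomplete row, not a key/value pair
        key = cells[1].get_text(strip=True)
        value = cells[2].get_text(strip=True)
        if key:  # ignore rows whose key cell is empty
            result[key] = value
    return result


print(table_to_dict(SAMPLE_HTML))
# {'key 1': 'value 1', 'key 2': 'value 2', 'key 3': 'value 3'}
```

Selecting tbody rows means the header is excluded by its place in the structure rather than by position, and the length check keeps short or empty rows from making dict() raise a ValueError.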

