| Started by | Andreas Perstinger <andipersti@gmail.com> |
|---|---|
| First post | 2012-08-09 09:25 +0200 |
| Last post | 2012-08-09 09:25 +0200 |
| Articles | 1 — 1 participant |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Beautiful Soup Table Parsing Andreas Perstinger <andipersti@gmail.com> - 2012-08-09 09:25 +0200
| From | Andreas Perstinger <andipersti@gmail.com> |
|---|---|
| Date | 2012-08-09 09:25 +0200 |
| Subject | Re: Beautiful Soup Table Parsing |
| Message-ID | <mailman.3095.1344497153.4697.python-list@python.org> |
On 09.08.2012 01:58, Tom Russell wrote:
> For instance this code below:
>
> soup = BeautifulSoup(urlopen('http://online.wsj.com/mdc/public/page/2_3021-tradingdiary2.html?mod=mdc_pastcalendar'))
>
> table = soup.find("table",{"class": "mdcTable"})
> for row in table.findAll("tr"):
>     for cell in row.findAll("td"):
>         print cell.findAll(text=True)
>
> brings in a list that looks like this:
[snip]
> What I want to do is only be getting the data for NYSE and nothing
> else so I do not know if that's possible or not. Also I want to do
> something like:
>
> If cell.contents[0] == "Advances":
> Advances = next cell or whatever??---> this part I am not sure how to do.
>
> Can someone help point me in the right direction to get the first data
> point for the Advances row? I have others I will get as well but
> figure once I understand how to do this I can do the rest.
To get the header row you could do something like:
    header_row = table.find(lambda tag: tag.td.string == "NYSE")
From there you can look for the next row you are interested in:
    advances_row = header_row.findNextSibling(
        lambda tag: tag.td.string == "Advances")
You could also iterate through all next siblings of the header_row:
    for row in header_row.findNextSiblings("tr"):
        # do something
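
Putting the steps above together, here is a minimal sketch of how the first data point of the NYSE "Advances" row might be pulled out. It is not tested against the real WSJ page: SAMPLE_HTML is a made-up stand-in with placeholder numbers, and first_cell_is is a hypothetical helper, not part of Beautiful Soup. It also assumes the newer bs4 package, where findAll and findNextSibling are spelled find_all and find_next_sibling. The helper checks that a row really has a td before comparing, because tag.td is None for a row without one and tag.td.string would then raise AttributeError.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the real table; the actual page markup may differ
# and the numbers are placeholders, not real market data.
SAMPLE_HTML = """
<table class="mdcTable">
  <tr><td>NYSE</td><td>Latest close</td></tr>
  <tr><td>Issues traded</td><td>3000</td></tr>
  <tr><td>Advances</td><td>1500</td></tr>
  <tr><td>Declines</td><td>1400</td></tr>
  <tr><td>Nasdaq</td><td>Latest close</td></tr>
  <tr><td>Advances</td><td>1200</td></tr>
</table>
"""


def first_cell_is(label):
    """Hypothetical helper: match <tr> tags whose first <td> holds `label`."""
    def matcher(tag):
        # Skip anything that is not a row, and rows without any <td>.
        if tag.name != "tr" or tag.td is None:
            return False
        return tag.td.get_text(strip=True) == label
    return matcher


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", {"class": "mdcTable"})

# Locate the NYSE header row, then the first later sibling row
# whose first cell reads "Advances".
header_row = table.find(first_cell_is("NYSE"))
advances_row = header_row.find_next_sibling(first_cell_is("Advances"))

# The first cell is the label, so the first data point is the second cell.
cells = advances_row.find_all("td")
nyse_advances = cells[1].get_text(strip=True)
print(nyse_advances)
```

Because the search starts from the NYSE header row and stops at the first match, the "Advances" row of a later section (Nasdaq in the sample) is never reached.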
Bye, Andreas