| From | Jon Clements <joncle@googlemail.com> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: Fetching data from a HTML file |
| Date | Fri, 23 Mar 2012 22:12:46 -0700 (PDT) |
| Message-ID | <18618102.2255.1332565966684.JavaMail.geo-discussion-forums@vbtv42> |
| References | <9362386.1094.1332510725414.JavaMail.geo-discussion-forums@ynlt15> |
On Friday, 23 March 2012 13:52:05 UTC, Sangeet wrote:
> Hi,
>
> I've got to fetch data from the snippet below and have been trying to match the digits in it to specific groups. But I can't seem to figure out how to go about stripping the tags! :(
>
> <tr><td align="center"><b>Sum</b></td><td></td><td align='center' class="green">245</td><td align='center' class="red">11</td><td align='center'>0</td><td align='center' >256</td><td align='center' >1.496 [min]</td></tr>
> </table>
>
> Actually, I'm working with Robot Framework, and haven't been able to figure out how to read data from HTML tables. Reading from the source is the best (read: rudimentary) way I could come up with. Any suggestions are welcome!
>
> Thanks,
> Sangeet
I would personally use lxml - a quick example:
# -*- coding: utf-8 -*-
import lxml.html

# The HTML fragment from the question (note the stray closing </table>).
text = """
<tr><td align="center"><b>Sum</b></td><td></td><td align='center' class="green">245</td><td align='center' class="red">11</td><td align='center'>0</td><td align='center' >256</td><td align='center' >1.496 [min]</td></tr>
</table>
"""

# lxml tolerates the incomplete markup and returns the parsed element.
table = lxml.html.fromstring(text)

for tr in table.xpath('//tr'):
    # Pair each cell's class attribute ('' if absent) with its text, tags stripped.
    print([(el.get('class', ''), el.text_content()) for el in tr.iterfind('td')])

This prints:

[('', 'Sum'), ('', ''), ('green', '245'), ('red', '11'), ('', '0'), ('', '256'), ('', '1.496 [min]')]
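Since the question was about getting the digits into groups, here is a rough follow-on sketch that keeps only the all-digit cells and converts them to integers. The helper name numeric_cells is made up for illustration, and keying on the class attribute is an assumption about what distinguishes the groups; the post does not say what the columns mean.

```python
import lxml.html

html = """
<tr><td align="center"><b>Sum</b></td><td></td><td align='center' class="green">245</td><td align='center' class="red">11</td><td align='center'>0</td><td align='center' >256</td><td align='center' >1.496 [min]</td></tr>
</table>
"""


def numeric_cells(fragment):
    """Return (class, int) pairs for every <td> whose text is all digits.

    Hypothetical helper, not part of lxml.
    """
    root = lxml.html.fromstring(fragment)
    pairs = []
    for td in root.xpath('//td'):
        value = td.text_content().strip()
        # Skips 'Sum', the empty cell, and '1.496 [min]'.
        if value.isdigit():
            pairs.append((td.get('class', ''), int(value)))
    return pairs


cells = numeric_cells(html)
print(cells)
```

Cells without a class come back with an empty string as the key, so for those you would probably have to rely on column position instead.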
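It does a reasonable job, but if it doesn't work quite right on messier markup, lxml.html.fromstring() takes a parser=... argument, and lxml also ships a BeautifulSoup-backed parser (lxml.html.soupparser, formerly ElementSoup) with its own fromstring() that you can try your luck with.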
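If installing lxml is not an option at all, something similar can be done with only the standard library. This is a different technique from the lxml approach above: a minimal Python 3 sketch built on html.parser, where the CellCollector class is a name invented here. It only handles flat, non-nested td cells like the ones in the snippet.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect a (class, text) pair for every <td>...</td> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False
        self._cls = ''
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self._in_td = True
            self._cls = dict(attrs).get('class') or ''
            self._parts = []

    def handle_data(self, data):
        # Text inside nested tags such as <b> still arrives here.
        if self._in_td:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._in_td:
            self.cells.append((self._cls, ''.join(self._parts).strip()))
            self._in_td = False


row = """
<tr><td align="center"><b>Sum</b></td><td></td><td align='center' class="green">245</td><td align='center' class="red">11</td><td align='center'>0</td><td align='center' >256</td><td align='center' >1.496 [min]</td></tr>
</table>
"""

collector = CellCollector()
collector.feed(row)
collector.close()
print(collector.cells)
```

html.parser does no tree building or repair, so for anything beyond a simple row like this one lxml is likely the more comfortable choice.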
hth,
Jon.
Fetching data from a HTML file Sangeet <mrsangeet@gmail.com> - 2012-03-23 06:52 -0700
RE: Fetching data from a HTML file "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-03-23 15:08 +0000
Re: Fetching data from a HTML file Daniel Fetchinson <fetchinson@googlemail.com> - 2012-03-23 16:28 +0100
Re: Fetching data from a HTML file Jon Clements <joncle@googlemail.com> - 2012-03-23 22:12 -0700
Re: Fetching data from a HTML file John Nagle <nagle@animats.com> - 2012-03-24 14:04 -0700