Groups > comp.lang.python > #22072 > unrolled thread
| Started by | Sangeet <mrsangeet@gmail.com> |
|---|---|
| First post | 2012-03-23 06:52 -0700 |
| Last post | 2012-03-24 14:04 -0700 |
| Articles | 5 — 5 participants |
Fetching data from a HTML file Sangeet <mrsangeet@gmail.com> - 2012-03-23 06:52 -0700
RE: Fetching data from a HTML file "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-03-23 15:08 +0000
Re: Fetching data from a HTML file Daniel Fetchinson <fetchinson@googlemail.com> - 2012-03-23 16:28 +0100
Re: Fetching data from a HTML file Jon Clements <joncle@googlemail.com> - 2012-03-23 22:12 -0700
Re: Fetching data from a HTML file John Nagle <nagle@animats.com> - 2012-03-24 14:04 -0700
| From | Sangeet <mrsangeet@gmail.com> |
|---|---|
| Date | 2012-03-23 06:52 -0700 |
| Subject | Fetching data from a HTML file |
| Message-ID | <9362386.1094.1332510725414.JavaMail.geo-discussion-forums@ynlt15> |
Hi,

I've got to fetch data from the snippet below and have been trying to match the digits in it to specific groups. But I can't seem to figure out how to go about stripping the tags! :(

<tr><td align="center"><b>Sum</b></td><td></td><td align='center' class="green">245</td><td align='center' class="red">11</td><td align='center'>0</td><td align='center' >256</td><td align='center' >1.496 [min]</td></tr>
</table>

Actually, I'm working on ROBOT Framework, and haven't been able to figure out how to read data from HTML tables. Reading from the source is the best (read: rudimentary) way I could come up with. Any suggestions are welcome!

Thanks,
Sangeet
| From | "Prasad, Ramit" <ramit.prasad@jpmorgan.com> |
|---|---|
| Date | 2012-03-23 15:08 +0000 |
| Message-ID | <mailman.928.1332515505.3037.python-list@python.org> |
| In reply to | #22072 |
> Actually, I'm working on ROBOT Framework, and haven't been able to figure
> out how to read data from HTML tables. Reading from the source is the best
> (read: rudimentary) way I could come up with. Any suggestions are welcome!

> I've got to fetch data from the snippet below and have been trying to match
> the digits in it to specific groups. But I can't seem to figure out how to
> go about stripping the tags! :(

In addition to Simon's response, you may want to look at Beautiful Soup, which I hear is good at dealing with malformed HTML.

http://www.crummy.com/software/BeautifulSoup/

Ramit

Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology
712 Main Street | Houston, TX 77002
work phone: 713 - 216 - 5423
| From | Daniel Fetchinson <fetchinson@googlemail.com> |
|---|---|
| Date | 2012-03-23 16:28 +0100 |
| Message-ID | <mailman.932.1332516539.3037.python-list@python.org> |
| In reply to | #22072 |
On 3/23/12, Sangeet <mrsangeet@gmail.com> wrote:
> Hi,
>
> I've got to fetch data from the snippet below and have been trying to match
> the digits in it to specific groups. But I can't seem to figure out how to
> go about stripping the tags! :(
>
> <tr><td align="center"><b>Sum</b></td><td></td><td align='center'
> class="green">245</td><td align='center' class="red">11</td><td
> align='center'>0</td><td align='center' >256</td><td align='center' >1.496
> [min]</td></tr>
> </table>

Try Beautiful Soup: http://www.crummy.com/software/BeautifulSoup/

> Actually, I'm working on ROBOT Framework, and haven't been able to figure
> out how to read data from HTML tables. Reading from the source is the best
> (read: rudimentary) way I could come up with. Any suggestions are welcome!
>
> Thanks,
> Sangeet

--
Psss, psss, put it down! - http://www.cafepress.com/putitdown
| From | Jon Clements <joncle@googlemail.com> |
|---|---|
| Date | 2012-03-23 22:12 -0700 |
| Message-ID | <18618102.2255.1332565966684.JavaMail.geo-discussion-forums@vbtv42> |
| In reply to | #22072 |
On Friday, 23 March 2012 13:52:05 UTC, Sangeet wrote:
> Hi,
>
> I've got to fetch data from the snippet below and have been trying to match the digits in it to specific groups. But I can't seem to figure out how to go about stripping the tags! :(
>
> <tr><td align="center"><b>Sum</b></td><td></td><td align='center' class="green">245</td><td align='center' class="red">11</td><td align='center'>0</td><td align='center' >256</td><td align='center' >1.496 [min]</td></tr>
> </table>
>
> Actually, I'm working on ROBOT Framework, and haven't been able to figure out how to read data from HTML tables. Reading from the source is the best (read: rudimentary) way I could come up with. Any suggestions are welcome!
>
> Thanks,
> Sangeet
I would personally use lxml - a quick example:
# -*- coding: utf-8 -*-
import lxml.html

text = """
<tr><td align="center"><b>Sum</b></td><td></td><td align='center' class="green">245</td><td align='center' class="red">11</td><td align='center'>0</td><td align='center' >256</td><td align='center' >1.496 [min]</td></tr>
</table>
"""

# Parse the fragment; the stray </table> is tolerated by the HTML parser.
table = lxml.html.fromstring(text)
for tr in table.xpath('//tr'):
    # One (class, text) pair per cell; cells without a class attribute get ''.
    # print(...) with a single argument works on both Python 2 and Python 3.
    print([(el.get('class', ''), el.text_content()) for el in tr.iterfind('td')])

# Output:
# [('', 'Sum'), ('', ''), ('green', '245'), ('red', '11'), ('', '0'), ('', '256'), ('', '1.496 [min]')]
It does a reasonable job, but if it doesn't work quite right, then there's a .fromstring(parser=...) option, and you should be able to pass in ElementSoup and try your luck from there.
hth,
Jon.
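
For readers who have neither lxml nor Beautiful Soup installed, the same (class, text) extraction can probably be approximated with the standard library alone. The following is only a rough sketch, assuming Python 3 and its `html.parser` module; the `CellCollector` class and its attribute names are made up for this illustration and do not come from any of the posts above.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Hypothetical helper: collect (class, text) pairs for each <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of (class, text) tuples per <tr>
        self._cell = None   # [class, text parts] while inside a <td>, else None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            if not self.rows:           # tolerate a <td> that appears without a <tr>
                self.rows.append([])
            # A missing class attribute is reported as '' (as in the lxml example)
            self._cell = [dict(attrs).get("class") or "", []]

    def handle_data(self, data):
        # Text inside nested tags such as <b> is still collected for the cell
        if self._cell is not None:
            self._cell[1].append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            css_class, parts = self._cell
            self.rows[-1].append((css_class, "".join(parts).strip()))
            self._cell = None


snippet = """
<tr><td align="center"><b>Sum</b></td><td></td><td align='center' class="green">245</td><td align='center' class="red">11</td><td align='center'>0</td><td align='center' >256</td><td align='center' >1.496 [min]</td></tr>
</table>
"""

parser = CellCollector()
parser.feed(snippet)
parser.close()

for row in parser.rows:
    print(row)
```

This event-based parser does no error recovery beyond ignoring tags it does not track, so for genuinely malformed pages the lxml or Beautiful Soup suggestions above are likely the more robust choice.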
| From | John Nagle <nagle@animats.com> |
|---|---|
| Date | 2012-03-24 14:04 -0700 |
| Message-ID | <4f6e36c0$0$11966$742ec2ed@news.sonic.net> |
| In reply to | #22113 |
On 3/23/2012 10:12 PM, Jon Clements wrote:
> ROBOT Framework
Would people please stop using robotic names for
things that aren't robots? Thank you.
John Nagle