Fetching data from a HTML file

Started by: Sangeet <mrsangeet@gmail.com>
First post: 2012-03-23 06:52 -0700
Last post: 2012-03-24 14:04 -0700
Articles: 5 (5 participants)


Contents

  Fetching data from a HTML file Sangeet <mrsangeet@gmail.com> - 2012-03-23 06:52 -0700
    RE: Fetching data from a HTML file "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-03-23 15:08 +0000
    Re: Fetching data from a HTML file Daniel Fetchinson <fetchinson@googlemail.com> - 2012-03-23 16:28 +0100
    Re: Fetching data from a HTML file Jon Clements <joncle@googlemail.com> - 2012-03-23 22:12 -0700
      Re: Fetching data from a HTML file John Nagle <nagle@animats.com> - 2012-03-24 14:04 -0700

#22072 — Fetching data from a HTML file

From: Sangeet <mrsangeet@gmail.com>
Date: 2012-03-23 06:52 -0700
Subject: Fetching data from a HTML file
Message-ID: <9362386.1094.1332510725414.JavaMail.geo-discussion-forums@ynlt15>

Hi,

I've got to fetch data from the snippet below and have been trying to match the digits in it to specific groups. But I can't seem to figure out how to go about stripping the tags! :(

<tr><td align="center"><b>Sum</b></td><td></td><td align='center' class="green">245</td><td align='center' class="red">11</td><td align='center'>0</td><td align='center' >256</td><td align='center' >1.496 [min]</td></tr>
</table>

Actually, I'm working on ROBOT Framework, and haven't been able to figure out how to read data from HTML tables. Reading from the source is the best (read: rudimentary) way I could come up with. Any suggestions are welcome!

Thanks,
Sangeet



#22075

From: "Prasad, Ramit" <ramit.prasad@jpmorgan.com>
Date: 2012-03-23 15:08 +0000
Message-ID: <mailman.928.1332515505.3037.python-list@python.org>
In reply to: #22072

> Actually, I'm working on ROBOT Framework, and haven't been able to figure
> out how to read data from HTML tables. Reading from the source, is the best
> (read rudimentary) way I could come up with. Any suggestions are welcome!

> I've got to fetch data from the snippet below and have been trying to match
> the digits in this to specifically to specific groups. But I can't seem to
> figure how to go about stripping the tags! :(

In addition to Simon's response, you may want to look at Beautiful Soup,
which I hear is good at dealing with malformed HTML.
http://www.crummy.com/software/BeautifulSoup/



Ramit





#22078

From: Daniel Fetchinson <fetchinson@googlemail.com>
Date: 2012-03-23 16:28 +0100
Message-ID: <mailman.932.1332516539.3037.python-list@python.org>
In reply to: #22072

On 3/23/12, Sangeet <mrsangeet@gmail.com> wrote:
> Hi,
>
> I've got to fetch data from the snippet below and have been trying to match
> the digits in this to specifically to specific groups. But I can't seem to
> figure how to go about stripping the tags! :(
>
> <tr><td align="center"><b>Sum</b></td><td></td><td align='center'
> class="green">245</td><td align='center' class="red">11</td><td
> align='center'>0</td><td align='center' >256</td><td align='center' >1.496
> [min]</td></tr>
> </table>

Try beautiful soup: http://www.crummy.com/software/BeautifulSoup/

> Actually, I'm working on ROBOT Framework, and haven't been able to figure
> out how to read data from HTML tables. Reading from the source, is the best
> (read rudimentary) way I could come up with. Any suggestions are welcome!
>
> Thanks,
> Sangeet


-- 
Psss, psss, put it down! - http://www.cafepress.com/putitdown



#22113

From: Jon Clements <joncle@googlemail.com>
Date: 2012-03-23 22:12 -0700
Message-ID: <18618102.2255.1332565966684.JavaMail.geo-discussion-forums@vbtv42>
In reply to: #22072

On Friday, 23 March 2012 13:52:05 UTC, Sangeet  wrote:
> Hi,
> 
> I've got to fetch data from the snippet below and have been trying to match the digits in this to specifically to specific groups. But I can't seem to figure how to go about stripping the tags! :(
> 
> <tr><td align="center"><b>Sum</b></td><td></td><td align='center' class="green">245</td><td align='center' class="red">11</td><td align='center'>0</td><td align='center' >256</td><td align='center' >1.496 [min]</td></tr>
> </table>
> 
> Actually, I'm working on ROBOT Framework, and haven't been able to figure out how to read data from HTML tables. Reading from the source, is the best (read rudimentary) way I could come up with. Any suggestions are welcome!
> 
> Thanks,
> Sangeet

I would personally use lxml - a quick example:

# -*- coding: utf-8 -*-
import lxml.html

text = """
<tr><td align="center"><b>Sum</b></td><td></td><td align='center' class="green">245</td><td align='center' class="red">11</td><td align='center'>0</td><td align='center' >256</td><td align='center' >1.496 [min]</td></tr>
</table>
"""

# Parse the fragment; lxml tolerates the stray closing </table> tag.
table = lxml.html.fromstring(text)
for tr in table.xpath('//tr'):
    # Pair each cell's class attribute ('' if absent) with its text.
    print([(el.get('class', ''), el.text_content()) for el in tr.iterfind('td')])

[('', 'Sum'), ('', ''), ('green', '245'), ('red', '11'), ('', '0'), ('', '256'), ('', '1.496 [min]')]

It does a reasonable job, but if it doesn't work quite right, then there's a .fromstring(parser=...) option, and you should be able to pass in ElementSoup and try your luck from there. 
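
If installing lxml or Beautiful Soup is not an option, roughly the same extraction can be sketched with the standard library's html.parser instead. This is a swapped-in technique, not what the lxml example above does internally; the names CellCollector, cells and by_class are made up for illustration, and the sketch assumes Python 3 and table cells that are not nested inside one another.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect (class attribute, text) pairs for every <td> in the input."""

    def __init__(self):
        super().__init__()
        self.cells = []      # finished (class, text) pairs
        self._class = None   # class of the <td> being read; None outside a cell
        self._parts = []     # text fragments seen inside the current <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # A missing class attribute is recorded as an empty string.
            self._class = dict(attrs).get("class") or ""
            self._parts = []

    def handle_data(self, data):
        # Keep text only while inside a cell; nested tags such as <b> are
        # skipped, but their text still arrives here.
        if self._class is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._class is not None:
            self.cells.append((self._class, "".join(self._parts).strip()))
            self._class = None


snippet = """
<tr><td align="center"><b>Sum</b></td><td></td><td align='center' class="green">245</td><td align='center' class="red">11</td><td align='center'>0</td><td align='center' >256</td><td align='center' >1.496 [min]</td></tr>
</table>
"""

collector = CellCollector()
collector.feed(snippet)
collector.close()

cells = collector.cells
# Map the classed cells to integers, e.g. to compare "green" and "red" counts.
by_class = {cls: int(text) for cls, text in cells if cls}

print(cells)
print(by_class)
```

On the snippet from the original post this yields the same list of pairs as the lxml output above, and by_class groups the two classed cells by name. Unlike lxml or Beautiful Soup, html.parser does no tree repair, so badly broken markup may need one of those libraries after all.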

hth,

Jon.



#22125

From: John Nagle <nagle@animats.com>
Date: 2012-03-24 14:04 -0700
Message-ID: <4f6e36c0$0$11966$742ec2ed@news.sonic.net>
In reply to: #22113

On 3/23/2012 10:12 PM, Jon Clements wrote:
> ROBOT Framework

    Would people please stop using robotic names for
things that aren't robots?  Thank you.

				John Nagle


