Re: Noob trying to parse bad HTML using xml.etree.ElementTree

Started by: Chris Angelico <rosuav@gmail.com>
First post: 2012-12-30 21:07 +1100
Last post: 2012-12-30 21:07 +1100
Articles: 1 (1 participant)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


#35799 — Re: Noob trying to parse bad HTML using xml.etree.ElementTree

From: Chris Angelico <rosuav@gmail.com>
Date: 2012-12-30 21:07 +1100
Subject: Re: Noob trying to parse bad HTML using xml.etree.ElementTree
Message-ID: <mailman.1461.1356862061.29569.python-list@python.org>

On Sun, Dec 30, 2012 at 8:52 PM, Morten Guldager
<morten.guldager@gmail.com> wrote:
> Question is if it's possible to tweak xml.etree.ElementTree to accept, and
> understand sloppy html, or if you have suggestions for similar easy to use
> framework, preferably among the included batteries?
>

Check out BeautifulSoup; it's fairly good at dealing with messy input.

ChrisA
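
Note that BeautifulSoup is a third-party package (bs4), not part of the standard library. If staying within the included batteries matters, a minimal sketch along the following lines may be enough for simple extraction jobs. It assumes the goal is only to pull href values out of <a> tags; the sample markup and the names LinkCollector, etree_accepts and collect_links are made up for illustration and do not come from the original post. html.parser.HTMLParser is event-based and does not build a tree, so it is not a drop-in replacement for ElementTree.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sloppy input: unclosed <p> and <a>, an unquoted attribute
# value, and a bare <br>. None of this is well-formed XML.
SLOPPY_HTML = (
    '<html><body><p>First<p>Second '
    '<a href="/one">One</a><br><a href=/two>Two'
)


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags; close tags are never required."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def etree_accepts(markup):
    """Return True if ElementTree can parse the markup as XML, else False."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


def collect_links(markup):
    """Return the href of every <a> tag found by the lenient HTML parser."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return parser.hrefs


if __name__ == "__main__":
    print("ElementTree accepts it:", etree_accepts(SLOPPY_HTML))
    print("Links found:", collect_links(SLOPPY_HTML))
```

If a navigable tree is needed rather than a stream of events, BeautifulSoup is probably the more comfortable choice, at the cost of an extra dependency.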
