Re: Noob trying to parse bad HTML using xml.etree.ElementTree

References <CACQBTrQqJmNLAfvTpdxWMmLPy2kTQNrcDmNET_hA5c-8Axtm-Q@mail.gmail.com>
Date 2012-12-30 21:07 +1100
Subject Re: Noob trying to parse bad HTML using xml.etree.ElementTree
From Chris Angelico <rosuav@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.1461.1356862061.29569.python-list@python.org> (permalink)


On Sun, Dec 30, 2012 at 8:52 PM, Morten Guldager
<morten.guldager@gmail.com> wrote:
> The question is whether it's possible to tweak xml.etree.ElementTree to accept
> and understand sloppy HTML, or whether you have suggestions for a similar
> easy-to-use framework, preferably among the included batteries?
>

Check out BeautifulSoup (a third-party package, not one of the included
batteries); it's fairly good at dealing with messy input.
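
As a rough illustration of what that looks like in practice, here is a minimal
sketch. It assumes BeautifulSoup 4 is installed (the third-party `bs4` package)
and uses the standard library's "html.parser" backend; the sample markup and
the example.com URL are made up for the demonstration and are not from the
original question.

```python
from bs4 import BeautifulSoup  # third-party package: beautifulsoup4

# Hypothetical sloppy input: unclosed <p> and <b> tags, an unquoted
# attribute value, and no <html>/<body> wrapper at all.
messy_html = (
    "<p>First paragraph"
    "<p>Second with <a href=http://example.com/page>a link</a>"
    "<br><b>unclosed bold"
)

# "html.parser" is the pure-Python backend from the standard library;
# other backends (lxml, html5lib) may build a somewhat different tree.
soup = BeautifulSoup(messy_html, "html.parser")

# Collect (link text, target) pairs despite the unquoted href.
links = [(a.get_text(), a["href"]) for a in soup.find_all("a")]

# The <b> tag is never closed in the input; the parser closes it
# implicitly at the end of the document.
bold_text = soup.find("b").get_text()

print(links)
print(bold_text)
```

The exact tree shape for unclosed tags depends on the backend chosen, so it is
safer to search for the elements you need (as above) than to rely on a
particular nesting.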

ChrisA
