| Started by | Andrew Berg <bahamutzero8825@gmail.com> |
|---|---|
| First post | 2011-05-16 20:05 -0500 |
| Last post | 2011-05-16 20:05 -0500 |
| Articles | 1 — 1 participant |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Trying to understand html.parser.HTMLParser Andrew Berg <bahamutzero8825@gmail.com> - 2011-05-16 20:05 -0500
| From | Andrew Berg <bahamutzero8825@gmail.com> |
|---|---|
| Date | 2011-05-16 20:05 -0500 |
| Subject | Re: Trying to understand html.parser.HTMLParser |
| Message-ID | <mailman.1653.1305594331.9059.python-list@python.org> |
On 2011.05.16 02:26 AM, Karim wrote:
> Use regular expressions for bad HTML, or BeautifulSoup (google it). Below is
> an example that extracts all HTML links:
>
> import re
>
> linksList = re.findall('<a href=(.*?)>.*?</a>', htmlSource)
> for link in linksList:
>     print(link)
I was afraid I might have to use regexes (mostly because I could never
understand them).
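For comparison, here is a rough sketch of the same link extraction done with html.parser.HTMLParser instead of a regex. This is only an illustration: the class name LinkCollector, the variable html_source, and the sample markup are all made up for the example, and it assumes reasonably well-formed HTML (badly broken markup may still trip the parser).

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href value of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser passes the tag name lowercased and attrs as a list of
        # (name, value) pairs; value is None for attributes with no value.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Made-up sample input, standing in for htmlSource in the quoted snippet.
html_source = (
    '<p><a href="http://example.com/a">A</a> and '
    '<a class="x" href="/b">B</a></p>'
)

collector = LinkCollector()
collector.feed(html_source)
collector.close()

for link in collector.links:
    print(link)
```

Unlike the regex above, this picks up the href even when other attributes come before it, and it returns the value without the surrounding quotes.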
Even the BeautifulSoup website itself admits it's awful with Python 3 -
only the admittedly broken 3.1.0 will work with Python 3 at all.
ElementTree doesn't seem to have been updated in a long time, so I'll
assume it won't work with Python 3.
lxml looks promising, but it doesn't say anywhere whether it'll work on
Python 3 or not, which is puzzling since the latest release was only a
couple months ago.
Actually, if I'm going to use regex, I might as well try to implement
Versions* in Python.
Thanks for the answers!
*http://en.totalcmd.pl/download/wfx/net/Versions (original, made for
Total Commander) and
https://addons.mozilla.org/en-US/firefox/addon/versions-wfx_versions/
(clone implemented as a Firefox add-on; it's so wonderful, I even wrote
the docs for it!)