| From | Stefan Behnel <stefan_ml@behnel.de> |
|---|---|
| Subject | Re: html5lib not thread safe. Is the Python SAX library thread-safe? |
| Date | 2012-03-12 11:05 +0100 |
| References | <4f5d0b82$0$11967$742ec2ed@news.sonic.net> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.582.1331546749.3037.python-list@python.org> |

John Nagle, 11.03.2012 21:30:
> "html5lib" is apparently not thread safe.
> (see "http://code.google.com/p/html5lib/issues/detail?id=189")
> Looking at the code, I've only found about three problems.
> They're all the usual "cached in a global without locking" bug.
> A few locks would fix that.
>
> But html5lib calls the XML SAX parser. Is that thread-safe?
> Or is there more trouble down at the bottom?
>
> (I run a multi-threaded web crawler, and currently use BeautifulSoup,
> which is thread safe, although dated. I'm looking at converting to
> html5lib.)

You may also consider moving to lxml. BeautifulSoup supports it as a parser backend these days, so you wouldn't even have to rewrite your code to use it. And performance-wise, well ...

http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/

Stefan
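
For readers unfamiliar with the "cached in a global without locking" bug mentioned in the quoted message, the pattern and the lock-based fix might look roughly like the sketch below. This is a generic illustration, not html5lib's actual code; `_cache`, `_cache_lock`, `get_cached`, and `build` are hypothetical names chosen for the example.

```python
import threading

# Hypothetical module-level cache, standing in for the kind of global
# the quoted message describes. Not taken from html5lib.
_cache = {}
_cache_lock = threading.Lock()


def get_cached(key, build):
    """Return the cached value for key, calling build(key) at most once."""
    # The lock covers both the lookup and the insert, so two threads
    # cannot both see a miss and both build the same entry.
    with _cache_lock:
        if key not in _cache:
            _cache[key] = build(key)
        return _cache[key]


if __name__ == "__main__":
    calls = []

    def build(key):
        # Record each build so we can see how often it ran.
        calls.append(key)
        return key.upper()

    threads = [
        threading.Thread(target=get_cached, args=("div", build))
        for _ in range(8)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(calls))  # prints 1
```

Holding the lock while `build` runs keeps the sketch simple, at the cost of serializing all cache misses; whether that trade-off suits a given library depends on how expensive the cached computation is.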
html5lib not thread safe. Is the Python SAX library thread-safe? John Nagle <nagle@animats.com> - 2012-03-11 13:30 -0700
Re: html5lib not thread safe. Is the Python SAX library thread-safe? Cameron Simpson <cs@zip.com.au> - 2012-03-12 08:45 +1100
Re: html5lib not thread safe. Is the Python SAX library thread-safe? John Nagle <nagle@animats.com> - 2012-03-11 21:48 -0700
Re: html5lib not thread safe. Is the Python SAX library thread-safe? Paul Rubin <no.email@nospam.invalid> - 2012-03-12 02:39 -0700
Re: html5lib not thread safe. Is the Python SAX library thread-safe? Stefan Behnel <stefan_ml@behnel.de> - 2012-03-12 11:05 +0100
Re: html5lib not thread safe. Is the Python SAX library thread-safe? John Nagle <nagle@animats.com> - 2012-03-12 09:07 -0700