| Header | Value |
|---|---|
| Date | Mon, 12 Mar 2012 08:45:01 +1100 |
| From | Cameron Simpson <cs@zip.com.au> |
| To | John Nagle <nagle@animats.com> |
| Subject | Re: html5lib not thread safe. Is the Python SAX library thread-safe? |
| In-Reply-To | <4f5d0b82$0$11967$742ec2ed@news.sonic.net> |
| User-Agent | Mutt/1.5.21 (2010-09-15) |
| References | <4f5d0b82$0$11967$742ec2ed@news.sonic.net> |
| Cc | python-list@python.org |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.574.1331502568.3037.python-list@python.org> |
On 11Mar2012 13:30, John Nagle <nagle@animats.com> wrote:
| "html5lib" is apparently not thread safe.
| (see "http://code.google.com/p/html5lib/issues/detail?id=189")
| Looking at the code, I've only found about three problems.
| They're all the usual "cached in a global without locking" bug.
| A few locks would fix that.
|
| But html5lib calls the XML SAX parser. Is that thread-safe?
| Or is there more trouble down at the bottom?
|
| (I run a multi-threaded web crawler, and currently use BeautifulSoup,
| which is thread safe, although dated. I'm looking at converting to
| html5lib.)
IIRC, BeautifulSoup4 may do that for you:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#you-need-a-parser
"Beautiful Soup 4 uses html.parser by default, but you can plug in
lxml or html5lib and use that instead."
Just for interest, re locking, I wrote a little decorator the other day,
thus:
    @locked_property
    def foo(self):
        compute foo here ...
        return foo value
and am rolling its use out amongst my classes. Code:
    def locked_property(func, lock_name='_lock', prop_name=None, unset_object=None):
        ''' A property whose access is controlled by a lock if unset.
        '''
        if prop_name is None:
            # func.__name__ works on Python 2 and 3; func_name is Python 2 only.
            prop_name = '_' + func.__name__
        def getprop(self):
            ''' Attempt lockless fetch of property first.
                Use lock if property is unset.
            '''
            p = getattr(self, prop_name)
            if p is unset_object:
                with getattr(self, lock_name):
                    # Re-check under the lock: another thread may have set it.
                    p = getattr(self, prop_name)
                    if p is unset_object:
                        p = func(self)
                        setattr(self, prop_name, p)
            return p
        return property(getprop)
It tries to be lockless in the common case. I suspect it is only safe in
CPython where there is a GIL. If raw Python assignments and fetches can
overlap (eg Jython I think?) I probably need a shared "read" lock around
the first "p = getattr(self, prop_name)". Any remarks?
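For anyone wanting to try the decorator, here is a minimal Python 3 sketch of it in use under contention. The class name Page, the compute_calls counter and the run_demo helper are hypothetical, made up for illustration. The class has to create the lock and set the backing attribute to the unset value (None by default) in __init__, otherwise the first getattr fails. The sketch only checks the behaviour on CPython; it says nothing about runtimes without a GIL.

```python
import threading
import time


def locked_property(func, lock_name='_lock', prop_name=None, unset_object=None):
    """Python 3 restatement of the decorator above (double-checked locking)."""
    if prop_name is None:
        prop_name = '_' + func.__name__

    def getprop(self):
        p = getattr(self, prop_name)          # lockless fast path
        if p is unset_object:
            with getattr(self, lock_name):
                p = getattr(self, prop_name)  # re-check under the lock
                if p is unset_object:
                    p = func(self)
                    setattr(self, prop_name, p)
        return p

    return property(getprop)


class Page:
    """Hypothetical example class; all names here are illustrative."""

    def __init__(self):
        self._lock = threading.Lock()
        self._foo = None          # backing attribute starts out unset
        self.compute_calls = 0

    @locked_property
    def foo(self):
        self.compute_calls += 1   # only runs while self._lock is held
        time.sleep(0.05)          # simulate slow work so threads pile up
        return 42


def run_demo(n_threads=8):
    """Read page.foo from several threads; return (compute count, values)."""
    page = Page()
    results = []
    results_lock = threading.Lock()

    def worker():
        value = page.foo
        with results_lock:
            results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return page.compute_calls, results


if __name__ == '__main__':
    calls, values = run_demo()
    print(calls, values)
```

One consequence of using a sentinel: if the computed value is itself the unset object (None by default), the property recomputes on every access, so a class whose value can legitimately be None would need to pass a different unset_object.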
Cheers,
--
Cameron Simpson <cs@zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/
Ed Campbell's <ed@Tekelex.Com> pointers for long trips:
1. lay out the bare minimum of stuff that you need to take with you, then
put at least half of it back.
html5lib not thread safe. Is the Python SAX library thread-safe? John Nagle <nagle@animats.com> - 2012-03-11 13:30 -0700
Re: html5lib not thread safe. Is the Python SAX library thread-safe? Cameron Simpson <cs@zip.com.au> - 2012-03-12 08:45 +1100
Re: html5lib not thread safe. Is the Python SAX library thread-safe? John Nagle <nagle@animats.com> - 2012-03-11 21:48 -0700
Re: html5lib not thread safe. Is the Python SAX library thread-safe? Paul Rubin <no.email@nospam.invalid> - 2012-03-12 02:39 -0700
Re: html5lib not thread safe. Is the Python SAX library thread-safe? Stefan Behnel <stefan_ml@behnel.de> - 2012-03-12 11:05 +0100
Re: html5lib not thread safe. Is the Python SAX library thread-safe? John Nagle <nagle@animats.com> - 2012-03-12 09:07 -0700