Path: csiph.com!usenet.pasdenom.info!news.albasani.net!feeder.erje.net!eu.feeder.erje.net!newsfeed.xs4all.nl!newsfeed2.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.001 X-Spam-Evidence: '*H*': 1.00; '*S*': 0.00; '(at': 0.04; 'class,': 0.07; 'accelerator': 0.09; 'constructor': 0.09; 'expectation': 0.09; 'raises': 0.09; 'received:80.91': 0.09; 'received:80.91.229': 0.09; 'received:gmane.org': 0.09; 'received:list': 0.09; 'subject:skip:c 10': 0.09; 'mostly': 0.14; '(before': 0.16; 'behaviour.': 0.16; 'commented': 0.16; 'data)': 0.16; 'fallback': 0.16; 'finds,': 0.16; 'from:addr:behnel.de': 0.16; 'from:addr:stefan_ml': 0.16; 'from:name:stefan behnel': 0.16; 'keyerror': 0.16; 'received:80.91.229.3': 0.16; 'received:dip0.t-ipconnect.de': 0.16; 'received:plane.gmane.org': 0.16; 'received:t-ipconnect.de': 0.16; 'subject:slow': 0.16; 'subtype': 0.16; 'exception': 0.16; 'fix': 0.17; 'wrote:': 0.18; 'stefan': 0.19; '>>>': 0.22; 'import': 0.22; 'header:User- Agent:1': 0.23; 'helper': 0.24; 'mon,': 0.24; 'least': 0.26; 'header:X-Complaints-To:1': 0.27; 'header:In-Reply-To:1': 0.27; 'function': 0.29; 'url:bugs': 0.29; 'am,': 0.29; "doesn't": 0.30; "i'm": 0.30; 'code': 0.31; '>>>>': 0.31; 'apparently': 0.31; 'forces': 0.31; 'overhead': 0.31; 'post.': 0.31; 'this.': 0.32; 'extend': 0.32; 'option': 0.32; 'url:python': 0.33; 'not.': 0.33; 'maybe': 0.34; 'but': 0.35; 'url:org': 0.36; 'so,': 0.37; '8bit%:86': 0.38; 'to:addr:python-list': 0.38; 'rather': 0.38; 'expect': 0.39; "couldn't": 0.39; 'to:addr:python.org': 0.39; 'received:org': 0.40; 'even': 0.60; 'read': 0.60; 'ian': 0.60; 'new': 0.61; 'first': 0.61; 'more': 0.64; 'worth': 0.66; 'jul': 0.74; '3.4': 0.84; 'costly': 0.84; 'dict,': 0.84; 'dict.': 0.84; 'duplication': 0.84; 'received:87.139': 0.84; 'surprised,': 0.84; '2013': 0.98 X-Injected-Via-Gmane: http://gmane.org/ To: python-list@python.org From: Stefan Behnel Subject: Re: collections.Counter surprisingly slow Date: Tue, 30 Jul 2013 08:51:48 +0200 References: Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Gmane-NNTP-Posting-Host: p578ba676.dip0.t-ipconnect.de User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130623 Thunderbird/17.0.7 In-Reply-To: X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 33 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1375167121 news.xs4all.nl 15982 [2001:888:2000:d::a6]:42758 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:51546 Stefan Behnel, 30.07.2013 08:39: > Serhiy Storchaka, 29.07.2013 21:37: >> 29.07.13 20:19, Ian Kelly написав(ла): >>> On Mon, Jul 29, 2013 at 5:49 AM, Joshua Landau wrote: >>>> Also, couldn't Counter just extend from defaultdict? >>> >>> It could, but I expect the C helper function in 3.4 will be faster >>> since it doesn't even need to call __missing__ in the first place. >> >> I'm surprised, but the Counter constructor with commented out import of >> this accelerator is faster (at least for some data). > > Read my post. The accelerator doesn't take the fast path for dicts as > Counter is only a subtype of dict, not exactly a dict. That means that it > raises and catches a KeyError exception for each new value that it finds, > and that is apparently more costly than the overhead of calling get(). > > So, my expectation is that it's faster for highly repetitive data and > slower for mostly unique data. > > Maybe a "fast_dict_lookup" option for the accelerator that forces the fast > path would fix this. The Counter class, just like many (most?) other > subtypes of dict, definitely doesn't need the fallback behaviour. Or rather drop the fallback path completely. It's not worth having code duplication if it's not predictable up-front (before looking at the data) if it will help or not. http://bugs.python.org/issue18594 Stefan