Path: csiph.com!usenet.pasdenom.info!weretis.net!feeder1.news.weretis.net!feeder.erje.net!eu.feeder.erje.net!newsfeed.xs4all.nl!newsfeed2.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.001 X-Spam-Evidence: '*H*': 1.00; '*S*': 0.00; '(at': 0.04; 'class,': 0.07; 'accelerator': 0.09; 'constructor': 0.09; 'expectation': 0.09; 'raises': 0.09; 'received:80.91': 0.09; 'received:80.91.229': 0.09; 'received:gmane.org': 0.09; 'received:list': 0.09; 'subject:skip:c 10': 0.09; 'mostly': 0.14; 'behaviour.': 0.16; 'commented': 0.16; 'fallback': 0.16; 'finds,': 0.16; 'from:addr:behnel.de': 0.16; 'from:addr:stefan_ml': 0.16; 'from:name:stefan behnel': 0.16; 'keyerror': 0.16; 'received:80.91.229.3': 0.16; 'received:dip0.t-ipconnect.de': 0.16; 'received:plane.gmane.org': 0.16; 'received:t-ipconnect.de': 0.16; 'subject:slow': 0.16; 'subtype': 0.16; 'exception': 0.16; 'fix': 0.17; 'wrote:': 0.18; 'stefan': 0.19; '>>>': 0.22; 'import': 0.22; 'header:User-Agent:1': 0.23; 'helper': 0.24; 'mon,': 0.24; 'least': 0.26; 'header:X-Complaints-To:1': 0.27; 'header:In-Reply-To:1': 0.27; 'function': 0.29; 'am,': 0.29; "doesn't": 0.30; "i'm": 0.30; 'apparently': 0.31; 'forces': 0.31; 'overhead': 0.31; 'post.': 0.31; 'this.': 0.32; 'extend': 0.32; 'option': 0.32; 'maybe': 0.34; 'but': 0.35; 'so,': 0.37; '8bit%:86': 0.38; 'to:addr:python-list': 0.38; 'expect': 0.39; "couldn't": 0.39; 'to:addr:python.org': 0.39; 'received:org': 0.40; 'even': 0.60; 'read': 0.60; 'ian': 0.60; 'new': 0.61; 'first': 0.61; 'more': 0.64; 'jul': 0.74; '3.4': 0.84; 'costly': 0.84; 'dict,': 0.84; 'dict.': 0.84; 'received:87.139': 0.84; 'surprised,': 0.84; '2013': 0.98 X-Injected-Via-Gmane: http://gmane.org/ To: python-list@python.org From: Stefan Behnel Subject: Re: collections.Counter surprisingly slow Date: Tue, 30 Jul 2013 08:39:00 +0200 References: Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Gmane-NNTP-Posting-Host: p578ba676.dip0.t-ipconnect.de User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130623 Thunderbird/17.0.7 In-Reply-To: X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 26 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1375166352 news.xs4all.nl 15964 [2001:888:2000:d::a6]:39369 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:51545 Serhiy Storchaka, 29.07.2013 21:37: > 29.07.13 20:19, Ian Kelly написав(ла): >> On Mon, Jul 29, 2013 at 5:49 AM, Joshua Landau wrote: >>> Also, couldn't Counter just extend from defaultdict? >> >> It could, but I expect the C helper function in 3.4 will be faster >> since it doesn't even need to call __missing__ in the first place. > > I'm surprised, but the Counter constructor with commented out import of > this accelerator is faster (at least for some data). Read my post. The accelerator doesn't take the fast path for dicts as Counter is only a subtype of dict, not exactly a dict. That means that it raises and catches a KeyError exception for each new value that it finds, and that is apparently more costly than the overhead of calling get(). So, my expectation is that it's faster for highly repetitive data and slower for mostly unique data. Maybe a "fast_dict_lookup" option for the accelerator that forces the fast path would fix this. The Counter class, just like many (most?) other subtypes of dict, definitely doesn't need the fallback behaviour. Stefan