Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!eu.feeder.erje.net!eweka.nl!lightspeed.eweka.nl!194.109.133.87.MISMATCH!newsfeed.xs4all.nl!newsfeed1.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
MIME-Version: 1.0
Sender: joshua.landau.ws@gmail.com
In-Reply-To: <kt5knb$9tj$1@ger.gmane.org>
References: <roy-8C60F5.15590428072013@news.panix.com> <51f5843f$0$29971$c3e8da3$5496439d@news.astraweb.com> <kt5knb$9tj$1@ger.gmane.org>
From: Joshua Landau <joshua@landau.ws>
Date: Mon, 29 Jul 2013 13:07:48 +0100
Subject: Re: collections.Counter surprisingly slow
To: Stefan Behnel <stefan_ml@behnel.de>
Content-Type: multipart/alternative; boundary=001a11c34da0a8d3f104e2a55915
Cc: python-list <python-list@python.org>
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.5228.1375099712.3114.python-list@python.org>
Lines: 122
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:51438

--001a11c34da0a8d3f104e2a55915
Content-Type: text/plain; charset=UTF-8

On 29 July 2013 12:46, Stefan Behnel <stefan_ml@behnel.de> wrote:

> Steven D'Aprano, 28.07.2013 22:51:
> > Calling Counter ends up calling essentially this code:
> >
> > for elem in iterable:
> >     self[elem] = self.get(elem, 0) + 1
> >
> > (although micro-optimized), where "iterable" is your data (lines).
> > Calling the get method has higher overhead than dict[key], that will also
> > contribute.
>
> It comes with a C accelerator (at least in Py3.4dev), but it seems like
> that stumbles a bit over its own feet. The accelerator function special
> cases the (exact) dict type, but the Counter class is a subtype of dict and
> thus takes the generic path, which makes it benefit a bit less than
> possible.
>
> Look for _count_elements() in
>
> http://hg.python.org/cpython/file/tip/Modules/_collectionsmodule.c
>
> Nevertheless, even the generic C code path looks fast enough in general. I
> think the problem is just that the OP used Python 2.7, which doesn't have
> this accelerator function.
>
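To make the "exact dict type" distinction above concrete, here is a small Python-level sketch. It is only an analogue of an exact-type check, not the C code itself; `DictSubclass` and `is_exact_dict` are hypothetical names made up for this illustration.

```python
from collections import Counter


class DictSubclass(dict):
    """Direct dict subclass with no overridden methods (hypothetical name)."""


def is_exact_dict(mapping):
    # Python-level analogue of an exact-type check: subclasses do not pass.
    return type(mapping) is dict


print(is_exact_dict({}))              # plain dict
print(is_exact_dict(DictSubclass()))  # direct subclass
print(is_exact_dict(Counter()))       # Counter is also a dict subclass
print(isinstance(Counter(), dict))    # but it is still a dict instance
```

Only the plain dict passes the exact check; both subclasses fail it even though they are dict instances, which is why they would fall through to a generic path.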

# _count_elements({}, items), _count_elements(dict_subclass(), items),
# Counter(items), defaultdict(int) loop with exception handling
# "items" is always 1m long with varying levels of repetition

>>> for items in randoms:
...     helper.timeit(1), helper_subclass.timeit(1), counter.timeit(1), default.timeit(1)
...
(0.18816172199876746, 0.4679023139997298, 0.9684444869999425, 0.33518486200046027)
(0.2936601179990248, 0.6056111739999324, 1.1316078849995392, 0.46283868699902087)
(0.35396358400066674, 0.685048443998312, 1.2120939880005608, 0.5497965239992482)
(0.5337620789996436, 0.8658702100001392, 1.4507492869997805, 0.7772859329998028)
(0.745282343999861, 1.1455801379997865, 2.116569702000561, 1.3293145009993168)

:(

I have the helper, but Counter is still slow. Is the helper not getting used
for some reason? Counter(items) is not even as fast as calling the helper on
a direct dict subclass with no overridden methods.
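
For anyone who wants to try the comparison, here is a rough sketch of how such a timing run could be set up. It is not the exact script behind the numbers above: `make_items`, `count_with_helper`, `count_defaultdict`, `bench` and `DictSubclass` are hypothetical names, the defaultdict loop is a plain one without exception handling, and `_count_elements` is a private helper in `collections`, so importing it is an assumption about CPython 3 internals.

```python
import random
import timeit
from collections import Counter, defaultdict
from collections import _count_elements  # private helper; assumed present (CPython 3)


class DictSubclass(dict):
    """Direct dict subclass with no overridden methods."""


def make_items(n, distinct, seed=0):
    # n random integers drawn from `distinct` possible values;
    # a smaller `distinct` means more repetition.
    rng = random.Random(seed)
    return [rng.randrange(distinct) for _ in range(n)]


def count_with_helper(mapping, items):
    # Let the helper fill the given mapping in place, then return it.
    _count_elements(mapping, items)
    return mapping


def count_defaultdict(items):
    # Plain defaultdict(int) counting loop.
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return counts


def bench(items):
    # One run each: helper on dict, helper on subclass, Counter, defaultdict.
    return (
        timeit.timeit(lambda: count_with_helper({}, items), number=1),
        timeit.timeit(lambda: count_with_helper(DictSubclass(), items), number=1),
        timeit.timeit(lambda: Counter(items), number=1),
        timeit.timeit(lambda: count_defaultdict(items), number=1),
    )


if __name__ == "__main__":
    # Smaller than the 1m items used above, to keep the run short.
    for distinct in (10, 1000, 100000):
        print(bench(make_items(100000, distinct)))
```

The absolute timings will depend on the interpreter version and the machine, so only the relative ordering of the four columns is worth comparing.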

--001a11c34da0a8d3f104e2a55915--