Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #51405

collections.Counter surprisingly slow

From Roy Smith <roy@panix.com>
Newsgroups comp.lang.python
Subject collections.Counter surprisingly slow
Date 2013-07-28 15:59 -0400
Organization PANIX Public Access Internet and UNIX, NYC
Message-ID <roy-8C60F5.15590428072013@news.panix.com> (permalink)

Show all headers | View raw


I've been doing an informal "intro to Python" lunchtime series for some 
co-workers (who are all experienced programmers, in other languages).  
This week I was going to cover list comprehensions, exceptions, and 
profiling. So, I did a little demo showing different ways to build a 
dictionary counting how many times a string appears in some input:

   test() implements a "look before you leap" python loop

   exception() uses a try/catch construct in a similar python loop

   default() uses a defaultdict

   count() uses a Counter

I profiled it, to show how the profiler works.  The full code is below. 
 The input is an 8.8 Mbyte file containing about 570,000 lines (11,000 
unique strings).  Python 2.7.3 on Ubuntu Precise.

As I expected, test() is slower than exception(), which is slower than 
default().  I'm rather shocked to discover that count() is the slowest 
of all!  I expected it to be the fastest.  Or, certainly, no slower 
than default().

The full profiler dump is at the end of this message, but the gist of 
it is:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.000    0.000    0.322    0.322 ./stations.py:42(count)
    1    0.159    0.159    0.159    0.159 ./stations.py:17(test)
    1    0.114    0.114    0.114    0.114 ./stations.py:27(exception)
    1    0.097    0.097    0.097    0.097 ./stations.py:36(default)

Why is count() [i.e. collections.Counter] so slow?

-------------------------------------------------------------
from collections import defaultdict, Counter

def main():
    lines = open('stations.txt').readlines()

    d1 = test(lines)
    d2 = exception(lines)
    d3 = default(lines)
    d4 = count(lines)

    print d1 == d2
    print d1 == d3
    print d1 == d4

def test(lines):
    d = {}
    for station in lines:
        if station in d:
            d[station] += 1
        else:
            d[station] = 1
    return d


def exception(lines):
    d = {}
    for station in lines:
        try:
            d[station] += 1
        except KeyError:
            d[station] = 1
    return d

def default(lines):
    d = defaultdict(int)
    for station in lines:
        d[station] += 1
    return d

def count(lines):
    d = Counter(lines)
    return d


if __name__ == '__main__':
    import cProfile
    import pstats
    cProfile.run('main()', 'stations.stats')
    p = pstats.Stats('stations.stats')
    p.sort_stats('cumulative').print_stats()
-------------------------------------------------------------

         570335 function calls (570332 primitive calls) in 0.776 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   1    0.023    0.023    0.776    0.776 <string>:1(<module>)
   1    0.006    0.006    0.753    0.753 ./stations.py:5(main)
   1    0.000    0.000    0.322    0.322 ./stations.py:42(count)
   1    0.000    0.000    0.322    0.322 /usr/lib/python2.7/collections.py:407(__init__)
   1    0.242    0.242    0.322    0.322 /usr/lib/python2.7/collections.py:470(update)
   1    0.159    0.159    0.159    0.159 ./stations.py:17(test)
   1    0.114    0.114    0.114    0.114 ./stations.py:27(exception)
   1    0.097    0.097    0.097    0.097 ./stations.py:36(default)
   570285    0.080    0.000    0.080    0.000 {method 'get' of 'dict' objects}
   1    0.055    0.055    0.055    0.055 {method 'readlines' of 'file' objects}
   1    0.000    0.000    0.000    0.000 {isinstance}
   1    0.000    0.000    0.000    0.000 /home/roy/deploy/current/python/lib/python2.7/abc.py:128(__instancecheck__)
      2/1    0.000    0.000    0.000    0.000 /home/roy/deploy/current/python/lib/python2.7/abc.py:148(__subclasscheck__)
      3/1    0.000    0.000    0.000    0.000 {issubclass}
        4    0.000    0.000    0.000    0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:58(__iter__)
        1    0.000    0.000    0.000    0.000 {open}
        2    0.000    0.000    0.000    0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:26(__exit__)
        2    0.000    0.000    0.000    0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:20(__enter__)
        2    0.000    0.000    0.000    0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:36(__init__)
        3    0.000    0.000    0.000    0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:68(__contains__)
        2    0.000    0.000    0.000    0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:52(_commit_removals)
        2    0.000    0.000    0.000    0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:81(add)
        3    0.000    0.000    0.000    0.000 {getattr}
        2    0.000    0.000    0.000    0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:16(__init__)
        2    0.000    0.000    0.000    0.000 {method 'remove' of 'set' objects}
        2    0.000    0.000    0.000    0.000 {method '__subclasses__' of 'type' objects}
        2    0.000    0.000    0.000    0.000 /home/roy/deploy/current/python/lib/python2.7/_abcoll.py:97(__subclasshook__)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        4    0.000    0.000    0.000    0.000 {method 'add' of 'set' objects}

Back to comp.lang.python | Previous | NextNext in thread | Find similar | Unroll thread


Thread

collections.Counter surprisingly slow Roy Smith <roy@panix.com> - 2013-07-28 15:59 -0400
  Re: collections.Counter surprisingly slow Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-28 20:51 +0000
    Re: collections.Counter surprisingly slow Roy Smith <roy@panix.com> - 2013-07-28 17:57 -0400
    Re: collections.Counter surprisingly slow Stefan Behnel <stefan_ml@behnel.de> - 2013-07-29 13:46 +0200
    Re: collections.Counter surprisingly slow Joshua Landau <joshua@landau.ws> - 2013-07-29 13:07 +0100
  Re: collections.Counter surprisingly slow Serhiy Storchaka <storchaka@gmail.com> - 2013-07-29 09:25 +0300
  Re: collections.Counter surprisingly slow Joshua Landau <joshua@landau.ws> - 2013-07-29 12:49 +0100
  Re: collections.Counter surprisingly slow Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-29 11:19 -0600
  Re: collections.Counter surprisingly slow Serhiy Storchaka <storchaka@gmail.com> - 2013-07-29 22:37 +0300
  Re: collections.Counter surprisingly slow Stefan Behnel <stefan_ml@behnel.de> - 2013-07-30 08:39 +0200
  Re: collections.Counter surprisingly slow Stefan Behnel <stefan_ml@behnel.de> - 2013-07-30 08:51 +0200
  Re: collections.Counter surprisingly slow Serhiy Storchaka <storchaka@gmail.com> - 2013-07-30 16:04 +0300

csiph-web