


Re: List Count

Message-ID: <517545F7.5090209@nowhere.org>
Date: 2013-04-22 15:15 +0100
From: Blind Anagram <blindanagram@nowhere.org>
Newsgroups: comp.lang.python
Subject: Re: List Count
References: <sridnffI6YhxuOjMnZ2dnUVZ7tSdnZ2d@brightview.co.uk> <5175377f$0$29977$c3e8da3$5496439d@news.astraweb.com>



On 22/04/2013 14:13, Steven D'Aprano wrote:
> On Mon, 22 Apr 2013 12:58:20 +0100, Blind Anagram wrote:
> 
>> I would be grateful for any advice people can offer on the fastest way
>> to count items in a sub-sequence of a large list.
>>
>> I have a list of boolean values that can contain many hundreds of
>> millions of elements for which I want to count the number of True values
>> in a sub-sequence, one from the start up to some value (say hi).
>>
>> I am currently using:
>>
>>    sieve[:hi].count(True)
>>
>> but I believe this may be costly because it copies a possibly large part
>> of the sieve.
> 
> Have you timed it? Because Python is a high-level language, it is rarely 
> obvious what code will be fast. Yes, sieve[:hi] will copy the first hi 
> entries, but that's likely to be fast, basically just a memcopy, unless 
> sieve is huge and memory is short. In other words, unless your sieve is 
> so huge that the operating system cannot find enough memory for it, 
> making a copy is likely to be relatively insignificant.
> 
> I've just tried seven different techniques to "optimize" this, and the 
> simplest, most obvious technique is by far the fastest. Here are the 
> seven different code snippets I measured, with results:
> 
> 
> sieve[:hi].count(True)
> sum(sieve[:hi])
> sum(islice(sieve, hi))
> sum(x for x in islice(sieve, hi) if x)
> sum(x for x in islice(sieve, hi) if x is True)
> sum(1 for x in islice(sieve, hi) if x is True)
> len(list(filter(None, islice(sieve, hi))))
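
For anyone wanting to repeat the comparison, a minimal timing sketch along these lines might do. The test data, the value of hi, and the repeat count are assumptions chosen for illustration, not the sieve or settings used in the measurements above, and only five of the seven variants are included.

```python
from itertools import islice
from timeit import timeit

# Hypothetical test data: a small alternating list, not the real sieve.
sieve = [True, False] * 100_000
hi = 150_000

candidates = {
    "slice + count": lambda: sieve[:hi].count(True),
    "sum of slice": lambda: sum(sieve[:hi]),
    "sum of islice": lambda: sum(islice(sieve, hi)),
    "genexp, is True": lambda: sum(1 for x in islice(sieve, hi) if x is True),
    "filter + len": lambda: len(list(filter(None, islice(sieve, hi)))),
}

# Every variant must return the same count before the timings mean anything.
results = {name: fn() for name, fn in candidates.items()}
assert len(set(results.values())) == 1

for name, fn in candidates.items():
    # number=5 is an arbitrary repeat count; raise it for steadier figures.
    print(f"{name:18s} {timeit(fn, number=5):.4f} s")
```

Absolute numbers will vary with the Python version, the machine, and how large the list is relative to free memory, so only the relative ordering is worth reading from a run like this.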

Yes, I did time it and I agree with your results (where my tests overlap
with yours).

But counting over a sub-sequence is significantly slower than counting
over the full list.  When the list is small enough not to cause memory
allocation issues, the slowdown is about 30% on 100,000,000 items.  But
when the list has 1,000,000,000 items, OS memory allocation becomes an
issue and the cost on my system rises to over 600%.

I agree that this is not a big issue but it seems to me a high price to
pay for the lack of a sieve.count(value, limit), which I feel is a
useful function (given that memoryview operations are not available for
lists).
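
As a rough sketch of what such a function could look like in pure Python, the slice can be taken in bounded chunks so that only a small copy exists at any moment. The name count_upto, its signature, and the default chunk size are hypothetical, and limit is assumed to be non-negative. The bytearray lines at the end rely on bytearray.count accepting start and end arguments (with an integer value in Python 3.3 and later), which only helps if the sieve can be stored as bytes rather than as a list.

```python
def count_upto(seq, value, limit, chunk=1 << 20):
    """Count occurrences of value in seq[:limit], copying at most
    chunk items at a time (hypothetical helper, limit >= 0 assumed)."""
    limit = min(limit, len(seq))
    total = 0
    for start in range(0, limit, chunk):
        stop = min(start + chunk, limit)
        # Each slice is a short-lived copy of no more than chunk items.
        total += seq[start:stop].count(value)
    return total


# Alternative, assuming the flags can be held in a bytearray of 0/1 values:
# count() then takes start and end directly and makes no copy.
flags = bytearray([1, 0, 1, 1, 0, 1])
first_four = flags.count(1, 0, 4)
```

Whether the chunked version beats a single large slice depends on memory pressure, so it would need timing on the real data.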

> Of course. But don't optimize this until you know that you *need* to 
> optimize it. Is it really a bottleneck in your code? There's no point in 
> saving the 0.1 second it takes to copy the list if it takes 2 seconds to 
> count the items regardless.
> 
>> Are there any other solutions that will avoid copying a large part of
>> the list?
> 
> Yes, but they're slower.
> 
> Perhaps a better solution might be to avoid counting anything. If you can 
> keep a counter, and each time you add a value to the list you update the 
> counter, then getting the number of True values will be instantaneous.

Creating the sieve is currently very fast because it is not done by
adding single items but by adding a large number of items at once using
a slice operation.  I could count the items in each slice as it is
added, but this would add complexity that I would prefer to avoid: the
creation of the sieve is quite tricky to get right and I would prefer
not to fiddle with it.
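
One way to get part of the benefit of a running counter without touching the sieve construction might be to record checkpoint counts once, after the sieve is complete, and then count only a short tail for each query. This is a sketch under assumptions: the names build_checkpoints and count_true and the block size are hypothetical, and it only pays off when many counts are taken against the same finished sieve.

```python
from itertools import accumulate

BLOCK = 1 << 16  # hypothetical block size; tune for the real data


def build_checkpoints(sieve, block=BLOCK):
    """Return c where c[k] is the number of True values in sieve[:k * block]."""
    per_block = (sieve[i:i + block].count(True)
                 for i in range(0, len(sieve), block))
    return [0] + list(accumulate(per_block))


def count_true(sieve, checkpoints, hi, block=BLOCK):
    """Count True values in sieve[:hi] from a checkpoint plus one short slice."""
    hi = min(hi, len(sieve))
    k = hi // block
    # The slice below copies fewer than block items.
    return checkpoints[k] + sieve[k * block:hi].count(True)
```

The checkpoint list holds one integer per block, so its memory cost is small next to the sieve itself, and it must be rebuilt if the sieve changes afterwards.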

Thank you (and others) for advice on this.



