Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #44205

Re: List Count

Path csiph.com!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!rt.uk.eu.org!newsfeed.xs4all.nl!newsfeed1.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
Return-Path <oscar.j.benjamin@gmail.com>
X-Original-To python-list@python.org
Delivered-To python-list@mail.python.org
X-Spam-Status OK 0.025
X-Spam-Evidence '*H*': 0.95; '*S*': 0.00; 'algorithm': 0.04; 'odd': 0.07; 'suppose': 0.07; 'arrays': 0.09; 'comfortably': 0.09; 'cc:addr:python-list': 0.11; 'python': 0.11; "wouldn't": 0.14; '(say': 0.16; 'boolean': 0.16; 'compute': 0.16; 'lengths': 0.16; 'limit.': 0.16; 'pity': 0.16; 'slice.': 0.16; 'unavailable': 0.16; 'elements': 0.16; 'wrote:': 0.18; 'looked': 0.18; "python's": 0.19; 'small,': 0.19; 'fit': 0.20; '>>>': 0.22; 'memory': 0.22; 'cc:addr:python.org': 0.22; 'this?': 0.23; 'case.': 0.24; 'comparing': 0.24; 'driven': 0.24; 'skip': 0.24; "haven't": 0.24; 'cc:2**0': 0.24; 'cc:no real name:2**0': 0.24; 'equivalent': 0.26; 'primary': 0.26; 'asking': 0.27; 'values': 0.27; 'gets': 0.27; 'header:In-Reply-To:1': 0.27; 'point': 0.28; 'function': 0.29; 'array': 0.29; 'wonder': 0.29; 'absolute': 0.30; 'fastest': 0.30; 'primarily': 0.30; 'message-id:@mail.gmail.com': 0.30; 'code': 0.31; 'posting': 0.31; '+0100,': 0.31; '20%': 0.31; "d'aprano": 0.31; 'forces': 0.31; 'go.': 0.31; 'large.': 0.31; 'quantities': 0.31; 'steven': 0.31; 'lists': 0.32; 'another': 0.32; 'running': 0.33; '(i.e.': 0.33; 'comment': 0.34; 'noticed': 0.34; 'sense': 0.34; 'maybe': 0.34; 'could': 0.34; 'advice': 0.35; 'received:209.85': 0.35; 'possible.': 0.35; 'received:209.85.220': 0.35; 'computing': 0.35; 'hundreds': 0.35; 'but': 0.35; 'received:google.com': 0.35; 'there': 0.35; 'grateful': 0.36; 'subject:List': 0.36; 'words,': 0.36; 'doing': 0.36; 'half': 0.37; 'so,': 0.37; 'list': 0.37; 'list.': 0.37; 'received:209': 0.37; 'performance': 0.37; 'problems': 0.38; 'rather': 0.38; 'that,': 0.38; 'either': 0.39; 'even': 0.60; 'easy': 0.60; 'most': 0.60; 'length': 0.61; 'effective': 0.61; 'providing': 0.61; 'simply': 0.61; 'first': 0.61; 'offer': 0.62; 'making': 0.63; 'reached': 0.63; 'choose': 0.64; 'taking': 0.65; 'within': 0.65; 'believe': 0.68; 'overall': 0.69; 'limit': 0.70; 'allocation': 0.74; 'cut': 0.74; 'lack': 0.78; '30k': 0.84; 'fast,': 0.84; 'improvement': 0.84; 'oscar': 0.84; 'lift': 0.91; '2013': 0.98
DKIM-Signature v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:mime-version:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=X2QC2iRUexvVqKTgJ3ftFNiNXF4mhk+zBxtOiIKXp9I=; b=ydYMc11aYRXtlBCHEJIop50wz7PqOMr8lZO1Wx0zdKcAQ0POqKjnMhhOX/wjrQJEIQ sUUbpGtpQI2+RssXtK20NrAJIyEldoCLVrlIbU/tZQuVCzHLEF+HG/8e8fZhSqo9Z2rG FuiAiG6D5W6w2OeN6sYwGyydghjGm0mHeANM5EY3IBFc1/wdaQ08hvW3v4vSGZY0XDnv rNRyutS/DfBIEXTmOq92saTple7RSOJnw8kGaZSV5GGJKOWnmCsBnxZbY78VvcPGI36e 5QyrB+FaBZi2gpzajeqoUItpFVceOv6hep27XzXH/i/81UMFAUmCpFWXtKTyuJ++RxJq WKBA==
X-Received by 10.52.100.193 with SMTP id fa1mr7540245vdb.46.1366739162628; Tue, 23 Apr 2013 10:46:02 -0700 (PDT)
MIME-Version 1.0
In-Reply-To <fd6dnXTDxbbiIOvMnZ2dnUVZ8qGdnZ2d@brightview.co.uk>
References <sridnffI6YhxuOjMnZ2dnUVZ7tSdnZ2d@brightview.co.uk> <5175377f$0$29977$c3e8da3$5496439d@news.astraweb.com> <517545F7.5090209@nowhere.org> <5175c12f$0$29977$c3e8da3$5496439d@news.astraweb.com> <e4KdnXzMesF-r-vMnZ2dnUVZ8rWdnZ2d@brightview.co.uk> <51769f96$0$29977$c3e8da3$5496439d@news.astraweb.com> <fd6dnXTDxbbiIOvMnZ2dnUVZ8qGdnZ2d@brightview.co.uk>
From Oscar Benjamin <oscar.j.benjamin@gmail.com>
Date Tue, 23 Apr 2013 18:45:41 +0100
Subject Re: List Count
To Blind Anagram <blindanagram@nowhere.org>
Content-Type text/plain; charset=ISO-8859-1
Cc python-list@python.org
X-BeenThere python-list@python.org
X-Mailman-Version 2.1.15
Precedence list
List-Id General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe <http://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive <http://mail.python.org/pipermail/python-list/>
List-Post <mailto:python-list@python.org>
List-Help <mailto:python-list-request@python.org?subject=help>
List-Subscribe <http://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Newsgroups comp.lang.python
Message-ID <mailman.987.1366739164.3114.python-list@python.org> (permalink)
Lines 75
NNTP-Posting-Host 2001:888:2000:d::a6
X-Trace 1366739164 news.xs4all.nl 2200 [2001:888:2000:d::a6]:44974
X-Complaints-To abuse@xs4all.nl
Xref csiph.com comp.lang.python:44205

Show key headers only | View raw


On 23 April 2013 17:57, Blind Anagram <blindanagram@nowhere.org> wrote:
> On 23/04/2013 15:49, Steven D'Aprano wrote:
>> On Tue, 23 Apr 2013 08:05:53 +0100, Blind Anagram wrote:
>>
>>> I did a lot of work comparing the overall performance of the sieve when
>>> using either lists or arrays and I found that lists were a lot faster
>>> for the majority use case when the sieve is not large.
>>
>> And when the sieve is large?
>
> I don't know but since the majority use case is when the sieve is small,
> it makes sense to choose a list.

That's an odd comment given what you said at the start of this thread:

Blind Anagram wrote:
> I would be grateful for any advice people can offer on the fastest way
> to count items in a sub-sequence of a large list.
>
> I have a list of boolean values that can contain many hundreds of
> millions of elements for which I want to count the number of True values
> in a sub-sequence, one from the start up to some value (say hi).


>> I don't actually believe that the bottleneck is the cost of taking a list
>> slice. That's pretty fast, even for huge lists, and all efforts to skip
>> making a copy by using itertools.islice actually ended up slower. But
>> suppose it is the bottleneck. Then *sooner or later* arrays will win over
>> lists, simply because they're smaller.
>
> Maybe you have not noticed that, when I am discussing a huge sieve, I am
> simply pushing a sieve designed primarily for a small sieve lengths to
> the absolute limit.  This is most definitely a minority use case.
>
> In pushing the size of the sieve upwards, it is the slice operation that
> is the first thing that 'breaks'.  This is because the slice can be
> almost as big as the primary array so the OS gets driven into memory
> allocation problems for a sieve that is about half the length it would
> otherwise be. It still works but the cost of the slice once this point
> is reached rises from about 20% to over 600% because of all the paging
> going on.

You keep mentioning that you want it to work with a large sieve. I
would much rather compute the same quantities with a small sieve if
possible. If you were using the Lehmer/Meissel algorithm you would be
able to compute the same quantity (i.e. pi(1e9)) using a much smaller
sieve with 30k items instead of 1e9. that would fit *very* comfortably
in memory and you wouldn't even need to slice the list. Or to put it
another way, you could compute pi(~1e18) using your current sieve
without slicing or paging. If you want to lift the limit on computing
pi(x) this is clearly the way to go.

>
> The unavailable list.count(value, limit) function would hence allow the
> sieve length to be up to twice as large before running into problems and
> would also cut the 20% slice cost I am seeing on smaller sieve lengths.
>
> So, all I was doing in asking for advice was to check whether there is
> an easy way of avoiding the slice copy, not because this is critical,
> but rather because it is a pity to limit the performance because Python
> forces a (strictly unnecessary) copy in order to perform a count within
> a part of a list.
>
> In other words, the lack of a list.count(value, limit) function makes
> Python less effective than it would otherwise be.  I haven't looked at
> Python's C code base but I still wonder if there a good reason for NOT
> providing this?

If you feel that this is a good suggestion for an improvement to
Python consider posting it on python-ideas. I wasn't aware of the
equivalent functionality on strings but I see that the tuple.count()
function is the same as list.count().


Oscar

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

List Count Blind Anagram <blindanagram@nowhere.org> - 2013-04-22 12:58 +0100
  Re: List Count Dave Angel <davea@davea.name> - 2013-04-22 08:51 -0400
    Re: List Count Blind Anagram <blindanagram@nowhere.org> - 2013-04-22 14:03 +0100
    Re: List Count Blind Anagram <blindanagram@nowhere.org> - 2013-04-22 14:03 +0100
  Re: List Count Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-22 13:13 +0000
    Re: List Count Skip Montanaro <skip@pobox.com> - 2013-04-22 08:57 -0500
    Re: List Count Blind Anagram <blindanagram@nowhere.org> - 2013-04-22 15:15 +0100
      Re: List Count Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-22 16:14 +0100
        Re: List Count Blind Anagram <blindanagram@nowhere.org> - 2013-04-22 16:50 +0100
          Re: List Count Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-22 17:06 +0100
            Re: List Count Blind Anagram <blindanagram@nowhere.org> - 2013-04-22 17:38 +0100
              Re: List Count Skip Montanaro <skip@pobox.com> - 2013-04-22 12:48 -0500
                Re: List Count Blind Anagram <blindanagram@nowhere.org> - 2013-04-22 20:22 +0100
              Re: List Count Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-22 21:18 +0100
                Re: List Count Blind Anagram <blindanagram@nowhere.org> - 2013-04-22 22:25 +0100
                Re: List Count Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-23 00:06 +0100
                Re: List Count Blind Anagram <blindanagram@nowhere.org> - 2013-04-23 07:45 +0100
                Re: List Count Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-22 23:28 +0000
                Re: List Count Blind Anagram <blindanagram@nowhere.org> - 2013-04-23 08:00 +0100
              Re: List Count Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-22 22:03 +0100
                Re: List Count Blind Anagram <blindanagram@nowhere.org> - 2013-04-22 22:32 +0100
                Re: List Count Dave Angel <davea@davea.name> - 2013-04-22 21:47 -0400
                Re: List Count Blind Anagram <blindanagram@nowhere.org> - 2013-04-23 08:02 +0100
            Re: List Count Blind Anagram <blindanagram@nowhere.org> - 2013-04-22 17:38 +0100
        Re: List Count Blind Anagram <blindanagram@nowhere.org> - 2013-04-22 16:50 +0100
      Re: List Count Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-22 23:01 +0000
        Re: List Count Blind Anagram <blindanagram@nowhere.org> - 2013-04-23 08:05 +0100
          Re: List Count Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-23 12:08 +0100
            Re: List Count Blind Anagram <blindanagram@nowhere.org> - 2013-04-23 12:45 +0100
              Re: List Count Terry Jan Reedy <tjreedy@udel.edu> - 2013-04-23 15:01 -0400
          Re: List Count Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-23 14:49 +0000
            Re: List Count Blind Anagram <blindanagram@nowhere.org> - 2013-04-23 17:57 +0100
              Re: List Count Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-23 18:45 +0100
                Re: List Count Blind Anagram <blindanagram@nowhere.org> - 2013-04-23 19:30 +0100
                Re: List Count Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-23 20:16 +0100
              Re: List Count Terry Jan Reedy <tjreedy@udel.edu> - 2013-04-23 16:00 -0400
                Re: List Count Blind Anagram <blindanagram@nowhere.org> - 2013-04-23 21:41 +0100
              Re: List Count Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-23 21:38 +0100
              Re: List Count Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-24 01:59 +0000
                Re: List Count Blind Anagram <blindanagram@nowhere.org> - 2013-04-24 10:01 +0100
  Re: List Count Peter Otten <__peter__@web.de> - 2013-04-22 15:22 +0200
  Re: List Count 88888 Dihedral <dihedral88888@googlemail.com> - 2013-04-22 06:36 -0700

csiph-web