Path: csiph.com!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!rt.uk.eu.org!newsfeed.xs4all.nl!newsfeed1.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.025 X-Spam-Evidence: '*H*': 0.95; '*S*': 0.00; 'algorithm': 0.04; 'odd': 0.07; 'suppose': 0.07; 'arrays': 0.09; 'comfortably': 0.09; 'cc:addr:python-list': 0.11; 'python': 0.11; "wouldn't": 0.14; '(say': 0.16; 'boolean': 0.16; 'compute': 0.16; 'lengths': 0.16; 'limit.': 0.16; 'pity': 0.16; 'slice.': 0.16; 'unavailable': 0.16; 'elements': 0.16; 'wrote:': 0.18; 'looked': 0.18; "python's": 0.19; 'small,': 0.19; 'fit': 0.20; '>>>': 0.22; 'memory': 0.22; 'cc:addr:python.org': 0.22; 'this?': 0.23; 'case.': 0.24; 'comparing': 0.24; 'driven': 0.24; 'skip': 0.24; "haven't": 0.24; 'cc:2**0': 0.24; 'cc:no real name:2**0': 0.24; 'equivalent': 0.26; 'primary': 0.26; 'asking': 0.27; 'values': 0.27; 'gets': 0.27; 'header:In-Reply-To:1': 0.27; 'point': 0.28; 'function': 0.29; 'array': 0.29; 'wonder': 0.29; 'absolute': 0.30; 'fastest': 0.30; 'primarily': 0.30; 'message-id:@mail.gmail.com': 0.30; 'code': 0.31; 'posting': 0.31; '+0100,': 0.31; '20%': 0.31; "d'aprano": 0.31; 'forces': 0.31; 'go.': 0.31; 'large.': 0.31; 'quantities': 0.31; 'steven': 0.31; 'lists': 0.32; 'another': 0.32; 'running': 0.33; '(i.e.': 0.33; 'comment': 0.34; 'noticed': 0.34; 'sense': 0.34; 'maybe': 0.34; 'could': 0.34; 'advice': 0.35; 'received:209.85': 0.35; 'possible.': 0.35; 'received:209.85.220': 0.35; 'computing': 0.35; 'hundreds': 0.35; 'but': 0.35; 'received:google.com': 0.35; 'there': 0.35; 'grateful': 0.36; 'subject:List': 0.36; 'words,': 0.36; 'doing': 0.36; 'half': 0.37; 'so,': 0.37; 'list': 0.37; 'list.': 0.37; 'received:209': 0.37; 'performance': 0.37; 'problems': 0.38; 'rather': 0.38; 'that,': 0.38; 'either': 0.39; 'even': 0.60; 'easy': 0.60; 'most': 0.60; 'length': 0.61; 'effective': 0.61; 'providing': 0.61; 'simply': 0.61; 'first': 0.61; 'offer': 0.62; 'making': 0.63; 'reached': 0.63; 'choose': 0.64; 'taking': 0.65; 'within': 0.65; 'believe': 0.68; 'overall': 0.69; 'limit': 0.70; 'allocation': 0.74; 'cut': 0.74; 'lack': 0.78; '30k': 0.84; 'fast,': 0.84; 'improvement': 0.84; 'oscar': 0.84; 'lift': 0.91; '2013': 0.98 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:mime-version:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=X2QC2iRUexvVqKTgJ3ftFNiNXF4mhk+zBxtOiIKXp9I=; b=ydYMc11aYRXtlBCHEJIop50wz7PqOMr8lZO1Wx0zdKcAQ0POqKjnMhhOX/wjrQJEIQ sUUbpGtpQI2+RssXtK20NrAJIyEldoCLVrlIbU/tZQuVCzHLEF+HG/8e8fZhSqo9Z2rG FuiAiG6D5W6w2OeN6sYwGyydghjGm0mHeANM5EY3IBFc1/wdaQ08hvW3v4vSGZY0XDnv rNRyutS/DfBIEXTmOq92saTple7RSOJnw8kGaZSV5GGJKOWnmCsBnxZbY78VvcPGI36e 5QyrB+FaBZi2gpzajeqoUItpFVceOv6hep27XzXH/i/81UMFAUmCpFWXtKTyuJ++RxJq WKBA== X-Received: by 10.52.100.193 with SMTP id fa1mr7540245vdb.46.1366739162628; Tue, 23 Apr 2013 10:46:02 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: <5175377f$0$29977$c3e8da3$5496439d@news.astraweb.com> <517545F7.5090209@nowhere.org> <5175c12f$0$29977$c3e8da3$5496439d@news.astraweb.com> <51769f96$0$29977$c3e8da3$5496439d@news.astraweb.com> From: Oscar Benjamin Date: Tue, 23 Apr 2013 18:45:41 +0100 Subject: Re: List Count To: Blind Anagram Content-Type: text/plain; charset=ISO-8859-1 Cc: python-list@python.org X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 75 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1366739164 news.xs4all.nl 2200 [2001:888:2000:d::a6]:44974 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:44205 On 23 April 2013 17:57, Blind Anagram wrote: > On 23/04/2013 15:49, Steven D'Aprano wrote: >> On Tue, 23 Apr 2013 08:05:53 +0100, Blind Anagram wrote: >> >>> I did a lot of work comparing the overall performance of the sieve when >>> using either lists or arrays and I found that lists were a lot faster >>> for the majority use case when the sieve is not large. >> >> And when the sieve is large? > > I don't know but since the majority use case is when the sieve is small, > it makes sense to choose a list. That's an odd comment given what you said at the start of this thread: Blind Anagram wrote: > I would be grateful for any advice people can offer on the fastest way > to count items in a sub-sequence of a large list. > > I have a list of boolean values that can contain many hundreds of > millions of elements for which I want to count the number of True values > in a sub-sequence, one from the start up to some value (say hi). >> I don't actually believe that the bottleneck is the cost of taking a list >> slice. That's pretty fast, even for huge lists, and all efforts to skip >> making a copy by using itertools.islice actually ended up slower. But >> suppose it is the bottleneck. Then *sooner or later* arrays will win over >> lists, simply because they're smaller. > > Maybe you have not noticed that, when I am discussing a huge sieve, I am > simply pushing a sieve designed primarily for a small sieve lengths to > the absolute limit. This is most definitely a minority use case. > > In pushing the size of the sieve upwards, it is the slice operation that > is the first thing that 'breaks'. This is because the slice can be > almost as big as the primary array so the OS gets driven into memory > allocation problems for a sieve that is about half the length it would > otherwise be. It still works but the cost of the slice once this point > is reached rises from about 20% to over 600% because of all the paging > going on. You keep mentioning that you want it to work with a large sieve. I would much rather compute the same quantities with a small sieve if possible. If you were using the Lehmer/Meissel algorithm you would be able to compute the same quantity (i.e. pi(1e9)) using a much smaller sieve with 30k items instead of 1e9. that would fit *very* comfortably in memory and you wouldn't even need to slice the list. Or to put it another way, you could compute pi(~1e18) using your current sieve without slicing or paging. If you want to lift the limit on computing pi(x) this is clearly the way to go. > > The unavailable list.count(value, limit) function would hence allow the > sieve length to be up to twice as large before running into problems and > would also cut the 20% slice cost I am seeing on smaller sieve lengths. > > So, all I was doing in asking for advice was to check whether there is > an easy way of avoiding the slice copy, not because this is critical, > but rather because it is a pity to limit the performance because Python > forces a (strictly unnecessary) copy in order to perform a count within > a part of a list. > > In other words, the lack of a list.count(value, limit) function makes > Python less effective than it would otherwise be. I haven't looked at > Python's C code base but I still wonder if there a good reason for NOT > providing this? If you feel that this is a good suggestion for an improvement to Python consider posting it on python-ideas. I wasn't aware of the equivalent functionality on strings but I see that the tuple.count() function is the same as list.count(). Oscar