Path: csiph.com!usenet.pasdenom.info!gegeweb.org!de-l.enfer-du-nord.net!feeder2.enfer-du-nord.net!feeds.phibee-telecom.net!border2.nntp.ams2.giganews.com!border1.nntp.ams2.giganews.com!border3.nntp.ams.giganews.com!Xl.tags.giganews.com!border1.nntp.ams.giganews.com!nntp.giganews.com!local2.nntp.ams.giganews.com!nntp.brightview.co.uk!news.brightview.co.uk.POSTED!not-for-mail NNTP-Posting-Date: Mon, 22 Apr 2013 08:03:41 -0500 Message-ID: <5175351D.9070102@nowhere.org> Date: Mon, 22 Apr 2013 14:03:25 +0100 From: Blind Anagram User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20130328 Thunderbird/17.0.5 MIME-Version: 1.0 Newsgroups: comp.lang.python To: Dave Angel CC: python-list@python.org Subject: Re: List Count References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Lines: 55 X-Usenet-Provider: http://www.giganews.com X-Trace: sv3-IIPjuLAkdzazjfLGNL2U1x5zs6huo/OHhPRlYvHS8fITTfv2RGuW54VAISVJgpDeXtaAqA0mV1uvg4x!6FNJkZmIMqT0IxViz2I6fGNMKRFx3EKAPIEluesxA8Gq1bOSYhOVVVWdmDD/utGRgV+EvQLRiA== X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly X-Postfilter: 1.3.40 X-Original-Bytes: 3046 Xref: csiph.com comp.lang.python:44060 On 22/04/2013 13:51, Dave Angel wrote: > On 04/22/2013 07:58 AM, Blind Anagram wrote: >> I would be grateful for any advice people can offer on the fastest way >> to count items in a sub-sequence of a large list. >> >> I have a list of boolean values that can contain many hundreds of >> millions of elements for which I want to count the number of True values >> in a sub-sequence, one from the start up to some value (say hi). >> >> I am currently using: >> >> sieve[:hi].count(True) >> >> but I believe this may be costly because it copies a possibly large part >> of the sieve. >> >> Ideally I would like to be able to use: >> >> sieve.count(True, hi) >> >> where 'hi' sets the end of the count but this function is, sadly, not >> available for lists. >> >> The use of a bytearray with a memoryview object instead of a list solves >> this particular problem but it is not a solution for me as it creates >> more problems than it solves in other aspects of the program. >> >> Can I assume that one possible solution would be to sub-class list and >> create a C based extension to provide list.count(value, limit)? >> >> Are there any other solutions that will avoid copying a large part of >> the list? >> > > Instead of using the default slice notation, why not use > itertools.islice() ? > > Something like (untested): > > import itertools > > it = itertools.islice(sieve, 0, hi) > sum(itertools.imap(bool, it)) > > I only broke it into two lines for clarity. It could also be: > > sum(itertools.imap(bool, itertools.islice(sieve, 0, hi))) > > If you're using Python 3.x, say so, and I'm sure somebody can simplify > these, since in Python 3, many functions already produce iterators > instead of lists. Thanks, I'll look at these ideas. And, yes, my interest is mainly in Python 3.