| References | (2 earlier) <517545F7.5090209@nowhere.org> <5175c12f$0$29977$c3e8da3$5496439d@news.astraweb.com> <e4KdnXzMesF-r-vMnZ2dnUVZ8rWdnZ2d@brightview.co.uk> <51769f96$0$29977$c3e8da3$5496439d@news.astraweb.com> <fd6dnXTDxbbiIOvMnZ2dnUVZ8qGdnZ2d@brightview.co.uk> |
|---|---|
| From | Oscar Benjamin <oscar.j.benjamin@gmail.com> |
| Date | 2013-04-23 18:45 +0100 |
| Subject | Re: List Count |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.987.1366739164.3114.python-list@python.org> |

On 23 April 2013 17:57, Blind Anagram <blindanagram@nowhere.org> wrote:
> On 23/04/2013 15:49, Steven D'Aprano wrote:
>> On Tue, 23 Apr 2013 08:05:53 +0100, Blind Anagram wrote:
>>
>>> I did a lot of work comparing the overall performance of the sieve when
>>> using either lists or arrays and I found that lists were a lot faster
>>> for the majority use case when the sieve is not large.
>>
>> And when the sieve is large?
>
> I don't know but since the majority use case is when the sieve is small,
> it makes sense to choose a list.

That's an odd comment given what you said at the start of this thread:

Blind Anagram wrote:
> I would be grateful for any advice people can offer on the fastest way
> to count items in a sub-sequence of a large list.
>
> I have a list of boolean values that can contain many hundreds of
> millions of elements for which I want to count the number of True values
> in a sub-sequence, one from the start up to some value (say hi).

>> I don't actually believe that the bottleneck is the cost of taking a list
>> slice. That's pretty fast, even for huge lists, and all efforts to skip
>> making a copy by using itertools.islice actually ended up slower. But
>> suppose it is the bottleneck. Then *sooner or later* arrays will win over
>> lists, simply because they're smaller.
>
> Maybe you have not noticed that, when I am discussing a huge sieve, I am
> simply pushing a sieve designed primarily for a small sieve lengths to
> the absolute limit. This is most definitely a minority use case.
>
> In pushing the size of the sieve upwards, it is the slice operation that
> is the first thing that 'breaks'. This is because the slice can be
> almost as big as the primary array so the OS gets driven into memory
> allocation problems for a sieve that is about half the length it would
> otherwise be. It still works but the cost of the slice once this point
> is reached rises from about 20% to over 600% because of all the paging
> going on.
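
For what it's worth, here is a minimal sketch of two ways to count the True values in a prefix without copying the whole prefix at once. The function names and the chunk size are hypothetical, chosen only for illustration, and nothing here has been timed against the sieve discussed in this thread. The first function copies a bounded chunk at a time, so the extra memory stays small however large hi is. The second assumes the sieve is stored as a bytearray of 0/1 bytes, where count() already accepts start and end arguments.

```python
def count_true_prefix(flags, hi, chunk=1 << 16):
    """Count True values in flags[:hi], copying at most `chunk` items at a time.

    Assumes 0 <= hi; a hi beyond the end of the list is clamped.
    """
    hi = min(hi, len(flags))
    total = 0
    for start in range(0, hi, chunk):
        # Each slice is a short-lived copy of at most `chunk` references.
        total += flags[start:min(start + chunk, hi)].count(True)
    return total


def count_set_prefix(sieve_bytes, hi):
    """Same count for a bytearray sieve of 0/1 bytes; no slice copy is made."""
    # bytes/bytearray count() takes optional start and end positions.
    return sieve_bytes.count(b"\x01", 0, hi)
```

Whether the chunked loop actually beats one big slice would have to be measured; its only guaranteed benefit is the bounded memory use.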

You keep mentioning that you want it to work with a large sieve. I
would much rather compute the same quantities with a small sieve if
possible. If you were using the Lehmer/Meissel algorithm you would be
able to compute the same quantity (i.e. pi(1e9)) using a much smaller
sieve with 30k items instead of 1e9. That would fit *very* comfortably
in memory and you wouldn't even need to slice the list. Or to put it
another way, you could compute pi(~1e18) using your current sieve
without slicing or paging. If you want to lift the limit on computing
pi(x) this is clearly the way to go.

>
> The unavailable list.count(value, limit) function would hence allow the
> sieve length to be up to twice as large before running into problems and
> would also cut the 20% slice cost I am seeing on smaller sieve lengths.
>
> So, all I was doing in asking for advice was to check whether there is
> an easy way of avoiding the slice copy, not because this is critical,
> but rather because it is a pity to limit the performance because Python
> forces a (strictly unnecessary) copy in order to perform a count within
> a part of a list.
>
> In other words, the lack of a list.count(value, limit) function makes
> Python less effective than it would otherwise be. I haven't looked at
> Python's C code base but I still wonder if there a good reason for NOT
> providing this?

If you feel that this is a good suggestion for an improvement to
Python consider posting it on python-ideas. I wasn't aware of the
equivalent functionality on strings but I see that the tuple.count()
function is the same as list.count().

Oscar
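
To illustrate the small-sieve idea, here is a rough sketch that computes pi(x) while sieving only up to sqrt(x). Note that it uses Legendre's formula, the simpler predecessor of the Meissel/Lehmer method, and not the Meissel/Lehmer algorithm itself, so it is far slower than a real implementation would be for large x. The names small_primes, prime_count and phi are made up for this example. It relies on pi(x) = phi(x, a) + a - 1, where a = pi(sqrt(x)) and phi(n, k) counts the integers in [1, n] that are divisible by none of the first k primes.

```python
from functools import lru_cache
from math import isqrt  # Python 3.8+


def small_primes(limit):
    """Sieve of Eratosthenes: return all primes <= limit."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if sieve[p]:
            # Clear every multiple of p starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def prime_count(x):
    """pi(x) for an integer x, via Legendre's formula (illustrative only)."""
    if x < 2:
        return 0
    primes = small_primes(isqrt(x))  # the only sieve needed: up to sqrt(x)
    a = len(primes)

    @lru_cache(maxsize=None)
    def phi(n, k):
        # Unrolled form of phi(n, k) = phi(n, k-1) - phi(n // p_k, k-1),
        # with phi(n, 0) = n. Recursion depth stays around log2(n).
        total = n
        for i in range(k):
            q = n // primes[i]
            if q == 0:
                break  # primes are ascending, so later terms are 0 too
            total -= phi(q, i)
        return total

    return phi(x, a) + a - 1
```

For example, prime_count(100) returns 25 while sieving only up to 10. The point is just that the sieve size grows like sqrt(x) rather than x; a serious implementation would use the Meissel/Lehmer refinements to cut down the number of phi evaluations.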