Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #61630 > unrolled thread
| Started by | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| First post | 2013-12-11 23:54 +0000 |
| Last post | 2013-12-12 13:07 -0500 |
| Articles | 19 — 9 participants |
Back to article view | Back to comp.lang.python
Optimizing list processing Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-11 23:54 +0000
Re: Optimizing list processing MRAB <python@mrabarnett.plus.com> - 2013-12-12 00:59 +0000
Re: Optimizing list processing Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-12 01:43 +0000
Re: Optimizing list processing MRAB <python@mrabarnett.plus.com> - 2013-12-12 02:09 +0000
Re: Optimizing list processing duncan smith <buzzard@invalid.invalid> - 2013-12-12 01:02 +0000
Re: Optimizing list processing Ben Finney <ben+python@benfinney.id.au> - 2013-12-12 12:18 +1100
Re: Optimizing list processing Terry Reedy <tjreedy@udel.edu> - 2013-12-11 21:26 -0500
Re: Optimizing list processing Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-12 12:08 +0000
Re: Optimizing list processing Chris Angelico <rosuav@gmail.com> - 2013-12-12 23:25 +1100
Re: Optimizing list processing MRAB <python@mrabarnett.plus.com> - 2013-12-12 13:32 +0000
Re: Optimizing list processing Chris Angelico <rosuav@gmail.com> - 2013-12-13 01:06 +1100
Re: Optimizing list processing Terry Reedy <tjreedy@udel.edu> - 2013-12-12 13:40 -0500
Re: Optimizing list processing Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-13 00:14 +0000
Re: Optimizing list processing Chris Angelico <rosuav@gmail.com> - 2013-12-13 12:01 +1100
Re: Optimizing list processing Stefan Behnel <stefan_ml@behnel.de> - 2013-12-12 12:09 +0100
Re: Optimizing list processing Peter Otten <__peter__@web.de> - 2013-12-12 16:08 +0100
Re: Optimizing list processing Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-13 03:01 +0000
Re: Optimizing list processing rusi <rustompmody@gmail.com> - 2013-12-12 21:35 -0800
Re: Optimizing list processing Terry Reedy <tjreedy@udel.edu> - 2013-12-12 13:07 -0500
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-12-11 23:54 +0000 |
| Subject | Optimizing list processing |
| Message-ID | <52a8fb2d$0$29992$c3e8da3$5496439d@news.astraweb.com> |
I have some code which produces a list from an iterable using at least
one temporary list, using a Decorate-Sort-Undecorate idiom. The algorithm
looks something like this (simplified):
table = sorted([(x, i) for i,x in enumerate(iterable)])
table = [i for x,i in table]
The problem here is that for large iterables, say 10 million items or so,
this is *painfully* slow, as my system has to page memory like mad to fit
two large lists into memory at once. So I came up with an in-place
version that saves (approximately) two-thirds of the memory needed.
table = [(x, i) for i,x in enumerate(iterable)]
table.sort()
for x, i in table:
table[i] = x
For giant iterables (ten million items), this version is a big
improvement, about three times faster than the list comp version. Since
we're talking about the difference between 4 seconds and 12 seconds (plus
an additional 40-80 seconds of general slow-down as the computer pages
memory into and out of virtual memory), this is a good, solid
optimization.
Except that for more reasonably sized iterables, it's a pessimization.
With one million items, the ratio is the other way around: the list comp
version is 2-3 times faster than the in-place version. For smaller lists,
the ratio varies, but the list comp version is typically around twice as
fast. A good example of trading memory for time.
So, ideally I'd like to write my code like this:
table = [(x, i) for i,x in enumerate(iterable)]
table.sort()
if len(table) < ?????:
table = [i for x,i in table]
else:
for x, i in table:
table[i] = x
where ????? no doubt will depend on how much memory is available in one
contiguous chunk.
Is there any way to determine which branch I should run, apart from hard-
coding some arbitrary and constant cut-off value?
--
Steven
[toc] | [next] | [standalone]
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2013-12-12 00:59 +0000 |
| Message-ID | <mailman.3942.1386809980.18130.python-list@python.org> |
| In reply to | #61630 |
On 11/12/2013 23:54, Steven D'Aprano wrote:
> I have some code which produces a list from an iterable using at least
> one temporary list, using a Decorate-Sort-Undecorate idiom. The algorithm
> looks something like this (simplified):
>
> table = sorted([(x, i) for i,x in enumerate(iterable)])
> table = [i for x,i in table]
>
> The problem here is that for large iterables, say 10 million items or so,
> this is *painfully* slow, as my system has to page memory like mad to fit
> two large lists into memory at once. So I came up with an in-place
> version that saves (approximately) two-thirds of the memory needed.
>
> table = [(x, i) for i,x in enumerate(iterable)]
> table.sort()
This looks wrong to me:
> for x, i in table:
> table[i] = x
Couldn't it replace an item it'll need later on?
Let me see if I can find an example where it would fail.
Start with:
>>> table = [('b', 0), ('a', 1)]
Sort it and you get:
>>> table.sort()
>>> table
[('a', 1), ('b', 0)]
Run that code:
>>> for x, i in table:
table[i] = x
Traceback (most recent call last):
File "<pyshell#18>", line 1, in <module>
for x, i in table:
ValueError: need more than 1 value to unpack
Why did it fail?
>>> table
[('a', 1), 'a']
The 2 methods give different results anyway: the first returns a list
of indexes, and the second returns a list of items from the iterable.
[snip]
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-12-12 01:43 +0000 |
| Message-ID | <52a914c3$0$29992$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #61632 |
On Thu, 12 Dec 2013 00:59:42 +0000, MRAB wrote:
>> table = [(x, i) for i,x in enumerate(iterable)]
>> table.sort()
>
> This looks wrong to me:
>
>> for x, i in table:
>> table[i] = x
Yes, you're right, I over-simplified the example, and in doing so
introduced a bug. What I actually use is:
for i, x in enumerate(table):
table[i] = x[1]
modulo any further bugs introduced while copying and pasting :-)
--
Steven
[toc] | [prev] | [next] | [standalone]
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2013-12-12 02:09 +0000 |
| Message-ID | <mailman.3951.1386814181.18130.python-list@python.org> |
| In reply to | #61644 |
On 12/12/2013 01:43, Steven D'Aprano wrote: > On Thu, 12 Dec 2013 00:59:42 +0000, MRAB wrote: > >>> table = [(x, i) for i,x in enumerate(iterable)] >>> table.sort() >> >> This looks wrong to me: >> >>> for x, i in table: >>> table[i] = x > > Yes, you're right, I over-simplified the example, and in doing so > introduced a bug. What I actually use is: > > for i, x in enumerate(table): > table[i] = x[1] > > modulo any further bugs introduced while copying and pasting :-) > How does this compare: table = list(iterable) indexes = list(range(len(table))) indexes.sort(key=table.__getitem__)
[toc] | [prev] | [next] | [standalone]
| From | duncan smith <buzzard@invalid.invalid> |
|---|---|
| Date | 2013-12-12 01:02 +0000 |
| Message-ID | <52a90b21$0$47892$862e30e2@ngroups.net> |
| In reply to | #61630 |
On 11/12/13 23:54, Steven D'Aprano wrote: > I have some code which produces a list from an iterable using at least > one temporary list, using a Decorate-Sort-Undecorate idiom. The algorithm > looks something like this (simplified): > > table = sorted([(x, i) for i,x in enumerate(iterable)]) > table = [i for x,i in table] > > The problem here is that for large iterables, say 10 million items or so, > this is *painfully* slow, as my system has to page memory like mad to fit > two large lists into memory at once. So I came up with an in-place > version that saves (approximately) two-thirds of the memory needed. > > table = [(x, i) for i,x in enumerate(iterable)] > table.sort() > for x, i in table: > table[i] = x > > For giant iterables (ten million items), this version is a big > improvement, about three times faster than the list comp version. Since > we're talking about the difference between 4 seconds and 12 seconds (plus > an additional 40-80 seconds of general slow-down as the computer pages > memory into and out of virtual memory), this is a good, solid > optimization. > > Except that for more reasonably sized iterables, it's a pessimization. > With one million items, the ratio is the other way around: the list comp > version is 2-3 times faster than the in-place version. For smaller lists, > the ratio varies, but the list comp version is typically around twice as > fast. A good example of trading memory for time. > > So, ideally I'd like to write my code like this: > > > table = [(x, i) for i,x in enumerate(iterable)] > table.sort() > if len(table) < ?????: > table = [i for x,i in table] > else: > for x, i in table: > table[i] = x > > where ????? no doubt will depend on how much memory is available in one > contiguous chunk. > > Is there any way to determine which branch I should run, apart from hard- > coding some arbitrary and constant cut-off value? > I had a slightly similar problem a while ago. I actually wanted to process data from a large file in sorted order. In the end I read chunks of data from the file, sorted them, then wrote each chunk of data to a temporary file. Then I used heapq.merge to merge the data in the temporary files. It vastly reduced memory consumption, and was 'quick enough'. It was based on Guido's solution for sorting a million 32-bit integers in 2MB of RAM (http://neopythonic.blogspot.co.uk/2008/10/sorting-million-32-bit-integers-in-2mb.html). Cheers. Duncan
[toc] | [prev] | [next] | [standalone]
| From | Ben Finney <ben+python@benfinney.id.au> |
|---|---|
| Date | 2013-12-12 12:18 +1100 |
| Message-ID | <mailman.3947.1386811122.18130.python-list@python.org> |
| In reply to | #61630 |
Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes: > For giant iterables (ten million items), this version is a big > improvement, about three times faster than the list comp version. […] > > Except that for more reasonably sized iterables, it's a pessimization. > With one million items, the ratio is the other way around: the list > comp version is 2-3 times faster than the in-place version. For > smaller lists, the ratio varies, but the list comp version is > typically around twice as fast. A good example of trading memory for > time. > Is there any way to determine which branch I should run, apart from > hard- coding some arbitrary and constant cut-off value? Hmm. The code isn't going to be able to accurately judge in advance the time it'll take to run. But perhaps it could quickly and accurately calculate the memory usage of your data structure? Is that useful for determining which branch to take? -- \ “The fact that a believer is happier than a skeptic is no more | `\ to the point than the fact that a drunken man is happier than a | _o__) sober one.” —George Bernard Shaw | Ben Finney
[toc] | [prev] | [next] | [standalone]
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2013-12-11 21:26 -0500 |
| Message-ID | <mailman.3953.1386815228.18130.python-list@python.org> |
| In reply to | #61630 |
On 12/11/2013 6:54 PM, Steven D'Aprano wrote: > I have some code which produces a list from an iterable using at least > one temporary list, using a Decorate-Sort-Undecorate idiom. It is a non-standard version thereof, as DSU usually means to decorate with a key that gets discarded. A couple of examples of input and expected output would have been good ;-). > The algorithm looks something like this (simplified): > > table = sorted([(x, i) for i,x in enumerate(iterable)]) This makes two temporaries when only one is needed, and shows why we added generator expressions. table = sorted((x, i) for i,x in enumerate(iterable)) is equivalent to the table; table.sort lines below. The following (untested) should be faster as it avoids tuple unpacking and repacking. from itertools import count table = sorted(t for t in zip(iterable, count)) > table = [i for x,i in table] Did your original un-simplified use zip instead enumerate? Table now has the original index of the items in iterable sorted by the items value. The use for this is not obvious. > The problem here is that for large iterables, say 10 million items or so, > this is *painfully* slow, as my system has to page memory like mad to fit > two large lists into memory at once. So I came up with an in-place > version that saves (approximately) two-thirds of the memory needed. With 1/3 saved by using a genex, it saves 1/2 of the remainder. > table = [(x, i) for i,x in enumerate(iterable)] > table.sorted() (You meant table.sort().) This is an expansion of sorted(genex). It might be slightly faster as list comp may be faster than list(equivalent genex). > for x, i in table: > table[i] = x I cannot understand what you are aiming for here. Besides the fact that this does not work, it keeps x values rather than i indexes as before. > table = [i for x,i in table] done in place is for j, (x,i) in enumerate(table): table[j] = i I expect the list comp be faster than in-place as long as the result list can be allocated and held in memory without paging. (This of course depends on system memory and other memory uses.) List iterators have a __length_hint__ method giving the length of the underlying list, so the list comp result list can potentially be allocated once and then filled in by enumeration and replacement, but in C rather than Python code. -- Terry Jan Reedy
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-12-12 12:08 +0000 |
| Message-ID | <52a9a75a$0$29992$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #61650 |
On Wed, 11 Dec 2013 21:26:51 -0500, Terry Reedy wrote:
> On 12/11/2013 6:54 PM, Steven D'Aprano wrote:
>> I have some code which produces a list from an iterable using at least
>> one temporary list, using a Decorate-Sort-Undecorate idiom.
>
> It is a non-standard version thereof, as DSU usually means to decorate
> with a key that gets discarded.
I do decorate with a key that gets discarded. The actual values that are
being sorted are the enumerated values 0, 1, 2, ... and the temporary key
values come from the iterable.
Confused yet? :-)
> A couple of examples of input and expected output would have been good
> ;-).
Only if I wanted you to understand the algorithm, which isn't really
important. My question is about swapping between in-place modifications
and external copying. If I wanted to be really mysterious (annoying), I
could have just give pseudo-code :-)
Please don't focus on the algorithm I gave. Focus on the fact that I
could write it like this:
if some condition to do with the computer's available memory:
make modifications in place
else:
make a copy of the data containing the modifications
if only I knew what that condition was and how to test for it.
>> The algorithm looks something like this (simplified):
>>
>> table = sorted([(x, i) for i,x in enumerate(iterable)])
>
> This makes two temporaries when only one is needed, and shows why we
> added generator expressions.
You may be right in this particular case, I don't know I haven't tried
it, but there are cases where list comps are significantly faster than
generator expressions. For example, when passed to str.join, list comps
are (on my computer) consistently 15% faster:
[steve@ando ~]$ python3.3 -m timeit "''.join([chr(n) for n in range(1,
201)])"
10000 loops, best of 3: 100 usec per loop
[steve@ando ~]$ python3.3 -m timeit "''.join([chr(n) for n in range(1,
201)])"
10000 loops, best of 3: 94.1 usec per loop
[steve@ando ~]$ python3.3 -m timeit "''.join(chr(n) for n in range(1,
201))"
10000 loops, best of 3: 116 usec per loop
[steve@ando ~]$ python3.3 -m timeit "''.join(chr(n) for n in range(1,
201))"
10000 loops, best of 3: 109 usec per loop
> table = sorted((x, i) for i,x in enumerate(iterable)) is equivalent to
> the table; table.sort lines below.
>
> The following (untested) should be faster as it avoids tuple unpacking
> and repacking.
I'm not terribly fussed about micro-optimizations here. I'm concerned
about *macro-optimizing* the case where creating two (or more) lists
forces the computer to page memory in and out of virtual memory. Saving a
few milliseconds? I don't care. Saving 5 or 6 seconds? That I care about.
[...]
> I expect the list comp be faster than in-place as long as the result
> list can be allocated and held in memory without paging. (This of course
> depends on system memory and other memory uses.)
Exactly!
> List iterators have a
> __length_hint__ method giving the length of the underlying list, so the
> list comp result list can potentially be allocated once and then filled
> in by enumeration and replacement, but in C rather than Python code.
I don't want to pre-allocate the list comp. I want to avoid having to
have two LARGE lists in memory at the same time, one containing the
decorated values, and one not. I know that this is a good optimization
for large enough lists. On my computer, ten million items is sufficient
to demonstrate the optimization, and with sufficient testing I could
determine a rough cut-off value, below which list comps are more
efficient and above which in-place modifications are better.
But I don't know how to generalize that cut-off value. If I buy extra
RAM, the cut-off value will shift. If I run it on another computer, it
will shift.
P.S. The algorithm I'm working on is a way of generating index and rank
tables. Not that it really matters -- what matters is determining whether
or not to shift from "make a copy of the list" to "modify the list in
place".
--
Steven
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-12-12 23:25 +1100 |
| Message-ID | <mailman.3984.1386851879.18130.python-list@python.org> |
| In reply to | #61701 |
On Thu, Dec 12, 2013 at 11:08 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> P.S. The algorithm I'm working on is a way of generating index and rank
> tables. Not that it really matters -- what matters is determining whether
> or not to shift from "make a copy of the list" to "modify the list in
> place".
So you're currently looking at...
if len(table) < ?????:
table = [i for x,i in table]
else:
for x, i in table:
table[i] = x
Can I throw a spanner [1] in the works with other suggestions to try timing?
table[:] = [i for x,i in table] # Does slice assignment get optimized?
SPLIT = 1048576 # Pick some useful cutoff
for part in range(0,len(table),SPLIT):
table[part:part+SPLIT] = [i for x,i in table[part:part+SPLIT]]
If slice assignment is reasonably fast (which I suspect it is), the
one-liner should be practically identical to your small-table
one-liner. Then if the million-record splits can be done inside
memory, it ought to be possible to do this in comparable time, even if
the total table length is huge. The choice of SPLIT would then matter
a lot less than the cutoff that you're trying to find; you might have
been able to do it in half the number of sections, but that won't make
as big a difference as suddenly paging out. Ideally, what I'd like to
see is that a higher SPLIT improves performance slightly until it gets
too high, at which point you go bust and the dealer wins... but the
critical word there is "slightly", meaning that it wouldn't cost too
much for SPLIT to be lower than it needs to be. That's the criteria
for the experiment; do you have the data on which to try it?
[1] Monkey wrench, for the Yanks in the room
ChrisA
[toc] | [prev] | [next] | [standalone]
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2013-12-12 13:32 +0000 |
| Message-ID | <mailman.3986.1386855129.18130.python-list@python.org> |
| In reply to | #61701 |
On 12/12/2013 12:25, Chris Angelico wrote: > On Thu, Dec 12, 2013 at 11:08 PM, Steven D'Aprano > <steve+comp.lang.python@pearwood.info> wrote: >> P.S. The algorithm I'm working on is a way of generating index and rank >> tables. Not that it really matters -- what matters is determining whether >> or not to shift from "make a copy of the list" to "modify the list in >> place". > > So you're currently looking at... > > if len(table) < ?????: > table = [i for x,i in table] > else: > for x, i in table: > table[i] = x > > > Can I throw a spanner [1] in the works with other suggestions to try timing? > > table[:] = [i for x,i in table] # Does slice assignment get optimized? > [snip] If you're trying that, you could also try: table[:] = (i for x,i in table)
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-12-13 01:06 +1100 |
| Message-ID | <mailman.3991.1386857197.18130.python-list@python.org> |
| In reply to | #61701 |
On Fri, Dec 13, 2013 at 12:32 AM, MRAB <python@mrabarnett.plus.com> wrote: >> table[:] = [i for x,i in table] # Does slice assignment get optimized? >> > [snip] > > If you're trying that, you could also try: > > table[:] = (i for x,i in table) Right, a tweak which could be applied also to the split version. Thanks MRAB. ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2013-12-12 13:40 -0500 |
| Message-ID | <mailman.4010.1386873654.18130.python-list@python.org> |
| In reply to | #61701 |
On 12/12/2013 7:08 AM, Steven D'Aprano wrote: > Please don't focus on the algorithm I gave. Focus on the fact that I > could write it like this: > > if some condition to do with the computer's available memory: > make modifications in place > else: > make a copy of the data containing the modifications > > if only I knew what that condition was and how to test for it. In other words, you want a magic formula that depends on just about everything: the (static) memory in the machine, the other programs running, the memory otherwise used by the current program, the number items, the nature of the items, and possibly the memory fragmentation state. > You may be right in this particular case, I don't know I haven't tried > it, but there are cases where list comps are significantly faster than > generator expressions. Stefan noted that neither is needed if zip produces the items wanted. > I'm not terribly fussed about micro-optimizations here. I'm concerned > about *macro-optimizing* the case where creating two (or more) lists > forces the computer to page memory in and out of virtual memory. Saving a > few milliseconds? I don't care. Saving 5 or 6 seconds? That I care about. Why would you spend an hour or more of your time to save 5 seconds? Theoretical interest or a practical problem with enough runs to turns seconds into hours or days? A non-trivial practical answer will require more info about the practical context, including the distribution of machine memory sizes and of problem sizes. > I don't want to pre-allocate the list comp. I want to avoid having to > have two LARGE lists in memory at the same time, one containing the > decorated values, and one not. I know that this is a good optimization > for large enough lists. On my computer, ten million items is sufficient > to demonstrate the optimization, and with sufficient testing I could > determine a rough cut-off value, below which list comps are more > efficient and above which in-place modifications are better. > > But I don't know how to generalize that cut-off value. If I buy extra > RAM, the cut-off value will shift. If I run it on another computer, it > will shift. The simplest answer is to always avoid catastrophe by always modifying the one list. For 'short' lists, the time difference will be relatively small. If the fraction of short lists is large enough to make the cumulative time difference large enough, and the list object sizes are more or less constant or at least bounded, a possible answer is to pick a minimum memory size, get a machine with that size, and work out a list size small enough to avoid thrashing. -- Terry Jan Reedy
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-12-13 00:14 +0000 |
| Message-ID | <52aa5149$0$29992$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #61739 |
On Thu, 12 Dec 2013 13:40:36 -0500, Terry Reedy wrote: > On 12/12/2013 7:08 AM, Steven D'Aprano wrote: > >> Please don't focus on the algorithm I gave. Focus on the fact that I >> could write it like this: >> >> if some condition to do with the computer's available memory: >> make modifications in place >> else: >> make a copy of the data containing the modifications >> >> if only I knew what that condition was and how to test for it. > > In other words, you want a magic formula that depends on just about > everything: the (static) memory in the machine, the other programs > running, the memory otherwise used by the current program, the number > items, the nature of the items, and possibly the memory fragmentation > state. Yes. Surely there's a way to check whether allocating a list of N items will succeed without paging, without actually having to allocate a list of N items? Back in the 1980s, when I was a Mac user and occasional programmer, there were memory manager routines which (if I recall correctly) would tell you whether or not an allocation would succeed or not. Macs of that vintage didn't have virtual memory, it is true, but the principle remains sound: "is there a single contiguous block of N bytes available? if so, use a list comprehension, if not, switch to an in-place algorithm". Until Python, I had never used a programming language that didn't give you at least a rough-and-ready way to determine how much memory was available so you could make an informed guess as to whether an expensive operation would succeed before actually attempting it. I will be very disappointed (but hardly surprised) to learn that functionality which was standard in 1984 computers is not possible in 2013. But I don't *specifically* care about measuring memory size, only as a proxy for whether or not I should switch algorithms. That's why this post is called "Optimizing list processing" rather than "Finding out how much memory is available". If there's another way to solve the same problem, I'm happy to hear it. [...] >> I'm not terribly fussed about micro-optimizations here. I'm concerned >> about *macro-optimizing* the case where creating two (or more) lists >> forces the computer to page memory in and out of virtual memory. Saving >> a few milliseconds? I don't care. Saving 5 or 6 seconds? That I care >> about. > > Why would you spend an hour or more of your time to save 5 seconds? For a number of reasons: - Trading off increased memory use for faster code is a good optimization technique so long as it actually does result in faster code. I measured, and discovered that there comes a point where it results in serious performance degradation. Not just a little bit slower by 5% or 10%, but nearly 300% slower. We're told not to optimize until we've measured. Well I've measured. Swapping algorithms to in-place for large enough lists is a BIG time saver. - It's not just the five seconds that I save, but the 30 or 50 seconds of general slowdown of *everything* on the computer as it frantically pages blocks of memory. And not just on *this* computer, if I'm running the code on a server it will effect all the other machines relying on that server. That five seconds of time for a single process may affect all the users of that machine for a minute or two before the system settles down to normal performance again. Paging is *deadly* for performance -- moving memory around is typically O(N**2), and hard drive speeds are typically millions of times slower than RAM speeds. Here is a graphical view of the latency of various media: http://i.imgur.com/X1Hi1.gif [...] > The simplest answer is to always avoid catastrophe by always modifying > the one list. For 'short' lists, the time difference will be relatively > small. It seems perverse to avoid writing idiomatic, and usually faster, Python code just because occasionally it performs badly. What you describe is a last resort. -- Steven
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-12-13 12:01 +1100 |
| Message-ID | <mailman.4042.1386896504.18130.python-list@python.org> |
| In reply to | #61777 |
On Fri, Dec 13, 2013 at 11:14 AM, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote: > Back in the 1980s, when I was a Mac user and occasional programmer, there > were memory manager routines which (if I recall correctly) would tell you > whether or not an allocation would succeed or not. Macs of that vintage > didn't have virtual memory, it is true, but the principle remains sound: > "is there a single contiguous block of N bytes available? if so, use a > list comprehension, if not, switch to an in-place algorithm". Until > Python, I had never used a programming language that didn't give you at > least a rough-and-ready way to determine how much memory was available so > you could make an informed guess as to whether an expensive operation > would succeed before actually attempting it. > > I will be very disappointed (but hardly surprised) to learn that > functionality which was standard in 1984 computers is not possible in > 2013. The situation is a bit different. Virtualization of memory makes it a lot harder to figure out what can/can't be allocated without paging. As long as you're okay with it being really rough, you could probably get some info from the system. On my Linux box, /proc/meminfo tells me: MemTotal: 16452036 kB MemFree: 1689636 kB Buffers: 251240 kB Cached: 6228724 kB That suggests I can allocate 1.5GB + .25GB + 6.25GB = 8GB, roughly, before hitting the swapper. (Buffers and Cached represent memory that will be reclaimed to satisfy other requests, but is being used - most of it caching stuff off the primary filesystem.) In very VERY round figures, that would be how much it ought to be safe to allocate. Now, how that compares with your list requirements is a separate question. I've no idea how many list elements would fit in that 8GB, but it probably won't be a billion (which would be the theoretical, one 64-bit pointer per element). This might tie in nicely with the theory I suggested about splitting the job. You could pick a SPLIT by figuring how much memory is free, dividing that by your pointer size or other estimate of usage, halving it because you need two lists, and then halving again for safety. On this same 64-bit system, sys.getsizeof(list(range(N))) for various values of N seems to suggest that it approaches 9*N. So by the guesstimate above, I come up with a plausible SPLIT of 226933 == (1689636+251240+6228724)//9//2//2, not forgetting that this is scaled by one SI unit (so 226 million elements would be safe). Alas, that's not true, because I just tried that, and he hit the swapper... even though sys.getsizeof() returned the expected N*9+112 figure. Which is because I'm an idiot. Of course. The reason it's more is because I'm creating all those integers too... actually, I'm quite impressed with that, that's costing almost nothing more. >>> sys.getsizeof(list(range(100000000))) 900000112 >>> sys.getsizeof([None]*100000000) 800000064 Not quite sure what's going on there. I know those integers are taking up more than 1 byte of storage. Guess getsizeof() isn't the best way to calculate. Anyway, a bit of experimentation should tell you how much you can do safely, and as long as there's a good margin of error, this would be an extremely rough guide. Now, getting that info in a cross-platform way... that's the problem! ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Stefan Behnel <stefan_ml@behnel.de> |
|---|---|
| Date | 2013-12-12 12:09 +0100 |
| Message-ID | <mailman.3980.1386846588.18130.python-list@python.org> |
| In reply to | #61630 |
Terry Reedy, 12.12.2013 03:26: > from itertools import count > table = sorted(t for t in zip(iterable, count)) This is a useless use of a generator expression. sorted(zip(...)) is enough (and likely to be substantially faster also). Stefan
[toc] | [prev] | [next] | [standalone]
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2013-12-12 16:08 +0100 |
| Message-ID | <mailman.3999.1386860910.18130.python-list@python.org> |
| In reply to | #61630 |
Steven D'Aprano wrote: > I have some code which produces a list from an iterable using at least > one temporary list, using a Decorate-Sort-Undecorate idiom. The algorithm > looks something like this (simplified): > > table = sorted([(x, i) for i,x in enumerate(iterable)]) > table = [i for x,i in table] > > The problem here is that for large iterables, say 10 million items or so, > this is *painfully* slow, as my system has to page memory like mad to fit > two large lists into memory at once. So I came up with an in-place > version that saves (approximately) two-thirds of the memory needed. > > table = [(x, i) for i,x in enumerate(iterable)] > table.sort() > for x, i in table: > table[i] = x > > For giant iterables (ten million items), this version is a big > improvement, about three times faster than the list comp version. Since > we're talking about the difference between 4 seconds and 12 seconds (plus > an additional 40-80 seconds of general slow-down as the computer pages > memory into and out of virtual memory), this is a good, solid > optimization. > > Except that for more reasonably sized iterables, it's a pessimization. > With one million items, the ratio is the other way around: the list comp > version is 2-3 times faster than the in-place version. For smaller lists, > the ratio varies, but the list comp version is typically around twice as > fast. A good example of trading memory for time. > > So, ideally I'd like to write my code like this: > > > table = [(x, i) for i,x in enumerate(iterable)] > table.sort() > if len(table) < ?????: > table = [i for x,i in table] > else: > for x, i in table: > table[i] = x > > where ????? no doubt will depend on how much memory is available in one > contiguous chunk. > > Is there any way to determine which branch I should run, apart from hard- > coding some arbitrary and constant cut-off value? How about using two lists? keys = list(iterable) values = range(len(keys)) values.sort(key=keys.__getitem__) del keys The intention is to save memory used for the 2-tuples; I don't know if they pop up elsewhere.
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-12-13 03:01 +0000 |
| Message-ID | <52aa7890$0$29992$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #61726 |
On Thu, 12 Dec 2013 16:08:33 +0100, Peter Otten wrote:
> Steven D'Aprano wrote:
[...]
>> So, ideally I'd like to write my code like this:
>>
>>
>> table = [(x, i) for i,x in enumerate(iterable)] table.sort()
>> if len(table) < ?????:
>> table = [i for x,i in table]
>> else:
>> for x, i in table:
>> table[i] = x
>>
>> where ????? no doubt will depend on how much memory is available in one
>> contiguous chunk.
>>
>> Is there any way to determine which branch I should run, apart from
>> hard- coding some arbitrary and constant cut-off value?
>
> How about using two lists?
>
> keys = list(iterable)
> values = range(len(keys))
> values.sort(key=keys.__getitem__)
> del keys
>
> The intention is to save memory used for the 2-tuples; I don't know if
> they pop up elsewhere.
They don't. On the other hand, you've now got two lists where before I
only had one.
In my post, I said that the code I showed was a simplification. What I
actually have is two branches. If the given iterable is a sequence type
(say, a list) I use something quite similar to your code above:
if isinstance(iterable, collections.Sequence):
getitem = iterable.__getitem__
if key is not None:
# Compose the given key with getitem.
f = lambda i: key(getitem(i))
else:
f = getitem
table = range(len(iterable)) # Python 2.x version
table.sort(key=f, reverse=reverse)
which avoids any temporary two-tuples, and creates only the one
additional list. I can't avoid making that list, since it's the result
required. I don't think that this branch of the code can be optimized in
any serious way, I can't see any additional fat to be trimmed off that.
If the input list is so large as to cause thrashing, the only solution is
to buy more memory.
The other branch handles iterators, and I have two versions. The
idiomatic version is faster for small data sets (small in this case
meaning "up to at least one million items on my computer") but ends up
building at least one extra temporary list which is thrown away:
else:
table = [(x, i) for (i, x) in enumerate(iterable)]
table.sort(key=key, reverse=reverse)
table = [i for x, i in table]
I've experimented with slight variations, e.g. using sorted() and a
generator expression instead of a list comp, and the differences are
trivial. The important thing here is that there's a temporary list which
in some cases is *very large* but destined to be thrown away. Having two
very large lists (for large enough input sizes) causes thrashing.
To avoid that thrashing, I have a second version which builds only one
list and then modifies it in place. For very large input sizes (ten
million items) this is significantly faster:
else:
table = [(x, i) for (i, x) in enumerate(iterable)]
table.sort(key=key, reverse=reverse)
for x, i in enumerate(table):
table[i] = x
but much slower, and less idiomatic, for small input sizes.
I don't know of any reasonable way to tell at runtime which of the two
algorithms I ought to take. Hard-coding an arbitrary value
("if len(table) > 5000000") is not the worst idea I've ever had, but I'm
hoping for something more deterministic which can be cheaply calculated
at runtime.
--
Steven
[toc] | [prev] | [next] | [standalone]
| From | rusi <rustompmody@gmail.com> |
|---|---|
| Date | 2013-12-12 21:35 -0800 |
| Message-ID | <3a844cdb-218a-43e2-a1ad-7a7186dea282@googlegroups.com> |
| In reply to | #61785 |
On Friday, December 13, 2013 8:31:37 AM UTC+5:30, Steven D'Aprano wrote:
> I don't know of any reasonable way to tell at runtime which of the two
> algorithms I ought to take. Hard-coding an arbitrary value
> ("if len(table) > 5000000") is not the worst idea I've ever had, but I'm
> hoping for something more deterministic which can be cheaply calculated
> at runtime.
Have you seen guppy and heapy?
http://guppy-pe.sourceforge.net/
Even if you dont use it directly, it should have syscalls that you can steal
[toc] | [prev] | [next] | [standalone]
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2013-12-12 13:07 -0500 |
| Message-ID | <mailman.4006.1386871684.18130.python-list@python.org> |
| In reply to | #61630 |
On 12/12/2013 6:09 AM, Stefan Behnel wrote: > Terry Reedy, 12.12.2013 03:26: >> from itertools import count >> table = sorted(t for t in zip(iterable, count)) > > This is a useless use of a generator expression. sorted(zip(...)) is enough > (and likely to be substantially faster also). Yes, definitely, and thank you. I did not squeeze enough. -- Terry Jan Reedy
[toc] | [prev] | [standalone]
Back to top | Article view | comp.lang.python
csiph-web