Re: Optimizing list processing

From	Terry Reedy <tjreedy@udel.edu>
Subject	Re: Optimizing list processing
Date	2013-12-11 21:26 -0500
References	<52a8fb2d$0$29992$c3e8da3$5496439d@news.astraweb.com>
Newsgroups	comp.lang.python
Message-ID	<mailman.3953.1386815228.18130.python-list@python.org> (permalink)

Show all headers | View raw

On 12/11/2013 6:54 PM, Steven D'Aprano wrote:
> I have some code which produces a list from an iterable using at least
> one temporary list, using a Decorate-Sort-Undecorate idiom.

It is a non-standard version thereof, as DSU usually means to decorate 
with a key that gets discarded.

A couple of examples of input and expected output would have been good ;-).

> The algorithm looks something like this (simplified):
>
> table = sorted([(x, i) for i,x in enumerate(iterable)])

This makes two temporaries when only one is needed, and shows why we 
added generator expressions.

table = sorted((x, i) for i,x in enumerate(iterable))
is equivalent to the table; table.sort lines below.

The following (untested) should be faster as it avoids tuple unpacking 
and repacking.

from itertools import count
table = sorted(t for t in zip(iterable, count))

 > table = [i for x,i in table]

Did your original un-simplified use zip instead enumerate? Table now has 
the original index of the items in iterable sorted by the items value. 
The use for this is not obvious.

> The problem here is that for large iterables, say 10 million items or so,
> this is *painfully* slow, as my system has to page memory like mad to fit
> two large lists into memory at once. So I came up with an in-place
> version that saves (approximately) two-thirds of the memory needed.

With 1/3 saved by using a genex, it saves 1/2 of the remainder.

> table = [(x, i) for i,x in enumerate(iterable)]
> table.sorted()

(You meant table.sort().)  This is an expansion of sorted(genex). It 
might be slightly faster as list comp may be faster than list(equivalent 
genex).

> for x, i in table:
>      table[i] = x

I cannot understand what you are aiming for here. Besides the fact that 
this does not work, it keeps x values rather than i indexes as before.

 > table = [i for x,i in table]

done in place is

for j, (x,i) in enumerate(table):
   table[j] = i

I expect the list comp be faster than in-place as long as the result 
list can be allocated and held in memory without paging. (This of course 
depends on system memory and other memory uses.)  List iterators have a 
__length_hint__ method giving the length of the underlying list, so the 
list comp result list can potentially be allocated once and then filled 
in by enumeration and replacement, but in C rather than Python code.

-- 
Terry Jan Reedy

Thread

Optimizing list processing Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-11 23:54 +0000
  Re: Optimizing list processing MRAB <python@mrabarnett.plus.com> - 2013-12-12 00:59 +0000
    Re: Optimizing list processing Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-12 01:43 +0000
      Re: Optimizing list processing MRAB <python@mrabarnett.plus.com> - 2013-12-12 02:09 +0000
  Re: Optimizing list processing duncan smith <buzzard@invalid.invalid> - 2013-12-12 01:02 +0000
  Re: Optimizing list processing Ben Finney <ben+python@benfinney.id.au> - 2013-12-12 12:18 +1100
  Re: Optimizing list processing Terry Reedy <tjreedy@udel.edu> - 2013-12-11 21:26 -0500
    Re: Optimizing list processing Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-12 12:08 +0000
      Re: Optimizing list processing Chris Angelico <rosuav@gmail.com> - 2013-12-12 23:25 +1100
      Re: Optimizing list processing MRAB <python@mrabarnett.plus.com> - 2013-12-12 13:32 +0000
      Re: Optimizing list processing Chris Angelico <rosuav@gmail.com> - 2013-12-13 01:06 +1100
      Re: Optimizing list processing Terry Reedy <tjreedy@udel.edu> - 2013-12-12 13:40 -0500
        Re: Optimizing list processing Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-13 00:14 +0000
          Re: Optimizing list processing Chris Angelico <rosuav@gmail.com> - 2013-12-13 12:01 +1100
  Re: Optimizing list processing Stefan Behnel <stefan_ml@behnel.de> - 2013-12-12 12:09 +0100
  Re: Optimizing list processing Peter Otten <__peter__@web.de> - 2013-12-12 16:08 +0100
    Re: Optimizing list processing Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-13 03:01 +0000
      Re: Optimizing list processing rusi <rustompmody@gmail.com> - 2013-12-12 21:35 -0800
  Re: Optimizing list processing Terry Reedy <tjreedy@udel.edu> - 2013-12-12 13:07 -0500

csiph-web