Groups > comp.lang.python > #101315 > unrolled thread

Fast pythonic way to process a huge integer list

Started by	high5storage@gmail.com
First post	2016-01-06 18:36 -0800
Last post	2016-01-07 16:33 -0800
Articles	7 — 7 participants

  Fast pythonic way to process a huge integer list high5storage@gmail.com - 2016-01-06 18:36 -0800
    Re: Fast pythonic way to process a huge integer list Terry Reedy <tjreedy@udel.edu> - 2016-01-06 22:10 -0500
    Re: Fast pythonic way to process a huge integer list Tim Chase <python.list@tim.thechases.com> - 2016-01-06 21:21 -0600
    Re: Fast pythonic way to process a huge integer list Cameron Simpson <cs@zip.com.au> - 2016-01-07 14:31 +1100
    Re: Fast pythonic way to process a huge integer list Steven D'Aprano <steve@pearwood.info> - 2016-01-07 20:25 +1100
    Re: Fast pythonic way to process a huge integer list Peter Otten <__peter__@web.de> - 2016-01-07 11:21 +0100
    Re: Fast pythonic way to process a huge integer list KP <kai.peters@gmail.com> - 2016-01-07 16:33 -0800

#101315 — Fast pythonic way to process a huge integer list

From	high5storage@gmail.com
Date	2016-01-06 18:36 -0800
Subject	Fast pythonic way to process a huge integer list
Message-ID	<7e2b93e4-c224-40c4-8e88-7dcc847edab1@googlegroups.com>

I have a list of 163.840 integers. What is a fast & pythonic way to process this list in 1,280 chunks of 128 integers?

#101316

From	Terry Reedy <tjreedy@udel.edu>
Date	2016-01-06 22:10 -0500
Message-ID	<mailman.35.1452136231.2305.python-list@python.org>
In reply to	#101315

On 1/6/2016 9:36 PM, high5storage@gmail.com wrote:
>
> I have a list of 163.840 integers. What is a fast & pythonic way to process this list in 1,280 chunks of 128 integers?

What have you tried that did not work?  This is really pretty simple, 
but the details depend on the meaning of 'process a chunk'.

-- 
Terry Jan Reedy

#101317

From	Tim Chase <python.list@tim.thechases.com>
Date	2016-01-06 21:21 -0600
Message-ID	<mailman.36.1452137045.2305.python-list@python.org>
In reply to	#101315

On 2016-01-06 18:36, high5storage@gmail.com wrote:
> I have a list of 163.840 integers. What is a fast & pythonic way to
> process this list in 1,280 chunks of 128 integers?

That's a modest list, far from huge.

You have lots of options, but the following seems the most pythonic to
me:

  # I don't know how you populate your data so
  # create some junk data
  from random import randint
  data = [randint(0,1000) for _ in range(163840)]

  import itertools as i
  GROUP_SIZE = 128

  def do_something(grp, vals):
    # I don't know what you want to do with each
    # pair.  You can print them:

    #   for _, val in vals:
    #     print("%s: %s" % (grp, val))

    # or write each group to its own chunked file
    # (open the file once per group; reopening it
    # with "w" for every value would keep only the
    # last value):
    with open("chunk%04i.txt" % grp, "w") as f:
      for _, val in vals:
        f.write(str(val))
        f.write("\n")

  # but here's the core logic:

  def key_fn(x):
    # x is a tuple of (index, value)
    return x[0] // GROUP_SIZE

  # actually iterate over the grouped data
  # and do something with it:
  for grp, vals in i.groupby(enumerate(data), key_fn):
    do_something(grp, vals)
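
To see what this grouping yields, here is a minimal self-contained sketch on a tiny list. The group size of 4 and the ten-element list are illustrative assumptions, not values from the thread; the key function uses the same index // GROUP_SIZE idea.

```python
from itertools import groupby

GROUP_SIZE = 4          # illustrative; the thread uses 128
data = list(range(10))  # stand-in for the real integers

def key_fn(pair):
    # pair is a tuple of (index, value) produced by enumerate()
    index, _value = pair
    return index // GROUP_SIZE

# Materialize each group as a list so it can be inspected afterwards.
# Each group is consumed before groupby() advances to the next one.
chunks = [
    [value for _index, value in pairs]
    for _grp, pairs in groupby(enumerate(data), key_fn)
]
print(chunks)
```

The last group is simply shorter when the length is not a multiple of the group size.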

-tkc

#101318

From	Cameron Simpson <cs@zip.com.au>
Date	2016-01-07 14:31 +1100
Message-ID	<mailman.37.1452137521.2305.python-list@python.org>
In reply to	#101315

On 06Jan2016 18:36, high5storage@gmail.com <high5storage@gmail.com> wrote:
>I have a list of 163.840 integers. What is a fast & pythonic way to process 
>this list in 1,280 chunks of 128 integers?

That depends. When you say "list", is it already a _python_ list? Or do you just 
mean that the integers are in a file or something?

If they're already in a python list you can probably just use a range:

  for offset in range(0, 163840, 128):
    ... do stuff with the elements starting at offset ...
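
A minimal sketch of that loop, assuming the data really is a Python list. The stand-in data and the running total are illustrative only; replace the body with whatever "do stuff" means for you.

```python
CHUNK_SIZE = 128
data = list(range(163840))  # stand-in data, assumed to be a real list

chunk_count = 0
total = 0
for offset in range(0, len(data), CHUNK_SIZE):
    # Slicing builds a new list holding up to CHUNK_SIZE elements.
    chunk = data[offset:offset + CHUNK_SIZE]
    total += sum(chunk)  # placeholder for the real processing
    chunk_count += 1

print(chunk_count)
```

Using len(data) rather than a hard-coded 163840 keeps the loop correct if the list size changes.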

Cheers,
Cameron Simpson <cs@zip.com.au>

#101323

From	Steven D'Aprano <steve@pearwood.info>
Date	2016-01-07 20:25 +1100
Message-ID	<568e2f1d$0$1602$c3e8da3$5496439d@news.astraweb.com>
In reply to	#101315

On Thu, 7 Jan 2016 01:36 pm, high5storage@gmail.com wrote:

> 
> I have a list of 163.840 integers. What is a fast & pythonic way to
> process this list in 1,280 chunks of 128 integers?


py> from itertools import izip_longest
py> def grouper(iterable, n, fillvalue=None):
...     "Collect data into fixed-length chunks or blocks"
...     # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
...     args = [iter(iterable)] * n
...     return izip_longest(fillvalue=fillvalue, *args)
...
py> alist = range(163840)
py> count = 0
py> for block in grouper(alist, 128):
...     assert len(list(block)) == 128
...     count += 1
...
py> count
1280


This was almost instantaneous on my computer. 163840 isn't a very large
number of ints.
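
The session above is Python 2, where the function is called izip_longest. A sketch of the same recipe for Python 3, where it is itertools.zip_longest, might look like this (the variable names blocks and short are illustrative):

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks, padding the last one."""
    # n references to the SAME iterator, so each output tuple
    # pulls n consecutive items from the input.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

blocks = list(grouper(range(163840), 128))
print(len(blocks))

# When the length is not a multiple of n, the last chunk is padded.
short = list(grouper("ABCDEFG", 3, "x"))
print(short)
```

Note that the padding means a short final chunk is filled with fillvalue rather than returned shorter, which matters if None (or whatever fill value you pick) could be mistaken for real data.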




-- 
Steven

#101328

From	Peter Otten <__peter__@web.de>
Date	2016-01-07 11:21 +0100
Message-ID	<mailman.40.1452162085.2305.python-list@python.org>
In reply to	#101315

high5storage@gmail.com wrote:

> I have a list of 163.840 integers. What is a fast & pythonic way to
> process this list in 1,280 chunks of 128 integers?

What kind of processing do you have in mind? 
If it is about number crunching, use a numpy array. This can also easily 
change its shape:

>>> import numpy
>>> a = numpy.array(range(12))
>>> a
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
>>> a.shape = (3, 4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
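
Applied to the sizes in the question, a sketch might look like the following. The stand-in data and the per-chunk sum are assumptions for illustration; the real processing is not specified in the thread.

```python
import numpy

a = numpy.arange(163840)       # stand-in data
chunks = a.reshape(1280, 128)  # one row per chunk of 128 integers

# Example "processing": one sum per chunk, computed without a Python loop.
row_sums = chunks.sum(axis=1)
print(chunks.shape, row_sums.shape)
```

Each row of chunks is one 128-integer chunk, so operations along axis=1 act on every chunk at once.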

If it's really only(!) under a million integers, slicing is also good:

items = [1, 2, ...]
CHUNKSIZE = 128

for start in range(0, len(items), CHUNKSIZE):
    process(items[start:start + CHUNKSIZE])

If the "list" is really huge (your system starts swapping memory) you can go 
completely lazy:

from itertools import chain, islice

def chunked(items, chunksize):
    items = iter(items)
    for first in items:
        chunk = chain((first,), islice(items, chunksize-1))
        yield chunk
        for dummy in chunk:  # consume items that may have been skipped
                             # by your processing
            pass

def produce_items(file):
    for line in file:
        yield int(line)

CHUNKSIZE = 128  # this could also be "huge" 
                 # without affecting memory footprint

with open("somefile") as file:
    for chunk in chunked(produce_items(file), CHUNKSIZE):
        process(chunk)
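
A small sketch of how chunked() behaves, using a tiny in-memory range instead of a file. The sizes and the names full, heads and stale are illustrative assumptions. It shows that each chunk has to be consumed before moving on to the next one.

```python
from itertools import chain, islice

def chunked(items, chunksize):
    # Same generator as above: each chunk is a lazy iterator.
    items = iter(items)
    for first in items:
        chunk = chain((first,), islice(items, chunksize - 1))
        yield chunk
        for dummy in chunk:  # drain whatever the caller left unread
            pass

# Consuming each chunk fully, in order:
full = [list(chunk) for chunk in chunked(range(10), 4)]

# Reading only the first item; the rest of each chunk is skipped:
heads = [next(chunk) for chunk in chunked(range(10), 4)]

# Collecting the chunks first drains them all, so they come back empty:
stale = [list(chunk) for chunk in list(chunked(range(10), 4))]

print(full, heads, stale)
```

If the chunks need to outlive the loop, convert each one with list(chunk) inside the loop, at the cost of holding that chunk in memory.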

#101355

From	KP <kai.peters@gmail.com>
Date	2016-01-07 16:33 -0800
Message-ID	<17e513e2-fb15-47ba-8606-97b8a1b904ef@googlegroups.com>
In reply to	#101315

On Wednesday, 6 January 2016 18:37:22 UTC-8, high5s...@gmail.com  wrote:
> I have a list of 163.840 integers. What is a fast & pythonic way to process this list in 1,280 chunks of 128 integers?

Thanks all for your valuable input - much appreciated!
