Path: csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail
From: Peter Otten <__peter__@web.de>
Newsgroups: comp.lang.python
Subject: Re: Fast pythonic way to process a huge integer list
Date: Thu, 07 Jan 2016 11:21:03 +0100
Organization: None
Lines: 52
Message-ID: <mailman.40.1452162085.2305.python-list@python.org>
References: <7e2b93e4-c224-40c4-8e88-7dcc847edab1@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7Bit
User-Agent: KNode/4.13.3
Precedence: list
Xref: csiph.com comp.lang.python:101328

high5storage@gmail.com wrote:

> I have a list of 163.840 integers. What is a fast & pythonic way to
> process this list in 1,280 chunks of 128 integers?

What kind of processing do you have in mind?
If it is about number crunching, use a numpy.array. Its shape can also
be changed easily:

>>> import numpy
>>> a = numpy.array(range(12))
>>> a
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
>>> a.shape = (3, 4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
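
Applied to the numbers in the question, that could look roughly like the
sketch below. The data (numpy.arange) and the per-chunk operation (a sum) are
only stand-ins of mine, since I don't know what your real processing is:

```python
import numpy

# Stand-in data: 163,840 integers (the real values are unknown here).
a = numpy.arange(163840)

# One row per chunk; -1 lets numpy work out the number of rows (1,280).
chunks = a.reshape(-1, 128)

# Hypothetical "processing": sum every chunk in one vectorised call.
chunk_sums = chunks.sum(axis=1)

print(chunks.shape)      # (1280, 128)
print(chunk_sums[:3])    # sums of the first three chunks
```

If the operation can be expressed along an axis like this, there is no
Python-level loop over the chunks at all.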

If it's really only(!) under a million integers, slicing is also good:

items = [1, 2, ...]
CHUNKSIZE = 128

for start in range(0, len(items), CHUNKSIZE):
    process(items[start:start + CHUNKSIZE])
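
A self-contained version of the slicing approach you can run as-is. Here
process() and the helper name slices() are placeholders I made up, and the
data is just range(163840):

```python
def process(chunk):
    # Hypothetical stand-in for the real work: add up the chunk.
    return sum(chunk)

def slices(items, chunksize):
    # Yield consecutive slices; the last one is shorter if
    # len(items) is not a multiple of chunksize.
    for start in range(0, len(items), chunksize):
        yield items[start:start + chunksize]

items = list(range(163840))
CHUNKSIZE = 128

results = [process(chunk) for chunk in slices(items, CHUNKSIZE)]
print(len(results))               # number of chunks processed
print(list(slices(list(range(10)), 4)))  # shows the shorter final chunk
```

Every slice is a new list of at most CHUNKSIZE items, so the extra memory
needed at any one time stays small.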

If the "list" is really huge (your system starts swapping memory) you can go 
completely lazy:

from itertools import chain, islice

def chunked(items, chunksize):
    items = iter(items)
    for first in items:
        chunk = chain((first,), islice(items, chunksize-1))
        yield chunk
        for dummy in chunk:  # consume items that may have been skipped
                             # by your processing
            pass

def produce_items(file):
    for line in file:
        yield int(line)

CHUNKSIZE = 128  # this could also be "huge" 
                 # without affecting memory footprint

with open("somefile") as file:
    for chunk in chunked(produce_items(file), CHUNKSIZE):
        process(chunk)
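
One thing to keep in mind with the lazy version: every chunk is an iterator
that shares the underlying stream, so it has to be used before you ask for
the next one. A small demonstration, with chunked() repeated so the snippet
runs on its own and range(10) standing in for the file:

```python
from itertools import chain, islice

def chunked(items, chunksize):
    items = iter(items)
    for first in items:
        chunk = chain((first,), islice(items, chunksize - 1))
        yield chunk
        for dummy in chunk:  # consume whatever the caller left over
            pass

# Consuming each chunk right away gives the expected groups.
eager = [list(chunk) for chunk in chunked(range(10), 4)]

# Reading only part of a chunk is fine; the rest is skipped.
firsts = [next(chunk) for chunk in chunked(range(10), 4)]

# Collecting the chunks first and reading them later does NOT work:
# by then chunked() has already drained them.
stale = [list(chunk) for chunk in list(chunked(range(10), 4))]

print(eager)
print(firsts)
print(stale)
```

If you need the chunks to survive past the loop body, turn each one into a
list (or tuple) inside the loop, at the cost of holding chunksize items in
memory.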