Re: Fast pythonic way to process a huge integer list

From	Peter Otten <__peter__@web.de>
Newsgroups	comp.lang.python
Subject	Re: Fast pythonic way to process a huge integer list
Date	Thu, 07 Jan 2016 11:21:03 +0100
Message-ID	<mailman.40.1452162085.2305.python-list@python.org> (permalink)
References	<7e2b93e4-c224-40c4-8e88-7dcc847edab1@googlegroups.com>


high5storage@gmail.com wrote:

> I have a list of 163,840 integers. What is a fast & pythonic way to
> process this list in 1,280 chunks of 128 integers?

What kind of processing do you have in mind?
If it is about number crunching, use a numpy.array. It can also easily
change its shape:

>>> import numpy
>>> a = numpy.array(range(12))
>>> a
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
>>> a.shape = (3, 4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
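
Applied to the numbers in the question, the same idea might look like the
sketch below. The data is a stand-in (the real integers are not shown in the
thread), and summing each chunk is only a hypothetical example of
"processing":

```python
import numpy

# Stand-in data: the original poster's 163,840 integers are not shown.
items = list(range(163840))

# One row per chunk; -1 lets numpy work out the row count (1,280 here).
chunks = numpy.array(items).reshape(-1, 128)

# Hypothetical processing step: sum every chunk in one vectorized call.
chunk_sums = chunks.sum(axis=1)

print(chunks.shape)    # (1280, 128)
print(chunk_sums[0])   # 8128, the sum of 0..127
```

Note that reshaping only works when the total length is an exact multiple of
the chunk size, which is the case for 163,840 and 128.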

If it's really only(!) under a million integers, slicing is also good:

items = [1, 2, ...]
CHUNKSIZE = 128

for start in range(0, len(items), CHUNKSIZE):
    process(items[start:start + CHUNKSIZE])
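
A runnable version of the slicing loop, as a minimal sketch: process() here
is a hypothetical stand-in, and the data is made up with a length that is
deliberately not a multiple of 128 to show what happens to the last chunk.

```python
CHUNKSIZE = 128

def process(chunk):
    # Hypothetical stand-in for the real per-chunk work.
    return sum(chunk)

items = list(range(300))  # made-up data, not a multiple of CHUNKSIZE

results = []
chunk_lengths = []
for start in range(0, len(items), CHUNKSIZE):
    chunk = items[start:start + CHUNKSIZE]  # a slice never raises past the end
    chunk_lengths.append(len(chunk))
    results.append(process(chunk))

print(chunk_lengths)  # [128, 128, 44]
```

Each slice copies its 128 items into a new list, which is cheap at this
size; the final chunk is simply shorter when the length does not divide
evenly.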

If the "list" is really huge (your system starts swapping memory), you can go
completely lazy:

from itertools import chain, islice

def chunked(items, chunksize):
    items = iter(items)
    for first in items:
        chunk = chain((first,), islice(items, chunksize-1))
        yield chunk
        for dummy in chunk:  # consume items that may have been skipped
                             # by your processing
            pass

def produce_items(file):
    for line in file:
        yield int(line)

CHUNKSIZE = 128  # this could also be "huge" 
                 # without affecting memory footprint

with open("somefile") as file:
    for chunk in chunked(produce_items(file), CHUNKSIZE):
        process(chunk)
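
To see what the "consume" loop buys you, here is a small self-contained
sketch. It repeats chunked() so it runs on its own, and uses range(10) with
a chunk size of 4 as made-up input instead of a file:

```python
from itertools import chain, islice

def chunked(items, chunksize):
    items = iter(items)
    for first in items:
        chunk = chain((first,), islice(items, chunksize - 1))
        yield chunk
        for dummy in chunk:  # discard whatever the caller left unread
            pass

# Fully consuming each chunk gives the expected groups.
full = [list(chunk) for chunk in chunked(range(10), 4)]
print(full)    # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# Reading only the first item of each chunk still keeps the chunks aligned,
# because the generator skips the unread remainder before moving on.
firsts = [next(chunk) for chunk in chunked(range(10), 4)]
print(firsts)  # [0, 4, 8]
```

One consequence of the lazy design: a chunk is only usable until the next
one is requested. Collecting the chunks first, for example with
list(chunked(...)), and reading them afterwards yields empty chunks.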
