From: Peter Otten <__peter__@web.de>
Newsgroups: comp.lang.python
Subject: Re: Fast pythonic way to process a huge integer list
Date: Thu, 07 Jan 2016 11:21:03 +0100

high5storage@gmail.com wrote:

> I have a list of 163.840 integers. What is a fast & pythonic way to
> process this list in 1,280 chunks of 128 integers?

What kind of processing do you have in mind?
If it is about number crunching, use a numpy.array. This can also easily
change its shape:

>>> import numpy
>>> a = numpy.array(range(12))
>>> a
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
>>> a.shape = (3, 4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

If it's really only(!) under a million integers, slicing is also good:

items = [1, 2, ...]
CHUNKSIZE = 128

for i in range(0, len(items), CHUNKSIZE):
    process(items[i:i + CHUNKSIZE])

If the "list" is really huge (your system starts swapping memory), you can
go completely lazy:

from itertools import chain, islice

def chunked(items, chunksize):
    items = iter(items)
    for first in items:
        chunk = chain((first,), islice(items, chunksize-1))
        yield chunk
        for dummy in chunk:  # consume items that may have been skipped
                             # by your processing
            pass

def produce_items(file):
    for line in file:
        yield int(line)

CHUNKSIZE = 128  # this could also be "huge"
                 # without affecting memory footprint

with open("somefile") as file:
    for chunk in chunked(produce_items(file), CHUNKSIZE):
        process(chunk)
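
To check that the slicing loop and the lazy generator cut the data the same way, here is a minimal sketch. It assumes the data is simply range(163840) (a stand-in, not your real list) and uses sum() as a placeholder for whatever process() does. The last line shows why chunked() drains each chunk: even if the consumer reads only the first item, the next chunk still starts at the right position.

```python
from itertools import chain, islice

def chunked(items, chunksize):
    """Yield lazy chunks of up to `chunksize` items from any iterable."""
    items = iter(items)
    for first in items:
        chunk = chain((first,), islice(items, chunksize - 1))
        yield chunk
        # Drain whatever the caller left unread so the next chunk
        # starts at the right position.
        for _ in chunk:
            pass

# Stand-in data (an assumption for illustration, not the original list).
items = list(range(163840))
CHUNKSIZE = 128

# Eager variant: every chunk is a list slice; sum() stands in for process().
slice_sums = [sum(items[i:i + CHUNKSIZE])
              for i in range(0, len(items), CHUNKSIZE)]

# Lazy variant: chunks are iterators over the same underlying stream.
lazy_sums = [sum(chunk) for chunk in chunked(items, CHUNKSIZE)]

# Partial consumption: read only the first item of each chunk.
firsts = [next(chunk) for chunk in chunked(range(10), 4)]

print(len(slice_sums), lazy_sums == slice_sums, firsts)
```

Note that each lazy chunk is only valid until you ask for the next one; if you need to keep a chunk around, turn it into a list first.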