From: Tim Chase
Newsgroups: comp.lang.python
Subject: Re: Fast pythonic way to process a huge integer list
Date: Wed, 6 Jan 2016 21:21:41 -0600
On 2016-01-06 18:36, high5storage@gmail.com wrote:
> I have a list of 163.840 integers. What is a fast & pythonic way to
> process this list in 1,280 chunks of 128 integers?

That's a modest list, far from huge.

You have lots of options, but the following seems the most pythonic
to me:

  # I don't know how you populate your data so
  # create some junk data
  from random import randint
  data = [randint(0, 1000) for _ in range(163840)]

  import itertools as i
  GROUP_SIZE = 128

  def do_something(grp, vals):
    # I don't know what you want to do with each
    # pair.  You can print them:
    #   for _, val in vals:
    #     print("%s: %s" % (grp, val))

    # or write them to various chunked files.
    # Open the file once per group: reopening it with "w"
    # for every value would truncate it each time and
    # keep only the last value.
    with open("chunk%04i.txt" % grp, "w") as f:
      for _, val in vals:
        f.write(str(val))
        f.write("\n")

  # but here's the core logic:

  def key_fn(x):
    # x is a tuple of (index, value)
    return x[0] // GROUP_SIZE

  # actually iterate over the grouped data
  # and do something with it:

  for grp, vals in i.groupby(enumerate(data), key_fn):
    do_something(grp, vals)

-tkc
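
P.S. To see what the groupby/enumerate grouping actually hands you, here is a minimal sketch on a tiny list. The group size of 4, the ten sample values, and the helper name chunked() are made up for illustration and are not part of the original question. It only collects each group's values into a list so they can be inspected.

```python
from itertools import groupby

GROUP_SIZE = 4  # small value for illustration; the post above uses 128


def key_fn(pair):
    # pair is an (index, value) tuple produced by enumerate()
    return pair[0] // GROUP_SIZE


def chunked(data):
    """Yield (group_number, list_of_values) for consecutive chunks.

    Hypothetical helper, not from the original post.
    """
    for grp, pairs in groupby(enumerate(data), key_fn):
        # Materialise the group now: groupby's sub-iterators are
        # invalidated once the outer iterator advances.
        yield grp, [val for _, val in pairs]


data = list(range(10, 20))  # ten made-up values
chunks = list(chunked(data))
for grp, vals in chunks:
    print(grp, vals)
```

With a length that is not a multiple of the group size, the final chunk is simply shorter. For 163840 items and a group size of 128, every one of the 1280 chunks is full.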