Path: csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail
From: Tim Chase <python.list@tim.thechases.com>
Newsgroups: comp.lang.python
Subject: Re: Fast pythonic way to process a huge integer list
Date: Wed, 6 Jan 2016 21:21:41 -0600
Lines: 46
Message-ID: <mailman.36.1452137045.2305.python-list@python.org>
References: <7e2b93e4-c224-40c4-8e88-7dcc847edab1@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
In-Reply-To: <7e2b93e4-c224-40c4-8e88-7dcc847edab1@googlegroups.com>
Precedence: list
Xref: csiph.com comp.lang.python:101317

On 2016-01-06 18:36, high5storage@gmail.com wrote:
> I have a list of 163.840 integers. What is a fast & pythonic way to
> process this list in 1,280 chunks of 128 integers?

That's a modest list, far from huge.

You have lots of options, but the following seems the most pythonic to
me:

  # I don't know how you populate your data so
  # create some junk data
  from random import randint
  data = [randint(0,1000) for _ in range(163840)]

  import itertools as i
  GROUP_SIZE = 128

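  def do_something(grp, vals):
    # I don't know what you want to do with each
    # pair.  You can print them:

    # for _, val in vals:
    #   print("%s: %s" % (grp, val))

    # or write them to various chunked files.  Open each file
    # once per group; reopening it with "w" for every value
    # would truncate it and keep only the last value:
    with open("chunk%04i.txt" % grp, "w") as f:
      for _, val in vals:
        f.write(str(val))
        f.write("\n")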

  # but here's the core logic:

  def key_fn(x):
    # x is a tuple of (index, value)
    return x[0] // GROUP_SIZE

  # actually iterate over the grouped data
  # and do something with it:
  for grp, vals in i.groupby(enumerate(data), key_fn):
    do_something(grp, vals)
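
If your data is already a list in memory (as it seems to be), plain
slicing is another option worth considering.  This is only a rough
sketch: chunked() is a helper name I made up for illustration, not
something from the standard library, and the data is a stand-in.

```python
GROUP_SIZE = 128

def chunked(seq, size):
    # Hypothetical helper (not a stdlib function): yield
    # consecutive slices of `seq`, each at most `size` items long.
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

# Stand-in data; replace with however you populate your list.
data = list(range(163840))

chunks = list(chunked(data, GROUP_SIZE))
print(len(chunks), len(chunks[0]))  # prints: 1280 128
```

Slicing copies each chunk into a new list, which is harmless at this
size; the groupby() version has the advantage of also working on any
iterable, not just sequences that support len() and slicing.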

-tkc