
Re: Fast pythonic way to process a huge integer list

From Tim Chase <python.list@tim.thechases.com>
Newsgroups comp.lang.python
Subject Re: Fast pythonic way to process a huge integer list
Date Wed, 6 Jan 2016 21:21:41 -0600
Message-ID <mailman.36.1452137045.2305.python-list@python.org>


On 2016-01-06 18:36, high5storage@gmail.com wrote:
> I have a list of 163,840 integers. What is a fast & pythonic way to
> process this list in 1,280 chunks of 128 integers?

That's a modest list, far from huge.
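
For a rough sense of scale, here is a minimal sketch that estimates the memory footprint of such a list. The figures are CPython-specific and only approximate (sys.getsizeof counts repeated int objects once per occurrence, so the element total is an upper bound), and the random data is a stand-in for the real values:

```python
import sys
from random import randint

# Stand-in data of the size mentioned in the question.
data = [randint(0, 1000) for _ in range(163840)]

# Size of the list object itself (essentially an array of pointers).
container_bytes = sys.getsizeof(data)

# Upper bound on the int objects: shared/cached ints are counted repeatedly.
element_bytes = sum(sys.getsizeof(v) for v in data)

total_mib = (container_bytes + element_bytes) / 2**20
print("roughly %.1f MiB at most" % total_mib)
```

On a typical 64-bit CPython build this comes out to a few MiB, which is small by current standards.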

You have lots of options, but the following seems the most pythonic to
me:

  # I don't know how you populate your data so
  # create some junk data
  from random import randint
  data = [randint(0,1000) for _ in range(163840)]

  import itertools as i
  GROUP_SIZE = 128

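  def do_something(grp, vals):
    # I don't know what you want to do with each
    # pair.  You can print them:

    # for _, val in vals:
    #   print("%s: %s" % (grp, val))

    # or write them to various chunked files.  Open the
    # file once per group; reopening it in "w" mode for
    # every value would truncate it and keep only the
    # last value:
    with open("chunk%04i.txt" % grp, "w") as f:
      for _, val in vals:
        f.write(str(val))
        f.write("\n")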

  # but here's the core logic:

  def key_fn(x):
    # x is a tuple of (index, value)
    return x[0] // GROUP_SIZE

  # actually iterate over the grouped data
  # and do something with it:
  for grp, vals in i.groupby(enumerate(data), key_fn):
    do_something(grp, vals)
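
As a sanity check on the grouping logic, the sketch below collects the groups into lists and compares them with plain list slicing, which is one of the other options. The helper names chunks_by_groupby and chunks_by_slicing are made up for illustration, and range() data stands in for the real values:

```python
import itertools

GROUP_SIZE = 128
data = list(range(163840))  # stand-in data, not the poster's real values


def chunks_by_groupby(seq, size):
    """Yield (group_number, values) using the enumerate/groupby idea."""
    keyed = itertools.groupby(enumerate(seq), lambda pair: pair[0] // size)
    for grp, pairs in keyed:
        # Drop the index from each (index, value) pair.
        yield grp, [val for _, val in pairs]


def chunks_by_slicing(seq, size):
    """Yield (group_number, values) using plain list slices."""
    for grp, start in enumerate(range(0, len(seq), size)):
        yield grp, seq[start:start + size]


groupby_chunks = list(chunks_by_groupby(data, GROUP_SIZE))
slice_chunks = list(chunks_by_slicing(data, GROUP_SIZE))

print(len(groupby_chunks), "groups of", len(groupby_chunks[0][1]))
```

Slicing needs a real sequence with len() and indexing, while the groupby version works on any iterable, so which one reads better depends on where the data comes from.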

-tkc



