| Started by | high5storage@gmail.com |
|---|---|
| First post | 2016-01-06 18:36 -0800 |
| Last post | 2016-01-07 16:33 -0800 |
| Articles | 7 — 7 participants |
Fast pythonic way to process a huge integer list high5storage@gmail.com - 2016-01-06 18:36 -0800
Re: Fast pythonic way to process a huge integer list Terry Reedy <tjreedy@udel.edu> - 2016-01-06 22:10 -0500
Re: Fast pythonic way to process a huge integer list Tim Chase <python.list@tim.thechases.com> - 2016-01-06 21:21 -0600
Re: Fast pythonic way to process a huge integer list Cameron Simpson <cs@zip.com.au> - 2016-01-07 14:31 +1100
Re: Fast pythonic way to process a huge integer list Steven D'Aprano <steve@pearwood.info> - 2016-01-07 20:25 +1100
Re: Fast pythonic way to process a huge integer list Peter Otten <__peter__@web.de> - 2016-01-07 11:21 +0100
Re: Fast pythonic way to process a huge integer list KP <kai.peters@gmail.com> - 2016-01-07 16:33 -0800
| From | high5storage@gmail.com |
|---|---|
| Date | 2016-01-06 18:36 -0800 |
| Subject | Fast pythonic way to process a huge integer list |
| Message-ID | <7e2b93e4-c224-40c4-8e88-7dcc847edab1@googlegroups.com> |
I have a list of 163.840 integers. What is a fast & pythonic way to process this list in 1,280 chunks of 128 integers?
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2016-01-06 22:10 -0500 |
| Message-ID | <mailman.35.1452136231.2305.python-list@python.org> |
| In reply to | #101315 |
On 1/6/2016 9:36 PM, high5storage@gmail.com wrote:
>
> I have a list of 163.840 integers. What is a fast & pythonic way to process this list in 1,280 chunks of 128 integers?
What have you tried that did not work?
This is really pretty simple, but the details depend on the meaning of 'process a chunk'.
--
Terry Jan Reedy
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2016-01-06 21:21 -0600 |
| Message-ID | <mailman.36.1452137045.2305.python-list@python.org> |
| In reply to | #101315 |
On 2016-01-06 18:36, high5storage@gmail.com wrote:
> I have a list of 163.840 integers. What is a fast & pythonic way to
> process this list in 1,280 chunks of 128 integers?
That's a modest list, far from huge.
You have lots of options, but the following seems the most pythonic to
me:
    # I don't know how you populate your data so
    # create some junk data
    from random import randint
    data = [randint(0, 1000) for _ in range(163840)]

    import itertools as i
    GROUP_SIZE = 128

    def do_something(grp, vals):
        # I don't know what you want to do with each
        # pair. You can print them:
        #   for _, val in vals:
        #       print("%s: %s" % (grp, val))
        # or write them to various chunked files
        # (open each file once per group, so later
        # values don't overwrite earlier ones):
        with open("chunk%04i.txt" % grp, "w") as f:
            for _, val in vals:
                f.write(str(val))
                f.write("\n")

    # but here's the core logic:
    def key_fn(x):
        # x is a tuple of (index, value)
        return x[0] // GROUP_SIZE

    # actually iterate over the grouped data
    # and do something with it:
    for grp, vals in i.groupby(enumerate(data), key_fn):
        do_something(grp, vals)
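To see what the grouping yields, here is a minimal sketch of the same enumerate/groupby idea on a tiny list. The sample data (10 integers) and the group size of 4 are assumptions chosen only to keep the output short; they are not from the question.

```python
from itertools import groupby

data = list(range(100, 110))  # assumed sample data: 10 integers
GROUP_SIZE = 4                # assumed small group size, for illustration only

def key_fn(pair):
    # pair is an (index, value) tuple produced by enumerate()
    return pair[0] // GROUP_SIZE

# Drop the index again and collect each group's values into a list.
chunks = [
    [val for _, val in vals]
    for _, vals in groupby(enumerate(data), key_fn)
]
print(chunks)
# [[100, 101, 102, 103], [104, 105, 106, 107], [108, 109]]
```

Note that the last group is simply shorter when the length is not a multiple of the group size. With 163840 items and groups of 128 that case does not arise.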
-tkc
| From | Cameron Simpson <cs@zip.com.au> |
|---|---|
| Date | 2016-01-07 14:31 +1100 |
| Message-ID | <mailman.37.1452137521.2305.python-list@python.org> |
| In reply to | #101315 |
On 06Jan2016 18:36, high5storage@gmail.com <high5storage@gmail.com> wrote:
>I have a list of 163.840 integers. What is a fast & pythonic way to process
>this list in 1,280 chunks of 128 integers?
That depends. When you say "list", is it already a _python_ list? Or do you just
mean that the integers are in a file or something?
If they're already in a python list you can probably just use a range:
    for offset in range(0, 163840, 128):
        ... do stuff with the elements starting at offset ...
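A runnable version of that loop might look like the sketch below. The names `items`, `CHUNK` and `totals` are made up for illustration, and summing each chunk is only a placeholder for whatever the real processing is.

```python
items = list(range(163840))  # assumed stand-in for the real list
CHUNK = 128

totals = []
for offset in range(0, len(items), CHUNK):
    # Slicing never raises past the end of the list; a final
    # partial chunk would just come out shorter.
    chunk = items[offset:offset + CHUNK]
    totals.append(sum(chunk))  # placeholder for the real processing

print(len(totals))  # 1280
```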
Cheers,
Cameron Simpson <cs@zip.com.au>
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2016-01-07 20:25 +1100 |
| Message-ID | <568e2f1d$0$1602$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #101315 |
On Thu, 7 Jan 2016 01:36 pm, high5storage@gmail.com wrote:
>
> I have a list of 163.840 integers. What is a fast & pythonic way to
> process this list in 1,280 chunks of 128 integers?
py> from itertools import izip_longest
py> def grouper(iterable, n, fillvalue=None):
...     "Collect data into fixed-length chunks or blocks"
...     # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
...     args = [iter(iterable)] * n
...     return izip_longest(fillvalue=fillvalue, *args)
...
py> alist = range(163840)
py> count = 0
py> for block in grouper(alist, 128):
...     assert len(list(block)) == 128
...     count += 1
...
py> count
1280
This was almost instantaneous on my computer. 163840 isn't a very large
number of ints.
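The session above is Python 2. On Python 3 the function is named `itertools.zip_longest`, so a sketch of the same recipe, under that one assumption about the interpreter version, would be:

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks, padding the last one."""
    # n references to the SAME iterator, so each output tuple
    # pulls n consecutive items from the input.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

blocks = list(grouper(range(163840), 128))
print(len(blocks), len(blocks[0]))  # 1280 128

print(list(grouper("ABCDEFG", 3, "x")))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```

Unlike plain slicing, this pads a short final chunk with `fillvalue`, which matters only when the length is not a multiple of `n`.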
--
Steven
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2016-01-07 11:21 +0100 |
| Message-ID | <mailman.40.1452162085.2305.python-list@python.org> |
| In reply to | #101315 |
high5storage@gmail.com wrote:
> I have a list of 163.840 integers. What is a fast & pythonic way to
> process this list in 1,280 chunks of 128 integers?
What kind of processing do you have in mind?
If it is about number crunching, use a numpy.array. This can also easily
change its shape:
>>> import numpy
>>> a = numpy.array(range(12))
>>> a
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
>>> a.shape = (3, 4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
If it's really only(!) under a million integers, slicing is also good:
    items = [1, 2, ...]
    CHUNKSIZE = 128
    for i in range(0, len(items), CHUNKSIZE):
        process(items[i:i + CHUNKSIZE])
If the "list" is really huge (your system starts swapping memory) you can go
completely lazy:
    from itertools import chain, islice

    def chunked(items, chunksize):
        items = iter(items)
        for first in items:
            chunk = chain((first,), islice(items, chunksize-1))
            yield chunk
            for dummy in chunk:  # consume items that may have been skipped
                                 # by your processing
                pass

    def produce_items(file):
        for line in file:
            yield int(line)

    CHUNKSIZE = 128  # this could also be "huge"
                     # without affecting memory footprint

    with open("somefile") as file:
        for chunk in chunked(produce_items(file), CHUNKSIZE):
            process(chunk)
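To check how the lazy version behaves without needing a file, here is a small sketch that runs the same `chunked` generator over `range(10)` in chunks of 4. Both numbers are assumptions chosen for illustration. The first loop reads only one item per chunk, which shows that the trailing "consume" loop keeps the chunk boundaries intact.

```python
from itertools import chain, islice

def chunked(items, chunksize):
    items = iter(items)
    for first in items:
        # Each chunk is a lazy iterator over the shared input.
        chunk = chain((first,), islice(items, chunksize - 1))
        yield chunk
        for _ in chunk:  # drain whatever the caller did not read
            pass

# Read only the first item of each chunk; the rest is skipped for us.
first_of_each = [next(chunk) for chunk in chunked(range(10), 4)]
print(first_of_each)  # [0, 4, 8]

# Consume each chunk fully; the last chunk is shorter.
sums = [sum(chunk) for chunk in chunked(range(10), 4)]
print(sums)  # [6, 22, 17]
```

Because each chunk shares the underlying iterator, a chunk has to be used before the next one is requested; storing the chunk objects for later would leave them empty.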
| From | KP <kai.peters@gmail.com> |
|---|---|
| Date | 2016-01-07 16:33 -0800 |
| Message-ID | <17e513e2-fb15-47ba-8606-97b8a1b904ef@googlegroups.com> |
| In reply to | #101315 |
On Wednesday, 6 January 2016 18:37:22 UTC-8, high5s...@gmail.com wrote:
> I have a list of 163.840 integers. What is a fast & pythonic way to process this list in 1,280 chunks of 128 integers?
Thanks all for your valuable input - much appreciated!