
file data => array(s)

Started by	Eric <einazaki668@yahoo.com>
First post	2011-12-14 14:20 -0800
Last post	2011-12-15 10:51 -0800
Articles	7 — 4 participants


  file data => array(s) Eric <einazaki668@yahoo.com> - 2011-12-14 14:20 -0800
    Re: file data => array(s) Dave Angel <d@davea.name> - 2011-12-14 17:59 -0500
      Re: file data => array(s) Eric <einazaki668@yahoo.com> - 2011-12-14 15:25 -0800
      Re: file data => array(s) Eric <einazaki668@yahoo.com> - 2011-12-15 10:37 -0800
        Re: file data => array(s) MRAB <python@mrabarnett.plus.com> - 2011-12-15 19:34 +0000
    Re: file data => array(s) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-12-14 23:27 +0000
      Re: file data => array(s) Eric <einazaki668@yahoo.com> - 2011-12-15 10:51 -0800

#17238 — file data => array(s)

From	Eric <einazaki668@yahoo.com>
Date	2011-12-14 14:20 -0800
Subject	file data => array(s)
Message-ID	<81b5d566-3a82-4a8e-ac8a-98b8dd92f6bc@4g2000yqu.googlegroups.com>

I'm trying to read some file data into a set of arrays.  The file data
is just four columns of numbers, like so:

   1.2    2.2   3.3  0.5
   0.1   0.2    1.0  10.1
   ... and so on

I'd like to read this into four arrays, one array for each column.
Alternatively, I guess something like this is okay too:

   [[1.2, 2.2, 3.3, 0.5], [0.1, 0.2, 1.0, 10.1], ... and so on]

I came up with the following for the four array option:

   file = open(fileName, 'r')
   for line in file.readlines():
      d1, e1, d2, e2 = map(float, line.split())
      data1.append(d1)  # where data1, err1, data2, err2 are init-ed as empty lists
      err1.append(e1)
      data2.append(d2)
      err2.append(e2)
   file.close()

But somehow it doesn't seem very python-esque (I'm thinking there's a
more elegant and succinct way to do it in python).  I've also tried
replacing the above "map" line with:

      d = d + map(float, line.split())  # where d is initialized as d = []

But all I get is one long flat list, not what I want.
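The flat list comes from `+`. It concatenates the new numbers onto the end of `d`. Appending each parsed row as a single element gives one sub-list per line instead. Below is a minimal sketch. `rows_from_lines` is a hypothetical helper, and it reads an in-memory list of lines so that it runs without a data file:

```python
def rows_from_lines(lines):
    """Return one list of floats per input line (nested, not flat)."""
    rows = []
    for line in lines:
        # append() adds the whole row as ONE element of rows;
        # rows + row would splice the row's items into the outer list.
        rows.append([float(x) for x in line.split()])
    return rows


sample = ["1.2 2.2 3.3 0.5\n", "0.1 0.2 1.0 10.1\n"]

flat = []
for line in sample:
    flat = flat + [float(x) for x in line.split()]  # concatenation: one flat list

print(flat)                     # [1.2, 2.2, 3.3, 0.5, 0.1, 0.2, 1.0, 10.1]
print(rows_from_lines(sample))  # [[1.2, 2.2, 3.3, 0.5], [0.1, 0.2, 1.0, 10.1]]
```

On 2.7, `d.append(map(float, line.split()))` does the same thing in the original style. The list comprehension is used here because it behaves the same on 2.7 and 3.x.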

So is the map and append method the best I can do or is there a
slicker way?

One more thing, no numpy.  Nothing against numpy but I'm curious to
see what can be done with just the box stock python install.

TIA,
eric


#17239

From	Dave Angel <d@davea.name>
Date	2011-12-14 17:59 -0500
Message-ID	<mailman.3658.1323903585.27778.python-list@python.org>
In reply to	#17238

On 12/14/2011 05:20 PM, Eric wrote:
> I'm trying to read some file data into a set of arrays.  The file data
> is just four columns of numbers, like so:
>
>     1.2    2.2   3.3  0.5
>     0.1   0.2    1.0  10.1
>     ... and so on
>
> I'd like to read this into four arrays, one array for each column.
> Alternatively, I guess something like this is okay too:
>
>     [[1.2, 2.2, 3.3, 0.5], [0.1, 0.2, 1.0, 10.1], ... and so on]
>
> I came up with the following for the four array option:
>
>     file = open(fileName, 'r')
>     for line in file.readlines():
The readlines() call is a waste of time/space.  file is already an 
iterator that'll return lines for you.
>        d1, e1, d2, e2 = map(float, line.split())
>        data1.append(d1)  # where data1, err1, data2, err2 are init-ed as empty lists
>        err1.append(e1)
>        data2.append(d2)
>        err2.append(e2)
>     file.close()
> [snip]
When I see a problem like this, I turn to zip().  It's got some powerful 
uses when rows and columns need inverting.

I didn't try it on an actual file, but the following works:
linedata =    [[1.2, 2.2, 3.3, 0.5], [0.1, 0.2, 1.0, 10.1] ]

data, err1, data2, err2 = zip(*linedata)

print data
print err1
print data2
print err2

So you could try (untested)

file = open(filename, "r")
linedata = [ map(float, line) for line in file]
data, err1, data2, err2 = zip(*linedata)
file.close()

Note that your code won't work (and mine probably won't either) if one 
of the lines has 3 or 5 items.  Or if one of the numbers isn't legal 
format for a float.  So you need to think about error checking, or 
decide whether a partial result is important.
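One way to act on the warning about 3- or 5-item lines is to reject the whole file and name the bad line, rather than return a partial result. Below is a sketch under stated assumptions. `parse_rows` is a hypothetical name, and skipping blank lines is an assumption that the thread does not make:

```python
def parse_rows(lines, ncols=4):
    """Parse whitespace-separated numbers, insisting on exactly ncols per line.

    Raises ValueError naming the offending line number instead of
    silently returning a partial result.
    """
    rows = []
    for lineno, line in enumerate(lines, 1):  # count lines from 1 for messages
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are harmless and skipped
        if len(fields) != ncols:
            raise ValueError("line %d: expected %d values, got %d"
                             % (lineno, ncols, len(fields)))
        try:
            rows.append([float(x) for x in fields])
        except ValueError:
            # float() rejected a field; re-raise with the line number attached
            raise ValueError("line %d: not a valid number: %r"
                             % (lineno, line.strip()))
    return rows
```

Because it takes any iterable of lines, an open file object can be passed in directly.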

-- 

DaveA


#17242

From	Eric <einazaki668@yahoo.com>
Date	2011-12-14 15:25 -0800
Message-ID	<d123c0c9-9986-4724-9dfd-f1c4889a10af@b32g2000yqn.googlegroups.com>
In reply to	#17239

On Dec 14, 4:59 pm, Dave Angel <d...@davea.name> wrote:

> Note that your code won't work (and mine probably won't either) if one
> of the lines has 3 or 5 items.  Or if one of the numbers isn't legal
> format for a float.  So you need to think about error checking, or
> decide whether a partial result is important.
>
> --
>
> DaveA

Haven't tried your suggestion yet; I just wanted to comment on this
last part real quick.  I have the same concern, and my plan is to wrap all
that stuff up in a "try:" construct.  In such cases the program only
has to kick out a simple, sensible (for non-programmers) error message
and quit.
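The "try:" plan might look like the sketch below. The loader raises, and a top-level function turns both a missing file and malformed data into a plain message and a non-zero exit status. `load_columns` and `main` are hypothetical names, and the four-column check is an assumption based on the file format described in the thread:

```python
import sys


def load_columns(path):
    """Hypothetical loader: read four float columns from the file at `path`."""
    with open(path) as f:
        rows = [[float(x) for x in line.split()] for line in f if line.strip()]
    if not rows:
        raise ValueError("the file contains no data")
    if any(len(row) != 4 for row in rows):
        raise ValueError("every line must hold exactly four numbers")
    return list(zip(*rows))  # four tuples, one per column


def main(path):
    """Return 0 on success, 1 after printing a plain-language error."""
    try:
        columns = load_columns(path)
    except IOError as e:  # open() failures; on 3.3+ IOError is an alias of OSError
        sys.stderr.write("Could not read the data file %s (%s)\n" % (path, e))
        return 1
    except ValueError as e:  # wrong count of values, or float() rejected a field
        sys.stderr.write("The data file %s is not in the expected format: %s\n"
                         % (path, e))
        return 1
    print("Read %d rows of data" % len(columns[0]))
    return 0
```

In a script, the last line would be `sys.exit(main(sys.argv[1]))`. Keeping the parsing in its own function means the messages for non-programmers live in one place.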

BTW, I didn't say it originally, but this is for 2.7 and hopefully
it'll be easy to carry over to 3.2.
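The main 2.7-to-3.x trap for code in this thread is that `map()` and `zip()` return lazy iterators on 3.x. In 2.7 they return lists. The sketch below shows the `list()` wrapping that behaves the same on both versions. For `print` with several arguments, 2.7 also needs `from __future__ import print_function` as the first statement of the module.

```python
line = "1.2 2.2 3.3 0.5\n"

# 2.7: map() returns a list.  3.x: it returns a lazy iterator, so
# d + map(...), len(map(...)) and del on the result raise TypeError there.
# Wrapping it in list() gives a real list on both versions.
row = list(map(float, line.split()))

# zip() is lazy on 3.x too; list() it before indexing it or taking len().
rows = [row, [0.1, 0.2, 1.0, 10.1]]
columns = list(zip(*rows))

# print() with a single argument prints the same thing on 2.7 and 3.x.
print(row)      # [1.2, 2.2, 3.3, 0.5]
print(columns)  # [(1.2, 0.1), (2.2, 0.2), (3.3, 1.0), (0.5, 10.1)]
```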

Thanks,
eric


#17308

From	Eric <einazaki668@yahoo.com>
Date	2011-12-15 10:37 -0800
Message-ID	<388939ad-b465-4dd0-8eb8-51395ee643a7@o7g2000yqk.googlegroups.com>
In reply to	#17239

On Dec 14, 4:59 pm, Dave Angel <d...@davea.name> wrote:
> On 12/14/2011 05:20 PM, Eric wrote:
> [snip]
>
> When I see a problem like this, I turn to zip().  It's got some powerful
> uses when rows and columns need inverting.
>
> I didn't try it on an actual file, but the following works:
> linedata =    [[1.2, 2.2, 3.3, 0.5], [0.1, 0.2, 1.0, 10.1] ]
>
> data, err1, data2, err2 = zip(*linedata)
>
> print data
> print err1
> print data2
> print err2
>
> So you could try (untested)
>
> file = open(filename, "r")
> linedata = [ map(float, line) for line in file]
> data, err1, data2, err2 = zip(*linedata)
> file.close()
>
> DaveA

Neat.  This is what I had in mind for a python-esque solution.  The only
thing is that "map(float,line)" should be "map(float,line.split())".  Looks
like it should be easy enough to weed out any funky data sets, because
between them map() and zip() are fairly picky about the amount and type of
data.
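Here is the zip() approach with the `line.split()` correction applied. It iterates the file directly instead of calling readlines() and runs on 2.7 and 3.x. This is a sketch: `read_columns` is a hypothetical name, and skipping blank lines is an added assumption.

```python
def read_columns(lines):
    """Return one tuple per column of whitespace-separated numbers.

    `lines` can be an open file (iterated directly, no readlines())
    or, for testing, a list of strings.
    """
    # Split first: map(float, line) would feed float() the line's individual
    # characters ("1", ".", "2", ...) rather than the numbers in it.
    # Blank lines are skipped; an empty row would make zip() return nothing.
    rows = [[float(x) for x in line.split()] for line in lines if line.strip()]
    return list(zip(*rows))  # zip(*rows) turns rows into columns
```

A typical call is `data1, err1, data2, err2 = read_columns(f)` inside a `with open(fileName) as f:` block. One caution on the pickiness point: zip() stops at the shortest row. A single five-value line among four-value lines therefore loses its fifth value silently. Only the unpacking into four names catches a line that is too short.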

Finally, the input files I'll be using for real aren't just four
columns of data.  The beginning of the file may have comments
(optional) and will have two lines of text to identify the data.
Maybe I can still do it w/o readlines.
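Leading comments and a fixed header can be skipped without readlines() by chaining iterator tools over the open file. Below is a sketch under loudly stated assumptions that are not from the post: comment lines start with `#`, they all come before the header, and the header is exactly two lines. `read_data_file` and `is_comment_or_blank` are hypothetical names.

```python
from itertools import dropwhile, islice


def is_comment_or_blank(line, comment="#"):
    """Assumed comment convention: first non-blank character is '#'."""
    stripped = line.strip()
    return not stripped or stripped.startswith(comment)


def read_data_file(lines, header_lines=2):
    """Return (header, columns) from an iterable of lines.

    Assumptions: optional comment lines come first, then exactly
    `header_lines` lines of text, then rows of numbers.
    """
    it = iter(lines)  # an open file works here directly, no readlines()
    body = dropwhile(is_comment_or_blank, it)  # lazily skip leading comments
    # islice takes exactly header_lines lines and leaves the rest in `body`.
    header = [line.rstrip("\n") for line in islice(body, header_lines)]
    rows = [[float(x) for x in line.split()] for line in body if line.strip()]
    return header, list(zip(*rows))
```

Each line is consumed exactly once, so this works the same on a large file as on a small one.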

Thanks,
eric


#17310

From	MRAB <python@mrabarnett.plus.com>
Date	2011-12-15 19:34 +0000
Message-ID	<mailman.3695.1323977619.27778.python-list@python.org>
In reply to	#17308

On 15/12/2011 18:37, Eric wrote:
[snip]
>
> Neat.  This is what I had in mind for a python-esque solution.
[snip]
FYI, the word is "Pythonic" when talking about the programming
language. The word "Pythonesque" refers to Monty Python.


#17243

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2011-12-14 23:27 +0000
Message-ID	<4ee930fd$0$29979$c3e8da3$5496439d@news.astraweb.com>
In reply to	#17238

On Wed, 14 Dec 2011 14:20:40 -0800, Eric wrote:

> I'm trying to read some file data into a set of arrays.  The file data
> is just four columns of numbers, like so:
> 
>    1.2    2.2   3.3  0.5
>    0.1   0.2    1.0  10.1
>    ... and so on
> 
> I'd like to read this into four arrays, one array for each column.
> Alternatively, I guess something like this is okay too:
> 
>    [[1.2, 2.2, 3.3, 0.5], [0.1, 0.2, 1.0, 10.1], ... and so on]

First thing: due to the fundamental nature of binary floating point 
numbers, if you convert text like "0.1" to a float, you don't get 0.1, 
you get 0.10000000000000001. That is because 0.1000...01 is the closest 
possible combination of fractions of 1/2, 1/4, 1/8, ... that adds up to 
1/10.

If this fact disturbs you, you can import the decimal module and use 
decimal.Decimal instead; otherwise forget I said anything and continue 
using float. I will assume you're happy with floats.
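The standard decimal module can check this point directly. Passing a float to `Decimal` (allowed since 2.7) shows the exact binary value the float stores. The 0.10000000000000001 figure is that value rounded to 17 significant digits, which is how Python before 2.7/3.1 displayed it.

```python
from decimal import Decimal

# Decimal(float) shows the exact binary value the float actually stores.
print(Decimal(0.1))  # 0.1000000000000000055511151231257827021181583404541015625

# 17 significant digits: the form older Pythons used for repr(0.1).
print("%.17g" % 0.1)  # 0.10000000000000001

# Decimal(str) keeps the digits exactly as written in the data file.
print(Decimal("0.1"))  # 0.1

# Binary floating point versus exact decimal arithmetic:
print(0.1 + 0.2 == 0.3)                                   # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```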

Assuming the file is small, say, less than 50MB, I'd do it like this:

# Version for Python 2.x
f = open(filename, 'r')
text = f.read()  # Grab the whole file at once.
numbers = map(float, text.split())
f.close()

That gives you a single list [1.2, 2.2, 3.3, 0.5, 0.1, 0.2, ...] which 
you can now split into groups of four. There are lots of ways to do this. 
Here's an inefficient way which hopefully will be simple to understand:

result = []
while numbers != []:
    result.append(numbers[0:4])
    del numbers[0:4]

Here is a much more efficient method which is only a tad harder to 
understand:

result = []
for start in range(0, len(numbers), 4):
    result.append(numbers[start:start+4])

And just for completeness, here is an advanced technique using itertools:

n = len(numbers)//4
numbers = iter(numbers)
from itertools import islice
result = [list(islice(numbers, 4)) for i in range(n)]

Be warned that this version throws away any partial group left over at 
the end; if you don't want that, change the line defining n to this 
instead:

n = len(numbers)//4 + (len(numbers)%4 > 0)
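The standard library also supports two related idioms that Steven did not use. Extended slices pull the four columns straight out of the flat list. The "grouper" idiom rebuilds the rows: it passes zip() four references to one iterator, and zip() evaluates its arguments left to right, so each tuple takes four consecutive items. Like the islice version, plain zip() drops a trailing partial group. `zip_longest` (named `izip_longest` on 2.7) pads that group instead.

```python
try:
    from itertools import zip_longest                  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2.7

numbers = [1.2, 2.2, 3.3, 0.5, 0.1, 0.2, 1.0, 10.1]

# Columns from the flat list: every 4th item, starting at offsets 0, 1, 2, 3.
data1, err1, data2, err2 = (numbers[i::4] for i in range(4))

# Grouper idiom: four references to ONE iterator, so each tuple takes
# four consecutive items. A trailing partial group is dropped.
it = iter(numbers)
rows = list(zip(*[it] * 4))

# zip_longest keeps the partial last group, padding it with fillvalue.
padded = list(zip_longest(*[iter(numbers[:6])] * 4, fillvalue=None))
```

With this data, `data1` is `[1.2, 0.1]`, `rows` is `[(1.2, 2.2, 3.3, 0.5), (0.1, 0.2, 1.0, 10.1)]`, and the last tuple of `padded` is `(0.1, 0.2, None, None)`.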

-- 
Steven


#17304

From	Eric <einazaki668@yahoo.com>
Date	2011-12-15 10:51 -0800
Message-ID	<1744e277-69ba-4648-a86f-023af83ca939@u32g2000yqe.googlegroups.com>
In reply to	#17243

On Dec 14, 5:27 pm, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> On Wed, 14 Dec 2011 14:20:40 -0800, Eric wrote:
> [snip]
>
> First thing: due to the fundamental nature of binary floating point
> numbers, if you convert text like "0.1" to a float, you don't get 0.1,
> you get 0.10000000000000001. That is because 0.1000...01 is the closest
> possible combination of fractions of 1/2, 1/4, 1/8, ... that adds up to
> 1/10.
>
> If this fact disturbs you, you can import the decimal module and use
> decimal.Decimal instead; otherwise forget I said anything and continue
> using float. I will assume you're happy with floats.
>

Yeah, I don't think it'll be a problem.  As I understand it, a float in
python is a double in C, and all our old C programs used doubles.  From
PDP-11 to MIPS3k to P2 I've seen what I think may have been rounding
or precision errors, but I haven't heard any complaints because of
them.
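This understanding matches CPython: its float is a C double. The interpreter can report the details. The values in the comments below are those of an IEEE 754 binary64 double, which is what essentially all current platforms use. On some unusual platform they could in principle differ.

```python
import struct
import sys

print(struct.calcsize("d"))     # 8: bytes in a C double
print(sys.float_info.mant_dig)  # 53: bits of significand
print(sys.float_info.dig)       # 15: decimal digits guaranteed to survive a round trip
print(sys.float_info.epsilon)   # 2.220446049250313e-16: gap between 1.0 and the next float
```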

Thanks,
eric

