CSV to matrix array

Started by: Ana Dionísio <anadionisio257@gmail.com>
First post: 2013-04-12 07:22 -0700
Last post: 2013-04-13 20:01 +0100
Articles: 10 (6 participants)



Contents

  CSV to matrix array Ana Dionísio <anadionisio257@gmail.com> - 2013-04-12 07:22 -0700
    Re: CSV to matrix array Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-04-12 16:25 +0100
      Re: CSV to matrix array Ana Dionísio <anadionisio257@gmail.com> - 2013-04-12 10:12 -0700
        Re: CSV to matrix array rusi <rustompmody@gmail.com> - 2013-04-12 10:34 -0700
      Re: CSV to matrix array Ana Dionísio <anadionisio257@gmail.com> - 2013-04-12 10:12 -0700
    Re: CSV to matrix array Miki Tebeka <miki.tebeka@gmail.com> - 2013-04-12 16:10 -0700
    Re: CSV to matrix array giacomo boffi <pecore@pascolo.net> - 2013-04-13 14:06 +0200
      Re: CSV to matrix array Ana Dionísio <anadionisio257@gmail.com> - 2013-04-13 08:30 -0700
        Re: CSV to matrix array Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-04-13 17:52 +0100
        Re: CSV to matrix array Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-13 20:01 +0100

#43453 — CSV to matrix array

From: Ana Dionísio <anadionisio257@gmail.com>
Date: 2013-04-12 07:22 -0700
Subject: CSV to matrix array
Message-ID: <2506a155-40b6-4040-bc12-a08ce0d79cd7@googlegroups.com>

Hello!

I have a CSV file with 20 rows and 12 columns and I need to store it as a matrix. I already created an array with zeros, but I don't know how to fill it with the data from the csv file. I have this script:

import numpy
from numpy import array
from array import *
import csv

input = open('Cenarios.csv','r')
cenario = csv.reader(input)

array=numpy.zeros([20, 12])


I know I have to use for loops but I don't know how to use it to put the data the way I want. Can you help me?

Thanks!



#43463

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2013-04-12 16:25 +0100
Message-ID: <mailman.526.1365780304.3114.python-list@python.org>
In reply to: #43453

On 12/04/2013 15:22, Ana Dionísio wrote:
> Hello!
>
> I have a CSV file with 20 rows and 12 columns and I need to store it as a matrix. I already created an array with zeros, but I don't know how to fill it with the data from the csv file. I have this script:
>
> import numpy
> from numpy import array
> from array import *
> import csv
>
> input = open('Cenarios.csv','r')
> cenario = csv.reader(input)
>
> array=numpy.zeros([20, 12])
>
>
> I know I have to use for loops but I don't know how to use it to put the data the way I want. Can you help me?
>
> Thanks!
>

I'm no expert on numpy, but there is a loadtxt function; see
http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
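
As a minimal sketch of that suggestion, assuming the file is purely numeric and comma-separated (neither is confirmed at this point in the thread), numpy.loadtxt can build the matrix in one call, with no pre-allocated array of zeros and no explicit loops. The in-memory sample below is a made-up stand-in for the real file.

```python
import io
import numpy

# Made-up stand-in for a small, purely numeric, comma-separated file.
sample = io.StringIO("1,2,3\n4,5,6\n")

# loadtxt parses every field as a float by default and returns a 2-D array.
matrix = numpy.loadtxt(sample, delimiter=",")

print(matrix.shape)   # (2, 3)
print(matrix[1][2])   # 6.0
```

Passing a filename such as 'Cenarios.csv' instead of the StringIO object should work the same way for a file on disk.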

-- 
If you're using GoogleCrap™ please read this 
http://wiki.python.org/moin/GoogleGroupsPython.

Mark Lawrence



#43470

From: Ana Dionísio <anadionisio257@gmail.com>
Date: 2013-04-12 10:12 -0700
Message-ID: <45bfb6db-fb1a-478d-8c97-7ddd44167076@googlegroups.com>
In reply to: #43463

Hi, thanks for your answer! ;)

Does anyone have more suggestions?



#43475

From: rusi <rustompmody@gmail.com>
Date: 2013-04-12 10:34 -0700
Message-ID: <c9f729f1-3a4f-4b84-8575-0487f0f01041@pa2g2000pbb.googlegroups.com>
In reply to: #43470

On Apr 12, 10:12 pm, Ana Dionísio <anadionisio...@gmail.com> wrote:
> Hi, thanks for your answer! ;)
>
> Does anyone have more suggestions?

My suggestions:

1. Tell us what was lacking in Mark's suggestion (to use loadtxt)
2. Read his postscript (for googlegroup posters).
[In case you did not notice your posts are arriving in doubles]



#43471

From: Ana Dionísio <anadionisio257@gmail.com>
Date: 2013-04-12 10:12 -0700
Message-ID: <mailman.529.1365786772.3114.python-list@python.org>
In reply to: #43463

Hi, thanks for your answer! ;)

Does anyone have more suggestions?



#43491

From: Miki Tebeka <miki.tebeka@gmail.com>
Date: 2013-04-12 16:10 -0700
Message-ID: <e41f5548-1832-4498-b30a-ea7e95436be2@googlegroups.com>
In reply to: #43453

> I have a CSV file with 20 rows and 12 columns and I need to store it as a matrix.
If you can use pandas, pandas.read_csv is what you want.
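
A rough sketch of the pandas route, assuming pandas is installed and that the file has no header row. The semicolon separator and the sample values are taken from the row posted later in this thread; the second row is made up.

```python
import io
import pandas

# Two short rows in the style of the data shown later in the thread.
sample = io.StringIO("2999;T3;3;Off\n3000;T4;4;ON\n")

# header=None: assume there is no header row, so columns are numbered 0..n-1.
frame = pandas.read_csv(sample, sep=";", header=None)

print(frame.iloc[0, 1])  # T3
```

Unlike a numpy array, each DataFrame column keeps its own type, so the numbers and the text can sit side by side.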



#43510

From: giacomo boffi <pecore@pascolo.net>
Date: 2013-04-13 14:06 +0200
Message-ID: <87li8m7hyk.fsf@pascolo.net>
In reply to: #43453

Ana Dionísio <anadionisio257@gmail.com> writes:

> Hello!
>
> I have a CSV file with 20 rows and 12 columns and I need to store it
> as a matrix.

array=numpy.array([row for row in csv.reader(open('Cenarios.csv'))])

NB: i used "array=" as in your sample code, BUT



#43518

From: Ana Dionísio <anadionisio257@gmail.com>
Date: 2013-04-13 08:30 -0700
Message-ID: <fe492580-d59d-4862-a586-407ad9759174@googlegroups.com>
In reply to: #43510

It's still not working. I still have one column with all the data inside, like this:

2999;T3;3;1;1;Off;ON;OFF;ON;ON;ON;ON;Night;;;;;;

How can I split this data in a way that if I want to print "T3" I would just do "print array[0][1]"?  



#43524

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2013-04-13 17:52 +0100
Message-ID: <mailman.557.1365871903.3114.python-list@python.org>
In reply to: #43518

On 13/04/2013 16:30, Ana Dionísio wrote:
> It's still not working. I still have one column with all the data inside, like this:
>
> 2999;T3;3;1;1;Off;ON;OFF;ON;ON;ON;ON;Night;;;;;;
>
> How can I split this data in a way that if I want to print "T3" I would just do "print array[0][1]"?
>

I said before I'm no expert on numpy but my understanding is that all 
arrays are homogeneous, hence you can't load the data you show above 
without some form of mapping.  In that case you'd have to read the data 
with the csv module as others have already suggested, apply your mapping 
and then write this to your array.  The obvious alternative is to use a 
list of lists.
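
One way the "read, map, then write to the array" idea could look, as a sketch only. The FLAGS mapping and the to_number helper are hypothetical (the thread never says which numeric values the text fields should become), and the sample rows contain only numbers and on/off flags.

```python
import csv
import io
import numpy

# Hypothetical mapping from text flags to numbers; not given in the thread.
FLAGS = {"on": 1.0, "off": 0.0}

def to_number(field):
    """Return the mapped value for a flag, otherwise parse the field as a float."""
    key = field.strip().lower()
    if key in FLAGS:
        return FLAGS[key]
    return float(field)

# Made-up rows holding only numbers and on/off flags, separated by semicolons.
sample = io.StringIO("2999;3;Off;ON\n3000;4;ON;ON\n")
rows = [[to_number(f) for f in row] for row in csv.reader(sample, delimiter=";")]

# Fill a pre-allocated array with explicit loops, as in the original question.
matrix = numpy.zeros((len(rows), len(rows[0])))
for i, row in enumerate(rows):
    for j, value in enumerate(row):
        matrix[i][j] = value

print(matrix[0][2])  # 0.0
```

Fields such as "T3" or "Night" have no entry in this mapping and would raise ValueError, which is the homogeneity problem described above.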

Mark Lawrence



#43538

From: Oscar Benjamin <oscar.j.benjamin@gmail.com>
Date: 2013-04-13 20:01 +0100
Message-ID: <mailman.571.1365879717.3114.python-list@python.org>
In reply to: #43518

On 13 April 2013 16:30, Ana Dionísio <anadionisio257@gmail.com> wrote:
> It's still not working. I still have one column with all the data inside, like this:
>
> 2999;T3;3;1;1;Off;ON;OFF;ON;ON;ON;ON;Night;;;;;;
>
> How can I split this data in a way that if I want to print "T3" I would just do "print array[0][1]"?

You initially reported that your data was a CSV file, which normally
means that the values in each row are separated by comma characters
e.g. ','. Actually the data here are separated by semicolons e.g. ';'.
This means that whether you use numpy or the csv module you will need
to specify that the data is separated by semicolons. In numpy you
would do this with

import numpy
data = numpy.loadtxt('file.csv', dtype=int, delimiter=';')

You need to set dtype to be whatever data type you want to convert the
values to e.g. int or float. This is because numpy arrays are
homogeneous. In your case the data (presumably a channel/event header
from an EEG file) is not homogeneous as you have integer data '2999'
followed by the channel name 'T3' which is a string. You can load all
values as strings with

data = numpy.loadtxt('file.csv', dtype=str, delimiter=';')

It is possible to have heterogeneous types in a numpy array using
dtype=object but if you use that with the loadtxt function it will
just use strings for all values.
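
A small sketch of the all-strings approach, assuming a recent numpy on Python 3. The in-memory rows are shortened, partly made-up stand-ins for the real file.

```python
import io
import numpy

# Shortened rows in the style of the posted data (the second row is made up).
sample = io.StringIO("2999;T3;3;Off\n3000;T4;4;ON\n")

# Every field is kept as text, so the array stays homogeneous.
data = numpy.loadtxt(sample, dtype=str, delimiter=";")

print(data[0][1])      # T3

# Numeric fields have to be converted by hand when they are needed.
rate = int(data[0][0])
print(rate + 1)        # 3000
```

This gives the data[0][1] style of access asked for, at the cost of converting numeric fields manually each time.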

Alternatively you can use the csv module in the standard library to
load all the data as strings

import csv
# 'rb' is the Python 2 idiom; on Python 3 use open('file.csv', newline='') instead
with open('file.csv', 'rb') as csvfile:
    data = list(csv.reader(csvfile, delimiter=';'))

This will give you a list of lists of strings rather than a numpy
array. Afterwards you can convert the integer values to int if you
want like so:

for row in data:
    row[0] = int(row[0])

This works because lists can store heterogeneous data, unlike numpy arrays.
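
The row posted earlier ends in a run of semicolons, which csv.reader turns into empty strings at the end of the list. The sketch below, which reads that row from memory rather than from the real file, shows one possible way to trim them; whether those empty fields should really be dropped is an assumption.

```python
import csv
import io

# The sample row from the thread, read from memory instead of from a file.
line = "2999;T3;3;1;1;Off;ON;OFF;ON;ON;ON;ON;Night;;;;;;\n"
raw = list(csv.reader(io.StringIO(line), delimiter=";"))

cleaned = []
for row in raw:
    trimmed = list(row)
    # Drop the empty strings produced by the trailing semicolons.
    while trimmed and trimmed[-1] == "":
        trimmed.pop()
    # Convert the leading integer field, keeping the rest as text.
    trimmed[0] = int(trimmed[0])
    cleaned.append(trimmed)

print(cleaned[0][1])   # T3
print(cleaned[0][-1])  # Night
```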

Either of the above will let you access the data with e.g. data[2][7]
to get the value from the 8th column of the 3rd row. However, I think
the better approach would be to use a csv.DictReader and store your
data as a list of dicts. This would look like:

# Change this to names that actually describe each column of your data
columns = ['sample_rate', 'channel_name', 'electrode_number',
'lowpass_filter',...]

data = []
with open('file.csv') as csvfile:
    for row in csv.DictReader(csvfile, fieldnames=columns, delimiter=';'):
        # Convert non-string data here e.g.:
        row['sample_rate'] = int(row['sample_rate'])
        data.append(row)

Now you can access the data using e.g. data[0]['channel_name'] which I
think is better than data[0][1] and you can store data of
heterogeneous type e.g. int, str, etc. in the same row.
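
When a row has more fields than there are names in fieldnames, csv.DictReader collects the surplus values in a list stored under restkey (None by default). The sketch below names that key "extra". The four column names are the guesses from this post, not confirmed names, and the row is a shortened version of the posted sample.

```python
import csv
import io

# Guessed column names from the post above; adjust to the real meaning of the data.
columns = ["sample_rate", "channel_name", "electrode_number", "lowpass_filter"]

# A shortened version of the posted row, with more fields than named columns.
sample = io.StringIO("2999;T3;3;1;1;Off;ON\n")

data = []
reader = csv.DictReader(sample, fieldnames=columns, delimiter=";", restkey="extra")
for row in reader:
    row["sample_rate"] = int(row["sample_rate"])
    data.append(row)

print(data[0]["channel_name"])  # T3
print(data[0]["extra"])         # ['1', 'Off', 'ON']
```

Naming every column explicitly would avoid the catch-all key; restkey is mainly useful while the meaning of the remaining fields is still unknown.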


Oscar


