| Started by | RueTheDay <nospam@nospam.com> |
|---|---|
| First post | 2013-01-06 07:57 -0600 |
| Last post | 2013-01-06 12:32 -0500 |
| Articles | 6 — 3 participants |
Newbie problem with Python pandas RueTheDay <nospam@nospam.com> - 2013-01-06 07:57 -0600
Re: Newbie problem with Python pandas Miki Tebeka <miki.tebeka@gmail.com> - 2013-01-06 08:05 -0800
Re: Newbie problem with Python pandas RueTheDay <nospam@nospam.com> - 2013-01-06 10:36 -0600
Re: Newbie problem with Python pandas Roy Smith <roy@panix.com> - 2013-01-06 11:45 -0500
Re: Newbie problem with Python pandas RueTheDay <nospam@nospam.com> - 2013-01-06 11:15 -0600
Re: Newbie problem with Python pandas Roy Smith <roy@panix.com> - 2013-01-06 12:32 -0500
| From | RueTheDay <nospam@nospam.com> |
|---|---|
| Date | 2013-01-06 07:57 -0600 |
| Subject | Newbie problem with Python pandas |
| Message-ID | <8qSdnfYwI8QgH3TNnZ2dnUVZ_uadnZ2d@giganews.com> |
I'm working my way through the examples in the O'Reilly book Python for
Data Analysis and have encountered a snag.
The following code is supposed to analyze some web server log data and
produce aggregate counts by client operating system.
###################
import json # used to process json records
from pandas import DataFrame, Series
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
path = '/home/rich/code/sample.txt'
records = [json.loads(line) for line in open(path)]  # read in records one line at a time
frame = DataFrame(records)
cframe = frame[frame.a.notnull()]
operating_system = np.where(cframe['a'].str.contains('Windows'), 'Windows', 'Not Windows')
by_tz_os = cframe.groupby(['tz', operating_system])
agg_counts = by_tz_os.size().unstack().fillna(0)
indexer = agg_counts.sum(1).argsort()
count_subset = agg_counts.take(indexer)[-10:]
print count_subset
####################
I am getting the following error when running on Python 2.7 on Ubuntu
12.04:
>>>>>>
Traceback (most recent call last):
  File "./lp1.py", line 12, in <module>
    operating_system = np.where(cframe['a'].str.contains('Windows'),'Windows', 'Not Windows')
AttributeError: 'Series' object has no attribute 'str'
>>>>>>>
Note that I was able to get the code to work fine on Windows 7, so this
appears to be specific to Linux.
A little Googling showed that others have encountered this problem and
suggested replacing the np.where with a find, like so:
########
operating_system = ['Windows' if a.find('Windows') >= 0 else 'Not Windows'
                    for a in cframe['a']]
########
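One detail worth keeping in mind with a find-based test: str.find returns the index of the first match, or -1 when there is no match, so a match at the very start of the string returns 0. A minimal sketch, using made-up user-agent strings that are not from the book's data set, shows the difference and the simpler `in` test:

```python
# Hypothetical sample strings, chosen only to illustrate the three cases.
user_agents = [
    "Windows-Update-Agent",          # match at index 0
    "Mozilla/5.0 (Windows NT 6.1)",  # match later in the string
    "Mozilla/5.0 (Macintosh)",       # no match
]

# str.find gives the position of the first match, or -1 if there is none.
positions = [a.find("Windows") for a in user_agents]

# The `in` operator expresses the same membership test without index arithmetic.
labels = ["Windows" if "Windows" in a else "Not Windows" for a in user_agents]

print(positions)
print(labels)
```

Comparing the result of find against 0 with a strict greater-than would label the first string as "Not Windows", which is why `>= 0` (or `in`) is the safer test.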
This appears to solve the first problem, but then it fails on the next
line with:
>>>>>>>>
Traceback (most recent call last):
  File "./lp1.py", line 14, in <module>
    by_tz_os = cframe.groupby(['tz', operating_system])
  File "/usr/lib/pymodules/python2.7/pandas/core/generic.py", line 133, in groupby
    sort=sort)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 522, in groupby
    return klass(obj, by, **kwds)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 115, in __init__
    level=level, sort=sort)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 705, in _get_groupings
    ping = Grouping(group_axis, gpr, name=name, level=level, sort=sort)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 600, in __init__
    self.grouper = self.index.map(self.grouper)
  File "/usr/lib/pymodules/python2.7/pandas/core/index.py", line 591, in map
    return self._arrmap(self.values, mapper)
  File "generated.pyx", line 1141, in pandas._tseries.arrmap_int64 (pandas/src/tseries.c:40593)
TypeError: 'list' object is not callable
>>>>>>>>>
The problem looks to be with the pandas module and appears to be
Linux-specific.
Any ideas? I'm pulling my hair out over this.
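The last frames of that traceback show index.map being called with the list itself, which suggests that this older pandas treated a plain Python list as a mapping function rather than as an array of group labels. The following is only a sketch, assuming a reasonably current pandas and NumPy and not verified against pandas 0.7: it passes the labels as a NumPy array, the same kind of key that np.where produces in the original code. The sample rows are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for the parsed log records.
frame = pd.DataFrame({
    "tz": ["America/New_York", "America/New_York",
           "Europe/London", "Europe/London"],
    "a": ["Mozilla/5.0 (Windows NT 6.1)", "Mozilla/5.0 (Macintosh)",
          "Mozilla/5.0 (Windows NT 5.1)", "Mozilla/5.0 (Windows NT 6.1)"],
})

# Build the labels with a list comprehension, then wrap them in an array
# so that groupby receives an array-typed key instead of a plain list.
labels = ["Windows" if "Windows" in a else "Not Windows" for a in frame["a"]]
operating_system = np.array(labels)

# Count rows per (time zone, operating system) and pivot the OS into columns.
agg_counts = frame.groupby(["tz", operating_system]).size().unstack().fillna(0)
print(agg_counts)
```

Wrapping the list in np.array (or a Series aligned with the frame's index) keeps the key's type consistent with what the book's np.where version produces.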
| From | Miki Tebeka <miki.tebeka@gmail.com> |
|---|---|
| Date | 2013-01-06 08:05 -0800 |
| Message-ID | <733a78da-9393-4712-8255-a9942a5052b3@googlegroups.com> |
| In reply to | #36249 |
On Sunday, January 6, 2013 5:57:17 AM UTC-8, RueTheDay wrote:
> I am getting the following error when running on Python 2.7 on Ubuntu
> 12.04:
> AttributeError: 'Series' object has no attribute 'str'

I would *guess* that you have an older version of pandas on your Linux machine.
Try "print(pd.__version__)" to see which version you have.

Also, try asking over at
https://groups.google.com/forum/?fromgroups=#!forum/pydata
which is more dedicated to pandas.
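A quick way to act on that suggestion is to print the installed version and, separately, test for the feature itself. This is a small sketch, assuming pandas is importable in the environment that runs the script; the variable names are illustrative only.

```python
import pandas as pd

# The version string of whichever pandas this interpreter actually imports.
installed_version = pd.__version__

# Feature detection: does this pandas provide the Series.str accessor at all?
has_str_accessor = hasattr(pd.Series, "str")

print("pandas", installed_version, "- Series.str available:", has_str_accessor)
```

Running the same lines on both machines makes any version difference visible without guessing which release introduced the accessor.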
| From | RueTheDay <nospam@nospam.com> |
|---|---|
| Date | 2013-01-06 10:36 -0600 |
| Message-ID | <_dudnTTyxduONXTNnZ2dnUVZ_oCdnZ2d@giganews.com> |
| In reply to | #36264 |
On Sun, 06 Jan 2013 08:05:59 -0800, Miki Tebeka wrote:
> I would *guess* that you have an older version of pandas on your Linux
> machine.
> Try "print(pd.__version__)" to see which version you have.

Thank you! That was it. I had 0.7 installed (the latest in the Ubuntu
repository). I downloaded and manually installed 0.10 and now it's
working. Coincidentally, this also fixed a problem I was having with
running a matplotlib plot function against a pandas DataFrame (it worked
with some chart types but not others).

I'm starting to understand why people rely on easy_install and pip.

Thanks again.
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2013-01-06 11:45 -0500 |
| Message-ID | <roy-72AA86.11453406012013@news.panix.com> |
| In reply to | #36266 |
In article <_dudnTTyxduONXTNnZ2dnUVZ_oCdnZ2d@giganews.com>,
RueTheDay <nospam@nospam.com> wrote:
> Thank you! That was it. I had 0.7 installed (the latest in the Ubuntu
> repository). I downloaded and manually installed 0.10 and now it's
> working.
> [...]
> I'm starting to understand why people rely on easy_install and pip.

Yeah, Ubuntu is a bit of a mess when it comes to pandas and the things
it depends on. Apt gets you numpy 1.4.1, which is really old. Pandas
won't even install on top of it.

I've got pandas (and numpy, and scipy, and matplotlib) running on an
Ubuntu 12.04 box. I installed everything with pip. My problem at this
point, however, is that I want to replicate that setup in EMR (Amazon's
Elastic Map-Reduce). In theory, I could just run "pip install numpy" in
my mrjob.conf bootstrap, but it's a really long install process,
building a lot of stuff from source. Not the kind of thing you want to
put in a bootstrap for an ephemeral instance.

Does anybody know where I can find a Debian package for numpy 1.6?
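When deciding whether a packaged version is new enough (1.4.1 from apt versus the 1.6 being asked for), comparing the version strings directly is unreliable, since "1.10" sorts before "1.6" as text. Below is a rough sketch of a numeric comparison using only the standard library; the helper names version_tuple and meets_minimum are made up for this example, and suffixes such as "rc1" or "dev" are handled crudely by ignoring everything after the leading digits.

```python
import re


def version_tuple(version):
    """Turn a dotted version string such as '1.4.1' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component with no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`, compared numerically."""
    have, need = version_tuple(installed), version_tuple(minimum)
    width = max(len(have), len(need))
    # Pad the shorter tuple with zeros so that '1.6' and '1.6.0' compare equal.
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


print(meets_minimum("1.4.1", "1.6"))   # the apt version mentioned above
print(meets_minimum("1.10.0", "1.6"))  # a case where text comparison misleads
```

For anything beyond a quick check, a dedicated version-parsing library would be the more robust choice.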
| From | RueTheDay <nospam@nospam.com> |
|---|---|
| Date | 2013-01-06 11:15 -0600 |
| Message-ID | <_dudnTfyxdvcLHTNnZ2dnUVZ_oCdnZ2d@giganews.com> |
| In reply to | #36268 |
On Sun, 06 Jan 2013 11:45:34 -0500, Roy Smith wrote:
> Does anybody know where I can find a debian package for numpy 1.6?

Go here:

http://neuro.debian.net/index.html#how-to-use-this-repository

and add one of their repositories to your sources.

Then you can use apt-get to install ALL the latest packages on your
Ubuntu box - numpy, scipy, pandas, matplotlib, statsmodels, etc.

I wish I had found this a few days ago.
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2013-01-06 12:32 -0500 |
| Message-ID | <roy-413E29.12322906012013@news.panix.com> |
| In reply to | #36270 |
In article <_dudnTfyxdvcLHTNnZ2dnUVZ_oCdnZ2d@giganews.com>,
RueTheDay <nospam@nospam.com> wrote:
> Go here:
>
> http://neuro.debian.net/index.html#how-to-use-this-repository
> [...]

Cool, thanks! Really glad you're a few days ahead of me :-)