Newbie problem with Python pandas

Started by	RueTheDay <nospam@nospam.com>
First post	2013-01-06 07:57 -0600
Last post	2013-01-06 12:32 -0500
Articles	6 — 3 participants

  Newbie problem with Python pandas RueTheDay <nospam@nospam.com> - 2013-01-06 07:57 -0600
    Re: Newbie problem with Python pandas Miki Tebeka <miki.tebeka@gmail.com> - 2013-01-06 08:05 -0800
      Re: Newbie problem with Python pandas RueTheDay <nospam@nospam.com> - 2013-01-06 10:36 -0600
        Re: Newbie problem with Python pandas Roy Smith <roy@panix.com> - 2013-01-06 11:45 -0500
          Re: Newbie problem with Python pandas RueTheDay <nospam@nospam.com> - 2013-01-06 11:15 -0600
            Re: Newbie problem with Python pandas Roy Smith <roy@panix.com> - 2013-01-06 12:32 -0500

#36249 — Newbie problem with Python pandas

From	RueTheDay <nospam@nospam.com>
Date	2013-01-06 07:57 -0600
Subject	Newbie problem with Python pandas
Message-ID	<8qSdnfYwI8QgH3TNnZ2dnUVZ_uadnZ2d@giganews.com>

I'm working my way through the examples in the O'Reilly book Python For 
Data Analysis and have encountered a snag.

The following code is supposed to analyze some web server log data and 
produce aggregate counts by client operating system.

###################
import json  # used to process JSON records

from pandas import DataFrame, Series
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

path = '/home/rich/code/sample.txt'

# read in records one line at a time
records = [json.loads(line) for line in open(path)]
frame = DataFrame(records)

cframe = frame[frame.a.notnull()]
operating_system = np.where(cframe['a'].str.contains('Windows'),
                            'Windows', 'Not Windows')
by_tz_os = cframe.groupby(['tz', operating_system])
agg_counts = by_tz_os.size().unstack().fillna(0)
indexer = agg_counts.sum(1).argsort()
count_subset = agg_counts.take(indexer)[-10:]

print count_subset
####################

I am getting the following error when running on Python 2.7 on Ubuntu 
12.04:

>>>>>>
Traceback (most recent call last):
  File "./lp1.py", line 12, in <module>
    operating_system = np.where(cframe['a'].str.contains
('Windows'),'Windows', 'Not Windows')
AttributeError: 'Series' object has no attribute 'str'
>>>>>>>

Note that I was able to get the code to work fine on Windows 7, so this 
appears to be specific to Linux.

A little Googling showed that others have encountered this problem and 
suggested replacing the np.where with a find, like so:

########
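# find() returns -1 when the substring is absent and 0 for a match at the
# very start of the string, so the test has to be >= 0 rather than > 0
operating_system = ['Windows' if a.find('Windows') >= 0 else 'Not Windows'
                    for a in cframe['a']]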
########

This appears to solve the first problem, but then it fails on the next 
line with:

>>>>>>>>
Traceback (most recent call last):
  File "./lp1.py", line 14, in <module>
    by_tz_os = cframe.groupby(['tz', operating_system])
  File "/usr/lib/pymodules/python2.7/pandas/core/generic.py", line 133, 
in groupby
    sort=sort)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 522, 
in groupby
    return klass(obj, by, **kwds)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 115, 
in __init__
    level=level, sort=sort)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 705, 
in _get_groupings
    ping = Grouping(group_axis, gpr, name=name, level=level, sort=sort)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 600, 
in __init__
    self.grouper = self.index.map(self.grouper)
  File "/usr/lib/pymodules/python2.7/pandas/core/index.py", line 591, in 
map
    return self._arrmap(self.values, mapper)
  File "generated.pyx", line 1141, in pandas._tseries.arrmap_int64 
(pandas/src/tseries.c:40593)
TypeError: 'list' object is not callable
>>>>>>>>>

The problem looks to be with the pandas module and appears to be Linux-
specific.

Any ideas?  I'm pulling my hair out over this.
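
Whatever the cause turns out to be, the labelling step can also be written without the `.str` accessor while still producing a NumPy array rather than a list. The following is only a sketch on made-up sample data (the user-agent strings and time zones are invented for illustration), and it assumes a pandas version that accepts an array as a grouping key; it has not been checked against the old pandas in the traceback above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample records standing in for the web server log
frame = pd.DataFrame({
    'a': ['Mozilla/5.0 (Windows NT 6.1)', 'Mozilla/5.0 (Macintosh)',
          None, 'Opera/9.80 (Windows NT 5.1)'],
    'tz': ['America/New_York', 'America/New_York',
           'Europe/London', 'Europe/London'],
})

# Drop records without a user-agent string, as in the original script
cframe = frame[frame['a'].notnull()]

# Plain substring test mapped over the column instead of .str.contains
is_windows = cframe['a'].map(lambda agent: 'Windows' in agent)

# np.where yields an array with one label per row of cframe
operating_system = np.where(is_windows, 'Windows', 'Not Windows')

# Count rows per (time zone, operating system) pair
agg_counts = cframe.groupby(['tz', operating_system]).size().unstack().fillna(0)
print(agg_counts)
```

The point of going through np.where is that the grouping key stays an array with one entry per row, which is what the book's version passes to groupby.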

#36264

From	Miki Tebeka <miki.tebeka@gmail.com>
Date	2013-01-06 08:05 -0800
Message-ID	<733a78da-9393-4712-8255-a9942a5052b3@googlegroups.com>
In reply to	#36249

On Sunday, January 6, 2013 5:57:17 AM UTC-8, RueTheDay wrote:
> I am getting the following error when running on Python 2.7 on Ubuntu 
> 12.04:
> >>>>>>
>
> AttributeError: 'Series' object has no attribute 'str'
I would *guess* that you have an older version of pandas on your Linux machine.
Try "print(pd.__version__)" to see which version you have.

Also, try asking over at https://groups.google.com/forum/?fromgroups=#!forum/pydata which is more dedicated to pandas.
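
The version check suggested above can be sketched as follows. One pitfall is that version strings do not compare correctly as plain strings ("0.10" sorts before "0.7"), so this sketch converts them to tuples of integers first. The helper names `version_tuple` and `is_older` are made up for illustration, and the '0.10' threshold is just an example value, not a documented minimum.

```python
def version_tuple(version):
    """Turn '0.10.0' into (0, 10, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split('.'):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_older(installed, required):
    """Return True if the installed version is numerically below the required one."""
    return version_tuple(installed) < version_tuple(required)


try:
    import pandas as pd
    print('pandas %s, older than 0.10: %s'
          % (pd.__version__, is_older(pd.__version__, '0.10')))
except ImportError:
    print('pandas is not installed')
```

Running the same script on both machines shows at a glance whether they have the same pandas installed.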

#36266

From	RueTheDay <nospam@nospam.com>
Date	2013-01-06 10:36 -0600
Message-ID	<_dudnTTyxduONXTNnZ2dnUVZ_oCdnZ2d@giganews.com>
In reply to	#36264

Thank you!  That was it.  I had 0.7 installed (the latest in the Ubuntu 
repository).  I downloaded and manually installed 0.10 and now it's 
working.  Coincidentally, this also fixed a problem I was having with 
running a matplotlib plot function against a pandas DataFrame (it worked 
with some chart types but not others).

I'm starting to understand why people rely on easy_install and pip.  
Thanks again.

#36268

From	Roy Smith <roy@panix.com>
Date	2013-01-06 11:45 -0500
Message-ID	<roy-72AA86.11453406012013@news.panix.com>
In reply to	#36266

Yeah, Ubuntu is a bit of a mess when it comes to pandas and the things 
it depends on.  Apt gets you numpy 1.4.1, which is really old.  Pandas 
won't even install on top of it.

I've got pandas (and numpy, and scipy, and matplotlib) running on an 
Ubuntu 12.04 box.  I installed everything with pip.  My problem at this 
point, however, is that I want to replicate that setup in EMR (Amazon's 
Elastic MapReduce).  In theory, I could just run "pip install numpy" in 
my mrjob.conf bootstrap, but it's a really long install process, 
building a lot of stuff from source.  Not the kind of thing you want to 
put in a bootstrap for an ephemeral instance.

Does anybody know where I can find a Debian package for numpy 1.6?

#36270

From	RueTheDay <nospam@nospam.com>
Date	2013-01-06 11:15 -0600
Message-ID	<_dudnTfyxdvcLHTNnZ2dnUVZ_oCdnZ2d@giganews.com>
In reply to	#36268

Go here:

http://neuro.debian.net/index.html#how-to-use-this-repository

and add one of their repositories to your sources.

Then you can use apt-get to install ALL the latest packages on your 
Ubuntu box - numpy, scipy, pandas, matplotlib, statsmodels, etc.

I wish I had found this a few days ago.

#36273

From	Roy Smith <roy@panix.com>
Date	2013-01-06 12:32 -0500
Message-ID	<roy-413E29.12322906012013@news.panix.com>
In reply to	#36270

Cool, thanks!  Really glad you're a few days ahead of me :-)
