| References | <CALyJZZXMmSk8L7+P7erQSp2W4EpkbxiXmJkY-+5svfK5C3Z0kw@mail.gmail.com> <CANc-5UzCLT8ZPF2PCDMQyOqr0gqrDjFGnsRAoFBfGVmCh5-0jg@mail.gmail.com> |
|---|---|
| From | Vincent Davis <vincent@vincentdavis.net> |
| Date | 2014-07-30 18:28 -0600 |
| Subject | Re: speed up pandas calculation |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.12458.1406785998.18130.python-list@python.org> (permalink) |
On Wed, Jul 30, 2014 at 5:57 PM, Skip Montanaro <skip.montanaro@gmail.com>
wrote:
> > df = pd.read_csv('nhamcsopd2010.csv', index_col='PATCODE', low_memory=False)
> > col_init = list(df.columns.values)
> > keep_col = ['PATCODE', 'PATWT', 'VDAY', 'VMONTH', 'VYEAR', 'MED1',
> >             'MED2', 'MED3', 'MED4', 'MED5']
> > for col in col_init:
> >     if col not in keep_col:
> >         del df[col]
>
> I'm no pandas expert, but a couple things come to mind. First, where is
> your code slow (profile it, even with a few well-placed prints)? If it's in
> read_csv there might be little you can do unless you load those data
> repeatedly, and can save a pickled data frame as a caching measure. Second,
> you loop over columns deciding one by one whether to keep or toss a column.
> Instead try
>
> df = df[keep_col]
>
> Third, if deleting those other columns is costly, can you perhaps just
> ignore them?
>
> Can't be more investigative right now. I don't have pandas on Android. :-)
>
So df = df[keep_col] is not fast, but it is not that slow either. You made
me think of a solution to that part: just slice and copy. The only gotcha is
that the columns named in keep_col have to actually exist.
keep_col = ['PATCODE', 'PATWT', 'VDAYR', 'VMONTH', 'MED1', 'MED2', 'MED3',
'MED4', 'MED5']
df = df[keep_col]
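
The "columns have to actually exist" gotcha can be guarded against before slicing. The following is only a minimal sketch on a made-up toy frame (the data and the EXTRA column are assumptions, not the real survey file): it keeps the requested names that are present and records the ones that are not.

```python
import pandas as pd

# Hypothetical stand-in for the survey data; the real file has many more columns.
df = pd.DataFrame({
    'PATWT': [10.0, 20.0],
    'VMONTH': [1, 2],
    'MED1': ['a', 'b'],
    'EXTRA': [0, 1],
})

# 'VDAYR' is requested but deliberately absent from this toy frame.
keep_col = ['PATWT', 'VDAYR', 'VMONTH', 'MED1']

# Split the requested names into those that exist and those that do not,
# preserving the order given in keep_col.
present = [c for c in keep_col if c in df.columns]
missing = [c for c in keep_col if c not in df.columns]

# Slice and copy, so later column assignments do not touch the original frame.
df = df[present].copy()
```

If the unwanted columns are never needed, passing the list to read_csv through its usecols argument may avoid loading them in the first place.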
The real slow part seems to be:
for n in drugs:
    df[n] = df[['MED1','MED2','MED3','MED4','MED5']].isin([drugs[n]]).any(axis=1)
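
One thing that might help with the loop above is to build the five-column slice once and compare it as a plain NumPy array, instead of re-slicing the frame and calling isin on every pass. This is a sketch under assumptions: drugs is taken to be a dict mapping a new column name to a single drug code, and the codes and data below are invented for illustration. Note that == does not treat NaN as a match the way isin can, so it is only equivalent for ordinary values.

```python
import pandas as pd

med_cols = ['MED1', 'MED2', 'MED3', 'MED4', 'MED5']

# Toy data with made-up integer drug codes (0 means "no drug" here).
df = pd.DataFrame({
    'MED1': [101, 202, 303],
    'MED2': [202, 0, 0],
    'MED3': [0, 0, 0],
    'MED4': [0, 0, 101],
    'MED5': [0, 0, 0],
})

# Hypothetical mapping: new column name -> drug code to look for.
drugs = {'drug_a': 101, 'drug_b': 202}

# Extract the medication columns once, outside the loop.
meds = df[med_cols].to_numpy()

for name, code in drugs.items():
    # True for each row where any of the five columns equals the code.
    df[name] = (meds == code).any(axis=1)
```

Whether this is actually faster on the real file would need to be timed; the point is only that the repeated slicing is moved out of the loop.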
Vincent Davis
720-301-3003