


Re: speed up pandas calculation

Started by: Skip Montanaro <skip.montanaro@gmail.com>
First post: 2014-07-30 18:57 -0500
Last post: 2014-07-30 18:57 -0500
Articles: 1 (1 participant)





#75391 — Re: speed up pandas calculation

From: Skip Montanaro <skip.montanaro@gmail.com>
Date: 2014-07-30 18:57 -0500
Subject: Re: speed up pandas calculation
Message-ID: <mailman.12448.1406764670.18130.python-list@python.org>


> df = pd.read_csv('nhamcsopd2010.csv', index_col='PATCODE', low_memory=False)
> col_init = list(df.columns.values)
> keep_col = ['PATCODE', 'PATWT', 'VDAY', 'VMONTH', 'VYEAR', 'MED1',
>             'MED2', 'MED3', 'MED4', 'MED5']
> for col in col_init:
>     if col not in keep_col:
>         del df[col]

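I'm no pandas expert, but a few things come to mind. First, where is your
code slow? Profile it, even if only with a few well-placed prints. If the
time goes into read_csv, there may be little you can do, unless you load the
same data repeatedly, in which case you could save a pickled DataFrame as a
cache. Second, you loop over the columns, deciding one by one whether to keep
or toss each one. Instead, select the columns you want in a single step.
Because read_csv was called with index_col='PATCODE', PATCODE is already the
index rather than a column, so leave it out of the list:

df = df[[col for col in keep_col if col != 'PATCODE']]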
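A rough sketch of the second point, timed with time.perf_counter in the spirit of the first. It runs on a hypothetical synthetic frame (the wanted columns plus 200 made-up extra columns named X0..X199), not the real NHAMCS file, so any timings it prints only illustrate the comparison. One caveat: since index_col='PATCODE' turns PATCODE into the index, selecting with the full keep_col list would raise KeyError; the sketch selects only labels that are actual columns.

```python
import time

import numpy as np
import pandas as pd

keep_col = ['PATCODE', 'PATWT', 'VDAY', 'VMONTH', 'VYEAR',
            'MED1', 'MED2', 'MED3', 'MED4', 'MED5']

# Hypothetical stand-in for the CSV: the wanted columns plus 200 extra ones.
n_rows = 1000
rng = np.random.default_rng(0)
extra = [f'X{i}' for i in range(200)]
data = {col: rng.integers(0, 100, n_rows) for col in keep_col + extra}
df = pd.DataFrame(data).set_index('PATCODE')

# PATCODE is now the index, so keep only the labels that are real columns.
wanted = [col for col in keep_col if col in df.columns]


def drop_one_by_one(frame):
    """The original approach: delete unwanted columns one at a time."""
    frame = frame.copy()  # work on a copy so df itself is not modified
    for col in list(frame.columns):
        if col not in keep_col:
            del frame[col]
    return frame


def select_at_once(frame):
    """The suggested approach: keep the wanted columns in one indexing step."""
    return frame[wanted]


for func in (drop_one_by_one, select_at_once):
    start = time.perf_counter()
    func(df)
    print(f'{func.__name__}: {time.perf_counter() - start:.4f} s')

# Both approaches leave the same columns, in the same order, with the same data.
assert drop_one_by_one(df).equals(select_at_once(df))
```

For real profiling of the whole script, the standard library's cProfile module gives a per-function breakdown rather than a single wall-clock number.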

Third, if deleting those other columns is costly, can you perhaps just
ignore them?
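One way to "just ignore them" is to never load them: read_csv accepts a usecols argument, so the unwanted columns are dropped during parsing. The sketch below combines that with the pickled-cache idea from the first point. The helper load_visits, the two-row CSV, the UNUSED1/UNUSED2 columns and the cache file name are all hypothetical stand-ins for the real nhamcsopd2010.csv workflow.

```python
import io
import os
import tempfile

import pandas as pd

keep_col = ['PATCODE', 'PATWT', 'VDAY', 'VMONTH', 'VYEAR',
            'MED1', 'MED2', 'MED3', 'MED4', 'MED5']

# Hypothetical two-row CSV; UNUSED1/UNUSED2 play the role of the columns
# the original loop deleted.
csv_text = (
    'PATCODE,PATWT,VDAY,VMONTH,VYEAR,MED1,MED2,MED3,MED4,MED5,UNUSED1,UNUSED2\n'
    '1,2.5,3,7,2010,100,200,300,400,500,x,y\n'
    '2,1.5,4,8,2010,101,201,301,401,501,z,w\n'
)


def load_visits(csv_source, cache_path):
    """Hypothetical helper: parse only keep_col, caching the result as a pickle."""
    if os.path.exists(cache_path):
        # Later runs skip CSV parsing entirely.
        return pd.read_pickle(cache_path)
    # usecols keeps only the listed columns; index_col must be one of them.
    df = pd.read_csv(csv_source, usecols=keep_col, index_col='PATCODE')
    df.to_pickle(cache_path)
    return df


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, 'nhamcsopd2010.pkl')
    first = load_visits(io.StringIO(csv_text), cache)   # parses the CSV
    second = load_visits(io.StringIO(csv_text), cache)  # reads the pickle
    print(first.columns.tolist())
```

Pickle files should only be loaded from sources you trust, and a cache written by one pandas version is not guaranteed to load in another, so it is worth rebuilding the cache after upgrading.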

Can't be more investigative right now. I don't have pandas on Android. :-)

Skip




