Groups > comp.lang.python > #76581 > unrolled thread

Functions on list items

Started by: Kurt <kplunt@gmail.com>
First post: 2014-08-19 10:34 -0700
Last post: 2014-08-19 12:58 -0500
Articles: 4 (3 participants)

Contents

  Functions on list items Kurt <kplunt@gmail.com> - 2014-08-19 10:34 -0700
    Re: Functions on list items Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-08-19 18:43 +0100
      Re: Functions on list items Kurt <kplunt@gmail.com> - 2014-08-19 10:56 -0700
    Re: Functions on list items Tim Chase <python.list@tim.thechases.com> - 2014-08-19 12:58 -0500

#76581 — Functions on list items

From: Kurt <kplunt@gmail.com>
Date: 2014-08-19 10:34 -0700
Subject: Functions on list items
Message-ID: <33c7dfd2-015a-4b3f-a3c8-7bf4d6b6cb73@googlegroups.com>

I am trying to process the following calendar and data attributes in a file:
Da Mo Yr AttrA AttrB AttrC...
I need to average AttrA for each of 365 Da days across Yr years. Then do the same for 27K files. Repeat for AttrB, AttrC etc.
Can I do the averaging with lists or do I need some Python Db module (if it exists)? 

Kurt



#76584

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2014-08-19 18:43 +0100
Message-ID: <mailman.13156.1408470222.18130.python-list@python.org>
In reply to: #76581

On 19/08/2014 18:34, Kurt wrote:
> I am trying to process the following calendar and data attributes in a file:
> Da Mo Yr AttrA AttrB AttrC...
> I need to average AttrA for each of 365 Da days across Yr years. Then do the same for 27K files. Repeat for AttrB, AttrC etc.
> Can I do the averaging with lists or do I need some Python Db module (if it exists)?
>
> Kurt
>

Maybe http://pandas.pydata.org/ ???

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



#76589

From: Kurt <kplunt@gmail.com>
Date: 2014-08-19 10:56 -0700
Message-ID: <a4991f19-4483-4c76-b80b-fdc88c9309dd@googlegroups.com>
In reply to: #76584

> 
> Maybe http://pandas.pydata.org/ ???
> 
> 

Thanks. This reply is like that butterfly wing flap causing hurricanes a world away; big steerage away from proprietary stuff used in my org. Detox will take some time though.

K



#76590

From: Tim Chase <python.list@tim.thechases.com>
Date: 2014-08-19 12:58 -0500
Message-ID: <mailman.13158.1408471182.18130.python-list@python.org>
In reply to: #76581

On 2014-08-19 10:34, Kurt wrote:
> I am trying to process the following calendar and data attributes
> in a file: Da Mo Yr AttrA AttrB AttrC...
> I need to average AttrA for each of 365 Da days across Yr years.
> Then do the same for 27K files. Repeat for AttrB, AttrC etc. Can I
> do the averaging with lists or do I need some Python Db module (if
> it exists)? 

Unless these files are huge *AND* you plan to perform this operation
repeatedly (rather than once), just do it in plain Python.  (Python
does offer both the "anydbm" module, renamed "dbm" in Python 3, and
the "sqlite3" module for database'ish stuff, though.)
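
For comparison, here is a rough sketch of what the "sqlite3" route could look like if the database option ever became worthwhile. It assumes Python 3, and the table name `obs`, the column names, and the sample rows are all made up for illustration; real code would load the rows from the data files instead.

```python
import sqlite3

# Hypothetical sample rows: (Da, Mo, Yr, AttrA)
rows = [
    (1, 1, 2012, 10.0),
    (1, 1, 2013, 20.0),
    (2, 1, 2012, 5.0),
    (2, 1, 2013, 7.0),
]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE obs (da INTEGER, mo INTEGER, yr INTEGER, attr_a REAL)"
)
conn.executemany("INSERT INTO obs VALUES (?, ?, ?, ?)", rows)

# Average AttrA for each calendar day across all years
daily_avgs = conn.execute(
    "SELECT mo, da, AVG(attr_a) FROM obs GROUP BY mo, da ORDER BY mo, da"
).fetchall()
conn.close()

for mo, da, avg in daily_avgs:
    print("%02d-%02d: %f" % (mo, da, avg))
```

The grouping is done by the `GROUP BY mo, da` clause, so the year column is simply ignored when averaging. For a one-off job the plain-Python approach is likely less work.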

You omit some key details, so the example below would need to be
tweaked depending on things like:

 - does the file have a header row?
 - how are the columns separated? (tabs, commas, spaces?)
 - can a single date appear multiple times in a single file? (and if
   so, what would you intend to do with it?)
 - do you need aggregate averages across all the files, or just
   per-file?
 - are those values you're averaging integers or floats?

It would look something like this (untested):

  import csv
  from collections import defaultdict

  def report(fname, names_avgs):
    "Do something with the averages"
    print(fname)
    for name, avg in names_avgs:
      print(" %s: %f" % (name, avg))

  def no_data(fname):
    "Do something if a file has no data"
    pass

  for fname in fname_iter():
    sums = defaultdict(float)  # per-column running totals
    row_count = 0
    with open(fname) as f:
      r = csv.DictReader(f)
      for row in r:
        row_count += 1
        for data in ("AttrA", "AttrB", "AttrC"):
          sums[data] += float(row[data])
    if row_count:
      report(fname, [
        (fieldname, total/row_count)
        for fieldname, total
        in sums.items()
        ])
    else:
      no_data(fname)

I leave the details of fname_iter() and tweaking it based on the
omitted details as an exercise to the reader. :-)
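
One possible way to fill in those details is sketched below. It is only a guess at the data layout: it assumes Python 3, a header row, space-separated columns named exactly "Da", "Mo", "Yr", "AttrA", "AttrB" and "AttrC", and a made-up "data/*.txt" glob pattern. Unlike the whole-file averages above, it groups rows by (Mo, Da) so each calendar day is averaged across all years, which seems to be what the original question asks for. The `daily_averages` helper and the inline sample are hypothetical.

```python
import csv
import glob
import io
from collections import defaultdict

def fname_iter(pattern="data/*.txt"):
    "Yield matching file names; the pattern is a placeholder assumption"
    for fname in sorted(glob.glob(pattern)):
        yield fname

def daily_averages(f, fields=("AttrA", "AttrB", "AttrC"), delimiter=" "):
    "Average each field per (Mo, Da) across every year in one open file"
    sums = defaultdict(float)   # (Mo, Da, field) -> running total
    counts = defaultdict(int)   # (Mo, Da) -> number of rows seen
    reader = csv.DictReader(f, delimiter=delimiter, skipinitialspace=True)
    for row in reader:
        day = (int(row["Mo"]), int(row["Da"]))
        counts[day] += 1
        for field in fields:
            sums[day + (field,)] += float(row[field])
    return {
        day: {field: sums[day + (field,)] / n for field in fields}
        for day, n in counts.items()
    }

# Small in-memory stand-in for one data file (made-up values)
sample = io.StringIO(
    "Da Mo Yr AttrA AttrB AttrC\n"
    "1 1 2012 10 1.0 3\n"
    "1 1 2013 20 2.0 5\n"
    "2 1 2012 4 0.5 8\n"
)
averages = daily_averages(sample)
for (mo, da), avgs in sorted(averages.items()):
    print("%02d-%02d %s" % (mo, da, avgs))
```

For real files, each name from fname_iter() would be opened with open(fname, newline="") and the file object passed to daily_averages(). Days that occur in fewer years (such as 29 February) are divided by their own row count, not by the total number of years.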

-tkc


