| From | nn <pruebauno@latinmail.com> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: better way to do this in python |
| Date | 2011-04-04 09:10 -0700 |
| Organization | http://groups.google.com |
| Message-ID | <1b4f9635-b9dd-4a2f-a306-72e98bb03ea2@i39g2000prd.googlegroups.com> |
| References | <BANLkTimnvP0mrvt6dOj_EJ3aC8+0sRfNXg@mail.gmail.com> <BANLkTimxzZXaLvj=L4B33zYtQOBhedbtCQ@mail.gmail.com> <BANLkTinfNzHN+e7B4B01MNbGQ_r9SfZEKQ@mail.gmail.com> <mailman.167.1301832402.2990.python-list@python.org> |
On Apr 3, 8:06 am, Mag Gam <magaw...@gmail.com> wrote:
> Thanks for the responses.
>
> Basically, I have a large file with this format,
>
> Date INFO username command srcipaddress filename
>
> I would like to do statistics on:
> total number of usernames and who they are
> username and commands
> username and filenames
> unique source ip addresses
> unique filenames
>
> Then I would like to bucket findings with days (date).
>
> Overall, I would like to build a log file analyzer.
>
> On Sat, Apr 2, 2011 at 10:59 PM, Dan Stromberg <drsali...@gmail.com> wrote:
>
> > On Sat, Apr 2, 2011 at 5:24 PM, Chris Angelico <ros...@gmail.com> wrote:
>
> >> On Sun, Apr 3, 2011 at 9:58 AM, Mag Gam <magaw...@gmail.com> wrote:
> >> > I suppose I can do something like this.
> >> > (pseudocode)
>
> >> > d = {}
> >> > try:
> >> >     d[key] += 1
> >> > except KeyError:
> >> >     d[key] = 1
>
> >> > I was wondering if there is a pythonic way of doing this? I plan on
> >> > doing this many times for various files. Would the python collections
> >> > class be sufficient?
>
> >> I think you want collections.Counter. From the docs: "Counter objects
> >> have a dictionary interface except that they return a zero count for
> >> missing items instead of raising a KeyError".
>
> >> ChrisA
>
> > I realize you (Mag) asked for a Python solution, but since you mention
> > awk... you can also do this with "sort < input | uniq -c" - one line of
> > "code". GNU sort doesn't use as nice an algorithm as a hashing-based
> > solution (like you'd probably use with Python), but for a sort, GNU sort's
> > quite good.
>
> > --
> >http://mail.python.org/mailman/listinfo/python-list
Take a look at:
http://code.activestate.com/recipes/577535-aggregates-using-groupby-defaultdict-and-counter/
for some ideas of how to group and count things.
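
As a rough illustration of the grouping and counting discussed in this thread, here is a minimal sketch using collections.Counter and collections.defaultdict. It assumes each log line has exactly six whitespace-separated fields in the order given above (Date INFO username command srcipaddress filename) and that no field contains spaces. The sample data, the analyze function, and the result keys are made up for this example; they are not taken from the recipe linked above.

```python
from collections import Counter, defaultdict

# Hypothetical sample lines in the layout described in the thread:
# Date INFO username command srcipaddress filename
SAMPLE_LOG = """\
2011-04-01 INFO alice GET 10.0.0.1 report.txt
2011-04-01 INFO bob PUT 10.0.0.2 data.csv
2011-04-02 INFO alice GET 10.0.0.1 data.csv
2011-04-02 INFO alice PUT 10.0.0.3 report.txt
"""


def analyze(lines):
    """Collect per-user counts, unique values, and per-day buckets."""
    user_counts = Counter()               # username -> number of log lines
    user_commands = defaultdict(Counter)  # username -> command -> count
    user_files = defaultdict(Counter)     # username -> filename -> count
    unique_ips = set()
    unique_files = set()
    per_day_users = defaultdict(Counter)  # date -> username -> count

    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines (an assumption)
        date, _level, user, command, ip, filename = fields

        user_counts[user] += 1            # Counter returns 0 for missing keys
        user_commands[user][command] += 1
        user_files[user][filename] += 1
        unique_ips.add(ip)
        unique_files.add(filename)
        per_day_users[date][user] += 1

    return {
        "user_counts": user_counts,
        "user_commands": user_commands,
        "user_files": user_files,
        "unique_ips": unique_ips,
        "unique_files": unique_files,
        "per_day_users": per_day_users,
    }


stats = analyze(SAMPLE_LOG.splitlines())
```

For a real file, passing the open file object instead of SAMPLE_LOG.splitlines() should work the same way, since the loop only iterates over lines. Counter removes the need for the try/except KeyError pattern quoted earlier, and defaultdict(Counter) gives one counter per user or per day without explicit initialization.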