Re: better way to do this in python

Started by	Mag Gam <magawake@gmail.com>
First post	2011-04-03 08:06 -0400
Last post	2011-04-04 09:10 -0700
Articles	2 — 2 participants

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Re: better way to do this in python Mag Gam <magawake@gmail.com> - 2011-04-03 08:06 -0400
    Re: better way to do this in python nn <pruebauno@latinmail.com> - 2011-04-04 09:10 -0700

#2525 — Re: better way to do this in python

From	Mag Gam <magawake@gmail.com>
Date	2011-04-03 08:06 -0400
Subject	Re: better way to do this in python
Message-ID	<mailman.167.1301832402.2990.python-list@python.org>

Thanks for the responses.


Basically, I have a large file with this format,

Date INFO username command srcipaddress filename


I would like to do statistics on:
total number of usernames and who they are
username and commands
username and filenames
unique source ip addresses
unique filenames

Then I would like to bucket the findings by day (date).

Overall, I would like to build a log file analyzer.
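
A minimal sketch of how these statistics might be collected with collections.Counter and defaultdict. It assumes every line has exactly six whitespace-separated fields in the order shown above and that the date is a single token such as 2011-04-02. The function name `analyze`, the dictionary keys, and the sample lines are all made up for illustration.

```python
from collections import Counter, defaultdict


def analyze(lines):
    """Collect per-day statistics from log lines.

    Assumes six whitespace-separated fields per line:
    date, level, username, command, source IP, filename.
    """
    # One bucket of counters and sets per date, created on first use.
    stats = defaultdict(lambda: {
        "users": Counter(),          # lines per username
        "user_commands": Counter(),  # lines per (username, command)
        "user_files": Counter(),     # lines per (username, filename)
        "ips": set(),                # unique source IP addresses
        "files": set(),              # unique filenames
    })
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip lines that do not match the assumed format
        date, _level, user, command, ip, filename = fields
        day = stats[date]
        day["users"][user] += 1
        day["user_commands"][(user, command)] += 1
        day["user_files"][(user, filename)] += 1
        day["ips"].add(ip)
        day["files"].add(filename)
    return stats


# Hypothetical sample data in the assumed format.
sample = [
    "2011-04-02 INFO alice get 10.0.0.1 report.txt",
    "2011-04-02 INFO bob put 10.0.0.2 data.csv",
    "2011-04-02 INFO alice get 10.0.0.1 data.csv",
    "2011-04-03 INFO alice put 10.0.0.3 report.txt",
]
stats = analyze(sample)
for date in sorted(stats):
    day = stats[date]
    print(date, len(day["users"]), "users:", sorted(day["users"]))
```

For a real file, something like `analyze(open(path))` should work the same way, since a file object yields one line at a time and the whole file never has to sit in memory.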



On Sat, Apr 2, 2011 at 10:59 PM, Dan Stromberg <drsalists@gmail.com> wrote:
>
> On Sat, Apr 2, 2011 at 5:24 PM, Chris Angelico <rosuav@gmail.com> wrote:
>>
>> On Sun, Apr 3, 2011 at 9:58 AM, Mag Gam <magawake@gmail.com> wrote:
>> > I suppose I can do something like this.
>> > (pseudocode)
>> >
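>> > d = {}
>> > key = "some_user"  # placeholder key, for illustration only
>> > try:
>> >     d[key] += 1  # key seen before: increment its count
>> > except KeyError:
>> >     d[key] = 1   # first occurrence of this key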
>> >
>> >
>> > I was wondering if there is a pythonic way of doing this? I plan on
>> > doing this many times for various files. Would the python collections
>> > class be sufficient?
>>
>> I think you want collections.Counter. From the docs: "Counter objects
>> have a dictionary interface except that they return a zero count for
>> missing items instead of raising a KeyError".
>>
>> ChrisA
>
> I realize you (Mag) asked for a Python solution, but since you mention
> awk... you can also do this with "sort < input | uniq -c" - one line of
> "code".  GNU sort doesn't use as nice an algorithm as a hashing-based
> solution (like you'd probably use with Python), but for a sort, GNU sort's
> quite good.

#2572

From	nn <pruebauno@latinmail.com>
Date	2011-04-04 09:10 -0700
Message-ID	<1b4f9635-b9dd-4a2f-a306-72e98bb03ea2@i39g2000prd.googlegroups.com>
In reply to	#2525

On Apr 3, 8:06 am, Mag Gam <magaw...@gmail.com> wrote:
> Thanks for the responses.
>
> Basically, I have a large file with this format,
>
> Date INFO username command srcipaddress filename
>
> I would like to do statistics on:
> total number of usernames and who they are
> username and commands
> username and filenames
> unique source ip addresses
> unique filenames
>
> Then I would like to bucket the findings by day (date).
>
> Overall, I would like to build a log file analyzer.
>
> On Sat, Apr 2, 2011 at 10:59 PM, Dan Stromberg <drsali...@gmail.com> wrote:
>
> > On Sat, Apr 2, 2011 at 5:24 PM, Chris Angelico <ros...@gmail.com> wrote:
>
> >> On Sun, Apr 3, 2011 at 9:58 AM, Mag Gam <magaw...@gmail.com> wrote:
> >> > I suppose I can do something like this.
> >> > (pseudocode)
>
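> >> > d = {}
> >> > key = "some_user"  # placeholder key, for illustration only
> >> > try:
> >> >     d[key] += 1  # key seen before: increment its count
> >> > except KeyError:
> >> >     d[key] = 1   # first occurrence of this key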
>
> >> > I was wondering if there is a pythonic way of doing this? I plan on
> >> > doing this many times for various files. Would the python collections
> >> > class be sufficient?
>
> >> I think you want collections.Counter. From the docs: "Counter objects
> >> have a dictionary interface except that they return a zero count for
> >> missing items instead of raising a KeyError".
>
> >> ChrisA
>
> > I realize you (Mag) asked for a Python solution, but since you mention
> > awk... you can also do this with "sort < input | uniq -c" - one line of
> > "code".  GNU sort doesn't use as nice an algorithm as a hashing-based
> > solution (like you'd probably use with Python), but for a sort, GNU sort's
> > quite good.

Take a look at:
http://code.activestate.com/recipes/577535-aggregates-using-groupby-defaultdict-and-counter/

for some ideas of how to group and count things.
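
This is not the recipe's code, just a small sketch of the three tools its title names (groupby, defaultdict and Counter) applied to made-up (username, command) records. The variable names and the data are assumptions for illustration.

```python
from collections import Counter, defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical (username, command) records.
records = [
    ("alice", "get"),
    ("bob", "put"),
    ("alice", "get"),
    ("alice", "put"),
]

# Counter: how many records each username has.
per_user = Counter(user for user, _ in records)

# defaultdict: the set of distinct commands each username ran.
commands = defaultdict(set)
for user, command in records:
    commands[user].add(command)

# groupby: only groups adjacent items, so the input is sorted first.
grouped = {
    user: [command for _, command in group]
    for user, group in groupby(sorted(records), key=itemgetter(0))
}

print(per_user.most_common())
print(dict(commands))
print(grouped)
```

Counter and defaultdict make a single pass over unsorted input, while groupby needs the records sorted by the grouping key, so for a large log file the first two are probably the cheaper choice.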
