| Started by | Mag Gam <magawake@gmail.com> |
|---|---|
| First post | 2011-04-03 08:06 -0400 |
| Last post | 2011-04-04 09:10 -0700 |
| Articles | 2 — 2 participants |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: better way to do this in python Mag Gam <magawake@gmail.com> - 2011-04-03 08:06 -0400
Re: better way to do this in python nn <pruebauno@latinmail.com> - 2011-04-04 09:10 -0700
| From | Mag Gam <magawake@gmail.com> |
|---|---|
| Date | 2011-04-03 08:06 -0400 |
| Subject | Re: better way to do this in python |
| Message-ID | <mailman.167.1301832402.2990.python-list@python.org> |
Thanks for the responses.
Basically, I have a large file in this format:
Date INFO username command srcipaddress filename
I would like to do statistics on:
total number of usernames and who they are
username and commands
username and filenames
unique source ip addresses
unique filenames
Then I would like to bucket the findings by day (date).
Overall, I would like to build a log file analyzer.
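The statistics listed above could be collected in a single pass, roughly as in the sketch below. It assumes every line has exactly six whitespace-separated fields in the order given (so the date is a single token with no spaces). The function name `analyze` and the sample lines are made up for illustration.

```python
from collections import Counter, defaultdict


def analyze(lines):
    """Collect the statistics from an iterable of log lines.

    Assumed line layout (six whitespace-separated fields):
    date INFO username command srcipaddress filename
    """
    users = Counter()                        # username -> number of lines
    commands_by_user = defaultdict(Counter)  # username -> command -> count
    files_by_user = defaultdict(Counter)     # username -> filename -> count
    source_ips = set()                       # unique source IP addresses
    filenames = set()                        # unique filenames

    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip lines that do not match the assumed layout
        _date, _level, user, command, ip, filename = fields
        users[user] += 1
        commands_by_user[user][command] += 1
        files_by_user[user][filename] += 1
        source_ips.add(ip)
        filenames.add(filename)

    return users, commands_by_user, files_by_user, source_ips, filenames


# Hypothetical sample data; a real run would pass an open file object instead.
sample = [
    "2011-04-02 INFO alice get 10.0.0.1 a.txt",
    "2011-04-02 INFO bob put 10.0.0.2 b.txt",
    "2011-04-03 INFO alice get 10.0.0.1 b.txt",
]
users, commands_by_user, files_by_user, source_ips, filenames = analyze(sample)
```

Here `len(users)` gives the total number of usernames and `sorted(users)` lists who they are. Because a file object yields its lines one at a time, passing one to `analyze` avoids reading the whole file into memory.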
On Sat, Apr 2, 2011 at 10:59 PM, Dan Stromberg <drsalists@gmail.com> wrote:
>
> On Sat, Apr 2, 2011 at 5:24 PM, Chris Angelico <rosuav@gmail.com> wrote:
>>
>> On Sun, Apr 3, 2011 at 9:58 AM, Mag Gam <magawake@gmail.com> wrote:
>> > I suppose I can do something like this.
>> > (pseudocode)
>> >
>> > d={}
>> > try:
>> >     d[key]+=1
>> > except KeyError:
>> >     d[key]=1
>> >
>> >
>> > I was wondering if there is a pythonic way of doing this? I plan on
>> > doing this many times for various files. Would the python collections
>> > class be sufficient?
>>
>> I think you want collections.Counter. From the docs: "Counter objects
>> have a dictionary interface except that they return a zero count for
>> missing items instead of raising a KeyError".
>>
>> ChrisA
>
> I realize you (Mag) asked for a Python solution, but since you mention
> awk... you can also do this with "sort < input | uniq -c" - one line of
> "code". GNU sort doesn't use as nice an algorithm as a hashing-based
> solution (like you'd probably use with Python), but for a sort, GNU sort's
> quite good.
| From | nn <pruebauno@latinmail.com> |
|---|---|
| Date | 2011-04-04 09:10 -0700 |
| Message-ID | <1b4f9635-b9dd-4a2f-a306-72e98bb03ea2@i39g2000prd.googlegroups.com> |
| In reply to | #2525 |
On Apr 3, 8:06 am, Mag Gam <magaw...@gmail.com> wrote:
> Thanks for the responses.
>
> Basically, I have a large file with this format,
>
> Date INFO username command srcipaddress filename
>
> I would like to do statistics on:
> total number of usernames and who they are
> username and commands
> username and filenames
> unique source ip addresses
> unique filenames
>
> Then I would like to bucket findings with days (date).
>
> Overall, I would like to build a log file analyzer.
Take a look at:
http://code.activestate.com/recipes/577535-aggregates-using-groupby-defaultdict-and-counter/
for some ideas of how to group and count things.
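As a rough illustration of the groupby-plus-Counter idea (not the recipe's own code), bucketing by day might look like the sketch below. It assumes the six-field layout from the quoted post with the date as a single token; `users_per_day` and the sample lines are hypothetical.

```python
from collections import Counter
from itertools import groupby


def users_per_day(lines):
    """Return a dict mapping each date to a Counter of usernames seen that day."""
    records = [line.split() for line in lines]
    # Keep only lines matching the assumed six-field layout.
    records = [rec for rec in records if len(rec) == 6]
    # groupby only merges adjacent items, so sort by the grouping key first.
    records.sort(key=lambda rec: rec[0])
    buckets = {}
    for day, group in groupby(records, key=lambda rec: rec[0]):
        buckets[day] = Counter(rec[2] for rec in group)
    return buckets


# Hypothetical sample data, deliberately out of date order.
sample = [
    "2011-04-03 INFO alice get 10.0.0.1 b.txt",
    "2011-04-02 INFO alice get 10.0.0.1 a.txt",
    "2011-04-02 INFO bob put 10.0.0.2 b.txt",
    "malformed line",
]
buckets = users_per_day(sample)
```

Sorting loads every record into memory. For a very large file, a `defaultdict(Counter)` keyed by date gives the same buckets in one streaming pass.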