Re: better way to do this in python

Date Sun, 3 Apr 2011 08:06:32 -0400
Subject Re: better way to do this in python
From Mag Gam <magawake@gmail.com>
To Dan Stromberg <drsalists@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.167.1301832402.2990.python-list@python.org>


Thanks for the responses.


Basically, I have a large file in this format:

Date INFO username command srcipaddress filename


I would like to do statistics on:
total number of usernames and who they are
username and commands
username and filenames
unique source ip addresses
unique filenames

Then I would like to bucket the findings by day (date).

Overall, I would like to build a log file analyzer.
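
A minimal sketch of such an analyzer, under some assumptions that are not stated above: the six fields are separated by whitespace, the date is a single token, and no field contains spaces. The function name `analyze` and the sample lines are made up for illustration.

```python
from collections import Counter, defaultdict


def analyze(lines):
    """Tally per-day statistics from an iterable of log lines.

    Assumed layout (hypothetical): six whitespace-separated fields,
    date, level, username, command, source IP, filename.
    """
    stats = defaultdict(lambda: {
        "users": Counter(),          # username -> number of log lines
        "user_commands": Counter(),  # (username, command) -> count
        "user_files": Counter(),     # (username, filename) -> count
        "source_ips": set(),         # unique source IP addresses
        "filenames": set(),          # unique filenames
    })
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip lines that do not match the assumed layout
        date, _level, user, command, src_ip, filename = fields
        day = stats[date]  # one bucket per date
        day["users"][user] += 1
        day["user_commands"][(user, command)] += 1
        day["user_files"][(user, filename)] += 1
        day["source_ips"].add(src_ip)
        day["filenames"].add(filename)
    return stats


# Made-up sample data in the assumed layout.
sample = [
    "2011-04-02 INFO alice get 10.0.0.1 report.txt",
    "2011-04-02 INFO alice put 10.0.0.1 notes.txt",
    "2011-04-02 INFO bob get 10.0.0.2 report.txt",
    "2011-04-03 INFO alice get 10.0.0.3 report.txt",
]
report = analyze(sample)
for date in sorted(report):
    day = report[date]
    print(date, len(day["users"]), "users:", sorted(day["users"]))
```

For a real file, passing the open file object (for example `analyze(open(path))`) processes it line by line without loading the whole file into memory.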



On Sat, Apr 2, 2011 at 10:59 PM, Dan Stromberg <drsalists@gmail.com> wrote:
>
> On Sat, Apr 2, 2011 at 5:24 PM, Chris Angelico <rosuav@gmail.com> wrote:
>>
>> On Sun, Apr 3, 2011 at 9:58 AM, Mag Gam <magawake@gmail.com> wrote:
>> > I suppose I can do something like this.
>> > (pseudocode)
>> >
>> > d = {}
>> > for key in keys:  # keys: any iterable of items to count
>> >     try:
>> >         d[key] += 1
>> >     except KeyError:
>> >         d[key] = 1
>> >
>> >
>> > I was wondering if there is a pythonic way of doing this? I plan on
>> > doing this many times for various files. Would the python collections
>> > class be sufficient?
>>
>> I think you want collections.Counter. From the docs: "Counter objects
>> have a dictionary interface except that they return a zero count for
>> missing items instead of raising a KeyError".
>>
>> ChrisA
>
> I realize you (Mag) asked for a Python solution, but since you mention
> awk... you can also do this with "sort < input | uniq -c" - one line of
> "code".  GNU sort doesn't use as nice an algorithm as a hashing-based
> solution (like you'd probably use with Python), but for a sort, GNU sort's
> quite good.
>
>
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
>
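
For comparison, the counting approaches mentioned in the quoted messages can be sketched side by side. This is only an illustration: the function names and the sample list are made up, and the final loop only roughly mimics the shape of "sort | uniq -c" output.

```python
from collections import Counter


def count_with_try(items):
    """Plain dict with try/except, as in the quoted pseudocode."""
    counts = {}
    for key in items:
        try:
            counts[key] += 1
        except KeyError:
            counts[key] = 1
    return counts


def count_with_counter(items):
    """collections.Counter does the same tally in one call."""
    return Counter(items)


# Made-up sample data.
commands = ["get", "put", "get", "ls", "get"]

by_try = count_with_try(commands)
by_counter = count_with_counter(commands)

# A Counter returns 0 for a missing key instead of raising KeyError,
# and looking the key up does not insert it.
never_seen = by_counter["rm"]

# Roughly what "sort < input | uniq -c" prints: a count, then the item,
# in sorted order of the items.
for key in sorted(by_counter):
    print(f"{by_counter[key]:7d} {key}")
```

Counter also has a `most_common()` method, which is handy when the report should be ordered by frequency rather than by key.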
