From: Mag Gam
To: Dan Stromberg
Cc: python-list@python.org
Date: Sun, 3 Apr 2011 08:06:32 -0400
Subject: Re: better way to do this in python
Newsgroups: comp.lang.python

Thanks for the responses.

Basically, I have a large file in this format:

  Date INFO username command srcipaddress filename

I would like to compute statistics on:

  total number of usernames, and who they are
  commands per username
  filenames per username
  unique source IP addresses
  unique filenames

Then I would like to bucket the findings by day (date). Overall, I would like to build a log file analyzer.
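A minimal sketch of such an analyzer, assuming every line has exactly six whitespace-separated fields in the order above (a Date that itself contains spaces would need a different split). It uses collections.Counter, as suggested in the reply quoted below, plus collections.defaultdict for the per-user and per-day breakdowns. The function name `analyze`, the result keys, and the sample lines with ISO-style dates are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def analyze(lines):
    """Tally statistics from log lines, assuming exactly six
    whitespace-separated fields: Date INFO username command srcipaddress filename
    """
    users = Counter()                  # username -> number of log lines
    user_cmds = defaultdict(Counter)   # username -> Counter of commands
    user_files = defaultdict(Counter)  # username -> Counter of filenames
    src_ips = set()                    # unique source IP addresses
    filenames = set()                  # unique filenames
    by_day = defaultdict(Counter)      # date -> Counter of usernames seen that day

    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # assumption: skip lines that don't have six fields
        date, _level, user, cmd, ip, fname = fields
        users[user] += 1
        user_cmds[user][cmd] += 1
        user_files[user][fname] += 1
        src_ips.add(ip)
        filenames.add(fname)
        by_day[date][user] += 1

    return {
        "users": users,
        "user_cmds": user_cmds,
        "user_files": user_files,
        "src_ips": src_ips,
        "filenames": filenames,
        "by_day": by_day,
    }


# Hypothetical sample lines; real dates may use a different format.
sample = [
    "2011-04-01 INFO alice ls 10.0.0.1 a.txt",
    "2011-04-01 INFO bob get 10.0.0.2 b.txt",
    "2011-04-02 INFO alice get 10.0.0.1 b.txt",
]
stats = analyze(sample)
print(sorted(stats["users"].items()))  # [('alice', 2), ('bob', 1)]
```

Because `analyze` accepts any iterable of strings, a real file can be passed directly, e.g. `with open("access.log") as f: stats = analyze(f)` (the filename is hypothetical). The file is streamed line by line, so only the counters, not the whole log, are held in memory.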
On Sat, Apr 2, 2011 at 10:59 PM, Dan Stromberg wrote:
>
> On Sat, Apr 2, 2011 at 5:24 PM, Chris Angelico wrote:
>>
>> On Sun, Apr 3, 2011 at 9:58 AM, Mag Gam wrote:
>> > I suppose I can do something like this.
>> > (pseudocode)
>> >
>> > d={}
>> > try:
>> >   d[key]+=1
>> > except KeyError:
>> >   d[key]=1
>> >
>> >
>> > I was wondering if there is a pythonic way of doing this? I plan on
>> > doing this many times for various files. Would the python collections
>> > class be sufficient?
>>
>> I think you want collections.Counter. From the docs: "Counter objects
>> have a dictionary interface except that they return a zero count for
>> missing items instead of raising a KeyError".
>>
>> ChrisA
>
> I realize you (Mag) asked for a Python solution, but since you mention
> awk... you can also do this with "sort < input | uniq -c" - one line of
> "code".  GNU sort doesn't use as nice an algorithm as a hashing-based
> solution (like you'd probably use with Python), but for a sort, GNU sort's
> quite good.
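For comparison, a sketch of the counting approaches discussed in the quoted thread: the try/except KeyError tally from the original question, and collections.Counter, a dict subclass that returns 0 for missing keys. `uniq_c` is a hypothetical helper that only roughly mimics `sort < input | uniq -c`: it counts lines in a hash table first and sorts just the distinct lines afterwards. Its output layout and ordering (code-point order rather than the locale collation sort(1) may use) are approximations, not a reimplementation of GNU uniq.

```python
from collections import Counter


def count_with_try(keys):
    # The try/except KeyError tally from the original question.
    d = {}
    for key in keys:
        try:
            d[key] += 1
        except KeyError:
            d[key] = 1
    return d


def count_with_counter(keys):
    # Same tally with collections.Counter; missing keys read as 0.
    return Counter(keys)


def uniq_c(lines):
    # Hypothetical helper: roughly like "sort < input | uniq -c", but counts
    # with a hash table first and sorts only the distinct lines afterwards.
    # Ordering is by code point, not locale collation.
    return [f"{n:7d} {line}" for line, n in sorted(Counter(lines).items())]


keys = ["alice", "bob", "alice"]
print(count_with_try(keys))               # {'alice': 2, 'bob': 1}
print(count_with_counter(keys)["carol"])  # 0, no KeyError raised
```

To use `uniq_c` as a filter on standard input, something like `print("\n".join(uniq_c(line.rstrip("\n") for line in sys.stdin)))` after `import sys` would work.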