From: Zero Piraeus
Newsgroups: comp.lang.python
To: python-list@python.org
Subject: Re: Using the nntplib module to count Google Groups users
Date: Sun, 27 Oct 2013 03:37:29 -0300
References: <526c8949$0$29972$c3e8da3$5496439d@news.astraweb.com>

:

On Sun, Oct 27, 2013 at 03:35:40PM +1100, Chris Angelico wrote:
> On Sun, Oct 27, 2013 at 2:32 PM, Steven D'Aprano wrote:
> > If anyone wants to modify the script to determine the ratio of posters,
> > rather than posts, using GG, be my guest.
>
> And if anyone does, do please post the result on-list.

Taking a different tack, since I happen to have a complete[1] local
archive of python-list going back a few years ...
here's a quick and dirty script to count unique senders and Google
Groups users for this year:

- - -

import os
from email.parser import HeaderParser

LIST = "python-list@python.org"
MAILDIR = "/path/to/mail/archive/cur"
YEAR = "2013"

parser = HeaderParser()
found = set()
gg_users = 0

for filename in os.listdir(MAILDIR):
    # Only the headers are needed, so HeaderParser skips body parsing.
    with open(os.path.join(MAILDIR, filename)) as message:
        headers = parser.parse(message)
    sender = headers.get("from", "")
    dest = headers.get("to", "")
    date = headers.get("date", "")
    # Skip other lists, other years, and senders already counted.
    if (LIST not in dest) or (YEAR not in date) or (sender in found):
        continue
    found.add(sender)
    # Google Groups posts carry this abuse address in Complaints-To.
    if "groups-abuse@google.com" in headers.get("complaints-to", ""):
        gg_users += 1
        print("GG user:")
        print(sender)

print("Senders: %d" % len(found))
print("GG users: %d" % gg_users)
print("---")

- - -

It's obviously not very robust, but I reckon it's good enough to get an
idea what's going on. The results:

    Senders: 1701
    GG users: 879

... so just over 50%. If anyone wants the complete output, just let me
know and I'll email it privately.

 -[]z.

[1] except for spam filtered out by Gmail.

-- 
Zero Piraeus: ad referendum
http://etiol.net/pubkey.asc
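
One way the script above could be made a little less fragile, offered as a sketch and not as the author's method: it compares bare addresses rather than whole From headers, so the same person posting under two display names is counted once. The function name `count_senders` and its parameters are hypothetical, and it takes raw message strings instead of reading a Maildir so it can be tried without an archive. Note one deliberate difference: here a sender counts as a Google Groups user if any of their posts carries the Complaints-To header, whereas the original only looks at the first message seen from each sender, so the numbers would not necessarily match.

```python
from email.parser import HeaderParser
from email.utils import parseaddr


def count_senders(raw_messages, list_addr="python-list@python.org", year="2013"):
    """Return (unique_senders, gg_senders) for an iterable of raw message strings.

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    parser = HeaderParser()
    seen = set()
    gg = set()
    for raw in raw_messages:
        headers = parser.parsestr(raw)
        # Same substring filters as the original script.
        if list_addr not in headers.get("to", ""):
            continue
        if year not in headers.get("date", ""):
            continue
        # parseaddr splits 'Name <addr>' into (name, addr); keep the address only.
        addr = parseaddr(headers.get("from", ""))[1].lower()
        if not addr:
            continue
        seen.add(addr)
        if "groups-abuse@google.com" in headers.get("complaints-to", ""):
            gg.add(addr)
    return len(seen), len(gg)
```

To run it over a Maildir, each file's contents could be read and passed in as one string per message; whether the default text encoding copes with every message in a real archive is a separate question this sketch does not address.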