
[ANN] gg_scrapper -- scraping of Google Groups

Date Sat, 4 Jan 2014 23:57:40 +0100
From Matej Cepl <mcepl@redhat.com>
To python-list@python.org
Newsgroups comp.lang.python
Message-ID <mailman.4916.1388876270.18130.python-list@python.org>


Have you ever tried to archive a mailing list hosted on Google
Groups? Were you endlessly frustrated by the black hole that is
Google Groups, conspicuous by its absence from the Data Liberation
Front website? Yes, I was too_.

So I have written gg_scrapper_, a script that scrapes a Google
Group from the web. Thanks to `Sean Hogan`_ for the first
inspiration for the script. Any comments are welcome via
email (I am sure you can find my addresses somewhere on the
Web).

Best,

Matěj

.. _too:
    http://matej.ceplovi.cz/blog/2013/09/we-should-stop-even-pretending-google-is-trying-to-do-the-right-thing/
.. _gg_scrapper:
    https://pypi.python.org/pypi/gg_scrapper
.. _`Sean Hogan`:
    http://matej.ceplovi.cz/blog/2013/09/we-should-stop-even-pretending-google-is-trying-to-do-the-right-thing/#comment-482

-- 
http://www.ceplovi.cz/matej/, Jabber: mcepl<at>ceplovi.cz
GPG Finger: 89EF 4BC6 288A BF43 1BAB  25C3 E09F EF25 D964 84AC
 
<"}}}><
