| Date | Sat, 4 Jan 2014 23:57:40 +0100 |
|---|---|
| From | Matej Cepl <mcepl@redhat.com> |
| To | python-list@python.org |
| Subject | [ANN] gg_scrapper -- scrapping of the Google Groups |
| Organization | Red Hat Czech, s.r.o. |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.4916.1388876270.18130.python-list@python.org> |
Have you ever tried to archive a mailing list hosted on Google
Groups? Were you endlessly frustrated by the black hole that is
Google Groups, conspicuous by its absence from the Data Liberation
Front website? Yes, I was too_.
So I have written a script that scrapes a Google Group and
released it as gg_scrapper_. Thanks to `Sean Hogan`_ for the
initial inspiration for the script. Any comments are welcome via
email (I am sure you can find my addresses somewhere on the
Web).
Best,
Matěj
.. _too:
http://matej.ceplovi.cz/blog/2013/09/we-should-stop-even-pretending-google-is-trying-to-do-the-right-thing/
.. _gg_scrapper:
https://pypi.python.org/pypi/gg_scrapper
.. _`Sean Hogan`:
http://matej.ceplovi.cz/blog/2013/09/we-should-stop-even-pretending-google-is-trying-to-do-the-right-thing/#comment-482
--
http://www.ceplovi.cz/matej/, Jabber: mcepl<at>ceplovi.cz
GPG Finger: 89EF 4BC6 288A BF43 1BAB 25C3 E09F EF25 D964 84AC
<"}}}><