


Re: get google scholar using python

Date Mon, 1 Oct 2012 14:09:34 -0400
Subject Re: get google scholar using python
From Jerry Hill <malaclypse2@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.1712.1349114977.27098.python-list@python.org> (permalink)



On Mon, Oct 1, 2012 at 1:28 PM, রুদ্র ব্যাণার্জী <bnrj.rudra@gmail.com> wrote:
> So, if I manage to use the User-Agent as shown by you, will I still be
> violating the Google EULA?

Very likely, yes.  The overall Google Terms of Service
(http://www.google.com/intl/en/policies/terms/) say "Don’t misuse our
Services. For example, don’t interfere with our Services or try to
access them using a method other than the interface and the
instructions that we provide."

The only method that Google appears to allow for accessing Scholar is
via the web interface, and they explicitly block web scraping through
that interface, as you discovered.  It's true that you can get around
their block, but I believe that doing so violates the terms of
service.

Google does not appear to offer an API to access Scholar
programmatically, nor do I see a more specific EULA or TOS for the
Scholar service beyond that general TOS document.

That said, I am not a lawyer.  If you want legal advice, you'll need
to pay a lawyer for that advice.

-- 
Jerry
