Re: get google scholar using python

From রুদ্র ব্যাণার্জী <bnrj.rudra@gmail.com>
Newsgroups comp.lang.python
Subject Re: get google scholar using python
Date 2012-10-01 18:28 +0100
Organization A noiseless patient Spider
Message-ID <1349112522.1787.5.camel@roddur>
References <1349109859.27817.7.camel@roddur> <mailman.1710.1349110326.27098.python-list@python.org>


I know of one more Python app that does the same thing:
http://www.icir.org/christian/downloads/scholar.py

There are also a few other apps (e.g. Mendeley Desktop), and for one of them I found this explanation
(from http://academia.stackexchange.com/questions/2567/api-eula-and-scraping-for-google-scholar ):
"I know how Mendeley uses it: they require you to click a button for each
individual search of Google Scholar. If they automatically did the
Google Scholar meta-data search for each paper when you import a
folder-full then they would violate the old Scholar EULA. That is why
they make you click for each query: if each query is accompanied by a
click and not part of some script or loop then it is in compliance with
the old EULA."

So, if I set the User-Agent header as you showed, will I still be
violating the Google EULA?

This is my first attempt at scraping HTML, so please help.

On Mon, 2012-10-01 at 16:51 +0000, Nick Cash wrote:
> > urllib2.urlopen('http://scholar.google.co.uk/scholar?q=albert
> >...
> > urllib2.HTTPError: HTTP Error 403: Forbidden
> > >>>
> > 
> > Could you kindly explain how to get around this?
> 
> Looks like Google blocks non-browser user agents from retrieving this query. You *could* work around it by setting the User-Agent header to something fake that looks browser-ish, but you're almost certainly breaking Google's TOS if you do so.
> 
> Should you really really want to, urllib2 makes it easy:
> urllib2.urlopen(urllib2.Request("http://scholar.google.co.uk/scholar?q=albert+einstein%2B1905&btnG=&hl=en&as_sdt=0%2C5&as_sdtp=", headers={"User-Agent":"Mozilla/5.0 Cheater/1.0"}))
> 
> -Nick Cash
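For readers on Python 3, where `urllib2` was split into `urllib.request` and `urllib.error`, Nick's one-liner can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested recipe: it builds the query string with `urllib.parse.urlencode` instead of encoding it by hand, keeps only the `q` and `hl` parameters from the original URL (assuming the others are optional), and uses a placeholder User-Agent, `example-client/0.1`. The helper name `build_scholar_request` is hypothetical. The code only constructs the `Request` object and never contacts Google.

```python
# Python 3 sketch: urllib2's Request/urlopen now live in urllib.request,
# and the query-string helpers live in urllib.parse.
from urllib.parse import urlencode
from urllib.request import Request

SCHOLAR_URL = "http://scholar.google.co.uk/scholar"  # host taken from the thread


def build_scholar_request(query, user_agent):
    """Build (but do not send) a GET request for a Scholar search.

    Hypothetical helper. urlencode() produces the percent-encoded query
    string that was written out by hand in the thread (spaces become '+',
    a literal '+' becomes '%2B'). Only 'q' and 'hl' are kept; the other
    original parameters are assumed to be optional.
    """
    params = urlencode({"q": query, "hl": "en"})
    return Request(SCHOLAR_URL + "?" + params,
                   headers={"User-Agent": user_agent})


if __name__ == "__main__":
    # Placeholder User-Agent string; nothing is sent over the network here.
    req = build_scholar_request("albert einstein+1905", "example-client/0.1")
    print(req.full_url)
    # Request stores header names via str.capitalize(), hence "User-agent".
    print(req.get_header("User-agent"))
```

Actually sending it would be `urllib.request.urlopen(req)`, which would raise `urllib.error.HTTPError` (with `.code == 403`) if Google refuses the request, as in the traceback quoted above.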


Thread

get google scholar using python রুদ্র ব্যাণার্জী  <bnrj.rudra@gmail.com> - 2012-10-01 17:44 +0100
  RE: get google scholar using python Nick Cash <nick.cash@npcinternational.com> - 2012-10-01 16:51 +0000
    Re: get google scholar using python Grant Edwards <invalid@invalid.invalid> - 2012-10-01 17:19 +0000
    Re: get google scholar using python রুদ্র ব্যাণার্জী  <bnrj.rudra@gmail.com> - 2012-10-01 18:28 +0100
      Re: get google scholar using python Jerry Hill <malaclypse2@gmail.com> - 2012-10-01 14:09 -0400