| From | Nick Cash <nick.cash@npcinternational.com> |
|---|---|
| Subject | RE: get google scholar using python |
| Date | 2012-10-01 16:51 +0000 |
| References | <1349109859.27817.7.camel@roddur> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.1710.1349110326.27098.python-list@python.org> |
> urllib2.urlopen('http://scholar.google.co.uk/scholar?q=albert
>...
> urllib2.HTTPError: HTTP Error 403: Forbidden
> >>>
>
> Will you kindly explain me the way to get rid of this?
Looks like Google blocks non-browser user agents from retrieving this query. You *could* work around it by setting the User-Agent header to something fake that looks browser-ish, but you're almost certainly breaking Google's TOS if you do so.
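For the curious, here is a minimal sketch of what urllib sends when you don't set the header yourself. It uses Python 3, where urllib2's functionality moved to urllib.request. My assumption, which Google doesn't confirm anywhere, is that their filter keys on this "Python-urllib/X.Y" string. The version suffix depends on your interpreter.

```python
import urllib.request

# build_opener() installs urllib's default handlers. The opener's addheaders
# list holds the User-Agent that is sent when a Request doesn't set its own.
opener = urllib.request.build_opener()
default_ua = dict(opener.addheaders)["User-agent"]

# Prints something of the form "Python-urllib/X.Y". That clearly identifies
# a script rather than a browser.
print(default_ua)
```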
Should you really really want to, urllib2 makes it easy:
import urllib2
urllib2.urlopen(urllib2.Request("http://scholar.google.co.uk/scholar?q=albert+einstein%2B1905&btnG=&hl=en&as_sdt=0%2C5&as_sdtp=", headers={"User-Agent": "Mozilla/5.0 Cheater/1.0"}))
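On Python 3, urllib2 was split into urllib.request and urllib.error, so the same request would look roughly like the sketch below. Some assumptions:

* The helper names build_scholar_request and fetch are hypothetical.
* Only the q and hl parameters are taken from the URL above.
* The User-Agent you pass in is up to you. I'd suggest an honest identifier rather than a fake browser string.

Google may refuse an honest identifier all the same, and the TOS caveat still applies.

```python
import urllib.error
import urllib.parse
import urllib.request

def build_scholar_request(query, user_agent):
    """Hypothetical helper: build a Request for a Scholar search URL."""
    # urlencode() uses quote_plus, so spaces become "+" and "+" becomes "%2B".
    params = urllib.parse.urlencode({"q": query, "hl": "en"})
    url = "http://scholar.google.co.uk/scholar?" + params
    # Request stores header names via str.capitalize(), so "User-Agent"
    # is kept internally as "User-agent".
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def fetch(req, timeout=10):
    """Hypothetical helper: return the response body, or None on an HTTP error."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # A 403 here means the server refused this client, as in the
        # traceback quoted above. Other network errors (URLError) propagate.
        print("Server refused request:", err.code, err.reason)
        return None
```

The call build_scholar_request("albert einstein+1905", "my-scholar-script/0.1") encodes the query as q=albert+einstein%2B1905, which matches the URL above. If you pass the result to fetch(), it prints the status and returns None on a 403 instead of raising.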
-Nick Cash
get google scholar using python রুদ্র ব্যাণার্জী <bnrj.rudra@gmail.com> - 2012-10-01 17:44 +0100
RE: get google scholar using python Nick Cash <nick.cash@npcinternational.com> - 2012-10-01 16:51 +0000
Re: get google scholar using python Grant Edwards <invalid@invalid.invalid> - 2012-10-01 17:19 +0000
Re: get google scholar using python রুদ্র ব্যাণার্জী <bnrj.rudra@gmail.com> - 2012-10-01 18:28 +0100
Re: get google scholar using python Jerry Hill <malaclypse2@gmail.com> - 2012-10-01 14:09 -0400