From: Grant Edwards
Newsgroups: comp.lang.python
Subject: Re: get google scholar using python
Date: Mon, 1 Oct 2012 17:19:39 +0000 (UTC)
Organization: PANIX Public Access Internet and UNIX, NYC

On 2012-10-01, Nick Cash wrote:
>> urllib2.urlopen('http://scholar.google.co.uk/scholar?q=albert
>>...
>> urllib2.HTTPError: HTTP Error 403: Forbidden
>>
>> Will you kindly explain me the way to get rid of this?
>
> Looks like Google blocks non-browser user agents from retrieving this
> query. You *could* work around it by setting the User-Agent header to
> something fake that looks browser-ish, but you're almost certainly
> breaking Google's TOS if you do so.

I don't know about that particular Google service, but Google often
provides an API that's intended for use by non-browser programs. Those
interfaces are usually both easier to use for the programmer and
impose less load on the servers.

-- 
Grant Edwards               grant.b.edwards        Yow! I am deeply CONCERNED
                                  at               and I want something GOOD
                              gmail.com            for BREAKFAST!
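
For anyone following the API suggestion, here is a minimal sketch of what a request to a documented API might look like from Python. It uses Python 3's urllib.request, the successor to the urllib2 module in the quoted traceback. The endpoint api.example.com, the "q" parameter, the User-Agent string, and the helper names build_request and fetch are all hypothetical placeholders, not a real Google interface. The User-Agent here identifies the script honestly rather than imitating a browser.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_request(base_url, params, user_agent):
    """Build a GET request with an encoded query string and an explicit User-Agent.

    base_url, params and user_agent are supplied by the caller; nothing here
    is specific to any real service.
    """
    query = urllib.parse.urlencode(params)
    url = base_url + "?" + query
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(request, timeout=10):
    """Send the request and return the body as bytes, or None on HTTP 403.

    A 403 means the server refused the request; other HTTP errors are re-raised
    so the caller can see them.
    """
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 403:
            return None
        raise
```

To use it, pass the result of build_request("https://api.example.com/search", {"q": "albert"}, "my-script/0.1") to fetch(), substituting the URL and parameters from whatever API documentation applies to the service you need.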