From: Grant Edwards <invalid@invalid.invalid>
Newsgroups: comp.lang.python
Subject: Re: get google scholar using python
Date: Mon, 1 Oct 2012 17:19:39 +0000 (UTC)
Organization: PANIX Public Access Internet and UNIX, NYC
Lines: 21
Message-ID: <k4cjbb$h4d$1@reader1.panix.com>
References: <1349109859.27817.7.camel@roddur> <mailman.1710.1349110326.27098.python-list@python.org>

On 2012-10-01, Nick Cash <nick.cash@npcinternational.com> wrote:
>> urllib2.urlopen('http://scholar.google.co.uk/scholar?q=albert
>>...
>> urllib2.HTTPError: HTTP Error 403: Forbidden
>> 
>> Will you kindly explain to me how to get rid of this?
>
> Looks like Google blocks non-browser user agents from retrieving this
> query. You *could* work around it by setting the User-Agent header to
> something fake that looks browser-ish, but you're almost certainly
> breaking Google's TOS if you do so.

I don't know about that particular Google service, but Google often
provides an API that's intended for use by non-browser programs.
Those interfaces are usually easier for the programmer to use, and
they impose less load on the servers.
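
As a rough sketch of what talking to such an API might look like, here
is a minimal client using Python 3's urllib.request (the successor to
the urllib2 module quoted above). The endpoint URL, the "q" parameter,
the User-Agent string, and the helper names build_request and
fetch_json are all made up for illustration; they are not a real Google
API, so substitute whatever the service's documentation specifies.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Hypothetical endpoint, for illustration only; replace it with the
# documented API URL of the service you are actually using.
API_URL = "https://api.example.com/search"


def build_request(query, api_url=API_URL):
    """Build a GET request that identifies the client honestly."""
    # The "q" parameter name is an assumption; check the API docs.
    params = urllib.parse.urlencode({"q": query})
    return urllib.request.Request(
        api_url + "?" + params,
        headers={
            # Placeholder identity: describe your own program here
            # rather than pretending to be a browser.
            "User-Agent": "my-script/0.1 (contact: me@example.com)",
            "Accept": "application/json",
        },
    )


def fetch_json(request, timeout=10):
    """Send the request; return parsed JSON, or None on an HTTP error."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # A 403 here means the server refused this client or request.
        print("HTTP error %d: %s" % (err.code, err.reason))
        return None
```

Calling fetch_json(build_request("albert einstein")) would then return
the decoded response, or None if the server answers with an error such
as the 403 shown in the quoted traceback.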

-- 
Grant Edwards               grant.b.edwards        Yow! I am deeply CONCERNED
                                  at               and I want something GOOD
                              gmail.com            for BREAKFAST!