RE: get google scholar using python

From Nick Cash <nick.cash@npcinternational.com>
To রুদ্র ব্যাণার্জী <bnrj.rudra@gmail.com>, "python-list@python.org" <python-list@python.org>
Subject RE: get google scholar using python
Date Mon, 1 Oct 2012 16:51:24 +0000
References <1349109859.27817.7.camel@roddur>
In-Reply-To <1349109859.27817.7.camel@roddur>
Newsgroups comp.lang.python
Message-ID <mailman.1710.1349110326.27098.python-list@python.org>



> urllib2.urlopen('http://scholar.google.co.uk/scholar?q=albert
>...
> urllib2.HTTPError: HTTP Error 403: Forbidden
> >>>
> 
> Will you kindly explain me the way to get rid of this?

Looks like Google blocks non-browser user agents from retrieving this query. You *could* work around it by setting the User-Agent header to something fake that looks browser-ish, but you're almost certainly breaking Google's TOS if you do so.
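For context, here is a rough way to see what a script announces itself as when no User-Agent is set. This is only a sketch, and it assumes Python 3, where urllib2 became urllib.request; the helper name default_user_agent is made up for illustration.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent string urllib sends when none is set explicitly."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs the opener adds to requests.
    headers = dict(opener.addheaders)
    return headers.get("User-agent", "")


if __name__ == "__main__":
    # Prints something starting with "Python-urllib/", which is plainly not a browser.
    print(default_user_agent())
```

A server that filters on this header can tell a script from a browser at a glance.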

Should you really, really want to, urllib2 makes it easy:

    import urllib2
    request = urllib2.Request("http://scholar.google.co.uk/scholar?q=albert+einstein%2B1905&btnG=&hl=en&as_sdt=0%2C5&as_sdtp=", headers={"User-Agent": "Mozilla/5.0 Cheater/1.0"})
    response = urllib2.urlopen(request)
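The same idea on Python 3 might look roughly like the following, since urllib2 was split into urllib.request and urllib.error there. Treat it as a sketch: build_request and fetch are hypothetical helper names, the timeout value is an arbitrary choice, and the terms-of-service caveat above still applies.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent):
    """Create a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent, timeout=10):
    """Return (status_code, body_bytes); HTTP errors such as 403 are reported, not raised."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (for example 403 Forbidden).
        return err.code, b""


# Example call (performs a real network request, so it is left commented out):
# status, body = fetch("http://scholar.google.co.uk/scholar?q=albert+einstein",
#                      "Mozilla/5.0 Cheater/1.0")
```

Catching HTTPError keeps a 403 from ending the script with a traceback, so the caller can check the status code and decide what to do.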

-Nick Cash



Thread

get google scholar using python রুদ্র ব্যাণার্জী  <bnrj.rudra@gmail.com> - 2012-10-01 17:44 +0100
  RE: get google scholar using python Nick Cash <nick.cash@npcinternational.com> - 2012-10-01 16:51 +0000
    Re: get google scholar using python Grant Edwards <invalid@invalid.invalid> - 2012-10-01 17:19 +0000
    Re: get google scholar using python রুদ্র ব্যাণার্জী  <bnrj.rudra@gmail.com> - 2012-10-01 18:28 +0100
      Re: get google scholar using python Jerry Hill <malaclypse2@gmail.com> - 2012-10-01 14:09 -0400
