From: Jerry Hill
To: রুদ্র ব্যানার্জী
Cc: python-list@python.org
Date: Mon, 1 Oct 2012 14:09:34 -0400
Subject: Re: get google scholar using python
Newsgroups: comp.lang.python

On Mon, Oct 1, 2012 at 1:28 PM, রুদ্র ব্যানার্জী wrote:
> So, if I manage to use the User-Agent as shown by you, will I still be
> violating the Google EULA?

Very likely, yes. The overall Google Terms of Service
(http://www.google.com/intl/en/policies/terms/) say: "Don’t misuse our
Services. For example, don’t interfere with our Services or try to
access them using a method other than the interface and the
instructions that we provide."

The only method Google appears to allow for accessing Scholar is its
web interface, and Google explicitly blocks web scraping through that
interface, as you discovered.
It's true that you can get around the block, but I believe that doing
so violates the Terms of Service. Google does not appear to offer an
API for accessing Scholar programmatically, nor do I see a more
specific EULA or TOS for the Scholar service beyond that general TOS
document.

That said, I am not a lawyer. If you want legal advice, you'll need to
pay a lawyer for it.

-- 
Jerry