

Newsgroup: comp.lang.python

same code to login,one is ok,another is not

Started by: "守株待兔" <1248283536@qq.com>
First post: 2011-08-15 08:09 +0800
Last post: 2011-08-15 10:55 +1000
Articles: 2 (2 participants)




#11428 — same code to login,one is ok,another is not

From: "守株待兔" <1248283536@qq.com>
Date: 2011-08-15 08:09 +0800
Subject: same code to login,one is ok,another is not
Message-ID: <mailman.2292.1313367000.1164.python-list@python.org>


1. http://www.renren.com/Login.do
This one works. My code:
import cookielib, urllib2, urllib

# Keep cookies between requests so the session survives the login.
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Identify as Internet Explorer rather than as a Python script.
exheaders = [("User-Agent", "Mozilla/4.0 (compatible; MSIE 7.1; Windows NT 5.1; SV1)")]
opener.addheaders = exheaders
url_login = 'http://www.renren.com/Login.do'
body = (('email', '  '), ('password', '  '))
# Passing a data argument makes this a POST request.
req1 = opener.open(url_login, urllib.urlencode(body))
f = open('/tmp/sample', 'w')
f.write(req1.read())
f.close()

2. https://passport.baidu.com/?login
This one does not log in. My code:
import cookielib, urllib2, urllib

# Keep cookies between requests so the session survives the login.
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Identify as Internet Explorer rather than as a Python script.
exheaders = [("User-Agent", "Mozilla/4.0 (compatible; MSIE 7.1; Windows NT 5.1; SV1)")]
opener.addheaders = exheaders
url_login = 'https://passport.baidu.com/?login'
body = (('username', '  '), ('password', '  '), ('pwd', '1'))
# Passing a data argument makes this a POST request.
req1 = opener.open(url_login, urllib.urlencode(body))
f = open('/tmp/sample', 'w')
f.write(req1.read())
f.close()
What I get is:
<!--STATUS OK--> 
<html><head><title>[garbled gb2312 title]</title>
<meta http-equiv=content-type content="text/html; charset=gb2312"> 
<META http-equiv='Pragma' content='no-cache'> 
</head> 
<body> 
 
         
             
            <script> 
            var url="./?pwd=1" 
            url=url.replace(/^\.\//gi,"http://passport.baidu.com/"); 
            location.href=url; 
            </script> 
             
         
 
</body> 
</html>



#11430

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2011-08-15 10:55 +1000
Message-ID: <4e486e9a$0$29986$c3e8da3$5496439d@news.astraweb.com>
In reply to: #11428
守株待兔 wrote:

> 1.http://www.renren.com/Login.do
> it is ok,my code:
[...]
> 2.https://passport.baidu.com/?login
> can't login,my code:
[...]

Do you have a question, or are you just sharing the bad news?

Websites may choose to respond to login attempts differently. Some may
require cookies, some may not. Some may check the referrer, some may not.
Some may look at the user agent, some may not.
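
A minimal sketch of sending all three of those signals (cookies, referrer, user agent) from a script. It is written for Python 3, where urllib2 and cookielib became urllib.request and http.cookiejar. The helper names build_browser_like_opener and encode_form are made up for illustration, and whether a given site actually checks any of these headers is an assumption you would have to test.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_browser_like_opener(user_agent, referer):
    """Return (opener, cookie_jar) for requests that look more like a browser's.

    The opener stores and resends cookies, and adds a User-Agent and a
    Referer header to every request it makes.
    """
    cookie_jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar)
    )
    # "Referer" (one r) is the spelling the HTTP header actually uses.
    opener.addheaders = [("User-Agent", user_agent), ("Referer", referer)]
    return opener, cookie_jar


def encode_form(fields):
    """Encode form fields as bytes, as required for a POST body in Python 3."""
    return urllib.parse.urlencode(fields).encode("ascii")
```

With these, opener.open(url_login, encode_form(body)) would send the same kind of POST as the code in the original post, with the extra Referer header attached.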

If the web developer of the site insists that you log in with a browser, or
with Internet Explorer specifically, you have to fight to convince the web
server to let you in. Many websites try really hard to prevent bots and
scripts from logging in. The closer you can imitate what a real human being
in a browser does, the better the chances you can fool the server into
thinking that you are a real human being using a browser and not a bot.
(Since your script *is* a bot, you may also be in violation of the web
site's terms of service.)

Some web sites may even check how often you try to log in, or how fast.
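
If timing is what a site looks at, one low-tech precaution is to space attempts out. The following is only a sketch under that assumption: the Pacer class is hypothetical, and the right interval for any particular site is unknown.

```python
import time


class Pacer:
    """Enforce a minimum gap, in seconds, between successive attempts."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested
        self._sleep = sleep
        self._last = None        # time of the previous attempt, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling pacer.wait() immediately before each opener.open(...) keeps the script from hammering the login page.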

But what makes you think you can't log in? Given the response below, it
looks to me as though you did log in, and got a blank page with some
JavaScript to redirect you to the real content page. (If you are a web
developer and you do this, I hate you.) But I may be wrong -- I'm not an
expert on these things.
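
Since urllib does not run JavaScript, a script has to follow that kind of redirect itself. A rough Python 3 sketch, assuming the page always has the shape shown in the quoted response (a var url="..." line followed by location.href=url); the function name find_js_redirect is made up, and the base URL is the one that appears in the page's own replace() call.

```python
import re
from urllib.parse import urljoin

# Matches a line such as:  var url="./?pwd=1"
_URL_VAR = re.compile(r'var\s+url\s*=\s*"([^"]*)"')


def find_js_redirect(html, base_url):
    """Return the absolute URL the page redirects to, or None if not found.

    Only recognises the specific pattern in the quoted response: a
    'var url="..."' assignment on a page that also sets location.href.
    """
    match = _URL_VAR.search(html)
    if match is None or "location.href" not in html:
        return None
    # Resolve "./?pwd=1" against the base, as the page's replace() does.
    return urljoin(base_url, match.group(1))


page = r'''<script>
var url="./?pwd=1"
url=url.replace(/^\.\//gi,"http://passport.baidu.com/");
location.href=url;
</script>'''
print(find_js_redirect(page, "http://passport.baidu.com/"))
```

Opening the returned URL with the same cookie-aware opener would fetch the page the browser would have landed on; whether that page shows a logged-in session is something only the response can confirm.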


> <!--STATUS OK-->
> <html><head><title>[garbled gb2312 title]</title>
> <meta http-equiv=content-type content="text/html; charset=gb2312">
> <META http-equiv='Pragma' content='no-cache'>
> </head>
> <body>
>  
>          
>              
>             <script>
>             var url="./?pwd=1"
>             url=url.replace(/^\.\//gi,"http://passport.baidu.com/");
>             location.href=url;
>             </script>
>              
>          
>  
> </body>
> </html>


-- 
Steven




