comp.lang.python, thread #63762
| Started by | KMeans Algorithm <bilbaow@gmail.com> |
|---|---|
| First post | 2014-01-12 04:17 -0800 |
| Last post | 2014-01-12 15:51 -0500 |
| Articles | 5 — 4 participants |
Python: 404 Error when trying to login a webpage by using 'urllib' and 'HTTPCookieProcessor' KMeans Algorithm <bilbaow@gmail.com> - 2014-01-12 04:17 -0800
Re: Python: 404 Error when trying to login a webpage by using 'urllib' and 'HTTPCookieProcessor' Chris Angelico <rosuav@gmail.com> - 2014-01-12 23:42 +1100
Re: Python: 404 Error when trying to login a webpage by using 'urllib' and 'HTTPCookieProcessor' Chris Angelico <rosuav@gmail.com> - 2014-01-12 23:44 +1100
Re: Python: 404 Error when trying to login a webpage by using 'urllib' and 'HTTPCookieProcessor' xDog Walker <thudfoo@gmail.com> - 2014-01-12 07:17 -0800
Re: Python: 404 Error when trying to login a webpage by using 'urllib' and 'HTTPCookieProcessor' Terry Reedy <tjreedy@udel.edu> - 2014-01-12 15:51 -0500
| From | KMeans Algorithm <bilbaow@gmail.com> |
|---|---|
| Date | 2014-01-12 04:17 -0800 |
| Subject | Python: 404 Error when trying to login a webpage by using 'urllib' and 'HTTPCookieProcessor' |
| Message-ID | <9e7e031f-b6db-43fc-84d2-ef68916ec756@googlegroups.com> |
I'm trying to log in to a webpage using 'urllib' with this piece of code:
---------
import urllib2,urllib,os
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
login = urllib.urlencode({'username':'john', 'password':'foo'})
url = "https://www.mysite.com/loginpage"
req = urllib2.Request(url, login)
try:
    resp = urllib2.urlopen(req)
    print resp.read()
except urllib2.HTTPError, e:
    print ":( Error = " + str(e.code)
----------------
But I get a "404" error (Not Found). The page "https://www.mysite.com/loginpage" does exist (please note the httpS; I'm not sure whether this is the key to my problem).
If I try with
-------
resp = urllib2.urlopen(url)
--------
(with no 'login' data), it works ok but, obviously, I'm not logged in.
What am I doing wrong? Thank you very much.
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-01-12 23:42 +1100 |
| Message-ID | <mailman.5365.1389530545.18130.python-list@python.org> |
| In reply to | #63762 |
On Sun, Jan 12, 2014 at 11:17 PM, KMeans Algorithm <bilbaow@gmail.com> wrote:
> What am I doing wrong? Thank you very much.

I can't say what's actually wrong, but I have a few ideas for getting more information out of the system...

> opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())

You don't do anything with this opener - could you have a cookie problem?

> req = urllib2.Request(url, login)
>
> But I get a "404" error (Not Found). The page "https://www.mysite.com/loginpage" does exist (note please the httpS, since I'm not sure if this the key of my problem).
>
> If I try with
>
> -------
> resp = urllib2.urlopen(url)
> --------
> (with no 'login' data), it works ok but, obviously, I'm not logged in.

Note that adding a data parameter changes the request from a GET to a POST. I'd normally expect the server to respond 404 to both or neither, but it's theoretically possible. It's also possible that you're getting redirected, and that (maybe because cookies aren't being retained??) the destination is 404.

I'm not familiar with urllib2, but if you get a response object back, you can call .geturl() on it - no idea how that goes with HTTP errors, though. You may want to look at the exception's .reason attribute - might be more informative than .code.

As a last resort, try firing up Wireshark or something and watch exactly what gets sent and received. I went looking through the docs for a "verbose" mode or a "debug" setting but can't find one - that'd be ideal if it exists, though.

Hope that's of at least some help!

ChrisA
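A minimal Python 3 sketch of the points above (in Python 3, `urllib2` was split into `urllib.request` and `urllib.error`, and `urllib.urlencode` moved to `urllib.parse`). It assumes, as in the original post, that the site keeps the session in a cookie and expects form fields named `username` and `password`; `make_opener`, `build_login_request`, `describe_http_error` and `try_login` are hypothetical helper names. It shows the cookie-aware opener actually being used, that supplying `data` turns the request into a POST, and that an `HTTPError` can be inspected like a response (`.code`, `.reason`, `.geturl()`, `.read()`).

```python
import http.cookiejar
import urllib.error
import urllib.parse
import urllib.request


def make_opener():
    """Return an opener that stores and resends cookies, plus the jar it uses."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(url, username, password):
    """Form-encode the credentials; supplying data= makes urllib send a POST."""
    data = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(url, data=data.encode("ascii"))


def describe_http_error(err):
    """Gather the details worth printing when the server answers with an error."""
    return {
        "code": err.code,           # numeric status, e.g. 404
        "reason": err.reason,       # the server's reason phrase, e.g. "Not Found"
        "final_url": err.geturl(),  # URL that produced the error (after any redirects)
        "body": err.read(),         # error pages often explain what went wrong
    }


def try_login(url, username, password):
    """Attempt the login through the cookie-aware opener (performs a network request)."""
    opener, jar = make_opener()
    req = build_login_request(url, username, password)
    try:
        # opener.open(), unlike plain urlopen(), goes through the cookie handler
        with opener.open(req, timeout=10) as resp:
            print("status:", resp.status, "final URL:", resp.geturl())
            print("cookies:", [cookie.name for cookie in jar])
            return resp.read()
    except urllib.error.HTTPError as err:
        print("HTTP error:", describe_http_error(err))
        return None
```

If you would rather keep calling `urlopen()`, `urllib.request.install_opener(opener)` makes it use that opener (`urllib2.install_opener` in Python 2).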
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-01-12 23:44 +1100 |
| Message-ID | <mailman.5366.1389530687.18130.python-list@python.org> |
| In reply to | #63762 |
On Sun, Jan 12, 2014 at 11:17 PM, KMeans Algorithm <bilbaow@gmail.com> wrote:
> The page "https://www.mysite.com/loginpage" does exist

PS. If it's not an intranet site and the URL isn't secret, it'd help if we could actually try things out.

One of the tricks I like to use is to access the same page with a different program/library - maybe wget, or bare telnet, or something like that. Sometimes one succeeds and another doesn't, and then you dig into the difference (once I found that a web server failed unless the request headers were in a particular order - that was a pain to (a) find, and (b) work around!).

ChrisA
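wget and telnet are the tools suggested here; a rough in-Python analogue is to skip urllib and talk to the lower-level `http.client` (Python 2: `httplib`) directly, sending the request line and headers yourself with `putrequest()`/`putheader()`/`endheaders()`. That also gives exact control over header order, the problem described above. This is a sketch under assumptions: `raw_post` is a hypothetical helper, and the form field names are taken from the original post.

```python
import http.client
import urllib.parse


def raw_post(host, path, fields, headers, port=None, use_https=True, timeout=10):
    """POST form fields with headers sent in exactly the given order.

    Returns (status, reason, body) so the result can be compared with urllib's.
    """
    body = urllib.parse.urlencode(fields).encode("ascii")
    conn_cls = http.client.HTTPSConnection if use_https else http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=timeout)
    try:
        # putrequest() sends Host itself; skipping Accept-Encoding leaves
        # only the caller's headers (in order) followed by Content-Length
        conn.putrequest("POST", path, skip_accept_encoding=True)
        for name, value in headers:
            conn.putheader(name, value)
        conn.putheader("Content-Length", str(len(body)))
        conn.endheaders(body)
        resp = conn.getresponse()
        return resp.status, resp.reason, resp.read()
    finally:
        conn.close()
```

For example, `raw_post("www.mysite.com", "/loginpage", {"username": "john", "password": "foo"}, [("Content-Type", "application/x-www-form-urlencoded")])`; reordering the `headers` list is then a one-line change.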
| From | xDog Walker <thudfoo@gmail.com> |
|---|---|
| Date | 2014-01-12 07:17 -0800 |
| Message-ID | <mailman.5371.1389539937.18130.python-list@python.org> |
| In reply to | #63762 |
On Sunday 2014 January 12 04:42, Chris Angelico wrote:
> As a last resort, try firing up Wireshark or something and watch
> exactly what gets sent and received. I went looking through the docs
> for a "verbose" mode or a "debug" setting but can't find one - that'd
> be ideal if it exists, though.

I think you can set debug on httplib before using urllib to get the header traffic printed. I don't recall exactly how to do it though.

--
Yonder nor sorghum stenches shut ladle gulls stopper torque wet strainers.
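urllib's HTTP and HTTPS handlers do accept a `debuglevel` argument, which they pass on to the underlying httplib (Python 3: `http.client`) connection; the connection then prints each chunk it sends and the status line and headers it receives. A minimal Python 3 sketch; `make_debug_opener` is a hypothetical helper name, and in Python 2 the same keyword works on `urllib2.HTTPHandler` and `urllib2.HTTPSHandler`.

```python
import urllib.request


def make_debug_opener(level=1):
    """Opener that prints raw request/response traffic and keeps cookies."""
    return urllib.request.build_opener(
        urllib.request.HTTPHandler(debuglevel=level),   # plain http:// URLs
        urllib.request.HTTPSHandler(debuglevel=level),  # https:// URLs
        urllib.request.HTTPCookieProcessor(),           # session cookies between requests
    )
```

Any request made through it, e.g. `make_debug_opener().open("https://www.mysite.com/loginpage")`, prints lines beginning with `send:`, `reply:` and `header:` to standard output. For an https URL this shows the traffic before encryption, which Wireshark alone would only see as TLS records.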
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2014-01-12 15:51 -0500 |
| Message-ID | <mailman.5387.1389559888.18130.python-list@python.org> |
| In reply to | #63762 |
On 1/12/2014 7:17 AM, KMeans Algorithm wrote:
> But I get a "404" error (Not Found). The page "https://www.mysite.com/loginpage" does exist

Firefox tells me the same thing. If that is a phony address, you should have said so.

--
Terry Jan Reedy