Date: Sun, 12 Jan 2014 23:42:22 +1100
Subject: Re: Python: 404 Error when trying to login a webpage by using 'urllib' and 'HTTPCookieProcessor'
From: Chris Angelico
Cc: "python-list@python.org"
Newsgroups: comp.lang.python

On Sun, Jan 12, 2014 at 11:17 PM, KMeans Algorithm wrote:
> What am I doing wrong? Thank you very much.

I can't say what's actually wrong, but I have a few ideas for getting
more information out of the system...

> opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())

You don't do anything with this opener - could you have a cookie
problem?

> req = urllib2.Request(url, login)
>
> But I get a "404" error (Not Found). The page
> "https://www.mysite.com/loginpage" does exist (note please the httpS,
> since I'm not sure if this the key of my problem).
>
> If I try with
>
> -------
> resp = urllib2.urlopen(url)
> --------
> (with no 'login' data), it works ok but, obviously, I'm not logged in.

Note that adding a data parameter changes the request from a GET to a
POST. I'd normally expect the server to respond 404 to both or
neither, but it's theoretically possible.

It's also possible that you're getting redirected, and that (maybe
because cookies aren't being retained??) the destination is 404. I'm
not familiar with urllib2, but if you get a response object back, you
can call .geturl() on it - no idea how that goes with HTTP errors,
though. You may want to look at the exception's .reason attribute -
might be more informative than .code.

As a last resort, try firing up Wireshark or something and watch
exactly what gets sent and received. I went looking through the docs
for a "verbose" mode or a "debug" setting but can't find one - that'd
be ideal if it exists, though.

Hope that's of at least some help!

ChrisA
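
A rough sketch of the checks suggested above, in case it helps. It is written with the Python 3 module names (urllib.request, urllib.error, http.cookiejar); under Python 2 the same classes live in urllib2 and cookielib. The helper names build_cookie_opener, describe_http_error and try_login are made up for illustration and are not from the original post. It assumes the handlers' debuglevel argument is an acceptable stand-in for the "debug" setting, and that an HTTPError can be inspected much like a response object.

```python
import urllib.error
import urllib.request
from http.cookiejar import CookieJar


def build_cookie_opener(debuglevel=0):
    """Return (opener, jar). The opener stores cookies in jar across requests."""
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar),
        # With debuglevel=1, http.client prints the traffic it sends and receives.
        urllib.request.HTTPHandler(debuglevel=debuglevel),
        urllib.request.HTTPSHandler(debuglevel=debuglevel),
    )
    return opener, jar


def describe_http_error(err):
    """Collect the HTTPError fields that may explain where the 404 came from."""
    return {"code": err.code, "reason": err.reason, "url": err.geturl()}


def try_login(opener, url, login):
    """Send login (bytes) to url through opener rather than plain urlopen()."""
    req = urllib.request.Request(url, data=login)
    # Supplying data switches the request method from GET to POST.
    print(req.get_method(), req.full_url)
    try:
        resp = opener.open(req)
    except urllib.error.HTTPError as err:
        # After a redirect, err.geturl() is the final URL, not the one requested.
        return describe_http_error(err)
    return {"code": resp.getcode(), "reason": "", "url": resp.geturl()}
```

Something like try_login(opener, "https://www.mysite.com/loginpage", login) would then go through the cookie-aware opener. Here login has to be bytes, for example the result of urllib.parse.urlencode(...).encode() applied to whatever form fields the site expects. If the returned "url" differs from the one requested, a redirect is the likely source of the 404.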