Groups > comp.lang.python > #52747 > unrolled thread

How to keep cookies when making http requests (Python 2.7)

Started by	Luca Cerone <luca.cerone@gmail.com>
First post	2013-08-20 15:24 -0700
Last post	2013-08-30 05:01 -0700
Articles	9 — 3 participants


  How to keep cookies when making http requests (Python 2.7) Luca Cerone <luca.cerone@gmail.com> - 2013-08-20 15:24 -0700
    Re: How to keep cookies when making http requests (Python 2.7) dieter <dieter@handshake.de> - 2013-08-21 08:07 +0200
      Re: How to keep cookies when making http requests (Python 2.7) Luca Cerone <luca.cerone@gmail.com> - 2013-08-21 01:18 -0700
        Re: How to keep cookies when making http requests (Python 2.7) Fábio Santos <fabiosantosart@gmail.com> - 2013-08-21 10:15 +0100
        Re: How to keep cookies when making http requests (Python 2.7) dieter <dieter@handshake.de> - 2013-08-22 08:08 +0200
          Re: How to keep cookies when making http requests (Python 2.7) Luca Cerone <luca.cerone@gmail.com> - 2013-08-27 02:16 -0700
            Re: How to keep cookies when making http requests (Python 2.7) Luca Cerone <luca.cerone@gmail.com> - 2013-08-27 03:17 -0700
              Re: How to keep cookies when making http requests (Python 2.7) dieter <dieter@handshake.de> - 2013-08-28 06:52 +0200
                Re: How to keep cookies when making http requests (Python 2.7) Luca Cerone <luca.cerone@gmail.com> - 2013-08-30 05:01 -0700

#52747 — How to keep cookies when making http requests (Python 2.7)

From	Luca Cerone <luca.cerone@gmail.com>
Date	2013-08-20 15:24 -0700
Subject	How to keep cookies when making http requests (Python 2.7)
Message-ID	<7e79a9b4-0bf9-4756-afd4-3bc127360b95@googlegroups.com>

Hi everybody,
I am trying to write a simple Python script to solve the "riddle" at:
http://quiz.gambitresearch.com/

The quiz is quite easy to solve: evaluate the expression between the curly brackets (say it has value <val>)
and go to the web page:

http://quiz.gambitresearch.com/job/<val>

You have to be quick, because the page comes with a cookie that expires 1 second after the first request, so the /job/<val> page must be fetched within that time.

[I know that this is the correct solution because with a friend we wrote a small script in JavaScript and could access the page with the email address]

As an exercise I have decided to try doing the same with Python.

First I have tried with the following code:

#START SCRIPT

import re
import urllib2

regex = re.compile(r'<span class=\'x\'>\{(.*)\}<span class=\'x\'>')
base_address = "http://quiz.gambitresearch.com/"
base_h = urllib2.urlopen(base_address)
base_page = base_h.read()

val = str(eval(regex.findall(base_page)[0]))

job_address = base_address + "job/" + val
job_h = urllib2.urlopen(job_address)
job_page = job_h.read()

print job_page
#END SCRIPT

job_page has the following content now: "WRONG! (Have you enabled cookies?)"

Trying to solve the issue with the cookies, I found the "requests" module, which in theory should work.
I therefore rewrote the above script to use requests:

#START SCRIPT:
import re
import requests

regex = re.compile(r'<span class=\'x\'>\{(.*)\}<span class=\'x\'>')

base_address = "http://quiz.gambitresearch.com/"

s = requests.Session()
 
base_h = s.get('http://quiz.gambitresearch.com/')
base_page = base_h.text

val = eval( regex.findall( base_page )[0] )

job_address = base_address + "job/" + str(val)
job_h = s.get( job_address )
job_page = job_h.text

print job_page
#END SCRIPT
# print job_page produces "Wrong!".

According to the manual, when using Session() the cookies should be enabled and persist for the whole session. In fact the cookies in base_h.cookies and in job_h.cookies seem to be the same:

base_h.cookies == job_h.cookies
#returns True

So, why does this script fail to access the job page?
How can I change it so that it works as intended and job_page prints
the content of the page that displays the email address to use for the job applications?

Thanks a lot in advance for the help!

Best Wishes,
Luca
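
An aside on the scripts above: both call eval() on text taken from a web page, which would run whatever code the server chose to send. A minimal sketch of a safer alternative, assuming the quiz expression uses only numbers, parentheses and the operators + - * / // %, is to walk the parsed expression with the ast module. The name safe_arith is hypothetical, and the sketch is written for Python 3, where / is true division (in Python 2.7 it floors for integers).

```python
import ast
import operator

# Operators the evaluator accepts; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_arith(expression):
    """Evaluate a purely arithmetic expression without calling eval()."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        # Accept plain int/float literals only (bool and str are refused).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression: %r" % expression)

    return walk(ast.parse(expression.strip(), mode="eval"))


print(safe_arith("2 + 3 * (4 - 1)"))  # prints 11
```

Names, calls and attribute access all fall through to the ValueError, so a string such as "__import__('os')" is refused instead of executed.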


#52750

From	dieter <dieter@handshake.de>
Date	2013-08-21 08:07 +0200
Message-ID	<mailman.67.1377065271.19984.python-list@python.org>
In reply to	#52747

Luca Cerone <luca.cerone@gmail.com> writes:
> ...

Python has a module for cookie handling: "cookielib" ("http.cookiejar"
in Python 3).

"urllib2" has a standard way to integrate with this module.
However, I do not know the details (check the documentation
for the modules).

I have used "cookielib" externally to "urllib2". It looks
like this:

from urllib2 import urlopen, Request
from cookielib import CookieJar

cookies = CookieJar()
....
r = Request(...)
cookies.add_cookie_header(r) # set the cookies
R = urlopen(r, ...) # make the request
cookies.extract_cookies(R, r) # remember the new cookies
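
For the "standard way" mentioned above: urllib2 provides HTTPCookieProcessor, which can be installed in an opener so that the jar is consulted and updated on every request made through it. A minimal sketch, written for Python 3 (urllib.request and http.cookiejar; in Python 2.7 the same names live in urllib2 and cookielib). make_cookie_opener is a hypothetical helper name.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener


def make_cookie_opener():
    """Return (opener, jar); every request made through opener shares jar."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = make_cookie_opener()
# Both calls below would go through the same jar, so cookies set by the first
# response are sent back with the second request (URLs as in the thread):
#   page = opener.open("http://quiz.gambitresearch.com/").read()
#   job = opener.open("http://quiz.gambitresearch.com/job/" + val).read()
```

With this approach there is no need to call add_cookie_header() and extract_cookies() by hand; the processor does both around each request.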


#52754

From	Luca Cerone <luca.cerone@gmail.com>
Date	2013-08-21 01:18 -0700
Message-ID	<8266faf3-892b-4e49-9b38-87f0030250fc@googlegroups.com>
In reply to	#52750

> I have used "cookielib" externally to "urllib2". It looks
> like this:
>
> from urllib2 import urlopen, Request
> from cookielib import CookieJar
>
> cookies = CookieJar()
> ....
> r = Request(...)
> cookies.add_cookie_header(r) # set the cookies
> R = urlopen(r, ...) # make the request
> cookies.extract_cookies(R, r) # remember the new cookies

Hi Dieter,
thanks a lot for the help.
I am sorry but your code is not very clear to me.
It seems that you are setting some cookies,
but I can't understand how you use the ones that the site
sends to you when you perform the initial request.

Have you tried this code to check whether it works?
If it works as intended can you explain a bit better
what it does exactly?

Thanks again!
Luca


#52758

From	Fábio Santos <fabiosantosart@gmail.com>
Date	2013-08-21 10:15 +0100
Message-ID	<mailman.71.1377076529.19984.python-list@python.org>
In reply to	#52754


On 21 Aug 2013 09:22, "Luca Cerone" <luca.cerone@gmail.com> wrote:
> > I have used "cookielib" externally to "urllib2". It looks
> > like this:
> >
> > from urllib2 import urlopen, Request
> > from cookielib import CookieJar
> >
> > cookies = CookieJar()
> > ....
> > r = Request(...)
> > cookies.add_cookie_header(r) # set the cookies
> > R = urlopen(r, ...) # make the request
> > cookies.extract_cookies(R, r) # remember the new cookies
>
> Hi Dieter,
> thanks a lot for the help.
> I am sorry but your code is not very clear to me.
> It seems that you are setting some cookies,
> but I can't understand how you use the ones that the site
> sends to you when you perform the initial request.

This example does both. The cookie jar adds the cookies to the http request
to be sent to the server, and updates the cookies from the response, if any
were sent. It seems pretty clear, seeing that it has a lot of comments.

The cookies from the site are thus in the cookie jar object after the call
to extract_cookies() extracts them from the response.

> Have you tried this code to check whether it works?
> If it works as intended can you explain a bit better
> what it does exactly?

You should really test this yourself ;)

> Thanks again!
> Luca
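
The two calls can be tried without touching the network. A rough sketch in Python 3 (http.cookiejar instead of cookielib), where FakeResponse and the quiz.example.com URLs are made-up stand-ins for a real server:

```python
from email.message import Message
from http.cookiejar import CookieJar
from urllib.request import Request


class FakeResponse:
    """Hypothetical stand-in for a response carrying one Set-Cookie header."""

    def __init__(self, set_cookie):
        self._headers = Message()
        self._headers["Set-Cookie"] = set_cookie

    def info(self):
        # extract_cookies() reads the response headers through info().
        return self._headers


jar = CookieJar()

first = Request("http://quiz.example.com/")
jar.add_cookie_header(first)              # jar is empty: nothing is added
sent_before = first.get_header("Cookie")  # None

# The "server" answers with a cookie; the jar remembers it.
jar.extract_cookies(FakeResponse("session=xyz; Path=/"), first)

second = Request("http://quiz.example.com/job/42")
jar.add_cookie_header(second)             # jar now supplies the cookie
sent_after = second.get_header("Cookie")  # "session=xyz"

print(sent_before, sent_after)
```

So add_cookie_header() copies cookies from the jar into an outgoing request, and extract_cookies() copies cookies from a response into the jar.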


#52813

From	dieter <dieter@handshake.de>
Date	2013-08-22 08:08 +0200
Message-ID	<mailman.118.1377151743.19984.python-list@python.org>
In reply to	#52754

Luca Cerone <luca.cerone@gmail.com> writes:

>...
> Have you tried this code to check whether it works?

Not this code, but code like this (as I have written).

> If it works as intended can you explain a bit better
> what it does exactly?

Fabio already did the explanation.

Let me make an additional remark however: you should
not expect to get complete details in a list like this - but only
hints towards a solution for your problem (i.e.
there remains some work for you).
Thus, I expect you to read the "cookielib/cookiejar" documentation
(part of Python's standard documentation) in order to understand
my example code - before I would be ready to provide further details.


#53040

From	Luca Cerone <luca.cerone@gmail.com>
Date	2013-08-27 02:16 -0700
Message-ID	<5970a1d3-ec1b-4892-b53a-907de332ecaa@googlegroups.com>
In reply to	#52813

Dear all, 
first of all thanks for the help.
As for your remark, you are right, and I usually tend to post questions in a way that is detached from the particular problem I have to solve.
In this case since I only have a limited knowledge of the cookies mechanism (in general, not only in Python), I preferred to ask for the specific case.
I am sorry if I gave you the impression I didn't appreciate your answer,
it was absolutely not my intention.

Cheers,
Luca
> Let me make an additional remark however: you should
> not expect to get complete details in a list like this - but only
> hints towards a solution for your problem (i.e.
> there remains some work for you).
> Thus, I expect you to read the "cookielib/cookiejar" documentation
> (part of Python's standard documentation) in order to understand
> my example code - before I would be ready to provide further details.


#53050

From	Luca Cerone <luca.cerone@gmail.com>
Date	2013-08-27 03:17 -0700
Message-ID	<62a5fb2d-b4d0-4de4-a0d8-a7ee4dbb1b90@googlegroups.com>
In reply to	#53040

> 
> > Let me make an additional remark however: you should
> > not expect to get complete details in a list like this - but only
> > hints towards a solution for your problem (i.e. 
> > there remains some work for you).
> > Thus, I expect you to read the "cookielib/cookiejar" documentation
> > (part of Python's standard documentation) in order to understand
> > my example code - before I would be ready to provide further details.

Ok so after reading the documentation for urllib2 and cookielib I came up with the following code:

#START
from urllib2 import urlopen , Request
from cookielib import CookieJar
import re
regex = re.compile(r'<span class=\'x\'>\{(.*)\}<span class=\'x\'>')

base_url = "http://quiz.gambitresearch.com"
job_url  = base_url + "/job/"

cookies = CookieJar()
r = Request(base_url) #prepare the request object
cookies.add_cookie_header(r) #allow to have cookies
R = urlopen(r) #read the url
cookies.extract_cookies(R,r) #take the cookies from the response R and adds #them to the request object 

#build the new url
t = R.read()
v = str(eval(regex.findall(t)[0]))
job_url = job_url + v


# Here I create a new request to the url containing the email address
r2 = Request(job_url)
cookies.add_cookie_header(r2) #I prepare the request for cookies adding the cookies that I extracted before.

#perform the request and print the page
R2 = urlopen(r2)
t2 = R2.read()
print job_url
print t2
#END

This still doesn't work, but I really can't understand why.
As far as I have understood first I have to instantiate a Request object
and allow it to receive and set cookies (I do this with r = Request() and cookies.add_cookie_header(r))
Next I perform the request (urlopen),  save the cookies in the CookieJar (cookies.extract_cookies(R,r)).

I evaluate the new address and I create a new Request for it (r2 = Request)
I add the cookies stored in the cookiejar in my new request (cookies.add_cookie_header(r2))
Then I perform the request (R2 = urlopen(r2)) and read the page (t2 = R2.read())

What am I doing wrong? Do I misunderstand something in the process?

Thanks again in advance for the help,
Cheers,
Luca


#53109

From	dieter <dieter@handshake.de>
Date	2013-08-28 06:52 +0200
Message-ID	<mailman.289.1377665572.19984.python-list@python.org>
In reply to	#53050

Luca Cerone <luca.cerone@gmail.com> writes:

> ...
> Ok so after reading the documentation for urllib2 and cookielib I came up with the following code:
>
> #START
> from urllib2 import urlopen , Request
> from cookielib import CookieJar
> import re
> regex = re.compile(r'<span class=\'x\'>\{(.*)\}<span class=\'x\'>')
>
> base_url = "http://quiz.gambitresearch.com"
> job_url  = base_url + "/job/"
>
> cookies = CookieJar()
> r = Request(base_url) #prepare the request object
> cookies.add_cookie_header(r) #allow to have cookies
> R = urlopen(r) #read the url
> cookies.extract_cookies(R,r) #take the cookies from the response R and adds #them to the request object 

"adds them to the request object" should be "adds them to the cookie jar".

> #build the new url
> t = R.read()
> v = str(eval(regex.findall(t)[0]))
> job_url = job_url + v
>
>
> # Here I create a new request to the url containing the email address
> r2 = Request(job_url)
> cookies.add_cookie_header(r2) #I prepare the request for cookies adding the cookies that I extracted before.
>
> #perform the request and print the page
> R2 = urlopen(r2)
> t2 = R2.read()
> print job_url
> print t2
> #END
>
> This still doesn't work, but I really can't understand why.
> As far as I have understood first I have to instantiate a Request object
> and allow it to receive and set cookies (I do this with r = Request() and cookies.add_cookie_header(r))
> Next I perform the request (urlopen),  save the cookies in the CookieJar (cookies.extract_cookies(R,r)).
>
> I evaluate the new address and I create a new Request for it (r2 = Request)
> I add the cookies stored in the cookiejar in my new request (cookies.add_cookie_header(r2))
> Then I perform the request (R2 = urlopen(r2)) and read the page (t2 = R2.read())
>
> What am I doing wrong?

With respect to cookie handling, you do everything right.

There may be other problems with the (wider) process.
Analysing the responses of your requests (reading the status codes,
the response headers and the response bodies) may provide hints
towards the problem.

>Do I misunderstand something in the process?

Not with respect to cookie handling.
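
A rough sketch of that kind of inspection, in Python 3 (urllib.request and http.cookiejar instead of urllib2 and cookielib). QuizLikeHandler is a hypothetical local stand-in, not the real quiz site: it sets a cookie on "/" and answers 403 "WRONG!" on any other path when the cookie is missing. The 60-second Max-Age and the describe and run_demo names are assumptions made for the demonstration.

```python
import threading
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import HTTPCookieProcessor, ProxyHandler, build_opener


class QuizLikeHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server, not the real quiz site."""

    def do_GET(self):
        sent_cookie = self.headers.get("Cookie") or ""
        if self.path == "/":
            status, body, set_cookie = 200, b"{1+1}", "quiz=1; Max-Age=60; Path=/"
        elif "quiz=1" in sent_cookie:
            status, body, set_cookie = 200, b"OK", None
        else:
            status, body, set_cookie = 403, b"WRONG!", None
        self.send_response(status)
        if set_cookie:
            self.send_header("Set-Cookie", set_cookie)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def describe(opener, jar, url):
    """Fetch url and collect status, headers, body and the jar's cookies."""
    try:
        response = opener.open(url)
    except HTTPError as error:
        response = error  # an HTTPError also carries status, headers and body
    return {
        "status": response.getcode(),
        "headers": dict(response.headers.items()),
        "body": response.read().decode("utf-8", "replace"),
        "cookies": [(c.name, c.value, c.expires) for c in jar],
    }


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), QuizLikeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    try:
        jar = CookieJar()
        # ProxyHandler({}) bypasses any proxy configured in the environment.
        opener = build_opener(ProxyHandler({}), HTTPCookieProcessor(jar))
        first = describe(opener, jar, base + "/")
        second = describe(opener, jar, base + "/job/2")
        # Same request without any cookie handling, for comparison.
        bare = describe(build_opener(ProxyHandler({})), CookieJar(), base + "/job/2")
        return first, second, bare
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for report in run_demo():
        print(report["status"], report["body"], report["cookies"])
```

Comparing the reports for the cookie-aware and the bare request shows at a glance whether a cookie was received, when it expires, and whether it was accepted on the follow-up request.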


#53291

From	Luca Cerone <luca.cerone@gmail.com>
Date	2013-08-30 05:01 -0700
Message-ID	<a2b80368-2414-4386-875c-0bbc8a3d913f@googlegroups.com>
In reply to	#53109

Thanks Dieter,
 
> With respect to cookie handling, you do everything right.
>
> There may be other problems with the (wider) process.
> Analysing the responses of your requests (reading the status codes,
> the response headers and the response bodies) may provide hints
> towards the problem.

I will try to do that and try to see if I can figure out why.

> >Do I misunderstand something in the process?
>
> Not with respect to cookie handling.

