| Started by | Luca Cerone <luca.cerone@gmail.com> |
|---|---|
| First post | 2013-08-20 15:24 -0700 |
| Last post | 2013-08-30 05:01 -0700 |
| Articles | 9 — 3 participants |
How to keep cookies when making http requests (Python 2.7) Luca Cerone <luca.cerone@gmail.com> - 2013-08-20 15:24 -0700
Re: How to keep cookies when making http requests (Python 2.7) dieter <dieter@handshake.de> - 2013-08-21 08:07 +0200
Re: How to keep cookies when making http requests (Python 2.7) Luca Cerone <luca.cerone@gmail.com> - 2013-08-21 01:18 -0700
Re: How to keep cookies when making http requests (Python 2.7) Fábio Santos <fabiosantosart@gmail.com> - 2013-08-21 10:15 +0100
Re: How to keep cookies when making http requests (Python 2.7) dieter <dieter@handshake.de> - 2013-08-22 08:08 +0200
Re: How to keep cookies when making http requests (Python 2.7) Luca Cerone <luca.cerone@gmail.com> - 2013-08-27 02:16 -0700
Re: How to keep cookies when making http requests (Python 2.7) Luca Cerone <luca.cerone@gmail.com> - 2013-08-27 03:17 -0700
Re: How to keep cookies when making http requests (Python 2.7) dieter <dieter@handshake.de> - 2013-08-28 06:52 +0200
Re: How to keep cookies when making http requests (Python 2.7) Luca Cerone <luca.cerone@gmail.com> - 2013-08-30 05:01 -0700
| From | Luca Cerone <luca.cerone@gmail.com> |
|---|---|
| Date | 2013-08-20 15:24 -0700 |
| Subject | How to keep cookies when making http requests (Python 2.7) |
| Message-ID | <7e79a9b4-0bf9-4756-afd4-3bc127360b95@googlegroups.com> |
Hi everybody,
I am trying to write a simple Python script to solve the "riddle" at:
http://quiz.gambitresearch.com/
The quiz is quite easy to solve: one needs to evaluate the expression between the curly brackets (say that the expression has value <val>)
and go to the web page:
http://quiz.gambitresearch.com/job/<val>
You have to be fast enough, because the page comes with a cookie that expires 1 sec after the first request, so you need to be quick to access the /job/<val> page.
[I know that this is the correct solution because a friend and I wrote a small script in JavaScript and could access the page with the email address]
As an exercise I have decided to try doing the same with Python.
First I have tried with the following code:
#START SCRIPT
import re
import urllib2
regex = re.compile(r'<span class=\'x\'>\{(.*)\}<span class=\'x\'>')
base_address = "http://quiz.gambitresearch.com/"
base_h = urllib2.urlopen(base_address)
base_page = base_h.read()
val = str(eval(regex.findall(base_page)[0]))
job_address = base_address + "job/" + val
job_h = urllib2.urlopen(job_address)
job_page = job_h.read()
print job_page
#END SCRIPT
job_page has the following content now: "WRONG! (Have you enabled cookies?)"
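The script above passes text fetched from the network to eval(), which will run whatever the page happens to contain. A minimal sketch of a narrower evaluator follows, assuming the quiz expression uses only numbers and the operators + - * / (the thread never shows a sample expression, so this is a guess). safe_arith is a made-up helper name, and the code is written for Python 3; Python 2.7's ast module does not have ast.Constant.

```python
import ast
import operator

# Operators the sketch accepts; anything else is rejected.
# Note: "/" is true division here, while Python 2.7's eval() would
# floor-divide two integers.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_arith(expression):
    """Evaluate a purely arithmetic expression without calling eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access and so on all end up here.
        raise ValueError("unsupported expression: %r" % (expression,))

    return _eval(ast.parse(expression.strip(), mode="eval"))


print(safe_arith("2 + 3 * 4"))   # 14
```

Anything that is not plain arithmetic, for example a function call, raises ValueError instead of being executed.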
Trying to solve the issue with the cookies, I found the "requests" module, which in theory should work.
I therefore rewrote the above script to use requests:
#START SCRIPT:
import re
import requests
regex = re.compile(r'<span class=\'x\'>\{(.*)\}<span class=\'x\'>')
base_address = "http://quiz.gambitresearch.com/"
s = requests.Session()
base_h = s.get('http://quiz.gambitresearch.com/')
base_page = base_h.text
val = eval( regex.findall( base_page )[0] )
job_address = base_address + "job/" + str(val)
job_h = s.get( job_address )
job_page = job_h.text
print job_page
#END SCRIPT
# print job_page produces "Wrong!".
According to the manual, when using Session() cookies should be enabled and persist for the whole session. In fact the cookies in base_h.cookies and in job_h.cookies seem to be the same:
base_h.cookies == job_h.cookies
#returns True
So, why does this script fail to access the job page?
How can I change it so that it works as intended and job_page prints
the content of the page that displays the email address to use for the job applications?
Thanks a lot in advance for the help!
Best Wishes,
Luca
| From | dieter <dieter@handshake.de> |
|---|---|
| Date | 2013-08-21 08:07 +0200 |
| Message-ID | <mailman.67.1377065271.19984.python-list@python.org> |
| In reply to | #52747 |
Luca Cerone <luca.cerone@gmail.com> writes:
> ...
Python has a module for cookie handling: "cookielib" ("http.cookiejar"
in Python 3).
"urllib2" has a standard way to integrate with this module.
However, I do not know the details (check the documentation
for the modules).
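For reference, the integration mentioned above is, as far as I know, HTTPCookieProcessor combined with build_opener. Below is a minimal sketch using the Python 3 module names; under Python 2.7 the same classes live in "urllib2" and "cookielib". make_cookie_opener is a made-up helper name, and the commented-out request is only an illustration.

```python
from http.cookiejar import CookieJar    # "cookielib" in Python 2.7
from urllib.request import HTTPCookieProcessor, build_opener  # "urllib2" in 2.7


def make_cookie_opener():
    """Return (opener, jar); every opener.open() call reads and updates jar."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = make_cookie_opener()

# Hypothetical usage (needs network access, so it is left commented out):
# page = opener.open("http://quiz.gambitresearch.com/").read()
# print([cookie.name for cookie in jar])
```

Requests made through the same opener share the jar, so a cookie set by the first response is sent with the next request automatically.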
I have used "cookielib" externally to "urllib2". It looks
like this:
from urllib2 import urlopen, Request
from cookielib import CookieJar
cookies = CookieJar()
....
r = Request(...)
cookies.add_cookie_header(r) # set the cookies
R = urlopen(r, ...) # make the request
cookies.extract_cookies(R, r) # remember the new cookies
| From | Luca Cerone <luca.cerone@gmail.com> |
|---|---|
| Date | 2013-08-21 01:18 -0700 |
| Message-ID | <8266faf3-892b-4e49-9b38-87f0030250fc@googlegroups.com> |
| In reply to | #52750 |
> I have used "cookielib" externally to "urllib2". It looks
> like this:
> from urllib2 import urlopen, Request
> from cookielib import CookieJar
> cookies = CookieJar()
> ....
> r = Request(...)
> cookies.add_cookie_header(r) # set the cookies
> R = urlopen(r, ...) # make the request
> cookies.extract_cookies(R, r) # remember the new cookies
Hi Dieter,
thanks a lot for the help.
I am sorry but your code is not very clear to me.
It seems that you are setting some cookies,
but I can't understand how you use the ones that the site
sends to you when you perform the initial request.
Have you tried this code to check if this works?
If it works as intended can you explain a bit better
what it does exactly?
Thanks again!
Luca
| From | Fábio Santos <fabiosantosart@gmail.com> |
|---|---|
| Date | 2013-08-21 10:15 +0100 |
| Message-ID | <mailman.71.1377076529.19984.python-list@python.org> |
| In reply to | #52754 |
On 21 Aug 2013 09:22, "Luca Cerone" <luca.cerone@gmail.com> wrote:
> > I have used "cookielib" externally to "urllib2". It looks
> > like this:
> > from urllib2 import urlopen, Request
> > from cookielib import CookieJar
> > cookies = CookieJar()
> > ....
> > r = Request(...)
> > cookies.add_cookie_header(r) # set the cookies
> > R = urlopen(r, ...) # make the request
> > cookies.extract_cookies(R, r) # remember the new cookies
>
> Hi Dieter,
> thanks a lot for the help.
> I am sorry but your code is not very clear to me.
> It seems that you are setting some cookies,
> but I can't understand how you use the ones that the site
> sends to you when you perform the initial request.
This example does both. The cookie jar adds the cookies to the http request to be sent to the server, and updates the cookies from the response, if any were sent. It seems pretty clear, seeing that it has a lot of comments.
The cookies from the site are thus in the cookie jar object after the call to extract_cookies() extracts them from the response.
> Have you tried this code to check if this works?
> If it works as intended can you explain a bit better
> what it does exactly?
You should really test this yourself ;)
> Thanks again!
> Luca
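The round trip described above can be tried without touching the network. The sketch below uses the Python 3 module names and a hand-made FakeResponse class (hypothetical, standing in for whatever urlopen() would return); the host quiz.example and the cookie session=abc123 are invented for the illustration.

```python
from email.message import Message
from http.cookiejar import CookieJar    # "cookielib" in Python 2.7
from urllib.request import Request      # "urllib2" in Python 2.7


class FakeResponse:
    """Stand-in for what urlopen() returns; the jar only calls info()."""

    def __init__(self, set_cookie_values):
        self._headers = Message()
        for value in set_cookie_values:
            # Assigning to a Message appends a header, it does not overwrite.
            self._headers["Set-Cookie"] = value

    def info(self):
        return self._headers


jar = CookieJar()

first = Request("http://quiz.example/")      # hypothetical host
jar.add_cookie_header(first)                 # jar is empty: nothing is added
sent_on_first = first.get_header("Cookie")

# Pretend the server answered with one session cookie.
jar.extract_cookies(FakeResponse(["session=abc123; Path=/"]), first)

second = Request("http://quiz.example/job/42")
jar.add_cookie_header(second)                # the jar now holds the cookie
sent_on_second = second.get_header("Cookie")

print(sent_on_first)     # None
print(sent_on_second)    # session=abc123
```

The first request carries no Cookie header because the jar starts empty; the second one carries the cookie that extract_cookies() stored from the (fake) response.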
| From | dieter <dieter@handshake.de> |
|---|---|
| Date | 2013-08-22 08:08 +0200 |
| Message-ID | <mailman.118.1377151743.19984.python-list@python.org> |
| In reply to | #52754 |
Luca Cerone <luca.cerone@gmail.com> writes:
>...
> Have you tried this code to check if this works?
Not this code, but code like this (as I have written).
> If it works as intended can you explain a bit better
> what it does exactly?
Fabio already did the explanation.
Let me make an additional remark however: you should
not expect to get complete details in a list like this - but only
hints towards a solution for your problem (i.e.
there remains some work for you).
Thus, I expect you to read the "cookielib/cookiejar" documentation
(part of Python's standard documentation) in order to understand
my example code - before I would be ready to provide further details.
| From | Luca Cerone <luca.cerone@gmail.com> |
|---|---|
| Date | 2013-08-27 02:16 -0700 |
| Message-ID | <5970a1d3-ec1b-4892-b53a-907de332ecaa@googlegroups.com> |
| In reply to | #52813 |
Dear all,
first of all thanks for the help.
As for your remark, you are right, and I usually tend to post questions in a way that is detached from the particular problem I have to solve.
In this case since I only have a limited knowledge of the cookies mechanism (in general, not only in Python), I preferred to ask for the specific case.
I am sorry if I gave you the impression I didn't appreciate your answer,
it was absolutely not my intention.
Cheers,
Luca
> Let me make an additional remark however: you should
> not expect to get complete details in a list like this - but only
> hints towards a solution for your problem (i.e.
> there remains some work for you).
> Thus, I expect you to read the "cookielib/cookiejar" documentation
> (part of Python's standard documentation) in order to understand
> my example code - before I would be ready to provide further details.
| From | Luca Cerone <luca.cerone@gmail.com> |
|---|---|
| Date | 2013-08-27 03:17 -0700 |
| Message-ID | <62a5fb2d-b4d0-4de4-a0d8-a7ee4dbb1b90@googlegroups.com> |
| In reply to | #53040 |
>
> > Let me make an additional remark however: you should
> > not expect to get complete details in a list like this - but only
> > hints towards a solution for your problem (i.e.
> > there remains some work for you).
> > Thus, I expect you to read the "cookielib/cookiejar" documentation
> > (part of Python's standard documentation) in order to understand
> > my example code - before I would be ready to provide further details.
Ok so after reading the documentation for urllib2 and cookielib I came up with the following code:
#START
from urllib2 import urlopen , Request
from cookielib import CookieJar
import re
regex = re.compile(r'<span class=\'x\'>\{(.*)\}<span class=\'x\'>')
base_url = "http://quiz.gambitresearch.com"
job_url = base_url + "/job/"
cookies = CookieJar()
r = Request(base_url) #prepare the request object
cookies.add_cookie_header(r) #allow to have cookies
R = urlopen(r) #read the url
cookies.extract_cookies(R,r) #take the cookies from the response R and adds #them to the request object
#build the new url
t = R.read()
v = str(eval(regex.findall(t)[0]))
job_url = job_url + v
# Here I create a new request to the url containing the email address
r2 = Request(job_url)
cookies.add_cookie_header(r2) #I prepare the request for cookies adding the cookies that I extracted before.
#perform the request and print the page
R2 = urlopen(r2)
t2 = R2.read()
print job_url
print t2
#END
This still doesn't work, but I really can't understand why.
As far as I have understood first I have to instantiate a Request object
and allow it to receive and set cookies (I do this with r = Request() and cookies.add_cookie_header(r))
Next I perform the request (urlopen), save the cookies in the CookieJar (cookies.extract_cookies(R,r)).
I evaluate the new address and I create a new Request for it (r2 = Request)
I add the cookies stored in the cookiejar in my new request (cookies.add_cookie_header(r2))
Then I perform the request (R2 = urlopen(r2)) and read the page (t2 = R2.read())
What am I doing wrong? Do I misunderstand something in the process?
Thanks again in advance for the help,
Cheers,
Luca
| From | dieter <dieter@handshake.de> |
|---|---|
| Date | 2013-08-28 06:52 +0200 |
| Message-ID | <mailman.289.1377665572.19984.python-list@python.org> |
| In reply to | #53050 |
Luca Cerone <luca.cerone@gmail.com> writes:
> ...
> Ok so after reading the documentation for urllib2 and cookielib I came up with the following code:
>
> #START
> from urllib2 import urlopen , Request
> from cookielib import CookieJar
> import re
> regex = re.compile(r'<span class=\'x\'>\{(.*)\}<span class=\'x\'>')
>
> base_url = "http://quiz.gambitresearch.com"
> job_url = base_url + "/job/"
>
> cookies = CookieJar()
> r = Request(base_url) #prepare the request object
> cookies.add_cookie_header(r) #allow to have cookies
> R = urlopen(r) #read the url
> cookies.extract_cookies(R,r) #take the cookies from the response R and adds #them to the request object
"adds them to the request object" should be "adds them to the cookie jar".
> #build the new url
> t = R.read()
> v = str(eval(regex.findall(t)[0]))
> job_url = job_url + v
>
>
> # Here I create a new request to the url containing the email address
> r2 = Request(job_url)
> cookies.add_cookie_header(r2) #I prepare the request for cookies adding the cookies that I extracted before.
>
> #perform the request and print the page
> R2 = urlopen(r2)
> t2 = R2.read()
> print job_url
> print t2
> #END
>
> This still doesn't work, but I really can't understand why.
> As far as I have understood first I have to instantiate a Request object
> and allow it to receive and set cookies (I do this with r = Request() and cookies.add_cookie_header(r))
> Next I perform the request (urlopen), save the cookies in the CookieJar (cookies.extract_cookies(R,r)).
>
> I evaluate the new address and I create a new Request for it (r2 = Request)
> I add the cookies stored in the cookiejar in my new request (cookies.add_cookie_header(r2))
> Then I perform the request (R2 = urlopen(r2)) and read the page (t2 = R2.read())
>
> What am I doing wrong?
With respect to cookie handling, you do everything right.
There may be other problems with the (wider) process.
Analysing the responses of your requests (reading the status codes,
the response headers and the response bodies) may provide hints
towards the problem.
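A possible way to do that kind of analysis is sketched below, again with the Python 3 module names. describe_response and fetch_and_describe are made-up helper names, and the opener argument is assumed to be anything with an open(url) method, such as the result of build_opener().

```python
from urllib.error import HTTPError    # lives in "urllib2" in Python 2.7


def describe_response(response, body_limit=300):
    """Collect status code, headers and the start of the body into a dict."""
    return {
        "status": response.getcode(),
        "headers": list(response.info().items()),
        "body": response.read(body_limit),
    }


def fetch_and_describe(opener, url):
    """Open url and describe the reply, including 4xx/5xx error replies."""
    try:
        return describe_response(opener.open(url))
    except HTTPError as error:
        # An HTTPError is itself a response object with the same methods.
        return describe_response(error)
```

Comparing the "headers" entry of the first reply (is there a Set-Cookie line, and with which attributes?) with what is sent on the second request is probably the quickest check.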
>Do I misunderstand something in the process?
Not with respect to cookie handling.
| From | Luca Cerone <luca.cerone@gmail.com> |
|---|---|
| Date | 2013-08-30 05:01 -0700 |
| Message-ID | <a2b80368-2414-4386-875c-0bbc8a3d913f@googlegroups.com> |
| In reply to | #53109 |
Thanks Dieter,
> With respect to cookie handling, you do everything right.
> There may be other problems with the (wider) process.
> Analysing the responses of your requests (reading the status codes,
> the response headers and the response bodies) may provide hints
> towards the problem.
I will try to do that and try to see if I can figure out why.
> >Do I misunderstand something in the process?
> Not with respect to cookie handling.