Groups > comp.lang.python > #33190 > unrolled thread

Bugs: Content-Length not updated by reused urllib.request.Request / has_header() case-sensitive

Started by	Johannes Kleese <j.kleese@arcor.de>
First post	2012-11-12 16:52 +0100
Last post	2012-11-27 18:24 -0500
Articles	5 — 2 participants


  Bugs: Content-Length not updated by reused urllib.request.Request / has_header() case-sensitive Johannes Kleese <j.kleese@arcor.de> - 2012-11-12 16:52 +0100
    Re: Bugs: Content-Length not updated by reused urllib.request.Request / has_header() case-sensitive Terry Reedy <tjreedy@udel.edu> - 2012-11-12 16:35 -0500
      Re: Bugs: Content-Length not updated by reused urllib.request.Request / has_header() case-sensitive Johannes Kleese <j.kleese@arcor.de> - 2012-11-13 08:24 +0100
    Re: Bugs: Content-Length not updated by reused urllib.request.Request / has_header() case-sensitive Terry Reedy <tjreedy@udel.edu> - 2012-11-12 20:58 -0500
    Re: Bugs: Content-Length not updated by reused urllib.request.Request / has_header() case-sensitive Terry Reedy <tjreedy@udel.edu> - 2012-11-27 18:24 -0500

#33190 — Bugs: Content-Length not updated by reused urllib.request.Request / has_header() case-sensitive

From	Johannes Kleese <j.kleese@arcor.de>
Date	2012-11-12 16:52 +0100
Subject	Bugs: Content-Length not updated by reused urllib.request.Request / has_header() case-sensitive
Message-ID	<50a12949$0$6566$9b4e6d93@newsspool3.arcor-online.net>

Hi!

(Yes, I did take a look at the issue tracker but couldn't find any
corresponding bug, and no, I don't want to open a new account just for
this one.)

--------------------------------------------------------------------

I'm reusing a single urllib.request.Request object to HTTP-POST data to
the same URL a number of times. While the data itself is sent as
expected every time, the Content-Length header is not updated after the
first request. Tested with Python 3.1.3 and Python 3.1.4.

>>> opener = urllib.request.build_opener()
>>> request = urllib.request.Request("http://example.com/", headers =
... {"Content-Type": "application/x-www-form-urlencoded"})


>>> opener.open(request, "1".encode("us-ascii"))

>>> request.data
b'1'
>>> request.header_items()
[('Content-length', '1'), ('Content-type',
'application/x-www-form-urlencoded'), ('Host', 'example.com'),
('User-agent', 'Python-urllib/3.1')]


>>> opener.open(request, "123456789".encode("us-ascii"))

>>> request.data
b'123456789'
>>> request.header_items()
[('Content-length', '1'), ('Content-type',
'application/x-www-form-urlencoded'), ('Host', 'example.com'),
('User-agent', 'Python-urllib/3.1')]

Note that after the second run, Content-Length stays "1", but it should be
"9", matching the data b'123456789'. (To keep the test case short, the
request data is not actually x-www-form-urlencoded; that doesn't affect
the bug.)
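Until the behaviour changes, one workaround that stays within the public API is to build a fresh Request for every POST instead of reusing one. The handler adds Content-length only when the request does not already carry that header, so a new Request always gets a length computed from its own data. A minimal offline sketch, assuming this is acceptable for your use case; `make_post`, `URL` and `FORM_HEADERS` are hypothetical names for illustration:

```python
import urllib.request

URL = "http://example.com/"  # placeholder URL used in the thread
FORM_HEADERS = {"Content-Type": "application/x-www-form-urlencoded"}


def make_post(payload):
    """Hypothetical helper: build a new Request for every payload.

    The HTTP handler only adds Content-length when the request does not
    already have one, so a fresh Request never carries a stale length.
    """
    return urllib.request.Request(URL, data=payload, headers=FORM_HEADERS)


posts = [make_post(p) for p in (b"1", b"123456789")]
for req in posts:
    # Before sending, only the explicit Content-type header is present;
    # Content-length is filled in by the handler at send time.
    print(req.get_method(), req.data, req.header_items())

# Sending requires network access, e.g.:
# opener = urllib.request.build_opener()
# for req in posts:
#     opener.open(req)
```

Note that adding your own Content-Length to the reused object is not a reliable fix on these versions: the handler stores its value among the "unredirected" headers, and as far as I can tell from the source, those take precedence when the request is actually sent.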




--------------------------------------------------------------------

While at it, I noticed that urllib.request.Request.has_header() and
.get_header() are case-sensitive, while HTTP headers are not (RFC 2616,
4.2). Thus the following, slightly unfortunate behaviour:

>>> request.header_items()
[('Content-length', '1'), ('Content-type',
'application/x-www-form-urlencoded'), ('Host', 'example.com'),
('User-agent', 'Python-urllib/3.1')]

>>> request.has_header("Content-Type")
False
>>> request.has_header("Content-type")
True
>>> request.get_header("Content-Type")
>>> request.get_header("Content-type")
'application/x-www-form-urlencoded'
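If you need an RFC-style lookup today, one option is to compare names case-insensitively yourself over header_items(), which returns both the normal and the "unredirected" headers. A minimal sketch; `find_header` is a hypothetical helper, not part of urllib.request:

```python
import urllib.request


def find_header(request, name, default=None):
    """Hypothetical helper: look up a header value ignoring case, the way
    RFC 2616, section 4.2 treats field names.

    header_items() covers both normal and unredirected headers (such as
    the Content-length and Host that the handler adds at send time).
    """
    wanted = name.lower()
    for key, value in request.header_items():
        if key.lower() == wanted:
            return value
    return default


request = urllib.request.Request(
    "http://example.com/",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
print(request.has_header("Content-Type"))  # exact-case lookup
print(find_header(request, "Content-Type"))
print(find_header(request, "content-length", "not set yet"))
```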

--------------------------------------------------------------------

Thanks for taking care.

[toc] | [next] | [standalone]

#33200

From	Terry Reedy <tjreedy@udel.edu>
Date	2012-11-12 16:35 -0500
Message-ID	<mailman.3594.1352756164.27098.python-list@python.org>
In reply to	#33190

On 11/12/2012 10:52 AM, Johannes Kleese wrote:
> Hi!
>
> (Yes, I did take a look at the issue tracker but couldn't find any
> corresponding bug, and no, I don't want to open a new account just for
> this one.)

You only have to open a tracker account once. I am reluctant to 
report this myself as I do not use the module and cannot answer questions.

> I'm reusing a single urllib.request.Request object to HTTP-POST data to
> the same URL a number of times. While the data itself is sent as
> expected every time, the Content-Length header is not updated after the
> first request. Tested with Python 3.1.3 and Python 3.1.4.

3.1 only gets security fixes. Consider upgrading. In any case, suspected 
bugs need to be tested with the latest release, as patches get applied 
daily. As it happens,

import urllib.request
opener = urllib.request.build_opener()
request = urllib.request.Request("http://example.com/", headers =
         {"Content-Type": "application/x-www-form-urlencoded"})

opener.open(request, "1".encode("us-ascii"))
print(request.data, '\n', request.header_items())

opener.open(request, "123456789".encode("us-ascii"))
print(request.data, '\n', request.header_items())

exhibits the same behavior in 3.3.0 of printing ('Content-length', '1') 
in the last output. I agree that that looks wrong, but I do not know if 
such re-use is supposed to be supported.

> While at it, I noticed that urllib.request.Request.has_header() and
> .get_header() are case-sensitive,

Python is case sensitive.

> while HTTP headers are not (RFC 2616, 4.2).
> Thus the following, slightly unfortunate behaviour:
>
>>>> request.header_items()
> [('Content-length', '1'), ('Content-type',
> 'application/x-www-form-urlencoded'), ('Host', 'example.com'),
> ('User-agent', 'Python-urllib/3.1')]
>
>>>> request.has_header("Content-Type")
> False
>>>> request.has_header("Content-type")
> True
>>>> request.get_header("Content-Type")
# this returns None, which is not printed
>>>> request.get_header("Content-type")
> 'application/x-www-form-urlencoded'

Judging from 'Content-type', 'User-agent', 'Content-length', 'Host', 
urllib.request consistently capitalizes only the first letter of each 
header name (lower-casing the rest) and expects names in that form. If 
that is not standard, it should be documented.
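That reading matches what I see in the standard library source: Request.add_header() (which the constructor uses for the headers argument) stores each name as key.capitalize(). A minimal sketch of that normalization, plus a query that normalizes the same way; `has_header_any_case` is a hypothetical helper:

```python
import urllib.request

request = urllib.request.Request(
    "http://example.com/",
    headers={"CONTENT-TYPE": "application/x-www-form-urlencoded",
             "user-AGENT": "demo"},
)
# Stored names are normalized with str.capitalize().
print(sorted(name for name, _ in request.header_items()))
# → ['Content-type', 'User-agent']


def has_header_any_case(request, name):
    """Hypothetical helper: normalize the query the same way Request
    normalizes stored names, so any spelling of the name matches."""
    return request.has_header(name.capitalize())
```

This only works for headers added through add_header() or add_unredirected_header(), since both apply the same capitalize() step.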

-- 
Terry Jan Reedy


#33227

From	Johannes Kleese <j.kleese@arcor.de>
Date	2012-11-13 08:24 +0100
Message-ID	<50a203cd$0$9519$9b4e6d93@newsspool1.arcor-online.net>
In reply to	#33200

Terry Reedy wrote:
> On 11/12/2012 10:52 AM, Johannes Kleese wrote:

>> Tested with Python 3.1.3 and Python 3.1.4.
> 
> 3.1 only gets security fixes. Consider upgrading. 

Stuck with Debian on a server, thus stuck with 3.1 on the development machine.

> exhibits the same behavior in 3.3.0 of printing ('Content-length', '1')
> in the last output. I agree that that looks wrong, but I do not know if
> such re-use is supposed to be supported.

The Request object should then either get it right on re-use (which I'd
prefer), or block re-use.

>> While at it, I noticed that urllib.request.Request.has_header() and
>> .get_header() are case sensitive,
> 
> Python is case sensitive.

True, of course, but

>> HTTP headers are not (RFC 2616, 4.2).

and the functions work on HTTP data, not Python data. After all, we are
lucky to have functions here and not just a dictionary.

Anyway, thanks for reporting!


#33212

From	Terry Reedy <tjreedy@udel.edu>
Date	2012-11-12 20:58 -0500
Message-ID	<mailman.3603.1352771959.27098.python-list@python.org>
In reply to	#33190

On 11/12/2012 4:35 PM, Terry Reedy wrote:

> import urllib.request
> opener = urllib.request.build_opener()
> request = urllib.request.Request("http://example.com/", headers =
>          {"Content-Type": "application/x-www-form-urlencoded"})
>
> opener.open(request, "1".encode("us-ascii"))
> print(request.data, '\n', request.header_items())
>
> opener.open(request, "123456789".encode("us-ascii"))
> print(request.data, '\n', request.header_items())
>
> exhibits the same behavior in 3.3.0 of printing ('Content-length', '1')
> in the last output. I agree that that looks wrong, but I do not know if
> such re-use is supposed to be supported.

I opened http://bugs.python.org/issue16464

-- 
Terry Jan Reedy


#33986

From	Terry Reedy <tjreedy@udel.edu>
Date	2012-11-27 18:24 -0500
Message-ID	<mailman.324.1354058687.29569.python-list@python.org>
In reply to	#33190

On 11/12/2012 8:58 PM, Terry Reedy wrote:
> On 11/12/2012 4:35 PM, Terry Reedy wrote:
>
>> import urllib.request
>> opener = urllib.request.build_opener()
>> request = urllib.request.Request("http://example.com/", headers =
>>          {"Content-Type": "application/x-www-form-urlencoded"})
>>
>> opener.open(request, "1".encode("us-ascii"))
>> print(request.data, '\n', request.header_items())
>>
>> opener.open(request, "123456789".encode("us-ascii"))
>> print(request.data, '\n', request.header_items())
>>
>> exhibits the same behavior in 3.3.0 of printing ('Content-length', '1')
>> in the last output. I agree that that looks wrong, but I do not know if
>> such re-use is supposed to be supported.
>
> I opened http://bugs.python.org/issue16464

A patch written by Alexey Kachayev has been pushed by Andrew Svetlov, 
and the behavior will change in 3.4.0 to allow reuse.
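A minimal offline sketch of the 3.4+ behavior, assuming Python 3.4 or later: per the standard library source, assigning a different value to Request.data removes any existing Content-length header, and OpenerDirector.open() assigns its data argument that way, so the handler recomputes the length on the next send. The add_unredirected_header() call below only simulates what the HTTP handler does during the first send; no network access is needed.

```python
import urllib.request

req = urllib.request.Request(
    "http://example.com/",
    data=b"1",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
# Simulate the header the HTTP handler adds while sending the first POST.
req.add_unredirected_header("Content-length", str(len(req.data)))
stale_before = req.has_header("Content-length")

# Assigning new data (as opener.open(req, data) does) drops the
# now-stale Content-length on Python 3.4+.
req.data = b"123456789"
stale_after = req.has_header("Content-length")
print(stale_before, stale_after)  # → True False
```

Assigning data equal to the current value leaves the header in place, which is harmless because the length is still correct.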

-- 
Terry Jan Reedy

