Re: Python list archives double-gzipped?

Date Mon, 27 Aug 2012 15:52:12 +0200
From Andreas Perstinger <andipersti@gmail.com>
Subject Re: Python list archives double-gzipped?
References <503AD027.7000004@tim.thechases.com>
In-Reply-To <503AD027.7000004@tim.thechases.com>
Newsgroups comp.lang.python
Message-ID <mailman.3875.1346075537.4697.python-list@python.org> (permalink)



On 27.08.2012 03:40, Tim Chase wrote:
> So it looks like some python-list@ archiving process is double
> gzip'ing the archives.  Can anybody else confirm this and get the
> info the right people?

In January, "random joe" noticed the same problem[1].
I think Anssi Saari[2] was right in saying that there is something
wrong in the browser or server setup, because I notice the same 
behaviour with Firefox, Chromium, wget and curl.

$ ll *July*
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:48 chromium_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 748041 Aug 27 13:41 curl_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:48 firefox_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 748041 Aug  2 03:27 wget_2012-July.txt.gz

The browsers get a double gzipped file (size 747850) whereas the 
download utilities get a normal gzipped file (size 748041).
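
To check how many gzip layers a downloaded file really has, one option is to peel them off one by one. The following is only a minimal sketch: the function name count_gzip_layers and the sample text are made up for illustration, and it assumes that a payload starting with the gzip magic bytes is another gzip layer.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def count_gzip_layers(data: bytes) -> int:
    """Return how many nested gzip layers wrap `data`."""
    layers = 0
    while data[:2] == GZIP_MAGIC:
        try:
            data = gzip.decompress(data)
        except (OSError, EOFError, zlib.error):
            break  # magic bytes matched by coincidence, not a real gzip stream
        layers += 1
    return layers


# Hypothetical stand-in for one month of list traffic.
mbox = b"From python-list@python.org\n" * 1000
single = gzip.compress(mbox)    # what the archive should contain
double = gzip.compress(single)  # what the browsers appear to save

print(count_gzip_layers(single), count_gzip_layers(double))
```

For a real download, pass the file contents instead, e.g. count_gzip_layers(open("firefox_2012-July.txt.gz", "rb").read()).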

After looking at the HTTP request and response headers I've noticed that 
the browsers accept compressed data ("Accept-Encoding: gzip, deflate") 
whereas wget/curl by default don't. After adding that header to 
wget/curl they get the same double gzipped file as the browsers do:

$ ll *July*
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:48 chromium_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 748041 Aug 27 13:41 curl_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:40 curl_encoding_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:48 firefox_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 748041 Aug  2 03:27 wget_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 747850 Aug  2 03:27 wget_encoding_2012-July.txt.gz
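
The same experiment can be sketched in Python, since urllib sends no "Accept-Encoding: gzip" header unless you add one and never decodes the body on its own. The archive URL below is an assumption pieced together from the List-Archive address and the file name, and build_request/fetch_raw are hypothetical helper names; the server may well behave differently by now.

```python
import urllib.request

# Assumed URL (List-Archive address + file name from the listing above).
ARCHIVE_URL = "http://mail.python.org/pipermail/python-list/2012-July.txt.gz"


def build_request(url: str, accept_gzip: bool) -> urllib.request.Request:
    """Build a GET request, optionally advertising gzip support like a browser."""
    headers = {"Accept-Encoding": "gzip, deflate"} if accept_gzip else {}
    return urllib.request.Request(url, headers=headers)


def fetch_raw(request: urllib.request.Request):
    """Return (Content-Encoding, Content-Type, body) exactly as sent by the server."""
    with urllib.request.urlopen(request) as response:
        return (
            response.headers.get("Content-Encoding"),
            response.headers.get("Content-Type"),
            response.read(),  # urllib does not undo any Content-Encoding
        )


# No network access here: just show what each variant would send.
print(build_request(ARCHIVE_URL, accept_gzip=False).header_items())
print(build_request(ARCHIVE_URL, accept_gzip=True).header_items())
```

Calling fetch_raw() on both requests and comparing len(body) should reproduce the two sizes in the listing, if the server still responds the way it did here.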

I think the following is happening:
If you send the "Accept-Encoding: gzip, deflate" header, the server
gzips the file a second time (which is arguably unnecessary) and responds
with "Content-Encoding: gzip" and "Content-Type: application/x-gzip"
(which is IMHO correct according to RFC 2616, sections 14.11 and 14.17[3]).
But because many servers apparently don't set correct headers, the
default behaviour of most browsers nowadays is to ignore the
Content-Encoding for gzip files (application/x-gzip - see the bug reports
for firefox[4] and chromium[5]) and not uncompress the outer layer,
which leads to a double gzipped file in this case.
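
The explanation above can be modelled offline. This is only a sketch of what I suspect is going on, not the actual server or browser code: server_response and save_body are hypothetical names, and the flag ignore_encoding_for_gzip_type stands in for the browser workaround described in the bug reports.

```python
import gzip


def server_response(stored_file: bytes, accept_encoding: str):
    """Mimic the suspected server: gzip again whenever the client allows gzip."""
    headers = {"Content-Type": "application/x-gzip"}
    if "gzip" in accept_encoding:
        headers["Content-Encoding"] = "gzip"
        return headers, gzip.compress(stored_file)
    return headers, stored_file


def save_body(headers: dict, body: bytes, ignore_encoding_for_gzip_type: bool) -> bytes:
    """Return the bytes a client would write to disk."""
    encoded = headers.get("Content-Encoding") == "gzip"
    is_gzip_type = headers.get("Content-Type") == "application/x-gzip"
    if encoded and not (ignore_encoding_for_gzip_type and is_gzip_type):
        return gzip.decompress(body)  # strip the transfer layer as the RFC describes
    return body  # keep the body untouched


archive = gzip.compress(b"From python-list@python.org\n" * 1000)  # file on the server

# wget/curl by default: no Accept-Encoding, nothing to undo.
plain = save_body(*server_response(archive, ""), ignore_encoding_for_gzip_type=False)
# A client that honours Content-Encoding gets the original archive back.
strict = save_body(*server_response(archive, "gzip, deflate"), ignore_encoding_for_gzip_type=False)
# A browser that ignores Content-Encoding for application/x-gzip keeps both layers.
browser = save_body(*server_response(archive, "gzip, deflate"), ignore_encoding_for_gzip_type=True)

print(plain == archive, strict == archive, gzip.decompress(browser) == archive)
```

Only the third case ends up with the extra layer, which matches what Firefox and Chromium save.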

Bye, Andreas

[1] http://mail.python.org/pipermail/python-list/2012-January/617983.html

[2] http://mail.python.org/pipermail/python-list/2012-January/618211.html

[3] http://www.ietf.org/rfc/rfc2616

[4] https://bugzilla.mozilla.org/show_bug.cgi?id=610679#c5

[5] http://code.google.com/p/chromium/issues/detail?id=47951#c9
