From: Andreas Perstinger
Date: Mon, 27 Aug 2012 15:52:12 +0200
To: python-list@python.org
Newsgroups: comp.lang.python
Subject: Re: Python list archives double-gzipped?

On 27.08.2012 03:40, Tim Chase wrote:
> So it looks like some python-list@ archiving process is double
> gzip'ing the archives. Can anybody else confirm this and get the
> info the right people?

In January, "random joe" noticed the same problem[1]. I think Anssi
Saari[2] was right in saying that there is something wrong in the
browser or server setup, because I notice the same behaviour with
Firefox, Chromium, wget and curl.

$ ll *July*
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:48 chromium_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 748041 Aug 27 13:41 curl_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:48 firefox_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 748041 Aug 2 03:27 wget_2012-July.txt.gz

The browsers get a double gzipped file (size 747850) whereas the
download utilities get a normal gzipped file (size 748041).

After looking at the HTTP request and response headers I've noticed
that the browsers accept compressed data ("Accept-Encoding: gzip,
deflate") whereas wget/curl by default don't. After adding that header
to wget/curl they get the same double gzipped file as the browsers do:

$ ll *July*
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:48 chromium_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 748041 Aug 27 13:41 curl_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:40 curl_encoding_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 747850 Aug 27 13:48 firefox_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 748041 Aug 2 03:27 wget_2012-July.txt.gz
-rw-rw-r-- 1 andreas andreas 747850 Aug 2 03:27 wget_encoding_2012-July.txt.gz

I think the following is happening:
If you send the "Accept-Encoding: gzip, deflate" header, the server
gzips the file a second time (which is arguably unnecessary) and
responds with "Content-Encoding: gzip" and "Content-Type:
application/x-gzip" (which is IMHO correct according to RFC2616/14.11
and 14.17[3]).
But because many servers apparently don't set correct headers, the
default behaviour of most browsers nowadays is to ignore the
Content-Encoding for gzip files (application/x-gzip - see the bug
reports for Firefox[4] and Chromium[5]) and not to uncompress the outer
layer, which leads to a double gzipped file in this case.
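
To make that theory concrete, here is a small Python sketch of the chain of events. It is only a model under my assumptions, not the actual code of the mail.python.org server or of any browser; the names serve, save_like_browser and gzip_layers are made up for illustration.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def serve(archive, accept_encoding=""):
    """Hypothetical server: the archive is already a .gz file, but it is
    gzipped again whenever the client says it accepts gzip."""
    headers = {"Content-Type": "application/x-gzip"}
    accepted = [e.split(";")[0].strip() for e in accept_encoding.split(",")]
    if "gzip" in accepted:
        headers["Content-Encoding"] = "gzip"
        return headers, gzip.compress(archive)
    return headers, archive


def save_like_browser(headers, body):
    """Hypothetical client: ignore Content-Encoding for application/x-gzip
    (the browser behaviour described above) and save the body untouched."""
    if headers.get("Content-Type") == "application/x-gzip":
        return body
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(body)
    return body


def gzip_layers(data):
    """Count gzip layers by peeling them off while the magic bytes match."""
    layers = 0
    while data[:2] == GZIP_MAGIC:
        try:
            data = gzip.decompress(data)
        except (OSError, EOFError, zlib.error):
            break  # looked like gzip but was not a valid stream
        layers += 1
    return layers


if __name__ == "__main__":
    archive = gzip.compress(b"From: someone\n" * 100)  # stand-in for 2012-July.txt.gz
    for accept in ("", "gzip, deflate"):
        headers, body = serve(archive, accept)
        saved = save_like_browser(headers, body)
        print(repr(accept), "->", gzip_layers(saved), "layer(s)")
```

To check a real download, something like gzip_layers(open("firefox_2012-July.txt.gz", "rb").read()) should do; note that it decompresses the whole file in memory.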

Bye, Andreas

[1] http://mail.python.org/pipermail/python-list/2012-January/617983.html
[2] http://mail.python.org/pipermail/python-list/2012-January/618211.html
[3] http://www.ietf.org/rfc/rfc2616
[4] https://bugzilla.mozilla.org/show_bug.cgi?id=610679#c5
[5] http://code.google.com/p/chromium/issues/detail?id=47951#c9