Date: Wed, 1 Aug 2012 14:52:59 +0100
Subject: Re: CRC-checksum failed in gzip
From: andrea crotti
To: Laszlo Nagy
Cc: python-list@python.org
Newsgroups: comp.lang.python

2012/8/1 Laszlo Nagy:
>> there seems to be no clear pattern and it just randomly fails. The file
>> is also just open for read from this program,
>> so in theory no way that it can be corrupted.
>
> Yes, there is. Gzip stores CRC for compressed *blocks*. So if the file is
> not flushed to the disk, then you can only read a fragment of the block, and
> that changes the CRC.
>
>>
>> I also checked with lsof if there are processes that opened it but
>> nothing appears..
>
> lsof doesn't work very well over nfs. You can have other processes on
> different computers (!) writing the file. lsof only lists the processes on
> the system it is executed on.
>
>>
>> - can't really try on the local disk, might take ages unfortunately
>> (we are rewriting this system from scratch anyway)
>>
>

Thanks a lot. Someone writing to the file while it is being read might be an explanation; the problem is that everyone claims that they are only reading the file.
Apparently this file is generated once and, a long time afterwards, only read by two different tools (in sequence), so in theory this should not be possible either. I'll try to investigate more in this direction, since it's the only reasonable explanation I've found so far.
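
To narrow this down, one option is to check the file by decompressing it completely and looking at how the read ends. Below is a minimal sketch, assuming Python 3.8 or later; `verify_gzip` and its `chunk_size` parameter are hypothetical names made up for this illustration, not anything from the system discussed here. With the standard `gzip` module, a file that stops short (for example because a writer has not finished or flushed it yet) normally shows up as `EOFError`, while a file whose contents do not match the stored CRC-32 shows up as an `OSError` ("CRC check failed"), so the two cases can be told apart.

```python
import gzip
import os
import tempfile
import zlib


def verify_gzip(path, chunk_size=1 << 20):
    """Decompress the whole file, discard the output, report how it ended.

    Hypothetical helper for illustration. Returns "ok", "truncated"
    or "corrupt".
    """
    try:
        with gzip.open(path, "rb") as fh:
            # Read to the end so the trailer (CRC-32 and length) is checked.
            while fh.read(chunk_size):
                pass
    except EOFError:
        # The stream ended before the end-of-stream marker or trailer.
        return "truncated"
    except (OSError, zlib.error):
        # Bad header, damaged deflate data, or CRC/length mismatch.
        return "corrupt"
    return "ok"


def demo():
    """Build a good, a half-written and a bad-CRC file, then check each."""
    good = gzip.compress(b"some payload\n" * 1000)
    bad_crc = bytearray(good)
    bad_crc[-8] ^= 0xFF  # flip one byte of the CRC-32 stored in the trailer
    samples = {
        "complete": good,
        "half-written": good[: len(good) // 2],
        "bad-crc": bytes(bad_crc),
    }
    with tempfile.TemporaryDirectory() as tmp:
        for name, data in samples.items():
            path = os.path.join(tmp, name + ".gz")
            with open(path, "wb") as out:
                out.write(data)
            print(name, "->", verify_gzip(path))


if __name__ == "__main__":
    demo()
```

Running a check like this on the NFS copy right before each tool reads it, and logging the file size and modification time alongside the result, would show whether the file is changing underneath the readers or is already damaged on disk.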