

Groups > comp.lang.python > #26356 > unrolled thread

Re: CRC-checksum failed in gzip

Started by: andrea crotti <andrea.crotti.0@gmail.com>
First post: 2012-08-01 14:01 +0100
Last post: 2012-08-02 11:59 +0100
Articles 12 — 4 participants


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: CRC-checksum failed in gzip andrea crotti <andrea.crotti.0@gmail.com> - 2012-08-01 14:01 +0100
    Re: CRC-checksum failed in gzip Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-01 16:17 +0000
      Re: CRC-checksum failed in gzip andrea crotti <andrea.crotti.0@gmail.com> - 2012-08-01 17:38 +0100
      Re: CRC-checksum failed in gzip Laszlo Nagy <gandalf@shopzeus.com> - 2012-08-01 19:05 +0200
      Re: CRC-checksum failed in gzip andrea crotti <andrea.crotti.0@gmail.com> - 2012-08-01 18:17 +0100
      Re: CRC-checksum failed in gzip Laszlo Nagy <gandalf@shopzeus.com> - 2012-08-01 19:57 +0200
        Re: CRC-checksum failed in gzip Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> - 2012-08-02 10:49 +0200
          Re: CRC-checksum failed in gzip Laszlo Nagy <gandalf@shopzeus.com> - 2012-08-02 12:14 +0200
      Re: CRC-checksum failed in gzip andrea crotti <andrea.crotti.0@gmail.com> - 2012-08-02 10:26 +0100
      Re: CRC-checksum failed in gzip Laszlo Nagy <gandalf@shopzeus.com> - 2012-08-02 12:21 +0200
      Re: CRC-checksum failed in gzip andrea crotti <andrea.crotti.0@gmail.com> - 2012-08-02 11:57 +0100
      Re: CRC-checksum failed in gzip andrea crotti <andrea.crotti.0@gmail.com> - 2012-08-02 11:59 +0100

#26356 — Re: CRC-checksum failed in gzip

From: andrea crotti <andrea.crotti.0@gmail.com>
Date: 2012-08-01 14:01 +0100
Subject: Re: CRC-checksum failed in gzip
Message-ID: <mailman.2825.1343826107.4697.python-list@python.org>

Full traceback:

Exception in thread Thread-8:
Traceback (most recent call last):
  File "/user/sim/python/lib/python2.7/threading.py", line 530, in __bootstrap_inner
    self.run()
  File "/user/sim/tests/llif/AutoTester/src/AutoTester2.py", line 67, in run
    self.processJobData(jobData, logger)
  File "/user/sim/tests/llif/AutoTester/src/AutoTester2.py", line 204, in processJobData
    self.run_simulator(area, jobData[1] ,log)
  File "/user/sim/tests/llif/AutoTester/src/AutoTester2.py", line 142, in run_simulator
    report_file, percentage, body_text = SimResults.copy_test_batch(log, area)
  File "/user/sim/tests/llif/AutoTester/src/SimResults.py", line 274, in copy_test_batch
    out2_lines = out2.read()
  File "/user/sim/python/lib/python2.7/gzip.py", line 245, in read
    self._read(readsize)
  File "/user/sim/python/lib/python2.7/gzip.py", line 316, in _read
    self._read_eof()
  File "/user/sim/python/lib/python2.7/gzip.py", line 338, in _read_eof
    hex(self.crc)))
IOError: CRC check failed 0x4f675fba != 0xa9e45aL


- The file is written with the Linux gzip program.
- No, I can't reproduce the error with the exact same file that
  failed, which is what is really puzzling. There seems to be no clear
  pattern; it just fails randomly. The file is also only opened for
  reading by this program, so in theory there is no way it can be
  corrupted.

  I also checked with lsof whether any other processes had it open, but
  nothing appears.

- I can't really try on the local disk; it might take ages, unfortunately
  (we are rewriting this system from scratch anyway).
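
One way to confirm that a given file on disk is intact, independent of the failing program, is to read it through to the end in a separate process. What follows is only a minimal sketch, written for Python 3 (on the Python 2.7 shown in the traceback the failure surfaces as IOError instead). The helper name `gzip_is_intact` and the sample file names are made up for illustration, and the deliberately corrupted copy exists only to show what a failure looks like.

```python
import gzip
import os
import tempfile
import zlib


def gzip_is_intact(path, chunk_size=1 << 16):
    """Return True if the whole gzip stream decompresses and its CRC matches."""
    try:
        with gzip.open(path, "rb") as f:
            # The CRC is only compared once the end of the stream is reached,
            # so the file has to be read through to EOF.
            while f.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        return False
    return True


# Hypothetical stand-in for a simulator output file.
workdir = tempfile.mkdtemp()
good_path = os.path.join(workdir, "out2.txt.gz")
with gzip.open(good_path, "wb") as f:
    f.write(b"simulated test output\n" * 1000)

# Flip one byte of the stored CRC-32 (the gzip trailer is the CRC-32
# followed by the uncompressed size, 4 bytes each).
with open(good_path, "rb") as f:
    raw = bytearray(f.read())
raw[-8] ^= 0xFF
bad_path = os.path.join(workdir, "corrupt.txt.gz")
with open(bad_path, "wb") as f:
    f.write(raw)

good_ok = gzip_is_intact(good_path)
bad_ok = gzip_is_intact(bad_path)
```

If a file that triggered the error passes this check afterwards, the bytes on disk are probably fine and the problem lies in how the file was read.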



#26368

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2012-08-01 16:17 +0000
Message-ID: <501956a7$0$29978$c3e8da3$5496439d@news.astraweb.com>
In reply to: #26356

On Wed, 01 Aug 2012 14:01:45 +0100, andrea crotti wrote:

> Full traceback:
> 
> Exception in thread Thread-8:

"DANGER DANGER DANGER WILL ROBINSON!!!"

Why didn't you say that there were threads involved? That puts a 
completely different perspective on the problem.

I *was* going to write back and say that you probably had either file 
system corruption, or network errors. But now that I can see that you 
have threads, I will revise that and say that you probably have a bug in 
your thread handling code.

I must say, Andrea, your initial post asking for help was EXTREMELY 
misleading. You over-simplified the problem to the point that it no 
longer has any connection to the reality of the code you are running. 
Please don't send us on wild goose chases after bugs in code that you 
aren't actually running.


>   there seems to be no clear pattern and just randomly fails.

When you start using threads, you have to expect these sorts of 
intermittent bugs unless you are very careful.

My guess is that you have a bug where two threads read from the same file 
at the same time. Since each read shares state (the position of the file 
pointer), you're going to get corruption. Because it depends on timing 
details of which threads do what at exactly which microsecond, the effect 
might as well be random.

Example: suppose the file contains three blocks A B and C, and a 
checksum. Thread 8 starts reading the file, and gets block A and B. Then 
thread 2 starts reading it as well, and gets half of block C. Thread 8 
gets the rest of block C, calculates the checksum, and it doesn't match.

I recommend that you run a file system check on the remote disk. If it 
passes, you can eliminate file system corruption. Also, run some network 
diagnostics, to eliminate corruption introduced in the network layer. But 
I expect that you won't find anything there, and the problem is a simple 
thread bug. Simple, but really, really hard to find.

Good luck.


-- 
Steven
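
The shared-state point above can be illustrated without any threads at all. The sketch below is an assumption-laden toy, not the code from the failing program: the sample data and file name are invented, and the two "readers" are simply two consecutive calls on the same file object. It shows that whoever reads second through a shared object does not start at the beginning, which is the state that concurrent threads end up fighting over.

```python
import gzip
import os
import tempfile

# Hypothetical sample data: 100 records of exactly 10 bytes each.
payload = b"".join(b"block-%03d\n" % i for i in range(100))

path = os.path.join(tempfile.mkdtemp(), "sample.txt.gz")
with gzip.open(path, "wb") as f:
    f.write(payload)

# One file object used by two "readers": the read position is shared.
shared = gzip.open(path, "rb")
first = shared.read(10)   # what reader A sees
second = shared.read(10)  # what reader B sees through the same object
shared.close()

# A reader with its own file object always starts from the beginning.
with gzip.open(path, "rb") as own:
    independent = own.read(10)
```

With real threads the same sharing happens inside a single read() call, at unpredictable points, which is why the resulting failures look random.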



#26370

From: andrea crotti <andrea.crotti.0@gmail.com>
Date: 2012-08-01 17:38 +0100
Message-ID: <mailman.2837.1343839139.4697.python-list@python.org>
In reply to: #26368

2012/8/1 Steven D'Aprano <steve+comp.lang.python@pearwood.info>:
> On Wed, 01 Aug 2012 14:01:45 +0100, andrea crotti wrote:
>
>> Full traceback:
>>
>> Exception in thread Thread-8:
>
> "DANGER DANGER DANGER WILL ROBINSON!!!"
>
> Why didn't you say that there were threads involved? That puts a
> completely different perspective on the problem.
>
> I *was* going to write back and say that you probably had either file
> system corruption, or network errors. But now that I can see that you
> have threads, I will revise that and say that you probably have a bug in
> your thread handling code.
>
> I must say, Andrea, your initial post asking for help was EXTREMELY
> misleading. You over-simplified the problem to the point that it no
> longer has any connection to the reality of the code you are running.
> Please don't send us on wild goose chases after bugs in code that you
> aren't actually running.
>
>
>>   there seems to be no clear pattern and just randomly fails.
>
> When you start using threads, you have to expect these sorts of
> intermittent bugs unless you are very careful.
>
> My guess is that you have a bug where two threads read from the same file
> at the same time. Since each read shares state (the position of the file
> pointer), you're going to get corruption. Because it depends on timing
> details of which threads do what at exactly which microsecond, the effect
> might as well be random.
>
> Example: suppose the file contains three blocks A B and C, and a
> checksum. Thread 8 starts reading the file, and gets block A and B. Then
> thread 2 starts reading it as well, and gets half of block C. Thread 8
> gets the rest of block C, calculates the checksum, and it doesn't match.
>
> I recommend that you run a file system check on the remote disk. If it
> passes, you can eliminate file system corruption. Also, run some network
> diagnostics, to eliminate corruption introduced in the network layer. But
> I expect that you won't find anything there, and the problem is a simple
> thread bug. Simple, but really, really hard to find.
>
> Good luck.
>

Thanks a lot, that makes a lot of sense. I didn't mention this detail
before because I didn't write this code, and I had completely forgotten
that threads were involved; I'm just trying to help fix this bug.

Your explanation makes a lot of sense, but it's still surprising that
merely reading files, without ever writing them, can cause trouble
with threads :/



#26372

From: Laszlo Nagy <gandalf@shopzeus.com>
Date: 2012-08-01 19:05 +0200
Message-ID: <mailman.2840.1343840725.4697.python-list@python.org>
In reply to: #26368

> Thanks a lot, that makes a lot of sense..  I haven't given this detail
> before because I didn't write this code, and I forgot that there were
> threads involved completely, I'm just trying to help to fix this bug.
>
> Your explanation makes a lot of sense, but it's still surprising that
> even just reading files without ever writing them can cause troubles
> using threads :/
Make sure that file objects are not shared between threads, if that is
possible. That will probably solve the problem (if it is related to
threads).



#26374

From: andrea crotti <andrea.crotti.0@gmail.com>
Date: 2012-08-01 18:17 +0100
Message-ID: <mailman.2843.1343841485.4697.python-list@python.org>
In reply to: #26368

2012/8/1 Laszlo Nagy <gandalf@shopzeus.com>:
>
>> Thanks a lot, that makes a lot of sense..  I haven't given this detail
>> before because I didn't write this code, and I forgot that there were
>> threads involved completely, I'm just trying to help to fix this bug.
>>
>> Your explanation makes a lot of sense, but it's still surprising that
>> even just reading files without ever writing them can cause troubles
>> using threads :/
>
> Make sure that file objects are not shared between threads. If that is
> possible. It will probably solve the problem (if that is related to
> threads).


Well I just have to create a lock I guess right?
with lock:
    # open file
    # read content



#26375

From: Laszlo Nagy <gandalf@shopzeus.com>
Date: 2012-08-01 19:57 +0200
Message-ID: <mailman.2845.1343843853.4697.python-list@python.org>
In reply to: #26368

>> Make sure that file objects are not shared between threads. If that is
>> possible. It will probably solve the problem (if that is related to
>> threads).
>
> Well I just have to create a lock I guess right?
That is also a solution. You need to call file.read() inside an acquired 
lock.
> with lock:
>      # open file
>      # read content
>
But not that way! Your example will keep the lock acquired for the 
lifetime of the file, so it cannot be shared between threads.

More likely:

## Open file
lock = threading.Lock()
fin = gzip.open(file_path...)
# Now you can share the file object between threads.

# and do this inside any thread:
## data needed. block until the file object becomes usable.
with lock:
     data = fin.read(....) # other threads are blocked while I'm reading
## use your data here, meanwhile other threads can read
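
For reference, the outline above can be fleshed out into something runnable. This is only a sketch under stated assumptions: the sample file, the chunk size, the `worker` function and the `chunks` list are all invented for the example, and it is written for Python 3. Each thread holds the lock only for the duration of one read, so the shared file object is never used by two threads at once.

```python
import gzip
import os
import tempfile
import threading

# Hypothetical sample file; random data so the compressed stream is not trivial.
payload = os.urandom(1 << 16) * 4
path = os.path.join(tempfile.mkdtemp(), "shared.bin.gz")
with gzip.open(path, "wb") as f:
    f.write(payload)

lock = threading.Lock()
fin = gzip.open(path, "rb")  # one file object shared by all threads
chunks = []


def worker():
    while True:
        with lock:
            # Only one thread at a time may touch the shared file object.
            data = fin.read(4096)
            if not data:
                return
            # Appending under the lock keeps the chunks in file order.
            chunks.append(data)
        # Work on `data` would go here, outside the lock.


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
fin.close()
```

Because every read is serialized, the chunks reassemble into the original data and the end-of-stream CRC check passes.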



#26386

From: Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com>
Date: 2012-08-02 10:49 +0200
Message-ID: <3hrpe9-hbi.ln1@satorlaser.homedns.org>
In reply to: #26375

On 01.08.2012 19:57, Laszlo Nagy wrote:
> ## Open file
> lock = threading.Lock()
> fin = gzip.open(file_path...)
> # Now you can share the file object between threads.
>
> # and do this inside any thread:
> ## data needed. block until the file object becomes usable.
> with lock:
>      data = fin.read(....) # other threads are blocked while I'm reading
> ## use your data here, meanwhile other threads can read

Technically, that is correct, but IMHO it's complete nonsense to share 
the file object between threads in the first place. If you need the data 
in two threads, just read the file once and then share the read-only, 
immutable content. If the file is small or too large to be held in 
memory at once, just open and read it on demand. This also saves you 
from having to rewind the file every time you read it.

Am I missing something?

Uli
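
The read-once alternative could look roughly like the following. It is a sketch only, with an invented sample file and an invented per-thread task (counting lines), written for Python 3. The file is decompressed exactly once, before any thread starts, and the threads then share only the resulting immutable bytes object.

```python
import gzip
import os
import tempfile
import threading

# Hypothetical sample file standing in for the real input.
payload = b"".join(b"line %d\n" % i for i in range(1000))
path = os.path.join(tempfile.mkdtemp(), "out2.txt.gz")
with gzip.open(path, "wb") as f:
    f.write(payload)

# Read the file once, in one thread, and close it again.
with gzip.open(path, "rb") as f:
    content = f.read()  # bytes objects are immutable, so sharing is safe

results = {}


def count_lines(name):
    # Each thread only looks at the shared bytes; no file object is involved.
    results[name] = content.count(b"\n")


threads = [threading.Thread(target=count_lines, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

No lock is needed here because nothing shared is ever modified, at the cost of holding the whole decompressed content in memory.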



#26393

From: Laszlo Nagy <gandalf@shopzeus.com>
Date: 2012-08-02 12:14 +0200
Message-ID: <mailman.2862.1343902460.4697.python-list@python.org>
In reply to: #26386

> Technically, that is correct, but IMHO its complete nonsense to share 
> the file object between threads in the first place. If you need the 
> data in two threads, just read the file once and then share the 
> read-only, immutable content. If the file is small or too large to be 
> held in memory at once, just open and read it on demand. This also 
> saves you from having to rewind the file every time you read it.
>
> Am I missing something?
We suspect that his program reads the same file object from different 
threads. At least this would explain his problem. I agree with you - 
usually it is not a good idea to share a file object between threads. 
This is what I told him the first time. But it is not in our hands - he 
already has a program that needs to be fixed. It might be easier for him 
to protect read() calls with a lock, because that can be done 
automatically, without thinking too much.



#26389

From: andrea crotti <andrea.crotti.0@gmail.com>
Date: 2012-08-02 10:26 +0100
Message-ID: <mailman.2859.1343899619.4697.python-list@python.org>
In reply to: #26368

2012/8/1 Steven D'Aprano <steve+comp.lang.python@pearwood.info>:
>
> When you start using threads, you have to expect these sorts of
> intermittent bugs unless you are very careful.
>
> My guess is that you have a bug where two threads read from the same file
> at the same time. Since each read shares state (the position of the file
> pointer), you're going to get corruption. Because it depends on timing
> details of which threads do what at exactly which microsecond, the effect
> might as well be random.
>
> Example: suppose the file contains three blocks A B and C, and a
> checksum. Thread 8 starts reading the file, and gets block A and B. Then
> thread 2 starts reading it as well, and gets half of block C. Thread 8
> gets the rest of block C, calculates the checksum, and it doesn't match.
>
> I recommend that you run a file system check on the remote disk. If it
> passes, you can eliminate file system corruption. Also, run some network
> diagnostics, to eliminate corruption introduced in the network layer. But
> I expect that you won't find anything there, and the problem is a simple
> thread bug. Simple, but really, really hard to find.
>
> Good luck.

One last thing I would like to do before I add this fix is to actually
be able to reproduce this behaviour, and I thought I could just do the
following:

import gzip
import threading


class OpenAndRead(threading.Thread):
    def run(self):
        fz = gzip.open('out2.txt.gz')
        fz.read()
        fz.close()


if __name__ == '__main__':
    for i in range(100):
        OpenAndRead().start()


But no matter how many threads I start, I can't reproduce the CRC
error. Any idea how I can make it happen?

The code in run should be shared by all the threads since there are no
locks, right?



#26395

From: Laszlo Nagy <gandalf@shopzeus.com>
Date: 2012-08-02 12:21 +0200
Message-ID: <mailman.2863.1343902890.4697.python-list@python.org>
In reply to: #26368

> One last thing I would like to do before I add this fix is to actually
> be able to reproduce this behaviour, and I thought I could just do the
> following:
>
> import gzip
> import threading
>
>
> class OpenAndRead(threading.Thread):
>      def run(self):
>          fz = gzip.open('out2.txt.gz')
>          fz.read()
>          fz.close()
>
>
> if __name__ == '__main__':
>      for i in range(100):
>          OpenAndRead().start()
>
>
> But no matter how many threads I start, I can't reproduce the CRC
> error, any idea how I can try to help it happening?
Your example did not share the file object between threads. Here is an 
example that does:

import gzip
import threading

class OpenAndRead(threading.Thread):
    def run(self):
        # fz is the single module-level file object used by every thread
        global fz
        fz.read(100)

if __name__ == '__main__':
    fz = gzip.open('out2.txt.gz')
    for i in range(10):
        OpenAndRead().start()

Try this with a huge file. And here is one that should never throw a 
CRC error, because the file object is protected by a lock:

import gzip
import threading

class OpenAndRead(threading.Thread):
    def run(self):
        global fz
        global fl
        # only one thread at a time may read from the shared file object
        with fl:
            fz.read(100)

if __name__ == '__main__':
    fz = gzip.open('out2.txt.gz')
    fl = threading.Lock()
    for i in range(2):
        OpenAndRead().start()

>
> The code in run should be shared by all the threads since there are no
> locks, right?
The code is shared but the file object is not. In your example, a new 
file object is created every time a thread is started.
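
To make the distinction concrete, here is a small sketch in which every thread runs the same function but opens its own file object. The sample file and the names `open_and_read`, `handles` and `results` are invented for illustration, and the code assumes Python 3. The file objects are kept in a dict purely so that they can be compared afterwards.

```python
import gzip
import os
import tempfile
import threading

# Hypothetical sample file.
payload = os.urandom(1 << 15) * 8
path = os.path.join(tempfile.mkdtemp(), "out2.txt.gz")
with gzip.open(path, "wb") as f:
    f.write(payload)

n_threads = 10
handles = {}
results = {}


def open_and_read(index):
    # Same code in every thread, but each call creates a new file object
    # with its own read position and its own running CRC.
    with gzip.open(path, "rb") as fz:
        handles[index] = fz
        results[index] = fz.read()


threads = [threading.Thread(target=open_and_read, args=(i,))
           for i in range(n_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()

distinct_objects = len(set(id(h) for h in handles.values()))
```

Every thread gets the complete, correct content, because no state is shared between the readers.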



#26398

From: andrea crotti <andrea.crotti.0@gmail.com>
Date: 2012-08-02 11:57 +0100
Message-ID: <mailman.2867.1343905029.4697.python-list@python.org>
In reply to: #26368

2012/8/2 Laszlo Nagy <gandalf@shopzeus.com>:
>
> Your example did not share the file object between threads. Here an example
> that does that:
>
> class OpenAndRead(threading.Thread):
>     def run(self):
>         global fz
>         fz.read(100)
>
> if __name__ == '__main__':
>
>    fz = gzip.open('out2.txt.gz')
>    for i in range(10):
>         OpenAndRead().start()
>
> Try this with a huge file. And here is the one that should never throw CRC
> error, because the file object is protected by a lock:
>
> class OpenAndRead(threading.Thread):
>     def run(self):
>         global fz
>         global fl
>         with fl:
>             fz.read(100)
>
> if __name__ == '__main__':
>
>    fz = gzip.open('out2.txt.gz')
>    fl = threading.Lock()
>    for i in range(2):
>         OpenAndRead().start()
>
>
>>
>> The code in run should be shared by all the threads since there are no
>> locks, right?
>
> The code is shared but the file object is not. In your example, a new file
> object is created, every time a thread is started.
>


OK, sure, that makes sense, but then this explanation may not be right
after all, because I'm quite sure that the file object is *not* shared
between threads; everything happens inside a thread.

I managed to get some errors doing this with a big file:

import gzip
import threading

class OpenAndRead(threading.Thread):
    def run(self):
        global fz
        fz.read(100)

if __name__ == '__main__':
    fz = gzip.open('bigfile.avi.gz')
    for i in range(20):
        OpenAndRead().start()

It doesn't fail without the *global*, but this is definitely not
what the code does: every thread gets a new file object, so it's
not shared.

Anyway, we'll read once for all the threads or add the lock, and
hopefully that will solve the problem, even if I'm not yet convinced
that this was the cause.



#26399

From: andrea crotti <andrea.crotti.0@gmail.com>
Date: 2012-08-02 11:59 +0100
Message-ID: <mailman.2868.1343905170.4697.python-list@python.org>
In reply to: #26368

2012/8/2 andrea crotti <andrea.crotti.0@gmail.com>:
>
> Ok sure that makes sense, but then this explanation is maybe not right
> anymore, because I'm quite sure that the file object is *not* shared
> between threads, everything happens inside a thread..
>
> I managed to get some errors doing this with a big file
> class OpenAndRead(threading.Thread):
>      def run(self):
>          global fz
>          fz.read(100)
>
> if __name__ == '__main__':
>
>     fz = gzip.open('bigfile.avi.gz')
>     for i in range(20):
>          OpenAndRead().start()
>
> and it doesn't fail without the *global*, but this is definitively not
> what the code does, because every thread gets a new file object, it's
> not shared..
>
> Anyway we'll read once for all the threads or add the lock, and
> hopefully it should solve the problem, even if I'm not convinced yet
> that it was this.


Just for completeness: as suggested, this also does not fail:

import gzip
import threading

class OpenAndRead(threading.Thread):
    def __init__(self, lock):
        threading.Thread.__init__(self)
        self.lock = lock

    def run(self):
        global fz
        # serialize access to the shared file object
        with self.lock:
            fz.read(100)

if __name__ == '__main__':
    lock = threading.Lock()
    fz = gzip.open('bigfile.avi.gz')
    for i in range(20):
        OpenAndRead(lock).start()


