| Started by | andrea crotti <andrea.crotti.0@gmail.com> |
|---|---|
| First post | 2012-08-01 14:01 +0100 |
| Last post | 2012-08-02 11:59 +0100 |
| Articles | 12 — 4 participants |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: CRC-checksum failed in gzip andrea crotti <andrea.crotti.0@gmail.com> - 2012-08-01 14:01 +0100
Re: CRC-checksum failed in gzip Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-01 16:17 +0000
Re: CRC-checksum failed in gzip andrea crotti <andrea.crotti.0@gmail.com> - 2012-08-01 17:38 +0100
Re: CRC-checksum failed in gzip Laszlo Nagy <gandalf@shopzeus.com> - 2012-08-01 19:05 +0200
Re: CRC-checksum failed in gzip andrea crotti <andrea.crotti.0@gmail.com> - 2012-08-01 18:17 +0100
Re: CRC-checksum failed in gzip Laszlo Nagy <gandalf@shopzeus.com> - 2012-08-01 19:57 +0200
Re: CRC-checksum failed in gzip Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> - 2012-08-02 10:49 +0200
Re: CRC-checksum failed in gzip Laszlo Nagy <gandalf@shopzeus.com> - 2012-08-02 12:14 +0200
Re: CRC-checksum failed in gzip andrea crotti <andrea.crotti.0@gmail.com> - 2012-08-02 10:26 +0100
Re: CRC-checksum failed in gzip Laszlo Nagy <gandalf@shopzeus.com> - 2012-08-02 12:21 +0200
Re: CRC-checksum failed in gzip andrea crotti <andrea.crotti.0@gmail.com> - 2012-08-02 11:57 +0100
Re: CRC-checksum failed in gzip andrea crotti <andrea.crotti.0@gmail.com> - 2012-08-02 11:59 +0100
| From | andrea crotti <andrea.crotti.0@gmail.com> |
|---|---|
| Date | 2012-08-01 14:01 +0100 |
| Subject | Re: CRC-checksum failed in gzip |
| Message-ID | <mailman.2825.1343826107.4697.python-list@python.org> |
Full traceback:

Exception in thread Thread-8:
Traceback (most recent call last):
  File "/user/sim/python/lib/python2.7/threading.py", line 530, in __bootstrap_inner
    self.run()
  File "/user/sim/tests/llif/AutoTester/src/AutoTester2.py", line 67, in run
    self.processJobData(jobData, logger)
  File "/user/sim/tests/llif/AutoTester/src/AutoTester2.py", line 204, in processJobData
    self.run_simulator(area, jobData[1] ,log)
  File "/user/sim/tests/llif/AutoTester/src/AutoTester2.py", line 142, in run_simulator
    report_file, percentage, body_text = SimResults.copy_test_batch(log, area)
  File "/user/sim/tests/llif/AutoTester/src/SimResults.py", line 274, in copy_test_batch
    out2_lines = out2.read()
  File "/user/sim/python/lib/python2.7/gzip.py", line 245, in read
    self._read(readsize)
  File "/user/sim/python/lib/python2.7/gzip.py", line 316, in _read
    self._read_eof()
  File "/user/sim/python/lib/python2.7/gzip.py", line 338, in _read_eof
    hex(self.crc)))
IOError: CRC check failed 0x4f675fba != 0xa9e45aL
- The file is written with the Linux gzip program.
- No, I can't reproduce the error with the exact same file that
failed, which is what is really puzzling.
There seems to be no clear pattern; it just fails randomly. The file
is also only opened for reading by this program,
so in theory there is no way it can be corrupted.
I also checked with lsof whether any other process had it open, but
nothing appears.
- I can't really try on the local disk; it might take ages, unfortunately
(we are rewriting this system from scratch anyway).
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-08-01 16:17 +0000 |
| Message-ID | <501956a7$0$29978$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #26356 |
On Wed, 01 Aug 2012 14:01:45 +0100, andrea crotti wrote:

> Full traceback:
>
> Exception in thread Thread-8:

"DANGER DANGER DANGER WILL ROBINSON!!!"

Why didn't you say that there were threads involved? That puts a
completely different perspective on the problem.

I *was* going to write back and say that you probably had either file
system corruption, or network errors. But now that I can see that you
have threads, I will revise that and say that you probably have a bug in
your thread handling code.

I must say, Andrea, your initial post asking for help was EXTREMELY
misleading. You over-simplified the problem to the point that it no
longer has any connection to the reality of the code you are running.
Please don't send us on wild goose chases after bugs in code that you
aren't actually running.

> there seems to be no clear pattern and just randomly fails.

When you start using threads, you have to expect these sorts of
intermittent bugs unless you are very careful.

My guess is that you have a bug where two threads read from the same file
at the same time. Since each read shares state (the position of the file
pointer), you're going to get corruption. Because it depends on timing
details of which threads do what at exactly which microsecond, the effect
might as well be random.

Example: suppose the file contains three blocks A B and C, and a
checksum. Thread 8 starts reading the file, and gets block A and B. Then
thread 2 starts reading it as well, and gets half of block C. Thread 8
gets the rest of block C, calculates the checksum, and it doesn't match.

I recommend that you run a file system check on the remote disk. If it
passes, you can eliminate file system corruption. Also, run some network
diagnostics, to eliminate corruption introduced in the network layer. But
I expect that you won't find anything there, and the problem is a simple
thread bug. Simple, but really, really hard to find.

Good luck.

--
Steven
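The A/B/C example above can be replayed deterministically, without threads, by letting two logical readers take turns on one file object. This is only a sketch: `io.BytesIO` stands in for the shared file, and the 4-byte "blocks" and the names `reader_8` and `reader_2` are made up for illustration.

```python
import io

# Hypothetical stand-in for the shared file: three 4-byte "blocks" A, B, C.
shared = io.BytesIO(b"AAAABBBBCCCC")

# Two logical readers take turns on the SAME object, the way two
# interleaved threads would. They share one file position.
reader_8 = shared.read(8)    # "thread 8" gets blocks A and B
reader_2 = shared.read(2)    # "thread 2" takes half of block C
reader_8 += shared.read()    # "thread 8" only gets what is left of C

print(reader_8)  # b'AAAABBBBCC'
print(reader_2)  # b'CC'
```

Neither reader ends up with the full 12 bytes, so any checksum computed over what one reader saw would not match the checksum stored for the whole stream.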
| From | andrea crotti <andrea.crotti.0@gmail.com> |
|---|---|
| Date | 2012-08-01 17:38 +0100 |
| Message-ID | <mailman.2837.1343839139.4697.python-list@python.org> |
| In reply to | #26368 |
2012/8/1 Steven D'Aprano <steve+comp.lang.python@pearwood.info>:
> On Wed, 01 Aug 2012 14:01:45 +0100, andrea crotti wrote:
>
>> Full traceback:
>>
>> Exception in thread Thread-8:
>
> "DANGER DANGER DANGER WILL ROBINSON!!!"
>
> Why didn't you say that there were threads involved? That puts a
> completely different perspective on the problem.
>
> I *was* going to write back and say that you probably had either file
> system corruption, or network errors. But now that I can see that you
> have threads, I will revise that and say that you probably have a bug in
> your thread handling code.
>
> I must say, Andrea, your initial post asking for help was EXTREMELY
> misleading. You over-simplified the problem to the point that it no
> longer has any connection to the reality of the code you are running.
> Please don't send us on wild goose chases after bugs in code that you
> aren't actually running.
>
>> there seems to be no clear pattern and just randomly fails.
>
> When you start using threads, you have to expect these sorts of
> intermittent bugs unless you are very careful.
>
> My guess is that you have a bug where two threads read from the same file
> at the same time. Since each read shares state (the position of the file
> pointer), you're going to get corruption. Because it depends on timing
> details of which threads do what at exactly which microsecond, the effect
> might as well be random.
>
> Example: suppose the file contains three blocks A B and C, and a
> checksum. Thread 8 starts reading the file, and gets block A and B. Then
> thread 2 starts reading it as well, and gets half of block C. Thread 8
> gets the rest of block C, calculates the checksum, and it doesn't match.
>
> I recommend that you run a file system check on the remote disk. If it
> passes, you can eliminate file system corruption. Also, run some network
> diagnostics, to eliminate corruption introduced in the network layer. But
> I expect that you won't find anything there, and the problem is a simple
> thread bug. Simple, but really, really hard to find.
>
> Good luck.
>

Thanks a lot, that makes a lot of sense. I didn't mention this detail
before because I didn't write this code and I completely forgot that
threads were involved; I'm just trying to help fix this bug.

Your explanation makes a lot of sense, but it's still surprising that
merely reading files, without ever writing them, can cause trouble
when using threads :/
| From | Laszlo Nagy <gandalf@shopzeus.com> |
|---|---|
| Date | 2012-08-01 19:05 +0200 |
| Message-ID | <mailman.2840.1343840725.4697.python-list@python.org> |
| In reply to | #26368 |
> Thanks a lot, that makes a lot of sense.. I haven't given this detail
> before because I didn't write this code, and I forgot that there were
> threads involved completely, I'm just trying to help to fix this bug.
>
> Your explanation makes a lot of sense, but it's still surprising that
> even just reading files without ever writing them can cause troubles
> using threads :/

Make sure that file objects are not shared between threads, if that is
possible. It will probably solve the problem (if it is related to
threads).
| From | andrea crotti <andrea.crotti.0@gmail.com> |
|---|---|
| Date | 2012-08-01 18:17 +0100 |
| Message-ID | <mailman.2843.1343841485.4697.python-list@python.org> |
| In reply to | #26368 |
2012/8/1 Laszlo Nagy <gandalf@shopzeus.com>:
>
>> Thanks a lot, that makes a lot of sense.. I haven't given this detail
>> before because I didn't write this code, and I forgot that there were
>> threads involved completely, I'm just trying to help to fix this bug.
>>
>> Your explanation makes a lot of sense, but it's still surprising that
>> even just reading files without ever writing them can cause troubles
>> using threads :/
>
> Make sure that file objects are not shared between threads. If that is
> possible. It will probably solve the problem (if that is related to
> threads).
Well, I just have to create a lock, I guess, right?

with lock:
    # open file
    # read content
| From | Laszlo Nagy <gandalf@shopzeus.com> |
|---|---|
| Date | 2012-08-01 19:57 +0200 |
| Message-ID | <mailman.2845.1343843853.4697.python-list@python.org> |
| In reply to | #26368 |
>> Make sure that file objects are not shared between threads. If that is
>> possible. It will probably solve the problem (if that is related to
>> threads).
>
> Well I just have to create a lock I guess right?
That is also a solution. You need to call file.read() inside an acquired
lock.
> with lock:
>     # open file
>     # read content
>

But not that way! Your example will keep the lock acquired for the
lifetime of the file, so it cannot be shared between threads.

More likely:

## Open file
lock = threading.Lock()
fin = gzip.open(file_path...)
# Now you can share the file object between threads.

# and do this inside any thread:
## data needed. block until the file object becomes usable.
with lock:
    data = fin.read(....) # other threads are blocked while I'm reading
## use your data here, meanwhile other threads can read
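For reference, here is one way the outline above could look as a complete script. It is a sketch under assumptions, not the code from the original program: the temporary file, the 4096-byte chunk size, the `worker` function and the `chunks` list are all invented for the example.

```python
import gzip
import os
import tempfile
import threading

payload = b"0123456789" * 10000

# Write a throwaway gzip file so the sketch is self-contained.
fd, path = tempfile.mkstemp(suffix=".gz")
os.close(fd)
with gzip.open(path, "wb") as out:
    out.write(payload)

lock = threading.Lock()
chunks = []

def worker(fin):
    while True:
        with lock:                  # only one thread touches fin at a time
            data = fin.read(4096)
            if data:
                chunks.append(data)  # appended under the lock, so in read order
        if not data:
            break

with gzip.open(path, "rb") as fin:
    threads = [threading.Thread(target=worker, args=(fin,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

os.remove(path)
total = b"".join(chunks)
print(total == payload)
```

The lock keeps the decompressor state consistent, but note that each thread still sees only some of the chunks. If every thread needs the whole file, a lock around `read()` alone is not enough.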
| From | Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> |
|---|---|
| Date | 2012-08-02 10:49 +0200 |
| Message-ID | <3hrpe9-hbi.ln1@satorlaser.homedns.org> |
| In reply to | #26375 |
On 01.08.2012 19:57, Laszlo Nagy wrote:
> ## Open file
> lock = threading.Lock()
> fin = gzip.open(file_path...)
> # Now you can share the file object between threads.
>
> # and do this inside any thread:
> ## data needed. block until the file object becomes usable.
> with lock:
>     data = fin.read(....) # other threads are blocked while I'm reading
> ## use your data here, meanwhile other threads can read

Technically, that is correct, but IMHO it's complete nonsense to share
the file object between threads in the first place. If you need the
data in two threads, just read the file once and then share the
read-only, immutable content. If the file is small or too large to be
held in memory at once, just open and read it on demand. This also
saves you from having to rewind the file every time you read it.

Am I missing something?

Uli
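A minimal sketch of the "read once, share the immutable content" approach, assuming the file fits in memory. The temporary file, the `worker` function and the use of `zlib.crc32` as a stand-in for real processing are assumptions made for the example.

```python
import gzip
import os
import tempfile
import threading
import zlib

payload = b"simulation output line\n" * 5000

# Throwaway gzip file standing in for the real results file.
fd, path = tempfile.mkstemp(suffix=".gz")
os.close(fd)
with gzip.open(path, "wb") as out:
    out.write(payload)

# One reader, finished before any worker thread starts.
with gzip.open(path, "rb") as fin:
    content = fin.read()
os.remove(path)

results = [None] * 4

def worker(index, data):
    # data is an immutable bytes object; no file state is shared here.
    results[index] = zlib.crc32(data) & 0xFFFFFFFF

threads = [threading.Thread(target=worker, args=(i, content)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(set(results)) == 1)
```

Because the threads only ever see a `bytes` object, there is no file position or decompressor state left for them to disturb.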
| From | Laszlo Nagy <gandalf@shopzeus.com> |
|---|---|
| Date | 2012-08-02 12:14 +0200 |
| Message-ID | <mailman.2862.1343902460.4697.python-list@python.org> |
| In reply to | #26386 |
> Technically, that is correct, but IMHO it's complete nonsense to share
> the file object between threads in the first place. If you need the
> data in two threads, just read the file once and then share the
> read-only, immutable content. If the file is small or too large to be
> held in memory at once, just open and read it on demand. This also
> saves you from having to rewind the file every time you read it.
>
> Am I missing something?

We suspect that his program reads the same file object from different
threads. At least this would explain his problem. I agree with you:
usually it is not a good idea to share a file object between threads.
This is what I told him the first time. But it is not in our hands; he
already has a program that needs to be fixed. It might be easier for
him to protect the read() calls with a lock, because that can be done
mechanically, without thinking too much.
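One way to apply a lock to every read() call without editing each call site is a thin wrapper around the shared file object. The `LockedReader` class below is hypothetical, written only to illustrate the idea, and `io.BytesIO` stands in for the real gzip file.

```python
import io
import threading

class LockedReader(object):
    """Hypothetical wrapper that serializes read() calls on a shared file object."""

    def __init__(self, fileobj):
        self._fileobj = fileobj
        self._lock = threading.Lock()

    def read(self, size=-1):
        with self._lock:
            return self._fileobj.read(size)

source = LockedReader(io.BytesIO(b"x" * 100000))
counts = []

def worker():
    seen = 0
    while True:
        data = source.read(1000)
        if not data:
            break
        seen += len(data)
    counts.append(seen)   # one entry per thread

threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(counts))  # 100000
```

Every byte is delivered exactly once, but it is split across the threads, so this only fits programs in which the threads are meant to consume the stream cooperatively.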
| From | andrea crotti <andrea.crotti.0@gmail.com> |
|---|---|
| Date | 2012-08-02 10:26 +0100 |
| Message-ID | <mailman.2859.1343899619.4697.python-list@python.org> |
| In reply to | #26368 |
2012/8/1 Steven D'Aprano <steve+comp.lang.python@pearwood.info>:
>
> When you start using threads, you have to expect these sorts of
> intermittent bugs unless you are very careful.
>
> My guess is that you have a bug where two threads read from the same file
> at the same time. Since each read shares state (the position of the file
> pointer), you're going to get corruption. Because it depends on timing
> details of which threads do what at exactly which microsecond, the effect
> might as well be random.
>
> Example: suppose the file contains three blocks A B and C, and a
> checksum. Thread 8 starts reading the file, and gets block A and B. Then
> thread 2 starts reading it as well, and gets half of block C. Thread 8
> gets the rest of block C, calculates the checksum, and it doesn't match.
>
> I recommend that you run a file system check on the remote disk. If it
> passes, you can eliminate file system corruption. Also, run some network
> diagnostics, to eliminate corruption introduced in the network layer. But
> I expect that you won't find anything there, and the problem is a simple
> thread bug. Simple, but really, really hard to find.
>
> Good luck.
One last thing I would like to do before I add this fix is to actually
be able to reproduce this behaviour, and I thought I could just do the
following:
import gzip
import threading


class OpenAndRead(threading.Thread):
    def run(self):
        fz = gzip.open('out2.txt.gz')
        fz.read()
        fz.close()


if __name__ == '__main__':
    for i in range(100):
        OpenAndRead().start()

But no matter how many threads I start, I can't reproduce the CRC
error. Any idea how I can provoke it?
The code in run should be shared by all the threads since there are no
locks, right?
| From | Laszlo Nagy <gandalf@shopzeus.com> |
|---|---|
| Date | 2012-08-02 12:21 +0200 |
| Message-ID | <mailman.2863.1343902890.4697.python-list@python.org> |
| In reply to | #26368 |
> One last thing I would like to do before I add this fix is to actually
> be able to reproduce this behaviour, and I thought I could just do the
> following:
>
> import gzip
> import threading
>
>
> class OpenAndRead(threading.Thread):
>     def run(self):
>         fz = gzip.open('out2.txt.gz')
>         fz.read()
>         fz.close()
>
>
> if __name__ == '__main__':
>     for i in range(100):
>         OpenAndRead().start()
>
>
> But no matter how many threads I start, I can't reproduce the CRC
> error, any idea how I can try to help it happening?
Your example did not share the file object between threads. Here is an
example that does:

import gzip
import threading

class OpenAndRead(threading.Thread):
    def run(self):
        global fz
        fz.read(100)  # every thread reads from the same shared file object

if __name__ == '__main__':
    fz = gzip.open('out2.txt.gz')
    for i in range(10):
        OpenAndRead().start()

Try this with a huge file. And here is one that should never throw a
CRC error, because the file object is protected by a lock:

class OpenAndRead(threading.Thread):
    def run(self):
        global fz
        global fl
        with fl:  # only one thread at a time may read
            fz.read(100)

if __name__ == '__main__':
    fz = gzip.open('out2.txt.gz')
    fl = threading.Lock()
    for i in range(2):
        OpenAndRead().start()

>
> The code in run should be shared by all the threads since there are no
> locks, right?
The code is shared but the file object is not. In your example, a new
file object is created, every time a thread is started.
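To check the point about per-thread file objects, the reproduction attempt can be extended so that it records what each thread actually read. This is a sketch with invented names (`open_and_read`, `sizes`, `errors`) and a temporary file in place of out2.txt.gz.

```python
import gzip
import os
import tempfile
import threading

payload = b"per-thread handle test\n" * 20000

# Throwaway gzip file standing in for out2.txt.gz.
fd, path = tempfile.mkstemp(suffix=".gz")
os.close(fd)
with gzip.open(path, "wb") as out:
    out.write(payload)

errors = []
sizes = []

def open_and_read():
    try:
        # Each thread opens its own handle, so nothing is shared.
        with gzip.open(path, "rb") as fz:
            sizes.append(len(fz.read()))
    except IOError as exc:   # a CRC failure would be reported here
        errors.append(exc)

threads = [threading.Thread(target=open_and_read) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

os.remove(path)
print(len(errors), len(sizes))
```

With private handles, every thread should read the complete payload and the error list should stay empty, which matches the observation that the original test never failed.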
| From | andrea crotti <andrea.crotti.0@gmail.com> |
|---|---|
| Date | 2012-08-02 11:57 +0100 |
| Message-ID | <mailman.2867.1343905029.4697.python-list@python.org> |
| In reply to | #26368 |
2012/8/2 Laszlo Nagy <gandalf@shopzeus.com>:
>
> Your example did not share the file object between threads. Here an example
> that does that:
>
> class OpenAndRead(threading.Thread):
>     def run(self):
>         global fz
>         fz.read(100)
>
> if __name__ == '__main__':
>
>     fz = gzip.open('out2.txt.gz')
>     for i in range(10):
>         OpenAndRead().start()
>
> Try this with a huge file. And here is the one that should never throw CRC
> error, because the file object is protected by a lock:
>
> class OpenAndRead(threading.Thread):
>     def run(self):
>         global fz
>         global fl
>         with fl:
>             fz.read(100)
>
> if __name__ == '__main__':
>
>     fz = gzip.open('out2.txt.gz')
>     fl = threading.Lock()
>     for i in range(2):
>         OpenAndRead().start()
>
>
>>
>> The code in run should be shared by all the threads since there are no
>> locks, right?
>
> The code is shared but the file object is not. In your example, a new file
> object is created, every time a thread is started.
>
Ok sure, that makes sense, but then this explanation is maybe not right
anymore, because I'm quite sure that the file object is *not* shared
between threads; everything happens inside a thread.

I managed to get some errors doing this with a big file:

class OpenAndRead(threading.Thread):
    def run(self):
        global fz
        fz.read(100)

if __name__ == '__main__':
    fz = gzip.open('bigfile.avi.gz')
    for i in range(20):
        OpenAndRead().start()

and it doesn't fail without the *global*, but this is definitely not
what the code does, because every thread gets a new file object; it's
not shared.

Anyway, we'll either read once for all the threads or add the lock, and
hopefully that will solve the problem, even if I'm not convinced yet
that this was the cause.
| From | andrea crotti <andrea.crotti.0@gmail.com> |
|---|---|
| Date | 2012-08-02 11:59 +0100 |
| Message-ID | <mailman.2868.1343905170.4697.python-list@python.org> |
| In reply to | #26368 |
2012/8/2 andrea crotti <andrea.crotti.0@gmail.com>:
>
> Ok sure that makes sense, but then this explanation is maybe not right
> anymore, because I'm quite sure that the file object is *not* shared
> between threads, everything happens inside a thread..
>
> I managed to get some errors doing this with a big file
> class OpenAndRead(threading.Thread):
>     def run(self):
>         global fz
>         fz.read(100)
>
> if __name__ == '__main__':
>
>     fz = gzip.open('bigfile.avi.gz')
>     for i in range(20):
>         OpenAndRead().start()
>
> and it doesn't fail without the *global*, but this is definitely not
> what the code does, because every thread gets a new file object, it's
> not shared..
>
> Anyway we'll read once for all the threads or add the lock, and
> hopefully it should solve the problem, even if I'm not convinced yet
> that it was this.
Just for completeness: as suggested, this also does not fail:

class OpenAndRead(threading.Thread):
    def __init__(self, lock):
        threading.Thread.__init__(self)
        self.lock = lock

    def run(self):
        global fz
        with self.lock:
            fz.read(100)

if __name__ == '__main__':
    lock = threading.Lock()
    fz = gzip.open('bigfile.avi.gz')
    for i in range(20):
        OpenAndRead(lock).start()