Groups > comp.lang.python > #60169 > unrolled thread
| Started by | John Ladasky <john_ladasky@sbcglobal.net> |
|---|---|
| First post | 2013-11-21 09:01 -0800 |
| Last post | 2013-11-22 09:09 +0000 |
| Articles | 13 — 5 participants |
Traceback when using multiprocessing, less than helpful? John Ladasky <john_ladasky@sbcglobal.net> - 2013-11-21 09:01 -0800
Re: Traceback when using multiprocessing, less than helpful? Chris Angelico <rosuav@gmail.com> - 2013-11-22 04:24 +1100
Re: Traceback when using multiprocessing, less than helpful? John Ladasky <john_ladasky@sbcglobal.net> - 2013-11-21 10:25 -0800
Re: Traceback when using multiprocessing, less than helpful? Chris Angelico <rosuav@gmail.com> - 2013-11-22 07:53 +1100
Re: Traceback when using multiprocessing, less than helpful? John Ladasky <john_ladasky@sbcglobal.net> - 2013-11-21 13:19 -0800
Re: Traceback when using multiprocessing, less than helpful? John Ladasky <john_ladasky@sbcglobal.net> - 2013-11-21 13:49 -0800
Re: Traceback when using multiprocessing, less than helpful? Ethan Furman <ethan@stoneleaf.us> - 2013-11-21 14:32 -0800
Re: Traceback when using multiprocessing, less than helpful? Terry Reedy <tjreedy@udel.edu> - 2013-11-21 17:37 -0500
Re: Traceback when using multiprocessing, less than helpful? John Ladasky <john_ladasky@sbcglobal.net> - 2013-11-21 19:57 -0800
Re: Traceback when using multiprocessing, less than helpful? Chris Angelico <rosuav@gmail.com> - 2013-11-22 15:24 +1100
Why pickling (was: Traceback when using multiprocessing) John Ladasky <john_ladasky@sbcglobal.net> - 2013-11-22 08:38 -0800
Re: Why pickling (was: Traceback when using multiprocessing) Chris Angelico <rosuav@gmail.com> - 2013-11-23 10:50 +1100
Re: Traceback when using multiprocessing, less than helpful? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-22 09:09 +0000
| From | John Ladasky <john_ladasky@sbcglobal.net> |
|---|---|
| Date | 2013-11-21 09:01 -0800 |
| Subject | Traceback when using multiprocessing, less than helpful? |
| Message-ID | <e92311cb-6cc5-415a-bbf8-544c0c9c6a54@googlegroups.com> |
Hi folks,
Somewhat over a year ago, I struggled with implementing a routine using multiprocessing.Pool and numpy. I eventually succeeded, but I remember finding it very hard to debug. Now I have managed to provoke an error from that routine again, and once again, I'm struggling.
Here is the end of the traceback, starting with the last line of my code: "result = pool.map(evaluate, bundles)". After that, I'm into Python itself.
  File ".../evaluate.py", line 81, in evaluate
    result = pool.map(evaluate, bundles)
  File "/usr/lib/python3.3/multiprocessing/pool.py", line 228, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.3/multiprocessing/pool.py", line 564, in get
    raise self._value
ValueError: operands could not be broadcast together with shapes (1,3) (4)
Notice that no line of numpy appears in the traceback? Still, there are three things that make me think that this error is coming from numpy.
1. "raise self._value" means that an exception is stored in a variable, to be re-raised.
2. The words "operands" and "broadcast" do not appear anywhere in the source code of multiprocessing.pool.
3. The words "operands" and "broadcast" are common to numpy errors I have seen before. Numpy does many very tricky things when dealing with arrays of different dimensions and shapes.
Of course, I am sure that the bug must be in my own code. I even have old programs which are using my evaluate.evaluate() without generating errors. I am comparing the data structures that my working and my non-working programs send to pool.map(). I am comparing the code between my two programs. There is some subtle difference that I haven't spotted.
If I could only see the line of numpy code which is generating the ValueError, I would have a better chance of spotting the bug in my code. So, WHY isn't there any reference to numpy in my traceback?
Here's my theory. The numpy error was generated in a subprocess. The line "raise self._value" is intercepting the exception generated by my subprocess, and passing it back to the master Python interpreter.
Does re-raising an exception, and/or passing an exception from a subprocess, truncate a traceback? That's what I think I'm seeing.
Thanks for any advice!
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-11-22 04:24 +1100 |
| Message-ID | <mailman.3013.1385054683.18130.python-list@python.org> |
| In reply to | #60169 |
On Fri, Nov 22, 2013 at 4:01 AM, John Ladasky <john_ladasky@sbcglobal.net> wrote:
> Here is the end of the traceback, starting with the last line of my code: "result = pool.map(evaluate, bundles)". After that, I'm into Python itself.
>
>   File ".../evaluate.py", line 81, in evaluate
>     result = pool.map(evaluate, bundles)
>   File "/usr/lib/python3.3/multiprocessing/pool.py", line 228, in map
>     return self._map_async(func, iterable, mapstar, chunksize).get()
>   File "/usr/lib/python3.3/multiprocessing/pool.py", line 564, in get
>     raise self._value
> ValueError: operands could not be broadcast together with shapes (1,3) (4)
>
> Notice that no line of numpy appears in the traceback? Still, there are three things that make me think that this error is coming from numpy.
Hmm. This looks like a possible need for the 'raise from' syntax. I just checked multiprocessing/pool.py from 3.4 alpha, and it has much what you're seeing there, in the definition of AsyncResult (of which MapResult is a subclass). The question is, though, how well does the information traverse the process boundary?
ChrisA
| From | John Ladasky <john_ladasky@sbcglobal.net> |
|---|---|
| Date | 2013-11-21 10:25 -0800 |
| Message-ID | <9eb71131-7ca0-4a21-a8f3-98371ee8787e@googlegroups.com> |
| In reply to | #60171 |
On Thursday, November 21, 2013 9:24:33 AM UTC-8, Chris Angelico wrote:
> Hmm. This looks like a possible need for the 'raise from' syntax.
Thank you, Chris, that made me feel like a REAL Python programmer -- I just did some reading, and the "raise from" feature was not implemented until Python 3! And I might actually need it! :^)
I think that the article http://www.python.org/dev/peps/pep-3134/ is relevant. Reading it now. To be clear: the complete exception chain is stored in every class, it's just not being displayed? I hope that's the case. I shouldn't have to install a "raise from" hook in multiprocessing.map_async itself.
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-11-22 07:53 +1100 |
| Message-ID | <mailman.3014.1385067196.18130.python-list@python.org> |
| In reply to | #60172 |
On Fri, Nov 22, 2013 at 5:25 AM, John Ladasky <john_ladasky@sbcglobal.net> wrote:
> On Thursday, November 21, 2013 9:24:33 AM UTC-8, Chris Angelico wrote:
>
>> Hmm. This looks like a possible need for the 'raise from' syntax.
>
> Thank you, Chris, that made me feel like a REAL Python programmer -- I just did some reading, and the "raise from" feature was not implemented until Python 3! And I might actually need it! :^)
>
> I think that the article http://www.python.org/dev/peps/pep-3134/ is relevant. Reading it now. To be clear: the complete exception chain is stored in every class, it's just not being displayed? I hope that's the case. I shouldn't have to install a "raise from" hook in multiprocessing.map_async itself.
>
That PEP is all about the 'raise from' notation, yes; but the exception chaining is presumably not being stored, or else you would be able to see it in the default printout. So the best solution to this is, most likely, a patch to multiprocessing to have it chain exceptions properly. I think that would be considered a bugfix, and thus back-ported to all appropriate versions (rather than a feature enhancement that goes in 3.4 or 3.5 only).
What you could try is printing out the __cause__ and __context__ of the exception, to see if there's anything useful in them; if there's nothing, the next thing to try would be some kind of wrapper in your inner handler (the evaluate function) that retains additional information.
Oh, something else to try: It might be that the proper exception chaining would happen, except that the info isn't traversing processes properly due to pickling or something. Can you patch your code to use threading instead of multiprocessing? That might reveal something. (Don't worry about abysmal performance at this stage.)
Hopefully someone with more knowledge of Python's internals can help out, here. One way or another, I suspect this will result in a tracker issue.
ChrisA
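For readers unfamiliar with the attributes Chris mentions, here is a minimal sketch of what `raise ... from` stores. The function `parse_port` and its messages are hypothetical and only serve the illustration; nothing here comes from multiprocessing itself.

```python
def parse_port(text):
    """Hypothetical helper: convert text to an int, chaining the original error."""
    try:
        return int(text)
    except ValueError as err:
        # 'raise ... from err' records err in __cause__ of the new exception.
        # Because we are inside an except block, err is also stored in __context__.
        raise RuntimeError("bad port: %r" % (text,)) from err


try:
    parse_port("eighty")
except RuntimeError as exc:
    caught = exc

print(type(caught.__cause__).__name__)         # ValueError
print(caught.__context__ is caught.__cause__)  # True

# An exception that was never chained has nothing in either attribute.
plain = ValueError("no chain here")
print(plain.__cause__, plain.__context__)      # None None
```

If both attributes print as None on a caught exception, no chain was recorded on that object, so there is nothing hidden that a different display routine could reveal.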
| From | John Ladasky <john_ladasky@sbcglobal.net> |
|---|---|
| Date | 2013-11-21 13:19 -0800 |
| Message-ID | <46e03756-a242-4cdd-a5c0-30fcff34c98c@googlegroups.com> |
| In reply to | #60173 |
On Thursday, November 21, 2013 12:53:07 PM UTC-8, Chris Angelico wrote:
> What you could try is
Suggestion 1:
> printing out the __cause__ and __context__ of
> the exception, to see if there's anything useful in them;
Suggestion 2:
> if there's
> nothing, the next thing to try would be some kind of wrapper in your
> inner handler (the evaluate function) that retains additional
> information.
Suggestion 3:
> Oh, something else to try: It might be that the proper exception
> chaining would happen, except that the info isn't traversing processes
> properly due to pickling or something. Can you patch your code to use
> threading instead of multiprocessing? That might reveal something.
> (Don't worry about abysmal performance at this stage.)
I have tried the first suggestion, at the top level of my code. Here are the modified lines, and the output:
==============================================
try:
    out = evaluate(net, domain)
except ValueError as e:
    print(type(e))
    print(e)  # this just produces the exception string itself
    print(e.__context__)
    print(e.__cause__)
    raise e  # just so my program actually stops
==============================================
<class 'ValueError'>
operands could not be broadcast together with shapes (1,3) (4)
None
None
==============================================
So, once I catch the exception, both __context__ and __cause__ are None.
I will proceed as you have suggested -- but if anything comes to mind based on what I have already done, please feel free to chime in!
| From | John Ladasky <john_ladasky@sbcglobal.net> |
|---|---|
| Date | 2013-11-21 13:49 -0800 |
| Message-ID | <e90dbd18-97ae-4e92-b521-5818d015244a@googlegroups.com> |
| In reply to | #60175 |
Followup: I didn't need to go as far as Chris Angelico's second suggestion. I haven't looked at certain parts of my own code for a while, but it turns out that I wrote it REASONABLY logically...
My evaluate() calls another function through pool.map_async() -- _evaluate(), which actually processes the data, on a single CPU. So I didn't need to hassle with threading, as Chris suggested. All I did was to import _evaluate in my top-level code, then change my function calls from evaluate() to _evaluate(). Out popped my numpy error, with a proper traceback. I can now debug it!
I can probably refactor my code to make it even cleaner. I'll have to deal with the fact that pool.map() requires that all arguments to each subprocess be submitted as a single, iterable object. I didn't want to have to do this when I only had a single process to run, but perhaps the tradeoff will be acceptable.
So now, for anyone who is still reading this: is it your opinion that the traceback that I obtained through multiprocessing.pool._map_async().get() SHOULD have allowed me to see what the ultimate cause of the exception was? I think so. Is it a bug? Should I request a bugfix? How do I go about doing that?
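The debugging trick described above, calling the worker function directly instead of through the pool, can be sketched roughly as follows. This is not John's code: the body of `_evaluate`, the `serial` flag, and the bundle layout are all assumptions made for the example, and the worker uses plain lists so that the sketch does not depend on numpy.

```python
import multiprocessing


def _evaluate(bundle):
    """Hypothetical worker: element-wise sum of two equal-length sequences."""
    left, right = bundle
    if len(left) != len(right):
        raise ValueError("shape mismatch: %d vs %d" % (len(left), len(right)))
    return [a + b for a, b in zip(left, right)]


def evaluate(bundles, processes=None, serial=False):
    """Run _evaluate over bundles, in-process when serial=True (assumed flag)."""
    if serial:
        # Same process, same stack: a failure keeps the full traceback,
        # including the frame inside _evaluate.
        return [_evaluate(bundle) for bundle in bundles]
    with multiprocessing.Pool(processes) as pool:
        # Each bundle is pickled, sent to a worker, and the result pickled back.
        return pool.map(_evaluate, bundles)
```

A call such as `evaluate(bundles, serial=True)` is then enough to get a traceback that points into the worker. When the pool path is used from a script, the call generally belongs under an `if __name__ == "__main__":` guard, since some platforms start workers by re-importing the main module.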
| From | Ethan Furman <ethan@stoneleaf.us> |
|---|---|
| Date | 2013-11-21 14:32 -0800 |
| Message-ID | <mailman.3016.1385073129.18130.python-list@python.org> |
| In reply to | #60176 |
On 11/21/2013 01:49 PM, John Ladasky wrote:
>
> So now, for anyone who is still reading this: is it your
> opinion that the traceback that I obtained through
> multiprocessing.pool._map_async().get() SHOULD have allowed
> me to see what the ultimate cause of the exception was?
It would certainly be nice.
> I think so. Is it a bug? Should I request a bugfix? How
> do I go about doing that?
Check out bugs.python.org. Search for multiprocessing and tracebacks to see if anything is already there; if not, create a new issue.
--
~Ethan~
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2013-11-21 17:37 -0500 |
| Message-ID | <mailman.3017.1385073452.18130.python-list@python.org> |
| In reply to | #60169 |
On 11/21/2013 12:01 PM, John Ladasky wrote:
This is a case where you need to dig into the code (or maybe docs) a bit.
>   File ".../evaluate.py", line 81, in evaluate
>     result = pool.map(evaluate, bundles)
>   File "/usr/lib/python3.3/multiprocessing/pool.py", line 228, in map
>     return self._map_async(func, iterable, mapstar, chunksize).get()
The call to _map_async gets a blank MapResult (a subclass of
ApplyResult), queues tasks to fill it in, and returns the filled in
result. This call is designed to always return as task exceptions are
caught and assigned to MapResult._value in both ApplyResult._set and
MapResult._set.
    result = MapResult(self._cache, chunksize, len(iterable), callback,
                       error_callback=error_callback)
    self._taskqueue.put((((result._job, i, mapper, (x,), {})
                          for i, x in enumerate(task_batches)), None))
    return result
It is the subsequent call to get() that 'fails', because it raises
the caught exception.
>   File "/usr/lib/python3.3/multiprocessing/pool.py", line 564, in get
>     raise self._value
> ValueError: operands could not be broadcast together with shapes (1,3) (4)
> Notice that no line of numpy appears in the traceback? Still, there
> are three things that make me think that this error is coming from
> numpy.
It comes from one of your tasks as the 'result', and your tasks use numpy.
> If I could only see the line of numpy code which is generating the
> ValueError, I would have a better chance of spotting the bug in my
> code.
Definitely.
> So, WHY isn't there any reference to numpy in my traceback?
I suspect that raising the exception may replace its __traceback__
attribute. Anyway, there are three things I might try.
1. Use 3.3.3 or latest 3.4 to see if there is any improvement in output.
I vaguely remember a tracker issue that might be related.
2. _map_async takes an error_callback arg that defaults to None and
which is passed on to MapResult. When _value is set to an exception,
"error_callback(_value)" is called in ._set() before the later .get()
re-raises it. pool.map does not allow you to set either the (success)
callback or the error_callback, but pool.map_async does (this is the
difference between the two methods). So switch to the latter so you can
pass a function that uses the traceback module to print (or log) the
traceback attached to _value, assuming that there is one.
3. If that does not work, wrap the current body of your task function in
    try:
        <current suite>
    except Exception as e:
        <use traceback module to add traceback to message>
        raise e  <or a new exception>
--
Terry Jan Reedy
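Terry's third suggestion can be sketched as a small decorator. This is one possible reading of the outline, not code from the thread: `keep_traceback` and `reciprocal` are hypothetical names, and the sketch assumes the exception class accepts a single message argument, as ValueError and most built-in exceptions do.

```python
import functools
import traceback


def keep_traceback(task):
    """Hypothetical decorator: fold the worker-side traceback into the message."""
    @functools.wraps(task)
    def wrapper(*args, **kwargs):
        try:
            return task(*args, **kwargs)
        except Exception as exc:
            # format_exc() renders the active traceback as an ordinary string,
            # which survives pickling even though traceback objects do not.
            detail = traceback.format_exc()
            message = "%s\n\nOriginal traceback:\n%s" % (exc, detail)
            # Assumption: type(exc) can be built from one message argument.
            raise type(exc)(message) from None
    return wrapper


@keep_traceback
def reciprocal(x):
    """Hypothetical task function used only to demonstrate the wrapper."""
    return 1 / x
```

Decorating the task function that is handed to `pool.map` means the parent process still receives the same exception type, but its message now carries the text of the traceback from the worker. The price is a noisier message, so this is probably best kept as a debugging aid.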
| From | John Ladasky <john_ladasky@sbcglobal.net> |
|---|---|
| Date | 2013-11-21 19:57 -0800 |
| Message-ID | <081af7df-2330-4b8b-abbf-4707edfcc17a@googlegroups.com> |
| In reply to | #60169 |
On Thursday, November 21, 2013 2:32:08 PM UTC-8, Ethan Furman wrote:
> Check out bugs.python.org. Search for multiprocessing and tracebacks to see
> if anything is already there; if not, create a new issue.
And on Thursday, November 21, 2013 2:37:13 PM UTC-8, Terry Reedy wrote:
> 1. Use 3.3.3 or latest 3.4 to see if there is any improvement in output.
> I vaguely remember a tracker issue that might be related.
All right, there appear to be two recent bug reports which are relevant.
http://bugs.python.org/issue13831
http://bugs.python.org/issue17836
The comments in the first link, from Richard Oudkerk, appear to indicate that pickling an Exception (so that it can be sent between processes) is difficult, perhaps impossible. I have never completely understood what can be pickled, and what cannot -- or, for that matter, why data needs to be pickled to pass it between processes.
In any case, a string representation of the traceback can be pickled. For debugging purposes, that can still help. So, if I understand everything correctly, in this link...
http://hg.python.org/cpython/rev/c4f92b597074/
...Richard submits his "hack" (his description) to Python 3.4 which pickles and passes the string. When time permits, I'll try it out. Or maybe I'll wait, since Python 3.4.0 is still in alpha.
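The idea of shipping the traceback as a string can be illustrated with a small sketch. This is not Richard Oudkerk's patch, only a guess at the general shape of the approach: `RemoteTracebackText`, `pack`, and `unpack` are hypothetical names, and the sketch assumes the exception itself pickles cleanly, as built-in exceptions do.

```python
import pickle
import traceback


class RemoteTracebackText(Exception):
    """Hypothetical carrier for a traceback that was flattened to text."""


def pack(exc):
    """Worker side: bundle the exception with its traceback rendered as text."""
    text = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return pickle.dumps((exc, text))


def unpack(payload):
    """Parent side: rebuild the exception and hang the text on __cause__."""
    exc, text = pickle.loads(payload)
    exc.__cause__ = RemoteTracebackText(text)
    return exc


try:
    {}["missing"]
except KeyError as err:
    payload = pack(err)

rebuilt = unpack(payload)
print(type(rebuilt).__name__)   # KeyError
print(rebuilt.__cause__)        # the worker-side traceback, as text
```

Because the text sits in `__cause__`, re-raising `rebuilt` in the parent would make the default traceback printout show the worker-side text first, followed by the parent's own traceback.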
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-11-22 15:24 +1100 |
| Message-ID | <mailman.3025.1385094254.18130.python-list@python.org> |
| In reply to | #60196 |
On Fri, Nov 22, 2013 at 2:57 PM, John Ladasky <john_ladasky@sbcglobal.net> wrote:
> or, for that matter, why data needs to be pickled to pass it between processes.
Oh, that part's easy. Let's leave the multiprocessing module out of it for the moment; imagine you spin up two completely separate instances of Python. Create some object in one of them; now, transfer it to the other. How are you going to do it?
Ultimately, the operating system isn't going to give you facilities for moving complex objects around - what you almost exclusively get is streams of bytes (or occasionally messaged chunks with lengths, but still of bytes). Pickling is one method of turning an object into a stream of bytes, in such a way that it can be turned back into an equivalent object on the other side.
And therein is the problem with exceptions; since the traceback includes references to stack frames and such, it's not as simple as saying "Two to beam up" and hearing the classic sound effect - somehow you need to transfer all the appropriate information across processes.
ChrisA
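The effect Chris describes can be observed without any subprocess by pickling an exception by hand. The sketch below is only an illustration of that point; `risky` is a hypothetical stand-in for worker code, and the error message merely echoes the one from the original post.

```python
import pickle


def risky():
    """Hypothetical stand-in for worker code that fails."""
    raise ValueError("operands could not be broadcast together")


try:
    risky()
except ValueError as exc:
    original = exc

# In the process that raised it, the exception still knows where it came from.
has_traceback_before = original.__traceback__ is not None

# Round-trip through pickle, as happens when a result crosses a process boundary.
copy = pickle.loads(pickle.dumps(original))
has_traceback_after = copy.__traceback__ is not None

# The traceback object itself refuses to be pickled.
try:
    pickle.dumps(original.__traceback__)
    traceback_pickles = True
except (TypeError, pickle.PicklingError):
    traceback_pickles = False

print(has_traceback_before, has_traceback_after, traceback_pickles)
# True False False
```

The type and message survive the round trip, which matches what John saw: a correct ValueError with the right text, but no frames from the code that actually raised it.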
| From | John Ladasky <john_ladasky@sbcglobal.net> |
|---|---|
| Date | 2013-11-22 08:38 -0800 |
| Subject | Why pickling (was: Traceback when using multiprocessing) |
| Message-ID | <a8bf86c1-bf44-4bcf-813e-5ad4fdedde63@googlegroups.com> |
| In reply to | #60200 |
On Thursday, November 21, 2013 8:24:05 PM UTC-8, Chris Angelico wrote:
> Oh, that part's easy. Let's leave the multiprocessing module out of it
> for the moment; imagine you spin up two completely separate instances
> of Python. Create some object in one of them; now, transfer it to the
> other. How are you going to do it?
For what definition of "completely separate"?
If I have two instances of the same version of the Python interpreter running on the same hardware, and the same operating system, I expect I would just copy a block of memory from one interpreter to the other, and then write some new pointers. That kind of data sharing has to be the most common kind. It's also the simplest.
I understand that pickling allows sharing of Python objects between Python interpreters even if those interpreters run on different CPUs with different memory architecture, different operating systems, etc. It just seems like overkill to me to use pickling in the simple case.
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-11-23 10:50 +1100 |
| Subject | Re: Why pickling (was: Traceback when using multiprocessing) |
| Message-ID | <mailman.3058.1385164257.18130.python-list@python.org> |
| In reply to | #60240 |
On Sat, Nov 23, 2013 at 3:38 AM, John Ladasky <john_ladasky@sbcglobal.net> wrote:
> On Thursday, November 21, 2013 8:24:05 PM UTC-8, Chris Angelico wrote:
>
>> Oh, that part's easy. Let's leave the multiprocessing module out of it
>> for the moment; imagine you spin up two completely separate instances
>> of Python. Create some object in one of them; now, transfer it to the
>> other. How are you going to do it?
>
> For what definition of "completely separate"?
>
> If I have two instances of the same version of the Python interpreter running on the same hardware, and the same operating system, I expect I would just copy a block of memory from one interpreter to the other, and then write some new pointers. That kind of data sharing has to be the most common kind. It's also the simplest.
Okay, so you copy a block of memory. Now how are you going to guarantee that you picked up everything that object references? Python objects frequently reference other objects:
send_me = [1.0, 2.0, 3.0]
The block of memory might have the addresses of those three floats, but that'll be invalid in the target. Somehow you need to package up this object and everything else you need.
Ultimately, you need some system for turning a single object reference (a pointer, if you like) into the entire package of information needed to recreate that object on the other side. That's what pickling is. It's a compact (with people to fight for its compactness, there's current discussion elsewhere about that) format that can be easily transferred around, which refcounted blocks of memory can't.
ChrisA
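Chris's `send_me` example can be run through pickle directly to see what "packaging up the object and everything it references" looks like in practice. The variable names other than `send_me` are made up for this sketch.

```python
import pickle

send_me = [1.0, 2.0, 3.0]

# dumps() walks the list and everything it references, producing plain bytes.
payload = pickle.dumps(send_me)
print(type(payload).__name__)   # bytes

# loads() rebuilds an equivalent object; it is a new list, not the original.
received = pickle.loads(payload)
print(received == send_me)      # True
print(received is send_me)      # False

# References inside one pickled structure are preserved: the same inner list
# appearing twice is rebuilt as one shared object, not two copies.
nested = [send_me, send_me]
rebuilt = pickle.loads(pickle.dumps(nested))
print(rebuilt[0] is rebuilt[1])  # True
```

The bytes in `payload` contain no memory addresses, which is why they stay meaningful after being written to a pipe or socket and read back in another process.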
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2013-11-22 09:09 +0000 |
| Message-ID | <mailman.3031.1385111409.18130.python-list@python.org> |
| In reply to | #60196 |
On 22/11/2013 03:57, John Ladasky wrote:
>
> ...Richard submits his "hack" (his description) to Python 3.4 which pickles and passes the string. When time permits, I'll try it out. Or maybe I'll wait, since Python 3.4.0 is still in alpha.
>
FTR beta 1 is due this Saturday 24/11/2013.
--
Python is the second best programming language in the world.
But the best has yet to be invented. Christian Tismer
Mark Lawrence