| Started by | Matthew Pounsett <matt.pounsett@gmail.com> |
|---|---|
| First post | 2014-04-25 06:43 -0700 |
| Last post | 2014-04-27 07:18 -0700 |
| Articles | 8 — 3 participants |
MacOS 10.9.2: threading error using python.org 2.7.6 distribution Matthew Pounsett <matt.pounsett@gmail.com> - 2014-04-25 06:43 -0700
Re: MacOS 10.9.2: threading error using python.org 2.7.6 distribution Chris Angelico <rosuav@gmail.com> - 2014-04-26 00:05 +1000
Re: MacOS 10.9.2: threading error using python.org 2.7.6 distribution Matthew Pounsett <matt.pounsett@gmail.com> - 2014-04-27 07:16 -0700
Re: MacOS 10.9.2: threading error using python.org 2.7.6 distribution Chris Angelico <rosuav@gmail.com> - 2014-04-28 00:33 +1000
Re: MacOS 10.9.2: threading error using python.org 2.7.6 distribution Matthew Pounsett <matt.pounsett@gmail.com> - 2014-04-28 15:50 -0700
Re: MacOS 10.9.2: threading error using python.org 2.7.6 distribution Chris Angelico <rosuav@gmail.com> - 2014-04-29 09:00 +1000
Re: MacOS 10.9.2: threading error using python.org 2.7.6 distribution Ned Deily <nad@acm.org> - 2014-04-25 11:58 -0700
Re: MacOS 10.9.2: threading error using python.org 2.7.6 distribution Matthew Pounsett <matt.pounsett@gmail.com> - 2014-04-27 07:18 -0700
| From | Matthew Pounsett <matt.pounsett@gmail.com> |
|---|---|
| Date | 2014-04-25 06:43 -0700 |
| Subject | MacOS 10.9.2: threading error using python.org 2.7.6 distribution |
| Message-ID | <a087ae04-a87c-4e77-a7dd-2d883d30a6f0@googlegroups.com> |
I've run into a threading error in some code when I run it on MacOS that works flawlessly on a *BSD system running the same version of python. I'm running the python 2.7.6 for MacOS distribution from python.org's downloads page.
I have tried to reproduce the error with a simple example, but so far haven't been able to find the element of my code that triggers the error. I'm hoping someone can suggest some things to try and/or look at. Googling for "python" and the error returns exactly two pages, neither of which is any help.
When I run it through the debugger, I'm getting the following from inside threading.start(). python fails to provide a stack trace when I step into _start_new_thread(), which is a pointer to thread.start_new_thread(). It looks like threading.__bootstrap_inner() may be throwing an exception which thread.start_new_thread() is unable to handle, and for some reason the stack is missing so I get no stack trace explaining the error.
It looks like thread.start_new_thread() is in the binary object, so I can't actually step into it and find where the error is occurring.
> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py(745)start()
-> _start_new_thread(self.__bootstrap, ())
(Pdb) s
> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py(750)start()
-> self.__started.wait()
(Pdb) Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
Warning: No stack to get attribute from
My test code (which works) follows the exact same structure as the failing code, making the same calls to the threading module's objects' methods:
----
import threading

class MyThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        print "MyThread runs and exits."

def main():
    try:
        t = MyThread()
        t.start()
    except Exception as e:
        print "Failed with {!r}".format(e)

if __name__ == '__main__':
    main()
----
The actual thread object that's failing looks like this:
class RTF2TXT(threading.Thread):
    """
    Takes a directory path and a Queue as arguments. The directory should be
    a collection of RTF files, which will be read one-by-one, converted to
    text, and each output line will be appended in order to the Queue.
    """
    def __init__(self, path, queue):
        threading.Thread.__init__(self)
        self.path = path
        self.queue = queue

    def run(self):
        logger = logging.getLogger('RTF2TXT')
        if not os.path.isdir(self.path):
            raise TypeError, "supplied path must be a directory"
        for f in sorted(os.listdir(self.path)):
            ff = os.path.join(self.path, f)
            args = [ UNRTF_BIN, '-P', '.', '-t', 'unrtf.text', ff ]
            logger.debug("Processing file {} with args {!r}".format(f, args))
            p1 = subprocess.Popen( args, stdout=subprocess.PIPE,
                    universal_newlines=True)
            output = p1.communicate()[0]
            try:
                output = output.decode('utf-8', 'ignore')
            except Exception as err:
                logger.error("Failed to decode output: {}".format(err))
                logger.error("Output was: {!r}".format(output))
            for line in output.split("\n"):
                line = line.strip()
                self.queue.put(line)
        self.queue.put("<EOF>")
Note: I only run one instance of this thread. The Queue object is used to pass work off to another thread for later processing.
If I insert that object into the test code and run it instead of MyThread(), I get the error. I can't see anything in there that should cause problems for the threading module though... especially since this runs fine on another system with the same version of python.
Any thoughts on what's going on here?
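One way to gather more detail when a thread seems to fail silently is to catch exceptions inside run() itself, because an exception raised in run() does not propagate to the code that called start(). The following is only a minimal sketch, not the code from this post: LoggedThread, body(), and the error attribute are hypothetical names, and it is written so that it should run on Python 3 as well as 2.7.

```python
import threading
import traceback


class LoggedThread(threading.Thread):
    """Hypothetical helper: records any exception raised by the thread body."""

    def __init__(self):
        threading.Thread.__init__(self)
        self.error = None  # formatted traceback, set only if the body failed

    def body(self):
        # Stand-in for the real work; it fails on purpose for the demo.
        raise ValueError("simulated failure inside the thread")

    def run(self):
        try:
            self.body()
        except Exception:
            # Keep the traceback so the starting thread can inspect it later.
            self.error = traceback.format_exc()


t = LoggedThread()
t.start()
t.join()
print(t.error)
```

After join() returns, the starting thread can check t.error and decide what to do, instead of relying on a try/except around start().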
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-04-26 00:05 +1000 |
| Message-ID | <mailman.9495.1398434710.18130.python-list@python.org> |
| In reply to | #70591 |
On Fri, Apr 25, 2014 at 11:43 PM, Matthew Pounsett <matt.pounsett@gmail.com> wrote:
> If I insert that object into the test code and run it instead of MyThread(), I get the error. I can't see anything in there that should cause problems for the threading module though... especially since this runs fine on another system with the same version of python.
>
> Any thoughts on what's going on here?
First culprit I'd look at is the mixing of subprocess and threading. It's entirely possible that something goes messy when you fork from a thread.
Separately: You're attempting a very messy charset decode there. You attempt to decode as UTF-8, errors ignored, and if that fails, you log an error... and continue on with the original bytes. You're risking shooting yourself in the foot there; I would recommend you have an explicit fall-back (maybe re-decode as Latin-1??), so the next code is guaranteed to be working with Unicode. Currently, it might get a unicode or a str.
ChrisA
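The explicit fall-back suggested above could look roughly like the sketch below. It is an illustration under assumptions, not code from the thread: to_text is a hypothetical helper name, the decode is strict rather than using 'ignore' (with 'ignore', invalid bytes are dropped silently instead of raising), and Latin-1 is used as the fall-back only because it was floated as a possibility.

```python
def to_text(raw):
    """Return text for raw bytes: strict UTF-8 first, Latin-1 as the fall-back."""
    try:
        return raw.decode("utf-8")  # strict: raises on invalid UTF-8
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this decode cannot fail.
        return raw.decode("latin-1")


good = to_text(b"caf\xc3\xa9")  # valid UTF-8 input
bad = to_text(b"caf\xe9")       # invalid as UTF-8, valid as Latin-1
```

Either way the caller gets decoded text back, so the code that follows never has to handle a mix of bytes and text.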
| From | Matthew Pounsett <matt.pounsett@gmail.com> |
|---|---|
| Date | 2014-04-27 07:16 -0700 |
| Message-ID | <675725e3-38d2-4d81-bf64-f6d903d4a684@googlegroups.com> |
| In reply to | #70593 |
On Friday, 25 April 2014 10:05:03 UTC-4, Chris Angelico wrote:
> First culprit I'd look at is the mixing of subprocess and threading.
> It's entirely possible that something goes messy when you fork from a
> thread.
I liked the theory, but I've run some tests and can't reproduce the error that way. I'm using all the elements in my test code that the real code runs, and I can't get the same error. Even when I deliberately break things I'm getting a proper exception with stack trace.
class MyThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        logger = logging.getLogger("thread")
        p1 = subprocess.Popen( shlex.split( 'echo "MyThread calls echo."'),
                stdout=subprocess.PIPE, universal_newlines=True)
        logger.debug( p1.communicate()[0].decode('utf-8', 'ignore' ))
        logger.debug( "MyThread runs and exits." )

def main():
    console = logging.StreamHandler()
    console.setFormatter(
            logging.Formatter('%(asctime)s [%(name)-12s] %(message)s', '%T'))
    logger = logging.getLogger()
    logger.addHandler(console)
    logger.setLevel(logging.NOTSET)
    try:
        t = MyThread()
        #t = RTF2TXT("../data/SRD/rtf/", Queue.Queue())
        t.start()
    except Exception as e:
        logger.error( "Failed with {!r}".format(e))

if __name__ == '__main__':
    main()
> Separately: You're attempting a very messy charset decode there. You
> attempt to decode as UTF-8, errors ignored, and if that fails, you log
> an error... and continue on with the original bytes. You're risking
> shooting yourself in the foot there; I would recommend you have an
> explicit fall-back (maybe re-decode as Latin-1??), so the next code is
> guaranteed to be working with Unicode. Currently, it might get a
> unicode or a str.
Yeah, that was a logic error on my part that I hadn't got around to noticing, since I'd been concentrating on the stuff that was actively breaking. That should have been in an else: block on the end of the try.
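The else: arrangement described above might be sketched as follows. This is a stand-alone illustration rather than the corrected RTF2TXT code: enqueue_lines is a hypothetical function name, the decode is strict so that the except branch can actually fire, and the queue import is written to work on both Python 2 and Python 3.

```python
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2


def enqueue_lines(raw, q):
    """Decode raw bytes and queue each stripped line; skip the input on failure."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        print("Failed to decode output: {}".format(err))
        return False
    else:
        # Runs only when the decode succeeded, so text is always decoded here.
        for line in text.split("\n"):
            q.put(line.strip())
        return True


q = queue.Queue()
ok = enqueue_lines(b"first line\n  second line  ", q)
failed = enqueue_lines(b"\xff\xfe", q)
```

With this layout nothing is queued for input that failed to decode, so undecoded bytes never reach the consumer.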
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-04-28 00:33 +1000 |
| Message-ID | <mailman.9531.1398609228.18130.python-list@python.org> |
| In reply to | #70651 |
On Mon, Apr 28, 2014 at 12:16 AM, Matthew Pounsett
<matt.pounsett@gmail.com> wrote:
> On Friday, 25 April 2014 10:05:03 UTC-4, Chris Angelico wrote:
>> First culprit I'd look at is the mixing of subprocess and threading.
>> It's entirely possible that something goes messy when you fork from a
>> thread.
>
> I liked the theory, but I've run some tests and can't reproduce the error that way. I'm using all the elements in my test code that the real code runs, and I can't get the same error. Even when I deliberately break things I'm getting a proper exception with stack trace.
>
In most contexts, "thread unsafe" simply means that you can't use the
same facilities simultaneously from two threads (eg a lot of database
connection libraries are thread unsafe with regard to a single
connection, as they'll simply write to a pipe or socket and then read
a response from it). But processes and threads are, on many systems,
linked. Just the act of spinning off a new thread and then forking can
potentially cause problems. Those are the exact sorts of issues that
you'll see when you switch OSes, as it's the underlying thread/process
model that's significant. (Particularly of note is that Windows is
*very* different from Unix-based systems, in that subprocess
management is not done by forking. But not applicable here.)
You may want to have a look at subprocess32, which Ned pointed out. I
haven't checked, but I would guess that its API is identical to
subprocess's, so it should be a drop-in replacement ("import
subprocess32 as subprocess"). If that produces the exact same results,
then it's (probably) not thread-safety that's the problem.
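A hedged sketch of that drop-in import: it prefers the subprocess32 backport when it happens to be installed and otherwise falls back to the standard library module. Whether the backport changes the behaviour seen in this thread is not something the sketch can show.

```python
try:
    # Third-party backport from PyPI; only importable if it has been installed.
    import subprocess32 as subprocess
except ImportError:
    # Fall back to the standard library module.
    import subprocess

# Report which implementation was actually picked up.
print(subprocess.__name__)
```

The rest of the program keeps referring to the name subprocess, so switching between the two implementations needs no other changes.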
>> Separately: You're attempting a very messy charset decode there. You
>> attempt to decode as UTF-8, errors ignored, and if that fails, you log
>> an error... and continue on with the original bytes. You're risking
>> shooting yourself in the foot there; I would recommend you have an
>> explicit fall-back (maybe re-decode as Latin-1??), so the next code is
>> guaranteed to be working with Unicode. Currently, it might get a
>> unicode or a str.
>
> Yeah, that was a logic error on my part that I hadn't got around to noticing, since I'd been concentrating on the stuff that was actively breaking. That should have been in an else: block on the end of the try.
>
Ah good. Keeping bytes versus text separate is something that becomes
particularly important in Python 3, so I always like to encourage
people to get them straight even in Py2. It'll save you some hassle
later on.
ChrisA
| From | Matthew Pounsett <matt.pounsett@gmail.com> |
|---|---|
| Date | 2014-04-28 15:50 -0700 |
| Message-ID | <b442b64d-b79c-4c0e-9df4-e3c3ae47ee9e@googlegroups.com> |
| In reply to | #70654 |
On Sunday, 27 April 2014 10:33:38 UTC-4, Chris Angelico wrote:
> In most contexts, "thread unsafe" simply means that you can't use the
> same facilities simultaneously from two threads (eg a lot of database
> connection libraries are thread unsafe with regard to a single
> connection, as they'll simply write to a pipe or socket and then read
> a response from it). But processes and threads are, on many systems,
> linked. Just the act of spinning off a new thread and then forking can
> potentially cause problems. Those are the exact sorts of issues that
> you'll see when you switch OSes, as it's the underlying thread/process
> model that's significant. (Particularly of note is that Windows is
> *very* different from Unix-based systems, in that subprocess
> management is not done by forking. But not applicable here.)
>
Thanks, I'll keep all that in mind. I have to wonder how much of a problem it is here though, since I was able to demonstrate a functioning fork inside a new thread further up in the discussion.
I have a new development that I find interesting, and I'm wondering if you still think it's the same problem.
I have taken that threading object and turned it into a normal function definition. It's still forking the external tool, but it's doing so in the main thread, and it is finished execution before any other threads are created. And I'm still getting the same error.
Turns out it's not coming from the threading module, but from the subprocess module instead. Specifically, line 709 of /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py, which is this:
try:
    self._execute_child(args, executable, preexec_fn, close_fds,
                        cwd, env, universal_newlines,
                        startupinfo, creationflags, shell, to_close,
                        p2cread, p2cwrite,
                        c2pread, c2pwrite,
                        errread, errwrite)
except Exception:
I get the "Warning: No stack to get attribute from" twice when that self._execute_child() call is made. I've tried stepping into it to narrow it down further, but I'm getting weird behaviour from the debugger that I've never seen before once I do that. It's making it hard to track down exactly where the error is occurring.
Interestingly, it's not actually raising an exception there. The except block is not being run.
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-04-29 09:00 +1000 |
| Message-ID | <mailman.9564.1398726034.18130.python-list@python.org> |
| In reply to | #70699 |
On Tue, Apr 29, 2014 at 8:50 AM, Matthew Pounsett <matt.pounsett@gmail.com> wrote:
> Thanks, I'll keep all that in mind. I have to wonder how much of a problem it is here though, since I was able to demonstrate a functioning fork inside a new thread further up in the discussion.
>
Yeah, it's really hard to pin down sometimes. I once discovered a problem whereby I was unable to spin off subprocesses that did certain things, but I could do a trivial subprocess (I think I fork/exec'd to the echo command or something) and that worked fine. Turned out to be a bug in one of my signal handlers, but the error was being reported at the point of the forking.
> I have a new development that I find interesting, and I'm wondering if you still think it's the same problem.
>
> I have taken that threading object and turned it into a normal function definition. It's still forking the external tool, but it's doing so in the main thread, and it is finished execution before any other threads are created. And I'm still getting the same error.
>
Interesting. That ought to eliminate all possibility of thread-vs-process issues. Can you post the smallest piece of code that exhibits the same failure?
ChrisA
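A smallest-possible test case along those lines might start from something like the sketch below, which runs a single child process from the main thread and nothing else. It is an assumption-laden skeleton, not a reproduction of the failure: ARGS is a stand-in that launches the current interpreter instead of the real unrtf command line, and run_once is a hypothetical helper name.

```python
import subprocess
import sys
import threading

# Stand-in for the real external tool; the unrtf command line from the
# thread is deliberately not reproduced here.
ARGS = [sys.executable, "-c", "print('child ran')"]


def run_once(args):
    """Run one child process from the calling thread; return (returncode, stdout)."""
    p = subprocess.Popen(args, stdout=subprocess.PIPE, universal_newlines=True)
    out = p.communicate()[0]
    return p.returncode, out


# This script creates no threading.Thread objects before the call.
threads_before = threading.active_count()
rc, out = run_once(ARGS)
print("threads={} rc={} out={!r}".format(threads_before, rc, out))
```

Swapping the real command in for ARGS and then removing pieces one at a time is one way to narrow down which element triggers the warning.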
| From | Ned Deily <nad@acm.org> |
|---|---|
| Date | 2014-04-25 11:58 -0700 |
| Message-ID | <mailman.9511.1398452352.18130.python-list@python.org> |
| In reply to | #70591 |
In article <CAPTjJmpXuj9N3cdQcH0oJaVkSfVrqJWHH1GSt3FafkcGyw54Ag@mail.gmail.com>, Chris Angelico <rosuav@gmail.com> wrote:
> On Fri, Apr 25, 2014 at 11:43 PM, Matthew Pounsett
> <matt.pounsett@gmail.com> wrote:
> > If I insert that object into the test code and run it instead of
> > MyThread(), I get the error. I can't see anything in there that should
> > cause problems for the threading module though... especially since this
> > runs fine on another system with the same version of python.
> >
> > Any thoughts on what's going on here?
>
> First culprit I'd look at is the mixing of subprocess and threading.
> It's entirely possible that something goes messy when you fork from a
> thread.
FWIW, the Python 2 version of subprocess is known to be thread-unsafe. There is a Py2 backport available on PyPI of the improved Python 3 subprocess module:
http://bugs.python.org/issue20318
https://pypi.python.org/pypi/subprocess32/
--
Ned Deily, nad@acm.org
| From | Matthew Pounsett <matt.pounsett@gmail.com> |
|---|---|
| Date | 2014-04-27 07:18 -0700 |
| Message-ID | <a1bb450b-30d7-4559-9afb-f82932657446@googlegroups.com> |
| In reply to | #70615 |
On Friday, 25 April 2014 14:58:56 UTC-4, Ned Deily wrote:
> FWIW, the Python 2 version of subprocess is known to be thread-unsafe.
> There is a Py2 backport available on PyPI of the improved Python 3
> subprocess module:
Since that's the only thread that calls anything in subprocess, and I'm only running one instance of the thread, I'm not too concerned about how threadsafe subprocess is. In this case it shouldn't matter. Thanks for the info though.. that might be handy at some future point.