Groups > comp.lang.python > #56646 > unrolled thread
| Started by | Peter Cacioppi <peter.cacioppi@gmail.com> |
|---|---|
| First post | 2013-10-10 23:01 -0700 |
| Last post | 2013-10-11 13:10 -0700 |
| Articles | 8 — 6 participants |
Multi-threading in Python vs Java Peter Cacioppi <peter.cacioppi@gmail.com> - 2013-10-10 23:01 -0700
Re: Multi-threading in Python vs Java Cameron Simpson <cs@zip.com.au> - 2013-10-11 17:53 +1100
Re: Multi-threading in Python vs Java Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-10-11 09:30 +0000
Re: Multi-threading in Python vs Java Peter Cacioppi <peter.cacioppi@gmail.com> - 2013-10-11 01:41 -0700
Re: Multi-threading in Python vs Java Chris Angelico <rosuav@gmail.com> - 2013-10-11 19:48 +1100
Re: Multi-threading in Python vs Java Piet van Oostrum <piet@vanoostrum.org> - 2013-10-11 10:55 -0400
Re: Multi-threading in Python vs Java Terry Reedy <tjreedy@udel.edu> - 2013-10-11 15:53 -0400
Re: Multi-threading in Python vs Java Peter Cacioppi <peter.cacioppi@gmail.com> - 2013-10-11 13:10 -0700
| From | Peter Cacioppi <peter.cacioppi@gmail.com> |
|---|---|
| Date | 2013-10-10 23:01 -0700 |
| Subject | Multi-threading in Python vs Java |
| Message-ID | <46669eab-49f4-4daf-a410-abfbe9e87fc3@googlegroups.com> |
Could someone give me a brief thumbnail sketch of the difference between multi-threaded programming in Python and in Java?

I have a fairly sophisticated algorithm that I developed as both a single threaded and multi-threaded Java application. The multi-threading port was fairly simple, partly because Java has a rich library of thread safe data structures (Atomic Integer, Blocking Queue, Priority Blocking Queue, etc).

There is quite a significant performance improvement when multithreading here.

I'd like to port the project to Python, partly because Python is a better language (IMHO) and partly because Python plays well with Amazon Web Services.

But I'm a little leery that things like the Global Interpreter Lock will block the multithreading efficiency, or that a relative lack of concurrent off the shelf data structures will make things much harder.

Any advice much appreciated. Thanks.
| From | Cameron Simpson <cs@zip.com.au> |
|---|---|
| Date | 2013-10-11 17:53 +1100 |
| Message-ID | <mailman.988.1381474388.18130.python-list@python.org> |
| In reply to | #56646 |
On 10Oct2013 23:01, Peter Cacioppi <peter.cacioppi@gmail.com> wrote:
> Could someone give me a brief thumbnail sketch of the difference between multi-threaded programming in Python and in Java?
>
> I have a fairly sophisticated algorithm that I developed as both a single threaded and multi-threaded Java application. The multi-threading port was fairly simple, partly because Java has a rich library of thread safe data structures (Atomic Integer, Blocking Queue, Priority Blocking Queue, etc).
>
> There is quite a significant performance improvement when multithreading here.
>
> I'd like to port the project to Python, [...]
> But I'm a little leery that things like the Global Interpreter Lock will block the multithreading efficiency, or that a relative lack of concurrent off the shelf data structures will make things much harder.
A couple of random items:
A Java process will happily use multiple cores and hyperthreading.
It makes no thread safety guarantees in the language itself,
though as you say it has a host of thread safe tools to make all
this easy to do safely.
As you expect, CPython has the GIL and will only use one CPU-level
thread of execution _for the purely Python code_. No two Python
instructions run in parallel. Functions that block or call thread
safe libraries can (and usually do) release the GIL, allowing
other Python code to execute while native non-Python code does
stuff; that will use multiple cores etc.
Other Python implementations may be more aggressive. I'd suppose
Jython could multithread like Java, but really I have no experience
with them.
The standard answer with CPython is that if you want to use multiple
cores to run Python code (versus using Python code to orchestrate
native code) you should use the multiprocessing stuff to fork the
interpreter, and then farm out jobs using queues.
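
A minimal sketch of that farm-out pattern, assuming a hypothetical CPU-bound `solve_job` function (it is not from the thread). `multiprocessing.Pool` is used here because it manages the job and result queues between processes for you:

```python
from multiprocessing import Pool


def solve_job(n):
    """Hypothetical CPU-bound job: sum of squares below n, in pure Python."""
    return sum(i * i for i in range(n))


def run_jobs(jobs, processes=2):
    # Each worker process is a separate interpreter with its own GIL,
    # so pure-Python jobs can run on several cores at once.
    with Pool(processes=processes) as pool:
        return pool.map(solve_job, jobs)
```

In a script, a call such as `run_jobs([10, 100, 1000])` should sit under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module.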
Regarding "concurrent off the shelf data structures", I have a bunch
of Python multithreaded stuff and find the stdlib Queues and Locks
(and Semaphores and so on) sufficient. The Queues (including things
like deque) are thread safe, so a lot of the coordination is pretty
easy.
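
As a rough illustration of that coordination, here is a sketch with worker threads fed from a `queue.Queue`. The squaring job, the `run_workers` name, and the `None` sentinel convention are assumptions made for the example:

```python
import queue
import threading


def run_workers(items, num_workers=4):
    """Square each item using worker threads fed from a shared queue."""
    jobs = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            item = jobs.get()         # blocks until a job is available
            if item is None:          # sentinel: no more work for this thread
                break
            results.put(item * item)  # put() is thread safe, no extra lock

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:
        jobs.put(item)
    for _ in threads:
        jobs.put(None)                # one sentinel per worker
    for t in threads:
        t.join()

    # All workers have exited, so draining without blocking is safe here.
    out = []
    while not results.empty():
        out.append(results.get())
    return sorted(out)
```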
And of course context managers make Locks and Semaphores very easy
and reliable to use:
    from threading import Lock

    L = Lock()
    ...
    with L:
        ...  # do locked stuff
    ...
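
A slightly fuller sketch of the same idiom, assuming a hypothetical shared `Counter` class that several threads increment. The `with` block releases the lock on exit even if the body raises:

```python
import threading


class Counter:
    """Hypothetical shared counter guarded by a Lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # "self.value += 1" is a read-modify-write, so it is done under the lock.
        with self._lock:
            self.value += 1


def count_in_threads(num_threads=4, per_thread=10_000):
    counter = Counter()

    def work():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```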
I'm sure you'll get longer and more nuanced replies too.
Cheers,
--
Cameron Simpson <cs@zip.com.au>
A squealing tire is a happy tire.
- Bruce MacInnes, Skip Barber Driving School instructor
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-10-11 09:30 +0000 |
| Message-ID | <5257c547$0$29984$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #56655 |
On Fri, 11 Oct 2013 17:53:02 +1100, Cameron Simpson wrote:
> Other Python implementations may be more aggressive. I'd suppose Jython
> could multithread like Java, but really I have no experience with them.

Neither Jython nor IronPython has a GIL.

> The standard answer with CPython is that if you want to use multiple
> cores to run Python code (versus using Python code to orchestrate native
> code) you should use the multiprocessing stuff to fork the interpreter,
> and then farm out jobs using queues.

Note that this really only applies to CPU-bound tasks. For tasks that depend on file IO (reading and writing files), CPython threads will operate in parallel as independently and (almost) as efficiently as those in other languages. That is to say, they will be constrained by the underlying operating system's ability to do file IO, not by the number of cores in your CPU.

--
Steven
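
A small sketch of the point about blocking work, assuming `time.sleep` as a stand-in for a blocking file read or write (like blocking IO calls, it releases the GIL while waiting). The function names are made up for the example:

```python
import threading
import time


def blocking_call(seconds):
    # Stand-in for blocking IO: the GIL is released while this thread waits.
    time.sleep(seconds)


def timed_sleepers(num_threads=4, seconds=0.2):
    """Return wall-clock time for num_threads overlapping blocking calls."""
    start = time.perf_counter()
    threads = [
        threading.Thread(target=blocking_call, args=(seconds,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

If the waits overlap as described, four 0.2 second calls should finish in roughly 0.2 seconds of wall-clock time rather than 0.8.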
| From | Peter Cacioppi <peter.cacioppi@gmail.com> |
|---|---|
| Date | 2013-10-11 01:41 -0700 |
| Message-ID | <6d35b728-85c2-4fc0-bbc8-6c033bdfcfb0@googlegroups.com> |
| In reply to | #56646 |
On Thursday, October 10, 2013 11:01:25 PM UTC-7, Peter Cacioppi wrote:
> Could someone give me a brief thumbnail sketch of the difference between multi-threaded programming in Python and in Java?
>
> I have a fairly sophisticated algorithm that I developed as both a single threaded and multi-threaded Java application. The multi-threading port was fairly simple, partly because Java has a rich library of thread safe data structures (Atomic Integer, Blocking Queue, Priority Blocking Queue, etc).
>
> There is quite a significant performance improvement when multithreading here.
>
> I'd like to port the project to Python, partly because Python is a better language (IMHO) and partly because Python plays well with Amazon Web Services.
>
> But I'm a little leery that things like the Global Interpreter Lock will block the multithreading efficiency, or that a relative lack of concurrent off the shelf data structures will make things much harder.
>
> Any advice much appreciated. Thanks.
I should add that the computational heavy lifting is done in a third party library. So a worker thread looks roughly like this (there is a subtle race condition I'm glossing over).
    while len(jobs):
        job = jobs.pop()
        model = Model(job)       # Model is py interface for a lib written in C
        newJobs = model.solve()  # This will take a long time
        for newJob in newJobs:
            jobs.add(newJob)
Here jobs is a thread safe object that is shared across each worker thread. It holds a priority queue of jobs that can be solved in parallel.
Model is a py class that provides the API to a 3rd party library written in C. I know model.solve() will be the bottleneck operation for all but trivial problems.
So, my hope is that the GIL restrictions won't be problematic here. That is to say, I don't need **Python** code to ever run concurrently. I just need Python to allow a different Python worker thread to execute when all the other worker threads are blocking on the model.solve() task. Once the algorithm is in full swing, it is typical for all the worker threads to be blocking on model.solve() at the same time.
It's a nice algorithm for high level languages. Java worked well here, and I'm hoping py can be nearly as fast with much more elegant and readable code.
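
One way such a worker loop could be written with the stdlib, as a sketch only. `solve` is a hypothetical stand-in for `Model(job).solve()`, jobs are assumed to be `(priority, depth)` tuples, and `STOP` is an invented shutdown marker. Blocking on `get()` and tracking outstanding work with `task_done()`/`join()` avoids one possible race, where a worker sees an empty queue and exits while another worker is about to add new jobs:

```python
import queue
import threading

STOP = (float("inf"), None)  # sorts after every real (priority, depth) job


def solve(job):
    """Hypothetical stand-in for Model(job).solve(); returns follow-up jobs."""
    priority, depth = job
    if depth >= 2:
        return []
    return [(priority + 1, depth + 1), (priority + 2, depth + 1)]


def run(initial_jobs, num_workers=4):
    jobs = queue.PriorityQueue()
    solved = []
    solved_lock = threading.Lock()

    def worker():
        while True:
            job = jobs.get()        # blocks; no separate len() check to race on
            if job is STOP:
                jobs.task_done()
                break
            try:
                for new_job in solve(job):  # the long-running call
                    jobs.put(new_job)       # queued before this job is marked done
                with solved_lock:
                    solved.append(job)
            finally:
                jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in initial_jobs:
        jobs.put(job)
    jobs.join()                     # returns once every queued job is done
    for _ in threads:
        jobs.put(STOP)              # one shutdown marker per worker
    for t in threads:
        t.join()
    return solved
```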
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-10-11 19:48 +1100 |
| Message-ID | <mailman.996.1381481315.18130.python-list@python.org> |
| In reply to | #56664 |
On Fri, Oct 11, 2013 at 7:41 PM, Peter Cacioppi <peter.cacioppi@gmail.com> wrote:
> So, my hope is that the GIL restrictions won't be problematic here. That is to say, I don't need **Python** code to ever run concurrently. I just need Python to allow a different Python worker thread to execute when all the other worker threads are blocking on the model.solve() task. Once the algorithm is in full swing, it is typical for all the worker threads to be blocking on model.solve() at the same time.

Sounds like Python will serve you just fine! Check out the threading module, knock together a quick test, and spin it up!

ChrisA
| From | Piet van Oostrum <piet@vanoostrum.org> |
|---|---|
| Date | 2013-10-11 10:55 -0400 |
| Message-ID | <m2eh7rsvpr.fsf@cochabamba.vanoostrum.org> |
| In reply to | #56665 |
Chris Angelico <rosuav@gmail.com> writes:
> On Fri, Oct 11, 2013 at 7:41 PM, Peter Cacioppi <peter.cacioppi@gmail.com> wrote:
>> So, my hope is that the GIL restrictions won't be problematic here. That is to say, I don't need **Python** code to ever run concurrently. I just need Python to allow a different Python worker thread to execute when all the other worker threads are blocking on the model.solve() task. Once the algorithm is in full swing, it is typical for all the worker threads to be blocking on model.solve() at the same time.
>
> Sounds like Python will serve you just fine! Check out the threading
> module, knock together a quick test, and spin it up!

But it only works if the Python interface to the external C library has been written to release the GIL around the long computations. If not, then the OP could try to write a wrapper around them that does this.

--
Piet van Oostrum <piet@vanoostrum.org>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]
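
Whether a given binding releases the GIL depends on how it was written. As one illustration (not the OP's library): functions loaded through `ctypes.CDLL` release the GIL for the duration of the call, whereas `ctypes.PyDLL` keeps it held. The sketch below assumes a POSIX system where `ctypes.CDLL(None)` exposes libc's `usleep`; the function names are made up for the example:

```python
import ctypes
import threading
import time

# Assumption: POSIX only. CDLL(None) gives a handle to the running process,
# through which libc symbols such as usleep are visible.
libc = ctypes.CDLL(None)
libc.usleep.argtypes = [ctypes.c_uint]
libc.usleep.restype = ctypes.c_int


def c_call(microseconds):
    # ctypes releases the GIL around this foreign call, so other
    # Python threads can run while the C code is busy.
    libc.usleep(microseconds)


def timed_c_calls(num_threads=4, microseconds=200_000):
    """Return wall-clock time for num_threads overlapping C calls."""
    start = time.perf_counter()
    threads = [
        threading.Thread(target=c_call, args=(microseconds,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

A hand-written C extension would get the same effect by wrapping the long computation in the `Py_BEGIN_ALLOW_THREADS` / `Py_END_ALLOW_THREADS` macros.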
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2013-10-11 15:53 -0400 |
| Message-ID | <mailman.1014.1381521241.18130.python-list@python.org> |
| In reply to | #56664 |
On 10/11/2013 4:41 AM, Peter Cacioppi wrote:
> I should add that the computational heavy lifting is done in a third party library. So a worker thread looks roughly like this (there is a subtle race condition I'm glossing over).
>
>     while len(jobs):
>         job = jobs.pop()
>         model = Model(job)       # Model is py interface for a lib written in C
>         newJobs = model.solve()  # This will take a long time
>         for newJob in newJobs:
>             jobs.add(newJob)
>
> Here jobs is a thread safe object that is shared across each worker thread. It holds a priority queue of jobs that can be solved in parallel.
>
> Model is a py class that provides the API to a 3rd party library written in C. I know model.solve() will be the bottleneck operation for all but trivial problems.
>
> So, my hope is that the GIL restrictions won't be problematic here. That is to say, I don't need **Python** code to ever run concurrently. I just need Python to allow a different Python worker thread to execute when all the other worker threads are blocking on the model.solve() task. Once the algorithm is in full swing, it is typical for all the worker threads to be blocking on model.solve() at the same time.
>
> It's a nice algorithm for high level languages. Java worked well here, and I'm hoping py can be nearly as fast with much more elegant and readable code.

Given that model.solve takes a 'long time' (seconds, at least), the extra time to start a process over the time to start a thread will be inconsequential. I would therefore look at the multiprocessing module.

--
Terry Jan Reedy
| From | Peter Cacioppi <peter.cacioppi@gmail.com> |
|---|---|
| Date | 2013-10-11 13:10 -0700 |
| Message-ID | <3e44108d-fef2-4c52-9f03-2fced186d807@googlegroups.com> |
| In reply to | #56646 |
On Thursday, October 10, 2013 11:01:25 PM UTC-7, Peter Cacioppi wrote:
> Could someone give me a brief thumbnail sketch of the difference between multi-threaded programming in Python and in Java?
>
> I have a fairly sophisticated algorithm that I developed as both a single threaded and multi-threaded Java application. The multi-threading port was fairly simple, partly because Java has a rich library of thread safe data structures (Atomic Integer, Blocking Queue, Priority Blocking Queue, etc).
>
> There is quite a significant performance improvement when multithreading here.
>
> I'd like to port the project to Python, partly because Python is a better language (IMHO) and partly because Python plays well with Amazon Web Services.
>
> But I'm a little leery that things like the Global Interpreter Lock will block the multithreading efficiency, or that a relative lack of concurrent off the shelf data structures will make things much harder.
>
> Any advice much appreciated. Thanks.

"Sounds like Python will serve you just fine! Check out the threading module, knock together a quick test, and spin it up!"

Thanks, that was my assessment as well, just wanted a double check. At the time of posting I was mentally blocked on how to set up a quick proof of concept, but of course writing the post cleared that up ;)

Along with "batteries included" and "we're all adults", I think Python needs a pithy phrase summarizing how well thought out it is. That is to say, the major design decisions were all carefully considered, and as a result things that might appear to be problematic are actually not barriers in practice. My suggestion for this phrase is "Guido was here".

So in this case, I thought the GIL would be a fly in the ointment, but on reflection it turned out not to be the case. Guido was here.