


Multi-threading in Python vs Java

Started by: Peter Cacioppi <peter.cacioppi@gmail.com>
First post: 2013-10-10 23:01 -0700
Last post: 2013-10-11 13:10 -0700
Articles: 8 (6 participants)



Contents

  Multi-threading in Python vs Java Peter Cacioppi <peter.cacioppi@gmail.com> - 2013-10-10 23:01 -0700
    Re: Multi-threading in Python vs Java Cameron Simpson <cs@zip.com.au> - 2013-10-11 17:53 +1100
      Re: Multi-threading in Python vs Java Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-10-11 09:30 +0000
    Re: Multi-threading in Python vs Java Peter Cacioppi <peter.cacioppi@gmail.com> - 2013-10-11 01:41 -0700
      Re: Multi-threading in Python vs Java Chris Angelico <rosuav@gmail.com> - 2013-10-11 19:48 +1100
        Re: Multi-threading in Python vs Java Piet van Oostrum <piet@vanoostrum.org> - 2013-10-11 10:55 -0400
      Re: Multi-threading in Python vs Java Terry Reedy <tjreedy@udel.edu> - 2013-10-11 15:53 -0400
    Re: Multi-threading in Python vs Java Peter Cacioppi <peter.cacioppi@gmail.com> - 2013-10-11 13:10 -0700

#56646 — Multi-threading in Python vs Java

From: Peter Cacioppi <peter.cacioppi@gmail.com>
Date: 2013-10-10 23:01 -0700
Subject: Multi-threading in Python vs Java
Message-ID: <46669eab-49f4-4daf-a410-abfbe9e87fc3@googlegroups.com>

Could someone give me a brief thumbnail sketch of the difference between multi-threaded programming in Python and in Java?

I have a fairly sophisticated algorithm that I developed as both a single-threaded and a multi-threaded Java application. The multi-threading port was fairly simple, partly because Java has a rich library of thread-safe data structures (AtomicInteger, BlockingQueue, PriorityBlockingQueue, etc.).

There is quite a significant performance improvement when multithreading here.

I'd like to port the project to Python, partly because Python is a better language (IMHO) and partly because Python plays well with Amazon Web Services. 

But I'm a little leery that things like the Global Interpreter Lock will block the multithreading efficiency, or that a relative lack of concurrent off-the-shelf data structures will make things much harder.

Any advice much appreciated. Thanks.



#56655

From: Cameron Simpson <cs@zip.com.au>
Date: 2013-10-11 17:53 +1100
Message-ID: <mailman.988.1381474388.18130.python-list@python.org>
In reply to: #56646

On 10Oct2013 23:01, Peter Cacioppi <peter.cacioppi@gmail.com> wrote:
> Could someone give me a brief thumbnail sketch of the difference between multi-threaded programming in Python and in Java?
> 
> I have a fairly sophisticated algorithm that I developed as both a single-threaded and a multi-threaded Java application. The multi-threading port was fairly simple, partly because Java has a rich library of thread-safe data structures (AtomicInteger, BlockingQueue, PriorityBlockingQueue, etc.).
> 
> There is quite a significant performance improvement when multithreading here.
> 
> I'd like to port the project to Python, [...]
> But I'm a little leery that things like the Global Interpreter Lock will block the multithreading efficiency, or that a relative lack of concurrent off-the-shelf data structures will make things much harder.

A couple of random items:

A Java process will happily use multiple cores and hyperthreading.
It makes no thread safety guarantees in the language itself,
though as you say it has a host of thread safe tools to make all
this easy to do safely.

As you expect, CPython has the GIL and will only use one CPU-level
thread of execution _for the purely Python code_. No two python
instructions run in parallel. Functions that block or call thread
safe libraries can (and usually do) release the GIL, allowing
other Python code to execute while native non-Python code does
stuff; that will use multiple cores etc.
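
A minimal sketch of that last point, assuming time.sleep() as a stand-in for any blocking call that releases the GIL (the function names here are illustrative, not from any library). Four threads that each block for half a second should finish in roughly half a second overall, not two:

```python
import threading
import time


def blocking_call(seconds):
    # time.sleep() releases the GIL while it waits, much like a
    # blocking read or a well-behaved native library call would.
    time.sleep(seconds)


def run_in_threads(n_threads, seconds):
    """Start n_threads blocking calls at once; return the wall-clock time."""
    threads = [
        threading.Thread(target=blocking_call, args=(seconds,))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = run_in_threads(4, 0.5)
    print(f"4 threads x 0.5 s of blocking took {elapsed:.2f} s")
```

Replacing the sleep with a pure-Python loop would be expected to show no such overlap on CPython, since only one thread can execute Python bytecode at a time.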

Other Python implementations may be more aggressive. I'd suppose
Jython could multithread like Java, but really I have no experience
with them.

The standard answer with CPython is that if you want to use multiple
cores to run Python code (versus using Python code to orchestrate
native code) you should use the multiprocessing stuff to fork the
interpreter, and then farm out jobs using queues.
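
As a rough sketch of that approach (count_primes, worker and main are made-up names, and the prime counting is only a placeholder for real CPU-bound Python code), worker processes can pull jobs from one multiprocessing.Queue and push results onto another:

```python
import multiprocessing as mp


def count_primes(limit):
    """Count primes below limit by trial division (deliberately CPU-bound)."""
    return sum(
        1
        for n in range(2, limit)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )


def worker(jobs, results):
    # Pull jobs until the None sentinel arrives.
    for limit in iter(jobs.get, None):
        results.put((limit, count_primes(limit)))


def main(limits, n_workers=2):
    jobs, results = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(jobs, results))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for limit in limits:
        jobs.put(limit)
    for _ in procs:
        jobs.put(None)          # one stop sentinel per worker
    # Drain the results before joining, so no worker blocks on a full pipe.
    out = dict(results.get() for _ in limits)
    for p in procs:
        p.join()
    return out


if __name__ == "__main__":
    print(main([10_000, 20_000, 30_000]))
```

Each worker is a separate interpreter with its own GIL, so the jobs can run on separate cores; the price is that jobs and results have to be picklable.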

Regarding "concurrent off the shelf data structures", I have a bunch
of Python multithreaded stuff and find the stdlib Queues and Locks
(and Semaphores and so on) sufficient. The Queues (including things
like deque) are thread safe, so a lot of the coordination is pretty
easy.

And of course context managers make Locks and Semaphores very easy
and reliable to use:

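  from threading import Lock

  L = Lock()
  # ... later, in any thread ...
  with L:
      # do locked stuff; the lock is released on exit,
      # even if an exception is raised
      ...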
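
For instance, something like Java's AtomicInteger can be approximated with a Lock in a few lines. This is only a sketch: AtomicCounter and hammer are hypothetical names, not stdlib classes.

```python
import threading


class AtomicCounter:
    """A lock-protected integer, loosely in the spirit of Java's AtomicInteger."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def increment(self, delta=1):
        # The read-modify-write happens entirely under the lock.
        with self._lock:
            self._value += delta
            return self._value

    @property
    def value(self):
        with self._lock:
            return self._value


def hammer(counter, n):
    for _ in range(n):
        counter.increment()


if __name__ == "__main__":
    counter = AtomicCounter()
    threads = [threading.Thread(target=hammer, args=(counter, 10_000))
               for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter.value)  # 8 threads x 10,000 increments = 80000
```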

I'm sure you'll get longer and more nuanced replies too.

Cheers,
--
Cameron Simpson <cs@zip.com.au>

A squealing tire is a happy tire.
        - Bruce MacInnes, Skip Barber Driving School instructor



#56670

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-10-11 09:30 +0000
Message-ID: <5257c547$0$29984$c3e8da3$5496439d@news.astraweb.com>
In reply to: #56655

On Fri, 11 Oct 2013 17:53:02 +1100, Cameron Simpson wrote:

> Other Python implementations may be more aggressive. I'd suppose Jython
> could multithread like Java, but really I have no experience with them.

Neither Jython nor IronPython have a GIL.


> The standard answer with CPython is that if you want to use multiple
> cores to run Python code (versus using Python code to orchestrate native
> code) you should use the multiprocessing stuff to fork the interpreter,
> and then farm out jobs using queues.

Note that this really only applies to CPU-bound tasks. For tasks that 
depend on file IO (reading and writing files), CPython threads will 
operate in parallel as independently and (almost) as efficiently as those 
in other languages. That is to say, they will be constrained by the 
underlying operating system's ability to do file IO, not by the number of 
cores in your CPU.
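
A small sketch of the IO-bound case, assuming a thread pool from concurrent.futures and a handful of throwaway files (count_lines and count_all are illustrative names). Each thread spends most of its time waiting on the operating system, which is when the GIL is released:

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def count_lines(path):
    # File reads block in the OS; the GIL is released while they wait.
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def count_all(paths, max_workers=8):
    """Count lines in every file, one file per pooled thread."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(count_lines, paths))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(4):
            p = Path(tmp) / f"sample{i}.txt"
            p.write_text("line\n" * (i + 1))
            paths.append(p)
        print(count_all(paths))  # prints [1, 2, 3, 4]
```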


-- 
Steven



#56664

From: Peter Cacioppi <peter.cacioppi@gmail.com>
Date: 2013-10-11 01:41 -0700
Message-ID: <6d35b728-85c2-4fc0-bbc8-6c033bdfcfb0@googlegroups.com>
In reply to: #56646

I should add that the computational heavy lifting is done in a third party library. So a worker thread looks roughly like this (there is a subtle race condition I'm glossing over).

while len(jobs):
    job = jobs.pop()
    model = Model(job)       # Model is py interface for a lib written in C
    newJobs = model.solve()  # This will take a long time
    for newJob in newJobs:
        jobs.add(newJob)

Here jobs is a thread safe object that is shared across each worker thread. It holds a priority queue of jobs that can be solved in parallel. 

Model is a py class that provides the API to a 3rd party library written in C. I know model.solve() will be the bottleneck operation for all but trivial problems.

So, my hope is that the GIL restrictions won't be problematic here. That is to say, I don't need **Python** code to ever run concurrently. I just need Python to allow a different Python worker thread to execute when all the other worker threads are blocking on the model.solve() task. Once the algorithm is in full swing, it is typical for all the worker threads to be blocking on model.solve() at the same time.

It's a nice algorithm for high level languages. Java worked well here, I'm hoping py can be nearly as fast with much more elegant and readable code.
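
One way the worker loop above might be written with the stdlib, sidestepping the check-then-pop race by letting queue.PriorityQueue do the blocking and the bookkeeping. This is a sketch under assumptions: the Model class below is a toy stand-in (the real one wraps the C library), and run and worker are made-up names.

```python
import queue
import threading


class Model:
    """Hypothetical stand-in for the C-backed solver."""

    def __init__(self, job):
        self.job = job

    def solve(self):
        # Toy rule: every job n > 0 spawns two smaller jobs.
        n = self.job
        return [n - 1, n - 1] if n > 0 else []


def worker(jobs, solved, lock):
    while True:
        job = jobs.get()                 # blocks until a job is available
        try:
            for new_job in Model(job).solve():
                jobs.put(new_job)        # enqueue children before task_done()
            with lock:
                solved.append(job)
        finally:
            jobs.task_done()


def run(initial_jobs, n_workers=4):
    jobs = queue.PriorityQueue()
    solved, lock = [], threading.Lock()
    for job in initial_jobs:
        jobs.put(job)
    for _ in range(n_workers):
        threading.Thread(target=worker, args=(jobs, solved, lock),
                         daemon=True).start()
    jobs.join()                          # returns once every job is done
    return solved


if __name__ == "__main__":
    print(len(run([3])))  # 1 + 2 + 4 + 8 = 15 jobs solved
```

Because children are queued before the parent calls task_done(), the unfinished-task count cannot reach zero while work is still being generated, so join() does not return early when the queue is only momentarily empty.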






#56665

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-10-11 19:48 +1100
Message-ID: <mailman.996.1381481315.18130.python-list@python.org>
In reply to: #56664

On Fri, Oct 11, 2013 at 7:41 PM, Peter Cacioppi
<peter.cacioppi@gmail.com> wrote:
> So, my hope is that the GIL restrictions won't be problematic here. That is to say, I don't need **Python** code to ever run concurrently. I just need Python to allow a different Python worker thread to execute when all the other worker threads are blocking on the model.solve() task. Once the algorithm is in full swing, it is typical for all the worker threads to be blocking on model.solve() at the same time.

Sounds like Python will serve you just fine! Check out the threading
module, knock together a quick test, and spin it up!

ChrisA



#56686

From: Piet van Oostrum <piet@vanoostrum.org>
Date: 2013-10-11 10:55 -0400
Message-ID: <m2eh7rsvpr.fsf@cochabamba.vanoostrum.org>
In reply to: #56665

Chris Angelico <rosuav@gmail.com> writes:

> On Fri, Oct 11, 2013 at 7:41 PM, Peter Cacioppi
> <peter.cacioppi@gmail.com> wrote:
>> So, my hope is that the GIL restrictions won't be problematic here. That is to say, I don't need **Python** code to ever run concurrently. I just need Python to allow a different Python worker thread to execute when all the other worker threads are blocking on the model.solve() task. Once the algorithm is in full swing, it is typical for all the worker threads to be blocking on model.solve() at the same time.
>
> Sounds like Python will serve you just fine! Check out the threading
> module, knock together a quick test, and spin it up!

But it only works if the external C library has been written to release
the GIL around the long computations. If not, then the OP could try to
write a wrapper around them that does this.
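
One possible form such a wrapper could take, sketched with ctypes: functions called through ctypes.CDLL have the GIL released for the duration of the call (ctypes.PyDLL keeps it held). The example assumes a POSIX system whose C library exports usleep(), which merely stands in for a long-running C routine; slow_c_call and timed_threads are illustrative names.

```python
import ctypes
import ctypes.util
import threading
import time

# Assumption: POSIX, with usleep() available from the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.usleep.argtypes = [ctypes.c_uint]
libc.usleep.restype = ctypes.c_int


def slow_c_call():
    # 0.3 s spent inside C code; CDLL drops the GIL around the call,
    # so other Python threads can run (or make their own C calls).
    libc.usleep(300_000)


def timed_threads(n_threads):
    """Run slow_c_call in n_threads threads; return the wall-clock time."""
    threads = [threading.Thread(target=slow_c_call) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"4 threads took {timed_threads(4):.2f} s")
```

If the calls overlap, four threads should take about 0.3 s in total instead of 1.2 s. A hand-written C extension would get the same effect by bracketing the long computation with the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros.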
-- 
Piet van Oostrum <piet@vanoostrum.org>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]



#56704

From: Terry Reedy <tjreedy@udel.edu>
Date: 2013-10-11 15:53 -0400
Message-ID: <mailman.1014.1381521241.18130.python-list@python.org>
In reply to: #56664

On 10/11/2013 4:41 AM, Peter Cacioppi wrote:

> I should add that the computational heavy lifting is done in a third party library. So a worker thread looks roughly like this (there is a subtle race condition I'm glossing over).
>
> while len(jobs):
>     job = jobs.pop()
>     model = Model(job)       # Model is py interface for a lib written in C
>     newJobs = model.solve()  # This will take a long time
>     for newJob in newJobs:
>         jobs.add(newJob)
>
> Here jobs is a thread safe object that is shared across each worker thread. It holds a priority queue of jobs that can be solved in parallel.
>
> Model is a py class that provides the API to a 3rd party library written in C. I know model.solve() will be the bottleneck operation for all but trivial problems.
>
> So, my hope is that the GIL restrictions won't be problematic here. That is to say, I don't need **Python** code to ever run concurrently. I just need Python to allow a different Python worker thread to execute when all the other worker threads are blocking on the model.solve() task. Once the algorithm is in full swing, it is typical for all the worker threads to be blocking on model.solve() at the same time.
>
> It's a nice algorithm for high level languages. Java worked well here, I'm hoping py can be nearly as fast with much more elegant and readable code.

Given that model.solve takes a 'long time' (seconds, at least), the 
extra time to start a process over the time to start a thread will be 
inconsequential. I would therefore look at the multiprocessing module.

-- 
Terry Jan Reedy



#56707

From: Peter Cacioppi <peter.cacioppi@gmail.com>
Date: 2013-10-11 13:10 -0700
Message-ID: <3e44108d-fef2-4c52-9f03-2fced186d807@googlegroups.com>
In reply to: #56646

"Sounds like Python will serve you just fine! Check out the threading
module, knock together a quick test, and spin it up!"

Thanks, that was my assessment as well, just wanted a double check. At the time of posting I was mentally blocked on how to set up a quick proof of concept, but of course writing the post cleared that up ;)

Along with "batteries included" and "we're all adults", I think Python needs a pithy phrase summarizing how well thought out it is. That is to say, the major design decisions were all carefully considered, and as a result things that might appear to be problematic are actually not barriers in practice. My suggestion for this phrase is "Guido was here". 

So in this case, I thought the GIL would be a fly in the ointment, but on reflection it turned out not to be the case. Guido was here.


