Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!newsfeed.xs4all.nl!newsfeed3.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
To: python-list@python.org
From: Dennis Lee Bieber <wlfraed@ix.netcom.com>
Subject: Re: Please help with Threading
Date: Sat, 18 May 2013 15:28:56 -0400
Organization: > Bestiaria Support Staff <
References: <7baacf5a-0c50-4935-ad5b-148c208d759b@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.1815.1368905346.3114.python-list@python.org>
Lines: 36
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:45525

On Sat, 18 May 2013 01:58:13 -0700 (PDT), Jurgens de Bruin
<debruinjj@gmail.com> declaimed the following in
gmane.comp.python.general:

> This is my first script where I want to use the Python threading module. I have a large dataset, a list of dicts, which can hold as many as 200 dictionaries. The final goal is a histogram for each dict, 16 histograms to a page (4x4); this already works.
> What I currently do is create a nested list [ [ {} ], [ {} ] ] where each inner list contains 16 dictionaries, so each inner list is a single page of 16 histograms. Iterating over the outer list and creating the graphs takes too long, so I would like multiple inner lists to be processed simultaneously, creating the graphs in "parallel".
> I am trying to use Python threading for this. I create 4 threads, loop over the outer list, and send an inner list to each thread. This seems to work if my nested list contains only 2 elements, that is, fewer elements than threads. Currently the script runs and then seems to hang. I monitor the resources on my Mac: Python starts off well, using 80% CPU, and when the 4th thread is created the CPU usage drops to 0%.
> 

	The odds are good that this is just going to run slower...

	The common Python implementation (CPython) uses a global interpreter
lock (GIL) to prevent interpreted code from interfering with itself in
multiple threads. So "number cruncher" applications don't gain any speed
from being partitioned into threads: even on a multicore processor, only
one thread can hold the GIL at a time. On top of that, you have the
overhead of the interpreter switching between threads (GIL release on
one thread, GIL acquire for the next thread).
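
	A rough way to see this for yourself is to time the same pure-Python
loop run back to back and then split across threads. This is only a
sketch: count_down, run_sequential and run_threaded are names I made up
for the illustration, and the loop size is arbitrary.

```python
import threading
import time


def count_down(n):
    """Pure-Python busy loop: CPU-bound, so it needs the GIL to make progress."""
    while n > 0:
        n -= 1


def run_sequential(n, jobs):
    """Run the loop `jobs` times, one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(jobs):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, jobs):
    """Run the loop in `jobs` threads at once; return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(jobs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    N = 2_000_000  # arbitrary workload size, adjust to taste
    print("sequential: %.2f s" % run_sequential(N, 4))
    print("threaded:   %.2f s" % run_threaded(N, 4))
```

	On a GIL-enabled CPython I would expect the threaded figure to be no
better than the sequential one, but the exact numbers depend on the
machine and the interpreter version.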

	Python threads work fine if the threads either rely on compiled
extension libraries for number crunching (instead of doing nested Python
loops to process a numeric array, you pass it to something like NumPy,
which can release the GIL while it crunches the array in compiled code)
or they do lots of I/O and have to wait for I/O devices (while one
thread is waiting for the write/read operation to complete, another
thread can do some number crunching).
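
	The I/O case can be sketched with time.sleep() standing in for a
blocking read or write, since sleeping releases the GIL just as a real
wait on a device would. fake_io and overlapped_waits are hypothetical
names for this illustration, not anything from the original script.

```python
import threading
import time


def fake_io(delay, results, index):
    """Stand-in for a blocking read/write; time.sleep releases the GIL."""
    time.sleep(delay)
    results[index] = delay


def overlapped_waits(delays):
    """Start one thread per delay; return (results, elapsed seconds)."""
    results = [None] * len(delays)
    threads = [threading.Thread(target=fake_io, args=(d, results, i))
               for i, d in enumerate(delays)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = overlapped_waits([0.5, 0.5, 0.5, 0.5])
    # The waits overlap, so this should take roughly as long as the
    # longest single delay rather than the sum of all of them.
    print(results, "%.2f s" % elapsed)
```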

	If you really need to do this type of number crunching in
Python-level code, you'll want to look into the multiprocessing module
instead. That will create actual OS processes (each with a copy of the
interpreter, and not sharing memory) and each of those can run on a core
without conflicting on the GIL.
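
	For the pages-of-histograms job, a minimal sketch with
multiprocessing.Pool might look like the following. I am assuming the
data is a flat list of dicts; make_pages, render_page and render_all are
hypothetical names, and render_page only counts keys where the real
script would draw and save a 4x4 page of histograms.

```python
import multiprocessing


def make_pages(items, per_page=16):
    """Split a flat list into pages of at most `per_page` items."""
    return [items[i:i + per_page] for i in range(0, len(items), per_page)]


def render_page(page):
    """Hypothetical stand-in for drawing one page of histograms.

    It returns the number of keys in each dict so the sketch stays
    runnable without a plotting library.
    """
    return [len(d) for d in page]


def render_all(pages, workers=4):
    """Hand each page to a worker process; results come back in page order."""
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(render_page, pages)


if __name__ == "__main__":
    # The guard matters: on platforms that start workers by re-importing
    # this module, unguarded code would try to create pools recursively.
    data = [{"a": 1, "b": 2} for _ in range(40)]
    print(render_all(make_pages(data)))
```

	Each page is pickled and sent to a worker, so whatever the worker
returns (or the dicts it receives) has to be picklable; writing the
finished page to a file inside the worker avoids shipping large results
back.
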
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/