Path: csiph.com!usenet.pasdenom.info!weretis.net!feeder1.news.weretis.net!feeder.erje.net!eu.feeder.erje.net!eweka.nl!lightspeed.eweka.nl!194.109.133.83.MISMATCH!newsfeed.xs4all.nl!newsfeed4.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
Date: Sun, 10 Mar 2013 22:57:49 -0700
From: Abhinav M Kulkarni <amkulkar@uci.edu>
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130221 Thunderbird/17.0.3
MIME-Version: 1.0
To: python-list@python.org
Subject: Advice regarding multiprocessing module
References: <513D6FEB.9040706@uci.edu>
In-Reply-To: <513D6FEB.9040706@uci.edu>
Content-Type: multipart/alternative; boundary="------------060202080308070203000101"
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.3176.1362981490.2939.python-list@python.org>
Lines: 104
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:41053

This is a multi-part message in MIME format.
--------------060202080308070203000101
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Dear all,

I need some advice regarding the use of the multiprocessing module.
The scenario is as follows:

  * I am running gradient descent to estimate the parameters of a
    pairwise grid CRF (a grid-based graphical model). There are 106
    data points. Each data point can be analyzed in parallel.
  * To calculate the gradient for each data point, I need to perform
    approximate inference, since this is a loopy model. I am using
    Gibbs sampling.
  * My grid is 9x9, so there are 81 variables that I sample in one
    sweep of Gibbs sampling. I perform 1000 iterations of Gibbs sampling.
  * My laptop has a quad-core Intel i5 processor, so I thought I could
    use the multiprocessing module to parallelize my code (basically,
    calculate the gradients in parallel on multiple cores).
  * I did not use the threading module because of the GIL: it does not
    allow multiple threads to execute Python bytecode at the same time.
  * As a result, I end up creating a process for each data point
    (instead of a thread, which I would ideally prefer, so as to avoid
    process creation overhead).
  * I am using basic NumPy array functionality.
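
For concreteness, the per-data-point work described in the list above
can be sketched roughly as follows. This is an illustrative stand-in
rather than the actual code: it assumes binary variables in {-1, +1}
and a single Ising-style coupling weight in place of the real CRF
potentials, it uses plain Python lists instead of NumPy so that it is
self-contained, and the names gibbs_sweep and expected_pair_agreement
are made up for the example.

```python
import math
import random

GRID = 9          # 9x9 grid, 81 variables, as described above
N_SWEEPS = 1000   # Gibbs iterations per data point, as described above


def gibbs_sweep(state, coupling, rng):
    """Resample every variable once, conditioned on its grid neighbours.

    Assumption: binary variables in {-1, +1} and one shared pairwise
    weight `coupling` (a stand-in for the real CRF potentials).
    """
    n = len(state)
    for i in range(n):
        for j in range(n):
            # Sum of the (up to four) neighbouring values.
            field = 0
            if i > 0:
                field += state[i - 1][j]
            if i < n - 1:
                field += state[i + 1][j]
            if j > 0:
                field += state[i][j - 1]
            if j < n - 1:
                field += state[i][j + 1]
            # P(x_ij = +1 | neighbours) under the assumed potential.
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * coupling * field))
            state[i][j] = 1 if rng.random() < p_plus else -1
    return state


def expected_pair_agreement(coupling, seed, n_sweeps=N_SWEEPS, grid=GRID):
    """Monte Carlo estimate of the mean product over neighbouring pairs."""
    rng = random.Random(seed)
    state = [[rng.choice((-1, 1)) for _ in range(grid)] for _ in range(grid)]
    n_pairs = 2 * grid * (grid - 1)  # horizontal plus vertical edges
    total = 0.0
    for _ in range(n_sweeps):
        gibbs_sweep(state, coupling, rng)
        agree = 0
        for i in range(grid):
            for j in range(grid):
                if i < grid - 1:
                    agree += state[i][j] * state[i + 1][j]
                if j < grid - 1:
                    agree += state[i][j] * state[i][j + 1]
        total += agree / n_pairs
    return total / n_sweeps
```

In a model of this kind, an expectation such as the one returned by
expected_pair_agreement is the quantity the sampler has to estimate for
each data point, so one call corresponds to one unit of parallel work.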

Previously I was running this code in MATLAB, where it is much faster:
one iteration of gradient descent takes around 14 sec in MATLAB using a
parfor loop (a parallel loop; the data points are analyzed inside it).
However, the same program takes almost 215 sec in Python.

I am quite amazed at the slowness of the multiprocessing module. Is
this because of the process creation overhead for each data point?
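
One way to check this would be to reuse a fixed set of worker processes
instead of starting one process per data point. The following is only a
sketch under assumptions: it needs Python 3.3 or later for the "with"
form of Pool, gradient_for_point is a hypothetical placeholder doing
arbitrary CPU-bound arithmetic rather than the real gradient, and the
worker count and chunksize are guesses that would need tuning.

```python
import multiprocessing as mp


def gradient_for_point(point):
    """Hypothetical stand-in for the per-data-point gradient computation."""
    # Arbitrary CPU-bound arithmetic so that each task does real work.
    total = 0.0
    for k in range(1, 20001):
        total += (point * k) % 7
    return total


def run_with_pool(points, n_workers=4):
    """Reuse a fixed set of worker processes for all data points."""
    with mp.Pool(processes=n_workers) as pool:
        # chunksize sends several points per task to reduce IPC overhead.
        return pool.map(gradient_for_point, points, chunksize=8)


if __name__ == "__main__":
    points = list(range(106))           # 106 data points, as described above
    gradients = run_with_pool(points)   # one result per data point
    print(len(gradients), sum(gradients))
```

If a version like this is not noticeably faster than one process per
data point, process creation is probably not the main cost, and the time
is more likely going into the per-point computation itself.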

Please keep my email in the replies as I am not a member of this mailing 
list.

Thanks,
Abhinav




--------------060202080308070203000101--