From: Abhinav M Kulkarni
Date: Sun, 10 Mar 2013 22:57:49 -0700
To: python-list@python.org
Newsgroups: comp.lang.python
Subject: Advice regarding multiprocessing module

Dear all,

I need some advice regarding the use of the multiprocessing module. The scenario is as follows:

  • I am running gradient descent to estimate the parameters of a pairwise grid CRF (a grid-based graphical model). There are 106 data points. Each data point can be analyzed in parallel.
  • To calculate the gradient for each data point, I need to perform approximate inference, since this is a loopy model. I am using Gibbs sampling.
  • My grid is 9x9, so there are 81 variables that I sample in one sweep of Gibbs sampling. I perform 1000 iterations of Gibbs sampling.
  • My laptop has a quad-core Intel i5 processor, so I thought I could use the multiprocessing module to parallelize my code (basically, calculate the gradients in parallel on multiple cores).
  • I did not use the threading module because of the GIL, which does not allow multiple threads to execute Python code at the same time.
  • As a result, I end up creating a process for each data point (instead of a thread, which I would ideally prefer, so as to avoid process creation overhead).
  • I am using basic NumPy array functionality.
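
For concreteness, the per-data-point inference described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual code: it assumes binary variables in {-1, +1}, a hypothetical per-site `unary` field and a single shared `coupling` weight (an Ising-style pairwise model), and it skips burn-in. The names `gibbs_sweep` and `estimate_marginals` are made up for this sketch.

```python
import numpy as np

GRID = 9          # 9x9 grid, 81 variables (from the post)
N_SWEEPS = 1000   # Gibbs iterations per data point (from the post)


def gibbs_sweep(x, unary, coupling, rng):
    """One in-place sweep over all sites of a +/-1 grid (assumed Ising-style model)."""
    n = x.shape[0]
    for i in range(n):
        for j in range(n):
            # Sum of the (up to four) neighbouring variables.
            s = 0.0
            if i > 0:
                s += x[i - 1, j]
            if i < n - 1:
                s += x[i + 1, j]
            if j > 0:
                s += x[i, j - 1]
            if j < n - 1:
                s += x[i, j + 1]
            # P(x_ij = +1 | neighbours) when the log-potential is
            # unary * x_ij + coupling * x_ij * s.
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * (unary[i, j] + coupling * s)))
            x[i, j] = 1 if rng.random() < p_plus else -1
    return x


def estimate_marginals(unary, coupling, n_sweeps=N_SWEEPS, seed=0):
    """Approximate P(x_ij = +1) by averaging samples over all sweeps (no burn-in)."""
    rng = np.random.default_rng(seed)
    x = rng.choice(np.array([-1, 1]), size=unary.shape)
    counts = np.zeros(unary.shape)
    for _ in range(n_sweeps):
        gibbs_sweep(x, unary, coupling, rng)
        counts += (x == 1)
    return counts / n_sweeps
```

With 81 sites and 1000 sweeps, the inner update in a loop like this runs 81,000 times per data point, so the cost of each interpreted iteration may matter as well as the process setup.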

Previously I was running this code in MATLAB, where it runs much faster: one iteration of gradient descent takes around 14 sec in MATLAB using a parfor loop (a parallel loop; the data points are analyzed within it). However, the same program takes almost 215 sec in Python.

I am quite amazed at the slowness of the multiprocessing module. Is this because of the process creation overhead for each data point?
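
One way to probe this question is to time the process-per-data-point approach against a fixed pool of four workers on the same workload. The sketch below is an assumption-laden illustration rather than the real program: `fake_gradient` is a hypothetical stand-in for the actual gradient computation, and the function names are invented here. Only `multiprocessing.Process`, `multiprocessing.Queue` and `multiprocessing.Pool` from the standard library are used.

```python
import multiprocessing as mp
import time

import numpy as np

N_POINTS = 106   # number of data points (from the post)
N_WORKERS = 4    # quad-core machine (from the post)


def fake_gradient(idx):
    """Hypothetical stand-in for the per-data-point gradient computation."""
    rng = np.random.default_rng(idx)
    return float(rng.standard_normal(81).sum())


def _worker(idx, queue):
    # Send the result back to the parent, tagged with its index.
    queue.put((idx, fake_gradient(idx)))


def one_process_per_point(n_points=N_POINTS):
    """Start a fresh process for every data point, as described in the post."""
    queue = mp.Queue()
    procs = [mp.Process(target=_worker, args=(i, queue)) for i in range(n_points)]
    for p in procs:
        p.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return [results[i] for i in range(n_points)]


def pooled(n_points=N_POINTS, n_workers=N_WORKERS):
    """Reuse a fixed set of worker processes for all data points."""
    with mp.Pool(processes=n_workers) as pool:
        return pool.map(fake_gradient, range(n_points))


if __name__ == "__main__":
    for fn in (one_process_per_point, pooled):
        start = time.perf_counter()
        fn()
        print(f"{fn.__name__}: {time.perf_counter() - start:.2f} s")
```

If the two timings are close once the real gradient function is substituted, process creation is probably not the dominant cost and the time is more likely spent inside the per-data-point computation itself.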

Please keep my email in the replies as I am not a member of this mailing list.

Thanks,
Abhinav


