From: Sturla Molden
Subject: Re: Parallelization of Python on GPU?
Date: Thu, 26 Feb 2015 16:40:29 +0000 (UTC)
Newsgroups: comp.lang.python

If you are doing SVM regression with scikit-learn, you are using libSVM under the hood. There is a CUDA-accelerated version of this C library here:

http://mklab.iti.gr/project/GPU-LIBSVM

You can presumably reuse the wrapping code from scikit-learn.

Sturla

John Ladasky wrote:
> I've been working with machine learning for a while. Many of the
> standard packages (e.g., scikit-learn) have fitting algorithms which run
> in single threads. These algorithms are not themselves parallelized.
> Perhaps, due to their unique mathematical requirements, they cannot be parallelized.
>
> When one is investigating several potential models of one's data with
> various settings for free parameters, it is still sometimes possible to
> speed things up. On a modern machine, one can use Python's
> multiprocessing.Pool to run separate instances of scikit-learn fits. I
> am currently using ten of the twelve 3.3 GHz CPU cores on my machine to
> do just that. And I can still browse the web with no observable lag. :^)
>
> Still, I'm waiting hours for jobs to finish. Support vector regression fitting is hard.
>
> What I would REALLY like to do is to take advantage of my GPU. My NVidia
> graphics card has 1152 cores and a 1.0 GHz clock. I wouldn't mind
> borrowing a few hundred of those GPU cores at a time and seeing what they
> can do. In theory, I calculate that I could speed up the job by another factor of five.
>
> The trick is that each process would need to run some PYTHON code, not
> CUDA or OpenCL. The child process code isn't particularly fancy. (I
> should, for example, be able to switch that portion of my code to static typing.)
>
> What is the most effective way to accomplish this task?
>
> I came across a reference to a package called "Urutu" which may be what I
> need; however, it doesn't look like it is widely supported.
>
> I would love it if the Python developers themselves added the ability to
> spawn GPU processes to the multiprocessing module!
>
> Thanks for any advice and comments.