| Started by | John Ladasky <john_ladasky@sbcglobal.net> |
|---|---|
| First post | 2015-02-25 18:35 -0800 |
| Last post | 2015-02-26 21:54 +0100 |
| Articles | 17 — 7 participants |
Parallelization of Python on GPU? John Ladasky <john_ladasky@sbcglobal.net> - 2015-02-25 18:35 -0800
Re: Parallelization of Python on GPU? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-02-26 14:02 +1100
Re: Parallelization of Python on GPU? John Ladasky <john_ladasky@sbcglobal.net> - 2015-02-25 20:01 -0800
Re: Parallelization of Python on GPU? Jason Swails <jason.swails@gmail.com> - 2015-02-26 10:06 -0500
Re: Parallelization of Python on GPU? Sturla Molden <sturla.molden@gmail.com> - 2015-02-26 16:53 +0000
Re: Parallelization of Python on GPU? Terry Reedy <tjreedy@udel.edu> - 2015-02-26 12:16 -0500
Re: Parallelization of Python on GPU? Jason Swails <jason.swails@gmail.com> - 2015-02-26 12:48 -0500
Re: Parallelization of Python on GPU? Sturla Molden <sturla.molden@gmail.com> - 2015-02-26 22:10 +0100
Re: Parallelization of Python on GPU? Jason Swails <jason.swails@gmail.com> - 2015-02-26 17:28 -0500
Re: Parallelization of Python on GPU? Ethan Furman <ethan@stoneleaf.us> - 2015-02-25 19:03 -0800
Re: Parallelization of Python on GPU? Ethan Furman <ethan@stoneleaf.us> - 2015-02-25 19:05 -0800
Re: Parallelization of Python on GPU? John Ladasky <john_ladasky@sbcglobal.net> - 2015-02-25 21:53 -0800
Re: Parallelization of Python on GPU? Christian Gollwitzer <auriocus@gmx.de> - 2015-02-27 19:55 +0100
Re: Parallelization of Python on GPU? Jason Swails <jason.swails@gmail.com> - 2015-02-26 10:27 -0500
Re: Parallelization of Python on GPU? Sturla Molden <sturla.molden@gmail.com> - 2015-02-26 16:40 +0000
Re: Parallelization of Python on GPU? John Ladasky <john_ladasky@sbcglobal.net> - 2015-02-26 09:34 -0800
Re: Parallelization of Python on GPU? Sturla Molden <sturla.molden@gmail.com> - 2015-02-26 21:54 +0100
| From | John Ladasky <john_ladasky@sbcglobal.net> |
|---|---|
| Date | 2015-02-25 18:35 -0800 |
| Subject | Parallelization of Python on GPU? |
| Message-ID | <82642f3a-49e8-4982-b135-66ffc04d67d9@googlegroups.com> |
I've been working with machine learning for a while. Many of the standard packages (e.g., scikit-learn) have fitting algorithms which run in single threads. These algorithms are not themselves parallelized. Perhaps, due to their unique mathematical requirements, they cannot be parallelized.

When one is investigating several potential models of one's data with various settings for free parameters, it is still sometimes possible to speed things up. On a modern machine, one can use Python's multiprocessing.Pool to run separate instances of scikit-learn fits. I am currently using ten of the twelve 3.3 GHz CPU cores on my machine to do just that. And I can still browse the web with no observable lag. :^)

Still, I'm waiting hours for jobs to finish. Support vector regression fitting is hard.

What I would REALLY like to do is to take advantage of my GPU. My NVidia graphics card has 1152 cores and a 1.0 GHz clock. I wouldn't mind borrowing a few hundred of those GPU cores at a time, and seeing what they can do. In theory, I calculate that I can speed up the job by another five-fold.

The trick is that each process would need to run some PYTHON code, not CUDA or OpenCL. The child process code isn't particularly fancy. (I should, for example, be able to switch that portion of my code to static typing.)

What is the most effective way to accomplish this task?

I came across a reference to a package called "Urutu" which may be what I need; however, it doesn't look like it is widely supported.

I would love it if the Python developers themselves added the ability to spawn GPU processes to the multiprocessing module!

Thanks for any advice and comments.
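A minimal sketch of the kind of parameter sweep described in this post. The stand-in `fit_one` function, its toy cost, and the grid values are all hypothetical; a real job would fit a scikit-learn estimator inside `fit_one`. The sketch only shows independent fits being farmed out to `multiprocessing.Pool`:

```python
import itertools
import multiprocessing


def fit_one(params):
    """Stand-in for one independent model fit (hypothetical).

    A real job would build and fit an estimator here and return its score.
    """
    c, epsilon = params
    # Toy cost surface with its minimum at c=10, epsilon=0.1.
    cost = (c - 10.0) ** 2 + (epsilon - 0.1) ** 2
    return params, cost


def parameter_grid():
    """Every (C, epsilon) combination to try; the values are illustrative."""
    c_values = [0.1, 1.0, 10.0, 100.0]
    epsilon_values = [0.01, 0.1, 1.0]
    return list(itertools.product(c_values, epsilon_values))


def best_fit(results):
    """Pick the (params, cost) pair with the lowest cost."""
    return min(results, key=lambda item: item[1])


if __name__ == "__main__":
    # Leave a couple of cores free so the desktop stays responsive.
    workers = max(1, multiprocessing.cpu_count() - 2)
    with multiprocessing.Pool(processes=workers) as pool:
        results = pool.map(fit_one, parameter_grid())
    print(best_fit(results))
```

Each fit is an ordinary Python function call in its own process, which is why this scales across CPU cores without any change to the fitting code itself.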
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2015-02-26 14:02 +1100 |
| Message-ID | <54ee8ce2$0$11109$c3e8da3@news.astraweb.com> |
| In reply to | #86459 |
John Ladasky wrote:

> What I would REALLY like to do is to take advantage of my GPU.

I can't help you with that, but I would like to point out that GPUs typically don't support IEEE-754 maths, which means that while they are likely significantly faster, they're also likely significantly less accurate. And any two different brands/models of GPU are likely to give different results. (Possibly not *very* different, but considering the mess that floating point maths was prior to IEEE-754, possibly *very* different.)

Personally, I wouldn't trust GPU floating point for serious work. Maybe for quick and dirty exploration of the data, but I'd then want to repeat any calculations using the main CPU before using the numbers anywhere :-)

-- Steve
| From | John Ladasky <john_ladasky@sbcglobal.net> |
|---|---|
| Date | 2015-02-25 20:01 -0800 |
| Message-ID | <459a9366-19ef-4f98-9087-e50430a8655e@googlegroups.com> |
| In reply to | #86462 |
On Wednesday, February 25, 2015 at 7:03:23 PM UTC-8, Steven D'Aprano wrote:

> I would like to point out that GPUs typically don't support IEEE-754 maths, which means that while they are likely significantly faster, they're also likely significantly less accurate.

Historically, that has been true. According to this document...

https://developer.nvidia.com/sites/default/files/akamai/cuda/files/NVIDIA-CUDA-Floating-Point.pdf

...NVidia's GPU cards which implement "compute capability" versions 2.0 and higher are IEEE-754 compliant, both for single- and double-precision floating point operations.

The current "compute capability" version is 5.2, so there are several generations of GPU hardware out there by now which should give satisfactory floating-point results.
| From | Jason Swails <jason.swails@gmail.com> |
|---|---|
| Date | 2015-02-26 10:06 -0500 |
| Message-ID | <mailman.19262.1424967059.18130.python-list@python.org> |
| In reply to | #86462 |
On Thu, 2015-02-26 at 14:02 +1100, Steven D'Aprano wrote:

> John Ladasky wrote:
>
>> What I would REALLY like to do is to take advantage of my GPU.
>
> I can't help you with that, but I would like to point out that GPUs typically don't support IEEE-754 maths, which means that while they are likely significantly faster, they're also likely significantly less accurate. And any two different brands/models of GPU are likely to give different results. (Possibly not *very* different, but considering the mess that floating point maths was prior to IEEE-754, possibly *very* different.)

This hasn't been true of NVidia GPUs manufactured since ca. 2008.

> Personally, I wouldn't trust GPU floating point for serious work. Maybe for quick and dirty exploration of the data, but I'd then want to repeat any calculations using the main CPU before using the numbers anywhere :-)

There is a *huge* dash toward GPU computing in the scientific computing sector. Since I started as a graduate student in computational chemistry/physics in 2008, I watched as state-of-the-art supercomputers running tens of thousands to hundreds of thousands of cores were overtaken in performance by a $500 GPU (today the GTX 780 or 980) you can put in a desktop. I went from running all of my calculations on a CPU cluster in 2009 to running 90% of my calculations on a GPU by the time I graduated in 2013... and for people without as ready access to supercomputers as myself the move was even more pronounced.

This work is very serious, and numerical precision is typically of immense importance. See, e.g., http://www.sciencedirect.com/science/article/pii/S0010465512003098 and http://pubs.acs.org/doi/abs/10.1021/ct400314y

In our software, we can run simulations on a GPU or a CPU and the results are *literally* indistinguishable. The transition to GPUs was accompanied by a series of studies that investigated precisely your concerns... we would never have started using GPUs if we didn't trust GPU numbers as much as we trusted those from the CPU.

And NVidia is embracing this revolution (obviously) -- they are putting a lot of time, effort, and money into ensuring the success of GPU high performance computing. It is here to stay in the immediate future, and refusing to use the technology will leave those that *could* benefit from it at a severe disadvantage. (That said, GPUs aren't good at everything, and CPUs are also here to stay.)

And GPU performance gains are outpacing CPU performance gains -- I've seen about two orders of magnitude improvement in computational throughput over the past 6 years through the introduction of GPU computing and improvements in GPU hardware.

All the best,
Jason

--
Jason M. Swails
BioMaPS, Rutgers University
Postdoctoral Researcher
| From | Sturla Molden <sturla.molden@gmail.com> |
|---|---|
| Date | 2015-02-26 16:53 +0000 |
| Message-ID | <mailman.19272.1424969625.18130.python-list@python.org> |
| In reply to | #86462 |
GPU computing is great if you have the following:

1. Your data structures are arrays of floating point numbers.
2. You have a data-parallel problem.
3. You are happy with single precision.
4. You have time to code everything in CUDA or OpenCL.
5. You have enough video RAM to store your data.

For Python the easiest solution is to use Numba Pro.

Sturla
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2015-02-26 12:16 -0500 |
| Message-ID | <mailman.19276.1424971039.18130.python-list@python.org> |
| In reply to | #86462 |
On 2/26/2015 10:06 AM, Jason Swails wrote:

> [...]
>
> And GPU performance gains are outpacing CPU performance gains -- I've seen about two orders of magnitude improvement in computational throughput over the past 6 years through the introduction of GPU computing and improvements in GPU hardware.

Thanks for the update.

-- Terry Jan Reedy
| From | Jason Swails <jason.swails@gmail.com> |
|---|---|
| Date | 2015-02-26 12:48 -0500 |
| Message-ID | <mailman.19278.1424972875.18130.python-list@python.org> |
| In reply to | #86462 |
On Thu, 2015-02-26 at 16:53 +0000, Sturla Molden wrote:

> GPU computing is great if you have the following:
>
> 1. Your data structures are arrays of floating point numbers.

It actually works equally great, if not better, for integers.

> 2. You have a data-parallel problem.

This is the biggest one, IMO. ^^^

> 3. You are happy with single precision.

NVidia GPUs have had double-precision maths in hardware since compute capability 1.3 (GTX 280). That's ca. 2008. In optimized CPU code, you can still get up to a ~50% benefit going from double to single precision (it's rarely ever that high, but 20-30% is commonplace in my experience of optimized code). It's admittedly a bigger hit on most GPUs, but there are ways to work around it (e.g., fixed precision), and you can still do double precision work where it's needed. One of the articles I linked previously demonstrates that a hybrid precision model (based on fixed precision) provides exactly the same numerical stability as double precision (which is much better than pure single precision) for that application.

Double precision can often be avoided in many parts of a calculation, using it only where those bits matter (like accumulators with potentially small contributions, subtractions of two numbers of similar magnitude, etc.).

> 4. You have time to code everything in CUDA or OpenCL.

This is the second biggest one, IMO. ^^^

> 5. You have enough video RAM to store your data.

Again, it can be worked around, but if you can't fit everything on the GPU, it takes painstaking work to limit the potentially devastating performance cost of the frequent GPU<->CPU transfers.

> For Python the easiest solution is to use Numba Pro.

Agreed, although I've never actually tried PyCUDA before...

All the best,
Jason
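The point about accumulators can be illustrated with a small sketch. It emulates single precision on the CPU by rounding through the `struct` module (an assumption made purely for illustration; no GPU is involved). This is not the fixed-precision scheme from the linked article, only the simpler idea of keeping single-precision terms while widening the accumulator:

```python
import struct


def to_float32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_single(terms):
    """Accumulate in single precision: round after every addition."""
    total = 0.0
    for t in terms:
        total = to_float32(total + t)
    return total


def sum_mixed(terms):
    """Keep single-precision terms but accumulate in double precision."""
    total = 0.0
    for t in terms:
        total += t
    return total


# 100,000 small single-precision contributions that should add up to about 10.
terms = [to_float32(1e-4)] * 100_000
err_single = abs(sum_single(terms) - 10.0)
err_mixed = abs(sum_mixed(terms) - 10.0)
print(f"single accumulator error: {err_single:.2e}")
print(f"double accumulator error: {err_mixed:.2e}")
```

The single-precision accumulator drifts because each small term is rounded against an ever larger running total, while the wider accumulator is limited mainly by how well the terms themselves are represented.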
| From | Sturla Molden <sturla.molden@gmail.com> |
|---|---|
| Date | 2015-02-26 22:10 +0100 |
| Message-ID | <mailman.19289.1424985028.18130.python-list@python.org> |
| In reply to | #86462 |
On 26/02/15 18:48, Jason Swails wrote:

> On Thu, 2015-02-26 at 16:53 +0000, Sturla Molden wrote:
>
>> GPU computing is great if you have the following:
>>
>> 1. Your data structures are arrays of floating point numbers.
>
> It actually works equally great, if not better, for integers.

Right, but not complicated data structures with a lot of references or pointers.

It requires that data be laid out in regular arrays, and then it acts on these arrays in a data-parallel manner. It is designed to process vertices in parallel for computer graphics, and that is a limitation which is always there. It is not a CPU with 1024 cores. It is a "floating point monster" which can process 1024 vectors in parallel. You write a tiny kernel in a C-like language (CUDA, OpenCL) to process one vector, and then it will apply the kernel to all the vectors in an array of vectors. It is very comparable to how GLSL and Direct3D vertex and fragment shaders work. (The reason for which should be obvious.) The GPU is actually great for a lot of things in science, but it is not a CPU. The biggest mistake in the GPGPU hype is the idea that the GPU will behave like a CPU with many cores.

Sturla
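The programming model described in this post can be sketched in plain Python. This is only a CPU emulation under stated assumptions: `saxpy_kernel` and `launch` are hypothetical names, and the loop in `launch` stands in for what a GPU would run as many parallel threads:

```python
def saxpy_kernel(i, a, x, y, out):
    """Kernel body: handles exactly one element, identified by index i."""
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    """Apply the kernel to every index in range(n).

    On a GPU these n invocations would run in parallel; here they run
    one after another, which is valid because no invocation depends
    on another's result.
    """
    for i in range(n):
        kernel(i, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)
```

The kernel never sees the whole problem, only its own index into regular arrays, which is why pointer-heavy data structures fit this model poorly.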
| From | Jason Swails <jason.swails@gmail.com> |
|---|---|
| Date | 2015-02-26 17:28 -0500 |
| Message-ID | <mailman.19294.1424989738.18130.python-list@python.org> |
| In reply to | #86462 |
On Thu, Feb 26, 2015 at 4:10 PM, Sturla Molden <sturla.molden@gmail.com> wrote:

> On 26/02/15 18:48, Jason Swails wrote:
>
>> On Thu, 2015-02-26 at 16:53 +0000, Sturla Molden wrote:
>>
>>> GPU computing is great if you have the following:
>>>
>>> 1. Your data structures are arrays of floating point numbers.
>>
>> It actually works equally great, if not better, for integers.
>
> Right, but not complicated data structures with a lot of references or pointers. It requires that data be laid out in regular arrays, and then it acts on these arrays in a data-parallel manner. It is designed to process vertices in parallel for computer graphics, and that is a limitation which is always there. It is not a CPU with 1024 cores. It is a "floating point monster" which can process 1024 vectors in parallel. You write a tiny kernel in a C-like language (CUDA, OpenCL) to process one vector, and then it will apply the kernel to all the vectors in an array of vectors. It is very comparable to how GLSL and Direct3D vertex and fragment shaders work. (The reason for which should be obvious.) The GPU is actually great for a lot of things in science, but it is not a CPU. The biggest mistake in the GPGPU hype is the idea that the GPU will behave like a CPU with many cores.

Very well summarized. At least in my field, though, it is well-known that GPUs are not 'uber-fast CPUs'. Algorithms have been redesigned and programs rewritten to take advantage of their architecture. It has been a *massive* investment of time and resources, but (unlike the Xeon Phi coprocessor [1]) has reaped most of its promised rewards.

--Jason

[1] I couldn't resist the jab. The Xeon Phi costs several times as much as a top-of-the-line NVidia gaming card, yet the GPU is about 15-20x faster...
| From | Ethan Furman <ethan@stoneleaf.us> |
|---|---|
| Date | 2015-02-25 19:03 -0800 |
| Message-ID | <mailman.19232.1424919818.18130.python-list@python.org> |
| In reply to | #86459 |
On 02/25/2015 06:35 PM, John Ladasky wrote:

> What I would REALLY like to do is to take advantage of my GPU. My NVidia graphics card has 1152 cores and a 1.0 GHz clock. I wouldn't mind borrowing a few hundred of those GPU cores at a time, and see what they can do. In theory, I calculate that I can speed up the job by another five-fold.

Only free for academic use: https://developer.nvidia.com/how-to-cuda-python

Unsure, but looks like it is free to use: http://mathema.tician.de/software/pycuda/

And, of course, the StackOverflow question: http://stackoverflow.com/q/5957554/208880

-- ~Ethan~
| From | Ethan Furman <ethan@stoneleaf.us> |
|---|---|
| Date | 2015-02-25 19:05 -0800 |
| Message-ID | <mailman.19234.1424919987.18130.python-list@python.org> |
| In reply to | #86459 |
Oh, and this one: http://www.cs.toronto.edu/~tijmen/gnumpy.html -- ~Ethan~
| From | John Ladasky <john_ladasky@sbcglobal.net> |
|---|---|
| Date | 2015-02-25 21:53 -0800 |
| Message-ID | <fecd3a22-21bb-42fc-97a4-bbfc54b7958d@googlegroups.com> |
| In reply to | #86465 |
Thanks for the various links, Ethan. I have encountered PyCUDA before, but not the other options. So far, I'm not seeing code examples which appear to do what I would like, which is simply to farm out one Python process to one GPU core. The examples all appear to parallelize array operations. I know, that's the easier way to break up a task. I may have to bite the bullet and learn how to use this: http://mklab.iti.gr/project/GPU-LIBSVM
| From | Christian Gollwitzer <auriocus@gmx.de> |
|---|---|
| Date | 2015-02-27 19:55 +0100 |
| Message-ID | <mcqei5$45q$1@dont-email.me> |
| In reply to | #86478 |
On 26.02.15 at 06:53, John Ladasky wrote:

> Thanks for the various links, Ethan. I have encountered PyCUDA before, but not the other options.
>
> So far, I'm not seeing code examples which appear to do what I would like, which is simply to farm out one Python process to one GPU core. The examples all appear to parallelize array operations. I know, that's the easier way to break up a task.
>
> I may have to bite the bullet and learn how to use this:
>
> http://mklab.iti.gr/project/GPU-LIBSVM

If you can get this to run on your machine, it will surely outperform anything you could achieve on your own with a Python-CUDA bridge. GPU programming is hard, and efficient GPU programming is really hard. To get an impression, this talk shows how some changes to an OpenCL program can improve the speed by 60x compared to a naive implementation:

http://web.archive.org/web/20101217181349/http://developer.amd.com/zones/OpenCLZone/Events/assets/Optimizations-ImageConvolution1.pdf

Christian
| From | Jason Swails <jason.swails@gmail.com> |
|---|---|
| Date | 2015-02-26 10:27 -0500 |
| Message-ID | <mailman.19260.1424964440.18130.python-list@python.org> |
| In reply to | #86459 |
On Wed, 2015-02-25 at 18:35 -0800, John Ladasky wrote:

> I've been working with machine learning for a while. Many of the standard packages (e.g., scikit-learn) have fitting algorithms which run in single threads. These algorithms are not themselves parallelized. Perhaps, due to their unique mathematical requirements, they cannot be parallelized.
>
> When one is investigating several potential models of one's data with various settings for free parameters, it is still sometimes possible to speed things up. On a modern machine, one can use Python's multiprocessing.Pool to run separate instances of scikit-learn fits. I am currently using ten of the twelve 3.3 GHz CPU cores on my machine to do just that. And I can still browse the web with no observable lag. :^)
>
> Still, I'm waiting hours for jobs to finish. Support vector regression fitting is hard.
>
> What I would REALLY like to do is to take advantage of my GPU. My NVidia graphics card has 1152 cores and a 1.0 GHz clock. I wouldn't mind borrowing a few hundred of those GPU cores at a time, and see what they can do. In theory, I calculate that I can speed up the job by another five-fold.
>
> The trick is that each process would need to run some PYTHON code, not CUDA or OpenCL. The child process code isn't particularly fancy. (I should, for example, be able to switch that portion of my code to static typing.)
>
> What is the most effective way to accomplish this task?

GPU computing is a lot more than simply saying "run this on a GPU". To realize the performance gains promised by a GPU, you need to tailor your algorithms to take advantage of their hardware... SIMD reigns supreme, and thread divergence and branching are far more expensive than they are in CPU computing. So even if you decide to somehow translate your Python code into a CUDA kernel, there is a good chance that you will be woefully disappointed in the resulting speedup (or even more so if you actually get a slowdown :)).

For example, a simple reduction is more expensive on a GPU than it is on a CPU for small arrays. A dot product, for example, has a part that's super fast on the GPU (element-by-element multiplication), and then a part that gets a lot slower (summing up all elements of the resulting multiplication). Each core on the GPU is a lot slower than a CPU (which is why a 1000-CUDA-core GPU doesn't run anywhere near 1000x faster than a CPU), so you really only get gains when they can all work efficiently together.

Another example -- matrix multiplies are *fast*. Diagonalizations are slow (which is why in my field, where diagonalizations are common requirements, they are often done on the CPU while *building* the matrix is done on the GPU).

> I came across a reference to a package called "Urutu" which may be what I need, however it doesn't look like it is widely supported.

Urutu seems to be built on PyCUDA and PyOpenCL (which are both written by the same person: Andreas Kloeckner at UIUC in the United States).

Another package I would suggest looking into is numba, from Continuum Analytics: https://github.com/numba/numba. Unlike Urutu, their package is built on LLVM and Python bindings they've written to implement numpy-aware JIT capabilities. I believe they also permit compiling down to a GPU kernel through LLVM. One downside I've experienced with that package is that LLVM does not yet have a stable API (as I understand it), so they often lag behind support for the latest versions of LLVM.

> I would love it if the Python developers themselves added the ability to spawn GPU processes to the Multiprocessing module!

I would be stunned if this actually happened. If you're worried about performance, you get at least an order of magnitude performance boost by going to numpy or writing the kernel directly in C or Fortran. CPython itself just isn't structured to run on a GPU... maybe pypy will tackle that at some point in the probably-distant future.

All the best,
Jason

--
Jason M. Swails
BioMaPS, Rutgers University
Postdoctoral Researcher
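The dot-product example in this post can be sketched in plain Python. This is a CPU-only illustration under stated assumptions (the function names are hypothetical and nothing here runs on a GPU): the multiplication step is one fully parallel pass, while the sum needs roughly log2(n) passes that depend on one another.

```python
def elementwise_multiply(x, y):
    """Fully parallel part: every product is independent of the others."""
    return [a * b for a, b in zip(x, y)]


def tree_reduce_sum(values):
    """Sum by pairwise halving, as a parallel reduction would.

    Each pass could run its additions in parallel, but the passes
    themselves are sequential. Returns (total, number_of_passes).
    """
    values = list(values)
    passes = 0
    while len(values) > 1:
        if len(values) % 2:
            values.append(0.0)  # pad odd lengths so the halves pair up
        half = len(values) // 2
        values = [values[i] + values[i + half] for i in range(half)]
        passes += 1
    return (values[0] if values else 0.0), passes


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.0] * 8
total, passes = tree_reduce_sum(elementwise_multiply(x, y))
print(total, passes)
```

In each pass only half as many additions remain as in the one before, so most of the cores sit idle by the end. For small arrays that idle time and the synchronization between passes can outweigh the speed of the multiplication step.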
| From | Sturla Molden <sturla.molden@gmail.com> |
|---|---|
| Date | 2015-02-26 16:40 +0000 |
| Message-ID | <mailman.19268.1424968864.18130.python-list@python.org> |
| In reply to | #86459 |
If you are doing SVM regression with scikit-learn you are using libSVM. There is a CUDA accelerated version of this C library here: http://mklab.iti.gr/project/GPU-LIBSVM

You can presumably reuse the wrapping code from scikit-learn.

Sturla
| From | John Ladasky <john_ladasky@sbcglobal.net> |
|---|---|
| Date | 2015-02-26 09:34 -0800 |
| Message-ID | <d6531afe-3e55-4141-a0db-67eed984da5d@googlegroups.com> |
| In reply to | #86513 |
On Thursday, February 26, 2015 at 8:41:26 AM UTC-8, Sturla Molden wrote:

> If you are doing SVM regression with scikit-learn you are using libSVM. There is a CUDA accelerated version of this C library here: http://mklab.iti.gr/project/GPU-LIBSVM
>
> You can presumably reuse the wrapping code from scikit-learn.
>
> Sturla

Hi Sturla, I recognize your name from the scikit-learn mailing list.

If you look a few posts above yours in this thread, I am aware of gpu-libsvm. I don't know if I'm up to the task of reusing the scikit-learn wrapping code, but I am giving that option some serious thought. It isn't clear to me that gpu-libsvm can handle both SVM and SVR, and I have need of both algorithms.

My training data sets are around 5000 vectors long. IF that graph on the gpu-libsvm web page is any indication of what I can expect from my own data (I note that they didn't specify the GPU card they're using), I might realize a 20x increase in speed.
| From | Sturla Molden <sturla.molden@gmail.com> |
|---|---|
| Date | 2015-02-26 21:54 +0100 |
| Message-ID | <mailman.19287.1424984054.18130.python-list@python.org> |
| In reply to | #86524 |
On 26/02/15 18:34, John Ladasky wrote:

> Hi Sturla, I recognize your name from the scikit-learn mailing list.
>
> If you look a few posts above yours in this thread, I am aware of gpu-libsvm. I don't know if I'm up to the task of reusing the scikit-learn wrapping code, but I am giving that option some serious thought. It isn't clear to me that gpu-libsvm can handle both SVM and SVR, and I have need of both algorithms.
>
> My training data sets are around 5000 vectors long. IF that graph on the gpu-libsvm web page is any indication of what I can expect from my own data (I note that they didn't specify the GPU card they're using), I might realize a 20x increase in speed.

A GPU is a "floating point monster", not a CPU. It is not designed to run things like CPython. It is also only designed to run threads in parallel on its cores, not processes. And as you know, in Python there is something called the GIL. Further, the GPU has hard-wired fine-grained load scheduling for data-parallel problems (e.g. matrix multiplication for vertex processing in 3D graphics). A thread on a GPU is not comparable to a thread on a CPU. It is more like a parallel work queue, with the kind of abstraction you find in Apple's GCD.

I don't think it is really doable to make something like CPython run with thousands of parallel instances on a GPU. A GPU is not designed for that. A GPU is great if you can pass millions of floating point vectors as items to the work queue, with a tiny amount of computation per item. It would be crippled if you passed it a thousand CPython interpreters and expected them to do a lot of work.

Also, as it is libSVM that does the math in your case, you need to get libSVM to run on the GPU, not CPython.

In most cases the best hardware for parallel scientific computing (taking economy and flexibility into account) is a Linux cluster which supports MPI. You can then use mpi4py or Cython to use MPI from your Python code.

Sturla