Subject: Re: Parallelization of Python on GPU?
From: Jason Swails
To: python-list@python.org
Newsgroups: comp.lang.python
Date: Thu, 26 Feb 2015 12:48:03 -0500

On Thu, 2015-02-26 at 16:53 +0000, Sturla Molden wrote:
> GPU computing is great if you have the following:
>
> 1. Your data structures are arrays of floating point numbers.

It actually works equally well, if not better, for integers.

> 2. You have a data-parallel problem.

This is the biggest one, IMO.

> 3. You are happy with single precision.

NVIDIA GPUs have had double-precision maths in hardware since compute
capability 1.3 (GTX 280), i.e. since ca. 2008. Even in optimized CPU
code you can get up to a ~50% performance benefit going from double to
single precision (it's rarely that high, but 20-30% is commonplace in
my experience of optimized code).
It's admittedly a bigger hit on most GPUs, but there are ways to work
around it (e.g., fixed-point arithmetic), and you can still do double
precision work where it's needed. One of the articles I linked
previously demonstrates that, for that application, a hybrid precision
model (based on fixed-point arithmetic) provides exactly the same
numerical stability as double precision, which is much better than
pure single precision. Double precision can be avoided in many parts
of a calculation and used only where those bits matter (like
accumulators with potentially small contributions, subtractions of two
numbers of similar magnitude, etc.).

> 4. You have time to code everything in CUDA or OpenCL.

This is the second biggest one, IMO.

> 5. You have enough video RAM to store your data.

Again, it can be worked around, but if you can't fit everything on the
GPU, the frequent GPU-CPU transfers that result can have potentially
devastating effects on performance, and limiting them takes
painstaking work.

>
> For Python the easiest solution is to use Numba Pro.

Agreed, although I've never actually tried PyCUDA before...

All the best,
Jason
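The point about keeping double precision (or an exact format) only where the bits matter can be illustrated without a GPU. The sketch below is a minimal, hypothetical illustration of the general idea, not the hybrid model from the linked article: it emulates float32 with Python's `struct` module, then compares a pure single-precision accumulator against one that keeps single-precision contributions but sums them in a fixed-point integer accumulator. The names `to_f32`, `accumulate_f32`, `accumulate_fixed` and the scale `SCALE = 2**40` are assumptions chosen for illustration. The last lines show the second case from the post, subtracting two numbers of similar magnitude.

```python
import math
import struct
from typing import List

# Hypothetical scale for the fixed-point accumulator (~1e-12 resolution).
# On a GPU this would live in a 64-bit integer, so the scale must leave
# headroom to avoid overflow; Python ints never overflow.
SCALE = 2 ** 40


def to_f32(x: float) -> float:
    """Round a Python float (IEEE binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate_f32(start: float, contributions: List[float]) -> float:
    """Pure single precision: every intermediate sum is rounded to float32."""
    acc = to_f32(start)
    for c in contributions:
        acc = to_f32(acc + c)
    return acc


def accumulate_fixed(start: float, contributions: List[float]) -> float:
    """Hybrid: contributions stay float32, the running sum is fixed-point."""
    acc = round(start * SCALE)
    for c in contributions:
        acc += round(c * SCALE)  # integer addition: no rounding as acc grows
    return acc / SCALE


# 100,000 tiny single-precision contributions added to a value of 1.0.
contributions = [to_f32(1e-8)] * 100_000
reference = math.fsum([1.0] + contributions)  # correctly rounded double sum

print(accumulate_f32(1.0, contributions))    # stays at 1.0: each 1e-8 is lost
print(accumulate_fixed(1.0, contributions))  # close to the reference
print(reference)

# Subtracting two nearly equal numbers amplifies their rounding errors.
a, b = 1.0000001, 1.0
print(to_f32(to_f32(a) - to_f32(b)))  # ~1.19e-07 in single precision
print(a - b)                          # ~1.0e-07 in double precision
```

In single precision, 1.0 + 1e-8 rounds straight back to 1.0, so all 100,000 contributions vanish, while the integer accumulator keeps them at the cost of a small, bounded quantization error per term. Emulating float32 by rounding each double-precision result is exact for a single addition or subtraction, which is why this sketch reproduces real float32 behaviour for these operations.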
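Point 2, called the biggest requirement in the reply, hinges on what "data-parallel" means: each output element can be computed independently of every other. The following is a minimal pure-Python sketch, with no GPU required, that mimics a GPU launch as one kernel call per index. `launch`, `saxpy_element` and `running_sum_element` are hypothetical names used only for illustration. Running the same launch forwards and backwards exposes whether a kernel depends on execution order, which a GPU does not guarantee. With a real GPU (for example Numba's CUDA target), each thread would get its index from `cuda.grid(1)` instead of from a loop.

```python
from typing import Callable, List, Sequence


def launch(kernel: Callable[..., None], n: int, *args, reverse: bool = False) -> None:
    """Serial stand-in for a GPU launch of n threads, one call per index.

    On a GPU all n calls would run concurrently in no guaranteed order;
    running them backwards here is a cheap way to expose order dependence.
    """
    indices = range(n - 1, -1, -1) if reverse else range(n)
    for i in indices:
        kernel(i, *args)


def saxpy_element(i: int, a: float, x: Sequence[float],
                  y: Sequence[float], out: List[float]) -> None:
    """Data-parallel: out[i] depends only on x[i] and y[i]."""
    out[i] = a * x[i] + y[i]


def running_sum_element(i: int, x: Sequence[float], out: List[float]) -> None:
    """Not data-parallel as written: out[i] needs out[i - 1] to be finished."""
    out[i] = x[i] + (out[i - 1] if i > 0 else 0.0)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]

for rev in (False, True):
    out = [0.0] * 4
    launch(saxpy_element, 4, 2.0, x, y, out, reverse=rev)
    print("saxpy, reverse=%s:" % rev, out)  # same result either way

for rev in (False, True):
    out = [0.0] * 4
    launch(running_sum_element, 4, x, out, reverse=rev)
    print("running sum, reverse=%s:" % rev, out)  # result depends on order
```

A running sum can still be computed on a GPU, but it needs a different, parallel-friendly algorithm (a parallel prefix scan) rather than this one-line kernel, which is part of why non-data-parallel problems are so much harder to port.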