Groups > comp.lang.python > #61127 > unrolled thread

squeeze out some performance

Started by	Robert Voigtländer <r.voigtlaender@gmail.com>
First post	2013-12-06 00:47 -0800
Last post	2013-12-10 04:14 -0800
Articles	17 — 9 participants

Back to article view | Back to comp.lang.python

  squeeze out some performance Robert Voigtländer <r.voigtlaender@gmail.com> - 2013-12-06 00:47 -0800
    Re: squeeze out some performance Jeremy Sanders <jeremy@jeremysanders.net> - 2013-12-06 10:46 +0100
    Re: squeeze out some performance Chris Angelico <rosuav@gmail.com> - 2013-12-06 23:13 +1100
      Re: squeeze out some performance Robert Voigtländer <r.voigtlaender@gmail.com> - 2013-12-06 08:29 -0800
        Re: squeeze out some performance Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-06 16:36 +0000
          Re: squeeze out some performance Robert Voigtländer <r.voigtlaender@gmail.com> - 2013-12-06 08:44 -0800
    Re: squeeze out some performance John Ladasky <john_ladasky@sbcglobal.net> - 2013-12-06 08:52 -0800
      Re: squeeze out some performance Joel Goldstick <joel.goldstick@gmail.com> - 2013-12-06 17:07 -0500
      Re: squeeze out some performance Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-06 22:38 +0000
      Re: squeeze out some performance Dan Stromberg <drsalists@gmail.com> - 2013-12-06 15:01 -0800
        Re: squeeze out some performance Robert Voigtländer <r.voigtlaender@gmail.com> - 2013-12-09 06:19 -0800
          Re: squeeze out some performance Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-09 14:52 +0000
      Re: squeeze out some performance Robin Becker <robin@reportlab.com> - 2013-12-09 15:54 +0000
      Re: squeeze out some performance Dave Angel <davea@davea.name> - 2013-12-09 15:46 -0500
      Re: squeeze out some performance Robin Becker <robin@reportlab.com> - 2013-12-10 12:54 +0000
    Re: squeeze out some performance Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-09 23:57 +0000
      Re: squeeze out some performance Robert Voigtländer <r.voigtlaender@gmail.com> - 2013-12-10 04:14 -0800

#61127 — squeeze out some performance

From	Robert Voigtländer <r.voigtlaender@gmail.com>
Date	2013-12-06 00:47 -0800
Subject	squeeze out some performance
Message-ID	<ce6477c1-29c0-43b3-9e56-336a5825869b@googlegroups.com>

Hi,

I try to squeeze out some performance of the code pasted on the link below.
http://pastebin.com/gMnqprST

The code will be used to continuously analyze sonar sensor data. I set this up to calculate all coordinates in a sonar cone without heavy use of trigonometry (assuming that this way is faster in the end).

I optimized as much as I could. Maybe one of you has another bright idea to squeeze out a bit more?

Thanks
Robert

[toc] | [next] | [standalone]

#61130

From	Jeremy Sanders <jeremy@jeremysanders.net>
Date	2013-12-06 10:46 +0100
Message-ID	<mailman.3633.1386323183.18130.python-list@python.org>
In reply to	#61127

Robert Voigtländer wrote:

> I try to squeeze out some performance of the code pasted on the link
> below. http://pastebin.com/gMnqprST
> 
> The code will be used to continuously analyze sonar sensor data. I set
> this up to calculate all coordinates in a sonar cone without heavy use of
> trigonometry (assuming that this way is faster in the end).
> 
> I optimized as much as I could. Maybe one of you has another bright idea
> to squeeze out a bit more?

This sort of code is probably harder to make faster in pure python. You 
could try profiling it to see where the hot spots are. Perhaps the choice of 
arrays or sets might have some speed impact.

One idea would be to use something like cython to compile your python code 
to an extension module, with some hints to the types of the various values.

I would go down the geometry route. If you can restate your problem in terms 
of geometry, it might be possible to replace all that code with a few numpy 
array operations.

e.g. for finding pixels in a circle of radius 50
import numpy as np
radiussqd = np.fromfunction(lambda y,x: (y-50)**2+(x-50)**2, (100,100) )
all_y, all_x = np.indices((100,100))
yvals = all_y[radiussqd < 50**2]

Jeremy

[toc] | [prev] | [next] | [standalone]

#61137

From	Chris Angelico <rosuav@gmail.com>
Date	2013-12-06 23:13 +1100
Message-ID	<mailman.3642.1386332017.18130.python-list@python.org>
In reply to	#61127

On Fri, Dec 6, 2013 at 8:46 PM, Jeremy Sanders <jeremy@jeremysanders.net> wrote:
> This sort of code is probably harder to make faster in pure python. You
> could try profiling it to see where the hot spots are. Perhaps the choice of
> arrays or sets might have some speed impact.

I'd make this recommendation MUCH stronger.

Rule 1 of optimization: Don't.
Rule 2 (for experts only): Don't yet.

Once you find that your program actually is running too slowly, then
AND ONLY THEN do you start looking at tightening something up. You'll
be amazed how little you need to change; start with good clean
idiomatic code, and then if it takes too long, you tweak just a couple
of things and it's fast enough. And when you do come to the
tweaking...

Rule 3: Measure twice, cut once.
Rule 4: Actually, measure twenty times, cut once.

Profile your code to find out what's actually slow. This is very
important. Here's an example from a real application (not in Python,
it's in a semantically-similar language called Pike):

https://github.com/Rosuav/Gypsum/blob/d9907e1507c52189c83ae25f5d7be85235b616fa/window.pike

I noticed that I could saturate one CPU core by typing commands very
quickly. Okay. That gets us past the first two rules (it's a MUD
client, it should not be able to saturate one core of an i5). The code
looks roughly like this:

paint():
for line in lines:
if line_is_visible:
paint_line(line)

paint_line():
for piece_of_text in text:
if highlighted: draw_highlighted()
else: draw_not_highlighted()

My first guess was that the actual drawing was taking the time, since
that's a whole lot of GTK calls. But no; the actual problem was the
iteration across all lines and then finding out if they're visible or
not (possibly because it obliterates the CPU caches). Once the
scrollback got to a million lines or so, that was prohibitively
expensive. I didn't realize that until I actually profiled the code
and _measured_ where the time was being spent.

How fast does your code run? How fast do you need it to run? Lots of
optimization questions are answered by "Yaknow what, it don't even
matter", unless you're running in a tight loop, or on a
microcontroller, or something. Halving the time taken sounds great
until you see that it's currently taking 0.0001 seconds and happens in
response to user action.

ChrisA

[toc] | [prev] | [next] | [standalone]

#61164

From	Robert Voigtländer <r.voigtlaender@gmail.com>
Date	2013-12-06 08:29 -0800
Message-ID	<35814163-d0a5-4927-b61d-7d3b4ee0989b@googlegroups.com>
In reply to	#61137

Thanks for your replies.

I already did some basic profiling and optimized a lot. Especially with help of a goof python performance tips list I found.

I think I'll follow the cython path.
The geometry approach also sound good. But it's way above my math/geometry knowledge.

Thanks for your input!

[toc] | [prev] | [next] | [standalone]

#61165

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2013-12-06 16:36 +0000
Message-ID	<mailman.3656.1386347783.18130.python-list@python.org>
In reply to	#61164

On 06/12/2013 16:29, Robert Voigtländer wrote:
> Thanks for your replies.
>
> I already did some basic profiling and optimized a lot. Especially  > with help of a goof python performance tips list I found.
>

Wonderful typo -----^ :)

> I think I'll follow the cython path.
> The geometry approach also sound good. But it's way above my math/geometry knowledge.
>
> Thanks for your input!
>


-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence

[toc] | [prev] | [next] | [standalone]

#61167

From	Robert Voigtländer <r.voigtlaender@gmail.com>
Date	2013-12-06 08:44 -0800
Message-ID	<dae19b90-f141-422f-b5fb-bd2631944fe5@googlegroups.com>
In reply to	#61165

Am Freitag, 6. Dezember 2013 17:36:03 UTC+1 schrieb Mark Lawrence:

> > I already did some basic profiling and optimized a lot. Especially  > with help of a goof python performance tips list I found.
> 
> Wonderful typo -----^ :)
> 

Oh well :-) ... it was a good one. Just had a quick look at Cython. Looks great. Thanks for the tip.

[toc] | [prev] | [next] | [standalone]

#61170

From	John Ladasky <john_ladasky@sbcglobal.net>
Date	2013-12-06 08:52 -0800
Message-ID	<9df6ccd7-828d-43be-ac49-fe1c6a38bae7@googlegroups.com>
In reply to	#61127

On Friday, December 6, 2013 12:47:54 AM UTC-8, Robert Voigtländer wrote:

> I try to squeeze out some performance of the code pasted on the link below.
> http://pastebin.com/gMnqprST

Several comments:

1) I find this program to be very difficult to read, largely because there's a whole LOT of duplicated code.  Look at lines 53-80, and lines 108-287, and lines 294-311.  It makes it harder to see what this algorithm actually does.  Is there a way to refactor some of this code to use some shared function calls?

2) I looked up the "Bresenham algorithm", and found two references which may be relevant.  The original algorithm was one which computed good raster approximations to straight lines.  The second algorithm described may be more pertinent to you, because it draws arcs of circles.

    http://en.wikipedia.org/wiki/Bresenham's_line_algorithm
    http://en.wikipedia.org/wiki/Midpoint_circle_algorithm

Both of these algorithms are old, from the 1960's, and can be implemented using very simple CPU register operations and minimal memory.  Both of the web pages I referenced have extensive example code and pseudocode, and discuss optimization.  If you need speed, is this really a job for Python?

3) I THINK that I see some code -- those duplicated parts -- which might benefit from the use of multiprocessing (assuming that you have a multi-core CPU).  But I would have to read more deeply to be sure.  I need to understand the algorithm more completely, and exactly how you have modified it for your needs.

[toc] | [prev] | [next] | [standalone]

#61186

From	Joel Goldstick <joel.goldstick@gmail.com>
Date	2013-12-06 17:07 -0500
Message-ID	<mailman.3667.1386368018.18130.python-list@python.org>
In reply to	#61170

[Multipart message — attachments visible in raw view] — view raw

On Fri, Dec 6, 2013 at 11:52 AM, John Ladasky <john_ladasky@sbcglobal.net>wrote:

> On Friday, December 6, 2013 12:47:54 AM UTC-8, Robert Voigtländer wrote:
>
> > I try to squeeze out some performance of the code pasted on the link
> below.
> > http://pastebin.com/gMnqprST
>

Not that this will speed up your code but you have this:

    if not clockwise:
        s = start
        start = end
        end = s

Python people would write:
    end, start = start, end


You have quite a few if statements that involve multiple comparisons of the
same variable.  Did you know you can do things like this in python:

>>> x = 4
>>> 2 < x < 7
True
>>> x = 55
>>> 2 < x < 7
False


> Several comments:
>
> 1) I find this program to be very difficult to read, largely because
> there's a whole LOT of duplicated code.  Look at lines 53-80, and lines
> 108-287, and lines 294-311.  It makes it harder to see what this algorithm
> actually does.  Is there a way to refactor some of this code to use some
> shared function calls?
>
> 2) I looked up the "Bresenham algorithm", and found two references which
> may be relevant.  The original algorithm was one which computed good raster
> approximations to straight lines.  The second algorithm described may be
> more pertinent to you, because it draws arcs of circles.
>
>     http://en.wikipedia.org/wiki/Bresenham's_line_algorithm
>     http://en.wikipedia.org/wiki/Midpoint_circle_algorithm
>
> Both of these algorithms are old, from the 1960's, and can be implemented
> using very simple CPU register operations and minimal memory.  Both of the
> web pages I referenced have extensive example code and pseudocode, and
> discuss optimization.  If you need speed, is this really a job for Python?
>
> 3) I THINK that I see some code -- those duplicated parts -- which might
> benefit from the use of multiprocessing (assuming that you have a
> multi-core CPU).  But I would have to read more deeply to be sure.  I need
> to understand the algorithm more completely, and exactly how you have
> modified it for your needs.
> --
> https://mail.python.org/mailman/listinfo/python-list
>



-- 
Joel Goldstick
http://joelgoldstick.com

[toc] | [prev] | [next] | [standalone]

#61188

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2013-12-06 22:38 +0000
Message-ID	<mailman.3668.1386369561.18130.python-list@python.org>
In reply to	#61170

On 06/12/2013 16:52, John Ladasky wrote:
> On Friday, December 6, 2013 12:47:54 AM UTC-8, Robert Voigtländer wrote:
>
>> I try to squeeze out some performance of the code pasted on the link below.
>> http://pastebin.com/gMnqprST
>
> Several comments:
>
> 1) I find this program to be very difficult to read, largely because there's a whole LOT of duplicated code.  Look at lines 53-80, and lines 108-287, and lines 294-311.  It makes it harder to see what this algorithm actually does.  Is there a way to refactor some of this code to use some shared function calls?
>

A handy tool for detecting duplicated code here 
http://clonedigger.sourceforge.net/ for anyone who's interested.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence

[toc] | [prev] | [next] | [standalone]

#61189

From	Dan Stromberg <drsalists@gmail.com>
Date	2013-12-06 15:01 -0800
Message-ID	<mailman.3669.1386370917.18130.python-list@python.org>
In reply to	#61170

[Multipart message — attachments visible in raw view] — view raw

On Fri, Dec 6, 2013 at 2:38 PM, Mark Lawrence <breamoreboy@yahoo.co.uk>wrote:

> On 06/12/2013 16:52, John Ladasky wrote:
>
>> On Friday, December 6, 2013 12:47:54 AM UTC-8, Robert Voigtländer wrote:
>>
>>  I try to squeeze out some performance of the code pasted on the link
>>> below.
>>> http://pastebin.com/gMnqprST
>>>
>>
>> Several comments:
>>
>> 1) I find this program to be very difficult to read, largely because
>> there's a whole LOT of duplicated code.  Look at lines 53-80, and lines
>> 108-287, and lines 294-311.  It makes it harder to see what this algorithm
>> actually does.  Is there a way to refactor some of this code to use some
>> shared function calls?
>>
>>
> A handy tool for detecting duplicated code here
> http://clonedigger.sourceforge.net/ for anyone who's interested.
>

Pylint does this too...

[toc] | [prev] | [next] | [standalone]

#61388

From	Robert Voigtländer <r.voigtlaender@gmail.com>
Date	2013-12-09 06:19 -0800
Message-ID	<5320aa30-8ea5-4680-8796-ef2c7dac3984@googlegroups.com>
In reply to	#61189

Am Samstag, 7. Dezember 2013 00:01:49 UTC+1 schrieb Dan Stromberg:
> On Fri, Dec 6, 2013 at 2:38 PM, Mark Lawrence <bream...@yahoo.co.uk> wrote:
> 
> 
> On 06/12/2013 16:52, John Ladasky wrote:
> 
> 
> On Friday, December 6, 2013 12:47:54 AM UTC-8, Robert Voigtländer wrote:
> 
> 
> 
> 
> I try to squeeze out some performance of the code pasted on the link below.
> 
> http://pastebin.com/gMnqprST
> 
> 
> 
> 
> Several comments:
> 
> 
> 
> 1) I find this program to be very difficult to read, largely because there's a whole LOT of duplicated code.  Look at lines 53-80, and lines 108-287, and lines 294-311.  It makes it harder to see what this algorithm actually does.  Is there a way to refactor some of this code to use some shared function calls?
> 
> 
> 
> 
> 
> 
> 
> A handy tool for detecting duplicated code here http://clonedigger.sourceforge.net/ for anyone who's interested.
> 
> 
> 
> Pylint does this too...

Thanks again. I'll try to compress the code and have a look at the "multiple comparisons" topic.

Robert

[toc] | [prev] | [next] | [standalone]

#61389

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2013-12-09 14:52 +0000
Message-ID	<mailman.3780.1386600779.18130.python-list@python.org>
In reply to	#61388

On 09/12/2013 14:19, Robert Voigtländer wrote:
> Am Samstag, 7. Dezember 2013 00:01:49 UTC+1 schrieb Dan Stromberg:
>> On Fri, Dec 6, 2013 at 2:38 PM, Mark Lawrence <bream...@yahoo.co.uk> wrote:
>>
>>
>> On 06/12/2013 16:52, John Ladasky wrote:
>>
>>
>> On Friday, December 6, 2013 12:47:54 AM UTC-8, Robert Voigtländer wrote:
>>
>>
>>
>>
>> I try to squeeze out some performance of the code pasted on the link below.
>>
>> http://pastebin.com/gMnqprST
>>
>>
>>
>>
>> Several comments:
>>
>>
>>
>> 1) I find this program to be very difficult to read, largely because there's a whole LOT of duplicated code.  Look at lines 53-80, and lines 108-287, and lines 294-311.  It makes it harder to see what this algorithm actually does.  Is there a way to refactor some of this code to use some shared function calls?
>>
>>
>>
>>
>>
>>
>>
>> A handy tool for detecting duplicated code here http://clonedigger.sourceforge.net/ for anyone who's interested.
>>
>>
>>
>> Pylint does this too...
>
> Thanks again. I'll try to compress the code and have a look at the "multiple comparisons" topic.
>
> Robert
>

Would you be kind enough to compress your messages while you're at it by 
reading and actioning https://wiki.python.org/moin/GoogleGroupsPython 
thanks, a quick glance above will tell you why :)

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence

[toc] | [prev] | [next] | [standalone]

#61392

From	Robin Becker <robin@reportlab.com>
Date	2013-12-09 15:54 +0000
Message-ID	<mailman.3783.1386604487.18130.python-list@python.org>
In reply to	#61170

On 06/12/2013 22:07, Joel Goldstick wrote:
..........
>>
>
> Not that this will speed up your code but you have this:
>
>      if not clockwise:
>          s = start
>          start = end
>          end = s
>
> Python people would write:
>      end, start = start, end

this works for some small number of variables, but on my machine with python 2.7 
I start losing with 4 variables eg

> C:\code\optichrome\74663>python -mtimeit -s"a=1;b=2;c=3;d=4" "a,b,c,d=b,c,d,a"
> 1000000 loops, best of 3: 0.206 usec per loop
>
> C:\code\optichrome\74663>python -mtimeit -s"a=1;b=2;c=3;d=4" "t=a;a=b;b=c;c=d;d=t"
> 10000000 loops, best of 3: 0.118 usec per loop


It doesn't seem to make much difference that the variables are related as I see 
a similar behaviour for simple assignments

> C:\code\optichrome\74663>python -mtimeit -s"a=1;b=2;c=3;d=4;e=5;f=6;g=7;h=8" "a,b,c,d=e,f,g,h"
> 1000000 loops, best of 3: 0.204 usec per loop
>
> C:\code\optichrome\74663>python -mtimeit -s"a=1;b=2;c=3;d=4;e=5;f=6;g=7;h=8" "a=e;b=f;c=g;d=h"
> 10000000 loops, best of 3: 0.103 usec per loop

for less than 4 variables the tuple method is faster.
-- 
Robin Becker

[toc] | [prev] | [next] | [standalone]

#61413

From	Dave Angel <davea@davea.name>
Date	2013-12-09 15:46 -0500
Message-ID	<mailman.3795.1386621919.18130.python-list@python.org>
In reply to	#61170

On Mon, 09 Dec 2013 15:54:36 +0000, Robin Becker 
<robin@reportlab.com> wrote:
> On 06/12/2013 22:07, Joel Goldstick wrote:
> >      end, start = start, end

> a similar behaviour for simple assignments

> for less than 4 variables the tuple method is faster.

What does speed have to do with it?  When you want to swap two 
variables,  the tuple assignment reads better.

-- 
DaveA

[toc] | [prev] | [next] | [standalone]

#61463

From	Robin Becker <robin@reportlab.com>
Date	2013-12-10 12:54 +0000
Message-ID	<mailman.3823.1386680060.18130.python-list@python.org>
In reply to	#61170

On 09/12/2013 20:46, Dave Angel wrote:
> On Mon, 09 Dec 2013 15:54:36 +0000, Robin Becker <robin@reportlab.com> wrote:
>> On 06/12/2013 22:07, Joel Goldstick wrote:
>> >      end, start = start, end
>
>> a similar behaviour for simple assignments
>
>> for less than 4 variables the tuple method is faster.
>
> What does speed have to do with it?  When you want to swap two variables,  the
> tuple assignment reads better.
>
Well the OP is asking about performance so I guess the look and feel might be 
sacrificed for speed in some circumstances.

The tuple approach is more appealing when the lhs & rhs are connected, but it 
seems that even for more general assignments the tuple assignment may be faster 
for small numbers of variables. Looking at the output of dis for this case

d,e,f=c,b,a

it seems that we get code like this
LOAD_NAME                3 (c)
LOAD_NAME                2 (b)
LOAD_NAME                1 (a)
ROT_THREE
ROT_TWO
STORE_NAME               4 (d)
STORE_NAME               5 (e)
STORE_NAME               6 (f)



for

d = c
e = b
f = a

we get this
LOAD_NAME                3 (c)
STORE_NAME               4 (d)
LOAD_NAME                2 (b)
STORE_NAME               5 (e)
LOAD_NAME                1 (a)
STORE_NAME               6 (f)

which is not obviously slower, but I consistently get the former to be faster. I 
suppose it's a cache issue or something. However, for this particular case when 
the variables are not connected I find the tuple assignment less pythonic, but 
perhaps faster (even though in this case no tuples are built).
-- 
Robin Becker

[toc] | [prev] | [next] | [standalone]

#61425

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2013-12-09 23:57 +0000
Message-ID	<mailman.3803.1386633504.18130.python-list@python.org>
In reply to	#61127

On 06/12/2013 08:47, Robert Voigtländer wrote:
> Hi,
>
> I try to squeeze out some performance of the code pasted on the link below.
> http://pastebin.com/gMnqprST
>
> The code will be used to continuously analyze sonar sensor data. I set this up to calculate all coordinates in a sonar cone without heavy use of trigonometry (assuming that this way is faster in the end).
>
> I optimized as much as I could. Maybe one of you has another bright idea to squeeze out a bit more?
>
> Thanks
> Robert
>

Actually for optimised code it looks very similar to some code posted 
here 
http://www.daniweb.com/software-development/python/threads/321181/python-bresenham-circle-arc-algorithm 
over three years ago.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence

[toc] | [prev] | [next] | [standalone]

#61462

From	Robert Voigtländer <r.voigtlaender@gmail.com>
Date	2013-12-10 04:14 -0800
Message-ID	<4162ca51-a334-471e-84c7-8478411d41f7@googlegroups.com>
In reply to	#61425

> Actually for optimised code it looks very similar to some code posted 
> 
> here 
> 
> http://www.daniweb.com/software-development/python/threads/321181/python-bresenham-circle-arc-algorithm 
> 
> over three years ago.
> 

This is where it origins from. I just extended it for my needs and now want to optimize it.
List comprehensions instead of some for loops brought another 25%. And made the code shorter.

[toc] | [prev] | [standalone]

csiph-web

squeeze out some performance

Contents

#61127 — squeeze out some performance

#61130

#61137

#61164

#61165

#61167

#61170

#61186

#61188

#61189

#61388

#61389

#61392

#61413

#61463

#61425

#61462