Groups > comp.lang.python > #50974 > unrolled thread

How can I make this piece of code even faster?

Started by	pablobarhamalzas@gmail.com
First post	2013-07-20 13:22 -0700
Last post	2013-07-21 12:39 +0100
Articles	20 — 12 participants

Back to article view | Back to comp.lang.python

  How can I make this piece of code even faster? pablobarhamalzas@gmail.com - 2013-07-20 13:22 -0700
    Re: How can I make this piece of code even faster? Fabio Zadrozny <fabiofz@gmail.com> - 2013-07-20 18:05 -0300
    Re: How can I make this piece of code even faster? Roy Smith <roy@panix.com> - 2013-07-20 17:25 -0400
    Re: How can I make this piece of code even faster? pablobarhamalzas@gmail.com - 2013-07-20 15:45 -0700
    Re: How can I make this piece of code even faster? Chris Angelico <rosuav@gmail.com> - 2013-07-21 08:55 +1000
    Re: How can I make this piece of code even faster? pablobarhamalzas@gmail.com - 2013-07-20 16:24 -0700
      Re: How can I make this piece of code even faster? Chris Angelico <rosuav@gmail.com> - 2013-07-21 09:29 +1000
    Re: How can I make this piece of code even faster? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-21 05:11 +0000
      Re: How can I make this piece of code even faster? Paul Rudin <paul.nospam@rudin.co.uk> - 2013-07-21 08:11 +0100
        Re: How can I make this piece of code even faster? Chris Angelico <rosuav@gmail.com> - 2013-07-21 19:21 +1000
    Re: How can I make this piece of code even faster? Peter Otten <__peter__@web.de> - 2013-07-21 09:10 +0200
    Re: How can I make this piece of code even faster? Serhiy Storchaka <storchaka@gmail.com> - 2013-07-21 10:11 +0300
    Re: How can I make this piece of code even faster? Christian Gollwitzer <auriocus@gmx.de> - 2013-07-21 09:24 +0200
    Re: How can I make this piece of code even faster? pablobarhamalzas@gmail.com - 2013-07-21 03:19 -0700
      Re: How can I make this piece of code even faster? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-21 10:31 +0000
        Re: How can I make this piece of code even faster? pablobarhamalzas@gmail.com - 2013-07-21 03:48 -0700
          Re: How can I make this piece of code even faster? Stefan Behnel <stefan_ml@behnel.de> - 2013-07-21 14:49 +0200
        Re: How can I make this piece of code even faster? Chris Angelico <rosuav@gmail.com> - 2013-07-21 20:48 +1000
      Re: How can I make this piece of code even faster? Michael Torrie <torriem@gmail.com> - 2013-07-21 09:27 -0600
    Re: How can I make this piece of code even faster? Joshua Landau <joshua@landau.ws> - 2013-07-21 12:39 +0100

#50974 — How can I make this piece of code even faster?

From	pablobarhamalzas@gmail.com
Date	2013-07-20 13:22 -0700
Subject	How can I make this piece of code even faster?
Message-ID	<6bf4d298-b425-4357-9c1a-192e6e6cd9f0@googlegroups.com>

Ok, I'm working on a predator/prey simulation, which evolve using genetic algorithms. At the moment, they use a quite simple feed-forward neural network, which can change size over time. Each brain "tick" is performed by the following function (inside the Brain class):

    def tick(self):
        input_num = self.input_num 
        hidden_num = self.hidden_num
        output_num = self.output_num
         
        hidden = [0]*hidden_num
        output = [0]*output_num
        
        inputs = self.input
        h_weight = self.h_weight
        o_weight = self.o_weight
        
        e = math.e
        
        count = -1
        for x in range(hidden_num):
            temp = 0
            for y in range(input_num):
                count += 1
                temp += inputs[y] * h_weight[count]
            hidden[x] = 1/(1+e**(-temp))  
        
        count = -1      
        for x in range(output_num):
            temp = 0 
            for y in range(hidden_num):
                count += 1 
                temp += hidden[y] * o_weight[count]
            output[x] = 1/(1+e**(-temp))  
             
        self.output = output 

The function is actually quite fast (~0.040 seconds per 200 calls, using 10 input, 20 hidden and 3 output neurons), and used to be much slower untill I fiddled about with it a bit to make it faster. However, it is still somewhat slow for what I need it.
 
My question to you is if you an see any obvious (or not so obvious) way of making this faster. I've heard about numpy and have been reading about it, but I really can't see how it could be implemented here.

Cheers!

[toc] | [next] | [standalone]

#50975

From	Fabio Zadrozny <fabiofz@gmail.com>
Date	2013-07-20 18:05 -0300
Message-ID	<mailman.4931.1374354373.3114.python-list@python.org>
In reply to	#50974

[Multipart message — attachments visible in raw view] — view raw

On Sat, Jul 20, 2013 at 5:22 PM, <pablobarhamalzas@gmail.com> wrote:

> Ok, I'm working on a predator/prey simulation, which evolve using genetic
> algorithms. At the moment, they use a quite simple feed-forward neural
> network, which can change size over time. Each brain "tick" is performed by
> the following function (inside the Brain class):
>
>     def tick(self):
>         input_num = self.input_num
>         hidden_num = self.hidden_num
>         output_num = self.output_num
>
>         hidden = [0]*hidden_num
>         output = [0]*output_num
>
>         inputs = self.input
>         h_weight = self.h_weight
>         o_weight = self.o_weight
>
>         e = math.e
>
>         count = -1
>         for x in range(hidden_num):
>             temp = 0
>             for y in range(input_num):
>                 count += 1
>                 temp += inputs[y] * h_weight[count]
>             hidden[x] = 1/(1+e**(-temp))
>
>         count = -1
>         for x in range(output_num):
>             temp = 0
>             for y in range(hidden_num):
>                 count += 1
>                 temp += hidden[y] * o_weight[count]
>             output[x] = 1/(1+e**(-temp))
>
>         self.output = output
>
> The function is actually quite fast (~0.040 seconds per 200 calls, using
> 10 input, 20 hidden and 3 output neurons), and used to be much slower
> untill I fiddled about with it a bit to make it faster. However, it is
> still somewhat slow for what I need it.
>
> My question to you is if you an see any obvious (or not so obvious) way of
> making this faster. I've heard about numpy and have been reading about it,
> but I really can't see how it could be implemented here.
>
> Cheers!
> --
> http://mail.python.org/mailman/listinfo/python-list
>



Low level optimizations:

If you're in Python 2.x (and not 3), you should use xrange() instead of
range(), or maybe even create a local variable and increment it and check
its value within a while (that way you can save a few instructions on
method invocations from xrange/range).

Anyways, if that's not fast enough, just port it to c/c++ (or one of the
alternatives to speed it up while still in python: numba, cython,
shedskin). Or (if you can), try to use PyPy and see if you get more speed
without doing anything.

Cheers,

Fabio

[toc] | [prev] | [next] | [standalone]

#50976

From	Roy Smith <roy@panix.com>
Date	2013-07-20 17:25 -0400
Message-ID	<roy-B4482C.17255220072013@70-1-84-166.pools.spcsdns.net>
In reply to	#50974

In article <6bf4d298-b425-4357-9c1a-192e6e6cd9f0@googlegroups.com>,
 pablobarhamalzas@gmail.com wrote:

> Ok, I'm working on a predator/prey simulation, which evolve using genetic 
> algorithms. At the moment, they use a quite simple feed-forward neural 
> network, which can change size over time. Each brain "tick" is performed by 
> the following function (inside the Brain class):
> 
>     def tick(self):
>         input_num = self.input_num 
>         hidden_num = self.hidden_num
>         output_num = self.output_num
>          
>         hidden = [0]*hidden_num
>         output = [0]*output_num
>         
>         inputs = self.input
>         h_weight = self.h_weight
>         o_weight = self.o_weight
>         
>         e = math.e
>         
>         count = -1
>         for x in range(hidden_num):
>             temp = 0
>             for y in range(input_num):
>                 count += 1
>                 temp += inputs[y] * h_weight[count]
>             hidden[x] = 1/(1+e**(-temp))  
>         
>         count = -1      
>         for x in range(output_num):
>             temp = 0 
>             for y in range(hidden_num):
>                 count += 1 
>                 temp += hidden[y] * o_weight[count]
>             output[x] = 1/(1+e**(-temp))  
>              
>         self.output = output 
> 
> The function is actually quite fast (~0.040 seconds per 200 calls, using 10 
> input, 20 hidden and 3 output neurons), and used to be much slower untill I 
> fiddled about with it a bit to make it faster. However, it is still somewhat 
> slow for what I need it.
>  
> My question to you is if you an see any obvious (or not so obvious) way of 
> making this faster. I've heard about numpy and have been reading about it, 
> but I really can't see how it could be implemented here.

First thing, I would add some instrumentation to see where the most time 
is being spent.  My guess is in the first set of nested loops, where the 
inner loop gets executed hidden_num * input_num (i.e. 10 * 20 = 200) 
times.  But timing data is better than my guess.

Assuming I'm right, though, you do compute range(input_num) 20 times.  
You don't need to do that.  You might try xrange(), or you might just 
factor out creating the list outside the outer loop.  But, none of that 
seems like it should make much difference.

What possible values can temp take?  If it can only take certain 
discrete values and you can enumerate them beforehand, you might want to 
build a dict mapping temp -> 1/(1+e**(-temp)) and then all that math 
becomes just a table lookup.

[toc] | [prev] | [next] | [standalone]

#50982

From	pablobarhamalzas@gmail.com
Date	2013-07-20 15:45 -0700
Message-ID	<49c89529-2e93-474a-9d1f-1ee2288085c1@googlegroups.com>
In reply to	#50974

Hi there.
I'm using python 3, where xrange doesn't exist any more (range is now equivalent). And "temp" doesn't have any fixed discrete values it always takes.

I have tried cython but it doesn't seem to work well (maybe using it wrong?).

Any other ideas?

[toc] | [prev] | [next] | [standalone]

#50983

From	Chris Angelico <rosuav@gmail.com>
Date	2013-07-21 08:55 +1000
Message-ID	<mailman.4934.1374360916.3114.python-list@python.org>
In reply to	#50974

On Sun, Jul 21, 2013 at 6:22 AM,  <pablobarhamalzas@gmail.com> wrote:
>             temp = 0
>             for y in range(input_num):
>                 count += 1
>                 temp += inputs[y] * h_weight[count]
>             hidden[x] = 1/(1+e**(-temp))

It's a micro-optimization that'll probably have negligible effect, but
it can't hurt: Instead of adding to temp and raising e to -temp, carry
the value of temp as a negative number:

                temp -= inputs[y] * h_weight[count]
            hidden[x] = 1/(1+e**temp)

Ditto in the second loop.

Not sure which way performance would go, but would it be more readable
to take an iterator for h_weight and o_weight? Something like this:

# Slot this into your existing structure
        inputs = self.input
        h_weight = iter(self.h_weight)
        o_weight = iter(self.o_weight)

        e = math.e

        for x in range(hidden_num):
            temp = 0
            for y in inputs:
                temp += y * next(h_weight)
            hidden[x] = 1/(1+e**(-temp))

        for x in range(output_num):
            temp = 0
            for y in hidden:
                temp += y * next(o_weight)
            output[x] = 1/(1+e**(-temp))
# End.

If that looks better, the next change I'd look to make is replacing
the 'for y' loops with sum() calls on generators:

temp = sum(y * next(o_weight) for y in hidden)

And finally replace the entire 'for x' loops with list comps... which
makes for two sizeable one-liners, which I like and many people
detest:

    def tick(self):
        inputs = self.inputs
        h_weight = iter(self.h_weight)
        o_weight = iter(self.o_weight)
        e = math.e
        hidden = [1/(1+e**sum(-y * next(h_weight) for y in inputs))
for _ in range(hidden_num)]
        self.output = [1/(1+e**sum(-y * next(o_weight) for y in
hidden)) for _ in range(output_num)]

Up to you to decide whether you find that version more readable, or at
least sufficiently readable, and then to test performance :) But it's
shorter by quite a margin, which I personally like. Oh, and I'm
relying on you to make sure I've made the translation correctly, which
I can't confirm without a pile of input data to test it on. All I can
say is that it's syntactically correct.

ChrisA

[toc] | [prev] | [next] | [standalone]

#50984

From	pablobarhamalzas@gmail.com
Date	2013-07-20 16:24 -0700
Message-ID	<f7c0eeac-73c1-4681-93c0-3cb2ceac4678@googlegroups.com>
In reply to	#50974

Hi there Chris.
Unfortunately, using iterations was about twice as slow as the original implementation, so that's not the solution.
Thank's anyway.

[toc] | [prev] | [next] | [standalone]

#50985

From	Chris Angelico <rosuav@gmail.com>
Date	2013-07-21 09:29 +1000
Message-ID	<mailman.4935.1374362990.3114.python-list@python.org>
In reply to	#50984

On Sun, Jul 21, 2013 at 9:24 AM,  <pablobarhamalzas@gmail.com> wrote:
> Hi there Chris.
> Unfortunately, using iterations was about twice as slow as the original implementation, so that's not the solution.
> Thank's anyway.

Fascinating! Well, was worth a try anyhow. But that's a very surprising result.

ChrisA

[toc] | [prev] | [next] | [standalone]

#50987

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-07-21 05:11 +0000
Message-ID	<51eb6d96$0$29971$c3e8da3$5496439d@news.astraweb.com>
In reply to	#50974

On Sat, 20 Jul 2013 13:22:03 -0700, pablobarhamalzas asked:

"How can I make this piece of code even faster?"

- Use a faster computer.
- Put in more memory.
- If using Unix or Linux, decrease the "nice" priority of the process.

I mention these because sometimes people forget that if you have a choice 
between "spend 10 hours at $70 per hour to optimize code", and "spend 
$200 to put more memory in", putting more memory in may be more cost 
effective.

Other than that, what you describe sounds like it could be a good 
candidate for PyPy to speed the code up, although PyPy is still (mostly) 
Python 2. You could take this question to the pypy mailing list and ask 
there.

http://mail.python.org/mailman/listinfo/pypy-dev

You also might like to try Cython or Numba.

As far as pure-Python optimizations, once you have a decent algorithm, 
there's probably not a lot of room for major speed ups. But a couple of 
thoughts and a possible optimized version come to mind...

1) In general, it is better/faster to iterate over lists directly, than 
indirectly by index number:

for item in sequence:
    process(item)

rather than:

for i in range(len(sequence)):
    item = sequence[i]
    process(item)


If you need both the index and the value:

for i, item in enumerate(sequence):
    print(i, process(item))


In your specific case, if I have understood your code's logic, you can 
just iterate directly over the appropriate lists, once each.



2) You perform an exponentiation using math.e**(-temp). You will probably 
find that math.exp(-temp) is both faster and more accurate.


3) If you need to add numbers, it is better to call sum() or math.fsum() 
than add them by hand. sum() may be a tiny bit faster, or maybe not, but 
fsum() is more accurate for floats.


See below for my suggestion on an optimized version.


> Ok, I'm working on a predator/prey simulation, which evolve using
> genetic algorithms. At the moment, they use a quite simple feed-forward
> neural network, which can change size over time. Each brain "tick" is
> performed by the following function (inside the Brain class):
> 
>     def tick(self):
>         input_num = self.input_num
>         hidden_num = self.hidden_num
>         output_num = self.output_num
>          
>         hidden = [0]*hidden_num
>         output = [0]*output_num
>         
>         inputs = self.input
>         h_weight = self.h_weight
>         o_weight = self.o_weight
>         
>         e = math.e
>         
>         count = -1
>         for x in range(hidden_num):
>             temp = 0
>             for y in range(input_num):
>                 count += 1
>                 temp += inputs[y] * h_weight[count]
>             hidden[x] = 1/(1+e**(-temp))
>         
>         count = -1
>         for x in range(output_num):
>             temp = 0
>             for y in range(hidden_num):
>                 count += 1
>                 temp += hidden[y] * o_weight[count]
>             output[x] = 1/(1+e**(-temp))
>              
>         self.output = output
> 
> The function is actually quite fast (~0.040 seconds per 200 calls, using
> 10 input, 20 hidden and 3 output neurons), and used to be much slower
> untill I fiddled about with it a bit to make it faster. However, it is
> still somewhat slow for what I need it.
>  
> My question to you is if you an see any obvious (or not so obvious) way
> of making this faster. I've heard about numpy and have been reading
> about it, but I really can't see how it could be implemented here.

Here's my suggestion:


    def tick(self):
        exp = math.exp
        sum = math.fsum  # more accurate than builtin sum

        # This assumes that both inputs and h_weight have exactly
        # self.input_num values.
        temp = fsum(i*w for (i, w) in zip(self.inputs, self.h_weight))
        hidden = [1/(1+exp(-temp))]*self.hidden_num

        # This assumes that both outputs and o_weight have exactly
        # self.output_num values.
        temp = fsum(o*w for (o, w) in zip(self.outputs, self.o_weight))
        self.output = [1/(1+exp(-temp))]*self.output_num


I have neither tested that this works the same as your code (or even 
works at all!) nor that it is faster, but I would expect that it will be 
faster.

Good luck!



-- 
Steven

[toc] | [prev] | [next] | [standalone]

#50997

From	Paul Rudin <paul.nospam@rudin.co.uk>
Date	2013-07-21 08:11 +0100
Message-ID	<87fvv8v134.fsf@no-fixed-abode.cable.virginmedia.net>
In reply to	#50987

Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:

> On Sat, 20 Jul 2013 13:22:03 -0700, pablobarhamalzas asked:
>
> "How can I make this piece of code even faster?"
>
> - Use a faster computer.
> - Put in more memory.
> - If using Unix or Linux, decrease the "nice" priority of the process.
>
> I mention these because sometimes people forget that if you have a choice 
> between "spend 10 hours at $70 per hour to optimize code", and "spend 
> $200 to put more memory in", putting more memory in may be more cost 
> effective.
>

Sure - but it's helpful if programmers understand a little bit about the
computational complexity of algorithms. If it's just a question of
making each basic step of your algorithm a bit faster, then it may well
be better to spend money on better hardware than on squeezing more out
of your code. OTOH if you've got an n^2 implementation and there's
actually an n.log n solution available then you should probably re-code.

Of course if what you've got is actually adequate for your use-case then
it maybe that you don't actually need to do anything at all...

[toc] | [prev] | [next] | [standalone]

#51000

From	Chris Angelico <rosuav@gmail.com>
Date	2013-07-21 19:21 +1000
Message-ID	<mailman.4946.1374398521.3114.python-list@python.org>
In reply to	#50997

On Sun, Jul 21, 2013 at 5:11 PM, Paul Rudin <paul.nospam@rudin.co.uk> wrote:
> Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:
>
>> On Sat, 20 Jul 2013 13:22:03 -0700, pablobarhamalzas asked:
>>
>> "How can I make this piece of code even faster?"
>>
>> - Use a faster computer.
>> - Put in more memory.
>> - If using Unix or Linux, decrease the "nice" priority of the process.
>>
>> I mention these because sometimes people forget that if you have a choice
>> between "spend 10 hours at $70 per hour to optimize code", and "spend
>> $200 to put more memory in", putting more memory in may be more cost
>> effective.
>>
>
> Sure - but it's helpful if programmers understand a little bit about the
> computational complexity of algorithms. If it's just a question of
> making each basic step of your algorithm a bit faster, then it may well
> be better to spend money on better hardware than on squeezing more out
> of your code. OTOH if you've got an n^2 implementation and there's
> actually an n.log n solution available then you should probably re-code.

I haven't analyzed every suggestion in this thread in detail, but I
don't think any of them affects the algorithmic complexity of the
code. They're all incremental changes.

ChrisA

[toc] | [prev] | [next] | [standalone]

#50995

From	Peter Otten <__peter__@web.de>
Date	2013-07-21 09:10 +0200
Message-ID	<mailman.4943.1374390609.3114.python-list@python.org>
In reply to	#50974

pablobarhamalzas@gmail.com wrote:

> Ok, I'm working on a predator/prey simulation, which evolve using genetic
> algorithms. At the moment, they use a quite simple feed-forward neural
> network, which can change size over time. Each brain "tick" is performed
> by the following function (inside the Brain class):
> 
>     def tick(self):
>         input_num = self.input_num
>         hidden_num = self.hidden_num
>         output_num = self.output_num
>          
>         hidden = [0]*hidden_num
>         output = [0]*output_num
>         
>         inputs = self.input
>         h_weight = self.h_weight
>         o_weight = self.o_weight
>         
>         e = math.e
>         
>         count = -1
>         for x in range(hidden_num):
>             temp = 0
>             for y in range(input_num):
>                 count += 1
>                 temp += inputs[y] * h_weight[count]
>             hidden[x] = 1/(1+e**(-temp))
>         
>         count = -1
>         for x in range(output_num):
>             temp = 0
>             for y in range(hidden_num):
>                 count += 1
>                 temp += hidden[y] * o_weight[count]
>             output[x] = 1/(1+e**(-temp))
>              
>         self.output = output
> 
> The function is actually quite fast (~0.040 seconds per 200 calls, using
> 10 input, 20 hidden and 3 output neurons), and used to be much slower
> untill I fiddled about with it a bit to make it faster. However, it is
> still somewhat slow for what I need it.
>  
> My question to you is if you an see any obvious (or not so obvious) way of
> making this faster. I've heard about numpy and have been reading about it,
> but I really can't see how it could be implemented here.
> 
> Cheers!

Assuming every list is replaced with a numpy.array,

h_weight.shape == (hidden_num, input_num)
o_weight.shape == (output_num, hidden_num)

and as untested as it gets:

    def tick(self):
        temp = numpy.dot(self.inputs, self.h_weight)
        hidden = 1/(1+numpy.exp(-temp))  
        
        temp = numpy.dot(hidden, self.o_weight)
        self.output = 1/(1+numpy.exp(-temp))  

My prediction: this is probably wrong, but if you can fix the code it will 
be stinkin' fast ;)

[toc] | [prev] | [next] | [standalone]

#50996

From	Serhiy Storchaka <storchaka@gmail.com>
Date	2013-07-21 10:11 +0300
Message-ID	<mailman.4944.1374390704.3114.python-list@python.org>
In reply to	#50974

20.07.13 23:22, pablobarhamalzas@gmail.com написав(ла):
>          e = math.e
>
>          count = -1
>          for x in range(hidden_num):
>              temp = 0
>              for y in range(input_num):
>                  count += 1
>                  temp += inputs[y] * h_weight[count]
>              hidden[x] = 1/(1+e**(-temp))
[...]
> My question to you is if you an see any obvious (or not so obvious) way of making this faster.

1. Use math.exp() instead of math.e**.

2. I'm not sure that it will be faster, but try to use sum().

   temp = sum(inputs[y] * h_weight[count + y] for y in range(input_num))
   count += input_num

or

   temp = sum(map(operator.mul, inputs, h_weight[count:count+input_num]))
   count += input_num

[toc] | [prev] | [next] | [standalone]

#50998

From	Christian Gollwitzer <auriocus@gmx.de>
Date	2013-07-21 09:24 +0200
Message-ID	<ksg1vp$am2$1@dont-email.me>
In reply to	#50974

How about using numpy?

Am 20.07.13 22:22, schrieb pablobarhamalzas@gmail.com:
> Ok, I'm working on a predator/prey simulation, which evolve using genetic algorithms. At the moment, they use a quite simple feed-forward neural network, which can change size over time. Each brain "tick" is performed by the following function (inside the Brain class):
>

>          count = -1
>          for x in range(hidden_num):
>              temp = 0
>              for y in range(input_num):
>                  count += 1
>                  temp += inputs[y] * h_weight[count]
>              hidden[x] = 1/(1+e**(-temp))

I don't really understand this loop, but it looks to me like a 
matrix-vector multiplication of the matrix of weights (indexed with a 
single continous index) with the inputs.

Given that you reshape the weights() array correctly into a 
hidden_num-by-input_num array, this would result in

import numpy as np
hidden = 1.0/(1.0 + np.exp(-np.dot(h_weights, inputs)))

The matrix-vector product is then executed by a compiled loop. You can 
reshape your existing lists into that form by

inputs = np.array(inputs)
h_weight = np.reshape(h_weight, (hidden_num, input_num))

... but of course you should use this only to check whether you get the 
correct result. You don't want to do that in the loop, instead, store 
the weights always in matrix form.

	Christian

[toc] | [prev] | [next] | [standalone]

#51001

From	pablobarhamalzas@gmail.com
Date	2013-07-21 03:19 -0700
Message-ID	<9a207133-1f52-414a-bbbd-581bcee8dc93@googlegroups.com>
In reply to	#50974

Thank's for all the replies! I've tried some of the imporovements you suggested (using math.exp() and sum() or math.fsum()). 
None of that made the code faster, because they are functions you are calling lots of times, and function calling is quite time expensive (same as x**(1/2) is faster than math.sqrt(x)).

I'm going to try to convert tu numpy now (I have no idea how to do it at the moment), thank's again to everyone.

[toc] | [prev] | [next] | [standalone]

#51002

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-07-21 10:31 +0000
Message-ID	<51ebb88e$0$29971$c3e8da3$5496439d@news.astraweb.com>
In reply to	#51001

On Sun, 21 Jul 2013 03:19:24 -0700, pablobarhamalzas wrote:

> Thank's for all the replies! I've tried some of the imporovements you
> suggested (using math.exp() and sum() or math.fsum()). None of that made
> the code faster, because they are functions you are calling lots of
> times, and function calling is quite time expensive (same as x**(1/2) is
> faster than math.sqrt(x)).

You are *badly* mistaken. Not only is sqrt more accurate, but it is also 
much faster.


[steve@ando ~]$ python3.3 -m timeit -s "x = 2.357e7" "x**0.5"
1000000 loops, best of 3: 0.319 usec per loop
[steve@ando ~]$ python3.3 -m timeit -s "x = 2.357e7" -s "from math import 
sqrt" "sqrt(x)"
10000000 loops, best of 3: 0.172 usec per loop


How exactly are you timing the code?



-- 
Steven

[toc] | [prev] | [next] | [standalone]

#51003

From	pablobarhamalzas@gmail.com
Date	2013-07-21 03:48 -0700
Message-ID	<2ff99afd-b734-417f-bbc2-65cf8cc0858e@googlegroups.com>
In reply to	#51002

El domingo, 21 de julio de 2013 12:31:42 UTC+2, Steven D'Aprano  escribió:
> On Sun, 21 Jul 2013 03:19:24 -0700, pablobarhamalzas wrote:
> 
> 
> 
> > Thank's for all the replies! I've tried some of the imporovements you
> 
> > suggested (using math.exp() and sum() or math.fsum()). None of that made
> 
> > the code faster, because they are functions you are calling lots of
> 
> > times, and function calling is quite time expensive (same as x**(1/2) is
> 
> > faster than math.sqrt(x)).
> 
> 
> 
> You are *badly* mistaken. Not only is sqrt more accurate, but it is also 
> 
> much faster.
> 
> 
> 
> 
> 
> [steve@ando ~]$ python3.3 -m timeit -s "x = 2.357e7" "x**0.5"
> 
> 1000000 loops, best of 3: 0.319 usec per loop
> 
> [steve@ando ~]$ python3.3 -m timeit -s "x = 2.357e7" -s "from math import 
> 
> sqrt" "sqrt(x)"
> 
> 10000000 loops, best of 3: 0.172 usec per loop
> 
> 
> 
> 
> 
> How exactly are you timing the code?

I'm timing the whole program with cProfile. Removing math.sqrt() from a function and using **(1/2) instead cut the execution time for a significant amount (~0.035 to ~0.020). I can't see another explanation for the speed increase...

[toc] | [prev] | [next] | [standalone]

#51010

From	Stefan Behnel <stefan_ml@behnel.de>
Date	2013-07-21 14:49 +0200
Message-ID	<mailman.4952.1374410998.3114.python-list@python.org>
In reply to	#51003

pablobarhamalzas@gmail.com, 21.07.2013 12:48:
> El domingo, 21 de julio de 2013 12:31:42 UTC+2, Steven D'Aprano  escribió:
>> [steve@ando ~]$ python3.3 -m timeit -s "x = 2.357e7" "x**0.5"
>> 1000000 loops, best of 3: 0.319 usec per loop
>> [steve@ando ~]$ python3.3 -m timeit -s "x = 2.357e7" -s "from math import 
>> sqrt" "sqrt(x)"
>> 10000000 loops, best of 3: 0.172 usec per loop
>>
>> How exactly are you timing the code?
> 
> I'm timing the whole program with cProfile. Removing math.sqrt() from a function and using **(1/2) instead cut the execution time for a significant amount (~0.035 to ~0.020).

With or without the profiler running? Note that profiling will slow down
your code (especially function calls), often significantly and sometimes
even in such an unbalanced way that it visibly changes its execution
profile. Always make sure you validate your code changes with benchmarks,
outside of the profiler.

Stefan

[toc] | [prev] | [next] | [standalone]

#51004

From	Chris Angelico <rosuav@gmail.com>
Date	2013-07-21 20:48 +1000
Message-ID	<mailman.4947.1374403735.3114.python-list@python.org>
In reply to	#51002

On Sun, Jul 21, 2013 at 8:31 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> On Sun, 21 Jul 2013 03:19:24 -0700, pablobarhamalzas wrote:
>
>> Thank's for all the replies! I've tried some of the imporovements you
>> suggested (using math.exp() and sum() or math.fsum()). None of that made
>> the code faster, because they are functions you are calling lots of
>> times, and function calling is quite time expensive (same as x**(1/2) is
>> faster than math.sqrt(x)).
>
> You are *badly* mistaken. Not only is sqrt more accurate, but it is also
> much faster.
>
>
> [steve@ando ~]$ python3.3 -m timeit -s "x = 2.357e7" "x**0.5"
> 1000000 loops, best of 3: 0.319 usec per loop
> [steve@ando ~]$ python3.3 -m timeit -s "x = 2.357e7" -s "from math import
> sqrt" "sqrt(x)"
> 10000000 loops, best of 3: 0.172 usec per loop

Don't forget the cost of attribute lookup, which adds 50% to the
sqrt() figure. Still faster than exponentiation. (Figures from Python
3.4 alpha, but unlikely to be materially different.)

rosuav@sikorsky:~$ python3 -m timeit -s "x = 2.357e7" "x**0.5"
1000000 loops, best of 3: 0.239 usec per loop
rosuav@sikorsky:~$ python3 -m timeit -s "x = 2.357e7" -s "from math
import sqrt" "sqrt(x)"
10000000 loops, best of 3: 0.102 usec per loop
rosuav@sikorsky:~$ python3 -m timeit -s "x = 2.357e7" -s "import math"
"math.sqrt(x)"
10000000 loops, best of 3: 0.155 usec per loop

ChrisA

[toc] | [prev] | [next] | [standalone]

#51015

From	Michael Torrie <torriem@gmail.com>
Date	2013-07-21 09:27 -0600
Message-ID	<mailman.4957.1374421840.3114.python-list@python.org>
In reply to	#51001

On 07/21/2013 04:19 AM, pablobarhamalzas@gmail.com wrote:
> Thank's for all the replies! I've tried some of the imporovements you suggested (using math.exp() and sum() or math.fsum()). 
> None of that made the code faster, because they are functions you are calling lots of times, and function calling is quite time expensive (same as x**(1/2) is faster than math.sqrt(x)).
> 
> I'm going to try to convert tu numpy now (I have no idea how to do it at the moment), thank's again to everyone.

Perhaps you'd have better results if you'd post a runnable piece of
code.  Otherwise we're just guessing since no one has the ability to
actually run your code.

[toc] | [prev] | [next] | [standalone]

#51008

From	Joshua Landau <joshua@landau.ws>
Date	2013-07-21 12:39 +0100
Message-ID	<mailman.4950.1374406833.3114.python-list@python.org>
In reply to	#50974

On 20 July 2013 21:22,  <pablobarhamalzas@gmail.com> wrote:
> Ok, I'm working on a predator/prey simulation, which evolve using genetic algorithms. At the moment, they use a quite simple feed-forward neural network, which can change size over time. Each brain "tick" is performed by the following function (inside the Brain class):
>
<CODE>
>
> The function is actually quite fast (~0.040 seconds per 200 calls, using 10 input, 20 hidden and 3 output neurons), and used to be much slower untill I fiddled about with it a bit to make it faster. However, it is still somewhat slow for what I need it.
>
> My question to you is if you an see any obvious (or not so obvious) way of making this faster. I've heard about numpy and have been reading about it, but I really can't see how it could be implemented here.

Currently we're just guessing; if you gave us an appropriate stand-in
for "self" (so that we can call the function) we could be helpful much
more easily.

[toc] | [prev] | [standalone]

csiph-web

How can I make this piece of code even faster?

Contents

#50974 — How can I make this piece of code even faster?

#50975

#50976

#50982

#50983

#50984

#50985

#50987

#50997

#51000

#50995

#50996

#50998

#51001

#51002

#51003

#51010

#51004

#51015

#51008