Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #50974 > unrolled thread
| Started by | pablobarhamalzas@gmail.com |
|---|---|
| First post | 2013-07-20 13:22 -0700 |
| Last post | 2013-07-21 12:39 +0100 |
| Articles | 20 — 12 participants |
Back to article view | Back to comp.lang.python
How can I make this piece of code even faster? pablobarhamalzas@gmail.com - 2013-07-20 13:22 -0700
Re: How can I make this piece of code even faster? Fabio Zadrozny <fabiofz@gmail.com> - 2013-07-20 18:05 -0300
Re: How can I make this piece of code even faster? Roy Smith <roy@panix.com> - 2013-07-20 17:25 -0400
Re: How can I make this piece of code even faster? pablobarhamalzas@gmail.com - 2013-07-20 15:45 -0700
Re: How can I make this piece of code even faster? Chris Angelico <rosuav@gmail.com> - 2013-07-21 08:55 +1000
Re: How can I make this piece of code even faster? pablobarhamalzas@gmail.com - 2013-07-20 16:24 -0700
Re: How can I make this piece of code even faster? Chris Angelico <rosuav@gmail.com> - 2013-07-21 09:29 +1000
Re: How can I make this piece of code even faster? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-21 05:11 +0000
Re: How can I make this piece of code even faster? Paul Rudin <paul.nospam@rudin.co.uk> - 2013-07-21 08:11 +0100
Re: How can I make this piece of code even faster? Chris Angelico <rosuav@gmail.com> - 2013-07-21 19:21 +1000
Re: How can I make this piece of code even faster? Peter Otten <__peter__@web.de> - 2013-07-21 09:10 +0200
Re: How can I make this piece of code even faster? Serhiy Storchaka <storchaka@gmail.com> - 2013-07-21 10:11 +0300
Re: How can I make this piece of code even faster? Christian Gollwitzer <auriocus@gmx.de> - 2013-07-21 09:24 +0200
Re: How can I make this piece of code even faster? pablobarhamalzas@gmail.com - 2013-07-21 03:19 -0700
Re: How can I make this piece of code even faster? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-21 10:31 +0000
Re: How can I make this piece of code even faster? pablobarhamalzas@gmail.com - 2013-07-21 03:48 -0700
Re: How can I make this piece of code even faster? Stefan Behnel <stefan_ml@behnel.de> - 2013-07-21 14:49 +0200
Re: How can I make this piece of code even faster? Chris Angelico <rosuav@gmail.com> - 2013-07-21 20:48 +1000
Re: How can I make this piece of code even faster? Michael Torrie <torriem@gmail.com> - 2013-07-21 09:27 -0600
Re: How can I make this piece of code even faster? Joshua Landau <joshua@landau.ws> - 2013-07-21 12:39 +0100
| From | pablobarhamalzas@gmail.com |
|---|---|
| Date | 2013-07-20 13:22 -0700 |
| Subject | How can I make this piece of code even faster? |
| Message-ID | <6bf4d298-b425-4357-9c1a-192e6e6cd9f0@googlegroups.com> |
Ok, I'm working on a predator/prey simulation, which evolve using genetic algorithms. At the moment, they use a quite simple feed-forward neural network, which can change size over time. Each brain "tick" is performed by the following function (inside the Brain class):
def tick(self):
input_num = self.input_num
hidden_num = self.hidden_num
output_num = self.output_num
hidden = [0]*hidden_num
output = [0]*output_num
inputs = self.input
h_weight = self.h_weight
o_weight = self.o_weight
e = math.e
count = -1
for x in range(hidden_num):
temp = 0
for y in range(input_num):
count += 1
temp += inputs[y] * h_weight[count]
hidden[x] = 1/(1+e**(-temp))
count = -1
for x in range(output_num):
temp = 0
for y in range(hidden_num):
count += 1
temp += hidden[y] * o_weight[count]
output[x] = 1/(1+e**(-temp))
self.output = output
The function is actually quite fast (~0.040 seconds per 200 calls, using 10 input, 20 hidden and 3 output neurons), and used to be much slower untill I fiddled about with it a bit to make it faster. However, it is still somewhat slow for what I need it.
My question to you is if you an see any obvious (or not so obvious) way of making this faster. I've heard about numpy and have been reading about it, but I really can't see how it could be implemented here.
Cheers!
[toc] | [next] | [standalone]
| From | Fabio Zadrozny <fabiofz@gmail.com> |
|---|---|
| Date | 2013-07-20 18:05 -0300 |
| Message-ID | <mailman.4931.1374354373.3114.python-list@python.org> |
| In reply to | #50974 |
[Multipart message — attachments visible in raw view] — view raw
On Sat, Jul 20, 2013 at 5:22 PM, <pablobarhamalzas@gmail.com> wrote: > Ok, I'm working on a predator/prey simulation, which evolve using genetic > algorithms. At the moment, they use a quite simple feed-forward neural > network, which can change size over time. Each brain "tick" is performed by > the following function (inside the Brain class): > > def tick(self): > input_num = self.input_num > hidden_num = self.hidden_num > output_num = self.output_num > > hidden = [0]*hidden_num > output = [0]*output_num > > inputs = self.input > h_weight = self.h_weight > o_weight = self.o_weight > > e = math.e > > count = -1 > for x in range(hidden_num): > temp = 0 > for y in range(input_num): > count += 1 > temp += inputs[y] * h_weight[count] > hidden[x] = 1/(1+e**(-temp)) > > count = -1 > for x in range(output_num): > temp = 0 > for y in range(hidden_num): > count += 1 > temp += hidden[y] * o_weight[count] > output[x] = 1/(1+e**(-temp)) > > self.output = output > > The function is actually quite fast (~0.040 seconds per 200 calls, using > 10 input, 20 hidden and 3 output neurons), and used to be much slower > untill I fiddled about with it a bit to make it faster. However, it is > still somewhat slow for what I need it. > > My question to you is if you an see any obvious (or not so obvious) way of > making this faster. I've heard about numpy and have been reading about it, > but I really can't see how it could be implemented here. > > Cheers! > -- > http://mail.python.org/mailman/listinfo/python-list > Low level optimizations: If you're in Python 2.x (and not 3), you should use xrange() instead of range(), or maybe even create a local variable and increment it and check its value within a while (that way you can save a few instructions on method invocations from xrange/range). Anyways, if that's not fast enough, just port it to c/c++ (or one of the alternatives to speed it up while still in python: numba, cython, shedskin). Or (if you can), try to use PyPy and see if you get more speed without doing anything. Cheers, Fabio
[toc] | [prev] | [next] | [standalone]
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2013-07-20 17:25 -0400 |
| Message-ID | <roy-B4482C.17255220072013@70-1-84-166.pools.spcsdns.net> |
| In reply to | #50974 |
In article <6bf4d298-b425-4357-9c1a-192e6e6cd9f0@googlegroups.com>, pablobarhamalzas@gmail.com wrote: > Ok, I'm working on a predator/prey simulation, which evolve using genetic > algorithms. At the moment, they use a quite simple feed-forward neural > network, which can change size over time. Each brain "tick" is performed by > the following function (inside the Brain class): > > def tick(self): > input_num = self.input_num > hidden_num = self.hidden_num > output_num = self.output_num > > hidden = [0]*hidden_num > output = [0]*output_num > > inputs = self.input > h_weight = self.h_weight > o_weight = self.o_weight > > e = math.e > > count = -1 > for x in range(hidden_num): > temp = 0 > for y in range(input_num): > count += 1 > temp += inputs[y] * h_weight[count] > hidden[x] = 1/(1+e**(-temp)) > > count = -1 > for x in range(output_num): > temp = 0 > for y in range(hidden_num): > count += 1 > temp += hidden[y] * o_weight[count] > output[x] = 1/(1+e**(-temp)) > > self.output = output > > The function is actually quite fast (~0.040 seconds per 200 calls, using 10 > input, 20 hidden and 3 output neurons), and used to be much slower untill I > fiddled about with it a bit to make it faster. However, it is still somewhat > slow for what I need it. > > My question to you is if you an see any obvious (or not so obvious) way of > making this faster. I've heard about numpy and have been reading about it, > but I really can't see how it could be implemented here. First thing, I would add some instrumentation to see where the most time is being spent. My guess is in the first set of nested loops, where the inner loop gets executed hidden_num * input_num (i.e. 10 * 20 = 200) times. But timing data is better than my guess. Assuming I'm right, though, you do compute range(input_num) 20 times. You don't need to do that. You might try xrange(), or you might just factor out creating the list outside the outer loop. But, none of that seems like it should make much difference. What possible values can temp take? If it can only take certain discrete values and you can enumerate them beforehand, you might want to build a dict mapping temp -> 1/(1+e**(-temp)) and then all that math becomes just a table lookup.
[toc] | [prev] | [next] | [standalone]
| From | pablobarhamalzas@gmail.com |
|---|---|
| Date | 2013-07-20 15:45 -0700 |
| Message-ID | <49c89529-2e93-474a-9d1f-1ee2288085c1@googlegroups.com> |
| In reply to | #50974 |
Hi there. I'm using python 3, where xrange doesn't exist any more (range is now equivalent). And "temp" doesn't have any fixed discrete values it always takes. I have tried cython but it doesn't seem to work well (maybe using it wrong?). Any other ideas?
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-07-21 08:55 +1000 |
| Message-ID | <mailman.4934.1374360916.3114.python-list@python.org> |
| In reply to | #50974 |
On Sun, Jul 21, 2013 at 6:22 AM, <pablobarhamalzas@gmail.com> wrote:
> temp = 0
> for y in range(input_num):
> count += 1
> temp += inputs[y] * h_weight[count]
> hidden[x] = 1/(1+e**(-temp))
It's a micro-optimization that'll probably have negligible effect, but
it can't hurt: Instead of adding to temp and raising e to -temp, carry
the value of temp as a negative number:
temp -= inputs[y] * h_weight[count]
hidden[x] = 1/(1+e**temp)
Ditto in the second loop.
Not sure which way performance would go, but would it be more readable
to take an iterator for h_weight and o_weight? Something like this:
# Slot this into your existing structure
inputs = self.input
h_weight = iter(self.h_weight)
o_weight = iter(self.o_weight)
e = math.e
for x in range(hidden_num):
temp = 0
for y in inputs:
temp += y * next(h_weight)
hidden[x] = 1/(1+e**(-temp))
for x in range(output_num):
temp = 0
for y in hidden:
temp += y * next(o_weight)
output[x] = 1/(1+e**(-temp))
# End.
If that looks better, the next change I'd look to make is replacing
the 'for y' loops with sum() calls on generators:
temp = sum(y * next(o_weight) for y in hidden)
And finally replace the entire 'for x' loops with list comps... which
makes for two sizeable one-liners, which I like and many people
detest:
def tick(self):
inputs = self.inputs
h_weight = iter(self.h_weight)
o_weight = iter(self.o_weight)
e = math.e
hidden = [1/(1+e**sum(-y * next(h_weight) for y in inputs))
for _ in range(hidden_num)]
self.output = [1/(1+e**sum(-y * next(o_weight) for y in
hidden)) for _ in range(output_num)]
Up to you to decide whether you find that version more readable, or at
least sufficiently readable, and then to test performance :) But it's
shorter by quite a margin, which I personally like. Oh, and I'm
relying on you to make sure I've made the translation correctly, which
I can't confirm without a pile of input data to test it on. All I can
say is that it's syntactically correct.
ChrisA
[toc] | [prev] | [next] | [standalone]
| From | pablobarhamalzas@gmail.com |
|---|---|
| Date | 2013-07-20 16:24 -0700 |
| Message-ID | <f7c0eeac-73c1-4681-93c0-3cb2ceac4678@googlegroups.com> |
| In reply to | #50974 |
Hi there Chris. Unfortunately, using iterations was about twice as slow as the original implementation, so that's not the solution. Thank's anyway.
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-07-21 09:29 +1000 |
| Message-ID | <mailman.4935.1374362990.3114.python-list@python.org> |
| In reply to | #50984 |
On Sun, Jul 21, 2013 at 9:24 AM, <pablobarhamalzas@gmail.com> wrote: > Hi there Chris. > Unfortunately, using iterations was about twice as slow as the original implementation, so that's not the solution. > Thank's anyway. Fascinating! Well, was worth a try anyhow. But that's a very surprising result. ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-07-21 05:11 +0000 |
| Message-ID | <51eb6d96$0$29971$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #50974 |
On Sat, 20 Jul 2013 13:22:03 -0700, pablobarhamalzas asked:
"How can I make this piece of code even faster?"
- Use a faster computer.
- Put in more memory.
- If using Unix or Linux, decrease the "nice" priority of the process.
I mention these because sometimes people forget that if you have a choice
between "spend 10 hours at $70 per hour to optimize code", and "spend
$200 to put more memory in", putting more memory in may be more cost
effective.
Other than that, what you describe sounds like it could be a good
candidate for PyPy to speed the code up, although PyPy is still (mostly)
Python 2. You could take this question to the pypy mailing list and ask
there.
http://mail.python.org/mailman/listinfo/pypy-dev
You also might like to try Cython or Numba.
As far as pure-Python optimizations, once you have a decent algorithm,
there's probably not a lot of room for major speed ups. But a couple of
thoughts and a possible optimized version come to mind...
1) In general, it is better/faster to iterate over lists directly, than
indirectly by index number:
for item in sequence:
process(item)
rather than:
for i in range(len(sequence)):
item = sequence[i]
process(item)
If you need both the index and the value:
for i, item in enumerate(sequence):
print(i, process(item))
In your specific case, if I have understood your code's logic, you can
just iterate directly over the appropriate lists, once each.
2) You perform an exponentiation using math.e**(-temp). You will probably
find that math.exp(-temp) is both faster and more accurate.
3) If you need to add numbers, it is better to call sum() or math.fsum()
than add them by hand. sum() may be a tiny bit faster, or maybe not, but
fsum() is more accurate for floats.
See below for my suggestion on an optimized version.
> Ok, I'm working on a predator/prey simulation, which evolve using
> genetic algorithms. At the moment, they use a quite simple feed-forward
> neural network, which can change size over time. Each brain "tick" is
> performed by the following function (inside the Brain class):
>
> def tick(self):
> input_num = self.input_num
> hidden_num = self.hidden_num
> output_num = self.output_num
>
> hidden = [0]*hidden_num
> output = [0]*output_num
>
> inputs = self.input
> h_weight = self.h_weight
> o_weight = self.o_weight
>
> e = math.e
>
> count = -1
> for x in range(hidden_num):
> temp = 0
> for y in range(input_num):
> count += 1
> temp += inputs[y] * h_weight[count]
> hidden[x] = 1/(1+e**(-temp))
>
> count = -1
> for x in range(output_num):
> temp = 0
> for y in range(hidden_num):
> count += 1
> temp += hidden[y] * o_weight[count]
> output[x] = 1/(1+e**(-temp))
>
> self.output = output
>
> The function is actually quite fast (~0.040 seconds per 200 calls, using
> 10 input, 20 hidden and 3 output neurons), and used to be much slower
> untill I fiddled about with it a bit to make it faster. However, it is
> still somewhat slow for what I need it.
>
> My question to you is if you an see any obvious (or not so obvious) way
> of making this faster. I've heard about numpy and have been reading
> about it, but I really can't see how it could be implemented here.
Here's my suggestion:
def tick(self):
exp = math.exp
sum = math.fsum # more accurate than builtin sum
# This assumes that both inputs and h_weight have exactly
# self.input_num values.
temp = fsum(i*w for (i, w) in zip(self.inputs, self.h_weight))
hidden = [1/(1+exp(-temp))]*self.hidden_num
# This assumes that both outputs and o_weight have exactly
# self.output_num values.
temp = fsum(o*w for (o, w) in zip(self.outputs, self.o_weight))
self.output = [1/(1+exp(-temp))]*self.output_num
I have neither tested that this works the same as your code (or even
works at all!) nor that it is faster, but I would expect that it will be
faster.
Good luck!
--
Steven
[toc] | [prev] | [next] | [standalone]
| From | Paul Rudin <paul.nospam@rudin.co.uk> |
|---|---|
| Date | 2013-07-21 08:11 +0100 |
| Message-ID | <87fvv8v134.fsf@no-fixed-abode.cable.virginmedia.net> |
| In reply to | #50987 |
Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes: > On Sat, 20 Jul 2013 13:22:03 -0700, pablobarhamalzas asked: > > "How can I make this piece of code even faster?" > > - Use a faster computer. > - Put in more memory. > - If using Unix or Linux, decrease the "nice" priority of the process. > > I mention these because sometimes people forget that if you have a choice > between "spend 10 hours at $70 per hour to optimize code", and "spend > $200 to put more memory in", putting more memory in may be more cost > effective. > Sure - but it's helpful if programmers understand a little bit about the computational complexity of algorithms. If it's just a question of making each basic step of your algorithm a bit faster, then it may well be better to spend money on better hardware than on squeezing more out of your code. OTOH if you've got an n^2 implementation and there's actually an n.log n solution available then you should probably re-code. Of course if what you've got is actually adequate for your use-case then it maybe that you don't actually need to do anything at all...
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-07-21 19:21 +1000 |
| Message-ID | <mailman.4946.1374398521.3114.python-list@python.org> |
| In reply to | #50997 |
On Sun, Jul 21, 2013 at 5:11 PM, Paul Rudin <paul.nospam@rudin.co.uk> wrote: > Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes: > >> On Sat, 20 Jul 2013 13:22:03 -0700, pablobarhamalzas asked: >> >> "How can I make this piece of code even faster?" >> >> - Use a faster computer. >> - Put in more memory. >> - If using Unix or Linux, decrease the "nice" priority of the process. >> >> I mention these because sometimes people forget that if you have a choice >> between "spend 10 hours at $70 per hour to optimize code", and "spend >> $200 to put more memory in", putting more memory in may be more cost >> effective. >> > > Sure - but it's helpful if programmers understand a little bit about the > computational complexity of algorithms. If it's just a question of > making each basic step of your algorithm a bit faster, then it may well > be better to spend money on better hardware than on squeezing more out > of your code. OTOH if you've got an n^2 implementation and there's > actually an n.log n solution available then you should probably re-code. I haven't analyzed every suggestion in this thread in detail, but I don't think any of them affects the algorithmic complexity of the code. They're all incremental changes. ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2013-07-21 09:10 +0200 |
| Message-ID | <mailman.4943.1374390609.3114.python-list@python.org> |
| In reply to | #50974 |
pablobarhamalzas@gmail.com wrote:
> Ok, I'm working on a predator/prey simulation, which evolve using genetic
> algorithms. At the moment, they use a quite simple feed-forward neural
> network, which can change size over time. Each brain "tick" is performed
> by the following function (inside the Brain class):
>
> def tick(self):
> input_num = self.input_num
> hidden_num = self.hidden_num
> output_num = self.output_num
>
> hidden = [0]*hidden_num
> output = [0]*output_num
>
> inputs = self.input
> h_weight = self.h_weight
> o_weight = self.o_weight
>
> e = math.e
>
> count = -1
> for x in range(hidden_num):
> temp = 0
> for y in range(input_num):
> count += 1
> temp += inputs[y] * h_weight[count]
> hidden[x] = 1/(1+e**(-temp))
>
> count = -1
> for x in range(output_num):
> temp = 0
> for y in range(hidden_num):
> count += 1
> temp += hidden[y] * o_weight[count]
> output[x] = 1/(1+e**(-temp))
>
> self.output = output
>
> The function is actually quite fast (~0.040 seconds per 200 calls, using
> 10 input, 20 hidden and 3 output neurons), and used to be much slower
> untill I fiddled about with it a bit to make it faster. However, it is
> still somewhat slow for what I need it.
>
> My question to you is if you an see any obvious (or not so obvious) way of
> making this faster. I've heard about numpy and have been reading about it,
> but I really can't see how it could be implemented here.
>
> Cheers!
Assuming every list is replaced with a numpy.array,
h_weight.shape == (hidden_num, input_num)
o_weight.shape == (output_num, hidden_num)
and as untested as it gets:
def tick(self):
temp = numpy.dot(self.inputs, self.h_weight)
hidden = 1/(1+numpy.exp(-temp))
temp = numpy.dot(hidden, self.o_weight)
self.output = 1/(1+numpy.exp(-temp))
My prediction: this is probably wrong, but if you can fix the code it will
be stinkin' fast ;)
[toc] | [prev] | [next] | [standalone]
| From | Serhiy Storchaka <storchaka@gmail.com> |
|---|---|
| Date | 2013-07-21 10:11 +0300 |
| Message-ID | <mailman.4944.1374390704.3114.python-list@python.org> |
| In reply to | #50974 |
20.07.13 23:22, pablobarhamalzas@gmail.com написав(ла): > e = math.e > > count = -1 > for x in range(hidden_num): > temp = 0 > for y in range(input_num): > count += 1 > temp += inputs[y] * h_weight[count] > hidden[x] = 1/(1+e**(-temp)) [...] > My question to you is if you an see any obvious (or not so obvious) way of making this faster. 1. Use math.exp() instead of math.e**. 2. I'm not sure that it will be faster, but try to use sum(). temp = sum(inputs[y] * h_weight[count + y] for y in range(input_num)) count += input_num or temp = sum(map(operator.mul, inputs, h_weight[count:count+input_num])) count += input_num
[toc] | [prev] | [next] | [standalone]
| From | Christian Gollwitzer <auriocus@gmx.de> |
|---|---|
| Date | 2013-07-21 09:24 +0200 |
| Message-ID | <ksg1vp$am2$1@dont-email.me> |
| In reply to | #50974 |
How about using numpy? Am 20.07.13 22:22, schrieb pablobarhamalzas@gmail.com: > Ok, I'm working on a predator/prey simulation, which evolve using genetic algorithms. At the moment, they use a quite simple feed-forward neural network, which can change size over time. Each brain "tick" is performed by the following function (inside the Brain class): > > count = -1 > for x in range(hidden_num): > temp = 0 > for y in range(input_num): > count += 1 > temp += inputs[y] * h_weight[count] > hidden[x] = 1/(1+e**(-temp)) I don't really understand this loop, but it looks to me like a matrix-vector multiplication of the matrix of weights (indexed with a single continous index) with the inputs. Given that you reshape the weights() array correctly into a hidden_num-by-input_num array, this would result in import numpy as np hidden = 1.0/(1.0 + np.exp(-np.dot(h_weights, inputs))) The matrix-vector product is then executed by a compiled loop. You can reshape your existing lists into that form by inputs = np.array(inputs) h_weight = np.reshape(h_weight, (hidden_num, input_num)) ... but of course you should use this only to check whether you get the correct result. You don't want to do that in the loop, instead, store the weights always in matrix form. Christian
[toc] | [prev] | [next] | [standalone]
| From | pablobarhamalzas@gmail.com |
|---|---|
| Date | 2013-07-21 03:19 -0700 |
| Message-ID | <9a207133-1f52-414a-bbbd-581bcee8dc93@googlegroups.com> |
| In reply to | #50974 |
Thank's for all the replies! I've tried some of the imporovements you suggested (using math.exp() and sum() or math.fsum()). None of that made the code faster, because they are functions you are calling lots of times, and function calling is quite time expensive (same as x**(1/2) is faster than math.sqrt(x)). I'm going to try to convert tu numpy now (I have no idea how to do it at the moment), thank's again to everyone.
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-07-21 10:31 +0000 |
| Message-ID | <51ebb88e$0$29971$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #51001 |
On Sun, 21 Jul 2013 03:19:24 -0700, pablobarhamalzas wrote: > Thank's for all the replies! I've tried some of the imporovements you > suggested (using math.exp() and sum() or math.fsum()). None of that made > the code faster, because they are functions you are calling lots of > times, and function calling is quite time expensive (same as x**(1/2) is > faster than math.sqrt(x)). You are *badly* mistaken. Not only is sqrt more accurate, but it is also much faster. [steve@ando ~]$ python3.3 -m timeit -s "x = 2.357e7" "x**0.5" 1000000 loops, best of 3: 0.319 usec per loop [steve@ando ~]$ python3.3 -m timeit -s "x = 2.357e7" -s "from math import sqrt" "sqrt(x)" 10000000 loops, best of 3: 0.172 usec per loop How exactly are you timing the code? -- Steven
[toc] | [prev] | [next] | [standalone]
| From | pablobarhamalzas@gmail.com |
|---|---|
| Date | 2013-07-21 03:48 -0700 |
| Message-ID | <2ff99afd-b734-417f-bbc2-65cf8cc0858e@googlegroups.com> |
| In reply to | #51002 |
El domingo, 21 de julio de 2013 12:31:42 UTC+2, Steven D'Aprano escribió: > On Sun, 21 Jul 2013 03:19:24 -0700, pablobarhamalzas wrote: > > > > > Thank's for all the replies! I've tried some of the imporovements you > > > suggested (using math.exp() and sum() or math.fsum()). None of that made > > > the code faster, because they are functions you are calling lots of > > > times, and function calling is quite time expensive (same as x**(1/2) is > > > faster than math.sqrt(x)). > > > > You are *badly* mistaken. Not only is sqrt more accurate, but it is also > > much faster. > > > > > > [steve@ando ~]$ python3.3 -m timeit -s "x = 2.357e7" "x**0.5" > > 1000000 loops, best of 3: 0.319 usec per loop > > [steve@ando ~]$ python3.3 -m timeit -s "x = 2.357e7" -s "from math import > > sqrt" "sqrt(x)" > > 10000000 loops, best of 3: 0.172 usec per loop > > > > > > How exactly are you timing the code? I'm timing the whole program with cProfile. Removing math.sqrt() from a function and using **(1/2) instead cut the execution time for a significant amount (~0.035 to ~0.020). I can't see another explanation for the speed increase...
[toc] | [prev] | [next] | [standalone]
| From | Stefan Behnel <stefan_ml@behnel.de> |
|---|---|
| Date | 2013-07-21 14:49 +0200 |
| Message-ID | <mailman.4952.1374410998.3114.python-list@python.org> |
| In reply to | #51003 |
pablobarhamalzas@gmail.com, 21.07.2013 12:48: > El domingo, 21 de julio de 2013 12:31:42 UTC+2, Steven D'Aprano escribió: >> [steve@ando ~]$ python3.3 -m timeit -s "x = 2.357e7" "x**0.5" >> 1000000 loops, best of 3: 0.319 usec per loop >> [steve@ando ~]$ python3.3 -m timeit -s "x = 2.357e7" -s "from math import >> sqrt" "sqrt(x)" >> 10000000 loops, best of 3: 0.172 usec per loop >> >> How exactly are you timing the code? > > I'm timing the whole program with cProfile. Removing math.sqrt() from a function and using **(1/2) instead cut the execution time for a significant amount (~0.035 to ~0.020). With or without the profiler running? Note that profiling will slow down your code (especially function calls), often significantly and sometimes even in such an unbalanced way that it visibly changes its execution profile. Always make sure you validate your code changes with benchmarks, outside of the profiler. Stefan
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-07-21 20:48 +1000 |
| Message-ID | <mailman.4947.1374403735.3114.python-list@python.org> |
| In reply to | #51002 |
On Sun, Jul 21, 2013 at 8:31 PM, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote: > On Sun, 21 Jul 2013 03:19:24 -0700, pablobarhamalzas wrote: > >> Thank's for all the replies! I've tried some of the imporovements you >> suggested (using math.exp() and sum() or math.fsum()). None of that made >> the code faster, because they are functions you are calling lots of >> times, and function calling is quite time expensive (same as x**(1/2) is >> faster than math.sqrt(x)). > > You are *badly* mistaken. Not only is sqrt more accurate, but it is also > much faster. > > > [steve@ando ~]$ python3.3 -m timeit -s "x = 2.357e7" "x**0.5" > 1000000 loops, best of 3: 0.319 usec per loop > [steve@ando ~]$ python3.3 -m timeit -s "x = 2.357e7" -s "from math import > sqrt" "sqrt(x)" > 10000000 loops, best of 3: 0.172 usec per loop Don't forget the cost of attribute lookup, which adds 50% to the sqrt() figure. Still faster than exponentiation. (Figures from Python 3.4 alpha, but unlikely to be materially different.) rosuav@sikorsky:~$ python3 -m timeit -s "x = 2.357e7" "x**0.5" 1000000 loops, best of 3: 0.239 usec per loop rosuav@sikorsky:~$ python3 -m timeit -s "x = 2.357e7" -s "from math import sqrt" "sqrt(x)" 10000000 loops, best of 3: 0.102 usec per loop rosuav@sikorsky:~$ python3 -m timeit -s "x = 2.357e7" -s "import math" "math.sqrt(x)" 10000000 loops, best of 3: 0.155 usec per loop ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Michael Torrie <torriem@gmail.com> |
|---|---|
| Date | 2013-07-21 09:27 -0600 |
| Message-ID | <mailman.4957.1374421840.3114.python-list@python.org> |
| In reply to | #51001 |
On 07/21/2013 04:19 AM, pablobarhamalzas@gmail.com wrote: > Thank's for all the replies! I've tried some of the imporovements you suggested (using math.exp() and sum() or math.fsum()). > None of that made the code faster, because they are functions you are calling lots of times, and function calling is quite time expensive (same as x**(1/2) is faster than math.sqrt(x)). > > I'm going to try to convert tu numpy now (I have no idea how to do it at the moment), thank's again to everyone. Perhaps you'd have better results if you'd post a runnable piece of code. Otherwise we're just guessing since no one has the ability to actually run your code.
[toc] | [prev] | [next] | [standalone]
| From | Joshua Landau <joshua@landau.ws> |
|---|---|
| Date | 2013-07-21 12:39 +0100 |
| Message-ID | <mailman.4950.1374406833.3114.python-list@python.org> |
| In reply to | #50974 |
On 20 July 2013 21:22, <pablobarhamalzas@gmail.com> wrote: > Ok, I'm working on a predator/prey simulation, which evolve using genetic algorithms. At the moment, they use a quite simple feed-forward neural network, which can change size over time. Each brain "tick" is performed by the following function (inside the Brain class): > <CODE> > > The function is actually quite fast (~0.040 seconds per 200 calls, using 10 input, 20 hidden and 3 output neurons), and used to be much slower untill I fiddled about with it a bit to make it faster. However, it is still somewhat slow for what I need it. > > My question to you is if you an see any obvious (or not so obvious) way of making this faster. I've heard about numpy and have been reading about it, but I really can't see how it could be implemented here. Currently we're just guessing; if you gave us an appropriate stand-in for "self" (so that we can call the function) we could be helpful much more easily.
[toc] | [prev] | [standalone]
Back to top | Article view | comp.lang.python
csiph-web