Groups > comp.lang.python > #6891 > unrolled thread

Why is this so much faster?

Started by	Keir Rice <keirrice@gmail.com>
First post	2011-06-02 15:07 -0700
Last post	2011-06-02 17:50 -0600
Articles	3 — 3 participants

Back to article view | Back to comp.lang.python

  Why is this so much faster? Keir Rice <keirrice@gmail.com> - 2011-06-02 15:07 -0700
    Re: Why is this so much faster? Terry Reedy <tjreedy@udel.edu> - 2011-06-02 19:04 -0400
    Re: Why is this so much faster? Ian Kelly <ian.g.kelly@gmail.com> - 2011-06-02 17:50 -0600

#6891 — Why is this so much faster?

From	Keir Rice <keirrice@gmail.com>
Date	2011-06-02 15:07 -0700
Subject	Why is this so much faster?
Message-ID	<8bccb0f0-4c1a-4ec8-91b4-4f420d0d8eb9@glegroupsg2000goo.googlegroups.com>

Hi All,

The following function was showing up in my profiles as a large bottle neck:

# Slow version
def RMSBand(self, histogram):
	"""Calculates the root-mean-squared value for the given colour stream histogram."""
	intermediateResult = map(lambda (i, h): h*(i**2), zip(histogram, range(255)))
	totalSum = sum(intermediateResult)
	
	# calculate rms
	return math.sqrt(totalSum / self.Area())

So after a bit of trial and error I came up with the following function which is a lot faster:

# Fast version
def RMSBand(self, histogram):
	"""Calculates the root-mean-squared value for the given colour stream histogram."""
	totalSum = 0
	for i, h in enumerate(histogram):
		totalSum += h*(i**2)
	
	# calculate rms
	return math.sqrt(totalSum / self.Area())

My question is why is the second method so much faster?
Is it the removal of the extra function calls?
Is it skipping the creation of a list?
Do the basic arithmetic operators get pushed up into C code?

Any insights into the whats really happening behind the scenes would be appreciated.

Keir

[toc] | [next] | [standalone]

#6893

From	Terry Reedy <tjreedy@udel.edu>
Date	2011-06-02 19:04 -0400
Message-ID	<mailman.2407.1307055911.9059.python-list@python.org>
In reply to	#6891

On 6/2/2011 6:07 PM, Keir Rice wrote:
> Hi All,
>
> The following function was showing up in my profiles as a large bottle neck:
>
> # Slow version
> def RMSBand(self, histogram):
> 	"""Calculates the root-mean-squared value for the given colour stream histogram."""
> 	intermediateResult = map(lambda (i, h): h*(i**2), zip(histogram, range(255)))
> 	totalSum = sum(intermediateResult)
> 	
> 	# calculate rms
> 	return math.sqrt(totalSum / self.Area())
>
> So after a bit of trial and error I came up with the following function which is a lot faster:
>
> # Fast version
> def RMSBand(self, histogram):
> 	"""Calculates the root-mean-squared value for the given colour stream histogram."""
> 	totalSum = 0
> 	for i, h in enumerate(histogram):
> 		totalSum += h*(i**2)

> 	# calculate rms
> 	return math.sqrt(totalSum / self.Area())
>
> My question is why is the second method so much faster?
> Is it the removal of the extra function calls?

Yes. Map is only 'fast' when one already has a function and is going to 
call it repeatedly regardless of the other code. When one has an 
expression, wrapping it as a function to use map is surely slower. Have 
you tried

   return math.sqrt(sum([h*i*i for i,h in enumerate(histogram)])
     / self.Area())

or same without [] brackets?

i*i should be faster than i**2 in any version.

> Is it skipping the creation of a list?

A bit.

See Tim's response.

-- 
Terry Jan Reedy

[toc] | [prev] | [next] | [standalone]

#6899

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2011-06-02 17:50 -0600
Message-ID	<mailman.2410.1307058647.9059.python-list@python.org>
In reply to	#6891

On Thu, Jun 2, 2011 at 4:38 PM, Tim Delaney <timothy.c.delaney@gmail.com> wrote:
> First of all, do you have the parameters to the lambda the wrong way around
> in the map() version? zip(histogram, range(255)) will return (histogram
> value, index), but enumerate(histogram) will return (index, histogram
> value). But the parameters haven't been swapped, resulting in the histogram
> value being squared in the map version, but the index being squared in the
> manual summing version. Depending on the values, this could result in a
> large performance increase in the second version (if the total value exceeds
> the maximum size of your platform's "long" type).

In addition to what Tim said, I question whether you should even be
multiplying by the index at all.  The usual definition of "root mean
square" is this:

math.sqrt(sum(x * x for x in values) / len(values))

I don't know the problem that you're trying to solve, though, so maybe
I am just being confused by your terminology.

Cheers,
Ian

[toc] | [prev] | [standalone]

csiph-web

Why is this so much faster?

Contents

#6891 — Why is this so much faster?

#6893

#6899