Groups > comp.lang.java.programmer > #8832 > unrolled thread

Getter performance

Started by	Aéris <aeris@imirhil.fr>
First post	2011-10-15 23:03 +0200
Last post	2011-10-22 21:11 +0200
Articles	20 on this page of 25 — 15 participants

Back to article view | Back to comp.lang.java.programmer

  Getter performance Aéris <aeris@imirhil.fr> - 2011-10-15 23:03 +0200
    Re: Getter performance Arne Vajhøj <arne@vajhoej.dk> - 2011-10-15 17:36 -0400
      Re: Getter performance Arne Vajhøj <arne@vajhoej.dk> - 2011-10-15 17:42 -0400
        Re: Getter performance BGB <cr88192@hotmail.com> - 2011-10-15 15:00 -0700
          Re: Getter performance markspace <-@.> - 2011-10-15 15:20 -0700
            Re: Getter performance David Lamb <dalamb@cs.queensu.ca> - 2011-10-20 12:45 -0400
              Re: Getter performance Roedy Green <see_website@mindprod.com.invalid> - 2011-10-21 14:27 -0700
                Re: Getter performance Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2011-10-21 18:57 -0700
                  Re: Getter performance Patricia Shanahan <pats@acm.org> - 2011-10-22 07:27 +0100
                  Re: Getter performance Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2011-10-22 09:57 -0300
                Re: Getter performance Eric Sosman <esosman@ieee-dot-org.invalid> - 2011-10-21 22:12 -0400
                  Re: Getter performance BGB <cr88192@hotmail.com> - 2011-11-13 10:14 -0700
                    Re: Getter performance Lew <lewbloch@gmail.com> - 2011-11-13 11:00 -0800
                Re: Getter performance Arne Vajhøj <arne@vajhoej.dk> - 2011-11-06 16:09 -0500
      Re: Getter performance Aéris <aeris@imirhil.fr> - 2011-10-15 23:59 +0200
        Re: Getter performance Arne Vajhøj <arne@vajhoej.dk> - 2011-10-15 19:44 -0400
          Re: Getter performance Aéris <aeris@imirhil.fr> - 2011-10-16 13:14 +0200
            Re: Getter performance Lars Enderin <lars.enderin@telia.com> - 2011-10-16 16:28 +0200
    Re: Getter performance Eric Sosman <esosman@ieee-dot-org.invalid> - 2011-10-16 09:47 -0400
    Re: Getter performance Jaap Droogers <JaapDroogers@unusable.meel.homelinux.net> - 2011-10-16 22:12 +0200
      Re: Getter performance BGB <cr88192@hotmail.com> - 2011-10-16 13:58 -0700
      Re: Getter performance David Lamb <dalamb@cs.queensu.ca> - 2011-10-20 12:51 -0400
        Re: Getter performance Paul Cager <paul.cager@googlemail.com> - 2011-10-21 08:49 -0700
    Re: Getter performance Roedy Green <see_website@mindprod.com.invalid> - 2011-10-21 08:02 -0700
    Re: Getter performance Wanja Gayk <brixomatic@yahoo.com> - 2011-10-22 21:11 +0200

Page 1 of 2 [1] 2 Next page →

#8832 — Getter performance

From	Aéris <aeris@imirhil.fr>
Date	2011-10-15 23:03 +0200
Subject	Getter performance
Message-ID	<4e99f537$0$623$426a74cc@news.free.fr>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello everybody.

I work on an application where performances are important.

To optimize it, I thought a direct access to a variable (foo.bar) would
be more efficient than a getter call (foo.getBar()).
I thought by avoiding the call to a method and all that goes with it
(context switching, stacking, return value…), I can save time, but this
code sample prove the contrary :
	http://pastebin.com/bP1nqxce
	Direct variable access : 1041 ms
	Getter call : 556 ms

The difference is even more important if I don't modify the variable
value (lines 29 and 36 commented) :
	Direct variable access : 95 ms
	Getter call : 4 ms

How can we explain this not obvious huge difference ( 50 and 95% ) ?

- -- 
Aeris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJOmfUwAAoJEK8zQvxDY4P97f4H/2qXiwLnDwuKxsHEkGYtk0va
HSASqi5Dj5g0RxD6h4G65UveBI1ALLFZAJmjrsklOl2OgN5YSvmudxpLBzcdk53S
4ans2g5lLzc21Mw5UtdW5oz3TeZZ6Z0Cbna9MloDopw6wJM06VzblMPbNr+XQfwt
S49rnd0RckKToR8lzveIJXNWuPmxKmh+JsV85ZFWepSLXB4xhsGCxuNdWq/Opjq5
s1nPvCUL5R6vKEbspyV4BRqrDEXwNKE4EPTHqgTNkklc9IMNTDjM0bPM2PhJwKlW
iX8xFqMeyizSAOPmNdaRq907LLSZhFo5jSc2JsKBj7AYb4oy8ewP0ozYnDMaFt8=
=AztU
-----END PGP SIGNATURE-----

[toc] | [next] | [standalone]

#8834

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2011-10-15 17:36 -0400
Message-ID	<4e99fcdf$0$295$14726298@news.sunsite.dk>
In reply to	#8832

On 10/15/2011 5:03 PM, Aéris wrote:
> I work on an application where performances are important.
>
> To optimize it, I thought a direct access to a variable (foo.bar) would
> be more efficient than a getter call (foo.getBar()).

Wrong approach. Write nice clean code. If it is fast enough then
fine. If not then measure where the bottleneck is. It seems highly
unlikely to be in getters.

> I thought by avoiding the call to a method and all that goes with it
> (context switching, stacking, return value…), I can save time, but this
> code sample prove the contrary :
> 	http://pastebin.com/bP1nqxce
> 	Direct variable access : 1041 ms
> 	Getter call : 556 ms
>
> The difference is even more important if I don't modify the variable
> value (lines 29 and 36 commented) :
> 	Direct variable access : 95 ms
> 	Getter call : 4 ms
>
> How can we explain this not obvious huge difference ( 50 and 95% ) ?

First thing would be to run a lot more than 100000 times. Such
small intervals will be very random on a multi tasking OS.

Arne

[toc] | [prev] | [next] | [standalone]

#8835

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2011-10-15 17:42 -0400
Message-ID	<4e99fe53$0$281$14726298@news.sunsite.dk>
In reply to	#8834

On 10/15/2011 5:36 PM, Arne Vajhøj wrote:
> On 10/15/2011 5:03 PM, Aéris wrote:
>> I work on an application where performances are important.
>>
>> To optimize it, I thought a direct access to a variable (foo.bar) would
>> be more efficient than a getter call (foo.getBar()).
>
> Wrong approach. Write nice clean code. If it is fast enough then
> fine. If not then measure where the bottleneck is. It seems highly
> unlikely to be in getters.
>
>> I thought by avoiding the call to a method and all that goes with it
>> (context switching, stacking, return value…), I can save time, but this
>> code sample prove the contrary :
>> http://pastebin.com/bP1nqxce
>> Direct variable access : 1041 ms
>> Getter call : 556 ms
>>
>> The difference is even more important if I don't modify the variable
>> value (lines 29 and 36 commented) :
>> Direct variable access : 95 ms
>> Getter call : 4 ms
>>
>> How can we explain this not obvious huge difference ( 50 and 95% ) ?
>
> First thing would be to run a lot more than 100000 times. Such
> small intervals will be very random on a multi tasking OS.

With 1000000 and -server I get:

1763
1608

which is rather close.

Arne

[toc] | [prev] | [next] | [standalone]

#8837

From	BGB <cr88192@hotmail.com>
Date	2011-10-15 15:00 -0700
Message-ID	<j7cvqo$tcp$1@news.albasani.net>
In reply to	#8835

On 10/15/2011 2:42 PM, Arne Vajhøj wrote:
> On 10/15/2011 5:36 PM, Arne Vajhøj wrote:
>> On 10/15/2011 5:03 PM, Aéris wrote:
>>> I work on an application where performances are important.
>>>
>>> To optimize it, I thought a direct access to a variable (foo.bar) would
>>> be more efficient than a getter call (foo.getBar()).
>>
>> Wrong approach. Write nice clean code. If it is fast enough then
>> fine. If not then measure where the bottleneck is. It seems highly
>> unlikely to be in getters.
>>
>>> I thought by avoiding the call to a method and all that goes with it
>>> (context switching, stacking, return value…), I can save time, but this
>>> code sample prove the contrary :
>>> http://pastebin.com/bP1nqxce
>>> Direct variable access : 1041 ms
>>> Getter call : 556 ms
>>>
>>> The difference is even more important if I don't modify the variable
>>> value (lines 29 and 36 commented) :
>>> Direct variable access : 95 ms
>>> Getter call : 4 ms
>>>
>>> How can we explain this not obvious huge difference ( 50 and 95% ) ?
>>
>> First thing would be to run a lot more than 100000 times. Such
>> small intervals will be very random on a multi tasking OS.
>
> With 1000000 and -server I get:
>
> 1763
> 1608
>
> which is rather close.
>

this is an interesting result...

I was aware that the JVM optimized getters, but unclear is why they are 
slightly faster than direct field access, rather than exactly the same 
speed (or sometimes slightly faster and sometimes slightly slower).

[toc] | [prev] | [next] | [standalone]

#8838

From	markspace <-@.>
Date	2011-10-15 15:20 -0700
Message-ID	<j7d0v7$eo5$1@dont-email.me>
In reply to	#8837

On 10/15/2011 3:00 PM, BGB wrote:

> On 10/15/2011 2:42 PM, Arne Vajhøj wrote:
>> With 1000000 and -server I get:
>>
>> 1763
>> 1608
>>
>> which is rather close.

> this is an interesting result...
>
> I was aware that the JVM optimized getters, but unclear is why they are
> slightly faster than direct field access, rather than exactly the same
> speed (or sometimes slightly faster and sometimes slightly slower).

I don't think the result indicates that getters are slightly faster, or 
vice versa.  It's just to close too call, and there's isn't really 
anyway to say one is universally better, speed-wise.

Software engineering-wise, and best practice-wise, the getter is 
undoubtedly the better option.

[toc] | [prev] | [next] | [standalone]

#9035

From	David Lamb <dalamb@cs.queensu.ca>
Date	2011-10-20 12:45 -0400
Message-ID	<KeYnq.9220$wa5.7836@newsfe17.iad>
In reply to	#8838

On 15/10/2011 6:20 PM, markspace wrote:
> On 10/15/2011 3:00 PM, BGB wrote:
>
>> On 10/15/2011 2:42 PM, Arne Vajhøj wrote:
>>> With 1000000 and -server I get:
>>>
>>> 1763
>>> 1608
>>>
>>> which is rather close.
>
>> this is an interesting result...
>>
>> I was aware that the JVM optimized getters, but unclear is why they are
>> slightly faster than direct field access, rather than exactly the same
>> speed (or sometimes slightly faster and sometimes slightly slower).
>
>
> I don't think the result indicates that getters are slightly faster, or
> vice versa. It's just to close too call, and there's isn't really anyway
> to say one is universally better, speed-wise.

A really thorough investigation would have to take into account a lot of 
performance factors; on multitasking systems (that is, most systems) a 
single expensive process swap that happened on one loop and not the 
other would make a huge difference.  You'd need to run the whole 
experiment several times (not just lots of times in one program run). 
But markspace' comment is almost certainly correct: too close to call.

> Software engineering-wise, and best practice-wise, the getter is
> undoubtedly the better option.

Exactly.

[toc] | [prev] | [next] | [standalone]

#9079

From	Roedy Green <see_website@mindprod.com.invalid>
Date	2011-10-21 14:27 -0700
Message-ID	<lfo3a75gk7h46q1csvk6k6jrks9qce80ni@4ax.com>
In reply to	#9035

On Thu, 20 Oct 2011 12:45:30 -0400, David Lamb <dalamb@cs.queensu.ca>
wrote, quoted or indirectly quoted someone who said :

> You'd need to run the whole 
>experiment several times (not just lots of times in one program run). 
>But markspace' comment is almost certainly correct: too close to call

One thing that bedevils such testing is garbage collection which
happens at "random" times and makes things look slower than they
really are.  That is why you need very long tests to average them out.
See http://mindprod.com/jgloss/benchmark.html
for some hints.

You to avoid the initial class load time or the Hotspot time to
optimise if you are after just the speed of the final code.

Knuth has frightened people from even investigating speed out of
curiosity. He was trying to stop people from writing needlessly
optimised unreadable code.

If there are two ways of doing things, both equally readable and
maintainable, it strikes me as silly to blind yourself to knowing
which is faster. You might as well do it that way whether it is
necessary or not.

The approach I see advocated could be parodied by saying you should
select a random sort algorithm including insertion sort, and bubble
sort and only putting in a clever algorithm after benchmarking and
analysis prove it is necessary.

I think you should know without benchmarking each individual case
which Map or Collection would be best for a given task.
-- 
Roedy Green Canadian Mind Products
http://mindprod.com
It should not be considered an error when the user starts something
already started or stops something already stopped. This applies
to browsers, services, editors... It is inexcusable to 
punish the user by requiring some elaborate sequence to atone,
e.g. open the task editor, find and kill some processes.

[toc] | [prev] | [next] | [standalone]

#9083

From	Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date	2011-10-21 18:57 -0700
Message-ID	<rqpoq.6270$UK6.64@newsfe06.iad>
In reply to	#9079

On 10/21/11 2:27 PM, Roedy Green wrote:
> I think you should know without benchmarking each individual case
> which Map or Collection would be best for a given task.
I nearly agree.

An experienced programmer should know which Map or Collection would work 
sufficiently well for a given task.  "Best", by some definitions, would 
almost always be a custom class (perhaps not even implementing Map or 
Collection).  The thing is that best may be much better than needed.

[toc] | [prev] | [next] | [standalone]

#9085

From	Patricia Shanahan <pats@acm.org>
Date	2011-10-22 07:27 +0100
Message-ID	<pemdnXtu9MKv_z_TnZ2dnUVZ_uadnZ2d@earthlink.com>
In reply to	#9083

Daniel Pitts wrote:
> On 10/21/11 2:27 PM, Roedy Green wrote:
>> I think you should know without benchmarking each individual case
>> which Map or Collection would be best for a given task.
> I nearly agree.
> 
> An experienced programmer should know which Map or Collection would work 
> sufficiently well for a given task.  "Best", by some definitions, would 
> almost always be a custom class (perhaps not even implementing Map or 
> Collection).  The thing is that best may be much better than needed.

Also, even between two algorithms, "best" may require a lot of
experiments, and be machine dependent.

For example, consider binary vs. linear search of a sequential array.
Binary search does O(log n) comparisons, but with unpredictable
conditional branches leading to a high cost per branch. Linear search
does O(n) comparisons, but most of its conditional branches go the same
way, making them relatively cheap.

They are of about equal coding cost, because Arrays has methods to do
binary search, and linear search is extremely simple.

For very short arrays I would definitely use linear search. For very
large arrays I would definitely use binary search. How about an array
length 100? Which is faster?

I think either would be "sufficiently good" except in very rare
conditions. If I had a program with a performance bottleneck on a search
of a length 100 array, I would measure in the context of that program.

Patricia

[toc] | [prev] | [next] | [standalone]

#9088

From	Arved Sandstrom <asandstrom3minus1@eastlink.ca>
Date	2011-10-22 09:57 -0300
Message-ID	<X4zoq.10182$eY3.4818@newsfe15.iad>
In reply to	#9083

On 11-10-21 10:57 PM, Daniel Pitts wrote:
> On 10/21/11 2:27 PM, Roedy Green wrote:
>> I think you should know without benchmarking each individual case
>> which Map or Collection would be best for a given task.

> I nearly agree.
> 
> An experienced programmer should know which Map or Collection would work
> sufficiently well for a given task.  "Best", by some definitions, would
> almost always be a custom class (perhaps not even implementing Map or
> Collection).  The thing is that best may be much better than needed.

Roedy may have meant "best" in this sense. It's a word with lots of
definitions, one of which is "most suitable". For a software
developer/engineer I'd think that "most suitable" would always be the
reigning definition of "best"; another definition (like the one you
posit as a possibility, leading perhaps to a custom class) might be more
of a computer science thing.

Roedy's right, though, with his overall comments, although I wouldn't
blame anything on The Donald [1]. For starters, way too many people -
having never read anything by Knuth except various quotes, have only
ever seen "premature optimization is the root of all evil" in isolation.
They've never read the important sentences surrounding that one, that
supply essential context.

I've seen it myself, that folks seize on this one sentence and disparage
reasonable design-time efforts to think about performance. And I've
heard often enough from other developers who should know better that we
should "write it first and we'll profile it later if it's too slow". Or
a memory hog. These folks forget that Knuth talked about _premature_
optimization: there's a lot of design and implementation-time
optimization that ain't.

c2.com on their page about "UniformlySlowCode"
(http://c2.com/cgi/wiki?UniformlySlowCode) discusses a situation that
can happen a lot if you don't practice experience-seasoned non-premature
optimization. I've seen this happen more than once, that an app is
sluggish and a memory beast, and yet when you analyze it you find that
you have to fix, well, everything.

AHS

1. Man, how can Trump be The Donald? It's just not right. :-) I've set
matters straight here.

-- 
I tend to watch a little TV... Court TV, once in a while. Some of the
cases I get interested in.
-- O. J. Simpson

[toc] | [prev] | [next] | [standalone]

#9084

From	Eric Sosman <esosman@ieee-dot-org.invalid>
Date	2011-10-21 22:12 -0400
Message-ID	<j7t8pu$1gk$2@dont-email.me>
In reply to	#9079

On 10/21/2011 5:27 PM, Roedy Green wrote:
>[...]
> Knuth has frightened people from even investigating speed out of
> curiosity.

     Oh, yeah, right.  As in "Ri-i-i-i-ght."

     That's why he didn't write three entire volumes of TAOCP (and
more in the works) in an effort to reach beyond O() notation to the
constant factors that lie behind.  That's why he didn't devote a
whole lot of text and a whole lot of subtle mathematics to winkling
out differences between O(n log n) Quicksort and O(n log n) Mergesort
and O(n log n) Heapsort and O(n log n(?)) Shellsort.  That's why he
never explored threaded trees (they improve some operations by a
factor of only O(log n) so who cares?).  That's why he never bothered
with Marsaglia's rectangle-wedge-tail method for generating normally-
distributed random numbers; it's just a trifling micro-optimization.

     Oh, sorry, wait: By "frightened" you don't mean "dissuaded," but
literally "scared."  Well, I confess that some of Knuth's math goes
over my head.  But I feel at the same time that someone who shrinks
from even trying to understand it does not merit the title of
"engineer."

-- 
Eric Sosman
esosman@ieee-dot-org.invalid

[toc] | [prev] | [next] | [standalone]

#9916

From	BGB <cr88192@hotmail.com>
Date	2011-11-13 10:14 -0700
Message-ID	<j9otv4$7j0$1@news.albasani.net>
In reply to	#9084

On 10/21/2011 7:12 PM, Eric Sosman wrote:
> On 10/21/2011 5:27 PM, Roedy Green wrote:
>> [...]
>> Knuth has frightened people from even investigating speed out of
>> curiosity.
>
> Oh, yeah, right. As in "Ri-i-i-i-ght."
>
> That's why he didn't write three entire volumes of TAOCP (and
> more in the works) in an effort to reach beyond O() notation to the
> constant factors that lie behind. That's why he didn't devote a
> whole lot of text and a whole lot of subtle mathematics to winkling
> out differences between O(n log n) Quicksort and O(n log n) Mergesort
> and O(n log n) Heapsort and O(n log n(?)) Shellsort. That's why he
> never explored threaded trees (they improve some operations by a
> factor of only O(log n) so who cares?). That's why he never bothered
> with Marsaglia's rectangle-wedge-tail method for generating normally-
> distributed random numbers; it's just a trifling micro-optimization.
>
> Oh, sorry, wait: By "frightened" you don't mean "dissuaded," but
> literally "scared." Well, I confess that some of Knuth's math goes
> over my head. But I feel at the same time that someone who shrinks
> from even trying to understand it does not merit the title of
> "engineer."
>

I tend to distinguish between what would be called "micro optimizations" 
and "macro optimization".

worrying about the cost of an if/else block, a switch, or performing a 
function/method call, is a micro optimization.

worrying about the choice of an algorithm or program architecture is a 
macro-optimization. that is more about what most of the examples were about.

generally, macro-optimizations can be reasoned about, and improved upon 
from experience.

OTOH, most micro-optimizations are worrying about small constant 
factors, and tend to make code look nasty if used (and so are better off 
used sparingly).

the main issue is that many people, when faced with poor performance (or 
worry about possible poor performance), look right to trying to 
micro-optimize whatever they perceive "might be" slow (which tends to be 
bottom-level costs, like the costs of some operation X, ...), rather 
than evaluating the larger scale architectural issues ("is there some 
way I can avoid this operation altogether?").

but, there can be some overlap:
for example, I recently migrated an interpreter of mine from using a 
"loop-and-switch" strategy to using threaded code.

this could be argued to be a micro-optimization (me afraid of the small 
constant overhead of the switch), but actually it was more motivated by 
flexibility: there are some things which can be done with threaded code 
which can't be so readily done using fixed-form logic within a switch.

so, debatably, it wasn't even really an optimization in the first place, 
it only had the side-effect of slightly improving performance.

OTOH, I recently also made an optimization to my type-checking code 
which reduced the cost of a type-lookup operation by feeding the pointer 
through a hash (seeing if the pointer recently had its type looked up).

however, even though it did effect behavior (it skipped the more 
expensive operation of looking up an object in the heap), I still 
classify this as a micro-optimization (however, to my defense, the 
profiler was showing a lot of time going into type-lookups in this case).

similarly, it exhibited the usual micro-optimization property in that it 
added some hair to the effected code (some logic to worry about a hash, 
and a few calls added elsewhere to flush the hash).

more so, it is not likely to be a uniformly distributed optimization: it 
will not improve the performance of type-checking things like fixnums, 
which don't involve such a heap lookup in the first place.

or such...

[toc] | [prev] | [next] | [standalone]

#9926

From	Lew <lewbloch@gmail.com>
Date	2011-11-13 11:00 -0800
Message-ID	<4525243.79.1321210833177.JavaMail.geo-discussion-forums@prgt40>
In reply to	#9916

BGB wrote:
> Eric Sosman wrote:
>> Roedy Green wrote:
>>> [...]
>>> Knuth has frightened people from even investigating speed out of
>>> curiosity.
>>
>> Oh, yeah, right. As in "Ri-i-i-i-ght."

+1

Blaming Knuth is about as accurate as saying Jesus caused people to wage war.

>> That's why he didn't write three entire volumes of TAOCP (and
>> more in the works) in an effort to reach beyond O() notation to the
>> constant factors that lie behind. That's why he didn't devote a
>> whole lot of text and a whole lot of subtle mathematics to winkling
>> out differences between O(n log n) Quicksort and O(n log n) Mergesort
>> and O(n log n) Heapsort and O(n log n(?)) Shellsort. That's why he
>> never explored threaded trees (they improve some operations by a
>> factor of only O(log n) so who cares?). That's why he never bothered
>> with Marsaglia's rectangle-wedge-tail method for generating normally-
>> distributed random numbers; it's just a trifling micro-optimization.
>>
>> Oh, sorry, wait: By "frightened" you don't mean "dissuaded," but
>> literally "scared." Well, I confess that some of Knuth's math goes

Again, Knuth did not cause some low-browed programmer wannabe to feel fear; that's entirely their own responsibility.

>> over my head. But I feel at the same time that someone who shrinks
>> from even trying to understand it does not merit the title of
>> "engineer."

We all feel fear.  It's what we do in the face of it that counts.

> I tend to distinguish between what would be called "micro optimizations" 
> and "macro optimization".
> 
> worrying about the cost of an if/else block, a switch, or performing a 
> function/method call, is a micro optimization.
> 
> worrying about the choice of an algorithm or program architecture is a 
> macro-optimization. that is more about what most of the examples were about.
> 
> generally, macro-optimizations can be reasoned about, and improved upon 
> from experience.
> 
> OTOH, most micro-optimizations are worrying about small constant 
> factors, and tend to make code look nasty if used (and so are better off 
> used sparingly).
> 
> 
> the main issue is that many people, when faced with poor performance (or 
> worry about possible poor performance), look right to trying to 
> micro-optimize whatever they perceive "might be" slow (which tends to be 
> bottom-level costs, like the costs of some operation X, ...), rather 
> than evaluating the larger scale architectural issues ("is there some 
> way I can avoid this operation altogether?").
> 
> 
> but, there can be some overlap:
> for example, I recently migrated an interpreter of mine from using a 
> "loop-and-switch" strategy to using threaded code.
> 
> this could be argued to be a micro-optimization (me afraid of the small 
> constant overhead of the switch), but actually it was more motivated by 
> flexibility: there are some things which can be done with threaded code 
> which can't be so readily done using fixed-form logic within a switch.
> 
> so, debatably, it wasn't even really an optimization in the first place, 
> it only had the side-effect of slightly improving performance.
> 
> 
> OTOH, I recently also made an optimization to my type-checking code 
> which reduced the cost of a type-lookup operation by feeding the pointer 
> through a hash (seeing if the pointer recently had its type looked up).
> 
> however, even though it did effect behavior (it skipped the more 
> expensive operation of looking up an object in the heap), I still 
> classify this as a micro-optimization (however, to my defense, the 
> profiler was showing a lot of time going into type-lookups in this case).
> 
> similarly, it exhibited the usual micro-optimization property in that it 
> added some hair to the effected code (some logic to worry about a hash, 
> and a few calls added elsewhere to flush the hash).
> 
> more so, it is not likely to be a uniformly distributed optimization: it 
> will not improve the performance of type-checking things like fixnums, 
> which don't involve such a heap lookup in the first place.

The problem isn't micro-optimization, as your experience supports.  The problem, as Knuth said, is _premature_ optimization.  That's optimization when you don't have enough data to support that what you do actually optimizes anything.

Your "macro-optimization" is optimization for which you have enough evidence early on.  Choice of algorithm falls into this category.

The scenario you describe of incidental optimization, where a change made for algorithmic or modeling reasons improved performance, is not "optimization" in the sense of change intended to produce better performance.  Unless you've done extensive measurement on multiple platforms in varied usage patterns, you actually don't have enough data to reliably aver improved performance, but since that's irrelevant to the change's purpose that's all right.

The scenario where you intentionally changed a detail of the code to improve performance, based on profiling and evidence that the work actually had the desired effect, may be "micro-optimization", but so what?  It's not premature.

You actually demonstrated beautifully in your post what is and is not proper for optimization.

-- 
Lew

[toc] | [prev] | [next] | [standalone]

#9675

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2011-11-06 16:09 -0500
Message-ID	<4eb6f7a3$0$290$14726298@news.sunsite.dk>
In reply to	#9079

On 10/21/2011 5:27 PM, Roedy Green wrote:
> Knuth has frightened people from even investigating speed out of
> curiosity. He was trying to stop people from writing needlessly
> optimised unreadable code.

True - we never see questions about micro optimizing here on usenet.

Hey wait a minute !

Arne

[toc] | [prev] | [next] | [standalone]

#8836

From	Aéris <aeris@imirhil.fr>
Date	2011-10-15 23:59 +0200
Message-ID	<4e9a0253$0$27973$426a74cc@news.free.fr>
In reply to	#8834

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Le 15/10/2011 23:36, Arne Vajhøj a écrit :
> Wrong approach. Write nice clean code. If it is fast enough then
> fine. If not then measure where the bottleneck is. It seems highly
> unlikely to be in getters.

I agree with you, my code is clean and fast enough for the moment but
this question of getter is more general and nag me since few time

> First thing would be to run a lot more than 100000 times. Such
> small intervals will be very random on a multi tasking OS.

Tested on Integer.MAX_VALUE, same result.

But I notice that if I read « tmp » value, results are differents.
Adding
	if (tmp.isEmpty()) {
		System.out.println("Empty");
	}
leads to more expected result :
	Direct access : 106 ms
	Getter call : 2223 ms
Seems Java compiler is very efficient and don't really call the getter
if the return value is not read and the getter has no side-effect.

- -- 
Aeris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJOmgJOAAoJEK8zQvxDY4P9wqIIAJnfxjQgsoUMmKr7CxIamTBr
KeQNH7gCGX709/d2ZAb3PCRQ9UTKiuteEU8JHal6735ziyifIBtwxHBtQTB6jduC
74uW1aMzMUYJbRpBL0wViL7SYW7Ob57a1Oz8/bYl1+htDEVb2FWMMmfgbhDf9HeT
MXMilvcCVlDs4OxG3NnFL14Y7Nv1He6YzV3BaMOIz1serH9pZy1/NCgdazfhwrZH
6bGwtYET2hbzabda9o2CrX8jRjenyLWz1ad9QPanLvHDWJRf+Y316lNXLmvhbR++
+cQzQJ8hmj9nlyFGJuQ1tG0VzTE/E/qgcdN/sCOWiApC4Nhsp3tUD+68H02bFTU=
=UnUI
-----END PGP SIGNATURE-----

[toc] | [prev] | [next] | [standalone]

#8839

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2011-10-15 19:44 -0400
Message-ID	<4e9a1ae6$0$282$14726298@news.sunsite.dk>
In reply to	#8836

On 10/15/2011 5:59 PM, Aéris wrote:
> Le 15/10/2011 23:36, Arne Vajhøj a écrit :
>> Wrong approach. Write nice clean code. If it is fast enough then
>> fine. If not then measure where the bottleneck is. It seems highly
>> unlikely to be in getters.
>
> I agree with you, my code is clean and fast enough for the moment but
> this question of getter is more general and nag me since few time

If this is for fun, then it is OK. 25 years ago I spent a lot
of time trying to optimize Sieve of Eratosthenes in VAX
assembler. Fun.

But this is work, then stop wasting your employers money and
focus on real problems.

>> First thing would be to run a lot more than 100000 times. Such
>> small intervals will be very random on a multi tasking OS.
>
> Tested on Integer.MAX_VALUE, same result.
>
> But I notice that if I read « tmp » value, results are differents.
> Adding
> 	if (tmp.isEmpty()) {
> 		System.out.println("Empty");
> 	}
> leads to more expected result :
> 	Direct access : 106 ms
> 	Getter call : 2223 ms
> Seems Java compiler is very efficient and don't really call the getter
> if the return value is not read and the getter has no side-effect.

The JIT compiler can do many things.

Another problem with microbenchmarks is that they typical
measure scenario  X1 vs Y1 and there are no guarantee that
X1 faster than Y1 implies that sum(Wi*Xi) is faster
that sum(Wi*Yi) for i=1..n, which is the real world.

Small differences can mean a lot of difference for a
single scenario, which is not relevant for the real world
with hundreds or thousands of almost identical but still
different scenarios.

Arne

[toc] | [prev] | [next] | [standalone]

#8859

From	Aéris <aeris@imirhil.fr>
Date	2011-10-16 13:14 +0200
Message-ID	<4e9abc79$0$3198$426a34cc@news.free.fr>
In reply to	#8839

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Le 16/10/2011 01:44, Arne Vajhøj a écrit :
> But this is work, then stop wasting your employers money and
> focus on real problems.

This is not my employer's money, but my free time =)

- -- 
Aeris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJOmrx0AAoJEK8zQvxDY4P9zscIAL+ERs3ds9+Qc5dbsq/pGduN
zAu7LmCOU37+pVuPKZuiwV7YQOwM4TDHutVLX3vfocYsTkxCZSz6LyqBrWBxMAOx
NIc6c608ndRjre44aHX3m2Fk5lZFqJ+8pIRzDVIPCKImBMgsdO/EuYZdn9a+BiD7
q9vcGquiL/zMVUJNuKas8ahiqsquL1l1t0Mi0s1ooeqwsBp0Thw1Xg6qNoAQZ2/c
1u2/mo80Ixb2976BlkQir0enjYQV0D3lHobc4SKZ+Dc8vhS7i+MkjcHBqifPxYMK
b6rERWKaJXVdCerE/QKyAtnmEG9Pc5q7dQNSjgSLvg6EJ4deVKuU/2iUtpzE8Wo=
=WQdS
-----END PGP SIGNATURE-----

[toc] | [prev] | [next] | [standalone]

#8864

From	Lars Enderin <lars.enderin@telia.com>
Date	2011-10-16 16:28 +0200
Message-ID	<4E9AE9F4.8070406@telia.com>
In reply to	#8859

2011-10-16 13:14, Aéris skrev:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Le 16/10/2011 01:44, Arne Vajhøj a écrit :
>> But this is work, then stop wasting your employers money and
>> focus on real problems.
> 
> This is not my employer's money, but my free time =)
> 
> - -- 
> Aeris
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
> 
> iQEcBAEBAgAGBQJOmrx0AAoJEK8zQvxDY4P9zscIAL+ERs3ds9+Qc5dbsq/pGduN
> zAu7LmCOU37+pVuPKZuiwV7YQOwM4TDHutVLX3vfocYsTkxCZSz6LyqBrWBxMAOx
> NIc6c608ndRjre44aHX3m2Fk5lZFqJ+8pIRzDVIPCKImBMgsdO/EuYZdn9a+BiD7
> q9vcGquiL/zMVUJNuKas8ahiqsquL1l1t0Mi0s1ooeqwsBp0Thw1Xg6qNoAQZ2/c
> 1u2/mo80Ixb2976BlkQir0enjYQV0D3lHobc4SKZ+Dc8vhS7i+MkjcHBqifPxYMK
> b6rERWKaJXVdCerE/QKyAtnmEG9Pc5q7dQNSjgSLvg6EJ4deVKuU/2iUtpzE8Wo=
> =WQdS
> -----END PGP SIGNATURE-----

Why are you wasting bandwidth by including PGP signatures? Your
contributions are not all that important.

[toc] | [prev] | [next] | [standalone]

#8861

From	Eric Sosman <esosman@ieee-dot-org.invalid>
Date	2011-10-16 09:47 -0400
Message-ID	<j7en9t$r5o$1@dont-email.me>
In reply to	#8832

On 10/15/2011 5:03 PM, Aéris wrote:
>
> I work on an application where performances are important.
>
> To optimize it, I thought a direct access to a variable (foo.bar) would
> be more efficient than a getter call (foo.getBar()).
> I thought by avoiding the call to a method and all that goes with it
> (context switching, stacking, return value…), I can save time, but this
> code sample prove the contrary :
> 	http://pastebin.com/bP1nqxce
> 	Direct variable access : 1041 ms
> 	Getter call : 556 ms
>
> The difference is even more important if I don't modify the variable
> value (lines 29 and 36 commented) :
> 	Direct variable access : 95 ms
> 	Getter call : 4 ms
>
> How can we explain this not obvious huge difference ( 50 and 95% ) ?

     Insufficiently precise measurement.  You are trying to tell
whether two nickels are heavier than four pennies by piling each
stack of coins in turn atop a bulldozer and weighing the bulldozer.
If you want to measure tiny effects, seek a measurement framework
that doesn't involve huge effects.

     Fundamentally, though, you're going about the larger task in
the wrong way: You are simply guessing that the speed of access to
a variable is important to the performance of your application.  Do
you have even the slightest shred of evidence that this is so?  Or
are you trying to improve your car's fuel economy by scraping off
all the paint to reduce the vehicle weight?

     1: Decide what performance criteria the application must satisfy.
Make this quantifiable: "Throughput of at least X doodads per eyeblink"
or "95% of responses in no more than Y milliquivers," not undecidable
like "As fast as possible."

     2: (Optional) Make back-of-the-envelope calculations about data
rates, data set sizes, and so on, just to estimate the resources
that will be required.  Is the load within reach of a laptop, or do
you need a clustered network of supercomputers?  Can you handle the
traffic with WiFi, or do you need sixteen 10Gb/s fibers in parallel?

     3: Write the application, as cleanly and simply as you can in
light of [1] and [2].  During this stage, bear in mind that even the
most expert of programmers is usually surprised about where his code
spends the majority of its time (this is not opinion; it's been found
over and over again in actual experiment).  In other words, favor
"clean and simple" over "clever and reputedly fast."

     4: Measure the performance, and compare to the thresholds of [1].
If you satisfy [1], stop!

     5: (Only if [1] isn't satisfied) Apply profilers, debuggers, and
other tools to discover which parts of the program are too slow.  Apply
your efforts to speeding up those parts only; pay no attention to the
others.  Then return to [4].  In very rare cases (usually when [2] was
skipped or when some external agency forces a change in [1]), return
to [3] and start over.

     Do not begin by trying to micro-optimize; that way lies madness.
Trust me; I know: Been there, done that, bear the scars.

-- 
Eric Sosman
esosman@ieee-dot-org.invalid

[toc] | [prev] | [next] | [standalone]

#8870

From	Jaap Droogers <JaapDroogers@unusable.meel.homelinux.net>
Date	2011-10-16 22:12 +0200
Message-ID	<4e9b3abe$0$31993$e4fe514c@dreader24.news.xs4all.nl>
In reply to	#8832

On 15-10-11 23:03, schreef Aéris:
> Hello everybody.
>
> I work on an application where performances are important.
>
> To optimize it, I thought a direct access to a variable (foo.bar) would
> be more efficient than a getter call (foo.getBar()).
> I thought by avoiding the call to a method and all that goes with it
> (context switching, stacking, return value…), I can save time, but this
> code sample prove the contrary :
> 	http://pastebin.com/bP1nqxce
> 	Direct variable access : 1041 ms
> 	Getter call : 556 ms
>
> The difference is even more important if I don't modify the variable
> value (lines 29 and 36 commented) :
> 	Direct variable access : 95 ms
> 	Getter call : 4 ms
>
> How can we explain this not obvious huge difference ( 50 and 95% ) ?
>

You forgot the JIT compiler. All the nice clean code you have written is 
optimized by the compiler. You don't write C code, remember.
Why is Java so fast: it uses the JIT compiler which optimizes the code 
for the machine the program is running on.
if you write
   int a = myObject.getA();

the compiler compiles it probably as:
   int a = myObject.a;

There is also a change that you try to optimize your code and the JIT 
compiler does not now how to optimize it, so your code runs slower.

Jaap

[toc] | [prev] | [next] | [standalone]

Page 1 of 2 [1] 2 Next page →

csiph-web

Getter performance

Contents

#8832 — Getter performance

#8834

#8835

#8837

#8838

#9035

#9079

#9083

#9085

#9088

#9084

#9916

#9926

#9675

#8836

#8839

#8859

#8864

#8861

#8870