Groups > comp.lang.ruby > #2426 > unrolled thread

Hash Surprises with Fixnum, #hash, and #eql?

Started by	Clifford Heath <no@spam.please.net>
First post	2011-04-07 14:01 +1000
Last post	2011-04-07 17:16 -0700
Articles	20 on this page of 44 — 9 participants

Back to article view | Back to comp.lang.ruby

  Hash Surprises with Fixnum, #hash, and #eql? Clifford Heath <no@spam.please.net> - 2011-04-07 14:01 +1000
    Re: Hash Surprises with Fixnum, #hash, and #eql? Robert Klemme <shortcutter@googlemail.com> - 2011-04-07 04:19 -0500
      Re: Hash Surprises with Fixnum, #hash, and #eql? Phillip Gawlowski <cmdjackryan@googlemail.com> - 2011-04-07 06:20 -0500
        Re: Hash Surprises with Fixnum, #hash, and #eql? Robert Klemme <shortcutter@googlemail.com> - 2011-04-07 06:53 -0500
          Re: Hash Surprises with Fixnum, #hash, and #eql? Phillip Gawlowski <cmdjackryan@googlemail.com> - 2011-04-07 08:21 -0500
            Re: Hash Surprises with Fixnum, #hash, and #eql? Robert Klemme <shortcutter@googlemail.com> - 2011-04-07 08:43 -0500
              Re: Hash Surprises with Fixnum, #hash, and #eql? Phillip Gawlowski <cmdjackryan@googlemail.com> - 2011-04-07 09:02 -0500
                Re: Hash Surprises with Fixnum, #hash, and #eql? Robert Klemme <shortcutter@googlemail.com> - 2011-04-07 09:28 -0500
                  Re: Hash Surprises with Fixnum, #hash, and #eql? Phillip Gawlowski <cmdjackryan@googlemail.com> - 2011-04-07 09:49 -0500
                    Re: Hash Surprises with Fixnum, #hash, and #eql? Robert Klemme <shortcutter@googlemail.com> - 2011-04-07 10:11 -0500
                      Re: Hash Surprises with Fixnum, #hash, and #eql? Phillip Gawlowski <cmdjackryan@googlemail.com> - 2011-04-07 10:31 -0500
                Re: Hash Surprises with Fixnum, #hash, and #eql? Brian Candler <b.candler@pobox.com> - 2011-04-07 11:06 -0500
                  Re: Hash Surprises with Fixnum, #hash, and #eql? Vincent Manis <vmanis@telus.net> - 2011-04-07 17:42 -0500
                  Re: Hash Surprises with Fixnum, #hash, and #eql? Robert Klemme <shortcutter@googlemail.com> - 2011-04-08 01:55 -0500
      Re: Hash Surprises with Fixnum, #hash, and #eql? Clifford Heath <no@spam.please.net> - 2011-04-08 17:29 +1000
        Re: Hash Surprises with Fixnum, #hash, and #eql? Robert Klemme <shortcutter@googlemail.com> - 2011-04-08 05:12 -0500
          Re: Hash Surprises with Fixnum, #hash, and #eql? Clifford Heath <no@spam.please.net> - 2011-04-09 11:21 +1000
            Re: Hash Surprises with Fixnum, #hash, and #eql? Charles Oliver Nutter <headius@headius.com> - 2011-04-10 19:02 -0500
              Re: Hash Surprises with Fixnum, #hash, and #eql? Clifford Heath <no@spam.please.net> - 2011-04-11 13:17 +1000
                Re: Hash Surprises with Fixnum, #hash, and #eql? Robert Klemme <shortcutter@googlemail.com> - 2011-04-12 04:09 -0500
                  Re: Hash Surprises with Fixnum, #hash, and #eql? Clifford Heath <no@spam.please.net> - 2011-04-13 12:02 +1000
                    Re: Hash Surprises with Fixnum, #hash, and #eql? Charles Oliver Nutter <headius@headius.com> - 2011-04-13 00:51 -0500
                      Re: Hash Surprises with Fixnum, #hash, and #eql? Clifford Heath <no@spam.please.net> - 2011-04-13 16:32 +1000
                        Re: Hash Surprises with Fixnum, #hash, and #eql? Charles Oliver Nutter <headius@headius.com> - 2011-04-13 07:23 -0500
                          Re: Hash Surprises with Fixnum, #hash, and #eql? Clifford Heath <no@spam.please.net> - 2011-04-14 07:38 +1000
                            Re: Hash Surprises with Fixnum, #hash, and #eql? Charles Oliver Nutter <headius@headius.com> - 2011-04-14 00:36 -0500
                              Re: Hash Surprises with Fixnum, #hash, and #eql? Clifford Heath <no@spam.please.net> - 2011-04-14 15:55 +1000
                                Re: Hash Surprises with Fixnum, #hash, and #eql? Charles Oliver Nutter <headius@headius.com> - 2011-04-19 17:41 -0500
                                  Re: Hash Surprises with Fixnum, #hash, and #eql? Clifford Heath <no@spam.please.net> - 2011-04-21 12:49 +1000
                                    Re: Hash Surprises with Fixnum, #hash, and #eql? Robert Klemme <shortcutter@googlemail.com> - 2011-04-21 06:28 -0500
                                      Re: Hash Surprises with Fixnum, #hash, and #eql? Clifford Heath <no@spam.please.net> - 2011-04-21 22:32 +1000
                                        Re: Hash Surprises with Fixnum, #hash, and #eql? Robert Klemme <shortcutter@googlemail.com> - 2011-04-21 08:51 -0500
                                        Re: Hash Surprises with Fixnum, #hash, and #eql? John W Higgins <wishdev@gmail.com> - 2011-04-21 11:11 -0500
                                      Re: Hash Surprises with Fixnum, #hash, and #eql? Charles Oliver Nutter <headius@headius.com> - 2011-04-27 00:33 -0500
                                        Re: Hash Surprises with Fixnum, #hash, and #eql? Robert Klemme <shortcutter@googlemail.com> - 2011-04-27 01:44 -0500
                                          Re: Hash Surprises with Fixnum, #hash, and #eql? Charles Oliver Nutter <headius@headius.com> - 2011-04-27 16:22 -0500
                                    Re: Hash Surprises with Fixnum, #hash, and #eql? Charles Oliver Nutter <headius@headius.com> - 2011-04-27 00:23 -0500
                                      Re: Hash Surprises with Fixnum, #hash, and #eql? Clifford Heath <no@spam.please.net> - 2011-04-30 17:43 +1000
                                        Re: Hash Surprises with Fixnum, #hash, and #eql? Charles Oliver Nutter <headius@headius.com> - 2011-05-01 23:01 -0500
                    Re: Hash Surprises with Fixnum, #hash, and #eql? Robert Klemme <shortcutter@googlemail.com> - 2011-04-13 02:08 -0500
                      Re: Hash Surprises with Fixnum, #hash, and #eql? Clifford Heath <no@spam.please.net> - 2011-04-14 07:36 +1000
        Re: Hash Surprises with Fixnum, #hash, and #eql? Xavier Noria <fxn@hashref.com> - 2011-04-08 19:00 -0500
          Re: Hash Surprises with Fixnum, #hash, and #eql? Clifford Heath <no@spam.please.net> - 2011-04-09 11:22 +1000
    Re: Hash Surprises with Fixnum, #hash, and #eql? Mazi Ayışığı <mazi.ayisigi@gmail.com> - 2011-04-07 17:16 -0700

Page 2 of 3 — ← Prev page 1 [2] 3 Next page →

#2724

From	Clifford Heath <no@spam.please.net>
Date	2011-04-13 12:02 +1000
Message-ID	<4da50433$0$3032$afc38c87@news.optusnet.com.au>
In reply to	#2673

On 04/12/11 19:09, Robert Klemme wrote:
> On Mon, Apr 11, 2011 at 5:20 AM, Clifford Heath<no@spam.please.net>  wrote:
>> On 04/11/11 10:02, Charles Oliver Nutter wrote:
> monkey patch Hash since you did already so for Fixnum.  On one hand
> you use Ruby's openness to change core classes to achieve what you
> want but on the other you seem to refuse to change another to make
> your change complete.  The only reason for this that I can detect is
> that you were surprised and your expectations were not met.  But now
> since you have learned otherwise what stops you from dealing with the
> situation in the pragmatic way that is so typical for Ruby?

I'm trying to express a logical programming paradigm, for users
of my API - for myself I'm happy to work around whatever I need
to.

>> Another thing that could be done to assist; make sure
>> that a Hash only ever calls eql? on objects in the hash,
>> not on lookup keys.
> But it also would not have any positive impact for all others plus
> that a change always brings a certain amount of risk of introducing
> errors.  Note though, that there might be situations where you want
> the exact opposite:

No. The reason a Hash uses eql? is to know whether it has found the
element which it's looking for *in the Hash*. It should only do that
comparison in one direction.

> You happen to need the way that is not possible right now.

On the contrary, I could live with it being either way.
The different interpreters do it both ways, or not at all.
That is the fact. That you've missed it makes me think you
*still* haven't actually run my code on the three interpreters
I mentioned. if that's true, it just underscores my suspicion
that you're engaging in defensive sophistry rather than rational
argument.

>> People don't do it because it doesn't work, not because
>> it wouldn't be useful.
> How do you know?

Because, as you say they deal with things in "the pragmatic way that
is so typical for Ruby". in other words, they're prepared to hack
around Ruby's hacks. Forgive me for the crime of suggesting that we
should improve things instead.

Clifford Heath.

[toc] | [prev] | [next] | [standalone]

#2730

From	Charles Oliver Nutter <headius@headius.com>
Date	2011-04-13 00:51 -0500
Message-ID	<BANLkTinKDHUDuatvb6ELR7+odUjPyEQZRg@mail.gmail.com>
In reply to	#2724

On Tue, Apr 12, 2011 at 9:05 PM, Clifford Heath <no@spam.please.net> wrote:
> On the contrary, I could live with it being either way.
> The different interpreters do it both ways, or not at all.
> That is the fact. That you've missed it makes me think you
> *still* haven't actually run my code on the three interpreters
> I mentioned. if that's true, it just underscores my suspicion
> that you're engaging in defensive sophistry rather than rational
> argument.

Perhaps I'm missing something. Should I be seeing different results than this?

https://gist.github.com/917031

- Charlie

[toc] | [prev] | [next] | [standalone]

#2733

From	Clifford Heath <no@spam.please.net>
Date	2011-04-13 16:32 +1000
Message-ID	<4da54384$0$13392$afc38c87@news.optusnet.com.au>
In reply to	#2730

On 04/13/11 15:51, Charles Oliver Nutter wrote:
> On Tue, Apr 12, 2011 at 9:05 PM, Clifford Heath<no@spam.please.net>  wrote:
>> On the contrary, I could live with it being either way.
>> The different interpreters do it both ways, or not at all.
>> That is the fact. That you've missed it makes me think you
>> *still* haven't actually run my code on the three interpreters
>> I mentioned. if that's true, it just underscores my suspicion
>> that you're engaging in defensive sophistry rather than rational
>> argument.
>
> Perhaps I'm missing something. Should I be seeing different results than this?
>
> https://gist.github.com/917031

That result is reasonable, using MRI. Interesting, and nice, that it
figures out it doesn't need to call Fixnum#hash to be able to choose
a bucket. (BTW, that *uncommon* case means there's a test and branch
which is superfluous and excessively costly for the more common case,
according to arguments you've made against detecting monkey patching)

If you do the same thing with JRuby however, the first test contains this:

Looking up using Integer:
nil

I.e. using Integer, JRuby doesn't call Fixnum#eql? the way MRI does.

Do it again using Rubinius, and you get this:

Looking up using Integer:
   <Fixnum 23>.hash => 47
nil

I.e. Rubinius calls Fixnum#hash, but not Fixnum#eql?

So both non-MRI interpreters fail, and in different ways.

Clifford Heath.

[toc] | [prev] | [next] | [standalone]

#2749

From	Charles Oliver Nutter <headius@headius.com>
Date	2011-04-13 07:23 -0500
Message-ID	<BANLkTimgdOCwVjPkm142FLNjhwEavJ+G7g@mail.gmail.com>
In reply to	#2733

On Wed, Apr 13, 2011 at 1:35 AM, Clifford Heath <no@spam.please.net> wrote:
> On 04/13/11 15:51, Charles Oliver Nutter wrote:
>> Perhaps I'm missing something. Should I be seeing different results than
>> this?
>>
>> https://gist.github.com/917031
>
> That result is reasonable, using MRI. Interesting, and nice, that it
> figures out it doesn't need to call Fixnum#hash to be able to choose
> a bucket. (BTW, that *uncommon* case means there's a test and branch
> which is superfluous and excessively costly for the more common case,
> according to arguments you've made against detecting monkey patching)
>
> If you do the same thing with JRuby however, the first test contains this:

This is JRuby 1.6.1.

- Charlie

[toc] | [prev] | [next] | [standalone]

#2791

From	Clifford Heath <no@spam.please.net>
Date	2011-04-14 07:38 +1000
Message-ID	<4da617c5$0$19225$afc38c87@news.optusnet.com.au>
In reply to	#2749

On 04/13/11 22:23, Charles Oliver Nutter wrote:
> On Wed, Apr 13, 2011 at 1:35 AM, Clifford Heath<no@spam.please.net>  wrote:
>> On 04/13/11 15:51, Charles Oliver Nutter wrote:
>>> Perhaps I'm missing something. Should I be seeing different results than
>>> this?
>>>
>>> https://gist.github.com/917031
>>
>> That result is reasonable, using MRI. Interesting, and nice, that it
>> figures out it doesn't need to call Fixnum#hash to be able to choose
>> a bucket. (BTW, that *uncommon* case means there's a test and branch
>> which is superfluous and excessively costly for the more common case,
>> according to arguments you've made against detecting monkey patching)
>>
>> If you do the same thing with JRuby however, the first test contains this:
>
> This is JRuby 1.6.1.

Yep... same as 1.6.0 and 1.5.6. What's you point?

[toc] | [prev] | [next] | [standalone]

#2823

From	Charles Oliver Nutter <headius@headius.com>
Date	2011-04-14 00:36 -0500
Message-ID	<BANLkTinw3vhDG0iqhfexj5WthWMyOGMZwg@mail.gmail.com>
In reply to	#2791

On Wed, Apr 13, 2011 at 4:40 PM, Clifford Heath <no@spam.please.net> wrote:
>> This is JRuby 1.6.1.
>
> Yep... same as 1.6.0 and 1.5.6. What's you point?

I'm confused...you made it sound as though JRuby results would be
different than the ones I posted. Am I missing something? Those
results *were* run with JRuby, and I was hoping you could tell me what
was missing...

- Charlie

[toc] | [prev] | [next] | [standalone]

#2826

From	Clifford Heath <no@spam.please.net>
Date	2011-04-14 15:55 +1000
Message-ID	<4da68c34$0$13393$afc38c87@news.optusnet.com.au>
In reply to	#2823

On 04/14/11 15:36, Charles Oliver Nutter wrote:
> On Wed, Apr 13, 2011 at 4:40 PM, Clifford Heath<no@spam.please.net>  wrote:
>>> This is JRuby 1.6.1.
>> Yep... same as 1.6.0 and 1.5.6. What's you point?
>
> I'm confused...you made it sound as though JRuby results would be
> different than the ones I posted. Am I missing something? Those
> results *were* run with JRuby, and I was hoping you could tell me what
> was missing...

I presented results from MRI 1.8.7, JRuby 1.6.0, and Rubinius,
and showed that they all had different shortcuts, and none
reliably kept the Hash contract (of using eql? and hash).
I.e. you can't rely on sensible code working the same in MRI
and JRuby.

I think that's a problem. If you don't, then I'm done...

Unless you care to point me to the place in the JRuby code
where this shortcut occurs (where I could make a change to
make it invisible), and a performance benchmark that would
show the effect of doing so. Then I'll happily make the
experiment to see whether I'm right (and the shortcut can
be made invisible without affecting performance measurably,
i.e. above the noise level of the benchmark). If I'm not,
I'll openly admit I was wrong... but I've done some pretty
hardcore optimizing in machine code before, and I think I
can win this one.

[toc] | [prev] | [next] | [standalone]

#3193

From	Charles Oliver Nutter <headius@headius.com>
Date	2011-04-19 17:41 -0500
Message-ID	<BANLkTik3Z+TQMVWsoDUX790DkOhV8NF4KA@mail.gmail.com>
In reply to	#2826

On Thu, Apr 14, 2011 at 12:55 AM, Clifford Heath <no@spam.please.net> wrote:
> I presented results from MRI 1.8.7, JRuby 1.6.0, and Rubinius,
> and showed that they all had different shortcuts, and none
> reliably kept the Hash contract (of using eql? and hash).
> I.e. you can't rely on sensible code working the same in MRI
> and JRuby.

I posted results on JRuby master (1.6.1). You responded to that email with:

"That result is reasonable, using MRI. Interesting, and nice, that it
figures out it doesn't need to call Fixnum#hash to be able to choose
a bucket. (BTW, that *uncommon* case means there's a test and branch
which is superfluous and excessively costly for the more common case,
according to arguments you've made against detecting monkey patching)

If you do the same thing with JRuby however, the first test contains this:

Looking up using Integer:
nil"

I'm not sure you actually looked at my results.

> I think that's a problem. If you don't, then I'm done...

It may be a problem, or it may not. When running your example,
however, it seems to call your monkeypatched code in many places where
you claim it doesn't. So I'm still confused. I know we don't call
hash/eql? in all cases, but I'm trying to quantify what the correct
behavior should be and what that behavior would cost.

I'd be happy to continue discussing this as a JRuby issue. Would you
file something at http://bugs.jruby.org with expected and actual JRuby
1.6.1 results?

> Unless you care to point me to the place in the JRuby code
> where this shortcut occurs (where I could make a change to
> make it invisible), and a performance benchmark that would
> show the effect of doing so. Then I'll happily make the
> experiment to see whether I'm right (and the shortcut can
> be made invisible without affecting performance measurably,
> i.e. above the noise level of the benchmark). If I'm not,
> I'll openly admit I was wrong... but I've done some pretty
> hardcore optimizing in machine code before, and I think I
> can win this one.

I think you are underestimating the cost of performing a dynamic call.
Even in an optimizing VM (like JRuby/JVM) there's a much higher cost
for a dynamic call to "hash" than to just check that it's a Fixnum and
branch to custom logic. *Way* higher cost.

Here's stock JRuby 1.6.1, which isn't dispatching to "hash" for Fixnums:

~/projects/jruby ➔ jruby --server -rbenchmark -e "5.times { h = {};
h[1000] = 1000; puts Benchmark.measure { 10_000_000.times { h[1000] }
} }"
  0.908000   0.000000   0.908000 (  0.859000)
  0.623000   0.000000   0.623000 (  0.622000)
  0.699000   0.000000   0.699000 (  0.699000)
  0.747000   0.000000   0.747000 (  0.747000)
  0.753000   0.000000   0.753000 (  0.753000)

Here's the same benchmark, dispatching to "hash" through a per-class
cache (faster than typical call-site caching, roughly on par with
inlined calls):

~/projects/jruby ➔ jruby --server -rbenchmark -e "5.times { h = {};
h[1000] = 1000; puts Benchmark.measure { 10_000_000.times { h[1000] }
} }"
  1.634000   0.000000   1.634000 (  1.580000)
  1.297000   0.000000   1.297000 (  1.297000)
  1.356000   0.000000   1.356000 (  1.355000)
  1.334000   0.000000   1.334000 (  1.334000)
  1.343000   0.000000   1.343000 (  1.344000)

Now using an even faster check, that only dispatches to "hash" if the
object is a Fixnum and the Fixnum class has not been reopened. Notice
it's faster than full dyncall, but still a good bit slower than the
fast path:

~/projects/jruby ➔ jruby --server -rbenchmark -e "5.times { h = {};
h[1000] = 1000; puts Benchmark.measure { 10_000_000.times { h[1000] }
} }"
  1.057000   0.000000   1.057000 (  1.014000)
  0.977000   0.000000   0.977000 (  0.976000)
  0.885000   0.000000   0.885000 (  0.885000)
  0.903000   0.000000   0.903000 (  0.903000)
  0.871000   0.000000   0.871000 (  0.871000)

And 1.9.2 to compare:

~/projects/jruby ➔ ruby1.9 -rbenchmark -e "5.times { h = {}; h[1000] =
1000; puts Benchmark.measure { 10_000_000.times { h[1000] } } }"
  1.270000   0.010000   1.280000 (  1.313950)
  1.270000   0.010000   1.280000 (  1.307163)
  1.270000   0.000000   1.270000 (  1.295588)
  1.260000   0.010000   1.270000 (  1.285083)
  1.260000   0.010000   1.270000 (  1.307108)

Bottom line is that *any* additional branching logic will add
overhead, and full dynamic calling introduces even more overhead on
just about any implementation. Whether that's a fair trade-off is not
for me to decide ;)

- Charlie

[toc] | [prev] | [next] | [standalone]

#3284

From	Clifford Heath <no@spam.please.net>
Date	2011-04-21 12:49 +1000
Message-ID	<4daf9b56$0$3035$afc38c87@news.optusnet.com.au>
In reply to	#3193

Charles,

Thanks for persisting with me. I'm not deliberately being unreasonable :)
I think there are real issues around this case and others like them,
regarding what we expect from Ruby's standard behaviour.

It turns out that I've worked around the original issue, which occurred
in my forthcoming activefacts-api gem (forthcoming because although it's
old code, its about to get to 1.0 and get publicly announced). The
workaround is mostly successful in hiding the problem(s).

But anyhow...

On 04/20/11 08:41, Charles Oliver Nutter wrote:
> On Thu, Apr 14, 2011 at 12:55 AM, Clifford Heath<no@spam.please.net>  wrote:
>> I presented results from MRI 1.8.7, JRuby 1.6.0, and Rubinius,
>> and showed that they all had different shortcuts, and none
>> reliably kept the Hash contract (of using eql? and hash).
>> I.e. you can't rely on sensible code working the same in MRI
>> and JRuby.
>
> I posted results on JRuby master (1.6.1). You responded to that email with:
>
> "That result is reasonable, using MRI. Interesting, and nice, that it
> figures out it doesn't need to call Fixnum#hash to be able to choose
> a bucket. (BTW, that *uncommon* case means there's a test and branch
> which is superfluous and excessively costly for the more common case,
> according to arguments you've made against detecting monkey patching)

I'm sorry for this comment, it's wrong. I must have been confused.
MRI doesn't call Fixnum#hash in any case, but uses a shortcut.

> If you do the same thing with JRuby however, the first test contains this:
> Looking up using Integer:
> nil"
> I'm not sure you actually looked at my results.
>> I think that's a problem. If you don't, then I'm done...
> It may be a problem, or it may not. When running your example,
> however, it seems to call your monkeypatched code in many places where
> you claim it doesn't. So I'm still confused.

I think that will have to go down to communication error.  Let it ride.

> I know we don't call
> hash/eql? in all cases, but I'm trying to quantify what the correct
> behavior should be and what that behavior would cost.

The correct behavior would result in that last "Looking up using Integer:"
to find the value, as MRI does. MRI doesn't shortcut Fixnum#eql? in the
hash lookup, and my Number hash function is defined in terms of Integer#hash,
so it works. It would be better if MRI knew it shouldn't shortcut Fixnum#hash,
but I can live without that as long it's clear.

> I'd be happy to continue discussing this as a JRuby issue. Would you
> file something at http://bugs.jruby.org with expected and actual JRuby
> 1.6.1 results?

Ok, perhaps. I think it's a Ruby issue though, not just a JRuby one.

>> Unless you care to point me to the place in the JRuby code
>> where this shortcut occurs (where I could make a change to
>> make it invisible), and a performance benchmark that would
>> show the effect of doing so. Then I'll happily make the
>> experiment to see whether I'm right (and the shortcut can
>> be made invisible without affecting performance measurably,
>> i.e. above the noise level of the benchmark). If I'm not,
>> I'll openly admit I was wrong... but I've done some pretty
>> hardcore optimizing in machine code before, and I think I
>> can win this one.
>
> I think you are underestimating the cost of performing a dynamic call.

I'm not. I'm expecting that JRuby would detect that a core Fixnum
method has been monkey-patched, and set a global variable. If  the
variable is set (an inline check, susceptible to branch-prediction)
then it would default to conservative behaviour by calling dispatch,
otherwise continue with the shortcuts as normal.

I think this is what you mean when you say:

> Now using an even faster check, that only dispatches to "hash" if the
> object is a Fixnum and the Fixnum class has not been reopened. Notice
> it's faster than full dyncall, but still a good bit slower than the
> fast path:

If I read the result correctly, that's 0.871/0.753 or 15.6% slower,
on a test that does nothing but Fixnum hash lookups in a hash with
a single element.

What I'd ask is, what's the percentage of time in actual programs
typically spent doing Fixnum hash lookups? I'd wager it's less than
1%, meaning your 16% is now 0.16% - and it makes the behaviour
incorrect per the Ruby spec. Personally, I'd take the hit.

The cost of the inline check of this variable is what I was implying
would vanish into the dust in performance tests, as the branch
prediction figures out that "we always go this way". Is your
implementation of this check implemented to make best advantage of
branch prediction? Frankly I'm rather amazed that it's as big as 18%
merely to decide to use the shortcut. Does that %age drop when using
a hash with more than one entry? It seems like a benchmark that was
created to magnify the change, not to realistically demonstrate the
effect.

> Bottom line is that *any* additional branching logic will add
> overhead, and full dynamic calling introduces even more overhead on
> just about any implementation. Whether that's a fair trade-off is not
> for me to decide ;)

Well, yes and no. If the effect is 0.16% in typical apps, you'd be
well within your right to maintain it should work like MRI and/or
the documentation.

Do you still think this needs a JRuby bug report?

Clifford Heath.

[toc] | [prev] | [next] | [standalone]

#3310

From	Robert Klemme <shortcutter@googlemail.com>
Date	2011-04-21 06:28 -0500
Message-ID	<BANLkTimxZGo48NVU3UPmrcqW+qVO9VqR1w@mail.gmail.com>
In reply to	#3284

On Thu, Apr 21, 2011 at 4:53 AM, Clifford Heath <no@spam.please.net> wrote:

> On 04/20/11 08:41, Charles Oliver Nutter wrote:

>> I think you are underestimating the cost of performing a dynamic call.
>
> I'm not. I'm expecting that JRuby would detect that a core Fixnum
> method has been monkey-patched, and set a global variable. If  the
> variable is set (an inline check, susceptible to branch-prediction)
> then it would default to conservative behaviour by calling dispatch,
> otherwise continue with the shortcuts as normal.

> The cost of the inline check of this variable is what I was implying
> would vanish into the dust in performance tests, as the branch
> prediction figures out that "we always go this way".

Please keep in mind that in a multithreaded environment there is
synchronization overhead.  A solution could use an AtomicBoolean
stored somewhere as static final.  Now all threads that need to make
the decision need to go through this.  Even if it is "only" volatile
semantics (and not synchronized) and allows for concurrent reads there
is a price to pay.  Using a ThreadLocal which is initialized during
thread construction or lazily would reduce synchronization overhead at
the risk of the flag value becoming outdated - an issue which becomes
worse with thread lifetime.  Applications which use a thread pool
could suffer.

But I agree, the effect vastly depends on the frequency of Hash
accesses with a Fixnum key.  Unfortunately I guess nobody has figures
about this - and even if, those will probably largely vary with type
of application.

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

[toc] | [prev] | [next] | [standalone]

#3312

From	Clifford Heath <no@spam.please.net>
Date	2011-04-21 22:32 +1000
Message-ID	<4db023d4$0$13394$afc38c87@news.optusnet.com.au>
In reply to	#3310

On 04/21/11 21:28, Robert Klemme wrote:
> Please keep in mind that in a multithreaded environment there is
> synchronization overhead.  A solution could use an AtomicBoolean

Oh get real. This is a single variable which may, during the course of
a single execution, *once* change from false to true. In so doing, it
enables a slightly more conservative approach to compatibility for one
small side-effect of a shortcut, which probably doesn't even matter to
the application, and which is almost certainly set by the one thread
that cares about that side effect. It *so* doesn't need to be synchronised.

[toc] | [prev] | [next] | [standalone]

#3314

From	Robert Klemme <shortcutter@googlemail.com>
Date	2011-04-21 08:51 -0500
Message-ID	<BANLkTi=S1UAt4Gvk+R9bzQHOyAKWUidCWg@mail.gmail.com>
In reply to	#3312

On Thu, Apr 21, 2011 at 2:36 PM, Clifford Heath <no@spam.please.net> wrote:
> On 04/21/11 21:28, Robert Klemme wrote:
>>
>> Please keep in mind that in a multithreaded environment there is
>> synchronization overhead.  A solution could use an AtomicBoolean
>
> Oh get real. This is a single variable which may, during the course of
> a single execution, *once* change from false to true.

In multithreaded applications the change frequency is not important.
What's important is the access frequency (read and write) because even
if memory does not change access needs to be guarded by proper
synchronization means for an application to work correctly in light of
multiple threads.

> In so doing, it
> enables a slightly more conservative approach to compatibility for one
> small side-effect of a shortcut, which probably doesn't even matter to
> the application, and which is almost certainly set by the one thread
> that cares about that side effect. It *so* doesn't need to be synchronised.

As I said, it does not need to be guarded by "synchronized".  But it
needs to be at least volatile for multithreaded applications to work
properly - otherwise you might see odd effects.  For each *access*
(that is write *and* read) there needs to be a memory barrier which
essentially means that modified thread local memory must be published
so other threads can see it.  Since many pages might be modified and
there is NUMA this can actually have a measurable cost.

The JVM memory model is a fascinating topic but significantly more
complex than one might be inclined to believe at first.

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

[toc] | [prev] | [next] | [standalone]

#3318

From	John W Higgins <wishdev@gmail.com>
Date	2011-04-21 11:11 -0500
Message-ID	<BANLkTikZAien8QEJBLu6FFPP7HsS-UrPyQ@mail.gmail.com>
In reply to	#3312

[Note:  parts of this message were removed to make it a legal post.]

Good Morning,

On Thu, Apr 21, 2011 at 5:36 AM, Clifford Heath <no@spam.please.net> wrote:

> On 04/21/11 21:28, Robert Klemme wrote:
>
>> Please keep in mind that in a multithreaded environment there is
>> synchronization overhead.  A solution could use an AtomicBoolean
>>
>
> Oh get real. This is a single variable which may, during the course of
> a single execution, *once* change from false to true. In so doing, it
> enables a slightly more conservative approach to compatibility for one
> small side-effect of a shortcut, which probably doesn't even matter to
> the application, and which is almost certainly set by the one thread
> that cares about that side effect. It *so* doesn't need to be synchronised.
>
>
I'm sorry but the old saying "people in glass houses shouldn't throw stones"
really comes to mind here. This entire thread has been a rant against
implementation semantics being "inconsistent" in 0.001% of applications (if
that) and then you come back with - oh we can cheat here because your fix
would "almost certainly set by the one thread that care about that side
effect".

You don't get to cut a corner in providing a solution to a problem you
believe is cutting a corner. You are asking every one else to take a
performance hit (however small) to fix a problem that you won't even fix
properly? That doesn't seem appropriate at all.

John

[toc] | [prev] | [next] | [standalone]

#3544

From	Charles Oliver Nutter <headius@headius.com>
Date	2011-04-27 00:33 -0500
Message-ID	<BANLkTimLJ-3YERtGAr-7RdMeDtnQ2RC6zw@mail.gmail.com>
In reply to	#3310

On Thu, Apr 21, 2011 at 6:28 AM, Robert Klemme
<shortcutter@googlemail.com> wrote:
> Please keep in mind that in a multithreaded environment there is
> synchronization overhead.  A solution could use an AtomicBoolean
> stored somewhere as static final.  Now all threads that need to make
> the decision need to go through this.  Even if it is "only" volatile
> semantics (and not synchronized) and allows for concurrent reads there
> is a price to pay.  Using a ThreadLocal which is initialized during
> thread construction or lazily would reduce synchronization overhead at
> the risk of the flag value becoming outdated - an issue which becomes
> worse with thread lifetime.  Applications which use a thread pool
> could suffer.

In this case, I'm not using a synchronized, atomic, *or* boolean
field. Because of the rarity of Fixnum and Float modification and the
potential for heavy perf impact, I'm considering redefinition of
methods in one thread while another thread is calling those methods as
somewhat undefined, at least for Fixnum and Float. That's not perfect
(JVM could optimize such that one thread's modifications never are
seen by another thread), but it's closer.

It's also worth pointing out that usually modifications to Fixnum or
Float are done for DSL purposes, where there's less likelihood of
heavy threading effects.

You're right, though...if I made that field volatile (it doesn't need
to be Atomic, since I only ever read *or* write, never both), the perf
impact would be higher.

> But I agree, the effect vastly depends on the frequency of Hash
> accesses with a Fixnum key.  Unfortunately I guess nobody has figures
> about this - and even if, those will probably largely vary with type
> of application.

I operate at too low a level to see the 10000-foot view of application
performance. In other words, I spend my time optimizing individual
core methods, individual Ruby language features, and runtime-level
operations like calls and constant lookup...rather than really looking
at full app performance. Once you get to the scale of a real app, the
performance bottlenecks from badly-written code, slow IO, excessive
object creation, slow libraries and other userland issues almost
always trump runtime-level speed. As an example, I point at the fact
that Ruby 1.9 is almost always much faster than Ruby 1.8, but Rails
under Ruby 1.9 is only marginally faster than Rails on Ruby 1.8 (or so
I've seen when people try to measure it).

The benefit of a faster and faster runtime is often outweighed by
writing better Ruby code in the first place. But I don't live in the
application world...I work on JRuby low-level performance. You have to
do the rest :)

- Charlie

[toc] | [prev] | [next] | [standalone]

#3551

From	Robert Klemme <shortcutter@googlemail.com>
Date	2011-04-27 01:44 -0500
Message-ID	<BANLkTin_8Mk+tpJJXx=1LbF0_11N_A4suA@mail.gmail.com>
In reply to	#3544

On Wed, Apr 27, 2011 at 7:33 AM, Charles Oliver Nutter
<headius@headius.com> wrote:
> On Thu, Apr 21, 2011 at 6:28 AM, Robert Klemme
> <shortcutter@googlemail.com> wrote:
>> Please keep in mind that in a multithreaded environment there is
>> synchronization overhead.  A solution could use an AtomicBoolean
>> stored somewhere as static final.  Now all threads that need to make
>> the decision need to go through this.  Even if it is "only" volatile
>> semantics (and not synchronized) and allows for concurrent reads there
>> is a price to pay.  Using a ThreadLocal which is initialized during
>> thread construction or lazily would reduce synchronization overhead at
>> the risk of the flag value becoming outdated - an issue which becomes
>> worse with thread lifetime.  Applications which use a thread pool
>> could suffer.
>
> In this case, I'm not using a synchronized, atomic, *or* boolean
> field. Because of the rarity of Fixnum and Float modification and the
> potential for heavy perf impact, I'm considering redefinition of
> methods in one thread while another thread is calling those methods as
> somewhat undefined, at least for Fixnum and Float. That's not perfect
> (JVM could optimize such that one thread's modifications never are
> seen by another thread), but it's closer.

Is redefinition per thread a standard JRuby feature or is this
something you would add?  If it was something that would need adding I
wouldn't bother to do it.  That sounds like a major change.

> It's also worth pointing out that usually modifications to Fixnum or
> Float are done for DSL purposes, where there's less likelihood of
> heavy threading effects.
>
> You're right, though...if I made that field volatile (it doesn't need
> to be Atomic, since I only ever read *or* write, never both), the perf
> impact would be higher.

Right you are.  That occurred to me after posting as well but I didn't
bother to correct myself as the memory effects are identical.  The
only difference is one more dereferencing (which can make a difference
as you pointed out).

>> But I agree, the effect vastly depends on the frequency of Hash
>> accesses with a Fixnum key.  Unfortunately I guess nobody has figures
>> about this - and even if, those will probably largely vary with type
>> of application.
>
> I operate at too low a level to see the 10000-foot view of application
> performance. In other words, I spend my time optimizing individual
> core methods, individual Ruby language features, and runtime-level
> operations like calls and constant lookup...rather than really looking
> at full app performance. Once you get to the scale of a real app, the
> performance bottlenecks from badly-written code, slow IO, excessive
> object creation, slow libraries and other userland issues almost
> always trump runtime-level speed. As an example, I point at the fact
> that Ruby 1.9 is almost always much faster than Ruby 1.8, but Rails
> under Ruby 1.9 is only marginally faster than Rails on Ruby 1.8 (or so
> I've seen when people try to measure it).

That exactly demonstrates your point: Rails will spend most of its
time doing IO (to and from the database, to and from network clients).
 OTOH the small difference shows that Rails code cannot be awfully
written because otherwise you would likely notice a bigger difference.
:-)

> The benefit of a faster and faster runtime is often outweighed by
> writing better Ruby code in the first place. But I don't live in the
> application world...I work on JRuby low-level performance. You have to
> do the rest :)

Will do. :-)  I'd also love to lend JRuby a hand but unfortunately I
can't find the time right now.

Cheers

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

[toc] | [prev] | [next] | [standalone]

#3582

From	Charles Oliver Nutter <headius@headius.com>
Date	2011-04-27 16:22 -0500
Message-ID	<BANLkTi=KOb9FLUwYkmSOA_yGu_XcN6aNsg@mail.gmail.com>
In reply to	#3551

On Wed, Apr 27, 2011 at 1:44 AM, Robert Klemme
<shortcutter@googlemail.com> wrote:
> Is redefinition per thread a standard JRuby feature or is this
> something you would add?  If it was something that would need adding I
> wouldn't bother to do it.  That sounds like a major change.

It's something I've thought about, a la Groovy's "categories"
(thread-downstream selector namespacing, basically), but no, there's
no support for this currently in JRuby (and even in Groovy it is an
oft-maligned feature, since it imposes a perf penalty even if you
aren't using it since you have to constantly check a thread-local for
an installed category.)

> Will do. :-)  I'd also love to lend JRuby a hand but unfortunately I
> can't find the time right now.

Well, you can always help by submitting docs, editing wiki, answering
JRuby ML questions, answering StackOverflow questions, submitting
JRuby talks to conferences, presenting at user groups and
clubs...perhaps there's something that fits your schedule better :)

- Charlie

[toc] | [prev] | [next] | [standalone]

#3543

From	Charles Oliver Nutter <headius@headius.com>
Date	2011-04-27 00:23 -0500
Message-ID	<BANLkTinGv0eF6Br409j6PmRPY=q3NnH6yg@mail.gmail.com>
In reply to	#3284

On Wed, Apr 20, 2011 at 9:53 PM, Clifford Heath <no@spam.please.net> wrote:
> On 04/20/11 08:41, Charles Oliver Nutter wrote:
>> I think you are underestimating the cost of performing a dynamic call.
>
> I'm not. I'm expecting that JRuby would detect that a core Fixnum
> method has been monkey-patched, and set a global variable. If  the
> variable is set (an inline check, susceptible to branch-prediction)
> then it would default to conservative behaviour by calling dispatch,
> otherwise continue with the shortcuts as normal.
>
> I think this is what you mean when you say:
>
>> Now using an even faster check, that only dispatches to "hash" if the
>> object is a Fixnum and the Fixnum class has not been reopened. Notice
>> it's faster than full dyncall, but still a good bit slower than the
>> fast path:
>
> If I read the result correctly, that's 0.871/0.753 or 15.6% slower,
> on a test that does nothing but Fixnum hash lookups in a hash with
> a single element.
>
> What I'd ask is, what's the percentage of time in actual programs
> typically spent doing Fixnum hash lookups? I'd wager it's less than
> 1%, meaning your 16% is now 0.16% - and it makes the behaviour
> incorrect per the Ruby spec. Personally, I'd take the hit.
>
> The cost of the inline check of this variable is what I was implying
> would vanish into the dust in performance tests, as the branch
> prediction figures out that "we always go this way". Is your
> implementation of this check implemented to make best advantage of
> branch prediction? Frankly I'm rather amazed that it's as big as 18%
> merely to decide to use the shortcut. Does that %age drop when using
> a hash with more than one entry? It seems like a benchmark that was
> created to magnify the change, not to realistically demonstrate the
> effect.

Well, it has to get the value from somewhere. In JRuby, the
Fixnum/Float checks are each a field on org.jruby.Ruby (the JRuby
"runtime" object in play), which is accessible via a field on
org.jruby.runtime.ThreadContext (passed on the call stack to most
methods) or via the current object's metaclass, an org.jruby.RubyClass
object with a "runtime" field. So at a best, it's two field
dereferences and one (inlined) virtual method invocation, and at worst
it's three field references and two (probably inlined) virtual method
invocations. Perhaps that defeats branch prediction, since we're at
least two memory hops away from the value we're checking.

Branch prediction works great when the data is in a register. It
doesn't do as well when there's memory dereferences in the way.

>> Bottom line is that *any* additional branching logic will add
>> overhead, and full dynamic calling introduces even more overhead on
>> just about any implementation. Whether that's a fair trade-off is not
>> for me to decide ;)
>
> Well, yes and no. If the effect is 0.16% in typical apps, you'd be
> well within your right to maintain it should work like MRI and/or
> the documentation.
>
> Do you still think this needs a JRuby bug report?

In general we have followed the path of matching MRI in such cases,
except when there's a very strong argument not to do so. If you can
convince Ruby core, or if you think you have a strong enough case for
deviating from MRI's behavior, go ahead and file the bug and we can
continue exploring it there.

- Charlie

[toc] | [prev] | [next] | [standalone]

#3603

From	Clifford Heath <no@spam.please.net>
Date	2011-04-30 17:43 +1000
Message-ID	<4dbbbdaf$0$15070$afc38c87@news.optusnet.com.au>
In reply to	#3543

On 04/27/11 15:23, Charles Oliver Nutter wrote:
> On Wed, Apr 20, 2011 at 9:53 PM, Clifford Heath<no@spam.please.net>  wrote:
>> I'm not. I'm expecting that JRuby would detect that a core Fixnum
>> method has been monkey-patched, and set a global variable.
>> ...
> Well, it has to get the value from somewhere. ... it's two field
> dereferences and one (inlined) virtual method invocation

Ouch. No wonder it hurts. I hadn't looked into the internals of JRuby,
but I assumed that you had some native C (JNI or whatever) in there,
which could make this feasible.

I can totally understand you not wanting any native extensions though!

It seems that a collection of such flags at known offsets inside a
singleton instance could make this a lot quicker. There's a finite
need for such things, so it's not as though it would pervade all of
the interpreter.

Your argument (from a previous response) that cross-thread effects of
monkey-patching was "somewhat undefined" was my thinking also, but
contrary to John's accusation, thought it could be (mostly?) hidden
under the existing synchronisation around method definition.

> In general we have followed the path of matching MRI in such cases,
> except when there's a very strong argument not to do so. If you can
> convince Ruby core, or if you think you have a strong enough case for
> deviating from MRI's behavior, go ahead and file the bug and we can
> continue exploring it there.

I think the argument that "folk don't do it, therefore they wouldn't"
isn't a strong one. Look for example at Rail's HashWithIndifferentAccess,
which makes string and symbol keys interchangeable. I merely want to
do the same thing with Fixnums and Floats.

If you can't make it quicker (than you outlined), best to drop it I
guess. I've mostly worked around the need for my current library.

Clifford Heath.

[toc] | [prev] | [next] | [standalone]

#3820

From	Charles Oliver Nutter <headius@headius.com>
Date	2011-05-01 23:01 -0500
Message-ID	<BANLkTin1r1tchcetTXj9Lb5f6PRZEk5hVw@mail.gmail.com>
In reply to	#3603

On Sat, Apr 30, 2011 at 2:45 AM, Clifford Heath <no@spam.please.net> wrote:
> Ouch. No wonder it hurts. I hadn't looked into the internals of JRuby,
> but I assumed that you had some native C (JNI or whatever) in there,
> which could make this feasible.
>
> I can totally understand you not wanting any native extensions though!

None of the JRuby runtime is native code, and obviously that's how
we'd like to keep it. Even if we were to do it in C, we would still
need to do some object-walking, since JRuby supports having multiple
JRuby instances in the same process (that's how a single JRuby process
can serve many different applications at the same time).

> It seems that a collection of such flags at known offsets inside a
> singleton instance could make this a lot quicker. There's a finite
> need for such things, so it's not as though it would pervade all of
> the interpreter.

Such a singleton would still need to be rooted to a specific JRuby
instance. The logic as it stands now is pretty much a
JRuby-instance-global flag.

> Your argument (from a previous response) that cross-thread effects of
> monkey-patching was "somewhat undefined" was my thinking also, but
> contrary to John's accusation, thought it could be (mostly?) hidden
> under the existing synchronisation around method definition.

The JVM memory model allows for the JVM to optimize non-volatile
memory accesses away if it can prove the memory location is never
modified by the same thread. Only when specifing volatility will it
guarantee the memory access is always performed with full CPU cache
semantics.

> I think the argument that "folk don't do it, therefore they wouldn't"
> isn't a strong one. Look for example at Rail's HashWithIndifferentAccess,
> which makes string and symbol keys interchangeable. I merely want to
> do the same thing with Fixnums and Floats.

Certainly...but at the moment no implementations agree on what is
correct. Your vision might be correct, but until it's "standard" we
would probably not implement it.

> If you can't make it quicker (than you outlined), best to drop it I
> guess. I've mostly worked around the need for my current library.

It's probably possible to reduce the cost under Java 7, which includes
capabilities to have nearly guard-free dynamic invocation. But there's
no way I know of to make this completely cost-free under Java 6, which
we still support.

- Charlie

[toc] | [prev] | [next] | [standalone]

#2736

From	Robert Klemme <shortcutter@googlemail.com>
Date	2011-04-13 02:08 -0500
Message-ID	<BANLkTi=sCMWYdncsVNzPa712wnZObTx1-g@mail.gmail.com>
In reply to	#2724

On Wed, Apr 13, 2011 at 4:05 AM, Clifford Heath <no@spam.please.net> wrote:
> On 04/12/11 19:09, Robert Klemme wrote:
>>
>> On Mon, Apr 11, 2011 at 5:20 AM, Clifford Heath<no@spam.please.net>
>>  wrote:

>> You happen to need the way that is not possible right now.
>
> On the contrary, I could live with it being either way.
> The different interpreters do it both ways, or not at all.
> That is the fact. That you've missed it makes me think you
> *still* haven't actually run my code on the three interpreters
> I mentioned. if that's true, it just underscores my suspicion
> that you're engaging in defensive sophistry rather than rational
> argument.

I have run the code and I haven't missed the fact.  Right, that wasn't
obvious from my statement above.  I qualify the behavior you are
seeing as "undefined" (across all Ruby implementations) so it doesn't
matter to me in which ways the test fails on different platforms.

>>> People don't do it because it doesn't work, not because
>>> it wouldn't be useful.
>>
>> How do you know?
>
> Because, as you say they deal with things in "the pragmatic way that
> is so typical for Ruby". in other words, they're prepared to hack
> around Ruby's hacks.

I asked how you know that there is a significant number of cases where
people stumble into this and what you basically say is "I know it
because people pragmatically work around this and do not mention it".
This could not be further away from anything like evidence or even a
hint.

> Forgive me for the crime of suggesting that we
> should improve things instead.

I'm all for improvements.  But that's the exact point where we differ:
in my opinion there is no overall improvement - as Charlie laid out
nicely: there may be an improvement for a rare case but many other,
more regular cases suffer.

Regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

[toc] | [prev] | [next] | [standalone]

Page 2 of 3 — ← Prev page 1 [2] 3 Next page →

csiph-web

Hash Surprises with Fixnum, #hash, and #eql?

Contents

#2724

#2730

#2733

#2749

#2791

#2823

#2826

#3193

#3284

#3310

#3312

#3314

#3318

#3544

#3551

#3582

#3543

#3603

#3820

#2736