

Groups > comp.lang.java.programmer > #21962 > unrolled thread

String.substring in JDK 1.7.0_6+

Started by: jlp <jlp@jlp.com>
First post: 2013-02-01 19:33 +0100
Last post: 2013-02-03 15:09 +0100
Articles 14 — 5 participants



Contents

  String.substring in JDK 1.7.0_6+ jlp <jlp@jlp.com> - 2013-02-01 19:33 +0100
    Re: String.substring in JDK 1.7.0_6+ markspace <markspace@nospam.nospam> - 2013-02-01 10:38 -0800
      Re: String.substring in JDK 1.7.0_6+ jlp <jlp@jlp.com> - 2013-02-01 19:42 +0100
        Re: String.substring in JDK 1.7.0_6+ markspace <markspace@nospam.nospam> - 2013-02-01 10:45 -0800
          Re: String.substring in JDK 1.7.0_6+ jlp <jlp@jlp.com> - 2013-02-01 19:57 +0100
            Re: String.substring in JDK 1.7.0_6+ markspace <markspace@nospam.nospam> - 2013-02-01 11:20 -0800
    Re: String.substring in JDK 1.7.0_6+ Jan Burse <janburse@fastmail.fm> - 2013-02-01 20:34 +0100
    Re: String.substring in JDK 1.7.0_6+ Kevin McMurtrie <mcmurtrie@pixelmemory.us> - 2013-02-01 20:58 -0800
      Re: String.substring in JDK 1.7.0_6+ markspace <markspace@nospam.nospam> - 2013-02-01 22:55 -0800
        Re: String.substring in JDK 1.7.0_6+ Kevin McMurtrie <mcmurtrie@pixelmemory.us> - 2013-02-02 08:43 -0800
          Re: String.substring in JDK 1.7.0_6+ markspace <markspace@nospam.nospam> - 2013-02-02 10:56 -0800
            Re: String.substring in JDK 1.7.0_6+ Kevin McMurtrie <mcmurtrie@pixelmemory.us> - 2013-02-02 14:46 -0800
              Re: String.substring in JDK 1.7.0_6+ markspace <markspace@nospam.nospam> - 2013-02-02 15:31 -0800
              Re: String.substring in JDK 1.7.0_6+ Robert Klemme <shortcutter@googlemail.com> - 2013-02-03 15:09 +0100

#21962 — String.substring in JDK 1.7.0_6+

From: jlp <jlp@jlp.com>
Date: 2013-02-01 19:33 +0100
Subject: String.substring in JDK 1.7.0_6+
Message-ID: <510c0a6a$0$8985$ba4acef3@reader.news.orange.fr>

The String class was modified in JDK 1.7.0_06.
String.substring, which was O(1) before JDK 1.7.0_06, is now O(n).

It is all explained well at:
http://java-performance.info/changes-to-string-java-1-7-0_06/

I wrote a small test:
https://gist.github.com/4692960

java -Xms128M -Xmx128M teststring.Main 100000 1000000

On my desktop:
jdk 1.7.0_11 => 33 seconds / 252 KBytes Memory
jdk 1.6.0_38 => 25 milliseconds / 782 KBytes Memory
more than 1000 times faster ! ( Ok! for this stupid test ;-) )
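
To make the complexity change concrete without the gist at hand, here is a minimal sketch of the kind of loop that exposes it. This is not the code from the gist; the class name `SubstringCost` and the method `dropFirstRepeatedly` are made up for illustration, and the timings it prints will vary by machine and JVM.

```java
import java.util.Arrays;

// Hypothetical illustration, not the code from the gist.
class SubstringCost {

    // Repeatedly drops the first character. With a shared backing array
    // (before 7u6) each substring call is O(1); with copying (7u6 and later)
    // each call copies the remaining characters, so the loop is O(n^2) overall.
    static int dropFirstRepeatedly(String s) {
        int calls = 0;
        while (!s.isEmpty()) {
            s = s.substring(1);
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 100000;
        char[] chars = new char[n];
        Arrays.fill(chars, 'x');
        String s = new String(chars);

        long start = System.nanoTime();
        int calls = dropFirstRepeatedly(s);
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        System.out.println(calls + " substring calls in " + elapsedMs + " ms");
    }
}
```

Running it with a larger argument on both JDKs should show roughly linear growth on the older one and roughly quadratic growth on the newer one, subject to the usual micro-benchmark caveats.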

I don't think this is a good improvement! It uses less memory, but you 
get that memory back anyway once the object is garbage collected.
What do you think about this?

-- 
Regards
Jean-Louis Pasturel



#21963

From: markspace <markspace@nospam.nospam>
Date: 2013-02-01 10:38 -0800
Message-ID: <keh22t$de$1@dont-email.me>
In reply to: #21962

On 2/1/2013 10:33 AM, jlp wrote:
> What do you think about this ?
>

I think micro-benchmarks don't work.



#21964

From: jlp <jlp@jlp.com>
Date: 2013-02-01 19:42 +0100
Message-ID: <510c0c7c$0$1368$ba4acef3@reader.news.orange.fr>
In reply to: #21963

On 01/02/2013 19:38, markspace wrote:
> On 2/1/2013 10:33 AM, jlp wrote:
>> What do you think about this ?
>>
>
> I think micro-benchmarks don't work.
>
>

What is wrong with this test?

-- 
Regards
Jean-Louis Pasturel



#21965

From: markspace <markspace@nospam.nospam>
Date: 2013-02-01 10:45 -0800
Message-ID: <keh2g5$3am$1@dont-email.me>
In reply to: #21964

On 2/1/2013 10:42 AM, jlp wrote:
> Le 01/02/2013 19:38, markspace a écrit :
>> On 2/1/2013 10:33 AM, jlp wrote:
>>> What do you think about this ?
>>>
>>
>> I think micro-benchmarks don't work.
>>
>>
>
> What is wrong in this test ?
>


It's a micro-benchmark.



#21966

From: jlp <jlp@jlp.com>
Date: 2013-02-01 19:57 +0100
Message-ID: <510c1023$0$9019$ba4acef3@reader.news.orange.fr>
In reply to: #21965

On 01/02/2013 19:45, markspace wrote:
> On 2/1/2013 10:42 AM, jlp wrote:
>> Le 01/02/2013 19:38, markspace a écrit :
>>> On 2/1/2013 10:33 AM, jlp wrote:
>>>> What do you think about this ?
>>>>
>>>
>>> I think micro-benchmarks don't work.
>>>
>>>
>>
>> What is wrong in this test ?
>>
>
>
> It's a micro-benchmark.
>

OK ;-)

But it also seems to be a problem in the "real world":
https://jira.springsource.org/browse/SPR-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:changehistory-tabpanel
http://grokbase.com/t/gg/scala-user/131993ttrq/why-is-string-grouped-so-slow

-- 
Regards
Jean-Louis Pasturel



#21968

From: markspace <markspace@nospam.nospam>
Date: 2013-02-01 11:20 -0800
Message-ID: <keh4hf$ic7$1@dont-email.me>
In reply to: #21966

On 2/1/2013 10:57 AM, jlp wrote:

> But it seems problematic in "real world"
> https://jira.springsource.org/browse/SPR-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:changehistory-tabpanel


That's a better test, using production code.  However, note that it's
scoped to only one small portion of the code, only while loading a lot
of small scripts.

It's really common for code that was once "working" to suddenly develop 
undesirable characteristics as it's exposed to new input or new 
environments or new anything.  It's just something that happens and part 
of the normal maintenance of code.

What do I think of it?  It's normal.



#21970

From: Jan Burse <janburse@fastmail.fm>
Date: 2013-02-01 20:34 +0100
Message-ID: <keh5bl$kq8$1@news.albasani.net>
In reply to: #21962

There seems to be more going on with Strings:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6962931

But I didn't find the original change request (CR)
or request for enhancement concerning the copy
semantics. There was some estimate on applications
impact and allotment for redesigning applications.

I didn't know about the hash32 thing described in the link below.
Your micro-benchmark doesn't test hashCode, does it?

jlp wrote:
> The String class was modified in JDK 1.7.0_6.
> String.substring that was 0(1)  before JDK 1.7.0_6, now becomes O(n)
>
> All is well explained at :
> http://java-performance.info/changes-to-string-java-1-7-0_06/
>
> I wrote a small test:
> https://gist.github.com/4692960
>
> java -Xms128M -Xmx128M teststring.Main 100000 1000000
>
> On my desktop:
> jdk 1.7.0_11 => 33 seconds / 252 KBytes Memory
> jdk 1.6.0_38 => 25 milliseconds / 782 KBytes Memory
> more than 1000 times faster ! ( Ok! for this stupid test ;-) )
>
> I don't think it is a good improvement ! Uses less memory, but you
> retrieve it, when the object is garbaged
> What do you think about this ?
>



#21999

From: Kevin McMurtrie <mcmurtrie@pixelmemory.us>
Date: 2013-02-01 20:58 -0800
Message-ID: <510c9cea$0$80106$742ec2ed@news.sonic.net>
In reply to: #21962

In article <510c0a6a$0$8985$ba4acef3@reader.news.orange.fr>,
 jlp <jlp@jlp.com> wrote:

> The String class was modified in JDK 1.7.0_6.
> String.substring that was 0(1)  before JDK 1.7.0_6, now becomes O(n)
> 
> All is well explained at :
> http://java-performance.info/changes-to-string-java-1-7-0_06/
> 
> I wrote a small test:
> https://gist.github.com/4692960
> 
> java -Xms128M -Xmx128M teststring.Main 100000 1000000
> 
> On my desktop:
> jdk 1.7.0_11 => 33 seconds / 252 KBytes Memory
> jdk 1.6.0_38 => 25 milliseconds / 782 KBytes Memory
> more than 1000 times faster ! ( Ok! for this stupid test ;-) )
> 
> I don't think it is a good improvement ! Uses less memory, but you 
> retrieve it, when the object is garbaged
> What do you think about this ?

It's an unbelievable change.  Buffer sharing in Java 1-6 had the simple 
workaround of calling the String(String) constructor.  There's no 
workaround for Strings getting much slower in Java 7+.
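
For readers who have not seen that workaround, a minimal sketch follows. The class and method names are made up for illustration. On Java 6 and earlier the extra constructor call gave the result its own small array; on 7u6 and later substring() already copies, so the wrapper is redundant there.

```java
// Hypothetical helper showing the pre-7u6 idiom for detaching a substring.
class DetachSubstring {

    // On Java 6 and earlier, substring() shared the parent's char[], so a tiny
    // substring could keep a huge buffer reachable. Copying through the
    // String(String) constructor gave the result its own right-sized array.
    static String detachedPrefix(String big, int length) {
        return new String(big.substring(0, length));
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000000; i++) {
            sb.append('x');
        }
        String big = sb.toString();
        String key = detachedPrefix(big, 10);
        System.out.println(key + " (" + key.length() + " chars)");
    }
}
```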

What Oracle did in Java 7 would only make sense if CharSequence had 
buffer sharing and better support.  I just checked Java 7, and 
CharSequence looks more useless than ever.  String.subSequence() 
allocates a new char[] and the usual parsers (Integer, Long, Float) 
still only accept a String.  Slow Strings it is.
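
As a rough sketch of what a buffer-sharing CharSequence with parser support could look like: everything below is hypothetical (the `CharSlice` class and its `parseInt` are not JDK APIs), and the parser is deliberately minimal, handling only decimal digits with an optional leading minus sign and no overflow checking.

```java
// Hypothetical names throughout; a sketch of a buffer-sharing CharSequence,
// not anything that exists in the JDK.
final class CharSlice implements CharSequence {
    private final char[] buf;   // shared between slices, never copied by subSequence
    private final int offset;
    private final int length;

    CharSlice(String s) {
        this(s.toCharArray(), 0, s.length());   // one copy up front
    }

    private CharSlice(char[] buf, int offset, int length) {
        this.buf = buf;
        this.offset = offset;
        this.length = length;
    }

    @Override public int length() { return length; }

    @Override public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return buf[offset + index];
    }

    // O(1): the new slice points into the same array.
    @Override public CharSlice subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException(start + ".." + end);
        }
        return new CharSlice(buf, offset + start, end - start);
    }

    @Override public String toString() {
        return new String(buf, offset, length);
    }

    // Minimal decimal parser that works on any CharSequence.
    // Digits only, optional leading '-', no overflow checking.
    static int parseInt(CharSequence cs) {
        int i = 0;
        boolean negative = cs.length() > 0 && cs.charAt(0) == '-';
        if (negative) {
            i = 1;
        }
        if (i == cs.length()) {
            throw new NumberFormatException("no digits");
        }
        int value = 0;
        for (; i < cs.length(); i++) {
            char c = cs.charAt(i);
            if (c < '0' || c > '9') {
                throw new NumberFormatException("bad digit: " + c);
            }
            value = value * 10 + (c - '0');
        }
        return negative ? -value : value;
    }

    public static void main(String[] args) {
        CharSlice line = new CharSlice("id=12345;qty=-7");
        System.out.println(parseInt(line.subSequence(3, 8)));
        System.out.println(parseInt(line.subSequence(13, 15)));
    }
}
```

The trade-off is the same one the old String had: a small slice keeps the whole shared array reachable. Later JDKs (9 and up) did add an `Integer.parseInt(CharSequence, int, int, int)` overload, which covers the integer case without a hand-written parser.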


Side rant:
Sun broke buffer sharing between StringBuffer and String back in Java 5.  
The reason was so that AbstractStringBuilder class could support the 
implementations for both StringBuffer and StringBuilder.  Had they kept 
the implementations split, we could still have a very fast 
StringBuffer.toString().  As a final F-U, none of these classes can be 
extended, even though there's no buffer sharing left that could be hacked.
-- 
I will not see posts from Google because I must filter them as spam



#22000

From: markspace <markspace@nospam.nospam>
Date: 2013-02-01 22:55 -0800
Message-ID: <keid8j$2og$1@dont-email.me>
In reply to: #21999

On 2/1/2013 8:58 PM, Kevin McMurtrie wrote:

>  and the usual parsers (Integer, Long, Float)
>still only accept a String.  Slow Strings it is.
>

This, I agree, is a bit of a bummer; it would be useful for the parsers 
to take a CharSequence for flexibility.  Integers aren't hard to parse, 
but floats and doubles are non-trivial.

Note however that Scanner accepts both Readable (a Reader) and 
ReadableByteChannel in its constructors.
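
A small sketch of that route, assuming whitespace-separated integers as input; the class name `ScannerOverReadable` and the `sum` method are made up for illustration. Scanner does its own buffering and tokenising, so this shows flexibility of input sources rather than raw parsing speed.

```java
import java.io.StringReader;
import java.util.Scanner;

// Hypothetical example of parsing numbers straight from a Readable.
class ScannerOverReadable {

    // Sums whitespace-separated integers read from any Readable,
    // without first materialising the whole input as one String.
    static long sum(Readable source) {
        long total = 0;
        Scanner in = new Scanner(source);
        try {
            while (in.hasNextLong()) {
                total += in.nextLong();
            }
        } finally {
            in.close();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new StringReader("10 20 12")));
    }
}
```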

>
> Side rant:
> Sun broke buffer sharing between StringBuffer and String back in Java 5.

Probably because Strings needed to be immutable and there's no way to do 
that when sharing a mutable buffer.

> we could still have a very fast
> StringBuffer.toString().

Nope, see above.

>  As a final F-U, none of classes can be
> extended even through there's no buffer sharing that can be hacked.
>

Probably because they don't want you doing stupid broken things, like 
trying to share buffers between immutable and mutable objects.

I still agree that CharSequence could be made more useful though, that's 
a good idea.  Hmmm.




#22007

From: Kevin McMurtrie <mcmurtrie@pixelmemory.us>
Date: 2013-02-02 08:43 -0800
Message-ID: <510d4249$0$80118$742ec2ed@news.sonic.net>
In reply to: #22000

In article <keid8j$2og$1@dont-email.me>,
 markspace <markspace@nospam.nospam> wrote:

> On 2/1/2013 8:58 PM, Kevin McMurtrie wrote:
> 
> >  and the usual parsers (Integer, Long, Float)
> >still only accept a String.  Slow Strings it is.
> >
> 
> This I agree is a bit of a bummer, it would be useful for the parsers to 
> take CharSequence for flexibility.  Integers aren't hard to parse but 
> floats and doubles are non-trivial.
> 
> Note however that Scanner accepts both Readable (a Reader) and 
> ReadableByteChannel in its constructors.
> 
> >
> > Side rant:
> > Sun broke buffer sharing between StringBuffer and String back in Java 5.
> 
> Probably because Strings needed to be immutable and there's no way to do 
> that when sharing a mutable buffer.

The original StringBuffer class was synchronized, final, and protected 
its internal char[].  There was no way to trick it into altering the 
buffer after turning it into String.  At worst it was the same speed as 
today's StringBuilder.  For the common case of being a single-use 
object, it was much faster.


> > we could still have a very fast
> > StringBuffer.toString().
> 
> Nope, see above.
> 
> >  As a final F-U, none of classes can be
> > extended even through there's no buffer sharing that can be hacked.
> >
> 
> Probably because they don't want you doing stupid broken things, like 
> trying to share buffers between immutable and mutable objects.
> 
> I still agree that CharSequence could be made more useful though, that's 
> a good idea.  Hmmm.
-- 
I will not see posts from Google because I must filter them as spam



#22017

From: markspace <markspace@nospam.nospam>
Date: 2013-02-02 10:56 -0800
Message-ID: <kejngo$lp1$1@dont-email.me>
In reply to: #22007

On 2/2/2013 8:43 AM, Kevin McMurtrie wrote:

> The original StringBuffer class was synchronized, final, and protected
> its internal char[].  There was no way to trick it into altering the
> buffer after turning it into String.


Other than calling append(), you mean?  Maybe StringBuffer also cleared 
its buffer so it couldn't be reused (although I don't see that in the 
docs).  However, I'd bet that current implementations rely on the JIT 
compiler to optimize away unneeded buffer copies.

I'm thinking this does mobile a disservice, however, because JIT might 
be hard to do on small devices.



#22027

From: Kevin McMurtrie <mcmurtrie@pixelmemory.us>
Date: 2013-02-02 14:46 -0800
Message-ID: <510d972c$0$80186$742ec2ed@news.sonic.net>
In reply to: #22017

In article <kejngo$lp1$1@dont-email.me>,
 markspace <markspace@nospam.nospam> wrote:

> On 2/2/2013 8:43 AM, Kevin McMurtrie wrote:
> 
> > The original StringBuffer class was synchronized, final, and protected
> > its internal char[].  There was no way to trick it into altering the
> > buffer after turning it into String.
> 
> 
> Other than calling append(), you mean?  Maybe StringBuffer also cleared 
> its buffer so it couldn't be reused (although I don't see that in the 
> docs), however I'd bet that current implementations rely on JIT compiler 
> to optimize away unneeded buffer copies.
> 
> I'm thinking this does mobile a disservice, however, because JIT might 
> be hard to do on small devices.

The original StringBuffer went to copy-on-write mode after calling 
toString().  You can go read the old code for yourself.  There was no 
JVM trick involved.
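
For anyone without the old sources at hand, here is a toy model of the copy-on-write idea. It is not the historical JDK code: `CowBuffer` and `Snapshot` are hypothetical classes, and `Snapshot` stands in for String because user code cannot build a String that shares an array.

```java
import java.util.Arrays;

// Toy model of a copy-on-write buffer; hypothetical classes, not the old JDK source.
final class CowBuffer {
    private char[] value = new char[16];
    private int count;
    private boolean shared;   // true once a Snapshot points at 'value'

    // Immutable view that shares the buffer's array instead of copying it.
    static final class Snapshot {
        private final char[] value;
        private final int count;

        Snapshot(char[] value, int count) {
            this.value = value;
            this.count = count;
        }

        @Override public String toString() {
            return new String(value, 0, count);
        }
    }

    synchronized CowBuffer append(String s) {
        int needed = count + s.length();
        if (shared || needed > value.length) {
            // Copy only when a Snapshot still points at the array, or it is too small.
            int newLength = needed > value.length
                    ? Math.max(needed, value.length * 2)
                    : value.length;
            value = Arrays.copyOf(value, newLength);
            shared = false;
        }
        s.getChars(0, s.length(), value, count);
        count = needed;
        return this;
    }

    // O(1): hands out the current array and marks it as shared.
    synchronized Snapshot snapshot() {
        shared = true;
        return new Snapshot(value, count);
    }

    public static void main(String[] args) {
        CowBuffer buf = new CowBuffer().append("abc");
        Snapshot first = buf.snapshot();   // no copy here
        buf.append("def");                 // the copy happens here instead
        System.out.println(first + " / " + buf.snapshot());
    }
}
```

In the common single-use case (append, append, convert, discard) no copy ever happens, which is the speed advantage being described; the price is a flag check on every mutation.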

Some future JVMs do have JIT tricks to improve String performance.  It's 
not clear how that would perform or what the side effects would be.  One 
proposed trick was to make the String(String) constructor a no-op.  That 
could have disastrous consequences.  For example, the code below uses 
specific object references as signals.  If the String constructor was a 
no-op, the signal references would be interned constants that are not 
unique.

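   // Needs: import java.util.concurrent.ArrayBlockingQueue;

   // Sentinels are compared by reference (==), so each must be a distinct
   // object and not the interned literal.
   static final String eofMarker = new String("EOF");
   static final String flushMarker = new String("Flush");
   final ArrayBlockingQueue<String> queue =
         new ArrayBlockingQueue<String>(1000);

   void processQueue() throws InterruptedException
   {
      String str;
      // Identity comparison on purpose: a data string "EOF" must not end the loop.
      while ((str = queue.take()) != eofMarker)
      {
         if (str == flushMarker)
         {
            //Flush
         }
         else
         {
            //Process string
         }
      }
   }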
-- 
I will not see posts from Google because I must filter them as spam



#22028

From: markspace <markspace@nospam.nospam>
Date: 2013-02-02 15:31 -0800
Message-ID: <kek7jh$fva$1@dont-email.me>
In reply to: #22027

On 2/2/2013 2:46 PM, Kevin McMurtrie wrote:

> Some future JVMs do have JIT tricks to improve String performance.  It's
> not clear how that would perform or what the side effects would be.  One

The main thing I'd like to see as a "trick" would be to spot when an 
array is not accessed after a copy, thus negating the need for a copy.

String constructor:

   public String( char[] chars ) {
     this.buffer = Arrays.copyOf( chars, chars.length );
   }

Usage:

   public String someMethod() {
     char[] myBuff = ...  // a local variable
     return new String( myBuff );
   }

Spotting that the copy isn't needed because myBuff is local and can't be 
accessed after the return is one obvious optimization.

If this type of analysis is very hard, I can see that the original 
implementation of StringBuffer would be advantageous.  OTOH, it doesn't 
look hard, and I'd bet there are a lot of situations where checking a 
"copy-on-write bit" is a bigger performance hit.



#22042

From: Robert Klemme <shortcutter@googlemail.com>
Date: 2013-02-03 15:09 +0100
Message-ID: <an79diFtf59U1@mid.individual.net>
In reply to: #22027

On 02.02.2013 23:46, Kevin McMurtrie wrote:

> The original StringBuffer went to copy-on-write mode after calling
> toString().  You can go read the old code for yourself.  There was no
> JVM trick involved.

Exactly.

> Some future JVMs do have JIT tricks to improve String performance.

Even today's JVMs have vastly changed GC behavior vs. Java 1.4.  From a 
GC point of view the changed behavior might actually be advantageous 
because a StringBuilder and StringBuffer are typically short lived 
objects (i.e. used to create a String) and they as well as their 
internal char[] will be collected quickly and with low overhead because 
they usually never survive one young GC cycle.  But the created String 
might live longer and so it may actually make sense to have a char[] of 
exactly the length needed for the String (vs. the char[] that was 
allocated by the StringBuilder during construction which could be 
significantly longer than the resulting String - especially if the 
StringBuilder was used multiple times).  If the String doesn't live 
longer, the same reasoning as above applies: there is just a tad more 
garbage created.
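
The size difference is easy to see with a small sketch; the class and method names are made up for illustration, and the initial capacity of 1024 is an arbitrary choice.

```java
// Hypothetical demo of builder capacity versus resulting String length.
class BuilderCapacity {

    // Returns {builder capacity, resulting String length}.
    static int[] capacityVersusLength(int initialCapacity, String content) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        sb.append(content);
        String s = sb.toString();   // copies only the sb.length() chars in use
        return new int[] { sb.capacity(), s.length() };
    }

    public static void main(String[] args) {
        int[] r = capacityVersusLength(1024, "abc");
        System.out.println("builder capacity " + r[0] + ", String length " + r[1]);
    }
}
```

With a shared buffer, the long-lived String would have pinned the full builder array; with the copy, only the characters in use are retained.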

I have to say I trust the engineers at Sun / Oracle to have done their 
homework and measurements.  I do not believe that these changes were 
made lightly, and so I also believe that there is not as much to 
worry about as some debaters suggest.

> It's
> not clear how that would perform or what the side effects would be.  One
> proposed trick was to make the String(String) constructor a no-op.  That
> could have disastrous consequences.  For example, the code below uses
> specific object references as signals.  If the String constructor was a
> no-op, the signal references would be interned constants that are not
> unique.

I believe it is a bad idea to use specific String objects for signaling. 
  An instance of Object is better IMHO.  (Yeah, I know some nasty casts 
will be needed.  But at least you are sure that nothing weird happens 
under the hood.)
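
A minimal sketch of that alternative, assuming a queue typed as Object; the names `SentinelQueue`, `EOF`, `FLUSH`, `process` and `demo` are made up for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical rewrite of the signalling queue using plain Object sentinels.
class SentinelQueue {
    // Plain Object sentinels: identity is all they have, so no interning or
    // String optimisation can ever make them equal to real data.
    static final Object EOF = new Object();
    static final Object FLUSH = new Object();

    // Drains the queue until EOF; returns {strings processed, flushes seen}.
    static int[] process(BlockingQueue<Object> queue) {
        int processed = 0;
        int flushes = 0;
        try {
            Object item;
            while ((item = queue.take()) != EOF) {
                if (item == FLUSH) {
                    flushes++;
                } else {
                    String str = (String) item;   // the cast mentioned above
                    if (str != null) {
                        processed++;
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt for callers
        }
        return new int[] { processed, flushes };
    }

    // Builds a small queue in which the data string "EOF" is ordinary data.
    static int[] demo() {
        BlockingQueue<Object> queue = new ArrayBlockingQueue<Object>(10);
        queue.add("EOF");
        queue.add(FLUSH);
        queue.add("data");
        queue.add(EOF);
        return process(queue);
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " strings processed, " + r[1] + " flushes");
    }
}
```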

Kind regards

	robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/


