| Started by | jlp <jlp@jlp.com> |
|---|---|
| First post | 2013-02-01 19:33 +0100 |
| Last post | 2013-02-03 15:09 +0100 |
| Articles | 14 — 5 participants |
String.substring in JDK 1.7.0_6+ jlp <jlp@jlp.com> - 2013-02-01 19:33 +0100
Re: String.substring in JDK 1.7.0_6+ markspace <markspace@nospam.nospam> - 2013-02-01 10:38 -0800
Re: String.substring in JDK 1.7.0_6+ jlp <jlp@jlp.com> - 2013-02-01 19:42 +0100
Re: String.substring in JDK 1.7.0_6+ markspace <markspace@nospam.nospam> - 2013-02-01 10:45 -0800
Re: String.substring in JDK 1.7.0_6+ jlp <jlp@jlp.com> - 2013-02-01 19:57 +0100
Re: String.substring in JDK 1.7.0_6+ markspace <markspace@nospam.nospam> - 2013-02-01 11:20 -0800
Re: String.substring in JDK 1.7.0_6+ Jan Burse <janburse@fastmail.fm> - 2013-02-01 20:34 +0100
Re: String.substring in JDK 1.7.0_6+ Kevin McMurtrie <mcmurtrie@pixelmemory.us> - 2013-02-01 20:58 -0800
Re: String.substring in JDK 1.7.0_6+ markspace <markspace@nospam.nospam> - 2013-02-01 22:55 -0800
Re: String.substring in JDK 1.7.0_6+ Kevin McMurtrie <mcmurtrie@pixelmemory.us> - 2013-02-02 08:43 -0800
Re: String.substring in JDK 1.7.0_6+ markspace <markspace@nospam.nospam> - 2013-02-02 10:56 -0800
Re: String.substring in JDK 1.7.0_6+ Kevin McMurtrie <mcmurtrie@pixelmemory.us> - 2013-02-02 14:46 -0800
Re: String.substring in JDK 1.7.0_6+ markspace <markspace@nospam.nospam> - 2013-02-02 15:31 -0800
Re: String.substring in JDK 1.7.0_6+ Robert Klemme <shortcutter@googlemail.com> - 2013-02-03 15:09 +0100
| From | jlp <jlp@jlp.com> |
|---|---|
| Date | 2013-02-01 19:33 +0100 |
| Subject | String.substring in JDK 1.7.0_6+ |
| Message-ID | <510c0a6a$0$8985$ba4acef3@reader.news.orange.fr> |
The String class was modified in JDK 1.7.0_6. String.substring, which was O(1) before JDK 1.7.0_6, is now O(n).

It is all well explained at:
http://java-performance.info/changes-to-string-java-1-7-0_06/

I wrote a small test:
https://gist.github.com/4692960

java -Xms128M -Xmx128M teststring.Main 100000 1000000

On my desktop:
jdk 1.7.0_11 => 33 seconds / 252 KBytes memory
jdk 1.6.0_38 => 25 milliseconds / 782 KBytes memory
More than 1000 times faster! (OK, for this stupid test ;-) )

I don't think it is a good improvement! It uses less memory, but you get that memory back anyway when the object is garbage collected.
What do you think about this?

--
Regards
Jean-Louis Pasturel
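
The gist itself is not reproduced in this thread, so the following is only a hypothetical sketch of the kind of loop that is sensitive to the change, not the actual test. `SubstringLoopSketch` and its method names are made up for illustration. Repeatedly calling `substring(1)` was cheap while substrings shared the parent buffer; once every call copies the remaining characters, the total work grows quadratically with the input length. An index-based loop gives the same result without creating any substrings.

```java
// Hypothetical illustration; this is not the code from the gist linked above.
final class SubstringLoopSketch {

    // Counts occurrences of c by repeatedly dropping the first character.
    // With buffer sharing (before 7u6) each substring call is O(1);
    // with copying (7u6 and later) each call copies the remaining characters.
    static int countByPeeling(String s, char c) {
        int count = 0;
        while (!s.isEmpty()) {
            if (s.charAt(0) == c) {
                count++;
            }
            s = s.substring(1);
        }
        return count;
    }

    // Same result using an index, so no substring is created at all.
    static int countByIndex(String s, char c) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == c) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "a,b,,c,d";
        System.out.println(countByPeeling(text, ',')); // 4
        System.out.println(countByIndex(text, ','));   // 4
    }
}
```

Both methods return the same count; only the amount of copying differs between JDK versions, which is why code of the first shape can slow down after an upgrade.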
| From | markspace <markspace@nospam.nospam> |
|---|---|
| Date | 2013-02-01 10:38 -0800 |
| Message-ID | <keh22t$de$1@dont-email.me> |
| In reply to | #21962 |
On 2/1/2013 10:33 AM, jlp wrote:
> What do you think about this ?

I think micro-benchmarks don't work.
| From | jlp <jlp@jlp.com> |
|---|---|
| Date | 2013-02-01 19:42 +0100 |
| Message-ID | <510c0c7c$0$1368$ba4acef3@reader.news.orange.fr> |
| In reply to | #21963 |
On 01/02/2013 19:38, markspace wrote:
> On 2/1/2013 10:33 AM, jlp wrote:
>> What do you think about this ?
>
> I think micro-benchmarks don't work.

What is wrong with this test?

--
Regards
Jean-Louis Pasturel
| From | markspace <markspace@nospam.nospam> |
|---|---|
| Date | 2013-02-01 10:45 -0800 |
| Message-ID | <keh2g5$3am$1@dont-email.me> |
| In reply to | #21964 |
On 2/1/2013 10:42 AM, jlp wrote:
> On 01/02/2013 19:38, markspace wrote:
>> On 2/1/2013 10:33 AM, jlp wrote:
>>> What do you think about this ?
>>
>> I think micro-benchmarks don't work.
>
> What is wrong in this test ?

It's a micro-benchmark.
| From | jlp <jlp@jlp.com> |
|---|---|
| Date | 2013-02-01 19:57 +0100 |
| Message-ID | <510c1023$0$9019$ba4acef3@reader.news.orange.fr> |
| In reply to | #21965 |
On 01/02/2013 19:45, markspace wrote:
> On 2/1/2013 10:42 AM, jlp wrote:
>> On 01/02/2013 19:38, markspace wrote:
>>> On 2/1/2013 10:33 AM, jlp wrote:
>>>> What do you think about this ?
>>>
>>> I think micro-benchmarks don't work.
>>
>> What is wrong in this test ?
>
> It's a micro-benchmark.

OK ;-)
But it seems problematic in the "real world":
https://jira.springsource.org/browse/SPR-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:changehistory-tabpanel
http://grokbase.com/t/gg/scala-user/131993ttrq/why-is-string-grouped-so-slow

--
Regards
Jean-Louis Pasturel
| From | markspace <markspace@nospam.nospam> |
|---|---|
| Date | 2013-02-01 11:20 -0800 |
| Message-ID | <keh4hf$ic7$1@dont-email.me> |
| In reply to | #21966 |
On 2/1/2013 10:57 AM, jlp wrote:
> But it seems problematic in "real world"
> https://jira.springsource.org/browse/SPR-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:changehistory-tabpanel

That's a better test, using production code. However, note that it's scoped to only one small portion of the code, only while loading a lot of small scripts.

It's really common for code that was once "working" to suddenly develop undesirable characteristics as it's exposed to new input or new environments or new anything. It's just something that happens and part of the normal maintenance of code.

What do I think of it? It's normal.
| From | Jan Burse <janburse@fastmail.fm> |
|---|---|
| Date | 2013-02-01 20:34 +0100 |
| Message-ID | <keh5bl$kq8$1@news.albasani.net> |
| In reply to | #21962 |
There seems to be more going on with Strings:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6962931

But I didn't find the original change request (CR) or request for enhancement concerning the copy semantics. There was some estimate on applications impact and allotment for redesigning applications.

I didn't know about the hash32 thing in the link below. Your micro-benchmark doesn't test hashCode, does it?

jlp wrote:
> The String class was modified in JDK 1.7.0_6.
> String.substring that was O(1) before JDK 1.7.0_6, now becomes O(n)
>
> All is well explained at :
> http://java-performance.info/changes-to-string-java-1-7-0_06/
>
> I wrote a small test:
> https://gist.github.com/4692960
>
> java -Xms128M -Xmx128M teststring.Main 100000 1000000
>
> On my desktop:
> jdk 1.7.0_11 => 33 seconds / 252 KBytes Memory
> jdk 1.6.0_38 => 25 milliseconds / 782 KBytes Memory
> more than 1000 times faster ! ( Ok! for this stupid test ;-) )
>
> I don't think it is a good improvement ! Uses less memory, but you
> retrieve it, when the object is garbaged
> What do you think about this ?
| From | Kevin McMurtrie <mcmurtrie@pixelmemory.us> |
|---|---|
| Date | 2013-02-01 20:58 -0800 |
| Message-ID | <510c9cea$0$80106$742ec2ed@news.sonic.net> |
| In reply to | #21962 |
In article <510c0a6a$0$8985$ba4acef3@reader.news.orange.fr>,
jlp <jlp@jlp.com> wrote:

> The String class was modified in JDK 1.7.0_6.
> String.substring that was O(1) before JDK 1.7.0_6, now becomes O(n)
>
> All is well explained at :
> http://java-performance.info/changes-to-string-java-1-7-0_06/
>
> I wrote a small test:
> https://gist.github.com/4692960
>
> java -Xms128M -Xmx128M teststring.Main 100000 1000000
>
> On my desktop:
> jdk 1.7.0_11 => 33 seconds / 252 KBytes Memory
> jdk 1.6.0_38 => 25 milliseconds / 782 KBytes Memory
> more than 1000 times faster ! ( Ok! for this stupid test ;-) )
>
> I don't think it is a good improvement ! Uses less memory, but you
> retrieve it, when the object is garbaged
> What do you think about this ?

It's an unbelievable change. Buffer sharing in Java 1-6 had the simple workaround of calling the String(String) constructor. There's no workaround for Strings getting much slower in Java 7+.

What Oracle did in Java 7 would only make sense if CharSequence had buffer sharing and better support. I just checked Java 7, and CharSequence looks more useless than ever. String.subSequence() allocates a new char[] and the usual parsers (Integer, Long, Float) still only accept a String. Slow Strings it is.

Side rant:
Sun broke buffer sharing between StringBuffer and String back in Java 5. The reason was so that the AbstractStringBuilder class could support the implementations for both StringBuffer and StringBuilder. Had they kept the implementations split, we could still have a very fast StringBuffer.toString(). As a final F-U, none of the classes can be extended even though there's no buffer sharing that can be hacked.

--
I will not see posts from Google because I must filter them as spam
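
To make the remark about CharSequence and buffer sharing concrete, here is a minimal sketch of what a buffer-sharing view could look like in user code. `StringSlice` is a hypothetical class, not part of the JDK. Its `subSequence` is O(1) because it only stores offsets into the original String. It has the same drawback as the old shared-buffer substring (a small slice keeps the whole source String reachable), and, as noted in the post, the standard parsers would still need `toString()`, which copies.

```java
// Hypothetical class: a read-only view over part of a String that shares
// the original characters instead of copying them.
final class StringSlice implements CharSequence {
    private final String source;
    private final int start; // inclusive offset into source
    private final int end;   // exclusive offset into source

    StringSlice(String source, int start, int end) {
        if (start < 0 || end > source.length() || start > end) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
        this.source = source;
        this.start = start;
        this.end = end;
    }

    @Override
    public int length() {
        return end - start;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        return source.charAt(start + index);
    }

    // O(1): creates another view over the same source, no characters copied.
    @Override
    public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > length() || from > to) {
            throw new IndexOutOfBoundsException("from=" + from + ", to=" + to);
        }
        return new StringSlice(source, start + from, start + to);
    }

    // Copies the viewed range; only needed when a real String is required.
    @Override
    public String toString() {
        return source.substring(start, end);
    }
}
```

For example, `new StringSlice("hello world", 6, 11)` views the characters `world`, and calling `subSequence(1, 3)` on it yields a view of `or` without allocating a new character array.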
| From | markspace <markspace@nospam.nospam> |
|---|---|
| Date | 2013-02-01 22:55 -0800 |
| Message-ID | <keid8j$2og$1@dont-email.me> |
| In reply to | #21999 |
On 2/1/2013 8:58 PM, Kevin McMurtrie wrote:

> and the usual parsers (Integer, Long, Float)
> still only accept a String. Slow Strings it is.

This I agree is a bit of a bummer, it would be useful for the parsers to take CharSequence for flexibility. Integers aren't hard to parse but floats and doubles are non-trivial.

Note however that Scanner accepts both Readable (a Reader) and ReadableByteChannel in its constructors.

> Side rant:
> Sun broke buffer sharing between StringBuffer and String back in Java 5.

Probably because Strings needed to be immutable and there's no way to do that when sharing a mutable buffer.

> we could still have a very fast
> StringBuffer.toString().

Nope, see above.

> As a final F-U, none of classes can be
> extended even through there's no buffer sharing that can be hacked.

Probably because they don't want you doing stupid broken things, like trying to share buffers between immutable and mutable objects.

I still agree that CharSequence could be made more useful though, that's a good idea. Hmmm.
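
As an illustration of the point that integers are not hard to parse, here is a minimal sketch of a decimal parser that reads directly from any CharSequence (String, StringBuilder, CharBuffer and so on) without first building a String. `CharSequenceParsing.parseInt` is a hypothetical helper, not a JDK method, and it deliberately handles only ASCII digits with an optional sign.

```java
// Hypothetical helper; handles ASCII decimal digits with an optional sign.
final class CharSequenceParsing {

    static int parseInt(CharSequence cs) {
        int len = cs.length();
        if (len == 0) {
            throw new NumberFormatException("empty input");
        }
        int i = 0;
        boolean negative = false;
        char first = cs.charAt(0);
        if (first == '-' || first == '+') {
            negative = (first == '-');
            i = 1;
            if (len == 1) {
                throw new NumberFormatException("sign without digits");
            }
        }
        // Accumulate as a negative number so that Integer.MIN_VALUE fits.
        int limit = negative ? Integer.MIN_VALUE : -Integer.MAX_VALUE;
        int multMin = limit / 10;
        int result = 0;
        for (; i < len; i++) {
            int digit = cs.charAt(i) - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit at index " + i);
            }
            if (result < multMin) {
                throw new NumberFormatException("overflow");
            }
            result *= 10;
            if (result < limit + digit) {
                throw new NumberFormatException("overflow");
            }
            result -= digit;
        }
        return negative ? result : -result;
    }

    public static void main(String[] args) {
        System.out.println(parseInt(new StringBuilder("-17"))); // -17
        System.out.println(parseInt("2147483647"));             // 2147483647
    }
}
```

Floating-point parsing is a different matter, as the post says: correct rounding of decimal input to the nearest double is considerably more involved than this loop.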
| From | Kevin McMurtrie <mcmurtrie@pixelmemory.us> |
|---|---|
| Date | 2013-02-02 08:43 -0800 |
| Message-ID | <510d4249$0$80118$742ec2ed@news.sonic.net> |
| In reply to | #22000 |
In article <keid8j$2og$1@dont-email.me>,
markspace <markspace@nospam.nospam> wrote:

> On 2/1/2013 8:58 PM, Kevin McMurtrie wrote:
>
> > and the usual parsers (Integer, Long, Float)
> > still only accept a String. Slow Strings it is.
>
> This I agree is a bit of a bummer, it would be useful for the parsers to
> take CharSequence for flexibility. Integers aren't hard to parse but
> floats and doubles are non-trivial.
>
> Note however that Scanner accepts both Readable (a Reader) and
> ReadableByteChannel in its constructors.
>
> > Side rant:
> > Sun broke buffer sharing between StringBuffer and String back in Java 5.
>
> Probably because Strings needed to be immutable and there's no way to do
> that when sharing a mutable buffer.

The original StringBuffer class was synchronized, final, and protected its internal char[]. There was no way to trick it into altering the buffer after turning it into String. At worst it was the same speed as today's StringBuilder. For the common case of being a single-use object, it was much faster.

> > we could still have a very fast
> > StringBuffer.toString().
>
> Nope, see above.
>
> > As a final F-U, none of classes can be
> > extended even through there's no buffer sharing that can be hacked.
>
> Probably because they don't want you doing stupid broken things, like
> trying to share buffers between immutable and mutable objects.
>
> I still agree that CharSequence could be made more useful though, that's
> a good idea. Hmmm.

--
I will not see posts from Google because I must filter them as spam
| From | markspace <markspace@nospam.nospam> |
|---|---|
| Date | 2013-02-02 10:56 -0800 |
| Message-ID | <kejngo$lp1$1@dont-email.me> |
| In reply to | #22007 |
On 2/2/2013 8:43 AM, Kevin McMurtrie wrote:

> The original StringBuffer class was synchronized, final, and protected
> its internal char[]. There was no way to trick it into altering the
> buffer after turning it into String.

Other than calling append(), you mean? Maybe StringBuffer also cleared its buffer so it couldn't be reused (although I don't see that in the docs); however, I'd bet that current implementations rely on the JIT compiler to optimize away unneeded buffer copies.

I'm thinking this does mobile a disservice, however, because JIT might be hard to do on small devices.
| From | Kevin McMurtrie <mcmurtrie@pixelmemory.us> |
|---|---|
| Date | 2013-02-02 14:46 -0800 |
| Message-ID | <510d972c$0$80186$742ec2ed@news.sonic.net> |
| In reply to | #22017 |
In article <kejngo$lp1$1@dont-email.me>,
markspace <markspace@nospam.nospam> wrote:
> On 2/2/2013 8:43 AM, Kevin McMurtrie wrote:
>
> > The original StringBuffer class was synchronized, final, and protected
> > its internal char[]. There was no way to trick it into altering the
> > buffer after turning it into String.
>
>
> Other than calling append(), you mean? Maybe StringBuffer also cleared
> its buffer so it couldn't be reused (although I don't see that in the
> docs), however I'd bet that current implementations rely on JIT compiler
> to optimize away unneeded buffer copies.
>
> I'm thinking this does mobile a disservice, however, because JIT might
> be hard to do on small devices.
The original StringBuffer went to copy-on-write mode after calling
toString(). You can go read the old code for yourself. There was no
JVM trick involved.
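
The copy-on-write behaviour described above can be sketched as follows. This is not the old JDK source: `CowBuffer` and `FrozenChars` are hypothetical stand-ins, and `FrozenChars` plays the role of String because user code cannot make a real String adopt a char[] without copying. The idea is that freezing hands over the array without a copy and only sets a flag; the copy happens later, and only if the buffer is written to again.

```java
import java.util.Arrays;

// Hypothetical immutable snapshot that shares the buffer's array.
final class FrozenChars {
    private final char[] value; // never written after construction
    private final int count;

    FrozenChars(char[] value, int count) {
        this.value = value;
        this.count = count;
    }

    @Override
    public String toString() {
        return new String(value, 0, count);
    }
}

// Hypothetical copy-on-write buffer; not the JDK implementation.
final class CowBuffer {
    private char[] value = new char[16];
    private int count;
    private boolean shared; // true once value has been handed to a snapshot

    // Copies the array before a write if it is shared or too small.
    private void prepareWrite(int minCapacity) {
        if (shared || minCapacity > value.length) {
            int newLength = minCapacity > value.length
                    ? Math.max(minCapacity, value.length * 2)
                    : value.length;
            value = Arrays.copyOf(value, newLength);
            shared = false;
        }
    }

    synchronized CowBuffer append(char c) {
        prepareWrite(count + 1);
        value[count++] = c;
        return this;
    }

    synchronized void setCharAt(int index, char c) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        prepareWrite(count);
        value[index] = c;
    }

    // No copy here; the copy is deferred until the next write, if any.
    synchronized FrozenChars freeze() {
        shared = true;
        return new FrozenChars(value, count);
    }
}
```

In the common single-use case (append, freeze once, discard the buffer) no copy is ever made, which is the speed advantage claimed for the original design. If the buffer is reused, the first write after `freeze()` pays for one copy and earlier snapshots stay unchanged.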
Some future JVMs do have JIT tricks to improve String performance. It's
not clear how that would perform or what the side effects would be. One
proposed trick was to make the String(String) constructor a no-op. That
could have disastrous consequences. For example, the code below uses
specific object references as signals. If the String constructor was a
no-op, the signal references would be interned constants that are not
unique.
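
    import java.util.concurrent.ArrayBlockingQueue;

    // new String(...) yields distinct objects, so the markers can be told
    // apart from queued data by reference comparison (==).
    static final String eofMarker = new String("EOF");
    static final String flushMarker = new String("Flush");
    final ArrayBlockingQueue<String> queue =
        new ArrayBlockingQueue<String>(1000);

    void processQueue() throws InterruptedException
    {
        String str;
        while ((str = queue.take()) != eofMarker)
        {
            if (str == flushMarker)
            {
                //Flush
            }
            else
            {
                //Process string
            }
        }
    }
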
--
I will not see posts from Google because I must filter them as spam
| From | markspace <markspace@nospam.nospam> |
|---|---|
| Date | 2013-02-02 15:31 -0800 |
| Message-ID | <kek7jh$fva$1@dont-email.me> |
| In reply to | #22027 |
On 2/2/2013 2:46 PM, Kevin McMurtrie wrote:
> Some future JVMs do have JIT tricks to improve String performance. It's
> not clear how that would perform or what the side effects would be. One
The main thing I'd like to see as a "trick" would be to spot when an
array is not accessed after a copy, thus negating the need for a copy.
String constructor:

    public String( char[] chars ) {
        this.buffer = Arrays.copyOf( chars, chars.length );
    }

Usage:

    public String someMethod() {
        char[] myBuff = ... // a local variable
        return new String( myBuff );
    }

Spotting that the copy isn't needed because myBuff is local and can't be
accessed after the return is one obvious optimization.
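
For context on why the constructor copies in the first place, here is a small sketch (the class name `DefensiveCopyDemo` is made up for illustration). When the caller keeps a reference to the array and writes to it later, the copy is what keeps the String immutable. The optimization discussed above would only be safe when the runtime can prove that no such later write exists.

```java
// Hypothetical demo class showing the effect of the defensive copy.
final class DefensiveCopyDemo {

    static String buildThenMutate() {
        char[] buf = {'a', 'b', 'c'};
        String s = new String(buf); // the constructor copies buf
        buf[0] = 'x';               // later write to the caller's array
        return s;                   // unaffected by the write above
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate()); // abc
    }
}
```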
If this type of analysis is very hard, I can see the original
implementation of StringBuilder would be advantageous. OTOH, it doesn't
look hard, and I'd bet there's a lot of situations where checking a
"copy-on-write bit" is a bigger performance hit.
| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Date | 2013-02-03 15:09 +0100 |
| Message-ID | <an79diFtf59U1@mid.individual.net> |
| In reply to | #22027 |
On 02.02.2013 23:46, Kevin McMurtrie wrote:

> The original StringBuffer went to copy-on-write mode after calling
> toString(). You can go read the old code for yourself. There was no
> JVM trick involved.

Exactly.

> Some future JVMs do have JIT tricks to improve String performance.

Even today's JVMs have vastly changed GC behavior vs. Java 1.4. From a GC point of view the changed behavior might actually be advantageous, because a StringBuilder and a StringBuffer are typically short-lived objects (i.e. used to create a String), and they as well as their internal char[] will be collected quickly and with low overhead because they usually never survive one young GC cycle. But the created String might live longer, and so it may actually make sense to have a char[] of exactly the length needed for the String (vs. the char[] that was allocated by the StringBuilder during construction, which could be significantly longer than the resulting String - especially if the StringBuilder was used multiple times). If the String doesn't live longer, the same reasoning as above applies - there is just a tad more garbage created.

I have to say I trust the engineers at Sun / Oracle to have done their homework and measurements. I do not believe that these changes were made lightheartedly, and so I also believe that there is not as much to worry about as some debaters suggest.

> It's
> not clear how that would perform or what the side effects would be. One
> proposed trick was to make the String(String) constructor a no-op. That
> could have disastrous consequences. For example, the code below uses
> specific object references as signals. If the String constructor was a
> no-op, the signal references would be interned constants that are not
> unique.

I believe it is a bad idea to use specific String objects for signaling. An instance of Object is better IMHO. (Yeah, I know some nasty casts will be needed. But at least you are sure that nothing weird happens under the hood.)

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
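
A minimal sketch of the suggestion to signal with plain Object instances instead of special String objects. The class and constant names (`SentinelQueueSketch`, `EOF`, `FLUSH`) are made up for illustration, and the single-threaded `demo()` method exists only to keep the example self-contained. Because the sentinels are not Strings, no change to how Strings are constructed or interned can make them collide with queued data, at the price of a cast when taking items off the queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of queue signaling with Object sentinels.
final class SentinelQueueSketch {
    private static final Object EOF = new Object();
    private static final Object FLUSH = new Object();

    // Fills a queue and drains it on the same thread, returning what was
    // processed; a flush is recorded as the text "<flush>".
    static List<String> demo() {
        BlockingQueue<Object> queue = new ArrayBlockingQueue<Object>(1000);
        List<String> processed = new ArrayList<String>();
        try {
            queue.put("first");
            queue.put("EOF");  // ordinary data that happens to read "EOF"
            queue.put(FLUSH);
            queue.put("last");
            queue.put(EOF);

            Object item;
            while ((item = queue.take()) != EOF) {
                if (item == FLUSH) {
                    processed.add("<flush>");
                } else {
                    processed.add((String) item); // the cast mentioned above
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [first, EOF, <flush>, last]
    }
}
```

Note that the data item `"EOF"` passes through as ordinary content; only the `EOF` object ends the loop.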