From: Jan Burse
Newsgroups: comp.lang.java.programmer
Subject: Re: StringBuilder
Date: Sat, 17 Sep 2011 20:35:50 +0200

Stanimir Stamenkov wrote:
> Mon, 05 Sep 2011 05:27:15 +0200, /Jan Burse/:
>
>> If you then explicitly use StringBuilder you are
>> faster, because you save the new StringBuilder() and toString().
>>
>> So this is faster, since it uses 1 new and 1 toString():
>
> The StringBuilder.toString() is really fast - that's the point, and I
> don't think it is worth mentioning it.
>

I am not sure I can agree outright. StringBuilder is a mutable
object. String is an immutable object. Therefore the obvious fast
implementation, which would share the buffer between the StringBuilder
and the String, does not work.
With a shared buffer, the following code would break the immutability
of String:

    StringBuilder buf = new StringBuilder();
    buf.append("Hello World!");
    String str = buf.toString();
    buf.replace(6, 11, "Java");
    System.out.println("str=" + str);

As a side effect of buf.replace(), the value of the string str would
change. Therefore we find the following slow implementation of
toString() in the reference implementation. Please note the comment:

    public String toString() {
        // Create a copy, don't share the array
        return new String(value, 0, count);
    }

http://kickjava.com/src/java/lang/StringBuilder.java.htm

And if we look at the constructor used, it really does make a copy.
There is a non-public constructor in String that allows some sharing,
and that is used, for example, to implement substring. But here a
constructor is used that does not share:

    public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        char[] v = new char[count];
        System.arraycopy(value, offset, v, 0, count);
        this.offset = 0;
        this.count = count;
        this.value = v;
    }

http://kickjava.com/src/java/lang/String.java.htm

Possibly some program analysis would allow sharing. But the copying
also has a positive effect. When manipulation has left the
StringBuilder with a much greater capacity than necessary, the copy
creates a smaller char array, so that less space is used as soon as
the StringBuilder is reclaimed.

But maybe you are right that toString() is nevertheless fast, since
a) allocating objects is usually fast and b) System.arraycopy() can
also be fast. Together with the capacity-reducing effect, this could
all add up to only a small overhead.
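The example from above can be run against the public API to confirm
that toString() behaves as an independent snapshot. This is only a
minimal sketch: the class name ToStringSnapshotDemo and the helper
run() are made up for illustration, and the output says nothing about
how a particular JDK achieves the behaviour internally.

```java
// Sketch: the String returned by toString() must not change when the
// builder is modified afterwards. Class and method names are
// hypothetical, chosen only for this illustration.
final class ToStringSnapshotDemo {

    /** Returns { snapshot taken before the edit, builder content after the edit }. */
    static String[] run() {
        StringBuilder buf = new StringBuilder();
        buf.append("Hello World!");
        String str = buf.toString();   // snapshot of the current content
        buf.replace(6, 11, "Java");    // replaces "World" in the builder only
        return new String[] { str, buf.toString() };
    }

    public static void main(String[] args) {
        String[] result = run();
        System.out.println("str=" + result[0]);
        System.out.println("buf=" + result[1]);
    }
}
```

The first line printed is str=Hello World! and the second is
buf=Hello Java!, so the earlier String is unaffected by the later
replace().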
BTW: OpenJDK uses the same code. In Harmony we find a shared flag in
AbstractStringBuilder, and a heuristic that decides whether sharing is
done or not. The non-public String constructor is used for sharing:

    public String toString() {
        if (count == 0) {
            return ""; //$NON-NLS-1$
        }
        // Optimize String sharing for more performance
        int wasted = value.length - count;
        if (wasted >= 256
                || (wasted >= INITIAL_CAPACITY && wasted >= (count >> 1))) {
            return new String(value, 0, count);
        }
        shared = true;
        return new String(0, count, value);
    }

http://www.java2s.com/Open-Source/Java-Document/Apache-Harmony-Java-SE/java-package/java/lang/AbstractStringBuilder.java.htm

There is then a little overhead in the basic operations of
StringBuilder to check for sharing, and if the buffer is shared, a
copy is made:

    final void replace0(int start, int end, String string) {
        [...]
        if (!shared) {
            // index == count case is no-op
            System.arraycopy(value, end, value, start + stringLength,
                    count - end);
        } else {
            char[] newData = new char[value.length];
            System.arraycopy(value, 0, newData, 0, start);
            // index == count case is no-op
            System.arraycopy(value, end, newData, start + stringLength,
                    count - end);
            value = newData;
            shared = false;
        }
        [...]

Probably the gain in speed from sharing compensates for this little
extra check needed everywhere. So toString() is probably relatively
fast here, assuming that sharing happens often enough.

When we look at the loop example, we can positively influence sharing
by giving a good initial capacity, because then the waste is small.
But giving an initial capacity for the whole loop is probably
non-trivial: how does the number of digits of the squares develop?

So assume our StringBuilder grows according to its enlargeBuffer rule.
In the case of Harmony the capacity grows by a factor of 1.5 plus 2.
So right after an enlargement we have waste >= count/2; more
precisely, because of the added 2, we have waste = count/2 + 2. So no
sharing will happen (provided the waste has also reached
INITIAL_CAPACITY, which the same condition requires).
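To make the heuristic easier to play with, here is a small sketch that
restates the quoted condition and the growth rule as pure functions.
It is not Harmony code: the class and method names are made up, and
INITIAL_CAPACITY = 16 is an assumption, since its value does not
appear in the excerpt above. The count == 0 shortcut is left out.

```java
// Sketch of the sharing heuristic from the quoted toString(), written
// as pure functions. Names are hypothetical; INITIAL_CAPACITY = 16 is
// an assumed value that is not shown in the quoted source.
final class SharingHeuristicSketch {

    static final int INITIAL_CAPACITY = 16; // assumption

    /** Growth rule as described in the post: factor 1.5 plus 2. */
    static int enlarge(int capacity) {
        return (capacity >> 1) + capacity + 2;
    }

    /** True if the quoted condition would share the array instead of copying. */
    static boolean wouldShare(int capacity, int count) {
        int wasted = capacity - count;
        boolean copy = wasted >= 256
                || (wasted >= INITIAL_CAPACITY && wasted >= (count >> 1));
        return !copy;
    }

    public static void main(String[] args) {
        int count = 100;               // buffer was full with 100 chars
        int capacity = enlarge(count); // capacity right after enlargement
        for (int n = 0; n <= 3; n++) { // n = characters added afterwards
            System.out.println("n=" + n + " shared="
                    + wouldShare(capacity, count + n));
        }
    }
}
```

For a buffer that was full at 100 characters, the new capacity is 152,
and the sketch reports no sharing for n = 0 and n = 1 and sharing from
n = 2 on.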
When we have then added n characters, we have
waste' = count/2 + 2 - n and count' = count + n. We get
waste' < count'/2 only when 2 - n < n/2, that is, when n > 4/3. So
only after adding 2 characters will sharing happen again for sure (as
long as the waste is below the 256 limit). So the heuristic has a
little glitch. But never mind.

Best Regards