Path: csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!aioe.org!feeder.news-service.com!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail From: blmblm@myrealbox.com Newsgroups: comp.lang.java.programmer Subject: Re: StringBuilder Difficulties Date: 6 Jul 2011 16:59:04 GMT Organization: None Lines: 173 Message-ID: <97jiioFr6pU2@mid.individual.net> References: <97hd9sFa1jU2@mid.individual.net> X-Trace: individual.net va66QuwuIplvkKmFOoN+EgGiUNnaQRUEk1hmXmUrxJ1Euvz0bF X-Orig-Path: not-for-mail Cancel-Lock: sha1:WRurVcxWrnJvtEmwcHUdoWJ/ck4= X-Newsreader: trn 4.0-test76 (Apr 2, 2001) Xref: x330-a1.tempe.blueboxinc.net comp.lang.java.programmer:5910 In article , Gene Wirchenko wrote: > On 5 Jul 2011 21:16:44 GMT, blmblm@myrealbox.com > wrote: > > >In article , > >Gene Wirchenko wrote: > >> On 5 Jul 2011 19:12:39 GMT, blmblm@myrealbox.com > >> wrote: > >> > >> [snip] > >> > >> >What I found is that HashSet was noticeably faster on all the > >> >systems where I ran the benchmarks. Unless you need for the set > >> >to be sorted (and it's not apparent from your code that you do), > >> >why not .... ? (I'm curious too about why you chose TreeSet in > >> >the first place. ? ) > >> > >> 1) I can output the set in order without having to do anything else. > >> My real program has a lot of debugging info dumping. (Read as "checks > >> that I have not done something wrong".) > > > > > >Ah. Well, yes, then you probably do need a SortedSet, though > >considering that you initially build the set from a string that's in > >order, maybe you could use that (the string) instead. Just sayin', > >maybe. > > I set the string value that way so that the binary search would > work. Ah, okay. > >> 2) When I read "hash", I think "collision", and I get nervous. > >> Nothing I read reassured me that that could not happen. > > >Why would that make you nervous? If you're worried about > >correctness, um, as far as I know hash tables are supposed to > >deal with collisions in some way that preserves the overall "map" > >semantics. Performance may suffer if there are a lot of collisions, > >but -- benchmark on the system(s) of interest and check? > > I would need to know what the behaviour is supposed to be. Maybe there's something I'm not understanding, but as far as I know hash tables (including the Java HashMap class that, according to the API, is used by HashSet) are required to do something that allows storing multiple distinct keys that have the same hash code ("collisions"). The implementation that comes to mind is one that has some sort of list for each possible hashcode value. Obviously(?) performance suffers if very many of these lists have more than one or a few elements, but correctness (in the sense of considering keys to be equal based only on what equals() returns and not on what hashCode() returns) is preserved. I don't think I'm explaining this very well. Maybe the following short test program will do better -- it's a totally contrived example of using a HashSet to store elements that are distinct but have the same hash code: import java.util.*; public class HashSetTest { public static class Foo { public final int n; public Foo(int n) { this.n = n; } @Override public int hashCode() { return 0; } @Override public boolean equals(Object obj) { if (obj instanceof Foo) { return n == ((Foo) obj).n; } else { return false; } } } public static void main(String[] args) { Set s = new HashSet(); // add what should be ten distinct elements for (int i = 0; i < 10; ++i) { s.add(new Foo(i)); } // add what should be duplicate elements for (int i = 0; i < 10; ++i) { s.add(new Foo(i)); } // iterate over all elements of s -- this "foreach" syntax // is possible AFAIK with all classes that implement Collection for (Foo foo : s) { // note use of C-like printf System.out.printf("foo with value %d, hashcode %d\n", foo.n, foo.hashCode()); } } } When I run this I get 10 lines of output, indicating to me that whether two elements are considered duplicates depends on equality as determined by the equals() method rather than on hashcode. Or maybe I'm once again totally misunderstanding you, and your actual concern is something else .... > I > have something that works for now. Optimisation comes after getting > it working. Well, yeah, sure, but this rather seems at odds with your having done benchmarking of various ways of looking up a character in a set of characters. Anyway if at some point it seems that performance of your lookup isn't good enough, it might be worthwhile to explore HashSet, if you can think of some way to get the debugging information you need that doesn't depend on being able to retrieve elements from the set in sorted order. (E.g., maybe you could do a one-time operation that retrieves all the elements, sorts them, and saves the result.) Or, as supercali* suggests in another post, you could use a LinkedHashSet. Replace HashSet with LinkedHashSet in the above test program to see the difference. This would require you to build your set from elements in the right order, but assuming building the set is a one-time thing, it should be easy enough to sort the elements before adding them, or so I would think. > >> 3) I had to pick something. If it works, I can change it later. If > >> it does not, I have not solved my problem yet. The former is safer. > > >Sure. Then again, if you were only concerned about getting something > >that works, why try various alternatives .... > > 1) Just in case there was a BIG difference. In my experiments HashSet was about twice as fast as TreeSet. Of course if you needed the sortedness, HashSet is off the table, but LinkedHashSet (which I did not know about!) seems like it would do what you need and is about as fast as HashSet on the systems where I tried it. The code using java.util.regex.* wasn't as fast on the old system where I did my initial experiments, but on a newer system it seems to give performance comparable to (Linked)HashSet. Just sayin'. > 2) To learn more > about Java. At some point though, I have to do something useful with > it. Sure. I'm dimly aware that I'm probably flogging a horse that has long since shuffled off this mortal coil. > My preprocessor is shaping up nicely. I have to write the symbol > define command code, handle the include command, and handle output to > a file. That is about it. Which means that there are probably only > about three other things that I will come up. And then you can go back to coding in a language you like better? :-) -- B. L. Massingill ObDisclaimer: I don't speak for my employers; they return the favor.