Path: csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!aioe.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail From: Joshua Cranmer Newsgroups: comp.lang.java.programmer Subject: Re: Passing a Method Name to a Method, Redux Date: Thu, 23 Jun 2011 19:36:53 -0700 Organization: A noiseless patient Spider Lines: 59 Message-ID: References: Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Injection-Date: Fri, 24 Jun 2011 02:36:58 +0000 (UTC) Injection-Info: mx04.eternal-september.org; posting-host="Q8HyEFb0j2lB0WC1MU3ArQ"; logging-data="21324"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+m1lFDYXzt9zvNIHNaVqyv3yfbG0ArDuY=" User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.18) Gecko/20110616 Lightning/1.0b2 Thunderbird/3.1.11 In-Reply-To: Cancel-Lock: sha1:piB6kyIeF/pI671yp+iu/BeWMhk= Xref: x330-a1.tempe.blueboxinc.net comp.lang.java.programmer:5621 On 6/23/2011 4:03 PM, Gene Wirchenko wrote: > Dear Java'ers: > > I have completed my benchmarking. The code is below. Note that > the difference in the bodies between ParseSequentialSearch(), > ParseBinarySearch(), and ParseTreesetSearch() is but one line. I > really would have preferred not having to duplicate the code. > > Oddly, the timings have a LOT of noise in them. In some runs, a > sequential search has out-performed a binary search. Occasionally, a > sequential search has beaten both a binary search and a Treeset > search. The times for sequential searching are only a bit worse than > for binary searching. Treeset searching is about 20% faster. Any > explanations? Here is probably the best implementation: static boolean[] isIdentifierChar; static String slowSearch; static void initIdentifierChars(String validChars) { isIdentifierChar = new boolean[128]; for (char c : validChars.toCharArray()) if (c < 128) isIdentifierChar[c] = true; slowSearch = validChars; } static boolean isValidChar(char c) { if (c >= 128) return slowSearch.indexOf(c) >= 0; return isIdentifieChar[c]; } If it's really performance critical, you can just make the character array the full 64K characters and skip the search, so testing for valid identifiers is simply an index lookup. There are other performance improvements: 1. Use StringBuilder. It's mutable, so it avoids copies (kind of like ArrayList). Heck, that's what your code is doing, it just keeps copying back and forth between strings. 2. It's probably better to throw out StringBuilder and use token extents (i.e., from index 5 to 9 is this identifier). It saves copying and can allow for better error messages in parsers by being able to pinpoint line/column numbers accurately. But yeah, I would have coded my benchmark like that. Since you seem to be concerned with microoptimization here, anything which looks different from what the real parser implementation would show up as having impact in timing numbers. Finally, I might add, if you're writing a full parser, it might be better to just use an existing Java lexer rather than rolling your own unless you have a *very* good reason not to. -- Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth