Path: csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!aioe.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: Joshua Cranmer <Pidgeot18@verizon.invalid>
Newsgroups: comp.lang.java.programmer
Subject: Re: Passing a Method Name to a Method, Redux
Date: Thu, 23 Jun 2011 19:36:53 -0700
Organization: A noiseless patient Spider
Lines: 59
Message-ID: <iu0t8a$kqc$1@dont-email.me>
References: <fpg7079ca2dtgipdphr8rm234kgmkd1t3l@4ax.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 24 Jun 2011 02:36:58 +0000 (UTC)
Injection-Info: mx04.eternal-september.org; posting-host="Q8HyEFb0j2lB0WC1MU3ArQ"; logging-data="21324"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+m1lFDYXzt9zvNIHNaVqyv3yfbG0ArDuY="
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.18) Gecko/20110616 Lightning/1.0b2 Thunderbird/3.1.11
In-Reply-To: <fpg7079ca2dtgipdphr8rm234kgmkd1t3l@4ax.com>
Cancel-Lock: sha1:piB6kyIeF/pI671yp+iu/BeWMhk=
Xref: x330-a1.tempe.blueboxinc.net comp.lang.java.programmer:5621

On 6/23/2011 4:03 PM, Gene Wirchenko wrote:
> Dear Java'ers:
>
>       I have completed my benchmarking.  The code is below.  Note that
> the difference in the bodies between ParseSequentialSearch(),
> ParseBinarySearch(), and ParseTreesetSearch() is but one line.  I
> really would have preferred not having to duplicate the code.
>
>       Oddly, the timings have a LOT of noise in them.  In some runs, a
> sequential search has out-performed a binary search.  Occasionally, a
> sequential search has beaten both a binary search and a Treeset
> search.  The times for sequential searching are only a bit worse than
> for binary searching.  Treeset searching is about 20% faster.  Any
> explanations?

Here is probably the best implementation:

static boolean[] isIdentifierChar;
static String slowSearch;
static void initIdentifierChars(String validChars) {
   isIdentifierChar = new boolean[128];
   for (char c : validChars.toCharArray())
     if (c < 128)
       isIdentifierChar[c] = true;
   slowSearch = validChars;
}

static boolean isValidChar(char c) {
   if (c >= 128)
     return slowSearch.indexOf(c) >= 0;
   return isIdentifieChar[c];
}

If it's really performance critical, you can just make the character 
array the full 64K characters and skip the search, so testing for valid 
identifiers is simply an index lookup.

There are other performance improvements:
1. Use StringBuilder. It's mutable, so it avoids copies (kind of like 
ArrayList). Heck, that's what your code is doing, it just keeps copying 
back and forth between strings.

2. It's probably better to throw out StringBuilder and use token extents 
(i.e., from index 5 to 9 is this identifier). It saves copying and can 
allow for better error messages in parsers by being able to pinpoint 
line/column numbers accurately.

But yeah, I would have coded my benchmark like that. Since you seem to 
be concerned with microoptimization here, anything which looks different 
from what the real parser implementation would show up as having impact 
in timing numbers.

Finally, I might add, if you're writing a full parser, it might be 
better to just use an existing Java lexer rather than rolling your own 
unless you have a *very* good reason not to.

-- 
Beware of bugs in the above code; I have only proved it correct, not 
tried it. -- Donald E. Knuth