Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.java.programmer > #19984

Re: SortedMap question?

From Joshua Cranmer <Pidgeot18@verizon.invalid>
Newsgroups comp.lang.java.programmer
Subject Re: SortedMap question?
Date 2012-11-26 14:26 -0600
Organization A noiseless patient Spider
Message-ID <k90ja3$bg0$1@dont-email.me> (permalink)
References (3 earlier) <50b26f67$0$295$14726298@news.sunsite.dk> <k8tsfj$nn5$1@dont-email.me> <50b28b1a$0$289$14726298@news.sunsite.dk> <k8u6fq$i0q$1@dont-email.me> <ydnvccs87wo.fsf@shell.xmission.com>

Show all headers | View raw


On 11/26/2012 1:45 PM, Jim Janney wrote:
> Eric Sosman <esosman@comcast-dot-net.invalid> writes:
>
>>      Or look at String#indexOf(String).  Some fast algorithms
>> like Boyer-Moore have already been mentioned in this thread, yet
>> Java's own implementation uses a naive method that is linear at
>> best, super-linear if unlucky (I think it can be O(M*N), where M
>> and N are the lengths of the two Strings).  If the naive method
>> were replaced by Boyer-Moore or Knuth-Morris-Pratt or some other
>> better-O() algorithm, would you expect performance to improve?
>
> Boyer-Moore and its descendants invest more time up front building the
> jump table, in return for faster searching once the table is built.
> That pays off if you're searching for the same pattern across a large
> volume of text.  But for the common case where strings are relatively
> short, I'd expect the brute-force (not naive) approach to be the better
> choice.

Well, it really pays off when the following conditions hold:
1. The string to search for is long.
2. The string to search for repeats a largeish prefix in itself later 
(for some of the later descendants)
3. You are likely to find *a lot* of partial matches of the string (so 
full backtracking measurably hurts).

I suspect English words in English text don't satisfy these conditions 
very well, and I presume that (or similar scenarios) would explain a lot 
of String.indexOf searches.

Where this would pay off would be something like trying to find DNA 
subsequences: you have a 1-in-4 chance of partially matching a given 
letter, your target string is likely to be lengthy and have lots of 
similarities within it, and the text being searched is extremely long.

There are other penalties to advanced substring finding algorithms than 
just the build-the-lookup table penalty. The most important is that 
using the jump table has a minor microarchitectural penalty. Other 
penalties include increased amount of memory you need in the cache and 
increased surface for bugs.

-- 
Beware of bugs in the above code; I have only proved it correct, not 
tried it. -- Donald E. Knuth

Back to comp.lang.java.programmer | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

SortedMap question? Knute Johnson <nospam@rabbitbrush.frazmtn.com> - 2012-11-24 15:23 -0800
  Re: SortedMap question? Joerg Meier <joergmmeier@arcor.de> - 2012-11-25 00:57 +0100
    Re: SortedMap question? David Lamb <dalamb@cs.queensu.ca> - 2012-11-25 08:49 -0500
      Re: SortedMap question? David Lamb <dalamb@cs.queensu.ca> - 2012-11-25 08:59 -0500
        Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-25 14:02 +0000
          Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 14:20 -0500
        Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 14:24 -0500
          Re: SortedMap question? David Lamb <dalamb@cs.queensu.ca> - 2012-11-25 21:25 -0500
            Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 22:24 -0500
            Re: SortedMap question? Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-11-26 15:45 -0800
              Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-26 20:29 -0500
    Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 09:02 -0500
      Re: SortedMap question? Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-25 17:47 +0000
        Re: SortedMap question? Knute Johnson <nospam@rabbitbrush.frazmtn.com> - 2012-11-25 16:25 -0800
          Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 19:36 -0500
            Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 22:58 -0500
          Re: SortedMap question? Lew <lewbloch@gmail.com> - 2012-11-25 16:53 -0800
          Re: SortedMap question? Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-26 02:20 +0000
      Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 14:20 -0500
        Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 14:44 -0500
          Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 16:18 -0500
            Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 17:35 -0500
              Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 19:58 -0500
              Re: SortedMap question? Jim Janney <jjanney@shell.xmission.com> - 2012-11-26 12:45 -0700
                Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-26 14:26 -0600
        Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 14:16 -0600
          Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 16:21 -0500
            Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 17:07 -0600
              Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 20:05 -0500
          Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-26 01:34 +0000
            Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 20:36 -0600
              Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-12-15 12:27 +0000
  Re: SortedMap question? Robert Klemme <shortcutter@googlemail.com> - 2012-11-25 01:36 +0100
  Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 20:00 -0500
  Re: SortedMap question? markspace <-@.> - 2012-11-24 17:34 -0800
    Re: SortedMap question? Robert Klemme <shortcutter@googlemail.com> - 2012-11-25 15:28 +0100
      Re: SortedMap question? Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-25 17:49 +0000
  Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-25 14:00 +0000

csiph-web