Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.java.programmer > #19984
| Path | csiph.com!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!eternal-september.org!feeder.eternal-september.org!mx04.eternal-september.org!.POSTED!not-for-mail |
|---|---|
| From | Joshua Cranmer <Pidgeot18@verizon.invalid> |
| Newsgroups | comp.lang.java.programmer |
| Subject | Re: SortedMap question? |
| Date | Mon, 26 Nov 2012 14:26:26 -0600 |
| Organization | A noiseless patient Spider |
| Lines | 43 |
| Message-ID | <k90ja3$bg0$1@dont-email.me> (permalink) |
| References | <k8rkud$i3o$1@dont-email.me> <q4qoyde15lpk$.13amx8nbfrj28.dlg@40tude.net> <k8t8dp$tv5$1@dont-email.me> <50b26f67$0$295$14726298@news.sunsite.dk> <k8tsfj$nn5$1@dont-email.me> <50b28b1a$0$289$14726298@news.sunsite.dk> <k8u6fq$i0q$1@dont-email.me> <ydnvccs87wo.fsf@shell.xmission.com> |
| Mime-Version | 1.0 |
| Content-Type | text/plain; charset=UTF-8; format=flowed |
| Content-Transfer-Encoding | 7bit |
| Injection-Date | Mon, 26 Nov 2012 20:26:43 +0000 (UTC) |
| Injection-Info | mx04.eternal-september.org; posting-host="5a9707252ba5efb9bece56d1f4656a90"; logging-data="11776"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1//4bRqPSU63lKwcOhkEJOe6ZsG+cLHWEY=" |
| User-Agent | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/17.0 Thunderbird/17.0 |
| In-Reply-To | <ydnvccs87wo.fsf@shell.xmission.com> |
| Cancel-Lock | sha1:+HBfbE0+WLW8ItzqlkakzfAsj9Y= |
| Xref | csiph.com comp.lang.java.programmer:19984 |
Show key headers only | View raw
On 11/26/2012 1:45 PM, Jim Janney wrote: > Eric Sosman <esosman@comcast-dot-net.invalid> writes: > >> Or look at String#indexOf(String). Some fast algorithms >> like Boyer-Moore have already been mentioned in this thread, yet >> Java's own implementation uses a naive method that is linear at >> best, super-linear if unlucky (I think it can be O(M*N), where M >> and N are the lengths of the two Strings). If the naive method >> were replaced by Boyer-Moore or Knuth-Morris-Pratt or some other >> better-O() algorithm, would you expect performance to improve? > > Boyer-Moore and its descendants invest more time up front building the > jump table, in return for faster searching once the table is built. > That pays off if you're searching for the same pattern across a large > volume of text. But for the common case where strings are relatively > short, I'd expect the brute-force (not naive) approach to be the better > choice. Well, it really pays off when the following conditions hold: 1. The string to search for is long. 2. The string to search for repeats a largeish prefix in itself later (for some of the later descendants) 3. You are likely to find *a lot* of partial matches of the string (so full backtracking measurably hurts). I suspect English words in English text don't satisfy these conditions very well, and I presume that (or similar scenarios) would explain a lot of String.indexOf searches. Where this would pay off would be something like trying to find DNA subsequences: you have a 1-in-4 chance of partially matching a given letter, your target string is likely to be lengthy and have lots of similarities within it, and the text being searched is extremely long. There are other penalties to advanced substring finding algorithms than just the build-the-lookup table penalty. The most important is that using the jump table has a minor microarchitectural penalty. Other penalties include increased amount of memory you need in the cache and increased surface for bugs. -- Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth
Back to comp.lang.java.programmer | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
SortedMap question? Knute Johnson <nospam@rabbitbrush.frazmtn.com> - 2012-11-24 15:23 -0800
Re: SortedMap question? Joerg Meier <joergmmeier@arcor.de> - 2012-11-25 00:57 +0100
Re: SortedMap question? David Lamb <dalamb@cs.queensu.ca> - 2012-11-25 08:49 -0500
Re: SortedMap question? David Lamb <dalamb@cs.queensu.ca> - 2012-11-25 08:59 -0500
Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-25 14:02 +0000
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 14:20 -0500
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 14:24 -0500
Re: SortedMap question? David Lamb <dalamb@cs.queensu.ca> - 2012-11-25 21:25 -0500
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 22:24 -0500
Re: SortedMap question? Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-11-26 15:45 -0800
Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-26 20:29 -0500
Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 09:02 -0500
Re: SortedMap question? Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-25 17:47 +0000
Re: SortedMap question? Knute Johnson <nospam@rabbitbrush.frazmtn.com> - 2012-11-25 16:25 -0800
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 19:36 -0500
Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 22:58 -0500
Re: SortedMap question? Lew <lewbloch@gmail.com> - 2012-11-25 16:53 -0800
Re: SortedMap question? Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-26 02:20 +0000
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 14:20 -0500
Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 14:44 -0500
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 16:18 -0500
Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 17:35 -0500
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 19:58 -0500
Re: SortedMap question? Jim Janney <jjanney@shell.xmission.com> - 2012-11-26 12:45 -0700
Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-26 14:26 -0600
Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 14:16 -0600
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 16:21 -0500
Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 17:07 -0600
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 20:05 -0500
Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-26 01:34 +0000
Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 20:36 -0600
Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-12-15 12:27 +0000
Re: SortedMap question? Robert Klemme <shortcutter@googlemail.com> - 2012-11-25 01:36 +0100
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 20:00 -0500
Re: SortedMap question? markspace <-@.> - 2012-11-24 17:34 -0800
Re: SortedMap question? Robert Klemme <shortcutter@googlemail.com> - 2012-11-25 15:28 +0100
Re: SortedMap question? Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-25 17:49 +0000
Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-25 14:00 +0000
csiph-web