Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.java.programmer > #19895

Re: optimsed HashMap

From Robert Klemme <shortcutter@googlemail.com>
Newsgroups comp.lang.java.programmer
Subject Re: optimsed HashMap
Date 2012-11-24 16:24 +0100
Message-ID <ahc75eFn0gqU1@mid.individual.net> (permalink)
References <8i70b8d0pm6ibk03ti4t2pv60jd0bctlcs@4ax.com> <5Yidnbg3DrTK2S3NnZ2dnUVZ_uudnZ2d@earthlink.com> <AOKdncX-1L4NPC3NnZ2dnUVZ8k2dnZ2d@bt.com> <p7c1b8dk6uebqjvil7iuo56cq40j0haj94@4ax.com>

Show all headers | View raw


On 24.11.2012 12:39, Roedy Green wrote:
> On Sat, 24 Nov 2012 10:21:14 -0000, "Chris Uppal"
> <chris.uppal@metagnostic.REMOVE-THIS.org> wrote, quoted or indirectly
> quoted someone who said :
>
>> Look into the literature on fast text searching (for instance bit-parallel
>> matching).  It's not entirely clear to me what Roedy is trying to do, but it
>> sounds as if "bulk" matching/searching might be relevant.
>
> Yes a Boyer-Moore to simultaneously search for the whole list of
> words, then when it has a hit see if it has word in isolation rather
> than a word fragment.

Here's another approach:

1. fill a HashMap with the translations.
2. Create a tree or trie from the keys.
3. Convert the trie to a regular expression optimized for NFA automata 
(such as is used in Java std. library).
4. Surround the regexp with additional regexp to ensure word matches and 
probably exclude matching inside HTML tags
5. Scan the document with Matcher.find()

The idea of item 3 is to create a regexp with as little backtracking as 
possible.  For example, from

foo
foot
fuss

you make

(?:f(?:oot?)|uss)

Not sure though whether it is dramatically faster or slower than a 
standard string search like Boyer-Moore - probably not.

Kind regards

	robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Back to comp.lang.java.programmer | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

optimsed HashMap Roedy Green <see_website@mindprod.com.invalid> - 2012-11-23 17:12 -0800
  Re: optimsed HashMap Arne Vajhøj <arne@vajhoej.dk> - 2012-11-23 20:19 -0500
  Re: optimsed HashMap markspace <-@.> - 2012-11-23 17:33 -0800
    Re: optimsed HashMap Roedy Green <see_website@mindprod.com.invalid> - 2012-11-23 22:42 -0800
      Re: optimsed HashMap Roedy Green <see_website@mindprod.com.invalid> - 2012-11-24 03:34 -0800
        Re: optimsed HashMap Knute Johnson <nospam@knutejohnson.com> - 2012-11-24 08:39 -0800
          Re: optimsed HashMap Knute Johnson <nospam@rabbitbrush.frazmtn.com> - 2012-11-24 15:14 -0800
      Re: optimsed HashMap Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 13:24 -0500
      Re: optimsed HashMap markspace <-@.> - 2012-11-24 10:44 -0800
        Re: optimsed HashMap "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-25 13:40 +0000
      Re: optimsed HashMap Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid> - 2012-11-26 22:03 +0100
        Re: optimsed HashMap Robert Klemme <shortcutter@googlemail.com> - 2012-11-26 23:32 +0100
          Re: optimsed HashMap Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid> - 2012-11-27 03:24 +0100
            Re: optimsed HashMap Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid> - 2012-11-27 03:35 +0100
              Re: optimsed HashMap Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-27 08:44 -0500
                Re: optimsed HashMap Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-11-27 14:20 -0800
                Re: optimsed HashMap Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid> - 2012-11-30 03:35 +0100
  Re: optimsed HashMap Patricia Shanahan <pats@acm.org> - 2012-11-23 19:51 -0800
    Re: optimsed HashMap "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-24 10:21 +0000
      Re: optimsed HashMap Roedy Green <see_website@mindprod.com.invalid> - 2012-11-24 03:39 -0800
        Re: optimsed HashMap Robert Klemme <shortcutter@googlemail.com> - 2012-11-24 16:24 +0100
          Re: optimsed HashMap "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-25 13:50 +0000
            Re: optimsed HashMap Robert Klemme <shortcutter@googlemail.com> - 2012-11-25 15:30 +0100
      Re: optimsed HashMap "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-26 21:13 +0000
    Re: optimsed HashMap Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 13:16 -0500
  Re: optimsed HashMap v_borchert@despammed.com (Volker Borchert) - 2012-11-24 08:05 +0000
  Re: optimsed HashMap Silvio <silvio@internet.com> - 2012-11-26 11:57 +0100
  Re: optimsed HashMap Jim Janney <jjanney@shell.xmission.com> - 2012-11-26 11:13 -0700
  Re: optimsed HashMap Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-11-26 15:44 -0800
    Re: optimsed HashMap Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-26 20:28 -0500
      Re: optimsed HashMap Arved Sandstrom <asandstrom2@eastlink.ca> - 2012-11-27 06:01 -0400
        Re: optimsed HashMap Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-27 08:56 -0500
      Re: optimsed HashMap Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-11-27 14:16 -0800

csiph-web