Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.java.programmer > #14874

Re: Patricia trie vs binary search.

From Lew <noone@lewscanon.com>
Newsgroups comp.lang.java.programmer
Subject Re: Patricia trie vs binary search.
Date 2012-05-28 21:54 -0700
Organization albasani.net
Message-ID <jq1kpt$1ee$1@news.albasani.net> (permalink)
References <jpmev4$63l$1@dont-email.me> <uAewr.45671$On2.38133@newsfe16.iad> <s0m5s7t5gt33plp5m6cn07g4ovehvedfb7@4ax.com> <ADDwr.2826$9Q6.1871@newsfe18.iad> <hr87s756sdaku47lh8sebnegsmskclu637@4ax.com>

Show all headers | View raw


On 05/28/2012 09:20 AM, Gene Wirchenko wrote:
> On Sun, 27 May 2012 22:00:14 -0700, Daniel Pitts
> <newsgroup.nospam@virtualinfinity.net>  wrote:
>
>> On 5/27/12 6:44 PM, Gene Wirchenko wrote:
>>> On Sat, 26 May 2012 17:30:17 -0700, Daniel Pitts
>>> <newsgroup.nospam@virtualinfinity.net>   wrote:
>>>
>>> [snip]
>>>
>>>> I tend to use a Deterministic Finite State Automata for this.  You can
>>>> load the entire English dictionary fairly easily with that scheme. Yes,
>>>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> you use a bit of memory, but unless you're doing this on an embedded
>>>> device, its probably not enough memory to be concerned about.
>>>
>>>        Including all affixes?
>> I suppose it depends on the particular dictionary, but we're only
>> talking a few hundred thousand entries, at least with the Moby
>> word-lists as a base:
>
>       Considering how many affixes can be applied to some words, I find
> that very questionable:
>            self:
>                 *ish
>                 *ishly
>                 un*ish
>                 un*ishly
>                 *less
>                 *lessly
>                 un*less
>                 un*lessly
>            position:
>                 *s
>                 *ed
>                 *al
>                 *ally
>                 re*
>                 re*s
>                 re*ed
>                 *less
>                 mis*
>                 *er
>                 *ers
>            friend
>                 *s
>                 *ly
>                 *liness
>                 be*
>                 be*ing
>                 be*ed
>                 be*er
>                 be*ers
> These are not particularly extreme examples.

It's not a question of how extreme the examples are but how many there are.

Not all words can be legitimately affixed. Many can be affixed by algorithm, 
or by bitmaps as to which affixes apply, so you only store the root, the 
bitmap and perhaps one more form.

I don't know how much memory expansion you think your factors will cause, as 
you only hand wave and say there will be some and act like it's a problem, but 
let's say it doubles the size of the dictionary. By Daniel's experiment 
upthread, that would bring it to around 8 MiB, let's round and say 10MiB. 
Being text and all, that should compress to about 3 MiB or less.

Now I am interested to hear what sort of trouble you assert that 3 MiB or so 
of storage will cause.

-- 
Lew
Honi soit qui mal y pense.
http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg

Back to comp.lang.java.programmer | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Patricia trie vs binary search. markspace <-@.> - 2012-05-24 16:07 -0700
  Re: Patricia trie vs binary search. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2012-05-24 23:39 +0000
    Re: Patricia trie vs binary search. markspace <-@.> - 2012-05-24 17:56 -0700
      Password quality (Was: Patricia trie vs binary search.) Lew <lewbloch@gmail.com> - 2012-05-25 09:41 -0700
        Re: Password quality (Was: Patricia trie vs binary search.) markspace <-@.> - 2012-05-25 12:17 -0700
  Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-26 17:30 -0700
    Re: Patricia trie vs binary search. markspace <-@.> - 2012-05-26 18:17 -0700
    Re: Patricia trie vs binary search. Gene Wirchenko <genew@ocis.net> - 2012-05-27 18:44 -0700
      Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-27 22:00 -0700
        Re: Patricia trie vs binary search. markspace <-@.> - 2012-05-28 08:20 -0700
          Re: Patricia trie vs binary search. markspace <-@.> - 2012-05-28 14:38 -0700
        Re: Patricia trie vs binary search. Gene Wirchenko <genew@ocis.net> - 2012-05-28 09:20 -0700
          Re: Patricia trie vs binary search. Lew <noone@lewscanon.com> - 2012-05-28 21:54 -0700
            Re: Patricia trie vs binary search. Gene Wirchenko <genew@ocis.net> - 2012-05-29 09:14 -0700
              Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-29 09:55 -0700
                Re: Patricia trie vs binary search. Gene Wirchenko <genew@ocis.net> - 2012-05-29 11:17 -0700
                Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-29 11:22 -0700
                Re: Patricia trie vs binary search. Gene Wirchenko <genew@ocis.net> - 2012-05-29 14:44 -0700
                Re: Patricia trie vs binary search. Lew <lewbloch@gmail.com> - 2012-05-29 14:03 -0700
                Re: Patricia trie vs binary search. Gene Wirchenko <genew@ocis.net> - 2012-05-29 14:49 -0700
                Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-29 15:23 -0700
                Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-29 15:39 -0700
                Re: Patricia trie vs binary search. Gene Wirchenko <genew@ocis.net> - 2012-05-29 16:08 -0700
                Re: Patricia trie vs binary search. Lew <lewbloch@gmail.com> - 2012-05-29 18:25 -0700
            Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-29 09:16 -0700
              Re: Patricia trie vs binary search. Jeff Higgins <jeff@invalid.invalid> - 2012-05-29 13:37 -0400
                Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-29 10:49 -0700
                Re: Patricia trie vs binary search. Jeff Higgins <jeff@invalid.invalid> - 2012-05-29 13:58 -0400
                Re: Patricia trie vs binary search. Jeff Higgins <jeff@invalid.invalid> - 2012-05-29 14:20 -0400
                Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-29 11:21 -0700
                Re: Patricia trie vs binary search. Jeff Higgins <jeff@invalid.invalid> - 2012-05-29 14:29 -0400
                Re: Patricia trie vs binary search. Jeff Higgins <jeff@invalid.invalid> - 2012-05-29 15:00 -0400
  Re: Patricia trie vs binary search. Jeff Higgins <jeff@invalid.invalid> - 2012-05-29 09:24 -0400

csiph-web