Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.java.programmer > #14891

Re: Patricia trie vs binary search.

From Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Newsgroups comp.lang.java.programmer
Subject Re: Patricia trie vs binary search.
References (2 earlier) <s0m5s7t5gt33plp5m6cn07g4ovehvedfb7@4ax.com> <ADDwr.2826$9Q6.1871@newsfe18.iad> <hr87s756sdaku47lh8sebnegsmskclu637@4ax.com> <jq1kpt$1ee$1@news.albasani.net> <7ss9s7tu1a8v9qbkc3qnhfg8rhck3ov4bi@4ax.com>
Message-ID <Gb7xr.31449$TC4.15127@newsfe14.iad> (permalink)
Date 2012-05-29 09:55 -0700

Show all headers | View raw


On 5/29/12 9:14 AM, Gene Wirchenko wrote:
> On Mon, 28 May 2012 21:54:39 -0700, Lew<noone@lewscanon.com>  wrote:
>
>> On 05/28/2012 09:20 AM, Gene Wirchenko wrote:
>>> On Sun, 27 May 2012 22:00:14 -0700, Daniel Pitts
>>> <newsgroup.nospam@virtualinfinity.net>   wrote:
>>>
>>>> On 5/27/12 6:44 PM, Gene Wirchenko wrote:
>
> [snip]
>
>>> These are not particularly extreme examples.
>>
>> It's not a question of how extreme the examples are but how many there are.
>
>       I mentioned extremity just to point out that such base words are
> not that unusual.  There are a lot of them.
>
>> Not all words can be legitimately affixed. Many can be affixed by algorithm,
>> or by bitmaps as to which affixes apply, so you only store the root, the
>> bitmap and perhaps one more form.
>
>       There are multiple affixes.  I suppose they could be combined
> into aggregate affixes.  e.g. -less + -ness ->  -lessness.
>
>> I don't know how much memory expansion you think your factors will cause, as
>> you only hand wave and say there will be some and act like it's a problem, but
>> let's say it doubles the size of the dictionary. By Daniel's experiment
>
>       Let's not make up data.  I would like to know how much of the
> English language actually is in his dataset.
Agreed, let's not make up data. Using the Moby word-list as a guide, it 
includes a significant number of those particular suffixes and prefixes 
you've mentioned. Granted, its not a complete word-list, but then again 
nothing is.  It is "complete enough" for most purposes.

>
>> upthread, that would bring it to around 8 MiB, let's round and say 10MiB.
>> Being text and all, that should compress to about 3 MiB or less.
>>
>> Now I am interested to hear what sort of trouble you assert that 3 MiB or so
>> of storage will cause.
>
>       Assuming your facts and then calling me on what you made up?
>
> http://oxforddictionaries.com/words/how-many-words-are-there-in-the-english-language
> is short but interesting reading.  Its final paragraph: "This suggests
> that there are, at the very least, a quarter of a million distinct
> English words, excluding inflections, and words from technical and
> regional vocabulary not covered by the OED, or words not yet added to
> the published dictionary, of which perhaps 20 per cent are no longer
> in current use. If distinct senses were counted, the total would
> probably approach three quarters of a million."
>
>       Now, add on what they exclude.  Is one million words out of line?
>       Itis one thing to have some of a language encoded.  It is quite
> another to try to handle everything.  Exceptional cases tend to be
> more difficult to handle.

Are you arguing that a modern system can't handle that number of words? 
A modern desktop has more than enough memory to easily handle a quarter 
*billion* words, which is a 100 times greater than your guessed upper limit.

And that's *without* compression.

Back to comp.lang.java.programmer | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Patricia trie vs binary search. markspace <-@.> - 2012-05-24 16:07 -0700
  Re: Patricia trie vs binary search. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2012-05-24 23:39 +0000
    Re: Patricia trie vs binary search. markspace <-@.> - 2012-05-24 17:56 -0700
      Password quality (Was: Patricia trie vs binary search.) Lew <lewbloch@gmail.com> - 2012-05-25 09:41 -0700
        Re: Password quality (Was: Patricia trie vs binary search.) markspace <-@.> - 2012-05-25 12:17 -0700
  Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-26 17:30 -0700
    Re: Patricia trie vs binary search. markspace <-@.> - 2012-05-26 18:17 -0700
    Re: Patricia trie vs binary search. Gene Wirchenko <genew@ocis.net> - 2012-05-27 18:44 -0700
      Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-27 22:00 -0700
        Re: Patricia trie vs binary search. markspace <-@.> - 2012-05-28 08:20 -0700
          Re: Patricia trie vs binary search. markspace <-@.> - 2012-05-28 14:38 -0700
        Re: Patricia trie vs binary search. Gene Wirchenko <genew@ocis.net> - 2012-05-28 09:20 -0700
          Re: Patricia trie vs binary search. Lew <noone@lewscanon.com> - 2012-05-28 21:54 -0700
            Re: Patricia trie vs binary search. Gene Wirchenko <genew@ocis.net> - 2012-05-29 09:14 -0700
              Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-29 09:55 -0700
                Re: Patricia trie vs binary search. Gene Wirchenko <genew@ocis.net> - 2012-05-29 11:17 -0700
                Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-29 11:22 -0700
                Re: Patricia trie vs binary search. Gene Wirchenko <genew@ocis.net> - 2012-05-29 14:44 -0700
                Re: Patricia trie vs binary search. Lew <lewbloch@gmail.com> - 2012-05-29 14:03 -0700
                Re: Patricia trie vs binary search. Gene Wirchenko <genew@ocis.net> - 2012-05-29 14:49 -0700
                Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-29 15:23 -0700
                Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-29 15:39 -0700
                Re: Patricia trie vs binary search. Gene Wirchenko <genew@ocis.net> - 2012-05-29 16:08 -0700
                Re: Patricia trie vs binary search. Lew <lewbloch@gmail.com> - 2012-05-29 18:25 -0700
            Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-29 09:16 -0700
              Re: Patricia trie vs binary search. Jeff Higgins <jeff@invalid.invalid> - 2012-05-29 13:37 -0400
                Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-29 10:49 -0700
                Re: Patricia trie vs binary search. Jeff Higgins <jeff@invalid.invalid> - 2012-05-29 13:58 -0400
                Re: Patricia trie vs binary search. Jeff Higgins <jeff@invalid.invalid> - 2012-05-29 14:20 -0400
                Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-29 11:21 -0700
                Re: Patricia trie vs binary search. Jeff Higgins <jeff@invalid.invalid> - 2012-05-29 14:29 -0400
                Re: Patricia trie vs binary search. Jeff Higgins <jeff@invalid.invalid> - 2012-05-29 15:00 -0400
  Re: Patricia trie vs binary search. Jeff Higgins <jeff@invalid.invalid> - 2012-05-29 09:24 -0400

csiph-web