Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.java.programmer > #14891
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Newsgroups | comp.lang.java.programmer |
| Subject | Re: Patricia trie vs binary search. |
| References | (2 earlier) <s0m5s7t5gt33plp5m6cn07g4ovehvedfb7@4ax.com> <ADDwr.2826$9Q6.1871@newsfe18.iad> <hr87s756sdaku47lh8sebnegsmskclu637@4ax.com> <jq1kpt$1ee$1@news.albasani.net> <7ss9s7tu1a8v9qbkc3qnhfg8rhck3ov4bi@4ax.com> |
| Message-ID | <Gb7xr.31449$TC4.15127@newsfe14.iad> (permalink) |
| Date | 2012-05-29 09:55 -0700 |
On 5/29/12 9:14 AM, Gene Wirchenko wrote: > On Mon, 28 May 2012 21:54:39 -0700, Lew<noone@lewscanon.com> wrote: > >> On 05/28/2012 09:20 AM, Gene Wirchenko wrote: >>> On Sun, 27 May 2012 22:00:14 -0700, Daniel Pitts >>> <newsgroup.nospam@virtualinfinity.net> wrote: >>> >>>> On 5/27/12 6:44 PM, Gene Wirchenko wrote: > > [snip] > >>> These are not particularly extreme examples. >> >> It's not a question of how extreme the examples are but how many there are. > > I mentioned extremity just to point out that such base words are > not that unusual. There are a lot of them. > >> Not all words can be legitimately affixed. Many can be affixed by algorithm, >> or by bitmaps as to which affixes apply, so you only store the root, the >> bitmap and perhaps one more form. > > There are multiple affixes. I suppose they could be combined > into aggregate affixes. e.g. -less + -ness -> -lessness. > >> I don't know how much memory expansion you think your factors will cause, as >> you only hand wave and say there will be some and act like it's a problem, but >> let's say it doubles the size of the dictionary. By Daniel's experiment > > Let's not make up data. I would like to know how much of the > English language actually is in his dataset. Agreed, let's not make up data. Using the Moby word-list as a guide, it includes a significant number of those particular suffixes and prefixes you've mentioned. Granted, its not a complete word-list, but then again nothing is. It is "complete enough" for most purposes. > >> upthread, that would bring it to around 8 MiB, let's round and say 10MiB. >> Being text and all, that should compress to about 3 MiB or less. >> >> Now I am interested to hear what sort of trouble you assert that 3 MiB or so >> of storage will cause. > > Assuming your facts and then calling me on what you made up? > > http://oxforddictionaries.com/words/how-many-words-are-there-in-the-english-language > is short but interesting reading. Its final paragraph: "This suggests > that there are, at the very least, a quarter of a million distinct > English words, excluding inflections, and words from technical and > regional vocabulary not covered by the OED, or words not yet added to > the published dictionary, of which perhaps 20 per cent are no longer > in current use. If distinct senses were counted, the total would > probably approach three quarters of a million." > > Now, add on what they exclude. Is one million words out of line? > Itis one thing to have some of a language encoded. It is quite > another to try to handle everything. Exceptional cases tend to be > more difficult to handle. Are you arguing that a modern system can't handle that number of words? A modern desktop has more than enough memory to easily handle a quarter *billion* words, which is a 100 times greater than your guessed upper limit. And that's *without* compression.
Back to comp.lang.java.programmer | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
Patricia trie vs binary search. markspace <-@.> - 2012-05-24 16:07 -0700
Re: Patricia trie vs binary search. glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2012-05-24 23:39 +0000
Re: Patricia trie vs binary search. markspace <-@.> - 2012-05-24 17:56 -0700
Password quality (Was: Patricia trie vs binary search.) Lew <lewbloch@gmail.com> - 2012-05-25 09:41 -0700
Re: Password quality (Was: Patricia trie vs binary search.) markspace <-@.> - 2012-05-25 12:17 -0700
Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-26 17:30 -0700
Re: Patricia trie vs binary search. markspace <-@.> - 2012-05-26 18:17 -0700
Re: Patricia trie vs binary search. Gene Wirchenko <genew@ocis.net> - 2012-05-27 18:44 -0700
Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-27 22:00 -0700
Re: Patricia trie vs binary search. markspace <-@.> - 2012-05-28 08:20 -0700
Re: Patricia trie vs binary search. markspace <-@.> - 2012-05-28 14:38 -0700
Re: Patricia trie vs binary search. Gene Wirchenko <genew@ocis.net> - 2012-05-28 09:20 -0700
Re: Patricia trie vs binary search. Lew <noone@lewscanon.com> - 2012-05-28 21:54 -0700
Re: Patricia trie vs binary search. Gene Wirchenko <genew@ocis.net> - 2012-05-29 09:14 -0700
Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-29 09:55 -0700
Re: Patricia trie vs binary search. Gene Wirchenko <genew@ocis.net> - 2012-05-29 11:17 -0700
Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-29 11:22 -0700
Re: Patricia trie vs binary search. Gene Wirchenko <genew@ocis.net> - 2012-05-29 14:44 -0700
Re: Patricia trie vs binary search. Lew <lewbloch@gmail.com> - 2012-05-29 14:03 -0700
Re: Patricia trie vs binary search. Gene Wirchenko <genew@ocis.net> - 2012-05-29 14:49 -0700
Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-29 15:23 -0700
Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-29 15:39 -0700
Re: Patricia trie vs binary search. Gene Wirchenko <genew@ocis.net> - 2012-05-29 16:08 -0700
Re: Patricia trie vs binary search. Lew <lewbloch@gmail.com> - 2012-05-29 18:25 -0700
Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-29 09:16 -0700
Re: Patricia trie vs binary search. Jeff Higgins <jeff@invalid.invalid> - 2012-05-29 13:37 -0400
Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-29 10:49 -0700
Re: Patricia trie vs binary search. Jeff Higgins <jeff@invalid.invalid> - 2012-05-29 13:58 -0400
Re: Patricia trie vs binary search. Jeff Higgins <jeff@invalid.invalid> - 2012-05-29 14:20 -0400
Re: Patricia trie vs binary search. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-29 11:21 -0700
Re: Patricia trie vs binary search. Jeff Higgins <jeff@invalid.invalid> - 2012-05-29 14:29 -0400
Re: Patricia trie vs binary search. Jeff Higgins <jeff@invalid.invalid> - 2012-05-29 15:00 -0400
Re: Patricia trie vs binary search. Jeff Higgins <jeff@invalid.invalid> - 2012-05-29 09:24 -0400
csiph-web