Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!news.glorb.com!npeer01.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail From: Lew Newsgroups: comp.lang.java.programmer Subject: Password quality (Was: Patricia trie vs binary search.) Date: Fri, 25 May 2012 09:41:25 -0700 (PDT) Organization: http://groups.google.com Lines: 61 Message-ID: References: NNTP-Posting-Host: 69.28.149.29 Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Trace: posting.google.com 1337964086 25933 127.0.0.1 (25 May 2012 16:41:26 GMT) X-Complaints-To: groups-abuse@google.com NNTP-Posting-Date: Fri, 25 May 2012 16:41:26 +0000 (UTC) In-Reply-To: Complaints-To: groups-abuse@google.com Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=69.28.149.29; posting-account=CP-lKQoAAAAGtB5diOuGlDQk0jIwmH0T User-Agent: G2/1.0 X-Received-Bytes: 3560 Xref: csiph.com comp.lang.java.programmer:14795 markspace wrote: > Thanks for those pointers (pun not intended). The "sub-string search"=20 > was referring to was picking words out of a longer, undelimited string. >=20 > The example I posted looked for bad passwords within a larger password. >=20 > "passxword" contains at least two bad passwords, pass and word,=20 > according to the dictionary file I download from here (that's the=20 > bpw.txt file in the sample code): >=20 > >=20 > The code I posted find pass, ass, word, and or in the string=20 > "passxword", which are all bad passwords according to that link above.=20 > (I deliberately avoid reporting single letters as bad). I wonder about eliminating two-letter combinations. How much entropy does t= hat add (or subtract) from passwords? It's practicable and arguably more reliable to use passphrases comprising= =20 all natural words whose entropy exceeds that of a fairly long Mxyzptlk=AE= =20 key. (Note: "Mxyzptlk" may well pass all your password checks, yet is highl= y=20 guessable. Equally flawed are other stringlets that pass naive checks,=20 like "XB-17", "UB40" and others.) See=20 http://world.std.com/~reinhold/diceware.html=20 for how to create high-entropy, highly memorable passphrases. > Practically speaking, hashing appears to be just as fast for a lookup,=20 > but hashing doesn't tell you when to stop, so it's always N^2 for this=20 > sort of problem. My algorithm should average less, hopefully much less= =20 > if you have very long input strings. >=20 > I really don't know what set me off looking at this; it's just a little= =20 > random experiment I suppose. Your main question of space- and time-efficient substring matching is=20 a fairly well-studied problem. I don't off the top of my head have better= =20 answers, but your approach to experiment is viable. You could instrument (profile) your code over different algorithms and=20 input dataset profiles (that is, relative proportion of long, medium-length= =20 and short strings, relative proportion of "bad" substrings in the mix, and= =20 other factors that affect the "shape" of the data your algorithms process). Algorithm analysis should give you the big O, but not the constant factors= =20 or where they break for lower /n/. A lot of algorithm analysis critically depends on what constitutes "average= "=20 datasets. --=20 Lew