Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!news.glorb.com!npeer01.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
From: Lew <lewbloch@gmail.com>
Newsgroups: comp.lang.java.programmer
Subject: Password quality (Was: Patricia trie vs binary search.)
Date: Fri, 25 May 2012 09:41:25 -0700 (PDT)
Organization: http://groups.google.com
Lines: 61
Message-ID: <a076f3cb-d062-41e1-9a9a-43de60de82fb@googlegroups.com>
References: <jpmev4$63l$1@dont-email.me> <jpmgrb$o5a$1@speranza.aioe.org> <jpmlbf$a3o$1@dont-email.me>
NNTP-Posting-Host: 69.28.149.29
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-Trace: posting.google.com 1337964086 25933 127.0.0.1 (25 May 2012 16:41:26 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Fri, 25 May 2012 16:41:26 +0000 (UTC)
In-Reply-To: <jpmlbf$a3o$1@dont-email.me>
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=69.28.149.29; posting-account=CP-lKQoAAAAGtB5diOuGlDQk0jIwmH0T
User-Agent: G2/1.0
X-Received-Bytes: 3560
Xref: csiph.com comp.lang.java.programmer:14795

markspace wrote:
> Thanks for those pointers (pun not intended).  The "sub-string search"=20
> was referring to was picking words out of a longer, undelimited string.
>=20
> The example I posted looked for bad passwords within a larger password.
>=20
> "passxword" contains at least two bad passwords, pass and word,=20
> according to the dictionary file I download from here (that's the=20
> bpw.txt file in the sample code):
>=20
> <http://www.godlikeproductions.com/badpasswords.php>
>=20
> The code I posted find pass, ass, word, and or in the string=20
> "passxword", which are all bad passwords according to that link above.=20
> (I deliberately avoid reporting single letters as bad).

I wonder about eliminating two-letter combinations. How much entropy does t=
hat add (or subtract) from passwords?

It's practicable and arguably more reliable to use passphrases comprising=
=20
all natural words whose entropy exceeds that of a fairly long Mxyzptlk=AE=
=20
key. (Note: "Mxyzptlk" may well pass all your password checks, yet is highl=
y=20
guessable. Equally flawed are other stringlets that pass naive checks,=20
like "XB-17", "UB40" and others.) See=20
http://world.std.com/~reinhold/diceware.html=20
for how to create high-entropy, highly memorable passphrases.

> Practically speaking, hashing appears to be just as fast for a lookup,=20
> but hashing doesn't tell you when to stop, so it's always N^2 for this=20
> sort of problem.  My algorithm should average less, hopefully much less=
=20
> if you have very long input strings.
>=20
> I really don't know what set me off looking at this; it's just a little=
=20
> random experiment I suppose.

Your main question of space- and time-efficient substring matching is=20
a fairly well-studied problem. I don't off the top of my head have better=
=20
answers, but your approach to experiment is viable.

You could instrument (profile) your code over different algorithms and=20
input dataset profiles (that is, relative proportion of long, medium-length=
=20
and short strings, relative proportion of "bad" substrings in the mix, and=
=20
other factors that affect the "shape" of the data your algorithms process).
Algorithm analysis should give you the big O, but not the constant factors=
=20
or where they break for lower /n/.

A lot of algorithm analysis critically depends on what constitutes "average=
"=20
datasets.

--=20
Lew