Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!eu.feeder.erje.net!newsfeed.xs4all.nl!newsfeed2.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
MIME-Version: 1.0
Sender: skip.montanaro@gmail.com
In-Reply-To: <CAPTjJmqg968SLy4ozduegj7hdi6Ox+WKmeSf6P5cc4bvoes4Kg@mail.gmail.com>
References: <CAPTjJmo9W42iRTm=Ro7aP0m1sk4QF7uK3U-UsunZ5VD6DFspQg@mail.gmail.com> <CANc-5UyntLBKUy7JAGoA-sU_V84AMfwfBMfvZtS-GGKtaAM8ZA@mail.gmail.com> <CAPTjJmrQi+E-SzBW8E4NmoZFJDWTO5Xwqca=wMwyw55WFt89iQ@mail.gmail.com> <CANc-5UxRtz_RMo7ex0Bc0wnCwXACZi7iE1NPYuO8rwNHO3oH6w@mail.gmail.com> <CAPTjJmpg9QPTOytus8-Ls9MBHz41oEYd4KbZwkDVh2AWQ8CSrA@mail.gmail.com> <CANc-5Uzpo9YjZH9FSUEC9bF6YvZ=4QkDUXJRfg+W8RvR4PfM_A@mail.gmail.com> <CANc-5Uywe+36K0XQeULNz98KDBQqUHtQWQoUC4MgprrZRvshaA@mail.gmail.com> <CANc-5Uw_sMMCFWpSbfv6r61-tVFf4qsy3r17yG6otRtf2KFv2Q@mail.gmail.com> <CAPTjJmrUK4ohh98jWcHeuh=RKq-5dNSD-PiFFMvthwV7xAYnhg@mail.gmail.com> <CANc-5Uww_rT0fCY-PiHVZ7CfZ-cn=qgDtWvCfmdw_TpSixR+cQ@mail.gmail.com> <CAPTjJmqg968SLy4ozduegj7hdi6Ox+WKmeSf6P5cc4bvoes4Kg@mail.gmail.com>
Date: Thu, 28 Aug 2014 00:38:03 -0500
Subject: Re: PyPI password rules
From: Skip Montanaro <skip@pobox.com>
To: Chris Angelico <rosuav@gmail.com>
Content-Type: multipart/alternative; boundary=089e01494906c1cb990501a9f02e
Cc: "python-list@python.org" <python-list@python.org>
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.13541.1409204287.18130.python-list@python.org>
Lines: 50
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:77183

--089e01494906c1cb990501a9f02e
Content-Type: text/plain; charset=UTF-8

On Thu, Aug 28, 2014 at 12:08 AM, Chris Angelico <rosuav@gmail.com> wrote:

> Interesting. I suspect this may have issues, as you're doing these
> checks progressively; something that's common in the early posts will
> be weighted without regard to subsequent posts (you're requiring 100
> unique words before recording anything, but that's still not all that
> many).
>

I'm not really that worried about it. The number of words and their counts
grow rapidly, so the risk of choosing "common" words which aren't actually
used much is small. If it were a problem, I do have the word frequencies
available. I could simply sort by the counts and choose the 2**N most
frequently occurring words, then toss out "bad" words before generating
passwords.
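
A minimal sketch of that fallback, assuming the frequencies live in a
collections.Counter. The names build_wordlist and make_password, the toy
counts, and the "bad" word set are all hypothetical, made up for
illustration; this is not the actual script:

```python
import secrets
from collections import Counter


def build_wordlist(counts, n_bits, bad_words=frozenset()):
    """Take the 2**n_bits most frequent words, then drop the "bad" ones."""
    # most_common() sorts by count, highest first.
    top = [word for word, _ in counts.most_common(2 ** n_bits)]
    # Filtering happens after the cut, so the result may be shorter
    # than 2**n_bits.
    return [word for word in top if word not in bad_words]


def make_password(wordlist, n_words=4):
    """Join n_words words chosen independently and uniformly at random."""
    return " ".join(secrets.choice(wordlist) for _ in range(n_words))


# Toy data (hypothetical); real counts would come from the scanned posts.
counts = Counter({"the": 50, "python": 30, "list": 20, "spam": 10, "eggs": 5})
wordlist = build_wordlist(counts, n_bits=2, bad_words={"spam"})
password = make_password(wordlist)
```

Because the "bad" words are tossed after the cut, each word carries
slightly less than N bits; filtering before the cut would keep the list at
exactly 2**N entries.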

Skip

--089e01494906c1cb990501a9f02e--