From: Skip Montanaro
To: Chris Angelico
Cc: python-list@python.org
Newsgroups: comp.lang.python
Date: Thu, 28 Aug 2014 00:38:03 -0500
Subject: Re: PyPI password rules

On Thu, Aug 28, 2014 at 12:08 AM, Chris Angelico wrote:

> Interesting. I suspect this may have issues, as you're doing these
> checks progressively; something that's common in the early posts will
> be weighted without regard to subsequent posts (you're requiring 100
> unique words before recording anything, but that's still not all that
> many).

I'm not really that worried about it. The number of words and their
counts grow rapidly, so the risk of choosing "common" words which aren't
actually used much is small. If it ever became a problem, I do have the
word frequencies available. I could simply sort by the counts and choose
the 2**N most frequently occurring words, then toss out "bad" words
before generating passwords.

Skip
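The fallback described above (sort by counts, keep the 2**N most frequent words, drop "bad" ones, then generate passwords) might look roughly like this in Python. This is a minimal sketch, not the script discussed in the thread: it assumes the word counts already live in a `collections.Counter`, and `build_wordlist`, `make_password`, and the `bad_words` set are hypothetical names. One deliberate deviation: the sketch filters bad words *before* truncating to 2**N, since tossing them afterwards would leave fewer than 2**N words and each pick would then carry slightly less than N bits.

```python
import math
import secrets
from collections import Counter


def build_wordlist(counts, n_bits, bad_words=frozenset()):
    """Return the 2**n_bits most frequent words, skipping any in bad_words.

    Bad words are removed before truncating (an assumption of this sketch),
    so the list holds exactly 2**n_bits entries and every word drawn from it
    contributes n_bits of entropy.
    """
    size = 2 ** n_bits
    # most_common() with no argument returns every (word, count) pair,
    # ordered from most to least frequent.
    candidates = [w for w, _ in counts.most_common() if w not in bad_words]
    if len(candidates) < size:
        raise ValueError(f"need {size} usable words, only have {len(candidates)}")
    return candidates[:size]


def make_password(wordlist, n_words=4, sep=" "):
    # secrets.choice draws from the OS CSPRNG, unlike random.choice.
    return sep.join(secrets.choice(wordlist) for _ in range(n_words))


if __name__ == "__main__":
    # Hypothetical toy corpus standing in for the archived list posts.
    text = "the quick brown fox jumps over the lazy dog the fox"
    counts = Counter(text.split())
    words = build_wordlist(counts, n_bits=2, bad_words={"the"})
    print(words)
    print(make_password(words, n_words=3))
    print("entropy bits:", 3 * math.log2(len(words)))
```

With a real archive the word list would be much larger (e.g. n_bits around 11 to 13), and the password strength is simply n_words × n_bits, which is why fixing the list length at a power of two is convenient.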