Groups > comp.lang.python > #75871 > unrolled thread
| Started by | Paul Wolf <paulwolf333@gmail.com> |
|---|---|
| First post | 2014-08-08 02:01 -0700 |
| Last post | 2014-08-10 10:38 -0600 |
| Articles | 9 on this page of 29 — 10 participants |
Template language for random string generation Paul Wolf <paulwolf333@gmail.com> - 2014-08-08 02:01 -0700
Re: Template language for random string generation Chris Angelico <rosuav@gmail.com> - 2014-08-08 19:22 +1000
Re: Template language for random string generation Paul Wolf <paulwolf333@gmail.com> - 2014-08-08 02:42 -0700
Re: Template language for random string generation Ned Batchelder <ned@nedbatchelder.com> - 2014-08-08 07:20 -0400
Re: Template language for random string generation Paul Wolf <paulwolf333@gmail.com> - 2014-08-08 06:02 -0700
Re: Template language for random string generation Chris Angelico <rosuav@gmail.com> - 2014-08-08 21:29 +1000
Re: Template language for random string generation Paul Wolf <paulwolf333@gmail.com> - 2014-08-08 06:03 -0700
Re: Template language for random string generation Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-08-09 00:08 +1000
Re: Template language for random string generation Skip Montanaro <skip@pobox.com> - 2014-08-08 09:35 -0500
Re: Template language for random string generation cwolf.algo@gmail.com - 2014-08-08 11:43 -0700
Re: Template language for random string generation Nick Cash <nick.cash@npcinternational.com> - 2014-08-08 20:28 +0000
Re: Template language for random string generation Ian Kelly <ian.g.kelly@gmail.com> - 2014-08-08 16:03 -0600
Re: Template language for random string generation Paul Wolf <paulwolf333@gmail.com> - 2014-08-08 23:52 -0700
Re: Template language for random string generation Ian Kelly <ian.g.kelly@gmail.com> - 2014-08-09 01:49 -0600
Re: Template language for random string generation Ian Kelly <ian.g.kelly@gmail.com> - 2014-08-09 01:57 -0600
Re: Template language for random string generation Devin Jeanpierre <jeanpierreda@gmail.com> - 2014-08-10 05:43 -0700
Re: Template language for random string generation Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-08-11 02:31 +1000
Re: Template language for random string generation Devin Jeanpierre <jeanpierreda@gmail.com> - 2014-08-10 11:28 -0700
Re: Template language for random string generation Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-08-11 12:22 +1000
Re: Template language for random string generation Chris Angelico <rosuav@gmail.com> - 2014-08-11 12:31 +1000
Re: Template language for random string generation Devin Jeanpierre <jeanpierreda@gmail.com> - 2014-08-11 00:01 -0700
Re: Template language for random string generation Chris Angelico <rosuav@gmail.com> - 2014-08-11 05:25 +1000
Re: Template language for random string generation Paul Wolf <paulwolf333@gmail.com> - 2014-08-10 22:06 -0700
Re: Template language for random string generation Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-08-11 08:58 +0100
Re: Template language for random string generation Paul Wolf <paulwolf333@gmail.com> - 2014-08-10 09:34 -0700
Re: Template language for random string generation Ian Kelly <ian.g.kelly@gmail.com> - 2014-08-10 10:47 -0600
Re: Template language for random string generation Paul Wolf <paulwolf333@gmail.com> - 2014-08-10 21:56 -0700
Re: Template language for random string generation Devin Jeanpierre <jeanpierreda@gmail.com> - 2014-08-10 11:48 -0700
Re: Template language for random string generation Ian Kelly <ian.g.kelly@gmail.com> - 2014-08-10 10:38 -0600
Page 2 of 2 — ← Prev page 1 [2]
| From | Devin Jeanpierre <jeanpierreda@gmail.com> |
|---|---|
| Date | 2014-08-11 00:01 -0700 |
| Message-ID | <mailman.12838.1407740566.18130.python-list@python.org> |
| In reply to | #76014 |
On Sun, Aug 10, 2014 at 7:22 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> Devin Jeanpierre wrote:
>
>> On Sun, Aug 10, 2014 at 9:31 AM, Steven D'Aprano
>> <steve+comp.lang.python@pearwood.info> wrote:
>
>>> I don't think that using a good, but not cryptographically-strong, random
>>> number generator to generate passwords is a serious vulnerability. What's
>>> your threat model?
>>
>> I've always wanted a password generator that worked on the fly based
>> off of a master password. If the passwords are generated randomly but
>> not cryptographically securely so, then given sufficiently many
>> passwords, the master password might be deduced.
>
> o_O
>
> So, what you're saying is that you're concerned that if an attacker has all
> your passwords, they might be able to generate new passwords?

No, I meant what I said. I was pretty specific.

-- Devin
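Devin's scenario, deriving per-site passwords on the fly from a master password, is usually handled with a key-derivation function rather than by seeding a general-purpose PRNG with the master password. Below is a minimal sketch using the standard library's `hashlib.pbkdf2_hmac`; the helper name `site_password`, the use of the site name as the salt, the iteration count and the output alphabet are all illustrative assumptions, not a vetted design.

```python
import hashlib
import string

ALPHABET = string.ascii_letters + string.digits  # assumed output alphabet (62 chars)


def site_password(master: str, site: str, length: int = 16,
                  iterations: int = 200_000) -> str:
    """Hypothetical helper: deterministically derive a password for `site`.

    PBKDF2 uses the master password as the HMAC key and is deliberately slow,
    so observing derived passwords should not expose an easily searchable
    generator state, unlike a general-purpose PRNG seeded with the master.
    """
    # 64 derived bytes = 512 bits, far more than length * log2(62) requires.
    key = hashlib.pbkdf2_hmac("sha256", master.encode("utf-8"),
                              site.encode("utf-8"), iterations, dklen=64)
    n = int.from_bytes(key, "big")
    chars = []
    for _ in range(length):
        # Take base-62 digits of the derived integer; the residual bias is
        # negligible because n is vastly larger than 62 ** length.
        n, idx = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[idx])
    return "".join(chars)
```

The same master password and site always give the same result, so nothing needs to be stored; changing the site name changes the output.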
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-08-11 05:25 +1000 |
| Message-ID | <mailman.12826.1407698764.18130.python-list@python.org> |
| In reply to | #75984 |
On Mon, Aug 11, 2014 at 2:31 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> Personally, I think even the OP's specified language is too complex. For
> example, it supports literal text, but given the use-case (password
> generators) do we really want to support templates like "password[\d]"? I
> don't think so, and if somebody did, they can trivially say "password" +
> SG('[\d]').render().
What if you're using this to generate IDs for something (think Youtube
video references), and you want to have an alphabetic portion and a
numeric portion separated by a hyphen? I think there is a use-case for
interior literal text, because otherwise you'd have to either split
the result or do two calls to the generator.
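Chris's ID example (an alphabetic part and a numeric part joined by a hyphen) is easy to hand-code with the standard library, and doing so shows the trade-off under discussion: without interior literal text in the template, the caller has to stitch the literal in. A minimal sketch, assuming lowercase letters, five characters per part and the `secrets` module (Python 3.6+); `make_id` is a hypothetical name.

```python
import secrets
import string


def make_id(alpha_len: int = 5, digit_len: int = 5, sep: str = "-") -> str:
    """Hypothetical ID generator: letters, a literal separator, then digits."""
    letters = "".join(secrets.choice(string.ascii_lowercase) for _ in range(alpha_len))
    digits = "".join(secrets.choice(string.digits) for _ in range(digit_len))
    # The literal separator is joined by hand here; a template language that
    # allows interior literal text could express the whole shape at once.
    return letters + sep + digits
```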
> Here, let me google that for you :-)
>
> https://duckduckgo.com/html/?q=python+crypto
Hehe. :)
ChrisA
| From | Paul Wolf <paulwolf333@gmail.com> |
|---|---|
| Date | 2014-08-10 22:06 -0700 |
| Message-ID | <11d608c5-b615-4571-a146-b3506c44f24c@googlegroups.com> |
| In reply to | #75984 |
On Sunday, 10 August 2014 17:31:01 UTC+1, Steven D'Aprano wrote:
> Devin Jeanpierre wrote:
>
> > On Fri, Aug 8, 2014 at 2:01 AM, Paul Wolf <paulwolf333@gmail.com> wrote:
> >> This is a proposal with a working implementation for a random string
> >> generation template syntax for Python. `strgen` is a module for
> >> generating random strings in Python using a regex-like template language.
> >> Example:
> >>
> >> >>> from strgen import StringGenerator as SG
> >> >>> SG("[\l\d]{8:15}&[\d]&[\p]").render()
> >> u'F0vghTjKalf4^mGLk'
> >
> > Why aren't you using regular expressions? I am all for conciseness,
> > but using an existing format is so helpful...
>
> You've just answered your own question:
>
> > Unfortunately, the equivalent regexp probably looks like
> > r'(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z])[a-zA-Z0-9]{8:15}'
>
> Apart from being needlessly verbose, regex syntax is not appropriate because
> it specifies too much, specifies too little, and specifies the wrong
> things. It specifies too much: regexes like ^ and $ are meaningless in this
> case. It specifies too little: there's no regex for the "shuffle operator".
> And it specifies the wrong things: regexes like (?= ...) as used in your
> example are for matching, not generating strings, and it isn't clear
> what "match any character but don't consume any of the string" means when
> generating strings.
>
> Personally, I think even the OP's specified language is too complex. For
> example, it supports literal text, but given the use-case (password
> generators) do we really want to support templates like "password[\d]"? I
> don't think so, and if somebody did, they can trivially say "password" +
> SG('[\d]').render().
>
> Larry Wall (the creator of Perl) has stated that one of the mistakes with
> Perl's regular expression mini-language is that the Huffman coding is
> wrong. Common things should be short, uncommon things can afford to be
> longer. Since the most common thing for password generation is to specify
> character classes, they should be short, e.g. d rather than [\d] (one
> character versus four).
>
> The template given could potentially be simplified to:
>
> "(LD){8:15}&D&P"
>
> where the round brackets () are purely used for grouping. Character codes
> are specified by a single letter. (I use uppercase to avoid the problem
> that l & 1 look very similar. YMMV.) The model here is custom format codes
> from spreadsheets, which should be comfortable to anyone who is familiar
> with Excel or OpenOffice. If you insist on having the facility to including
> literal text in your templates, might I suggest:
>
> "'password'd" # Literal string "password", followed by a single digit.
>
> but personally I believe that for the use-case given, that's a mistake.
>
> Alternatively, date/time templates use two-character codes like %Y %m etc,
> which is better than
>
> > (I've been working on this kind of thing with regexps, but it's still
> > incomplete.)
> >
> >> * Uses SystemRandom class (if available, or falls back to Random)
> >
> > This sounds cryptographically weak. Isn't the normal thing to do to
> > use a cryptographic hash function to generate a pseudorandom sequence?
>
> I don't think that using a good, but not cryptographically-strong, random
> number generator to generate passwords is a serious vulnerability. What's
> your threat model? Attacks on passwords tend to be one of a very few:
>
> - dictionary attacks (including tables of common passwords and
> simple transformations of words, e.g. 'pas5w0d');
>
> - brute force against short and weak passwords;
>
> - attacking the hash function used to store passwords (not the password
> itself), e.g. rainbow tables;
>
> - keyloggers or some other way of stealing the password (including
> phishing sites and the ever-popular "beat them with a lead pipe
> until they give up the password");
>
> - other social attacks, e.g. guessing that the person's password is their
> date of birth in reverse.
>
> But unless the random number generator is *ridiculously* weak ("9, 9, 9, 9,
> 9, 9, ...") I can't see any way to realistically attack the password
> generator based on the weakness of the random number generator. Perhaps I'm
> missing something?
>
> > Someone should write a cryptographically secure pseudorandom number
> > generator library for Python. :(
>
> Here, let me google that for you :-)
>
> https://duckduckgo.com/html/?q=python+crypto
>
> --
> Steven
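The quoted example relies on the shuffle operator `&`, which has no regex counterpart. Here is a rough standard-library equivalent of what `[\l\d]{8:15}&[\d]&[\p]` appears to produce: 8 to 15 letters or digits, plus one digit and one punctuation character, all shuffled together. This is a sketch, not strgen's implementation; it assumes `\l` means ASCII letters, `\p` means `string.punctuation`, the `{8:15}` bounds are inclusive, and `random.SystemRandom` is the random source.

```python
import random
import string

_rng = random.SystemRandom()  # draws from os.urandom


def example_password() -> str:
    r"""Hand-coded equivalent (assumed semantics) of [\l\d]{8:15}&[\d]&[\p]."""
    alnum = string.ascii_letters + string.digits
    n = _rng.randint(8, 15)  # randint bounds are inclusive
    chars = [_rng.choice(alnum) for _ in range(n)]
    chars.append(_rng.choice(string.digits))       # guarantees at least one digit
    chars.append(_rng.choice(string.punctuation))  # exactly one punctuation char
    _rng.shuffle(chars)  # the '&' step: interleave the parts randomly
    return "".join(chars)
```

The result always has 10 to 17 characters with exactly one punctuation character, a constraint a plain regex character class cannot express without lookahead assertions.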
I should clarify that password generation is only one of several use cases that strgen is intended to support. It is also for:
Test data generation:
[\l]{1:20}&[._]{0:1}@[\l]{15}.(com|net|org)
email addresses that use word characters and might have a period or an underscore in the first part. Or
((john|robert|harry)|(mary|agnes|shelly)) (smith|jones|taylor)
produce names with roughly equal distribution of female/male first names. I contemplated - but did not implement - a feature where you can give strgen named functions that generate the required string (using whatever selection process that implementation chooses):
($malefirstname|$femalefirstname) $lastname
where
def malefirstname():
    # get a name from the database at random
    ...
Voucher generation:
[\d]{10}
10-digit voucher numbers.
In none of the foregoing is security a concern, it should be noted.
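Paul's unimplemented `$name` idea and the nested alternation example can be approximated in plain Python. Each alternation picks one branch uniformly, so grouping the first names into a male branch and a female branch gives each group half the probability regardless of how many names it holds. A minimal sketch: `malefirstname`, `femalefirstname`, `lastname`, `alternate`, `voucher` and the name lists are hypothetical placeholders standing in for database lookups, and `secrets.choice` is assumed to be an acceptable random source for test data.

```python
import secrets

# Placeholder data; in Paul's description these would come from a database.
MALE = ["john", "robert", "harry"]
FEMALE = ["mary", "agnes", "shelly"]
LAST = ["smith", "jones", "taylor"]


def malefirstname() -> str:
    return secrets.choice(MALE)


def femalefirstname() -> str:
    return secrets.choice(FEMALE)


def lastname() -> str:
    return secrets.choice(LAST)


def alternate(*branches):
    """Like (a|b|...): call one branch, chosen uniformly at random."""
    return secrets.choice(branches)()


def full_name() -> str:
    # Rough analogue of "($malefirstname|$femalefirstname) $lastname".
    return alternate(malefirstname, femalefirstname) + " " + lastname()


def voucher() -> str:
    # Analogue of "[\d]{10}": ten random digits, leading zeros allowed.
    return "".join(secrets.choice("0123456789") for _ in range(10))
```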
> Since the most common thing for password generation is to specify
> character classes, they should be short, e.g. d rather than [\d] (one
> character versus four).
But you assume only standard character classes and not custom ones like "[aeiuy]", not to mention Unicode ranges outside of the English language.
> If you insist on having the facility to including
> literal text in your templates,
I do :-), as per above.
> might I suggest:
> "'password'd" # Literal string "password", followed by a single digit.
As per above, I think the more verbose notation for character classes is necessary, although your suggestion is not a bad one. I could have taken a route where you define the character classes with aliases and then construct a very lean template. That is effectively what the (unimplemented) function expressions do in the example above.
The ability to produce weak passwords ('[abc]{3}') is something I chose not to take up in the strgen module because it should be (mostly) agnostic about what constitutes good security and to support a broader set of use cases as per above.
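The alias route Paul mentions, defining character classes once under short names so the template itself stays lean, can be sketched without any template parser at all. Everything here is hypothetical: the alias letters follow Steven's uppercase suggestion, `V` is a custom class built from Paul's `[aeiuy]` example, `G` shows a non-ASCII range (lowercase Greek), and the list-of-pairs `spec` stands in for a parsed template.

```python
import random
import string

rng = random.SystemRandom()

# Hypothetical alias table: one uppercase letter per character class.
CLASSES = {
    "L": string.ascii_letters,
    "D": string.digits,
    "P": string.punctuation,
    "V": "aeiuy",  # custom class from the example above
    "G": "".join(chr(c) for c in range(0x3B1, 0x3CA)),  # Greek lowercase alpha..omega
}


def render(spec):
    """Render a list of (aliases, count) pairs, e.g. [("LD", 8), ("D", 1)].

    Each pair draws `count` characters from the union of the aliased classes.
    """
    out = []
    for aliases, count in spec:
        pool = "".join(CLASSES[a] for a in aliases)
        out.extend(rng.choice(pool) for _ in range(count))
    return "".join(out)
```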
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2014-08-11 08:58 +0100 |
| Message-ID | <mailman.12839.1407743904.18130.python-list@python.org> |
| In reply to | #76020 |
On 11/08/2014 06:06, Paul Wolf wrote:

I'm pleased to see that you have answers. In return would you please read and action this https://wiki.python.org/moin/GoogleGroupsPython to prevent us seeing double line spacing and single line paragraphs, thanks.

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence
| From | Paul Wolf <paulwolf333@gmail.com> |
|---|---|
| Date | 2014-08-10 09:34 -0700 |
| Message-ID | <b8cf8463-ba05-4f2f-b572-60f73a7a9917@googlegroups.com> |
| In reply to | #75981 |
On Sunday, 10 August 2014 13:43:04 UTC+1, Devin Jeanpierre wrote:
> On Fri, Aug 8, 2014 at 2:01 AM, Paul Wolf <paulwolf333@gmail.com> wrote:
> > This is a proposal with a working implementation for a random string generation template syntax for Python. `strgen` is a module for generating random strings in Python using a regex-like template language. Example:
> >
> > >>> from strgen import StringGenerator as SG
> > >>> SG("[\l\d]{8:15}&[\d]&[\p]").render()
> > u'F0vghTjKalf4^mGLk'
>
> Why aren't you using regular expressions? I am all for conciseness,
> but using an existing format is so helpful...
>
> Unfortunately, the equivalent regexp probably looks like
> r'(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z])[a-zA-Z0-9]{8:15}'
>
> (I've been working on this kind of thing with regexps, but it's still
> incomplete.)
>
> > * Uses SystemRandom class (if available, or falls back to Random)
>
> This sounds cryptographically weak. Isn't the normal thing to do to
> use a cryptographic hash function to generate a pseudorandom sequence?
>
> Someone should write a cryptographically secure pseudorandom number
> generator library for Python. :(
>
> (I think OpenSSL comes with one, but then you can't choose the seed.)
>
> -- Devin
> Why aren't you using regular expressions?
I guess you answered your own question with your example:
* No one will want to write that expression
* The regex expression doesn't work anyway
* The purpose of regex is just too different from the purpose of strgen
The purpose of strgen is to make life easier for developers and provide benefits that get pushed downstream (to users of the software that gets produced with it). Adopting a syntax similar to regex is only necessary or useful to the extent it achieves that.
I should also clarify that when I say the strgen template language is the converse of regular expressions, this is the case conceptually, not formally. Matching text strings is fundamentally different from producing randomized strings. For instance, a template language that validates the output would have to do frequency analysis. But that is getting too far off the purpose of strgen, although such a mechanism would certainly have its place.
> This sounds cryptographically weak.
Whether using SystemRandom is cryptographically weak is not something I'm taking up here. Someone already suggested allowing the class to accept a different random source provider. That's an excellent idea. I wanted to make sure strgen does whatever they would do anyway hand-coding using the Python Standard Library except vastly more flexible, easier to edit and shorter. strgen is two things: a proposed standard way of expressing a string generation specification that relies heavily on randomness and a wrapper around the standard library. I specifically did not want to try to write better cryptographic routines.
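The suggestion Paul mentions, letting the class accept a different random source, could look roughly like this. `StringGen` and its `rng` parameter are hypothetical names, not strgen's API; the idea is that any `random.Random`-compatible object can be injected, while `random.SystemRandom` stays the default.

```python
import random
import string
from typing import Optional


class StringGen:
    """Hypothetical generator with a pluggable random source."""

    def __init__(self, alphabet: str, length: int,
                 rng: Optional[random.Random] = None) -> None:
        self.alphabet = alphabet
        self.length = length
        # Default to the OS-backed generator; callers may inject another.
        self.rng = rng if rng is not None else random.SystemRandom()

    def render(self) -> str:
        return "".join(self.rng.choice(self.alphabet) for _ in range(self.length))
```

Passing `random.Random(42)` twice yields identical strings, which is handy for reproducible test fixtures and, for the same reason, unsuitable for passwords.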
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2014-08-10 10:47 -0600 |
| Message-ID | <mailman.12819.1407689317.18130.python-list@python.org> |
| In reply to | #75985 |
On Sun, Aug 10, 2014 at 10:34 AM, Paul Wolf <paulwolf333@gmail.com> wrote:
> For instance, a template language that validates the output would have to do frequency analysis. But that is getting too far off the purpose of strgen, although such a mechanism would certainly have its place.

I don't think that would be necessary. The question being asked with validation is "can this string be generated from this template", not "is this string generated from this template with relatively high probability".
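Ian's framing, "can this string be generated from this template", is a membership test. For templates without the shuffle operator it reduces to an ordinary regex full match. Below, Paul's name template and his `[\d]{10}` voucher template are translated to regexes by hand; the translations (including the assumption that `{n}` means exactly n repetitions) are illustrative, not produced by strgen.

```python
import re

# Hand-translated equivalents (assumed semantics) of two strgen templates.
NAME_RE = re.compile(r"((john|robert|harry)|(mary|agnes|shelly)) (smith|jones|taylor)")
# [0-9] rather than \d: in Python 3, \d on str patterns also matches non-ASCII digits.
VOUCHER_RE = re.compile(r"[0-9]{10}")


def can_generate(pattern, s):
    """True if s is in the set of strings the template could produce."""
    return pattern.fullmatch(s) is not None
```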
| From | Paul Wolf <paulwolf333@gmail.com> |
|---|---|
| Date | 2014-08-10 21:56 -0700 |
| Message-ID | <47417cbb-5624-4cbd-96ce-6e2f064f527f@googlegroups.com> |
| In reply to | #75987 |
On Sunday, 10 August 2014 17:47:48 UTC+1, Ian wrote:
> On Sun, Aug 10, 2014 at 10:34 AM, Paul Wolf <paulwolf333@gmail.com> wrote:
> > For instance, a template language that validates the output would have to do frequency analysis. But that is getting too far off the purpose of strgen, although such a mechanism would certainly have its place.
>
> I don't think that would be necessary. The question being asked with
> validation is "can this string be generated from this template", not
> "is this string generated from this template with relatively high
> probability".

Sorry, I meant frequency incidence within a produced string.

And I understood Devin's point to be: for any given strgen expression that produces a set of strings, is there always a regex that captures exactly the same set? And therefore, leaving verbosity aside, is it not theoretically the case that one of the two syntaxes (strgen's) is superfluous? I think that is an entirely valid and interesting question. I'd have said before that it is not the case, but now I'm not so sure. I would still be sure that the strgen syntax is more fit for purpose than regex for generating strings, on the basis of ease of use.
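With the shuffle operator, character order carries no information, so the membership question becomes one of counting characters per class, which appears to be what Paul means by "frequency incidence". A sketch of a validator for the opening example `[\l\d]{8:15}&[\d]&[\p]`, assuming `\l` means ASCII letters, `\p` means `string.punctuation` and the `{8:15}` bounds are inclusive; `matches_shuffle_template` is a hypothetical name.

```python
import string


def matches_shuffle_template(s: str) -> bool:
    r"""Could s come from [\l\d]{8:15}&[\d]&[\p]? (assumed semantics)

    The shuffle destroys ordering, so only per-class counts matter:
    exactly one punctuation character, at least one digit, everything
    else an ASCII letter or digit, and 8 + 2 to 15 + 2 characters total.
    """
    if not 10 <= len(s) <= 17:
        return False
    punct = sum(c in string.punctuation for c in s)
    digits = sum(c in string.digits for c in s)
    all_valid = all(c in string.ascii_letters or c in string.digits
                    or c in string.punctuation for c in s)
    return all_valid and punct == 1 and digits >= 1
```

The opening example's output, `'F0vghTjKalf4^mGLk'`, passes: 17 characters, one `^`, two digits.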
| From | Devin Jeanpierre <jeanpierreda@gmail.com> |
|---|---|
| Date | 2014-08-10 11:48 -0700 |
| Message-ID | <mailman.12822.1407696570.18130.python-list@python.org> |
| In reply to | #75985 |
On Sun, Aug 10, 2014 at 9:34 AM, Paul Wolf <paulwolf333@gmail.com> wrote:
> * No one will want to write that expression
We've already established that one to be wrong. ;)
> * The regex expression doesn't work anyway
That's a cheap swipe. The regexp doesn't work because I used a colon
instead of a comma, because I accidentally copied you. :(
Speaking of which, is there a reason you've diverged from regex syntax
in x{8:15} vs x{8,15}?
Don't mind my suggestion to use existing formats even when it's
inconvenient. It's a knee jerk reaction/question, not a serious
complaint.
> I should also clarify that when I say the strgen template language is the converse of regular expressions, this is the case conceptually, not formally. Matching text strings is fundamentally different from producing randomized strings.
Mmmm, I wouldn't be so quick to dismiss any insights from regexps
here. It depends on your fundamentals. For example, automata-theoretic
approaches do apply, and can let you guarantee that equivalent
templates always generate the same outputs given the same inputs.
(Meaning that the only thing that matters is what the template
matches, not how it's spelled.)
> Whether using SystemRandom is cryptographically weak is not something I'm taking up here. Someone already suggested allowing the class to accept a different random source provider. That's an excellent idea. I wanted to make sure strgen does whatever they would do anyway hand-coding using the Python Standard Library except vastly more flexible, easier to edit and shorter. strgen is two things: a proposed standard way of expressing a string generation specification that relies heavily on randomness and a wrapper around the standard library. I specifically did not want to try to write better cryptographic routines.
The fallback is what worries me. Falling back from a secure thing to
an insecure thing doesn't sound good.
-- Devin
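Devin's objection suggests failing loudly rather than silently downgrading. A minimal sketch of that policy: `os.urandom` raises `NotImplementedError` when no OS randomness source is available, and this hypothetical helper, `secure_rng`, turns that into a hard error instead of substituting the Mersenne Twister-based `random.Random`.

```python
import os
import random


def secure_rng() -> random.SystemRandom:
    """Return an OS-backed RNG, or raise rather than fall back to random.Random."""
    try:
        os.urandom(1)  # probe the OS randomness source
    except NotImplementedError as exc:
        raise RuntimeError(
            "no OS randomness source; refusing to fall back to random.Random"
        ) from exc
    return random.SystemRandom()
```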
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2014-08-10 10:38 -0600 |
| Message-ID | <mailman.12818.1407688739.18130.python-list@python.org> |
| In reply to | #75871 |
On Aug 10, 2014 6:45 AM, "Devin Jeanpierre" <jeanpierreda@gmail.com> wrote:
> > * Uses SystemRandom class (if available, or falls back to Random)
>
> This sounds cryptographically weak. Isn't the normal thing to do to
> use a cryptographic hash function to generate a pseudorandom sequence?

You mean in the fallback case, right? I'm no crypto expert, but I've never heard of SystemRandom being contra-recommended for crypto, and even the Python docs recommend it.

The output of even a cryptographically strong hash isn't going to have any more entropy than the input, so if the input is predictable then the output will be also. One approach I'm aware of, which is used by Django, is to hash the RNG state along with the time and a local secret in order to reseed the RNG unpredictably whenever randomness is required. That creates a configuration burden in order to establish the secret, though.
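The reseeding technique Ian describes can be sketched as follows: when no OS-backed generator is available, reseed an ordinary `random.Random` from a hash of its current state, the current time and a local secret before generating. This is a paraphrase of the idea, not Django's code; `SECRET_KEY`, `_fallback` and `random_string` are hypothetical names, and in practice the secret would come from configuration, which is the burden Ian mentions.

```python
import hashlib
import random
import time

SECRET_KEY = "replace-me"  # hypothetical site secret, normally loaded from config
_fallback = random.Random()  # Mersenne Twister; used only if the OS source is missing


def random_string(length=12, allowed="abcdefghijklmnopqrstuvwxyz0123456789"):
    try:
        rng = random.SystemRandom()
        rng.random()  # os.urandom raises NotImplementedError if no OS source exists
    except NotImplementedError:
        rng = _fallback
        # Reseed from (current state, time, secret): the aim is that someone who
        # has seen earlier outputs still cannot predict the new state without
        # also knowing the secret.
        material = "%s%s%s" % (rng.getstate(), time.time(), SECRET_KEY)
        rng.seed(hashlib.sha256(material.encode("utf-8")).digest())
    return "".join(rng.choice(allowed) for _ in range(length))
```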