


Template language for random string generation

Started by: Paul Wolf <paulwolf333@gmail.com>
First post: 2014-08-08 02:01 -0700
Last post: 2014-08-10 10:38 -0600
Articles: 9 on this page of 29 (10 participants)



Contents

  Template language for random string generation Paul Wolf <paulwolf333@gmail.com> - 2014-08-08 02:01 -0700
    Re: Template language for random string generation Chris Angelico <rosuav@gmail.com> - 2014-08-08 19:22 +1000
      Re: Template language for random string generation Paul Wolf <paulwolf333@gmail.com> - 2014-08-08 02:42 -0700
        Re: Template language for random string generation Ned Batchelder <ned@nedbatchelder.com> - 2014-08-08 07:20 -0400
          Re: Template language for random string generation Paul Wolf <paulwolf333@gmail.com> - 2014-08-08 06:02 -0700
        Re: Template language for random string generation Chris Angelico <rosuav@gmail.com> - 2014-08-08 21:29 +1000
          Re: Template language for random string generation Paul Wolf <paulwolf333@gmail.com> - 2014-08-08 06:03 -0700
    Re: Template language for random string generation Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-08-09 00:08 +1000
    Re: Template language for random string generation Skip Montanaro <skip@pobox.com> - 2014-08-08 09:35 -0500
      Re: Template language for random string generation cwolf.algo@gmail.com - 2014-08-08 11:43 -0700
        Re: Template language for random string generation Nick Cash <nick.cash@npcinternational.com> - 2014-08-08 20:28 +0000
    Re: Template language for random string generation Ian Kelly <ian.g.kelly@gmail.com> - 2014-08-08 16:03 -0600
      Re: Template language for random string generation Paul Wolf <paulwolf333@gmail.com> - 2014-08-08 23:52 -0700
        Re: Template language for random string generation Ian Kelly <ian.g.kelly@gmail.com> - 2014-08-09 01:49 -0600
        Re: Template language for random string generation Ian Kelly <ian.g.kelly@gmail.com> - 2014-08-09 01:57 -0600
    Re: Template language for random string generation Devin Jeanpierre <jeanpierreda@gmail.com> - 2014-08-10 05:43 -0700
      Re: Template language for random string generation Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-08-11 02:31 +1000
        Re: Template language for random string generation Devin Jeanpierre <jeanpierreda@gmail.com> - 2014-08-10 11:28 -0700
          Re: Template language for random string generation Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-08-11 12:22 +1000
            Re: Template language for random string generation Chris Angelico <rosuav@gmail.com> - 2014-08-11 12:31 +1000
            Re: Template language for random string generation Devin Jeanpierre <jeanpierreda@gmail.com> - 2014-08-11 00:01 -0700
        Re: Template language for random string generation Chris Angelico <rosuav@gmail.com> - 2014-08-11 05:25 +1000
        Re: Template language for random string generation Paul Wolf <paulwolf333@gmail.com> - 2014-08-10 22:06 -0700
          Re: Template language for random string generation Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-08-11 08:58 +0100
      Re: Template language for random string generation Paul Wolf <paulwolf333@gmail.com> - 2014-08-10 09:34 -0700
        Re: Template language for random string generation Ian Kelly <ian.g.kelly@gmail.com> - 2014-08-10 10:47 -0600
          Re: Template language for random string generation Paul Wolf <paulwolf333@gmail.com> - 2014-08-10 21:56 -0700
        Re: Template language for random string generation Devin Jeanpierre <jeanpierreda@gmail.com> - 2014-08-10 11:48 -0700
    Re: Template language for random string generation Ian Kelly <ian.g.kelly@gmail.com> - 2014-08-10 10:38 -0600

Page 2 of 2


#76028

From: Devin Jeanpierre <jeanpierreda@gmail.com>
Date: 2014-08-11 00:01 -0700
Message-ID: <mailman.12838.1407740566.18130.python-list@python.org>
In reply to: #76014

On Sun, Aug 10, 2014 at 7:22 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> Devin Jeanpierre wrote:
>
>> On Sun, Aug 10, 2014 at 9:31 AM, Steven D'Aprano
>> <steve+comp.lang.python@pearwood.info> wrote:
>
>>> I don't think that using a good, but not cryptographically-strong, random
>>> number generator to generate passwords is a serious vulnerability. What's
>>> your threat model?
>>
>> I've always wanted a password generator that worked on the fly based
>> off of a master password. If the passwords are generated randomly but
>> not cryptographically securely so, then given sufficiently many
>> passwords, the master password might be deduced.
>
> o_O
>
> So, what you're saying is that you're concerned that if an attacker has all
> your passwords, they might be able to generate new passwords?

No, I meant what I said. I was pretty specific.

-- Devin



#76002

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-08-11 05:25 +1000
Message-ID: <mailman.12826.1407698764.18130.python-list@python.org>
In reply to: #75984

On Mon, Aug 11, 2014 at 2:31 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> Personally, I think even the OP's specified language is too complex. For
> example, it supports literal text, but given the use-case (password
> generators) do we really want to support templates like "password[\d]"? I
> don't think so, and if somebody did, they can trivially say "password" +
> SG('[\d]').render().

What if you're using this to generate IDs for something (think Youtube
video references), and you want to have an alphabetic portion and a
numeric portion separated by a hyphen? I think there is a use-case for
interior literal text, because otherwise you'd have to either split
the result or do two calls to the generator.
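
For comparison, here is a rough standard-library sketch of such an ID
built by hand, with the hyphen as the interior literal. The name
`make_id` and its parameters are made up for illustration and are not
part of strgen.

```python
import random
import string


def make_id(rng=None, letters=4, digits=4):
    """Return random lowercase letters, a literal hyphen, then random digits."""
    rng = rng or random.Random()
    alpha = "".join(rng.choice(string.ascii_lowercase) for _ in range(letters))
    num = "".join(rng.choice(string.digits) for _ in range(digits))
    # The literal separator sits between the two generated portions.
    return alpha + "-" + num


print(make_id())
```

Without literal support in the template, the caller ends up doing this
kind of concatenation around two separate generator calls.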

> Here, let me google that for you :-)
>
> https://duckduckgo.com/html/?q=python+crypto

Hehe. :)

ChrisA



#76020

From: Paul Wolf <paulwolf333@gmail.com>
Date: 2014-08-10 22:06 -0700
Message-ID: <11d608c5-b615-4571-a146-b3506c44f24c@googlegroups.com>
In reply to: #75984

On Sunday, 10 August 2014 17:31:01 UTC+1, Steven D'Aprano wrote:
> Devin Jeanpierre wrote:
>
> > On Fri, Aug 8, 2014 at 2:01 AM, Paul Wolf <paulwolf333@gmail.com> wrote:
> >> This is a proposal with a working implementation for a random string
> >> generation template syntax for Python. `strgen` is a module for
> >> generating random strings in Python using a regex-like template language.
> >> Example:
> >>
> >>     >>> from strgen import StringGenerator as SG
> >>     >>> SG("[\l\d]{8:15}&[\d]&[\p]").render()
> >>     u'F0vghTjKalf4^mGLk'
> >
> > Why aren't you using regular expressions? I am all for conciseness,
> > but using an existing format is so helpful...
>
> You've just answered your own question:
>
> > Unfortunately, the equivalent regexp probably looks like
> > r'(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z])[a-zA-Z0-9]{8:15}'
>
> Apart from being needlessly verbose, regex syntax is not appropriate because
> it specifies too much, specifies too little, and specifies the wrong
> things. It specifies too much: regexes like ^ and $ are meaningless in this
> case. It specifies too little: there's no regex for the "shuffle operator".
> And it specifies the wrong things: regexes like (?= ...) as used in your
> example are for matching, not generating strings, and it isn't clear
> what "match any character but don't consume any of the string" means when
> generating strings.
>
> Personally, I think even the OP's specified language is too complex. For
> example, it supports literal text, but given the use-case (password
> generators) do we really want to support templates like "password[\d]"? I
> don't think so, and if somebody did, they can trivially say "password" +
> SG('[\d]').render().
>
> Larry Wall (the creator of Perl) has stated that one of the mistakes with
> Perl's regular expression mini-language is that the Huffman coding is
> wrong. Common things should be short, uncommon things can afford to be
> longer. Since the most common thing for password generation is to specify
> character classes, they should be short, e.g. d rather than [\d] (one
> character versus four).
>
> The template given could potentially be simplified to:
>
> "(LD){8:15}&D&P"
>
> where the round brackets () are purely used for grouping. Character codes
> are specified by a single letter. (I use uppercase to avoid the problem
> that l & 1 look very similar. YMMV.) The model here is custom format codes
> from spreadsheets, which should be comfortable to anyone who is familiar
> with Excel or OpenOffice. If you insist on having the facility to including
> literal text in your templates, might I suggest:
>
> "'password'd"  # Literal string "password", followed by a single digit.
>
> but personally I believe that for the use-case given, that's a mistake.
>
> Alternatively, date/time templates use two-character codes like %Y %m etc,
> which is better than
>
> > (I've been working on this kind of thing with regexps, but it's still
> > incomplete.)
> >
> >> * Uses SystemRandom class (if available, or falls back to Random)
> >
> > This sounds cryptographically weak. Isn't the normal thing to do to
> > use a cryptographic hash function to generate a pseudorandom sequence?
>
> I don't think that using a good, but not cryptographically-strong, random
> number generator to generate passwords is a serious vulnerability. What's
> your threat model? Attacks on passwords tend to be one of a very few:
>
> - dictionary attacks (including tables of common passwords and
>   simple transformations of words, e.g. 'pas5w0d');
>
> - brute force against short and weak passwords;
>
> - attacking the hash function used to store passwords (not the password
>   itself), e.g. rainbow tables;
>
> - keyloggers or some other way of stealing the password (including
>   phishing sites and the ever-popular "beat them with a lead pipe
>   until they give up the password");
>
> - other social attacks, e.g. guessing that the person's password is their
>   date of birth in reverse.
>
> But unless the random number generator is *ridiculously* weak ("9, 9, 9, 9,
> 9, 9, ...") I can't see any way to realistically attack the password
> generator based on the weakness of the random number generator. Perhaps I'm
> missing something?
>
> > Someone should write a cryptographically secure pseudorandom number
> > generator library for Python. :(
>
> Here, let me google that for you :-)
>
> https://duckduckgo.com/html/?q=python+crypto
>
> -- 
> Steven

I should clarify that password generation is only one of several use cases that strgen is intended to support. It is also for:

Test data generation: 

    [\l]{1:20}&[._]{0:1}@[\l]{15}.(com|net|org)

email addresses that use word characters and might have a period or an underscore in the first part. Or

	((john|robert|harry)|(mary|agnes|shelly)) (smith|jones|taylor)
	
produces names with a roughly equal distribution of female/male first names. I contemplated, but did not implement, a feature where you can give strgen named functions that generate the required string (using whatever selection process the implementation chooses):

	($malefirstname|$femalefirstname) $lastname

where

	def malefirstname():
		# get a name from the database at random
		raise NotImplementedError  # placeholder body so the stub is valid Python

Voucher generation:

	[\d]{10}
	
10-digit voucher numbers. 

In none of the foregoing is security a concern, it should be noted. 
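
As a point of comparison, here is a rough sketch of what the voucher and name examples look like hand-coded with only the standard library. The helper names (`voucher`, `full_name`) and the hard-coded name lists are purely illustrative and are not part of strgen; the name lists stand in for the unimplemented named-function idea.

```python
import random
import string

rng = random.Random()  # security is not a concern for test data

FIRST_NAMES = [["john", "robert", "harry"], ["mary", "agnes", "shelly"]]
LAST_NAMES = ["smith", "jones", "taylor"]


def voucher(length=10):
    """Hand-coded equivalent of the template [\\d]{10}."""
    return "".join(rng.choice(string.digits) for _ in range(length))


def full_name():
    """Pick the male or female list first, so both groups are equally likely."""
    group = rng.choice(FIRST_NAMES)
    return rng.choice(group) + " " + rng.choice(LAST_NAMES)


print(voucher())
print(full_name())
```

Each new shape of test data needs another helper like these, which is the repetition a template is meant to remove.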

> Since the most common thing for password generation is to specify 
> character classes, they should be short, e.g. d rather than [\d] (one 
> character versus four).

But you assume only standard character classes and not custom ones like "[aeiuy]", not to mention Unicode ranges outside of the English language.

> If you insist on having the facility to including
> literal text in your templates,

I do :-), as per above.

> might I suggest:
> "'password'd"  # Literal string "password", followed by a single digit.

As per above, I think the more verbose notation for character classes is necessary. Although your suggestion is not a bad one. I could have taken a route where you define the character classes with aliases and then construct a very lean template. That is effectively what the - unimplemented - function expressions do in the example above. 

Preventing templates that produce weak passwords ('[abc]{3}') is something I chose not to take up in the strgen module, because the module should be (mostly) agnostic about what constitutes good security and should support the broader set of use cases described above.



#76030

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2014-08-11 08:58 +0100
Message-ID: <mailman.12839.1407743904.18130.python-list@python.org>
In reply to: #76020

On 11/08/2014 06:06, Paul Wolf wrote:

I'm pleased to see that you have answers.  In return would you please 
read and action this https://wiki.python.org/moin/GoogleGroupsPython to 
prevent us seeing double line spacing and single line paragraphs, thanks.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



#75985

From: Paul Wolf <paulwolf333@gmail.com>
Date: 2014-08-10 09:34 -0700
Message-ID: <b8cf8463-ba05-4f2f-b572-60f73a7a9917@googlegroups.com>
In reply to: #75981

On Sunday, 10 August 2014 13:43:04 UTC+1, Devin Jeanpierre wrote:
> On Fri, Aug 8, 2014 at 2:01 AM, Paul Wolf <paulwolf333@gmail.com> wrote:
> > This is a proposal with a working implementation for a random string generation template syntax for Python. `strgen` is a module for generating random strings in Python using a regex-like template language. Example:
> >
> >     >>> from strgen import StringGenerator as SG
> >     >>> SG("[\l\d]{8:15}&[\d]&[\p]").render()
> >     u'F0vghTjKalf4^mGLk'
>
> Why aren't you using regular expressions? I am all for conciseness,
> but using an existing format is so helpful...
>
> Unfortunately, the equivalent regexp probably looks like
> r'(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z])[a-zA-Z0-9]{8:15}'
>
> (I've been working on this kind of thing with regexps, but it's still
> incomplete.)
>
> > * Uses SystemRandom class (if available, or falls back to Random)
>
> This sounds cryptographically weak. Isn't the normal thing to do to
> use a cryptographic hash function to generate a pseudorandom sequence?
>
> Someone should write a cryptographically secure pseudorandom number
> generator library for Python. :(
>
> (I think OpenSSL comes with one, but then you can't choose the seed.)
>
> -- Devin

> Why aren't you using regular expressions?

I guess you answered your own question with your example: 

* No one will want to write that expression
* The regex expression doesn't work anyway
* The purpose of regex is just too different from the purpose of strgen

The purpose of strgen is to make life easier for developers and provide benefits that get pushed downstream (to users of the software that gets produced with it). Adopting a syntax similar to regex is only necessary or useful to the extent it achieves that. 

I should also clarify that when I say the strgen template language is the converse of regular expressions, this is the case conceptually, not formally. Matching text strings is fundamentally different from producing randomized strings. For instance, a template language that validates the output would have to do frequency analysis. But that is getting too far off the purpose of strgen, although such a mechanism would certainly have its place. 

> This sounds cryptographically weak.

Whether using SystemRandom is cryptographically weak is not something I'm taking up here. Someone already suggested allowing the class to accept a different random source provider. That's an excellent idea. I wanted to make sure strgen does whatever they would do anyway hand-coding using the Python Standard Library except vastly more flexible, easier to edit and shorter. strgen is two things: a proposed standard way of expressing a string generation specification that relies heavily on randomness and a wrapper around the standard library. I specifically did not want to try to write better cryptographic routines. 
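
A minimal sketch of what accepting a different random source provider could look like, written against the standard library only. The function name `random_string` and its `rng` parameter are assumptions made for illustration; this is not strgen's actual interface.

```python
import random
import string


def random_string(length, alphabet=string.ascii_letters + string.digits, rng=None):
    """Build a string from `alphabet` using any object that has a choice() method."""
    if rng is None:
        # Default to the OS-backed generator from the standard library.
        rng = random.SystemRandom()
    return "".join(rng.choice(alphabet) for _ in range(length))


# A seeded random.Random gives reproducible output, which is handy in tests.
a = random_string(12, rng=random.Random(42))
b = random_string(12, rng=random.Random(42))
print(a == b)
```

The caller, not the library, then decides which source of randomness is appropriate.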



#75987

From: Ian Kelly <ian.g.kelly@gmail.com>
Date: 2014-08-10 10:47 -0600
Message-ID: <mailman.12819.1407689317.18130.python-list@python.org>
In reply to: #75985

On Sun, Aug 10, 2014 at 10:34 AM, Paul Wolf <paulwolf333@gmail.com> wrote:
> For instance, a template language that validates the output would have to do frequency analysis. But that is getting too far off the purpose of strgen, although such a mechanism would certainly have its place.

I don't think that would be necessary. The question being asked with
validation is "can this string be generated from this template", not
"is this string generated from this template with relatively high
probability".
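
As a rough illustration of validation as a pure membership test, take
the voucher template [\d]{10} from earlier in the thread. The regex
below is a hand translation and the function name is made up; neither
comes from strgen.

```python
import re

# Hand-translated equivalent of the template [\d]{10} (an assumption).
VOUCHER_RE = re.compile(r"[0-9]{10}")


def could_be_generated(candidate):
    """True if the template could have produced this string, however unlikely."""
    return VOUCHER_RE.fullmatch(candidate) is not None


print(could_be_generated("0000000000"))
print(could_be_generated("12345"))
```

A string of ten zeros is improbable but still a possible output, so it
validates; no frequency analysis is involved.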



#76019

From: Paul Wolf <paulwolf333@gmail.com>
Date: 2014-08-10 21:56 -0700
Message-ID: <47417cbb-5624-4cbd-96ce-6e2f064f527f@googlegroups.com>
In reply to: #75987

On Sunday, 10 August 2014 17:47:48 UTC+1, Ian wrote:
> On Sun, Aug 10, 2014 at 10:34 AM, Paul Wolf <paulwolf333@gmail.com> wrote:
> > For instance, a template language that validates the output would have to do frequency analysis. But that is getting too far off the purpose of strgen, although such a mechanism would certainly have its place.
>
> I don't think that would be necessary. The question being asked with
> validation is "can this string be generated from this template", not
> "is this string generated from this template with relatively high
> probability".

Sorry, I meant frequency of incidence within a produced string. And I understood Devin's point to be: for any given strgen expression that produces a set of strings, is there always a regex that captures exactly the same set? And is it therefore not theoretically the case (leaving aside verbosity) that one of the syntaxes (strgen) is superfluous? I think that is an entirely valid and interesting question. I'd have said before that it is not the case, but now I'm not so sure. I am still sure that the strgen syntax is more fit for the purpose of generating strings than regex, on the basis of ease of use.



#75997

From: Devin Jeanpierre <jeanpierreda@gmail.com>
Date: 2014-08-10 11:48 -0700
Message-ID: <mailman.12822.1407696570.18130.python-list@python.org>
In reply to: #75985

On Sun, Aug 10, 2014 at 9:34 AM, Paul Wolf <paulwolf333@gmail.com> wrote:
> * No one will want to write that expression

We've already established that one to be wrong. ;)

> * The regex expression doesn't work anyway

That's a cheap swipe. The regexp doesn't work because I used a colon
instead of a comma, because I accidentally copied you. :(

Speaking of which, is there a reason you've diverged from regex syntax
in x{8: 15} vs x{8,15}?
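
To make the colon/comma difference concrete: Python's re module treats
a brace group that is not a valid quantifier as literal text, so the
{8:15} form compiles without complaint but never matches a plain
password. A quick check (the variable names are just for illustration):

```python
import re

LOOKAHEADS = r"(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z])"
with_colon = re.compile(LOOKAHEADS + r"[a-zA-Z0-9]{8:15}")  # braces read as literal text
with_comma = re.compile(LOOKAHEADS + r"[a-zA-Z0-9]{8,15}")  # a real 8-to-15 quantifier

sample = "F0vghTjKalf4"  # 12 characters with a digit, an uppercase and a lowercase letter
print(bool(with_colon.fullmatch(sample)))  # False
print(bool(with_comma.fullmatch(sample)))  # True
```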


Don't mind my suggestion to use existing formats even when it's
inconvenient. It's a knee jerk reaction/question, not a serious
complaint.

> I should also clarify that when I say the strgen template language is the converse of regular expressions, this is the case conceptually, not formally. Matching text strings is fundamentally different from producing randomized strings.

Mmmm, I wouldn't be so quick to dismiss any insights from regexps
here. It depends on your fundamentals. For example, automata-theoretic
approaches do apply, and can let you guarantee that equivalent
templates always generate the same outputs given the same inputs.
(Meaning that the only thing that matters is what the template
matches, not how it's spelled.)

> Whether using SystemRandom is cryptographically weak is not something I'm taking up here. Someone already suggested allowing the class to accept a different random source provider. That's an excellent idea. I wanted to make sure strgen does whatever they would do anyway hand-coding using the Python Standard Library except vastly more flexible, easier to edit and shorter. strgen is two things: a proposed standard way of expressing a string generation specification that relies heavily on randomness and a wrapper around the standard library. I specifically did not want to try to write better cryptographic routines.

The fallback is what worries me. Falling back from a secure thing to
an insecure thing doesn't sound good.
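
One alternative to a silent fallback, sketched with the standard
library only: probe the OS source once and fail loudly if it is
missing. The function name `strict_system_rng` is made up for this
example and is not something strgen provides.

```python
import random


def strict_system_rng():
    """Return a SystemRandom instance, or raise instead of falling back."""
    rng = random.SystemRandom()
    try:
        rng.getrandbits(8)  # probe: needs a working OS randomness source
    except NotImplementedError as exc:
        raise RuntimeError(
            "no OS randomness source; refusing to fall back to random.Random"
        ) from exc
    return rng


rng = strict_system_rng()
print(rng.randrange(10))
```

Callers who are happy with the Mersenne Twister can still pass
random.Random() explicitly; the point is only that the downgrade is
never implicit.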

-- Devin



#75986

From: Ian Kelly <ian.g.kelly@gmail.com>
Date: 2014-08-10 10:38 -0600
Message-ID: <mailman.12818.1407688739.18130.python-list@python.org>
In reply to: #75871


On Aug 10, 2014 6:45 AM, "Devin Jeanpierre" <jeanpierreda@gmail.com> wrote:
> > * Uses SystemRandom class (if available, or falls back to Random)
>
> This sounds cryptographically weak. Isn't the normal thing to do to
> use a cryptographic hash function to generate a pseudorandom sequence?

You mean in the fallback case, right?  I'm no crypto expert, but I've never
heard of SystemRandom being contra-recommended for crypto, and even the
Python docs recommend it.

The output of even a cryptographically strong hash isn't going to have any
more entropy than the input, so if the input is predictable then the output
will be also.  One approach I'm aware of, which is used by Django, is to
hash the RNG state along with the time and a local secret in order to
reseed the RNG unpredictably whenever randomness is required. That creates
a configuration burden in order to establish the secret, though.
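
A rough sketch of that reseeding idea, from memory rather than copied
from Django's source; LOCAL_SECRET and the helper name `reseed` are
placeholders invented for the example.

```python
import hashlib
import random
import time

LOCAL_SECRET = "change-me"  # hypothetical per-installation secret


def reseed(rng):
    """Reseed rng from a hash of its own state, the current time and the secret."""
    material = "%s%s%s" % (rng.getstate(), time.time(), LOCAL_SECRET)
    rng.seed(hashlib.sha256(material.encode("utf-8")).digest())


rng = random.Random()
reseed(rng)
token = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz0123456789") for _ in range(12))
print(token)
```

The hash adds no entropy of its own; the unpredictability rests
entirely on the attacker not knowing the secret.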


