
Re: Question about ast.literal_eval

Started by	Frank Millman <frank@chagford.com>
First post	2013-05-20 15:26 +0200
Last post	2013-05-21 10:00 +0100
Articles	8 — 5 participants


This discussion began before the indexed window, so earlier articles are not shown. The article labeled "Started by" is the oldest one visible, not the original post.

  Re: Question about ast.literal_eval Frank Millman <frank@chagford.com> - 2013-05-20 15:26 +0200
    Re: Question about ast.literal_eval Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-05-20 16:12 +0000
      Re: Question about ast.literal_eval Chris Angelico <rosuav@gmail.com> - 2013-05-21 02:23 +1000
      Re: Question about ast.literal_eval Frank Millman <frank@chagford.com> - 2013-05-21 08:30 +0200
        Re: Question about ast.literal_eval Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-05-21 07:21 +0000
          Re: Question about ast.literal_eval Frank Millman <frank@chagford.com> - 2013-05-21 10:06 +0200
          Re: Question about ast.literal_eval Fábio Santos <fabiosantosart@gmail.com> - 2013-05-21 09:23 +0100
          Re: Question about ast.literal_eval Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-05-21 10:00 +0100

#45619 — Re: Question about ast.literal_eval

From	Frank Millman <frank@chagford.com>
Date	2013-05-20 15:26 +0200
Subject	Re: Question about ast.literal_eval
Message-ID	<mailman.1888.1369056365.3114.python-list@python.org>

On 20/05/2013 10:07, Frank Millman wrote:
> On 20/05/2013 09:55, Chris Angelico wrote:
>> Is it a requirement that they be able to key in a constraint as a
>> single string? We have a similar situation in one of the systems at
>> work, so we divided the input into three(ish) parts: pick a field,
>> pick an operator (legal operators vary according to field type -
>> integers can't be compared against regular expressions, timestamps can
>> use >= and < only), then enter the other operand. Sure, that cuts out
>> a few possibilities, but you get 99.9%+ of all usage and it's easy to
>> sanitize.
>>
>> ChrisA
>>
>
> It is not a requirement, no. I just thought it would be a convenient
> short-cut.
>
> I had in mind something similar to your scheme above, so I guess I will
> have to bite the bullet and implement it.
>

Can anyone see anything wrong with the following approach? I have not 
definitely decided to do it this way, but I have been experimenting and 
it seems to work.

I store the boolean test as a json'd list of 6-part tuples. Each element 
of the tuple is a string, defined as follows -

0 - for the first entry in the list, the word 'check' (a placeholder - 
it is discarded at evaluation time), for any subsequent entries the word 
'and' or 'or'.

1 - left bracket - either '(' or ''.

2 - column name to check - it will be validated on entry.

3 - operator - must be one of '=', '!=', '<', '>', '<=', '>=', 'in', 
'is', 'is not'. At evaluation time, '=' is changed to '=='.

4 - value to compare - at evaluation time I call 
str(literal_eval(value)) to ensure that it is safe.

5 - right bracket - either ')' or ''.

At evaluation time I loop through the list, construct the boolean test 
as a string, and call eval() on it.

Here are some examples -

check = []
check.append(('check', '', 'name', 'in', "('abc', 'xyz')", ''))

check = []
check.append(('check', '', 'value', '>=', '0', ''))

check = []
check.append(('check', '(', 'descr', 'is not', 'None', ''))
check.append(('and', '', 'alt', 'is', 'None', ')'))
check.append(('or', '(', 'descr', 'is', 'None', ''))
check.append(('and', '', 'alt', 'is not', 'None', ')'))

I don't plan to check the logic - I will just display the exception if 
it does not evaluate.
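
A minimal sketch of the construction step described above, not the actual code. The names build_test and columns are hypothetical, and it uses repr() where the post says str(), on the assumption that string values must keep their quotes in the generated expression.

```python
from ast import literal_eval

ALLOWED_OPS = {'=', '!=', '<', '>', '<=', '>=', 'in', 'is', 'is not'}

def build_test(check, columns):
    """Build the boolean test string from a list of 6-part tuples.

    `columns` is a hypothetical set of valid column names.
    """
    parts = []
    for i, (joiner, lbr, column, op, value, rbr) in enumerate(check):
        if i > 0:
            # The first joiner is the 'check' placeholder and is discarded.
            if joiner not in ('and', 'or'):
                raise ValueError('bad joiner: %r' % (joiner,))
            parts.append(joiner)
        if lbr not in ('(', '') or rbr not in (')', ''):
            raise ValueError('bad bracket')
        if column not in columns:
            raise ValueError('unknown column: %r' % (column,))
        if op not in ALLOWED_OPS:
            raise ValueError('bad operator: %r' % (op,))
        if op == '=':
            op = '=='
        # literal_eval() rejects anything that is not a plain literal;
        # repr() (an assumption, see above) keeps the quotes on strings.
        literal = repr(literal_eval(value))
        parts.append('%s%s %s %s%s' % (lbr, column, op, literal, rbr))
    return ' '.join(parts)

check = [
    ('check', '(', 'descr', 'is not', 'None', ''),
    ('and', '', 'alt', 'is', 'None', ')'),
    ('or', '(', 'descr', 'is', 'None', ''),
    ('and', '', 'alt', 'is not', 'None', ')'),
]
print(build_test(check, {'descr', 'alt'}))
```

The resulting string is what would then be handed to eval(), so every field that reaches it is checked against a fixed set of allowed values first.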

It seems safe to me. Can anyone see a problem with it?

Frank


#45622

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-05-20 16:12 +0000
Message-ID	<519a4b6a$0$29997$c3e8da3$5496439d@news.astraweb.com>
In reply to	#45619

On Mon, 20 May 2013 15:26:02 +0200, Frank Millman wrote:

> Can anyone see anything wrong with the following approach. I have not
> definitely decided to do it this way, but I have been experimenting and
> it seems to work.
> 
> I store the boolean test as a json'd list of 6-part tuples. Each element
> of the tuple is a string, defined as follows -
> 
> 0 - for the first entry in the list, the word 'check' (a placeholder -
> it is discarded at evaluation time), for any subsequent entries the word
> 'and' or 'or'.
> 
> 1 - left bracket - either '(' or ''.
> 
> 2 - column name to check - it will be validated on entry.
> 
> 3 - operator - must be one of '=', '!=', '<', '>', '<=', '>=', 'in',
> 'is', 'is not'. At evaluation time, '=' is changed to '=='.
> 
> 4 - value to compare - at evaluation time I call
> str(literal_eval(value)) to ensure that it is safe.
> 
> 5 - right bracket - either ')' or ''.
> 
> At evaluation time I loop through the list, construct the boolean test
> as a string, and call eval() on it.
[...]
> It seems safe to me. Can anyone see a problem with it?

It seems safe to me too, but then any fool can come up with a system 
which they themselves cannot break :-)

I think the real worry is validating the column name. That will be 
critical. Personally, I would strongly suggest writing your own mini-
evaluator that walks the list and evaluates it by hand. It isn't as 
convenient as just calling eval, but *definitely* safer.

If you do call eval, make sure you supply the globals and locals 
arguments. The usual way is:

eval(expression, {'__builtins__': None}, {})

which gives you an empty locals() and a minimal, (mostly) safe globals.

Finally, as a "belt-and-braces" approach, I wouldn't even call eval 
directly, but call a thin wrapper that raises an exception if the 
expression contains an underscore. Underscores are usually the key to 
breaking eval, so refusing to evaluate anything with an underscore raises 
the barrier very high.
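
A minimal sketch of such a wrapper, combining the restricted globals with the underscore check. The names safe_eval and names are hypothetical. As said above, this raises the barrier but should not be treated as a sandbox. Note that it would also reject column names that contain an underscore.

```python
def safe_eval(expression, names=None):
    """Evaluate expression with no builtins, refusing any underscore.

    safe_eval and names are hypothetical; this is a barrier, not a sandbox.
    """
    if '_' in expression:
        raise ValueError('underscore in expression: %r' % (expression,))
    # No builtins in globals; only the supplied names in locals.
    return eval(expression, {'__builtins__': None}, dict(names or {}))

print(safe_eval('(1 < 2) and (3 >= 3)'))
print(safe_eval("name in ('abc', 'xyz')", {'name': 'abc'}))
```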

And even with all those defences, I wouldn't allow untrusted data from 
the Internet anywhere near this. Just because I can't break it, doesn't 
mean it's safe.

-- 
Steven


#45624

From	Chris Angelico <rosuav@gmail.com>
Date	2013-05-21 02:23 +1000
Message-ID	<mailman.1891.1369067002.3114.python-list@python.org>
In reply to	#45622

On Tue, May 21, 2013 at 2:12 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> Personally, I would strongly suggest writing your own mini-
> evaluator that walks the list and evaluates it by hand. It isn't as
> convenient as just calling eval, but *definitely* safer.

Probably faster, too, for what it's worth - eval is pretty expensive.

ChrisA


#45645

From	Frank Millman <frank@chagford.com>
Date	2013-05-21 08:30 +0200
Message-ID	<mailman.1905.1369117820.3114.python-list@python.org>
In reply to	#45622

On 20/05/2013 18:12, Steven D'Aprano wrote:
> On Mon, 20 May 2013 15:26:02 +0200, Frank Millman wrote:
>
>> Can anyone see anything wrong with the following approach. I have not
>> definitely decided to do it this way, but I have been experimenting and
>> it seems to work.
>>
[...]
>
> It seems safe to me too, but then any fool can come up with a system
> which they themselves cannot break :-)
>

Thanks for the detailed response.

> I think the real worry is validating the column name. That will be
> critical.

I would not pass the actual column name to eval(), I would use it to 
retrieve a value from a data object and pass that to eval(). However, 
then your point becomes 'validating the value retrieved'. I had not 
thought about that. I will investigate further.

> Personally, I would strongly suggest writing your own mini-
> evaluator that walks the list and evaluates it by hand. It isn't as
> convenient as just calling eval, but *definitely* safer.
>

I am not sure I can wrap my mind around mixed 'and's, 'or's, and brackets.

[Thinking aloud]
Maybe I can manually reduce each internal test to a True or False, 
substitute them in the list, and then call eval() on the result.

eval('(True and False) or (False or True)')

I will experiment with that.
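
A rough sketch of that idea, assuming the 6-part tuples from the earlier post and a hypothetical dict row that maps column names to values. The names OPERATORS, evaluate and row are illustrative only.

```python
import operator
from ast import literal_eval

OPERATORS = {
    '=': operator.eq, '!=': operator.ne,
    '<': operator.lt, '>': operator.gt,
    '<=': operator.le, '>=': operator.ge,
    'is': operator.is_, 'is not': operator.is_not,
    'in': lambda x, y: x in y,
}

def evaluate(check, row):
    """Reduce each test to True/False, then eval() only the skeleton."""
    tokens = []
    for i, (joiner, lbr, column, op, value, rbr) in enumerate(check):
        if lbr not in ('(', '') or rbr not in (')', ''):
            raise ValueError('bad bracket')
        if i > 0:
            if joiner not in ('and', 'or'):
                raise ValueError('bad joiner: %r' % (joiner,))
            tokens.append(joiner)
        # The comparison itself runs as ordinary code, not through eval().
        flag = bool(OPERATORS[op](row[column], literal_eval(value)))
        tokens.append('%s%s%s' % (lbr, flag, rbr))
    # Only 'True', 'False', 'and', 'or' and brackets can reach eval().
    return eval(' '.join(tokens), {'__builtins__': None}, {})

check = [
    ('check', '(', 'descr', 'is not', 'None', ''),
    ('and', '', 'alt', 'is', 'None', ')'),
    ('or', '(', 'descr', 'is', 'None', ''),
    ('and', '', 'alt', 'is not', 'None', ')'),
]
print(evaluate(check, {'descr': 'x', 'alt': None}))
print(evaluate(check, {'descr': 'x', 'alt': 'y'}))
```

Because the string is assembled from a fixed vocabulary, neither the column values nor the user's literals ever appear in the text that eval() sees.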

> If you do call eval, make sure you supply the globals and locals
> arguments. The usual way is:
>
> eval(expression, {'__builtins__': None}, {})
>
> which gives you an empty locals() and a minimal, (mostly) safe globals.
>

Thanks - I did not know about that.

> Finally, as a "belt-and-braces" approach, I wouldn't even call eval
> directly, but call a thin wrapper that raises an exception if the
> expression contains an underscore. Underscores are usually the key to
> breaking eval, so refusing to evaluate anything with an underscore raises
> the barrier very high.
>
> And even with all those defences, I wouldn't allow untrusted data from
> the Internet anywhere near this. Just because I can't break it, doesn't
> mean it's safe.
>

All good advice - much appreciated.

Frank


#45647

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-05-21 07:21 +0000
Message-ID	<519b2096$0$6574$c3e8da3$5496439d@news.astraweb.com>
In reply to	#45645

On Tue, 21 May 2013 08:30:03 +0200, Frank Millman wrote:

> On 20/05/2013 18:12, Steven D'Aprano wrote:

>> Personally, I would strongly suggest writing your own mini-evaluator
>> that walks the list and evaluates it by hand. It isn't as convenient as
>> just calling eval, but *definitely* safer.
>>
>>
> I am not sure I can wrap my mind around mixed 'and's, 'or's, and
> brackets.


Parsers are a solved problem in computer science, he says as if he had a 
clue what he was talking about *wink*

Here's a sketch of a solution... suppose you have a sequence of records, 
looking like this:

(bool_op, column_name, comparison_op, literal)

with appropriate validation on each field. The very first record has 
bool_op set to "or". Then, you do something like this:


import operator
OPERATORS = {
    '=': operator.eq,
    'is': operator.is_,
    '<': operator.lt,
    # etc.
    }


def eval_op(column_name, op, literal):
    value = lookup(column_name)  # whatever...
    return OPERATORS[op](value, literal)

result = False

for (bool_op, column_name, comparison_op, literal) in sequence:
    flag = eval_op(column_name, comparison_op, literal)
    if bool_op == 'and':
        result = result and flag
    else: 
        assert bool_op == 'or'
        result = result or flag
    # No early exit: a later 'and' record can still turn a truthy
    # result falsy, so stopping as soon as result is true would be wrong.


and in theory it should all Just Work.
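
For anyone who wants to run it, here is one way the sketch above could be filled in. It assumes a hypothetical dict row in place of lookup(), and literals that have already been converted from strings. It has no early break, because a later 'and' record can still change a truthy result.

```python
import operator

OPERATORS = {
    '=': operator.eq, '!=': operator.ne,
    '<': operator.lt, '>': operator.gt,
    '<=': operator.le, '>=': operator.ge,
    'is': operator.is_, 'is not': operator.is_not,
    'in': lambda x, y: x in y,
}

def evaluate(sequence, row):
    """Combine the records strictly left to right (no brackets)."""
    result = False
    for bool_op, column_name, comparison_op, literal in sequence:
        # row is a hypothetical stand-in for lookup(column_name).
        flag = OPERATORS[comparison_op](row[column_name], literal)
        if bool_op == 'and':
            result = result and flag
        elif bool_op == 'or':
            result = result or flag
        else:
            raise ValueError('bad boolean operator: %r' % (bool_op,))
    return result

sequence = [
    ('or', 'value', '>=', 0),
    ('and', 'name', 'in', ('abc', 'xyz')),
]
print(evaluate(sequence, {'value': 5, 'name': 'abc'}))
print(evaluate(sequence, {'value': 5, 'name': 'zzz'}))
```

The second call shows why stopping early is unsafe: the first record alone is true, but the trailing 'and' makes the whole test false.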




-- 
Steven


#45650

From	Frank Millman <frank@chagford.com>
Date	2013-05-21 10:06 +0200
Message-ID	<mailman.1909.1369123609.3114.python-list@python.org>
In reply to	#45647

On 21/05/2013 09:21, Steven D'Aprano wrote:
> On Tue, 21 May 2013 08:30:03 +0200, Frank Millman wrote:
>
>> I am not sure I can wrap my mind around mixed 'and's, 'or's, and
>> brackets.
>
> Parsers are a solved problem in computer science, he says as if he had a
> clue what he was talking about *wink*
>
> Here's a sketch of a solution... suppose you have a sequence of records,
> looking like this:
>
> (bool_op, column_name, comparison_op, literal)
>
> with appropriate validation on each field. The very first record has
> bool_op set to "or". Then, you do something like this:
>
> import operator
> OPERATORS = {
>      '=': operator.eq,
>      'is': operator.is_,
>      '<': operator.lt,
>      # etc.
>      }
>
> def eval_op(column_name, op, literal):
>      value = lookup(column_name)  # whatever...
>      return OPERATORS[op](value, literal)
>
> result = False
>
> for (bool_op, column_name, comparison_op, literal) in sequence:
>      flag = eval_op(column_name, comparison_op, literal)
>      if bool_op == 'and':
>          result = result and flag
>      else:
>          assert bool_op == 'or'
>          result = result or flag
>      # No early exit: a later 'and' record can still turn a truthy
>      # result falsy, so stopping as soon as result is true would be wrong.
>
> and in theory it should all Just Work.

That's very clever - thanks, Steven.

It doesn't address the issue of brackets. I imagine that the answer is 
something like -

   maintain a stack of results
   for each left bracket, push a level
   for each right bracket, pop the result

or something ...
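
One possible reading of that outline, as a sketch only. It assumes each test has already been reduced to True or False and works on a flat token list. The name eval_tokens is hypothetical. Within a bracket level it joins strictly left to right, like the loop quoted above, so unlike Python it gives 'and' no precedence over 'or', and it does not check that values and joiners alternate properly.

```python
def eval_tokens(tokens):
    """Evaluate a flat list of True/False, 'and', 'or', '(' and ')'."""
    # Each level is [result so far, joiner to apply to the next value].
    stack = [[False, 'or']]
    for token in tokens:
        if token == '(':
            stack.append([False, 'or'])       # push a level
        elif token in ('and', 'or'):
            stack[-1][1] = token              # remember the joiner
        else:
            if token == ')':
                if len(stack) == 1:
                    raise ValueError('unbalanced right bracket')
                value = stack.pop()[0]        # pop the result
            else:
                value = bool(token)
            level = stack[-1]
            if level[1] == 'and':
                level[0] = level[0] and value
            else:
                level[0] = level[0] or value
    if len(stack) != 1:
        raise ValueError('unbalanced left bracket')
    return stack[0][0]

# (True and False) or (False or True)
print(eval_tokens(['(', True, 'and', False, ')', 'or',
                   '(', False, 'or', True, ')']))
```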

I am sure that with enough trial and error I can get it working, but I 
might cheat for now and use the trick I mentioned earlier of calling 
eval() on a sequence of manually derived True/False values. I really 
can't see anything going wrong with that.

BTW, thanks to ChrisA for the following tip -

import operator
ops = {
    'in': lambda x, y: x in y,  # operator.contains has the args backwards
    # etc.
    }

I would have battled with that one.
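
A short illustration of the argument order in question. The names container and contained_in are made up for the example.

```python
import operator

container = ('abc', 'xyz')

# operator.contains(a, b) tests "b in a": container first, item second.
print(operator.contains(container, 'abc'))

# A table entry is called as op(value, literal), item first, so the
# arguments have to be swapped, which is what the lambda does.
contained_in = lambda x, y: x in y
print(contained_in('abc', container))
```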

Frank


#45653

From	Fábio Santos <fabiosantosart@gmail.com>
Date	2013-05-21 09:23 +0100
Message-ID	<mailman.1910.1369124922.3114.python-list@python.org>
In reply to	#45647


On 21 May 2013 09:10, "Frank Millman" <frank@chagford.com> wrote:
> It doesn't address the issue of brackets. I imagine that the answer is
> something like -
>
>   maintain a stack of results
>   for each left bracket, push a level
>   for each right bracket, pop the result
>
> or something ...
>

Time for me to suggest pyparsing or PLY. You're better off creating your
own AST and walking it to produce python or SQL than reinventing the wheel,
I think.
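
Neither pyparsing nor PLY is shown here. As a standard-library illustration of the "walk an AST" idea, this sketch assumes the test is written in Python expression syntax (so == rather than =), parses it with ast.parse, and evaluates only a small whitelist of node types. The names COMPARE, evaluate and row are hypothetical, and it assumes Python 3.8 or later, where literals parse to ast.Constant.

```python
import ast
import operator

COMPARE = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Is: operator.is_, ast.IsNot: operator.is_not,
    ast.In: lambda x, y: x in y,
}

def evaluate(source, row):
    """Evaluate a boolean test by walking its AST; row maps names to values."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BoolOp):
            values = (walk(v) for v in node.values)
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Compare) and len(node.ops) == 1:
            op = COMPARE.get(type(node.ops[0]))
            if op is None:
                raise ValueError('unsupported comparison')
            return op(walk(node.left), walk(node.comparators[0]))
        if isinstance(node, ast.Name):
            if node.id not in row:
                raise ValueError('unknown column: %r' % (node.id,))
            return row[node.id]
        if isinstance(node, (ast.Constant, ast.Tuple)):
            # literal_eval accepts a node and rejects non-literal contents.
            return ast.literal_eval(node)
        raise ValueError('unsupported syntax: %s' % type(node).__name__)
    return walk(ast.parse(source, mode='eval'))

test = "(descr is not None and alt is None) or (descr is None and alt is not None)"
print(evaluate(test, {'descr': 'x', 'alt': None}))
```

Anything outside the whitelist, such as a function call or attribute access, is rejected with ValueError instead of being executed.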


#45655

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2013-05-21 10:00 +0100
Message-ID	<mailman.1911.1369126836.3114.python-list@python.org>
In reply to	#45647

On 21/05/2013 09:23, Fábio Santos wrote:
>
> On 21 May 2013 09:10, "Frank Millman" <frank@chagford.com> wrote:
>> It doesn't address the issue of brackets. I imagine that the answer
>> is something like -
>>
>>   maintain a stack of results
>>   for each left bracket, push a level
>>   for each right bracket, pop the result
>>
>> or something ...
>>
>
> Time for me to suggest pyparsing or PLY. You're better off creating your
> own AST and walking it to produce python or SQL than reinventing the
> wheel, I think.
>

Or pick one from this lot http://nedbatchelder.com/text/python-parsers.html

-- 
If you're using GoogleCrap™ please read this 
http://wiki.python.org/moin/GoogleGroupsPython.

Mark Lawrence

