| Started by | Frank Millman <frank@chagford.com> |
|---|---|
| First post | 2013-05-20 15:26 +0200 |
| Last post | 2013-05-21 10:00 +0100 |
| Articles | 8 — 5 participants |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Question about ast.literal_eval Frank Millman <frank@chagford.com> - 2013-05-20 15:26 +0200
Re: Question about ast.literal_eval Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-05-20 16:12 +0000
Re: Question about ast.literal_eval Chris Angelico <rosuav@gmail.com> - 2013-05-21 02:23 +1000
Re: Question about ast.literal_eval Frank Millman <frank@chagford.com> - 2013-05-21 08:30 +0200
Re: Question about ast.literal_eval Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-05-21 07:21 +0000
Re: Question about ast.literal_eval Frank Millman <frank@chagford.com> - 2013-05-21 10:06 +0200
Re: Question about ast.literal_eval Fábio Santos <fabiosantosart@gmail.com> - 2013-05-21 09:23 +0100
Re: Question about ast.literal_eval Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-05-21 10:00 +0100
| From | Frank Millman <frank@chagford.com> |
|---|---|
| Date | 2013-05-20 15:26 +0200 |
| Subject | Re: Question about ast.literal_eval |
| Message-ID | <mailman.1888.1369056365.3114.python-list@python.org> |
On 20/05/2013 10:07, Frank Millman wrote:
> On 20/05/2013 09:55, Chris Angelico wrote:
>> Is it a requirement that they be able to key in a constraint as a
>> single string? We have a similar situation in one of the systems at
>> work, so we divided the input into three(ish) parts: pick a field,
>> pick an operator (legal operators vary according to field type -
>> integers can't be compared against regular expressions, timestamps can
>> use >= and < only), then enter the other operand. Sure, that cuts out
>> a few possibilities, but you get 99.9%+ of all usage and it's easy to
>> sanitize.
>>
>> ChrisA
>>
>
> It is not a requirement, no. I just thought it would be a convenient
> short-cut.
>
> I had in mind something similar to your scheme above, so I guess I will
> have to bite the bullet and implement it.
>
Can anyone see anything wrong with the following approach? I have not
definitely decided to do it this way, but I have been experimenting and
it seems to work.
I store the boolean test as a json'd list of 6-part tuples. Each element
of the tuple is a string, defined as follows -
0 - for the first entry in the list, the word 'check' (a placeholder -
it is discarded at evaluation time), for any subsequent entries the word
'and' or 'or'.
1 - left bracket - either '(' or ''.
2 - column name to check - it will be validated on entry.
3 - operator - must be one of '=', '!=', '<', '>', '<=', '>=', 'in',
'is', 'is not'. At evaluation time, '=' is changed to '=='.
4 - value to compare - at evaluation time I call
str(literal_eval(value)) to ensure that it is safe.
5 - right bracket - either ')' or ''.
At evaluation time I loop through the list, construct the boolean test
as a string, and call eval() on it.
Here are some examples -
check = []
check.append(('check', '', 'name', 'in', "('abc', 'xyz')", ''))
check = []
check.append(('check', '', 'value', '>=', '0', ''))
check = []
check.append(('check', '(', 'descr', 'is not', 'None', ''))
check.append(('and', '', 'alt', 'is', 'None', ')'))
check.append(('or', '(', 'descr', 'is', 'None', ''))
check.append(('and', '', 'alt', 'is not', 'None', ')'))
I don't plan to check the logic - I will just display the exception if
it does not evaluate.
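A minimal sketch of the scheme described above, for illustration only. The names `build_expression`, `evaluate`, `columns` and `row` are hypothetical, as is the choice to pass the row as the eval() namespace. It also uses repr() rather than str() on the literal, on the assumption that string values must keep their quotes in the generated expression.

```python
from ast import literal_eval

ALLOWED_OPS = {'=', '!=', '<', '>', '<=', '>=', 'in', 'is', 'is not'}


def build_expression(check, columns):
    """Build a Python expression string from a list of 6-part tuples.

    `columns` is a hypothetical collection of valid column names.
    """
    parts = []
    for i, (bool_op, lbr, col, op, value, rbr) in enumerate(check):
        if i > 0:
            # The placeholder in the first entry ('check') is discarded.
            if bool_op not in ('and', 'or'):
                raise ValueError('bad boolean operator: %r' % (bool_op,))
            parts.append(bool_op)
        if lbr not in ('(', '') or rbr not in (')', ''):
            raise ValueError('bad bracket')
        if col not in columns:
            raise ValueError('unknown column: %r' % (col,))
        if op not in ALLOWED_OPS:
            raise ValueError('bad operator: %r' % (op,))
        if op == '=':
            op = '=='
        # literal_eval() accepts only Python literals; repr() turns the
        # result back into source text, keeping the quotes on strings.
        literal = repr(literal_eval(value))
        parts.append('%s%s %s %s%s' % (lbr, col, op, literal, rbr))
    return ' '.join(parts)


def evaluate(check, row):
    """Evaluate the check against `row`, a dict of column name -> value."""
    expression = build_expression(check, row)
    # Column names in the expression are resolved from a copy of the row.
    return eval(expression, {}, dict(row))
```

Unbalanced brackets are not checked here; they surface as a SyntaxError from eval(), which matches the plan of simply displaying the exception.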
It seems safe to me. Can anyone see a problem with it?
Frank
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-05-20 16:12 +0000 |
| Message-ID | <519a4b6a$0$29997$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #45619 |
On Mon, 20 May 2013 15:26:02 +0200, Frank Millman wrote:
> Can anyone see anything wrong with the following approach. I have not
> definitely decided to do it this way, but I have been experimenting and
> it seems to work.
>
> I store the boolean test as a json'd list of 6-part tuples. Each element
> of the tuple is a string, defined as follows -
>
> 0 - for the first entry in the list, the word 'check' (a placeholder -
> it is discarded at evaluation time), for any subsequent entries the word
> 'and' or 'or'.
>
> 1 - left bracket - either '(' or ''.
>
> 2 - column name to check - it will be validated on entry.
>
> 3 - operator - must be one of '=', '!=', '<', '>', '<=', '>=', 'in',
> 'is', 'is not'. At evaluation time, '=' is changed to '=='.
>
> 4 - value to compare - at evaluation time I call
> str(literal_eval(value)) to ensure that it is safe.
>
> 5 - right bracket - either ')' or ''.
>
> At evaluation time I loop through the list, construct the boolean test
> as a string, and call eval() on it.
[...]
> It seems safe to me. Can anyone see a problem with it?
It seems safe to me too, but then any fool can come up with a system
which they themselves cannot break :-)
I think the real worry is validating the column name. That will be
critical. Personally, I would strongly suggest writing your own mini-
evaluator that walks the list and evaluates it by hand. It isn't as
convenient as just calling eval, but *definitely* safer.
If you do call eval, make sure you supply the globals and locals
arguments. The usual way is:
eval(expression, {'__builtins__': None}, {})
which gives you an empty locals() and a minimal, (mostly) safe globals.
Finally, as a "belt-and-braces" approach, I wouldn't even call eval
directly, but call a thin wrapper that raises an exception if the
expression contains an underscore. Underscores are usually the key to
breaking eval, so refusing to evaluate anything with an underscore raises
the barrier very high.
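A minimal sketch of such a wrapper, assuming the expression arrives as a plain string. The name `safe_eval` is hypothetical; the globals and locals arguments are the ones shown above.

```python
def safe_eval(expression):
    """Evaluate `expression` with no builtins, refusing any underscore."""
    if '_' in expression:
        # Dunder attributes such as __class__ are the usual way out of a
        # restricted eval, so reject anything containing an underscore.
        raise ValueError('underscores are not allowed in expressions')
    # Empty locals and a globals dict whose builtins are disabled.
    return eval(expression, {'__builtins__': None}, {})
```

This only raises the barrier. It is not a guarantee of safety, and the check also rejects harmless names that happen to contain an underscore.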
And even with all those defences, I wouldn't allow untrusted data from
the Internet anywhere near this. Just because I can't break it, doesn't
mean it's safe.
--
Steven
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-05-21 02:23 +1000 |
| Message-ID | <mailman.1891.1369067002.3114.python-list@python.org> |
| In reply to | #45622 |
On Tue, May 21, 2013 at 2:12 AM, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> Personally, I would strongly suggest writing your own mini-
> evaluator that walks the list and evaluates it by hand. It isn't as
> convenient as just calling eval, but *definitely* safer.
Probably faster, too, for what it's worth - eval is pretty expensive.
ChrisA
| From | Frank Millman <frank@chagford.com> |
|---|---|
| Date | 2013-05-21 08:30 +0200 |
| Message-ID | <mailman.1905.1369117820.3114.python-list@python.org> |
| In reply to | #45622 |
On 20/05/2013 18:12, Steven D'Aprano wrote:
> On Mon, 20 May 2013 15:26:02 +0200, Frank Millman wrote:
>
>> Can anyone see anything wrong with the following approach. I have not
>> definitely decided to do it this way, but I have been experimenting and
>> it seems to work.
>>
[...]
>
> It seems safe to me too, but then any fool can come up with a system
> which they themselves cannot break :-)
>
Thanks for the detailed response.
> I think the real worry is validating the column name. That will be
> critical.
I would not pass the actual column name to eval(), I would use it to
retrieve a value from a data object and pass that to eval(). However,
then your point becomes 'validating the value retrieved'. I had not
thought about that. I will investigate further.
> Personally, I would strongly suggest writing your own mini-
> evaluator that walks the list and evaluates it by hand. It isn't as
> convenient as just calling eval, but *definitely* safer.
>
I am not sure I can wrap my mind around mixed 'and's, 'or's, and brackets.
[Thinking aloud]
Maybe I can manually reduce each internal test to a True or False,
substitute them in the list, and then call eval() on the result.
eval('(True and False) or (False or True)')
I will experiment with that.
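A rough sketch of that idea, assuming the 6-part tuples from earlier in the thread and a `row` dict of column values. The names `OPERATORS` and `evaluate_reduced` are hypothetical. Each comparison is done by hand, so the string passed to eval() contains only True, False, 'and', 'or' and brackets.

```python
import operator
from ast import literal_eval

OPERATORS = {
    '=': operator.eq, '!=': operator.ne,
    '<': operator.lt, '>': operator.gt,
    '<=': operator.le, '>=': operator.ge,
    'in': lambda x, y: x in y,
    'is': operator.is_, 'is not': operator.is_not,
}


def evaluate_reduced(check, row):
    """Reduce each test to True/False, then eval() the boolean skeleton."""
    tokens = []
    for i, (bool_op, lbr, col, op, value, rbr) in enumerate(check):
        if i > 0:
            if bool_op not in ('and', 'or'):
                raise ValueError('bad boolean operator: %r' % (bool_op,))
            tokens.append(bool_op)
        if lbr not in ('(', '') or rbr not in (')', ''):
            raise ValueError('bad bracket')
        # The comparison itself never goes through eval().
        flag = bool(OPERATORS[op](row[col], literal_eval(value)))
        tokens.append('%s%s%s' % (lbr, flag, rbr))
    expression = ' '.join(tokens)   # e.g. '(True and False) or (False and True)'
    return eval(expression, {'__builtins__': None}, {})
```

Because the final step is still eval(), Python's own precedence applies: 'and' binds tighter than 'or' where no brackets are given.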
> If you do call eval, make sure you supply the globals and locals
> arguments. The usual way is:
>
> eval(expression, {'__builtins__': None}, {})
>
> which gives you an empty locals() and a minimal, (mostly) safe globals.
>
Thanks - I did not know about that.
> Finally, as a "belt-and-braces" approach, I wouldn't even call eval
> directly, but call a thin wrapper that raises an exception if the
> expression contains an underscore. Underscores are usually the key to
> breaking eval, so refusing to evaluate anything with an underscore raises
> the barrier very high.
>
> And even with all those defences, I wouldn't allow untrusted data from
> the Internet anywhere near this. Just because I can't break it, doesn't
> mean it's safe.
>
All good advice - much appreciated.
Frank
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-05-21 07:21 +0000 |
| Message-ID | <519b2096$0$6574$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #45645 |
On Tue, 21 May 2013 08:30:03 +0200, Frank Millman wrote:
> On 20/05/2013 18:12, Steven D'Aprano wrote:
>> Personally, I would strongly suggest writing your own mini-evaluator
>> that walks the list and evaluates it by hand. It isn't as convenient as
>> just calling eval, but *definitely* safer.
>>
>>
> I am not sure I can wrap my mind around mixed 'and's, 'or's, and
> brackets.
Parsers are a solved problem in computer science, he says as if he had a
clue what he was talking about *wink*
Here's a sketch of a solution... suppose you have a sequence of records,
looking like this:
(bool_op, column_name, comparison_op, literal)
with appropriate validation on each field. The very first record has
bool_op set to "or". Then, you do something like this:
import operator
OPERATORS = {
    '=': operator.eq,
    'is': operator.is_,
    '<': operator.lt,
    # etc.
}
def eval_op(column_name, op, literal):
    value = lookup(column_name)  # whatever...
    return OPERATORS[op](value, literal)
result = False
for (bool_op, column_name, comparison_op, literal) in sequence:
    flag = eval_op(column_name, comparison_op, literal)
    if bool_op == 'and':
        result = result and flag
    else:
        assert bool_op == 'or'
        result = result or flag
    # Lazy processing? Not by breaking as soon as result is true: a
    # later 'and' record can still make it false.
and in theory it should all Just Work.
--
Steven
| From | Frank Millman <frank@chagford.com> |
|---|---|
| Date | 2013-05-21 10:06 +0200 |
| Message-ID | <mailman.1909.1369123609.3114.python-list@python.org> |
| In reply to | #45647 |
On 21/05/2013 09:21, Steven D'Aprano wrote:
> On Tue, 21 May 2013 08:30:03 +0200, Frank Millman wrote:
>
>> I am not sure I can wrap my mind around mixed 'and's, 'or's, and
>> brackets.
>
> Parsers are a solved problem in computer science, he says as if he had a
> clue what he was talking about *wink*
>
> Here's a sketch of a solution... suppose you have a sequence of records,
> looking like this:
>
> (bool_op, column_name, comparison_op, literal)
>
> with appropriate validation on each field. The very first record has
> bool_op set to "or". Then, you do something like this:
>
> import operator
> OPERATORS = {
>     '=': operator.eq,
>     'is': operator.is_,
>     '<': operator.lt,
>     # etc.
> }
>
> def eval_op(column_name, op, literal):
>     value = lookup(column_name)  # whatever...
>     return OPERATORS[op](value, literal)
>
> result = False
>
> for (bool_op, column_name, comparison_op, literal) in sequence:
>     flag = eval_op(column_name, comparison_op, literal)
>     if bool_op == 'and':
>         result = result and flag
>     else:
>         assert bool_op == 'or'
>         result = result or flag
>     # Lazy processing? Not by breaking as soon as result is true: a
>     # later 'and' record can still make it false.
>
> and in theory it should all Just Work.
That's very clever - thanks, Steven.
It doesn't address the issue of brackets. I imagine that the answer is
something like -
maintain a stack of results
for each left bracket, push a level
for each right bracket, pop the result
or something ...
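One possible reading of that stack idea, as a sketch only. It assumes each comparison has already been reduced to a True/False flag and that every entry carries at most one left and one right bracket, as in the 6-part tuples. The names `apply_op` and `combine` are hypothetical.

```python
def apply_op(left, bool_op, right):
    """Combine two results; `left` is None for the first value of a level."""
    if left is None:
        return right          # nothing to combine with yet
    if bool_op == 'and':
        return left and right
    if bool_op == 'or':
        return left or right
    raise ValueError('bad boolean operator: %r' % (bool_op,))


def combine(entries):
    """Combine (bool_op, left_bracket, flag, right_bracket) entries.

    Evaluation is strictly left to right within each bracket level, so
    'and' does not bind tighter than 'or' here as it does in Python.
    """
    stack = []       # saved (result, bool_op) pairs of enclosing levels
    result = None    # running result of the current level
    for bool_op, lbr, flag, rbr in entries:
        if lbr == '(':
            # Left bracket: push the outer level and start a fresh one.
            stack.append((result, bool_op))
            result, bool_op = None, None
        result = apply_op(result, bool_op, flag)
        if rbr == ')':
            # Right bracket: pop and fold this level into the outer one.
            if not stack:
                raise ValueError('unbalanced right bracket')
            outer, outer_op = stack.pop()
            result = apply_op(outer, outer_op, result)
    if stack:
        raise ValueError('unbalanced left bracket')
    return result
```

The left-to-right rule is a design choice of this sketch. An eval()-based version would follow Python's precedence instead, so the two can disagree on unbracketed mixes of 'and' and 'or'.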
I am sure that with enough trial and error I can get it working, but I
might cheat for now and use the trick I mentioned earlier of calling
eval() on a sequence of manually derived True/False values. I really
can't see anything going wrong with that.
BTW, thanks to ChrisA for the following tip -
import operator
ops = {
    'in': lambda x, y: x in y,  # operator.contains has the args backwards
    # ...
}
I would have battled with that one.
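A short illustration of the argument order in question. `operator.contains(a, b)` tests `b in a`, so the container comes first. The table name `OPS` is hypothetical.

```python
import operator

# operator.contains(a, b) is the same as "b in a": container, then item.
assert operator.contains(('abc', 'xyz'), 'abc')

# In a table keyed by the infix spelling, the left operand of "in" is the
# item, so the arguments have to be swapped one way or another.
OPS = {
    'in': lambda x, y: x in y,
    # Equivalent spelling: lambda x, y: operator.contains(y, x)
}

assert OPS['in']('abc', ('abc', 'xyz'))
```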
Frank
| From | Fábio Santos <fabiosantosart@gmail.com> |
|---|---|
| Date | 2013-05-21 09:23 +0100 |
| Message-ID | <mailman.1910.1369124922.3114.python-list@python.org> |
| In reply to | #45647 |
On 21 May 2013 09:10, "Frank Millman" <frank@chagford.com> wrote:
> It doesn't address the issue of brackets. I imagine that the answer is
> something like -
>
> maintain a stack of results
> for each left bracket, push a level
> for each right bracket, pop the result
>
> or something ...
>
Time for me to suggest pyparsing or PLY. You're better off creating your
own AST and walking it to produce python or SQL than reinventing the
wheel, I think.
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2013-05-21 10:00 +0100 |
| Message-ID | <mailman.1911.1369126836.3114.python-list@python.org> |
| In reply to | #45647 |
On 21/05/2013 09:23, Fábio Santos wrote:
> On 21 May 2013 09:10, "Frank Millman" <frank@chagford.com> wrote:
> > It doesn't address the issue of brackets. I imagine that the answer
> > is something like -
> >
> > maintain a stack of results
> > for each left bracket, push a level
> > for each right bracket, pop the result
> >
> > or something ...
> >
> Time for me to suggest pyparsing or PLY. You're better off creating your
> own AST and walking it to produce python or SQL than reinventing the
> wheel, I think.
>
Or pick one from this lot http://nedbatchelder.com/text/python-parsers.html
--
If you're using GoogleCrap™ please read this
http://wiki.python.org/moin/GoogleGroupsPython.
Mark Lawrence