| Started by | "magnus.lycka@gmail.com" <magnus.lycka@gmail.com> |
|---|---|
| First post | 2013-11-27 11:40 -0800 |
| Last post | 2013-11-27 22:08 +0000 |
| Articles | 7 — 5 participants |
'_[1]' in .co_names using builtin compile() in Python 2.6 "magnus.lycka@gmail.com" <magnus.lycka@gmail.com> - 2013-11-27 11:40 -0800
Re: '_[1]' in .co_names using builtin compile() in Python 2.6 Ned Batchelder <ned@nedbatchelder.com> - 2013-11-27 15:09 -0500
Re: '_[1]' in .co_names using builtin compile() in Python 2.6 "magnus.lycka@gmail.com" <magnus.lycka@gmail.com> - 2013-11-28 03:17 -0800
Re: '_[1]' in .co_names using builtin compile() in Python 2.6 Ian Kelly <ian.g.kelly@gmail.com> - 2013-11-27 13:23 -0700
Re: '_[1]' in .co_names using builtin compile() in Python 2.6 Chris Kaynor <ckaynor@zindagigames.com> - 2013-11-27 12:44 -0800
Re: '_[1]' in .co_names using builtin compile() in Python 2.6 Ned Batchelder <ned@nedbatchelder.com> - 2013-11-27 16:26 -0500
Re: '_[1]' in .co_names using builtin compile() in Python 2.6 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-27 22:08 +0000
| From | "magnus.lycka@gmail.com" <magnus.lycka@gmail.com> |
|---|---|
| Date | 2013-11-27 11:40 -0800 |
| Subject | '_[1]' in .co_names using builtin compile() in Python 2.6 |
| Message-ID | <6cdefe87-5703-4caf-91c0-b4a02674a1e5@googlegroups.com> |
When I run e.g. compile('sin(5) * cos(6)', '<string>', 'eval').co_names, I get ('sin', 'cos'), which is just what I expected.
But when I have a list comprehension in the expression, I get a little surprise:
>>> compile('[x*x for x in y]', '<string>', 'eval').co_names
('_[1]', 'y', 'x')
>>>
This happens in Python 2.6.6 on Red Hat Linux, but not when I run Python 2.7.3 in Windows. Unfortunately I'm stuck with 2.6.
* Are there more surprises similar to this one that I can expect from compile(...).co_names? Is this "behaviour" documented somewhere?
* Is there perhaps a better way to achieve what I'm trying to do?
What I'm really after is to check that Python expressions embedded in text files:
- are well behaved (no syntax errors etc.)
- don't accidentally access anything they shouldn't
- get served the values they need on execution
So, in the case of "a.b + x" I'm really just interested in a and x, not b. So the (almost) whole story is that I do:
# Find names not starting with ".", i.e a & b in "a.c + b"
abbr_expr = re.sub(r"\.\w+", "", expr)
names = compile(abbr_expr, '<string>', 'eval').co_names
# Python 2.6 returns '_[1]' in co_names for list comprehension. Bug?
names = [name for name in names if re.match(r'\w+$', name)]
for name in names:
    if name not in allowed_names:
        raise NameError('Name: %s not permitted in expression: %s' % (name, expr))
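For readers who want to try the check above, here is a minimal self-contained sketch of it. The function name `check_names` and its return value are assumptions made for illustration; the logic follows the snippet in the post. Note that stripping `\.\w+` with a regex also alters string literals that contain dots, which this sketch does not handle.

```python
import re


def check_names(expr, allowed_names):
    """Raise NameError if expr uses a top-level name outside allowed_names.

    Hypothetical wrapper around the snippet from the post.
    """
    # Drop attribute accesses so that only a and b remain in "a.c + b".
    abbr_expr = re.sub(r"\.\w+", "", expr)
    # compile() raises SyntaxError for malformed expressions.
    names = compile(abbr_expr, "<string>", "eval").co_names
    # Skip compiler-generated entries such as '_[1]' (seen on Python 2.6).
    names = [name for name in names if re.match(r"\w+$", name)]
    for name in names:
        if name not in allowed_names:
            raise NameError(
                "Name: %s not permitted in expression: %s" % (name, expr)
            )
    return names
```

Calling `check_names("a.c + b", {"a", "b"})` passes, while an expression that refers to a name outside the allowed set raises NameError.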
| From | Ned Batchelder <ned@nedbatchelder.com> |
|---|---|
| Date | 2013-11-27 15:09 -0500 |
| Message-ID | <mailman.3316.1385582986.18130.python-list@python.org> |
| In reply to | #60631 |
On 11/27/13 2:40 PM, magnus.lycka@gmail.com wrote:
> When I run e.g. compile('sin(5) * cos(6)', '<string>', 'eval').co_names, I get ('sin', 'cos'), which is just what I expected.
>
> But when I have a list comprehension in the expression, I get a little surprise:
> >>> compile('[x*x for x in y]', '<string>', 'eval').co_names
> ('_[1]', 'y', 'x')
> >>>
>
> This happens in Python 2.6.6 on Red Hat Linux, but not when I run Python 2.7.3 in Windows. Unfortunately I'm stuck with 2.6.
>
> * Are there more surprises similar to this one that I can expect from compile(...).co_names? Is this "behaviour" documented somewhere?
>
That name is the name of the list being built by the comprehension,
which I found out by disassembling the code object to see the bytecodes:
>>> co = compile("[x*x for x in y]", "<s>", "eval")
>>> co.co_names
('_[1]', 'y', 'x')
>>> import dis
>>> dis.dis(co)
  1           0 BUILD_LIST               0
              3 DUP_TOP
              4 STORE_NAME               0 (_[1])
              7 LOAD_NAME                1 (y)
             10 GET_ITER
        >>   11 FOR_ITER                17 (to 31)
             14 STORE_NAME               2 (x)
             17 LOAD_NAME                0 (_[1])
             20 LOAD_NAME                2 (x)
             23 LOAD_NAME                2 (x)
             26 BINARY_MULTIPLY
             27 LIST_APPEND
             28 JUMP_ABSOLUTE           11
        >>   31 DELETE_NAME              0 (_[1])
             34 RETURN_VALUE
The same list comprehension in 2.7 uses an unnamed list on the stack:
  1           0 BUILD_LIST               0
              3 LOAD_NAME                0 (y)
              6 GET_ITER
        >>    7 FOR_ITER                16 (to 26)
             10 STORE_NAME               1 (x)
             13 LOAD_NAME                1 (x)
             16 LOAD_NAME                1 (x)
             19 BINARY_MULTIPLY
             20 LIST_APPEND              2
             23 JUMP_ABSOLUTE            7
        >>   26 RETURN_VALUE
I don't know whether such facts are documented. They are deep
implementation details, and change from version to version, as you've seen.
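The same inspection can be done programmatically instead of reading the listing by eye. The sketch below assumes Python 3.4 or later, where `dis.get_instructions()` is available (it does not exist in 2.6 or 2.7); the helper name `name_instructions` is made up for illustration. The exact opcodes emitted still vary between versions, so treat the result as a debugging aid only.

```python
import dis


def name_instructions(source):
    """Return (opname, name) pairs for instructions whose opcode name
    ends in _NAME (LOAD_NAME, STORE_NAME, DELETE_NAME).

    Hypothetical helper; requires Python 3.4+ for dis.get_instructions().
    """
    code = compile(source, "<s>", "eval")
    return [
        (ins.opname, ins.argval)
        for ins in dis.get_instructions(code)
        if ins.opname.endswith("_NAME")
    ]


for opname, name in name_instructions("sin(5) * cos(6)"):
    print(opname, name)
```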
> * Is there perhaps a better way to achieve what I'm trying to do?
>
> What I'm really after, is to check that python expressions embedded in text files are:
> - well behaved (no syntax errors etc)
> - don't accidentally access anything it shouldn't
> - I serve them with the values they need on execution
I hope you aren't trying to prevent malice this way: you cannot examine
a piece of Python code to prove that it's safe to execute. For an
extreme example, see: Eval Really Is Dangerous:
http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html
In your environment it looks like you have a whitelist of identifiers,
so you're probably ok.
>
> So, in the case of "a.b + x" I'm really just interested in a and x, not b. So the (almost) whole story is that I do:
>
> # Find names not starting with ".", i.e a & b in "a.c + b"
> abbr_expr = re.sub(r"\.\w+", "", expr)
> names = compile(abbr_expr, '<string>', 'eval').co_names
> # Python 2.6 returns '_[1]' in co_names for list comprehension. Bug?
> names = [name for name in names if re.match(r'\w+$', name)]
>
> for name in names:
>     if name not in allowed_names:
>         raise NameError('Name: %s not permitted in expression: %s' % (name, expr))
>
I don't know of a better way to determine the real names in the
expression. I doubt Python will insert a valid name into the namespace,
since it doesn't want to step on real user names. The simplest way to
avoid such collisions is to autogenerate names that are not valid
identifiers, like "_[1]" (I wonder why it isn't "_[0]"?)
--Ned.
| From | "magnus.lycka@gmail.com" <magnus.lycka@gmail.com> |
|---|---|
| Date | 2013-11-28 03:17 -0800 |
| Message-ID | <1615c7ae-f1d2-4368-ac74-775be3a9eb47@googlegroups.com> |
| In reply to | #60632 |
On Wednesday, November 27, 2013 9:09:32 PM UTC+1, Ned Batchelder wrote:
> I hope you aren't trying to prevent malice this way: you cannot examine
> a piece of Python code to prove that it's safe to execute.
No worry. Whoever has access to modifying those configuration files can cause a mess in all sorts of other ways, such as writing and running arbitrary programs.
I just want to give reasonably rapid feedback when people make mistakes. As with all python code, it's very important to test properly, but the top level names are often defined elsewhere in the configuration, so I want to catch those errors ASAP.
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2013-11-27 13:23 -0700 |
| Message-ID | <mailman.3318.1385583798.18130.python-list@python.org> |
| In reply to | #60631 |
On Nov 27, 2013 2:11 PM, "Ned Batchelder" <ned@nedbatchelder.com> wrote:
>
> On 11/27/13 2:40 PM, magnus.lycka@gmail.com wrote:
>>
>> So, in the case of "a.b + x" I'm really just interested in a and x, not
>> b. So the (almost) whole story is that I do:
>>
>> # Find names not starting with ".", i.e a & b in "a.c + b"
>> abbr_expr = re.sub(r"\.\w+", "", expr)
>> names = compile(abbr_expr, '<string>', 'eval').co_names
>> # Python 2.6 returns '_[1]' in co_names for list comprehension. Bug?
>> names = [name for name in names if re.match(r'\w+$', name)]
>>
>> for name in names:
>>     if name not in allowed_names:
>>         raise NameError('Name: %s not permitted in expression: %s'
>>                         % (name, expr))
>>
>
> I don't know of a better way to determine the real names in the
> expression. I doubt Python will insert a valid name into the namespace,
> since it doesn't want to step on real user names. The simplest way to do
> that is to autogenerate invalid names, like "_[1]" (I wonder why it isn't
> "_[0]"?)
One possible alternative is to use the ast module to examine the parse tree
of the expression instead of the generated code object. Hard to say whether
that would be "better".
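A rough sketch of what the ast approach might look like follows. It is written for Python 3 (it relies on `ast.arg`, which Python 2 lacks), and the helper name `free_names` is an assumption. Treating every name that is bound anywhere in the expression as "not needed from outside" is a simplification: an expression like `[x for x in x]` would be misjudged.

```python
import ast


def free_names(expr):
    """Return the names an expression reads but never binds itself.

    Hypothetical helper; approximate, since scoping is ignored.
    """
    tree = ast.parse(expr, mode="eval")
    loaded, bound = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
            else:
                # Store context: comprehension targets such as x in "for x in y".
                bound.add(node.id)
        elif isinstance(node, ast.arg):
            # Lambda parameters are bound inside the expression too.
            bound.add(node.arg)
    return loaded - bound


print(sorted(free_names("a.b + x")))           # ['a', 'x']
print(sorted(free_names("[x*x for x in y]")))  # ['y']
```

Because attribute names are stored on `ast.Attribute` nodes rather than as `ast.Name` nodes, `b` in `a.b` never shows up, so the regex preprocessing step is not needed with this approach.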
| From | Chris Kaynor <ckaynor@zindagigames.com> |
|---|---|
| Date | 2013-11-27 12:44 -0800 |
| Message-ID | <mailman.3321.1385585077.18130.python-list@python.org> |
| In reply to | #60631 |
On Wed, Nov 27, 2013 at 12:09 PM, Ned Batchelder <ned@nedbatchelder.com> wrote:
>> * Is there perhaps a better way to achieve what I'm trying to do?
>>
>> What I'm really after, is to check that python expressions embedded in
>> text files are:
>> - well behaved (no syntax errors etc)
>> - don't accidentally access anything it shouldn't
>> - I serve them with the values they need on execution
>
> I hope you aren't trying to prevent malice this way: you cannot examine a
> piece of Python code to prove that it's safe to execute. For an extreme
> example, see: Eval Really Is Dangerous:
> http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html
>
> In your environment it looks like you have a whitelist of identifiers, so
> you're probably ok.
I just tested the crash example from that link in Python 2.7.5 win64 and the co_names from the compiled code is empty. Therefore, a simple whitelist would not catch that problematic code (and likely any other global access done correctly). Even a simple test of making sure that at least one (or any number of) valid identifier exists would be insufficient, as you can merely tack on a ",a" to add "a" to the co_names, and likewise for any other variable.
Basically, even with a pure whitelist, there is likely no possible way to make eval/exec safe, unless you also eliminate the ability to make literals.
Chris
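One way to see why co_names can come back empty: names used inside a lambda live in a nested code object stored in co_consts, not in the outer code object. The sketch below uses a harmless stand-in expression, not the crash example from the linked article, and only compiles it without evaluating it. The helper `all_names` is hypothetical.

```python
import types


def all_names(code):
    """Collect co_names from a code object and every code object nested in it."""
    names = set(code.co_names)
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= all_names(const)
    return names


# Harmless stand-in: starts from a literal and hides everything in a lambda.
expr = "(lambda: ().__class__.__bases__)()"
outer = compile(expr, "<string>", "eval")

print(outer.co_names)            # the outer code object references no names
print(sorted(all_names(outer)))  # the attribute names sit in the nested code
```

Even with the recursive walk, the only names found are attribute names reached from a literal, so a whitelist of top-level identifiers has nothing to reject.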
| From | Ned Batchelder <ned@nedbatchelder.com> |
|---|---|
| Date | 2013-11-27 16:26 -0500 |
| Message-ID | <mailman.3322.1385587627.18130.python-list@python.org> |
| In reply to | #60631 |
On 11/27/13 3:44 PM, Chris Kaynor wrote:
> On Wed, Nov 27, 2013 at 12:09 PM, Ned Batchelder <ned@nedbatchelder.com> wrote:
>
>>> * Is there perhaps a better way to achieve what I'm trying to do?
>>>
>>> What I'm really after, is to check that python expressions
>>> embedded in text files are:
>>> - well behaved (no syntax errors etc)
>>> - don't accidentally access anything it shouldn't
>>> - I serve them with the values they need on execution
>>
>> I hope you aren't trying to prevent malice this way: you cannot
>> examine a piece of Python code to prove that it's safe to execute.
>> For an extreme example, see: Eval Really Is Dangerous:
>> http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html
>>
>> In your environment it looks like you have a whitelist of
>> identifiers, so you're probably ok.
>
> I just tested the crash example from that link in Python 2.7.5 win64 and
> the co_names from the compiled code is empty. Therefore, a simple
> whitelist would not catch that problematic code (and likely any other
> global access done correctly). Even a simple test of making sure that at
> least one (or any number of) valid identifier exists would be
> insufficient, as you can merely tack on a ",a" to add "a" to the
> co_names, and thus for any other variable.
Ah, right you are! I neglected to go back and examine the dangerous code. So eval really is dangerous!
--Ned.
> Basically, even with a pure whitelist, there is likely no possible way
> to make eval/exec safe, unless you also eliminate the ability to make
> literals.
>
> Chris
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-11-27 22:08 +0000 |
| Message-ID | <52966d72$0$29993$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #60631 |
On Wed, 27 Nov 2013 11:40:52 -0800, magnus.lycka@gmail.com wrote:
> What I'm really after, is to check that python expressions embedded in
> text files are:
> - well behaved (no syntax errors etc)
> - don't accidentally access anything it shouldn't
> - I serve them with the values they need on execution
If you are trying to get safe execution of untrusted code in Python, you
should read this recent thread from the Python core developers:
https://mail.python.org/pipermail/python-dev/2013-November/130132.html
Probably the only way to securely sandbox untrusted Python code is to use
operating system level security (such as a chroot jail) or an
implementation such as PyPy which has been designed from the beginning to
be sandboxed -- and even that may simply mean that nobody has broken out
of PyPy's sandbox *yet*.
Looking back at your example:
    compile('sin(5) * cos(6)', '<string>', 'eval').co_names
I'm not sure I understand why you inspect the co_names. What does that
give you? You can tell that there are no syntax errors just by compiling
it: if there are any, compile() will raise SyntaxError.
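As a small illustration of that point, a syntax check needs nothing more than a try/except around compile(). The function name `is_valid_expression` is made up for this sketch.

```python
def is_valid_expression(source):
    """Return True if source compiles as a single Python expression."""
    try:
        compile(source, "<string>", "eval")
    except SyntaxError:
        return False
    return True


print(is_valid_expression("sin(5) * cos(6)"))  # True
print(is_valid_expression("sin(5) *"))         # False
```

Because the mode is 'eval', statements such as assignments are rejected as well, which suits expressions embedded in configuration files.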
I would pre-process the string before compiling and disallow *anything*
containing "eval", "exec", or underscore. I'd also apply a limit to the
total length of the string. That doesn't necessarily rule out a hostile
user running arbitrary code, but it does make it harder.
Also, when you execute the compiled code, don't do this:
    eval(code) # No!
Instead, provide an explicit globals and locals namespace:
    safe_ish_namespace = {'__builtins__': None}
    eval(code, safe_ish_namespace)
Again, this increases the barrier to somebody hacking out of your sandbox
without ruling it out altogether.
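Put together, the suggestions above might look roughly like the sketch below. The name `restricted_eval`, the default length limit of 200, and the use of an empty dict instead of None for `__builtins__` are all assumptions made for illustration (on Python 3 an empty dict makes a missing builtin show up as a plain NameError). As stressed in this thread, this only raises the barrier and is not a sandbox.

```python
def restricted_eval(expr, values, max_length=200):
    """Evaluate expr against values after a few crude textual checks.

    Hypothetical helper; NOT safe against a determined attacker.
    """
    if len(expr) > max_length:
        raise ValueError("expression too long")
    # Crude substring filter: this also rejects harmless names
    # that merely contain an underscore, such as my_var.
    for banned in ("eval", "exec", "_"):
        if banned in expr:
            raise ValueError("expression contains %r" % banned)
    code = compile(expr, "<string>", "eval")
    namespace = dict(values)
    # Set last so that values cannot smuggle in their own __builtins__.
    namespace["__builtins__"] = {}
    return eval(code, namespace)


print(restricted_eval("a + b * 2", {"a": 1, "b": 2}))  # 5
```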
Good luck!
--
Steven