Groups > comp.lang.python > #60631 > unrolled thread

'_[1]' in .co_names using builtin compile() in Python 2.6

Started by	"magnus.lycka@gmail.com" <magnus.lycka@gmail.com>
First post	2013-11-27 11:40 -0800
Last post	2013-11-27 22:08 +0000
Articles	7 — 5 participants


  '_[1]' in .co_names using builtin compile() in Python 2.6 "magnus.lycka@gmail.com" <magnus.lycka@gmail.com> - 2013-11-27 11:40 -0800
    Re: '_[1]' in .co_names using builtin compile() in Python 2.6 Ned Batchelder <ned@nedbatchelder.com> - 2013-11-27 15:09 -0500
      Re: '_[1]' in .co_names using builtin compile() in Python 2.6 "magnus.lycka@gmail.com" <magnus.lycka@gmail.com> - 2013-11-28 03:17 -0800
    Re: '_[1]' in .co_names using builtin compile() in Python 2.6 Ian Kelly <ian.g.kelly@gmail.com> - 2013-11-27 13:23 -0700
    Re: '_[1]' in .co_names using builtin compile() in Python 2.6 Chris Kaynor <ckaynor@zindagigames.com> - 2013-11-27 12:44 -0800
    Re: '_[1]' in .co_names using builtin compile() in Python 2.6 Ned Batchelder <ned@nedbatchelder.com> - 2013-11-27 16:26 -0500
    Re: '_[1]' in .co_names using builtin compile() in Python 2.6 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-27 22:08 +0000

#60631 — '_[1]' in .co_names using builtin compile() in Python 2.6

From	"magnus.lycka@gmail.com" <magnus.lycka@gmail.com>
Date	2013-11-27 11:40 -0800
Subject	'_[1]' in .co_names using builtin compile() in Python 2.6
Message-ID	<6cdefe87-5703-4caf-91c0-b4a02674a1e5@googlegroups.com>

When I run e.g. compile('sin(5) * cos(6)', '<string>', 'eval').co_names, I get ('sin', 'cos'), which is just what I expected.

But when I have a list comprehension in the expression, I get a little surprise:
>>> compile('[x*x for x in y]',  '<string>', 'eval').co_names
('_[1]', 'y', 'x')
>>>

This happens in Python 2.6.6 on Red Hat Linux, but not when I run Python 2.7.3 in Windows. Unfortunately I'm stuck with 2.6.

* Are there more surprises similar to this one that I can expect from compile(...).co_names? Is this "behaviour" documented somewhere?

* Is there perhaps a better way to achieve what I'm trying to do?

What I'm really after, is to check that python expressions embedded in text files are:
- well behaved (no syntax errors etc)
- don't accidentally access anything it shouldn't
- I serve them with the values they need on execution

So, in the case of "a.b + x" I'm really just interested in a and x, not b. So the (almost) whole story is that I do:

    # Find names not starting with ".", i.e a & b in "a.c + b"
    abbr_expr = re.sub(r"\.\w+", "", expr)
    names = compile(abbr_expr, '<string>', 'eval').co_names
    # Python 2.6 returns '_[1]' in co_names for list comprehension. Bug?
    names = [name for name in names if re.match(r'\w+$', name)]

    for name in names:
        if name not in allowed_names:
            raise NameError('Name: %s not permitted in expression: %s' % (name, expr))
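
For readers who want to try the snippet above, here is a minimal self-contained sketch of the same check. The function name `check_names` and its return value are assumptions added for illustration; the logic is the one from the post. Note that the attribute-stripping regex also touches text like ".5" in float literals, which is harmless here but worth knowing.

```python
import re

def check_names(expr, allowed_names):
    """Raise NameError if expr uses a top-level name outside allowed_names.

    Hypothetical wrapper around the snippet from the post.
    """
    # Strip attribute accesses so that only "a" survives from "a.b".
    abbr_expr = re.sub(r"\.\w+", "", expr)
    # compile() raises SyntaxError here if the expression is malformed.
    names = compile(abbr_expr, '<string>', 'eval').co_names
    # Drop compiler-generated names such as '_[1]' (seen on Python 2.6).
    names = [name for name in names if re.match(r'\w+$', name)]
    for name in names:
        if name not in allowed_names:
            raise NameError('Name: %s not permitted in expression: %s' % (name, expr))
    return names
```

Calling `check_names("a.b + x", ["a", "x"])` passes, while an expression using a name outside the list raises NameError.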


#60632

From	Ned Batchelder <ned@nedbatchelder.com>
Date	2013-11-27 15:09 -0500
Message-ID	<mailman.3316.1385582986.18130.python-list@python.org>
In reply to	#60631

On 11/27/13 2:40 PM, magnus.lycka@gmail.com wrote:
> When I run e.g. compile('sin(5) * cos(6)', '<string>', 'eval').co_names, I get ('sin', 'cos'), which is just what I expected.
>
> But when I have a list comprehension in the expression, I get a little surprise:
> >>> compile('[x*x for x in y]',  '<string>', 'eval').co_names
> ('_[1]', 'y', 'x')
> >>>
>
> This happens in Python 2.6.6 on Red Hat Linux, but not when I run Python 2.7.3 in Windows. Unfortunately I'm stuck with 2.6.
>
> * Are there more surprises similar to this one that I can expect from compile(...).co_names? Is this "behaviour" documented somewhere?
>

That name is the name of the list being built by the comprehension, 
which I found out by disassembling the code object to see the bytecodes:

     >>> co = compile("[x*x for x in y]", "<s>", "eval")
     >>> co.co_names
     ('_[1]', 'y', 'x')
     >>> import dis
     >>> dis.dis(co)
       1           0 BUILD_LIST               0
                   3 DUP_TOP
                   4 STORE_NAME               0 (_[1])
                   7 LOAD_NAME                1 (y)
                  10 GET_ITER
             >>   11 FOR_ITER                17 (to 31)
                  14 STORE_NAME               2 (x)
                  17 LOAD_NAME                0 (_[1])
                  20 LOAD_NAME                2 (x)
                  23 LOAD_NAME                2 (x)
                  26 BINARY_MULTIPLY
                  27 LIST_APPEND
                  28 JUMP_ABSOLUTE           11
             >>   31 DELETE_NAME              0 (_[1])
                  34 RETURN_VALUE

The same list comprehension in 2.7 uses an unnamed list on the stack:

       1           0 BUILD_LIST               0
                   3 LOAD_NAME                0 (y)
                   6 GET_ITER
             >>    7 FOR_ITER                16 (to 26)
                  10 STORE_NAME               1 (x)
                  13 LOAD_NAME                1 (x)
                  16 LOAD_NAME                1 (x)
                  19 BINARY_MULTIPLY
                  20 LIST_APPEND              2
                  23 JUMP_ABSOLUTE            7
             >>   26 RETURN_VALUE

I don't know whether such facts are documented.  They are deep 
implementation details, and change from version to version, as you've seen.

> * Is there perhaps a better way to achieve what I'm trying to do?
>
> What I'm really after, is to check that python expressions embedded in text files are:
> - well behaved (no syntax errors etc)
> - don't accidentally access anything it shouldn't
> - I serve them with the values they need on execution

I hope you aren't trying to prevent malice this way: you cannot examine 
a piece of Python code to prove that it's safe to execute.  For an 
extreme example, see: Eval Really Is Dangerous: 
http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html

In your environment it looks like you have a whitelist of identifiers, 
so you're probably ok.

>
> So, in the case of "a.b + x" I'm really just interested in a and x, not b. So the (almost) whole story is that I do:
>
>      # Find names not starting with ".", i.e a & b in "a.c + b"
>      abbr_expr = re.sub(r"\.\w+", "", expr)
>      names = compile(abbr_expr, '<string>', 'eval').co_names
>      # Python 2.6 returns '_[1]' in co_names for list comprehension. Bug?
>      names = [name for name in names if re.match(r'\w+$', name)]
>
>      for name in names:
>          if name not in allowed_names:
>              raise NameError('Name: %s not permitted in expression: %s' % (name, expr))
>

I don't know of a better way to determine the real names in the 
expression.  I doubt Python will insert a valid name into the namespace, 
since it doesn't want to step on real user names.  The simplest way to 
do that is to autogenerate invalid names, like "_[1]" (I wonder why it 
isn't "_[0]"?)

--Ned.


#60683

From	"magnus.lycka@gmail.com" <magnus.lycka@gmail.com>
Date	2013-11-28 03:17 -0800
Message-ID	<1615c7ae-f1d2-4368-ac74-775be3a9eb47@googlegroups.com>
In reply to	#60632

On Wednesday, November 27, 2013 9:09:32 PM UTC+1, Ned Batchelder wrote:
> I hope you aren't trying to prevent malice this way: you cannot examine 
> a piece of Python code to prove that it's safe to execute. 

No worry. Whoever has access to modifying those configuration files
can cause a mess in all sorts of other ways, such as writing and running
arbitrary programs.

I just want to give reasonably rapid feedback when people make mistakes.

As with all python code, it's very important to test properly, but the
top level names are often defined elsewhere in the configuration, so I
want to catch those errors ASAP.


#60634

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2013-11-27 13:23 -0700
Message-ID	<mailman.3318.1385583798.18130.python-list@python.org>
In reply to	#60631


On Nov 27, 2013 2:11 PM, "Ned Batchelder" <ned@nedbatchelder.com> wrote:
>
> On 11/27/13 2:40 PM, magnus.lycka@gmail.com wrote:
>>
>> So, in the case of "a.b + x" I'm really just interested in a and x, not b. So the (almost) whole story is that I do:
>>
>>      # Find names not starting with ".", i.e a & b in "a.c + b"
>>      abbr_expr = re.sub(r"\.\w+", "", expr)
>>      names = compile(abbr_expr, '<string>', 'eval').co_names
>>      # Python 2.6 returns '_[1]' in co_names for list comprehension. Bug?
>>      names = [name for name in names if re.match(r'\w+$', name)]
>>
>>      for name in names:
>>          if name not in allowed_names:
>>              raise NameError('Name: %s not permitted in expression: %s' % (name, expr))
>>
>
> I don't know of a better way to determine the real names in the
> expression.  I doubt Python will insert a valid name into the namespace,
> since it doesn't want to step on real user names.  The simplest way to do
> that is to autogenerate invalid names, like "_[1]" (I wonder why it isn't
> "_[0]"?)

One possible alternative is to use the ast module to examine the parse tree
of the expression instead of the generated code object. Hard to say whether
that would be "better".
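
A minimal sketch of the ast approach, under stated assumptions: the helper name `free_names` is hypothetical, and the "loaded minus stored" rule is a simplification that treats a name bound anywhere in the expression (a comprehension variable or lambda parameter) as bound everywhere. Attribute names such as the `b` in `a.b` never show up, because they are stored as plain strings on `ast.Attribute` nodes rather than as `ast.Name` nodes.

```python
import ast

# ast.arg exists on Python 3 only; on Python 2 lambda parameters are
# ast.Name nodes with a Param context, which the else-branch below covers.
_ARG = getattr(ast, 'arg', None)

def free_names(expr):
    """Return the names an expression reads but does not bind itself.

    Hypothetical helper; a sketch, not a full scope analysis.
    """
    tree = ast.parse(expr, mode='eval')  # raises SyntaxError on bad input
    loaded, stored = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
            else:
                stored.add(node.id)
        elif _ARG is not None and isinstance(node, _ARG):
            stored.add(node.arg)
    return loaded - stored
```

With this, `free_names("[x*x for x in y]")` reports only `y`, independent of how a given interpreter version compiles the comprehension.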


#60637

From	Chris Kaynor <ckaynor@zindagigames.com>
Date	2013-11-27 12:44 -0800
Message-ID	<mailman.3321.1385585077.18130.python-list@python.org>
In reply to	#60631


On Wed, Nov 27, 2013 at 12:09 PM, Ned Batchelder <ned@nedbatchelder.com> wrote:

>> * Is there perhaps a better way to achieve what I'm trying to do?
>>
>> What I'm really after, is to check that python expressions embedded in
>> text files are:
>> - well behaved (no syntax errors etc)
>> - don't accidentally access anything it shouldn't
>> - I serve them with the values they need on execution
>>
>
> I hope you aren't trying to prevent malice this way: you cannot examine a
> piece of Python code to prove that it's safe to execute.  For an extreme
> example, see: Eval Really Is Dangerous:
> http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html
>
> In your environment it looks like you have a whitelist of identifiers, so
> you're probably ok.

I just tested the crash example from that link in Python 2.7.5 win64 and
the co_names from the compiled code is empty. Therefore, a simple whitelist
would not catch that problematic code (and likely any other global access
done correctly). Even a simple test that at least one (or any number of)
valid identifiers exist would be insufficient, as you can merely tack on
a ",a" to add "a" to the co_names, and likewise for any other variable.

Basically, even with a pure whitelist, there is likely no possible way to
make eval/exec safe, unless you also eliminate the ability to make literals.
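
The empty co_names result can be reproduced with something much tamer than the crash example: names used inside a lambda live in a nested code object stored in the outer code's co_consts, not in the outer co_names. The sketch below shows this with a hypothetical helper, `all_co_names`, that walks the nested code objects. It only illustrates why the top-level tuple can miss names; it does not turn a whitelist into a safety check.

```python
import types

def all_co_names(code):
    """Collect co_names from a code object and from every code object
    nested in its constants (hypothetical helper for illustration)."""
    names = set(code.co_names)
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= all_co_names(const)
    return names

# 'secret' is only referenced inside the lambda body.
outer = compile('(lambda: secret)()', '<string>', 'eval')
hidden = 'secret' not in outer.co_names     # missed by the top-level check
found = 'secret' in all_co_names(outer)     # visible once nested code is walked
```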

Chris


#60639

From	Ned Batchelder <ned@nedbatchelder.com>
Date	2013-11-27 16:26 -0500
Message-ID	<mailman.3322.1385587627.18130.python-list@python.org>
In reply to	#60631

On 11/27/13 3:44 PM, Chris Kaynor wrote:
> On Wed, Nov 27, 2013 at 12:09 PM, Ned Batchelder <ned@nedbatchelder.com> wrote:
>
>         * Is there perhaps a better way to achieve what I'm trying to do?
>
>         What I'm really after, is to check that python expressions
>         embedded in text files are:
>         - well behaved (no syntax errors etc)
>         - don't accidentally access anything it shouldn't
>         - I serve them with the values they need on execution
>
>
>     I hope you aren't trying to prevent malice this way: you cannot
>     examine a piece of Python code to prove that it's safe to execute.
>     For an extreme example, see: Eval Really Is Dangerous:
>     http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html
>
>     In your environment it looks like you have a whitelist of
>     identifiers, so you're probably ok.
>
>
> I just tested the crash example from that link in Python 2.7.5 win64 and
> the co_names from the compiled code is empty. Therefore, a simple
> whitelist would not catch that problematic code (and likely any other
> global access done correctly). Even a simple test of making sure that at
> least one (or any number of) valid identifier exists would be
> insufficient, as you can merely tack on a ",a" to add "a" to the
> co_names, and thus for any other variable.

Ah, right you are! I neglected to go back and examine the dangerous 
code.  So eval really is dangerous!

--Ned.

>
> Basically, even with a pure whitelist, there is likely no possible way
> to make eval/exec safe, unless you also eliminate the ability to make
> literals.
>
> Chris
>
>


#60641

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-11-27 22:08 +0000
Message-ID	<52966d72$0$29993$c3e8da3$5496439d@news.astraweb.com>
In reply to	#60631

On Wed, 27 Nov 2013 11:40:52 -0800, magnus.lycka@gmail.com wrote:

> What I'm really after, is to check that python expressions embedded in
> text files are:
> - well behaved (no syntax errors etc)
> - don't accidentally access anything it shouldn't
> - I serve them with the values they need on execution

If you are trying to get safe execution of untrusted code in Python, you 
should read this recent thread from the Python core developers:

https://mail.python.org/pipermail/python-dev/2013-November/130132.html

Probably the only way to securely sandbox untrusted Python code is to use 
operating system level security (such as a chroot jail) or an 
implementation such as PyPy which has been designed from the beginning to 
be sandboxed -- and even that may simply mean that nobody has broken out 
of PyPy's sandbox *yet*.

Looking back at your example:

compile('sin(5) * cos(6)', '<string>', 'eval').co_names

I'm not sure I understand why you inspect the co_names. What does that 
give you? You can tell that there are no syntax errors just by compiling 
it: if there are any, compile() will raise SyntaxError.

I would pre-process the string before compiling and disallow *anything* 
containing "eval", "exec", or underscore. I'd also apply a limit to the 
total length of the string. That doesn't necessarily rule out a hostile 
user running arbitrary code, but it does make it harder.
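
A rough sketch of such a pre-filter follows. The function name `prefilter`, the 200-character limit and the choice of ValueError are assumptions picked for illustration; only the three forbidden strings and the idea of a length cap come from the paragraph above.

```python
MAX_EXPR_LENGTH = 200               # arbitrary illustrative limit
FORBIDDEN = ("eval", "exec", "_")   # substrings rejected outright

def prefilter(expr):
    """Reject over-long expressions and any containing a forbidden substring.

    Hypothetical helper: raises the barrier, does not make eval safe.
    """
    if len(expr) > MAX_EXPR_LENGTH:
        raise ValueError("expression too long")
    for token in FORBIDDEN:
        if token in expr:
            raise ValueError("forbidden text %r in expression" % token)
    return expr
```

Banning every underscore also rejects legitimate names such as `my_value`, so the set of names offered to expression authors would have to avoid underscores.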

Also, when you execute the compiled code, don't do this:

eval(code)  # No!

Instead, provide an explicit globals and locals namespace:

safe_ish_namespace = {'__builtins__': None}
eval(code, safe_ish_namespace)

Again, this increases the barrier to somebody hacking out of your sandbox 
without ruling it out altogether.
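
Putting the namespace advice together with the values the expression needs might look like the sketch below. The function name `evaluate` and the sample values are hypothetical. It also uses an empty dict for `__builtins__` rather than None, an assumption made here so that a reference to a builtin fails with a plain NameError.

```python
import math

def evaluate(expr, values):
    """Evaluate expr with only the supplied values visible.

    Hypothetical helper; this limits accidents, it is not a sandbox.
    """
    code = compile(expr, '<string>', 'eval')
    namespace = {'__builtins__': {}}   # assumption: {} instead of None
    namespace.update(values)
    return eval(code, namespace)

result = evaluate('sin(x) * cos(y)',
                  {'sin': math.sin, 'cos': math.cos, 'x': 0.0, 'y': 6})
```

A name that was not supplied, including builtins such as `len`, raises NameError at evaluation time.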

Good luck!

-- 
Steven

