| Started by | Robert Kern <robert.kern@gmail.com> |
|---|---|
| First post | 2015-06-10 12:22 +0100 |
| Last post | 2015-06-10 20:47 -0400 |
| Articles | 6 — 4 participants |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: enhancement request: make py3 read/write py2 pickle format Robert Kern <robert.kern@gmail.com> - 2015-06-10 12:22 +0100
Re: enhancement request: make py3 read/write py2 pickle format Marko Rauhamaa <marko@pacujo.net> - 2015-06-10 15:08 +0300
Re: enhancement request: make py3 read/write py2 pickle format random832@fastmail.us - 2015-06-10 09:38 -0400
Re: enhancement request: make py3 read/write py2 pickle format Robert Kern <robert.kern@gmail.com> - 2015-06-10 14:52 +0100
Re: enhancement request: make py3 read/write py2 pickle format Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2015-06-11 11:30 +1200
Re: enhancement request: make py3 read/write py2 pickle format random832@fastmail.us - 2015-06-10 20:47 -0400
| From | Robert Kern <robert.kern@gmail.com> |
|---|---|
| Date | 2015-06-10 12:22 +0100 |
| Subject | Re: enhancement request: make py3 read/write py2 pickle format |
| Message-ID | <mailman.337.1433935377.13271.python-list@python.org> |
On 2015-06-10 12:04, Neal Becker wrote:
> Chris Warrick wrote:
>
>> On Tue, Jun 9, 2015 at 8:08 PM, Neal Becker <ndbecker2@gmail.com> wrote:
>>> One of the most annoying problems with py2/3 interoperability is that the
>>> pickle formats are not compatible. There must be many who, like myself,
>>> often use pickle format for data storage.
>>>
>>> It certainly would be a big help if py3 could read/write py2 pickle
>>> format. You know, backward compatibility?
>>
>> Don’t use pickle. It’s unsafe — it executes arbitrary code, which
>> means someone can give you a pickle file that will delete all your
>> files or eat your cat.
>>
>> Instead, use a safe format that has no ability to execute code, like
>> JSON. It will also work with other programming languages and
>> environments if you ever need to talk to anyone else.
>>
>> But, FYI: there is backwards compatibility if you ask for it, in the
>> form of protocol versions. That’s all you should know — again, don’t
>> use pickle.
>
> I believe a good native serialization system is essential for any modern
> programming language. If pickle isn't it, we need something else that can
> serialize all language objects. Or, are you saying, it's impossible to do
> this safely?
By the very nature of the stated problem: serializing all language objects.
Being able to construct any object, including instances of arbitrary classes,
means that arbitrary code can be executed. All I have to do is make a pickle
file for an object that claims that its constructor is shutil.rmtree().
This is fine in some use cases (e.g. wire format for otherwise-secured
communication between two endpoints under your complete control), but it is
worrying in others, like your use case of data storage (and presumably sharing).
Python 2/3 is also the least of your compatibility worries there. Refactor a
class to a different module, or did one of your third-party dependencies do
this? Poof! Your pickle files no longer work.
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
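The point about an object "claiming" its own constructor can be sketched roughly as follows, assuming Python 3. The class name `Innocent` is made up for illustration, and the harmless builtin `print` stands in for something destructive like `shutil.rmtree`. The hook involved is `__reduce__`, which lets an object tell pickle which callable to invoke, with which arguments, when the data is loaded.

```python
import pickle


class Innocent:
    """Hypothetical class: looks like plain data, but names its own 'constructor'."""

    def __reduce__(self):
        # pickle records (callable, args) and calls callable(*args) on load.
        # A harmless builtin is used here in place of shutil.rmtree.
        return (print, ("this ran during unpickling",))


payload = pickle.dumps(Innocent())

# Loading the bytes calls print(...) and hands back its return value,
# not an Innocent instance.
result = pickle.loads(payload)
```

Nothing in `payload` has to come from a real `Innocent` object; whoever writes the bytes chooses the callable, which is why loading a pickle from an untrusted source amounts to running its author's code.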
| From | Marko Rauhamaa <marko@pacujo.net> |
|---|---|
| Date | 2015-06-10 15:08 +0300 |
| Message-ID | <878ubr3gv8.fsf@elektro.pacujo.net> |
| In reply to | #92394 |
Robert Kern <robert.kern@gmail.com>:
> By the very nature of the stated problem: serializing all language
> objects. Being able to construct any object, including instances of
> arbitrary classes, means that arbitrary code can be executed. All I
> have to do is make a pickle file for an object that claims that its
> constructor is shutil.rmtree().
You can't serialize/migrate arbitrary objects. Consider open TCP
connections, open files and other objects that extend outside the Python
VM. Also objects hold references to each other, leading to a huge
reference mesh.
For example:
a.buddy = b
b.buddy = a
with open("a", "wb") as f: f.write(serialize(a))
with open("b", "wb") as f: f.write(serialize(b))
with open("a", "rb") as f: aa = deserialize(f.read())
with open("b", "rb") as f: bb = deserialize(f.read())
assert aa.buddy is bb
Marko
| From | random832@fastmail.us |
|---|---|
| Date | 2015-06-10 09:38 -0400 |
| Message-ID | <mailman.340.1433943523.13271.python-list@python.org> |
| In reply to | #92396 |
On Wed, Jun 10, 2015, at 08:08, Marko Rauhamaa wrote:
> You can't serialize/migrate arbitrary objects. Consider open TCP
> connections, open files and other objects that extend outside the Python
> VM. Also objects hold references to each other, leading to a huge
> reference mesh.
>
> For example:
>
> a.buddy = b
> b.buddy = a
> with open("a", "wb") as f: f.write(serialize(a))
> with open("b", "wb") as f: f.write(serialize(b))
>
> with open("a", "rb") as f: aa = deserialize(f.read())
> with open("b", "rb") as f: bb = deserialize(f.read())
> assert aa.buddy is bb
Of course, if you serialize a single dict with e.g. {'a': a, 'b': b},
you can expect (with advanced serialization tools, anyway - I suspect
JSON will just make a mess or exceed maximum recursion depth)
result['a'].buddy is result['b']
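A small sketch of that contrast, assuming Python 3. Plain dicts stand in for the objects `a` and `b` here (so the check is `result['a']['buddy']` rather than an attribute), because the standard `json` module cannot encode arbitrary class instances to begin with.

```python
import json
import pickle

# Plain dicts stand in for a and b, since json cannot encode
# arbitrary class instances in the first place.
a = {"name": "a"}
b = {"name": "b"}
a["buddy"] = b
b["buddy"] = a

# One pickle for the whole container: cross-references survive the round trip.
result = pickle.loads(pickle.dumps({"a": a, "b": b}))
pickle_keeps_identity = result["a"]["buddy"] is result["b"]

# json has no notion of shared references and rejects the cycle outright.
try:
    json.dumps({"a": a, "b": b})
    json_error = None
except ValueError as exc:
    json_error = exc
```

With the default `check_circular=True`, `json.dumps` raises `ValueError` on the cycle instead of recursing without end.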
| From | Robert Kern <robert.kern@gmail.com> |
|---|---|
| Date | 2015-06-10 14:52 +0100 |
| Message-ID | <mailman.341.1433944371.13271.python-list@python.org> |
| In reply to | #92396 |
On 2015-06-10 13:08, Marko Rauhamaa wrote:
> Robert Kern <robert.kern@gmail.com>:
>
>> By the very nature of the stated problem: serializing all language
>> objects. Being able to construct any object, including instances of
>> arbitrary classes, means that arbitrary code can be executed. All I
>> have to do is make a pickle file for an object that claims that its
>> constructor is shutil.rmtree().
>
> You can't serialize/migrate arbitrary objects. Consider open TCP
> connections, open files and other objects that extend outside the Python
> VM.
Yes, yes, but that's really beside the point. Yes, there are some objects for
which it doesn't even make sense to serialize. But my point is that even in this
slightly smaller set of objects that *can* be serialized (and pickle currently
does serialize), being able to serialize all of them entails arbitrary code
execution to deserialize them. To allow people to write their own types that can
be serialized, you have to let them specify arbitrary callables that will do the
reconstruction. If you whitelist the possible reconstruction callables, you have
greatly restricted the types that can participate in the serialization system.
> Also objects hold references to each other, leading to a huge
> reference mesh.
>
> For example:
>
> a.buddy = b
> b.buddy = a
> with open("a", "wb") as f: f.write(serialize(a))
> with open("b", "wb") as f: f.write(serialize(b))
>
> with open("a", "rb") as f: aa = deserialize(f.read())
> with open("b", "rb") as f: bb = deserialize(f.read())
> assert aa.buddy is bb
Yeah, no one expects that to work. For example, if I deserialize the same string
twice, you can't expect to get identical returned objects (as in,
"deserialize(pickle) is deserialize(pickle)"). However, pickle does correctly
handle fairly arbitrary reference graphs within the context of a single
serialization, which is the most that can be asked of a serialization system.
That isn't really a concern here.
>>> class A(object):
...     pass
...
>>> a = A()
>>> b = A()
>>> a.buddy = b
>>> b.buddy = a
>>> data = [a, b]
>>> data[0].buddy is data[1]
True
>>> data[1].buddy is data[0]
True
>>> import cPickle
>>> unpickled = cPickle.loads(cPickle.dumps(data))
>>> unpickled[0].buddy is unpickled[1]
True
>>> unpickled[1].buddy is unpickled[0]
True
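For comparison, a rough Python 3 sketch of the three cases discussed in this post: separate serializations, a single serialization, and loading the same bytes twice. It uses `types.SimpleNamespace` as a stand-in for the empty class `A` in the session above; that substitution is an assumption made only so the snippet runs anywhere without a user-defined class.

```python
import pickle
from types import SimpleNamespace

# SimpleNamespace stands in for the empty class A from the session above.
a, b = SimpleNamespace(), SimpleNamespace()
a.buddy, b.buddy = b, a

# Two separate pickles: each one carries its own private copy of the pair.
aa = pickle.loads(pickle.dumps(a))
bb = pickle.loads(pickle.dumps(b))
separate_linked = aa.buddy is bb

# One pickle holding both objects: the references between them are preserved.
both = pickle.loads(pickle.dumps([a, b]))
together_linked = both[0].buddy is both[1] and both[1].buddy is both[0]

# Loading the same bytes twice produces two distinct objects.
blob = pickle.dumps(a)
same_twice = pickle.loads(blob) is pickle.loads(blob)
```

Only `together_linked` comes out true, which matches the claim that identity is preserved within one serialization and not across several.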
| From | Gregory Ewing <greg.ewing@canterbury.ac.nz> |
|---|---|
| Date | 2015-06-11 11:30 +1200 |
| Message-ID | <ctrvkhFfutdU1@mid.individual.net> |
| In reply to | #92399 |
Robert Kern wrote:
> To allow people to write their own types that can be serialized,
> you have to let them specify arbitrary callables that will do the
> reconstruction. If you whitelist the possible reconstruction callables,
> you have greatly restricted the types that can participate in the
> serialization system.
If whitelisting a type is the *only* thing you need to
do to make it serialisable, I think that comes close
enough to the stated goal of being able to "serialise
all [potentially serialisable] language objects".
Having to be explicit about which types are deserialisable
is probably a good thing anyway. It gives you an opportunity
to specify the mapping between the external format and class
names, so that your serialised data doesn't contain assumptions
about implementation details of your program.
--
Greg
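One way to picture an explicit mapping between external names and classes is the sketch below. It is not an existing library API: `REGISTRY`, `TAGS`, `dumps`, `loads` and the `Point` class are all hypothetical names, JSON is chosen as the external format purely for illustration, and only objects whose attributes are themselves JSON-compatible are handled.

```python
import json


class Point:
    """Hypothetical example type."""

    def __init__(self, x, y):
        self.x, self.y = x, y


# Hypothetical registry: external tag -> class. The tag belongs to the data
# format; the class name and its module remain free to change.
REGISTRY = {"point": Point}
TAGS = {cls: tag for tag, cls in REGISTRY.items()}


def dumps(obj):
    # Store the external tag, never the module path or class name.
    return json.dumps({"type": TAGS[type(obj)], "state": vars(obj)})


def loads(text):
    doc = json.loads(text)
    cls = REGISTRY[doc["type"]]   # KeyError for any tag that is not registered
    obj = cls.__new__(cls)        # create the instance without calling __init__
    obj.__dict__.update(doc["state"])
    return obj
```

Because the stored data only ever mentions the tag `"point"`, moving `Point` to another module would leave existing files readable, and a document naming an unregistered type fails with `KeyError` instead of resolving to an arbitrary callable.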
| From | random832@fastmail.us |
|---|---|
| Date | 2015-06-10 20:47 -0400 |
| Message-ID | <mailman.371.1433983690.13271.python-list@python.org> |
| In reply to | #92431 |
On Wed, Jun 10, 2015, at 19:30, Gregory Ewing wrote:
> If whitelisting a type is the *only* thing you need to
> do to make it serialisable, I think that comes close
> enough to the stated goal of being able to "serialise
> all [potentially serialisable] language objects".
IMO the serialization framework should handle this by providing your own
way to look them up (almost but not entirely unlike providing your own
globals table to eval) rather than by having a whitelist.
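With the standard `pickle` module, something close to "your own way to look them up" can be sketched by overriding `Unpickler.find_class`, which is called whenever the stream asks for a global. The sketch assumes Python 3; `ALLOWED`, `LookupUnpickler` and `restricted_loads` are illustrative names, and the table contents are an arbitrary choice.

```python
import collections
import io
import pickle

# Hypothetical lookup table, playing the role of a custom globals dict:
# only these (module, name) pairs can be resolved while loading.
ALLOWED = {("collections", "OrderedDict"): collections.OrderedDict}


class LookupUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Resolve names through our own table instead of importing modules.
        try:
            return ALLOWED[(module, name)]
        except KeyError:
            raise pickle.UnpicklingError(
                "global %s.%s is not allowed" % (module, name)
            ) from None


def restricted_loads(data):
    return LookupUnpickler(io.BytesIO(data)).load()


ok = restricted_loads(pickle.dumps(collections.OrderedDict(a=1)))

try:
    restricted_loads(pickle.dumps(print))  # stream references builtins.print
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

Since the table maps names to objects, it can also point an old `(module, name)` pair at a class that has since moved, which addresses the refactoring problem raised earlier in the thread.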