comp.lang.python, thread #90134

To pickle or not to pickle

Started by: Cecil Westerhof <Cecil@decebal.nl>
First post: 2015-05-08 11:58 +0200
Last post: 2015-05-08 06:27 -0400
Articles: 11, 6 participants

Contents

  To pickle or not to pickle Cecil Westerhof <Cecil@decebal.nl> - 2015-05-08 11:58 +0200
    Re: To pickle or not to pickle Peter Otten <__peter__@web.de> - 2015-05-08 12:32 +0200
      Re: To pickle or not to pickle Cecil Westerhof <Cecil@decebal.nl> - 2015-05-08 13:51 +0200
      Re: To pickle or not to pickle Irmen de Jong <irmen.NOSPAM@xs4all.nl> - 2015-05-08 19:11 +0200
    Re: To pickle or not to pickle Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-08 20:54 +1000
      Re: To pickle or not to pickle Cecil Westerhof <Cecil@decebal.nl> - 2015-05-08 13:55 +0200
        Re: To pickle or not to pickle Chris Angelico <rosuav@gmail.com> - 2015-05-08 22:53 +1000
          Re: To pickle or not to pickle Cecil Westerhof <Cecil@decebal.nl> - 2015-05-08 16:34 +0200
            Re: To pickle or not to pickle Chris Angelico <rosuav@gmail.com> - 2015-05-09 01:11 +1000
              Re: To pickle or not to pickle Cecil Westerhof <Cecil@decebal.nl> - 2015-05-08 18:43 +0200
    Re: To pickle or not to pickle Cem Karan <cfkaran2@gmail.com> - 2015-05-08 06:27 -0400

#90134 — To pickle or not to pickle

From: Cecil Westerhof <Cecil@decebal.nl>
Date: 2015-05-08 11:58 +0200
Subject: To pickle or not to pickle
Message-ID: <87h9rnz8yy.fsf@Equus.decebal.nl>
I first used marshal in my filebasedMessages module. Then I read that
you should not use it, because it changes per Python version and it
was better to use pickle. So I did that and now I find:
    https://wiki.python.org/moin/Pickle

Is it really that bad and should I change again?

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

#90135

From: Peter Otten <__peter__@web.de>
Date: 2015-05-08 12:32 +0200
Message-ID: <mailman.229.1431081179.12865.python-list@python.org>
In reply to: #90134
Cecil Westerhof wrote:

> I first used marshal in my filebasedMessages module. Then I read that
> you should not use it, because it changes per Python version and it
> was better to use pickle. So I did that and now I find:
>     https://wiki.python.org/moin/Pickle
> 
> Is it really that bad and should I change again?

Let's say it the other way around: pickle is fine for short term storage 
when the generation of the file is under your control and you only need to 
access it from Python. 

Does that description fit your requirements?
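For the short-term, Python-only storage Peter describes, a pickle round trip through a local file takes only a few lines. This is a minimal sketch, not Cecil's actual code; the file name `recent.pickle` and the helpers `save_recent` and `load_recent` are hypothetical:

```python
import os
import pickle
import tempfile

def save_recent(path, recent):
    # Binary mode is required: pickle writes bytes, not text.
    with open(path, "wb") as f:
        pickle.dump(recent, f)

def load_recent(path):
    # Only unpickle files this program wrote itself.
    # Return an empty list if nothing has been saved yet.
    if not os.path.exists(path):
        return []
    with open(path, "rb") as f:
        return pickle.load(f)

if __name__ == "__main__":
    # Hypothetical file name in a throwaway directory.
    path = os.path.join(tempfile.mkdtemp(), "recent.pickle")
    save_recent(path, [25, 112, 4])
    print(load_recent(path))  # [25, 112, 4]
```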

#90157

From: Cecil Westerhof <Cecil@decebal.nl>
Date: 2015-05-08 13:51 +0200
Message-ID: <87d22bz3q9.fsf@Equus.decebal.nl>
In reply to: #90135
Op Friday 8 May 2015 12:32 CEST schreef Peter Otten:

> Cecil Westerhof wrote:
>
>> I first used marshal in my filebasedMessages module. Then I read
>> that you should not use it, because it changes per Python version
>> and it was better to use pickle. So I did that and now I find:
>> https://wiki.python.org/moin/Pickle
>>
>> Is it really that bad and should I change again?
>
> Let's say it the other way around: pickle is fine for short term
> storage when the generation of the file is under your control and
> you only need to access it from Python.
>
> Does that description fit your requirements?

Certainly. I use it to store which messages were used ‘recently’, so
that I do not use them again for a while. I will keep it like this for
the time being, then.

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

#90185

From: Irmen de Jong <irmen.NOSPAM@xs4all.nl>
Date: 2015-05-08 19:11 +0200
Message-ID: <554cee28$0$2965$e4fe514c@news.xs4all.nl>
In reply to: #90135
On 8-5-2015 12:32, Peter Otten wrote:
> Cecil Westerhof wrote:
> 
>> I first used marshal in my filebasedMessages module. Then I read that
>> you should not use it, because it changes per Python version and it
>> was better to use pickle. So I did that and now I find:
>>     https://wiki.python.org/moin/Pickle
>>
>> Is it really that bad and should I change again?
> 
> Let's say it the other way around: pickle is fine for short term storage 
> when the generation of the file is under your control and you only need to 
> access it from Python. 

The latter is not really a restriction if you want to use it from Java or .NET:
https://github.com/irmen/Pyrolite provides an (un)pickler for these platforms.


-irmen

#90141

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2015-05-08 20:54 +1000
Message-ID: <554c95df$0$12988$c3e8da3$5496439d@news.astraweb.com>
In reply to: #90134
On Fri, 8 May 2015 07:58 pm, Cecil Westerhof wrote:

> I first used marshal in my filebasedMessages module. Then I read that
> you should not use it, because it changes per Python version and it
> was better to use pickle. So I did that and now I find:
>     https://wiki.python.org/moin/Pickle
> 
> Is it really that bad and should I change again?

marshal is really only for Python's internal use. I think that if Python was
created today, marshal would probably be an undocumented and internal-only
module.
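A small sketch of why `marshal` is a poor fit for application data: it only accepts core built-in types (roughly those needed for code objects), rejects instances of user-defined classes with `ValueError`, and its format is tied to `marshal.version`, which the documentation says may change between Python versions. The `Message` class here is a hypothetical stand-in:

```python
import marshal

data = {"recent": [25, 112, 4], "name": "quotes"}

# Core built-in types (dict, list, str, int, ...) round-trip fine.
blob = marshal.dumps(data)
assert marshal.loads(blob) == data

class Message:
    """Hypothetical user-defined class."""

# Instances of user-defined classes are rejected outright.
try:
    marshal.dumps(Message())
except ValueError as exc:
    print("marshal refused:", exc)

# The format version in use; files written by one Python version are
# not guaranteed to load in another, so marshal is no archive format.
print("marshal format version:", marshal.version)
```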

pickle is quite safe provided you trust the environment you are running in
and the source of the pickle files. If you don't trust them, then you
should avoid pickle and use a format which doesn't execute code.
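The "execute code" part comes from `__reduce__`: a pickle records which callable to invoke to rebuild an object, and `pickle.loads` performs that call. The sketch below uses the harmless built-in `sum` as the payload; the class name `Payload` is hypothetical, and a hostile file could name any importable callable instead:

```python
import pickle

class Payload:
    # __reduce__ tells pickle how to rebuild the object:
    # "call this callable with these arguments".
    def __reduce__(self):
        # Harmless stand-in; a malicious pickle could name any
        # importable callable here, with arbitrary arguments.
        return (sum, ([1, 2, 3],))

blob = pickle.dumps(Payload())
result = pickle.loads(blob)
# Loading called sum([1, 2, 3]) instead of rebuilding a Payload.
print(result)  # 6
```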

You could use JSON, plists, ini-files, or XML, all of which are text-based
and handled by the standard library. There is also YAML, but you have to
use a third-party library for that.
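As a sketch of two of the standard-library text formats Steven lists, the same small structure round-trips through both `json` and `plistlib` (`plistlib.dumps`/`loads` need Python 3.4 or later). The `state` dictionary is an illustrative assumption, not the module's real data:

```python
import json
import plistlib

# Illustrative data: recently used message indices per file.
state = {"quotes": [25, 112, 4], "tips": [13, 66]}

# JSON: plain text, readable from almost any language.
text = json.dumps(state, indent=2)
assert json.loads(text) == state

# Property list: XML text (as bytes) by default; dict keys must be strings.
blob = plistlib.dumps(state)
assert plistlib.loads(blob) == state

print(text)
print(blob.decode("utf-8"))
```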

You might also look at the "serpent" serialisation format used by Pyro:

https://pypi.python.org/pypi/serpent

If your code is only going to be used by yourself, I'd just use pickle. If
you are creating an application for others to use, I would spend the extra
effort to build in support for at least pickle, JSON and plists, and let
the user decide what they prefer.
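One possible way to "let the user decide" is a small registry that maps a format name to a pair of serialise/parse functions. This is a hypothetical design, not an existing API; `FORMATS`, `save` and `load` are illustrative names, and every entry works in bytes so the file handling stays uniform:

```python
import json
import os
import pickle
import plistlib
import tempfile

# Hypothetical registry: format name -> (object to bytes, bytes to object).
FORMATS = {
    "pickle": (pickle.dumps, pickle.loads),
    "json": (
        lambda obj: json.dumps(obj).encode("utf-8"),
        lambda data: json.loads(data.decode("utf-8")),
    ),
    "plist": (plistlib.dumps, plistlib.loads),
}

def save(path, obj, fmt="json"):
    dumps, _ = FORMATS[fmt]  # KeyError for an unknown format name
    with open(path, "wb") as f:
        f.write(dumps(obj))

def load(path, fmt="json"):
    _, loads = FORMATS[fmt]
    with open(path, "rb") as f:
        return loads(f.read())

if __name__ == "__main__":
    directory = tempfile.mkdtemp()
    for name in FORMATS:
        path = os.path.join(directory, "recent." + name)
        save(path, {"quotes": [25, 112, 4]}, fmt=name)
        print(name, load(path, fmt=name))
```

The formats do not accept identical types: JSON and plists have no tuples (they come back as lists), and plists require string keys, so the stored data should stay within the common subset.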



-- 
Steven

#90154

From: Cecil Westerhof <Cecil@decebal.nl>
Date: 2015-05-08 13:55 +0200
Message-ID: <878uczz3ks.fsf@Equus.decebal.nl>
In reply to: #90141
Op Friday 8 May 2015 12:54 CEST schreef Steven D'Aprano:

> If your code is only going to be used by yourself, I'd just use
> pickle. If you are creating an application for others to use, I
> would spend the extra effort to build in support for at least
> pickle, JSON and plists, and let the user decide what they prefer.

Well, I put it on GitHub, so I hope it is going to be used by others
also. ;-) There are other things that are more urgent at the moment,
but in the future I will implement JSON and plists then.

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

#90163

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-05-08 22:53 +1000
Message-ID: <mailman.246.1431089615.12865.python-list@python.org>
In reply to: #90154
On Fri, May 8, 2015 at 9:55 PM, Cecil Westerhof <Cecil@decebal.nl> wrote:
> Op Friday 8 May 2015 12:54 CEST schreef Steven D'Aprano:
>
>> If your code is only going to be used by yourself, I'd just use
>> pickle. If you are creating an application for others to use, I
>> would spend the extra effort to build in support for at least
>> pickle, JSON and plists, and let the user decide what they prefer.
>
> Well, I put it on GitHub, so I hope it is going to be used by others
> also. ;-) There are other things that are more urgent at the moment,
> but in the future I will implement JSON and plists then.

But will the pickle files be shared? If not, they're still nice and
private, and fairly safe. The problem comes when, for instance, you
have a client Python program that pickles data and sends it over a
network to a server Python program to be unpickled, because then
someone could craft a malicious pickle and send it to you to eat. If
they're only ever saved locally and re-read, there shouldn't be any
security risk (anyone who could reach in and edit the pickle file
could probably reach in and change the code anyway).

That said, if your needs are sufficiently simple, it may be worth
using something plain text just for the debuggability.
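For the networked scenario Chris describes, the `pickle` documentation shows how to limit which globals an unpickler may resolve by overriding `Unpickler.find_class`. The sketch below follows that pattern; the allowed set is an assumption to adjust, and the documentation's general advice still stands: only unpickle data you trust, or use a format such as JSON that cannot name callables at all.

```python
import builtins
import io
import pickle

# Globals a pickle from an untrusted source may reference. Plain lists,
# dicts, ints and strings need no globals at all.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle refers to a class or function.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    """Like pickle.loads, but refuses any global outside SAFE_BUILTINS."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```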

ChrisA

#90173

From: Cecil Westerhof <Cecil@decebal.nl>
Date: 2015-05-08 16:34 +0200
Message-ID: <87zj5fxhm5.fsf@Equus.decebal.nl>
In reply to: #90163
Op Friday 8 May 2015 14:53 CEST schreef Chris Angelico:

> On Fri, May 8, 2015 at 9:55 PM, Cecil Westerhof <Cecil@decebal.nl> wrote:
>> Op Friday 8 May 2015 12:54 CEST schreef Steven D'Aprano:
>>
>>> If your code is only going to be used by yourself, I'd just use
>>> pickle. If you are creating an application for others to use, I
>>> would spend the extra effort to build in support for at least
>>> pickle, JSON and plists, and let the user decide what they prefer.
>>
>> Well, I put it on GitHub, so I hope it is going to be used by
>> others also. ;-) There are other things that are more urgent at the
>> moment, but in the future I will implement JSON and plists then.
>
> But will the pickle files be shared? If not, they're still nice and
> private, and fairly safe. The problem comes when, for instance, you
> have a client Python program that pickles data and sends it over a
> network to a server Python program to be unpickled, because then
> someone could craft a malicious pickle and send it to you to eat. If
> they're only ever saved locally and re-read, there shouldn't be any
> security risk (anyone who could reach in and edit the pickle file
> could probably reach in and change the code anyway).

I would expect not. But I never know what someone else is going to do.
;-)

But in my case there is a Twitter directory with:
    quotes.txt
    quotes.pickle
    tips.txt
    tips.pickle

All four files are normally only accessed by the Python program. When
I want to extend the messages I use a text editor to append them.

The .txt files contain messages that can be used. And the .pickle
files contain the ‘recently’ used messages.
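The "recently used" bookkeeping can be sketched roughly as follows. This is a hypothetical reconstruction, not the code in filebasedMessages: `choose_message` and `max_recent` are invented names, `recent` is assumed to be kept oldest first, and the default cap of 50 is arbitrary:

```python
import random

def choose_message(messages, recent, max_recent=50):
    """Return the index of a message that is not in `recent`.

    `recent` is a list of indices, oldest first (an assumption). It is
    updated in place and trimmed to at most `max_recent` (>= 1) entries.
    """
    recent_set = set(recent)
    candidates = [i for i in range(len(messages)) if i not in recent_set]
    if not candidates:
        # Every message was used recently: fall back to the full list.
        candidates = list(range(len(messages)))
    index = random.choice(candidates)
    recent.append(index)
    # Drop the oldest entries so the list never exceeds max_recent.
    del recent[:-max_recent]
    return index
```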

When I unpickle quotes.pickle I get:
    [25, 112, 4, 18, 41, 2, 81, 75, 28, 60, 105, 47, 84, 65, 103, 42,
    13, 66, 55, 124, 6, 82, 76, 12, 61, 113, 119, 96, 3, 68, 11, 89,
    98, 107, 118, 29, 57, 33, 88, 121, 110, 49, 90, 72, 87, 114, 43,
    59, 8, 92]

Very simple indeed.

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

#90174

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-05-09 01:11 +1000
Message-ID: <mailman.253.1431097914.12865.python-list@python.org>
In reply to: #90173
On Sat, May 9, 2015 at 12:34 AM, Cecil Westerhof <Cecil@decebal.nl> wrote:
> When I unpickle quotes.pickle I get:
>     [25, 112, 4, 18, 41, 2, 81, 75, 28, 60, 105, 47, 84, 65, 103, 42,
>     13, 66, 55, 124, 6, 82, 76, 12, 61, 113, 119, 96, 3, 68, 11, 89,
>     98, 107, 118, 29, 57, 33, 88, 121, 110, 49, 90, 72, 87, 114, 43,
>     59, 8, 92]
>
> Very simple indeed.

In that case, I'd probably write it out as JSON, or as a simple
whitespace-separated list of numbers. That way, if anything goes
wrong, you can open up the file and look at it easily.
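Both options Chris mentions are short with the standard library. A minimal sketch; the helper names are hypothetical and the data is assumed to be a flat list of integers:

```python
import json

def save_json(path, recent):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(recent, f)

def load_json(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def save_plain(path, recent):
    # A single line of space-separated integers, e.g. "25 112 4".
    with open(path, "w", encoding="utf-8") as f:
        f.write(" ".join(str(n) for n in recent) + "\n")

def load_plain(path):
    # split() with no argument accepts any whitespace, including newlines,
    # so a hand-edited file with one number per line also loads.
    with open(path, encoding="utf-8") as f:
        return [int(token) for token in f.read().split()]
```

The JSON variant keeps working if the structure later grows beyond a flat list; the plain variant is the easiest to read and edit by hand.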

ChrisA

#90183

From: Cecil Westerhof <Cecil@decebal.nl>
Date: 2015-05-08 18:43 +0200
Message-ID: <87fv77xbnj.fsf@Equus.decebal.nl>
In reply to: #90174
Op Friday 8 May 2015 17:11 CEST schreef Chris Angelico:

> On Sat, May 9, 2015 at 12:34 AM, Cecil Westerhof <Cecil@decebal.nl> wrote:
>> When I unpickle quotes.pickle I get:
>> [25, 112, 4, 18, 41, 2, 81, 75, 28, 60, 105, 47, 84, 65, 103, 42,
>> 13, 66, 55, 124, 6, 82, 76, 12, 61, 113, 119, 96, 3, 68, 11, 89,
>> 98, 107, 118, 29, 57, 33, 88, 121, 110, 49, 90, 72, 87, 114, 43,
>> 59, 8, 92]
>>
>> Very simple indeed.
>
> In that case, I'd probably write it out as JSON, or as a simple
> whitespace-separated list of numbers. That way, if anything goes
> wrong, you can open up the file and look at it easily.

Done. And the files are even smaller. ;-)
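Whether JSON comes out smaller depends on the pickle protocol: protocol 0 (the text protocol, and Python 2's default) spends several bytes per integer, while the binary protocols store integers below 256 in two bytes each. A quick check using the list quoted earlier in the thread:

```python
import json
import pickle

recent = [25, 112, 4, 18, 41, 2, 81, 75, 28, 60, 105, 47, 84, 65, 103, 42,
          13, 66, 55, 124, 6, 82, 76, 12, 61, 113, 119, 96, 3, 68, 11, 89,
          98, 107, 118, 29, 57, 33, 88, 121, 110, 49, 90, 72, 87, 114, 43,
          59, 8, 92]

print("json:", len(json.dumps(recent).encode("utf-8")), "bytes")
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    size = len(pickle.dumps(recent, protocol))
    print(f"pickle protocol {protocol}: {size} bytes")
```

For a list like this, a binary-protocol pickle would likely be smaller than the JSON; the practical argument for JSON here is readability rather than size.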

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

#90146

From: Cem Karan <cfkaran2@gmail.com>
Date: 2015-05-08 06:27 -0400
Message-ID: <mailman.234.1431084638.12865.python-list@python.org>
In reply to: #90134
What are you using pickle for?  If this is just for yourself, go for it.  If you're planning on interchanging with different languages/platforms/etc., JSON or XML might be better.  If you're after something that is smaller and faster, maybe MessagePack or Google Protocol Buffers.  If you're after something that can hold a planet's worth of data, maybe HDF5.  It really depends on your use-case.

MessagePack - http://en.wikipedia.org/wiki/MessagePack
Google Protocol Buffers - http://en.wikipedia.org/wiki/Protocol_Buffers
HDF5 - http://en.wikipedia.org/wiki/Hierarchical_Data_Format

Thanks,
Cem Karan

On May 8, 2015, at 5:58 AM, Cecil Westerhof <Cecil@decebal.nl> wrote:

> I first used marshal in my filebasedMessages module. Then I read that
> you should not use it, because it changes per Python version and it
> was better to use pickle. So I did that and now I find:
>    https://wiki.python.org/moin/Pickle
> 
> Is it really that bad and should I change again?
> 
> -- 
> Cecil Westerhof
> Senior Software Engineer
> LinkedIn: http://www.linkedin.com/in/cecilwesterhof
> -- 
> https://mail.python.org/mailman/listinfo/python-list
