| Started by | Cecil Westerhof <Cecil@decebal.nl> |
|---|---|
| First post | 2015-05-08 11:58 +0200 |
| Last post | 2015-05-08 06:27 -0400 |
| Articles | 11 — 6 participants |
To pickle or not to pickle Cecil Westerhof <Cecil@decebal.nl> - 2015-05-08 11:58 +0200
Re: To pickle or not to pickle Peter Otten <__peter__@web.de> - 2015-05-08 12:32 +0200
Re: To pickle or not to pickle Cecil Westerhof <Cecil@decebal.nl> - 2015-05-08 13:51 +0200
Re: To pickle or not to pickle Irmen de Jong <irmen.NOSPAM@xs4all.nl> - 2015-05-08 19:11 +0200
Re: To pickle or not to pickle Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-08 20:54 +1000
Re: To pickle or not to pickle Cecil Westerhof <Cecil@decebal.nl> - 2015-05-08 13:55 +0200
Re: To pickle or not to pickle Chris Angelico <rosuav@gmail.com> - 2015-05-08 22:53 +1000
Re: To pickle or not to pickle Cecil Westerhof <Cecil@decebal.nl> - 2015-05-08 16:34 +0200
Re: To pickle or not to pickle Chris Angelico <rosuav@gmail.com> - 2015-05-09 01:11 +1000
Re: To pickle or not to pickle Cecil Westerhof <Cecil@decebal.nl> - 2015-05-08 18:43 +0200
Re: To pickle or not to pickle Cem Karan <cfkaran2@gmail.com> - 2015-05-08 06:27 -0400
| From | Cecil Westerhof <Cecil@decebal.nl> |
|---|---|
| Date | 2015-05-08 11:58 +0200 |
| Subject | To pickle or not to pickle |
| Message-ID | <87h9rnz8yy.fsf@Equus.decebal.nl> |
I first used marshal in my filebasedMessages module. Then I read that
you should not use it, because it changes per Python version and it
was better to use pickle. So I did that and now I find:
https://wiki.python.org/moin/Pickle
Is it really that bad and should I change again?
--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2015-05-08 12:32 +0200 |
| Message-ID | <mailman.229.1431081179.12865.python-list@python.org> |
| In reply to | #90134 |
Cecil Westerhof wrote:

> I first used marshal in my filebasedMessages module. Then I read that
> you should not use it, because it changes per Python version and it
> was better to use pickle. So I did that and now I find:
> https://wiki.python.org/moin/Pickle
>
> Is it really that bad and should I change again?

Let's say it the other way around: pickle is fine for short term storage
when the generation of the file is under your control and you only need to
access it from Python.

Does that description fit your requirements?
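For readers who have not used pickle this way, here is a minimal sketch of the short-term, Python-only storage described above. The file name and the sample list are illustrative assumptions, not taken from the filebasedMessages module, and the file is written to a temporary directory so the example is self-contained.

```python
import os
import pickle
import tempfile

# Illustrative data: indices of messages that were used recently.
recent = [25, 112, 4]

# Hypothetical file name, placed in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "recent.pickle")

# Write the list; pickle files must be opened in binary mode.
with open(path, "wb") as f:
    pickle.dump(recent, f)

# Read it back in the same program, under our own control.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == recent)
```

Both the writer and the reader are the same Python program here, which is the situation in which pickle is described as fine.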
| From | Cecil Westerhof <Cecil@decebal.nl> |
|---|---|
| Date | 2015-05-08 13:51 +0200 |
| Message-ID | <87d22bz3q9.fsf@Equus.decebal.nl> |
| In reply to | #90135 |
On Friday 8 May 2015 12:32 CEST, Peter Otten wrote:

> Cecil Westerhof wrote:
>
>> I first used marshal in my filebasedMessages module. Then I read
>> that you should not use it, because it changes per Python version
>> and it was better to use pickle. So I did that and now I find:
>> https://wiki.python.org/moin/Pickle
>>
>> Is it really that bad and should I change again?
>
> Let's say it the other way around: pickle is fine for short term
> storage when the generation of the file is under your control and
> you only need to access it from Python.
>
> Does that description fit your requirements?

Certainly. I use it to store which messages were ‘recently’ used, so that
I will not use them again for the next ones. I will keep it like this for
the time being, then.

--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
| From | Irmen de Jong <irmen.NOSPAM@xs4all.nl> |
|---|---|
| Date | 2015-05-08 19:11 +0200 |
| Message-ID | <554cee28$0$2965$e4fe514c@news.xs4all.nl> |
| In reply to | #90135 |
On 8-5-2015 12:32, Peter Otten wrote:

> Cecil Westerhof wrote:
>
>> I first used marshal in my filebasedMessages module. Then I read that
>> you should not use it, because it changes per Python version and it
>> was better to use pickle. So I did that and now I find:
>> https://wiki.python.org/moin/Pickle
>>
>> Is it really that bad and should I change again?
>
> Let's say it the other way around: pickle is fine for short term storage
> when the generation of the file is under your control and you only need to
> access it from Python.

The latter is not really a restriction, if you want to use it from Java or .NET.
https://github.com/irmen/Pyrolite provides an (un)pickler for these platforms.

-irmen
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2015-05-08 20:54 +1000 |
| Message-ID | <554c95df$0$12988$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #90134 |
On Fri, 8 May 2015 07:58 pm, Cecil Westerhof wrote:

> I first used marshal in my filebasedMessages module. Then I read that
> you should not use it, because it changes per Python version and it
> was better to use pickle. So I did that and now I find:
> https://wiki.python.org/moin/Pickle
>
> Is it really that bad and should I change again?

marshal is really only for Python's internal use. I think that if Python
was created today, marshal would probably be an undocumented and
internal-only module.

pickle is quite safe provided you trust the environment you are running in
and the source of the pickle files. If you don't trust them, then you
should avoid pickle and use a format which doesn't execute code. You could
use JSON, plists, ini-files, or XML, all of which are text-based and
handled by the standard library. There is also YAML, but you have to use a
third-party library for that.

You might also look at the "serpent" serialisation format used by Pyro:

https://pypi.python.org/pypi/serpent

If your code is only going to be used by yourself, I'd just use pickle. If
you are creating an application for others to use, I would spend the extra
effort to build in support for at least pickle, JSON and plists, and let
the user decide what they prefer.

--
Steven
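One way to "let the user decide" between pickle, JSON and plists is a small lookup table of serialisers from the standard library. The sketch below is only an illustration of that idea; the names `SERIALIZERS`, `dump_as` and `load_as` are hypothetical, and it assumes Python 3.4 or later for `plistlib.dumps` and `plistlib.loads`.

```python
import json
import pickle
import plistlib

# Hypothetical registry: format name -> (object to bytes, bytes to object).
SERIALIZERS = {
    "pickle": (pickle.dumps, pickle.loads),
    "json": (
        lambda obj: json.dumps(obj).encode("utf-8"),
        lambda data: json.loads(data.decode("utf-8")),
    ),
    "plist": (plistlib.dumps, plistlib.loads),
}


def dump_as(obj, fmt):
    """Serialise obj to bytes in the format chosen by the user."""
    dumps, _ = SERIALIZERS[fmt]
    return dumps(obj)


def load_as(data, fmt):
    """Parse bytes that were written by dump_as with the same format."""
    _, loads = SERIALIZERS[fmt]
    return loads(data)


# Illustrative data: a plain list of integers survives all three formats.
sample = [25, 112, 4, 18]
round_trips = {fmt: load_as(dump_as(sample, fmt), fmt) for fmt in SERIALIZERS}
print(all(value == sample for value in round_trips.values()))
```

Only the "json" and "plist" entries are safe to load from untrusted sources; the "pickle" entry keeps the trust requirement described above.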
| From | Cecil Westerhof <Cecil@decebal.nl> |
|---|---|
| Date | 2015-05-08 13:55 +0200 |
| Message-ID | <878uczz3ks.fsf@Equus.decebal.nl> |
| In reply to | #90141 |
On Friday 8 May 2015 12:54 CEST, Steven D'Aprano wrote:

> If your code is only going to be used by yourself, I'd just use
> pickle. If you are creating an application for others to use, I
> would spend the extra effort to build in support for at least
> pickle, JSON and plists, and let the user decide what they prefer.

Well, I put it on GitHub, so I hope it is going to be used by others
too. ;-) There are other things that are more urgent at the moment,
but in the future I will implement JSON and plists.

--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-05-08 22:53 +1000 |
| Message-ID | <mailman.246.1431089615.12865.python-list@python.org> |
| In reply to | #90154 |
On Fri, May 8, 2015 at 9:55 PM, Cecil Westerhof <Cecil@decebal.nl> wrote:

> Op Friday 8 May 2015 12:54 CEST schreef Steven D'Aprano:
>
>> If your code is only going to be used by yourself, I'd just use
>> pickle. If you are creating an application for others to use, I
>> would spend the extra effort to build in support for at least
>> pickle, JSON and plists, and let the user decide what they prefer.
>
> Well, I put it on GitHub, so I hope it is going to be used by others
> also. ;-) There are other things that are more urgent at the moment,
> but in the future I will implement JSON and plists then.

But will the pickle files be shared? If not, they're still nice and
private, and fairly safe. The problem comes when, for instance, you
have a client Python program that pickles data and sends it over a
network to a server Python program to be unpickled, because then
someone could craft a malicious pickle and send it to you to eat. If
they're only ever saved locally and re-read, there shouldn't be any
security risk (anyone who could reach in and edit the pickle file
could probably reach in and change the code anyway).

That said, if your needs are sufficiently simple, it may be worth
using something plain text just for the debuggability.

ChrisA
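To make the "malicious pickle" concern concrete, here is a deliberately harmless sketch. It relies on the documented `__reduce__` hook, which lets an object tell pickle which callable to invoke when it is loaded. The class name `Crafted` is made up for this illustration, and the payload only evaluates an arithmetic expression; a real attacker would choose something far less friendly.

```python
import pickle


class Crafted:
    """Hypothetical stand-in for an attacker-controlled object."""

    def __reduce__(self):
        # On load, pickle will call eval("6 * 7") and use the result
        # as the unpickled object.
        return (eval, ("6 * 7",))


# What the "client" would send over the network.
payload = pickle.dumps(Crafted())

# What the "server" does with it: loading runs the embedded call.
result = pickle.loads(payload)
print(result)
```

The loaded value is the result of the call, not a `Crafted` instance, which is why data from an untrusted peer should not be passed to `pickle.loads`.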
| From | Cecil Westerhof <Cecil@decebal.nl> |
|---|---|
| Date | 2015-05-08 16:34 +0200 |
| Message-ID | <87zj5fxhm5.fsf@Equus.decebal.nl> |
| In reply to | #90163 |
On Friday 8 May 2015 14:53 CEST, Chris Angelico wrote:
> On Fri, May 8, 2015 at 9:55 PM, Cecil Westerhof <Cecil@decebal.nl> wrote:
>> On Friday 8 May 2015 12:54 CEST, Steven D'Aprano wrote:
>>
>>> If your code is only going to be used by yourself, I'd just use
>>> pickle. If you are creating an application for others to use, I
>>> would spend the extra effort to build in support for at least
>>> pickle, JSON and plists, and let the user decide what they prefer.
>>
>> Well, I put it on GitHub, so I hope it is going to be used by
>> others also. ;-) There are other things that are more urgent at the
>> moment, but in the future I will implement JSON and plists then.
>
> But will the pickle files be shared? If not, they're still nice and
> private, and fairly safe. The problem comes when, for instance, you
> have a client Python program that pickles data and sends it over a
> network to a server Python program to be unpickled, because then
> someone could craft a malicious pickle and send it to you to eat. If
> they're only ever saved locally and re-read, there shouldn't be any
> security risk (anyone who could reach in and edit the pickle file
> could probably reach in and change the code anyway).
I would expect not. But I never know what someone else is going to do.
;-)
But in my case there is a Twitter directory with:
quotes.txt
quotes.pickle
tips.txt
tips.pickle
All four files are normally only accessed by the Python program. When
I want to extend the messages I use a text editor to append them.
The .txt files contain messages that can be used. And the .pickle
files contain the ‘recently’ used messages.
When I unpickle quotes.pickle I get:
[25, 112, 4, 18, 41, 2, 81, 75, 28, 60, 105, 47, 84, 65, 103, 42,
13, 66, 55, 124, 6, 82, 76, 12, 61, 113, 119, 96, 3, 68, 11, 89,
98, 107, 118, 29, 57, 33, 88, 121, 110, 49, 90, 72, 87, 114, 43,
59, 8, 92]
Very simple indeed.
--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-05-09 01:11 +1000 |
| Message-ID | <mailman.253.1431097914.12865.python-list@python.org> |
| In reply to | #90173 |
On Sat, May 9, 2015 at 12:34 AM, Cecil Westerhof <Cecil@decebal.nl> wrote:

> When I unpickle quotes.pickle I get:
> [25, 112, 4, 18, 41, 2, 81, 75, 28, 60, 105, 47, 84, 65, 103, 42,
> 13, 66, 55, 124, 6, 82, 76, 12, 61, 113, 119, 96, 3, 68, 11, 89,
> 98, 107, 118, 29, 57, 33, 88, 121, 110, 49, 90, 72, 87, 114, 43,
> 59, 8, 92]
>
> Very simple indeed.

In that case, I'd probably write it out as JSON, or as a simple
whitespace-separated list of numbers. That way, if anything goes
wrong, you can open up the file and look at it easily.

ChrisA
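Both suggestions take only a few lines with the standard library. The sketch below uses a shortened copy of the list from the quoted message; the variable names are illustrative, and it works on strings rather than files to stay self-contained.

```python
import json

# First few entries of the list quoted above.
recent = [25, 112, 4, 18, 41, 2]

# Option 1: JSON. The stored text is readable in any editor.
as_json = json.dumps(recent)
from_json = json.loads(as_json)

# Option 2: whitespace-separated numbers, parsed back with split().
as_text = " ".join(str(n) for n in recent)
from_text = [int(token) for token in as_text.split()]

print(as_json)  # [25, 112, 4, 18, 41, 2]
print(as_text)  # 25 112 4 18 41 2
```

Either string can be written with an ordinary text-mode `open(..., "w")`, and neither format executes code when read back.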
| From | Cecil Westerhof <Cecil@decebal.nl> |
|---|---|
| Date | 2015-05-08 18:43 +0200 |
| Message-ID | <87fv77xbnj.fsf@Equus.decebal.nl> |
| In reply to | #90174 |
On Friday 8 May 2015 17:11 CEST, Chris Angelico wrote:

> On Sat, May 9, 2015 at 12:34 AM, Cecil Westerhof <Cecil@decebal.nl> wrote:
>> When I unpickle quotes.pickle I get:
>> [25, 112, 4, 18, 41, 2, 81, 75, 28, 60, 105, 47, 84, 65, 103, 42,
>> 13, 66, 55, 124, 6, 82, 76, 12, 61, 113, 119, 96, 3, 68, 11, 89,
>> 98, 107, 118, 29, 57, 33, 88, 121, 110, 49, 90, 72, 87, 114, 43,
>> 59, 8, 92]
>>
>> Very simple indeed.
>
> In that case, I'd probably write it out as JSON, or as a simple
> whitespace-separated list of numbers. That way, if anything goes
> wrong, you can open up the file and look at it easily.

Done. And the files are even smaller. ;-)

--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
| From | Cem Karan <cfkaran2@gmail.com> |
|---|---|
| Date | 2015-05-08 06:27 -0400 |
| Message-ID | <mailman.234.1431084638.12865.python-list@python.org> |
| In reply to | #90134 |
What are you using pickle for? If this is just for yourself, go for it. If
you're planning on interchanging with different languages/platforms/etc.,
JSON or XML might be better. If you're after something that is smaller and
faster, maybe MessagePack or Google Protocol Buffers. If you're after
something that can hold a planet's worth of data, maybe HDF5. It really
depends on your use-case.

MessagePack - http://en.wikipedia.org/wiki/MessagePack
Google Protocol Buffers - http://en.wikipedia.org/wiki/Protocol_Buffers
HDF5 - http://en.wikipedia.org/wiki/Hierarchical_Data_Format

Thanks,
Cem Karan

On May 8, 2015, at 5:58 AM, Cecil Westerhof <Cecil@decebal.nl> wrote:

> I first used marshal in my filebasedMessages module. Then I read that
> you should not use it, because it changes per Python version and it
> was better to use pickle. So I did that and now I find:
> https://wiki.python.org/moin/Pickle
>
> Is it really that bad and should I change again?
>
> --
> Cecil Westerhof
> Senior Software Engineer
> LinkedIn: http://www.linkedin.com/in/cecilwesterhof
> --
> https://mail.python.org/mailman/listinfo/python-list