Groups > comp.lang.python > #35606 > unrolled thread

pickle module doens't work

Started by: Omer Korat <animus.partum.universum@gmail.com>
First post: 2012-12-27 02:46 -0800
Last post: 2012-12-27 04:05 -0800
Articles: 13 (6 participants)

Contents

  pickle module doens't work Omer Korat <animus.partum.universum@gmail.com> - 2012-12-27 02:46 -0800
    Re: pickle module doens't work Peter Otten <__peter__@web.de> - 2012-12-27 12:29 +0100
      Re: pickle module doens't work Omer Korat <animus.partum.universum@gmail.com> - 2012-12-27 04:05 -0800
        Re: pickle module doens't work Dave Angel <d@davea.name> - 2012-12-27 07:34 -0500
          Re: pickle module doens't work Omer Korat <animus.partum.universum@gmail.com> - 2012-12-27 05:16 -0800
            Re: pickle module doens't work Chris Angelico <rosuav@gmail.com> - 2012-12-28 00:20 +1100
          Re: pickle module doens't work Omer Korat <animus.partum.universum@gmail.com> - 2012-12-27 05:16 -0800
            Re: pickle module doens't work Tim Roberts <timr@probo.com> - 2012-12-28 21:41 -0800
              Re: pickle module doens't work Omer Korat <animus.partum.universum@gmail.com> - 2013-01-01 06:33 -0800
                Re: pickle module doens't work Tim Roberts <timr@probo.com> - 2013-01-01 11:14 -0800
                  Re: pickle module doens't work Omer Korat <animus.partum.universum@gmail.com> - 2013-01-02 06:08 -0800
        Re: pickle module doens't work Terry Reedy <tjreedy@udel.edu> - 2012-12-27 16:19 -0500
      Re: pickle module doens't work Omer Korat <animus.partum.universum@gmail.com> - 2012-12-27 04:05 -0800

#35606 — pickle module doens't work

From: Omer Korat <animus.partum.universum@gmail.com>
Date: 2012-12-27 02:46 -0800
Subject: pickle module doens't work
Message-ID: <ee10f0f7-7713-4879-82a1-ec5804767af6@googlegroups.com>
Hi all,

I'm working on a project in Python 2.7. I have a few large objects, and I want to save them for later use, so that it will be possible to load them whole from a file, instead of creating them every time anew. It is critical that they be  transportable between platforms. Problem is, when I use the 2.7 pickle module, all I get is a file containing a string representing the commands used to create the object. But there's nothing I can do with this string, because it only contains information about the object's module, class and parameters. And that way, they aren't transportable.
In python 3.3 this problem is solved, and the pickle.dump generates a series of bytes, which can be loaded in any other module independently of anything. But in my project, I need NLTK 2.0, which is written in python 2.7...

Anybody has suggestions? Maybe there is a way to use pickle so that it yields the results I need? Or is there any other module that does pickle's job? Or perhaps there is a way to mechanically translate between python versions, so I'll be able to use pickle from 3.3 inside an application written in 2.7? Or perhaps somebody knows of a way to embed a piece of 3.3 code inside a 2.7 program?

It can't be I'm the only one who wants to save python objects for later use! There must be a standard method to do this, but I couldn't find any on the web!
If someone can solve this for me I'll be so grateful.


#35610

From: Peter Otten <__peter__@web.de>
Date: 2012-12-27 12:29 +0100
Message-ID: <mailman.1339.1356607732.29569.python-list@python.org>
In reply to: #35606
Omer Korat wrote:

> I'm working on a project in Python 2.7. I have a few large objects, and I
> want to save them for later use, so that it will be possible to load them
> whole from a file, instead of creating them every time anew. It is
> critical that they be  transportable between platforms. Problem is, when I
> use the 2.7 pickle module, all I get is a file containing a string
> representing the commands used to create the object. But there's nothing I
> can do with this string, because it only contains information about the
> object's module, class and parameters. And that way, they aren't
> transportable. In python 3.3 this problem is solved, and the pickle.dump
> generates a series of bytes, which can be loaded in any other module
> independently of anything. But in my project, I need NLTK 2.0, which is
> written in python 2.7...
> 
> Anybody has suggestions? Maybe there is a way to use pickle so that it
> yields the results I need? Or is there any other module that does pickle's
> job? Or perhaps there is a way to mechanically translate between python
> versions, so I'll be able to use pickle from 3.3 inside an application
> written in 2.7? Or perhaps somebody knows of a way to embed a piece of 3.3
> code inside a 2.7 program?
> 
> It can't be I'm the only one who wants to save python objects for later
> use! There must be a standard method to do this, but I couldn't find any
> on the web! If someone can solve this for me I'll be so grateful.

Pickling works the same way in Python 2 and Python 3. For classes only the 
names are dumped, so you need (the same version of) NLTK on the source and 
the destination platform.

If you can provide a short demo of what works in Python 3 but fails in 
Python 2 we may be able to find the actual problem or misunderstanding.
Maybe it is just that different protocols are used by default? If so, try

with open(filename, "wb") as f:
    pickle.dump(your_data, f, protocol=pickle.HIGHEST_PROTOCOL)
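
A minimal round-trip sketch of that call, assuming Python 3 and a throwaway temporary file. `your_data` below is made-up sample data, not an NLTK object; note that `pickle.dump` takes the object first and the file second.

```python
import os
import pickle
import tempfile

# Hypothetical sample data standing in for the "large objects" in the question.
your_data = {"labels": ["pos", "neg"], "weights": [0.6, 0.4]}

# Create a temporary file name; pickle output is bytes, so use binary mode.
fd, filename = tempfile.mkstemp(suffix=".pickle")
os.close(fd)
try:
    with open(filename, "wb") as f:
        # Object first, then the file; the protocol is passed by keyword.
        pickle.dump(your_data, f, protocol=pickle.HIGHEST_PROTOCOL)

    with open(filename, "rb") as f:
        restored = pickle.load(f)
finally:
    os.remove(filename)
```

The same two calls also exist on Python 2.7, where HIGHEST_PROTOCOL is 2.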


#35611

From: Omer Korat <animus.partum.universum@gmail.com>
Date: 2012-12-27 04:05 -0800
Message-ID: <f6ea95c2-2448-4f93-8aa4-e4e2aeb731ba@googlegroups.com>
In reply to: #35610
You're probably right in general, but for me the 3.3 and 2.7 pickles definitely don't work the same:

3.3:
>>> type(pickle.dumps(1))
<type 'bytes'>

2.7:
>>> type(pickle.dumps(1, pickle.HIGHEST_PROTOCOL))
<type 'str'>


As you can see, in 2.7 when I try to dump something, I get a useless string. Look what I get when I dump an NLTK object such as the sent_tokenize function:

'\x80\x02cnltk.tokenize\nsent_tokenize\ng\x00'

Now, this is useless. If I try to load it on a platform without NLTK installed on it, I get:

ImportError: No module named 'nltk'

So it means the actual sent_tokenizer wasn't saved. Just its module.
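
The same failure can be reproduced without NLTK. This is a sketch assuming Python 3; the payload is written by hand and `no_such_module_xyz` is a made-up module name, chosen only because it should not exist anywhere.

```python
import pickle

# Hand-written protocol 0 pickle: the GLOBAL opcode "c" names a module and an
# attribute to look up in it. "no_such_module_xyz" is a hypothetical name.
payload = b"cno_such_module_xyz\nthing\n."

try:
    pickle.loads(payload)
    error = None
except ImportError as exc:
    # Unpickling tries to import the named module and fails at that point.
    error = exc
```

The pickle itself is intact; loading fails only because the module it refers to cannot be imported on this machine.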


#35613

From: Dave Angel <d@davea.name>
Date: 2012-12-27 07:34 -0500
Message-ID: <mailman.1341.1356611938.29569.python-list@python.org>
In reply to: #35611
On 12/27/2012 07:05 AM, Omer Korat wrote:
> You're probably right in general, for me the 3.3 and 2.7 pickles definitely don't work the same:
>
> 3.3:
> >>> type(pickle.dumps(1))
> <type 'bytes'>
>
> 2.7:
> >>> type(pickle.dumps(1, pickle.HIGHEST_PROTOCOL))
> <type 'str'>

That is the same. In 2.7, str is made up of bytes, while in 3.3, str
would be unicode. So 'bytes' is the 3.3 equivalent of str.
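
A quick check of that equivalence, as a sketch assuming Python 3. Protocol 2 is picked here only because both 2.7 and 3.x can read it.

```python
import pickle

# Protocol 2 is understood by both Python 2.7 and Python 3.x.
data = pickle.dumps(1, protocol=2)

# On Python 3 the result is a bytes object. On Python 2.7 the same call
# returns a str, which is 2.7's byte string type.
result_type = type(data)

# Either way, loading the bytes gives the original value back.
value = pickle.loads(data)
```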

>
> As you can see, in 2.7 when I try to dump something, I get a useless string. Look what I get when I dump an NLTK object such as the sent_tokenize function:
>
> '\x80\x02cnltk.tokenize\nsent_tokenize\ng\x00'
>
> Now, this is useless. If I try to load it on a platform without NLTK installed on it, I get:
>
> ImportError: No module named 'nltk'
>
> So it means the actual sent_tokenizer wasn't saved. Just its module.

As Peter Otten has already pointed out, that's how pickle works. It does
not somehow encode the whole module into the pickle, only enough
information to recreate the particular objects you're saving, *using*
the same modules. I don't know of any method of avoiding the destination
machine needing nltk, regardless of Python version.
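
To see how little ends up in such a pickle, here is a small sketch, assuming Python 3 and using the standard library's `json.dumps` as a stand-in for `sent_tokenize` (so it runs without NLTK).

```python
import json
import pickle

# Pickle a module-level function from the standard library.
payload = pickle.dumps(json.dumps, protocol=2)

# The payload names the module and the function, and carries no code.
has_module_name = b"json" in payload
has_function_name = b"dumps" in payload

# Loading imports the module again and looks the name up, so the result is
# the very same function object rather than a copy of its code.
restored = pickle.loads(payload)
```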

Perhaps you'd rather see it in the Python docs.

http://docs.python.org/2/library/pickle.html
http://docs.python.org/3.3/library/pickle.html

pickle <http://docs.python.org/2/library/pickle.html#module-pickle> can
save and restore class instances transparently, however the class
definition must be importable and live in the same module as when the
object was stored.
and
Similarly, when class instances are pickled, their class’s code and data
are not pickled along with them. Only the instance data are pickled.
This is done on purpose, so you can fix bugs in a class or add methods
to the class and still load objects that were created with an earlier
version of the class.

-- 

DaveA


#35616

From: Omer Korat <animus.partum.universum@gmail.com>
Date: 2012-12-27 05:16 -0800
Message-ID: <5258c759-3632-4e96-b5b4-21def3f3c6c3@googlegroups.com>
In reply to: #35613
I see. In that case, all I have to do is make sure NLTK is available when I load the pickled objects. That pretty much solves my problem. Thanks!
So it means pickle doesn't ever save the object's values, only how it was created? 

Say I have a large object that requires a lot of time to train on data. It means pickle doesn't save its values, so you have to train it every time anew? Is there no way to save its trained values?


#35618

From: Chris Angelico <rosuav@gmail.com>
Date: 2012-12-28 00:20 +1100
Message-ID: <mailman.1344.1356614452.29569.python-list@python.org>
In reply to: #35616
On Fri, Dec 28, 2012 at 12:16 AM, Omer Korat
<animus.partum.universum@gmail.com> wrote:
> I see. In that case, all I have to do is make sure NLTK is available when I load the pickled objects. That pretty much solves my problem. Thanks!
> So it means pickle doesn't ever save the object's values, only how it was created?
>
> Say I have a large object that requires a lot of time to train on data. It means pickle doesn't save its values, so you have to train it every time anew? Is there no way to save its trained values?

It'll save instance data but not class data or code. So it'll save all
that content, and it assumes that class data is either static or will
be recreated appropriately during unpickling.
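
A small sketch of that distinction, assuming Python 3. `Model`, `version` and `weights` are hypothetical names invented for the illustration.

```python
import pickle


class Model(object):
    # Class data: shared by all instances and NOT written into the pickle.
    version = 1

    def __init__(self):
        # Instance data: lives in self.__dict__ and IS written into the pickle.
        self.weights = {}


model = Model()
model.weights["F"] = 0.6
payload = pickle.dumps(model, protocol=2)

# Simulate the class changing between saving and loading.
Model.version = 2

# The instance data comes back; the class data is whatever the class has now.
restored = pickle.loads(payload)
```

Here `restored.weights` still holds the saved value, while `restored.version` reflects the class as it is at load time.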

ChrisA


#35741

From: Tim Roberts <timr@probo.com>
Date: 2012-12-28 21:41 -0800
Message-ID: <4e0td81rm0gru7g5a3rmm6grr5aa2r101m@4ax.com>
In reply to: #35617
Omer Korat <animus.partum.universum@gmail.com> wrote:
>
>So it means pickle doesn't ever save the object's values, only how it was created? 

You say that as though there were a difference between the two.  There
isn't.  An object is just a dictionary of values.  If you set an object
member to a string, then that object's dictionary for that member name
contains a string.  It doesn't contain some alternative packed binary
representation of a string.

>Say I have a large object that requires a lot of time to train on data. It
>means pickle doesn't save its values, so you have to train it every time
>anew? Is there no way to save its trained values?

When you say "train on data", what do you mean?  If your training creates
computed data in other members, those members and their values should also
be saved in the pickle.
-- 
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.


#35890

From: Omer Korat <animus.partum.universum@gmail.com>
Date: 2013-01-01 06:33 -0800
Message-ID: <959b2a1c-ef26-4b6b-b855-fdf29cafc681@googlegroups.com>
In reply to: #35741
I am using the nltk.classify.MaxEntClassifier. This object has a set of labels, and a set of probabilities: P(label | features). It modifies this probability given data. So, for example, if you tell this object that the label L appears 60% of the time with the feature F, then P(L | F) = 0.6. 
The point is, there is no way to access the probabilities directly. The object's 'classify' method uses these probabilities, but you can't call them as an object property. 
In order to adjust probabilities, you have to call the object's 'train' method, and feed classified data in.
So is there any way to save a MaxEntClassifier object, with its classification probabilities, without having to call the 'train' method?


#35911

From: Tim Roberts <timr@probo.com>
Date: 2013-01-01 11:14 -0800
Message-ID: <jtc6e81mtotimn5v99f1q8ln82n6crc48s@4ax.com>
In reply to: #35890
Omer Korat <animus.partum.universum@gmail.com> wrote:
>
>I am using the nltk.classify.MaxEntClassifier. This object has a set of 
>labels, and a set of probabilities: P(label | features). It modifies 
>this probability given data. So, for example, if you tell this object 
>that the label L appears 60% of the time with the feature F, then 
>P(L | F) = 0.6. 
>
>The point is, there is no way to access the probabilities directly. 
>The object's 'classify' method uses these probabilities, but you can't
>call them as an object property. 

Well, you have the source code, so you can certainly go look at the
implementation and see what the data is based on.

>In order to adjust probabilities, you have to call the object's 'train' 
>method, and feed classified data in.

The "train" method is not actually an object method, it's a class method.
It doesn't use any existing probabilities -- it returns a NEW
MaxEntClassifier based entirely on the training set.

>So is there any way to save a MaxEntClassifier object, with its 
>classification probabilities, without having to call the 'train' method?

If you haven't called the "train" method, there IS no MaxEntClassifier
object.  Once you have called "train", you should be able to pickle the new
MaxEntClassifier and fetch it back with its state intact.
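
A sketch of that workflow, assuming Python 3. `ToyClassifier` is a hypothetical stand-in written for this example; it is not NLTK's MaxEntClassifier and only mimics the idea of a `train` class method that returns a new, picklable object.

```python
import pickle
from collections import Counter


class ToyClassifier(object):
    """Hypothetical stand-in for a trained classifier; not NLTK's API."""

    def __init__(self, probabilities):
        # Maps (feature, label) to an estimate of P(label | feature).
        self._probabilities = probabilities

    @classmethod
    def train(cls, samples):
        # Build a NEW classifier from (feature, label) pairs by counting.
        pair_counts = Counter(samples)
        feature_counts = Counter(feature for feature, _ in samples)
        probabilities = {
            (feature, label): count / float(feature_counts[feature])
            for (feature, label), count in pair_counts.items()
        }
        return cls(probabilities)

    def prob(self, feature, label):
        # Unseen (feature, label) pairs get probability 0.0.
        return self._probabilities.get((feature, label), 0.0)


# Label "L" appears with feature "F" in 3 of 5 samples, so P(L | F) = 0.6.
samples = [("F", "L")] * 3 + [("F", "M")] * 2
classifier = ToyClassifier.train(samples)

# Save the trained object once, then restore it without calling train() again.
restored = pickle.loads(pickle.dumps(classifier, protocol=2))
```

The restored object answers `prob("F", "L")` from its saved state; the only requirement at load time is that the class definition is importable.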
-- 
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.


#35984

From: Omer Korat <animus.partum.universum@gmail.com>
Date: 2013-01-02 06:08 -0800
Message-ID: <3cc49c16-d901-4c27-9ba8-afbd2630b700@googlegroups.com>
In reply to: #35911
Yeah, right. I didn't think about that. I'll check in the source how the data is stored.
Thanks for helping sort it all out.


#35655

From: Terry Reedy <tjreedy@udel.edu>
Date: 2012-12-27 16:19 -0500
Message-ID: <mailman.1370.1356643212.29569.python-list@python.org>
In reply to: #35611
On 12/27/2012 7:34 AM, Dave Angel wrote:

> Perhaps you'd rather see it in the Python docs.
>
> http://docs.python.org/2/library/pickle.html
> http://docs.python.org/3.3/library/pickle.html
>
> pickle <http://docs.python.org/2/library/pickle.html#module-pickle> can
> save and restore class instances transparently, however the class
> definition must be importable and live in the same module as when the
> object was stored.
> and
> Similarly, when class instances are pickled, their class’s code and data
> are not pickled along with them. Only the instance data are pickled.
> This is done on purpose, so you can fix bugs in a class or add methods
> to the class and still load objects that were created with an earlier
> version of the class.

I should point out that the above was probably written before the 
(partial) unification of types and classes in 2.2 (completed in 3.0). So 
'class' is referring to 'Python-coded class' and 'code' is referring to 
'(compiled) Python code', and not machine code. Now, everything that 
pickle pickles is a 'class instance' and class code can be compiled from 
either Python or the interpreter's system language (C, Java, C#, others, 
or even Python itself).

-- 
Terry Jan Reedy


