Groups > comp.lang.python > #35606 > unrolled thread
| Started by | Omer Korat <animus.partum.universum@gmail.com> |
|---|---|
| First post | 2012-12-27 02:46 -0800 |
| Last post | 2012-12-27 04:05 -0800 |
| Articles | 13 — 6 participants |
pickle module doens't work Omer Korat <animus.partum.universum@gmail.com> - 2012-12-27 02:46 -0800
Re: pickle module doens't work Peter Otten <__peter__@web.de> - 2012-12-27 12:29 +0100
Re: pickle module doens't work Omer Korat <animus.partum.universum@gmail.com> - 2012-12-27 04:05 -0800
Re: pickle module doens't work Dave Angel <d@davea.name> - 2012-12-27 07:34 -0500
Re: pickle module doens't work Omer Korat <animus.partum.universum@gmail.com> - 2012-12-27 05:16 -0800
Re: pickle module doens't work Chris Angelico <rosuav@gmail.com> - 2012-12-28 00:20 +1100
Re: pickle module doens't work Omer Korat <animus.partum.universum@gmail.com> - 2012-12-27 05:16 -0800
Re: pickle module doens't work Tim Roberts <timr@probo.com> - 2012-12-28 21:41 -0800
Re: pickle module doens't work Omer Korat <animus.partum.universum@gmail.com> - 2013-01-01 06:33 -0800
Re: pickle module doens't work Tim Roberts <timr@probo.com> - 2013-01-01 11:14 -0800
Re: pickle module doens't work Omer Korat <animus.partum.universum@gmail.com> - 2013-01-02 06:08 -0800
Re: pickle module doens't work Terry Reedy <tjreedy@udel.edu> - 2012-12-27 16:19 -0500
Re: pickle module doens't work Omer Korat <animus.partum.universum@gmail.com> - 2012-12-27 04:05 -0800
| From | Omer Korat <animus.partum.universum@gmail.com> |
|---|---|
| Date | 2012-12-27 02:46 -0800 |
| Subject | pickle module doens't work |
| Message-ID | <ee10f0f7-7713-4879-82a1-ec5804767af6@googlegroups.com> |
Hi all,

I'm working on a project in Python 2.7. I have a few large objects, and I want to save them for later use, so that it will be possible to load them whole from a file, instead of creating them every time anew. It is critical that they be transportable between platforms. Problem is, when I use the 2.7 pickle module, all I get is a file containing a string representing the commands used to create the object. But there's nothing I can do with this string, because it only contains information about the object's module, class and parameters. And that way, they aren't transportable. In python 3.3 this problem is solved, and the pickle.dump generates a series of bytes, which can be loaded in any other module independently of anything. But in my project, I need NLTK 2.0, which is written in python 2.7...

Anybody has suggestions? Maybe there is a way to use pickle so that it yields the results I need? Or is there any other module that does pickle's job? Or perhaps there is a way to mechanically translate between python versions, so I'll be able to use pickle from 3.3 inside an application written in 2.7? Or perhaps somebody knows of a way to embed a piece of 3.3 code inside a 2.7 program?

It can't be I'm the only one who wants to save python objects for later use! There must be a standard method to do this, but I couldn't find any on the web! If someone can solve this for me I'll be so grateful.
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2012-12-27 12:29 +0100 |
| Message-ID | <mailman.1339.1356607732.29569.python-list@python.org> |
| In reply to | #35606 |
Omer Korat wrote:
> I'm working on a project in Python 2.7. I have a few large objects, and I
> want to save them for later use, so that it will be possible to load them
> whole from a file, instead of creating them every time anew. It is
> critical that they be transportable between platforms. Problem is, when I
> use the 2.7 pickle module, all I get is a file containing a string
> representing the commands used to create the object. But there's nothing I
> can do with this string, because it only contains information about the
> object's module, class and parameters. And that way, they aren't
> transportable. In python 3.3 this problem is solved, and the pickle.dump
> generates a series of bytes, which can be loaded in any other module
> independently of anything. But in my project, I need NLTK 2.0, which is
> written in python 2.7...
>
> Anybody has suggestions? Maybe there is a way to use pickle so that it
> yields the results I need? Or is there any other module that does pickle's
> job? Or perhaps there is a way to mechanically translate between python
> versions, so I'll be able to use pickle from 3.3 inside an application
> written in 2.7? Or perhaps somebody knows of a way to embed a piece of 3.3
> code inside a 2.7 program?
>
> It can't be I'm the only one who wants to save python objects for later
> use! There must be a standard method to do this, but I couldn't find any
> on the web! If someone can solve this for me I'll be so grateful.
Pickling works the same way in Python 2 and Python 3. For classes only the
names are dumped, so you need (the same version of) NLTK on the source and
the destination platform.
If you can provide a short demo of what works in Python 3 but fails in
Python 2 we may be able to find the actual problem or misunderstanding.
Maybe it is just that different protocols are used by default? If so, try

    with open(filename, "wb") as f:
        pickle.dump(your_data, f, protocol=pickle.HIGHEST_PROTOCOL)
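A minimal round-trip sketch of that suggestion, written for Python 3 (the calls exist with the same argument order in 2.7). Note that `pickle.dump` takes the object first and the open file second. The contents of `your_data` and the temporary file are assumptions made purely for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for "your_data": any picklable object works here.
your_data = {"labels": ["pos", "neg"], "weights": [0.6, 0.4]}

# Use a temporary file so the sketch leaves nothing behind.
fd, filename = tempfile.mkstemp(suffix=".pkl")
os.close(fd)
try:
    # Binary mode matters: a pickle is a byte stream, not text.
    with open(filename, "wb") as f:
        pickle.dump(your_data, f, protocol=pickle.HIGHEST_PROTOCOL)

    with open(filename, "rb") as f:
        restored = pickle.load(f)
finally:
    os.remove(filename)

# The loaded object is an equal copy, not the same object.
print(restored == your_data, restored is your_data)
```

Since the goal is a file that moves between machines, it may be safer to pass an explicit protocol number that both Python versions understand rather than `HIGHEST_PROTOCOL`, which differs between releases.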
| From | Omer Korat <animus.partum.universum@gmail.com> |
|---|---|
| Date | 2012-12-27 04:05 -0800 |
| Message-ID | <f6ea95c2-2448-4f93-8aa4-e4e2aeb731ba@googlegroups.com> |
| In reply to | #35610 |
You're probably right in general, for me the 3.3 and 2.7 pickles definitely don't work the same:

3.3:
>>> type(pickle.dumps(1))
<type 'bytes'>

2.7:
>>> type(pickle.dumps(1, pickle.HIGHEST_PROTOCOL))
<type 'str'>

As you can see, in 2.7 when I try to dump something, I get a useless string. Look what I get when I dump an NLTK object such as the sent_tokenize function:

'\x80\x02cnltk.tokenize\nsent_tokenize\ng\x00'

Now, this is useless. If I try to load it on a platform without NLTK installed on it, I get:

ImportError: No module named 'nltk'

So it means the actual sent_tokenizer wasn't saved. Just its module.
| From | Dave Angel <d@davea.name> |
|---|---|
| Date | 2012-12-27 07:34 -0500 |
| Message-ID | <mailman.1341.1356611938.29569.python-list@python.org> |
| In reply to | #35611 |
On 12/27/2012 07:05 AM, Omer Korat wrote:
> You're probably right in general, for me the 3.3 and 2.7 pickles definitely don't work the same:
>
> 3.3:
> >>> type(pickle.dumps(1))
> <type 'bytes'>
>
> 2.7:
> >>> type(pickle.dumps(1, pickle.HIGHEST_PROTOCOL))
> <type 'str'>

That is the same. In 2.7, str is made up of bytes, while in 3.3, str would be unicode. So 'bytes' is the 3.3 equivalent of str.

> As you can see, in 2.7 when I try to dump something, I get a useless string. Look what I get when I dump an NLTK object such as the sent_tokenize function:
>
> '\x80\x02cnltk.tokenize\nsent_tokenize\ng\x00'
>
> Now, this is useless. If I try to load it on a platform without NLTK installed on it, I get:
>
> ImportError: No module named 'nltk'
>
> So it means the actual sent_tokenizer wasn't saved. Just its module.

As Peter Otten has already pointed out, that's how pickle works. It does not somehow encode the whole module into the pickle, only enough information to recreate the particular objects you're saving, *using* the same modules. I don't know of any method of avoiding the destination machine needing nltk, regardless of Python version.

Perhaps you'd rather see it in the Python docs.

http://docs.python.org/2/library/pickle.html
http://docs.python.org/3.3/library/pickle.html

pickle <http://docs.python.org/2/library/pickle.html#module-pickle> can save and restore class instances transparently, however the class definition must be importable and live in the same module as when the object was stored.

and

Similarly, when class instances are pickled, their class’s code and data are not pickled along with them. Only the instance data are pickled. This is done on purpose, so you can fix bugs in a class or add methods to the class and still load objects that were created with an earlier version of the class.

-- 
DaveA
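The by-reference behaviour can be reproduced without NLTK. The sketch below assumes `json.dumps` as a stand-in for `sent_tokenize`, since both are ordinary module-level functions; it is written for Python 3 and uses protocol 2 only so that the bytes resemble the ones quoted in the thread.

```python
import json
import pickle

# json.dumps stands in for nltk.tokenize.sent_tokenize here (an assumption
# for illustration): pickle treats any module-level function the same way.
payload = pickle.dumps(json.dumps, protocol=2)

# The payload is a handful of bytes naming the module and the function.
# None of the function's code is stored.
print(payload)

# Loading imports the named module and looks the name up again, which is
# why the loading side must have that module installed.
restored = pickle.loads(payload)
print(restored is json.dumps)
```

On a machine where the named module cannot be imported, the `pickle.loads` call is the step that raises the `ImportError`.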
| From | Omer Korat <animus.partum.universum@gmail.com> |
|---|---|
| Date | 2012-12-27 05:16 -0800 |
| Message-ID | <5258c759-3632-4e96-b5b4-21def3f3c6c3@googlegroups.com> |
| In reply to | #35613 |
I see. In that case, all I have to do is make sure NLTK is available when I load the pickled objects. That pretty much solves my problem. Thanks!
So it means pickle doesn't ever save the object's values, only how it was created?

Say I have a large object that requires a lot of time to train on data. It means pickle doesn't save its values, so you have to train it every time anew? Is there no way to save its trained values?
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2012-12-28 00:20 +1100 |
| Message-ID | <mailman.1344.1356614452.29569.python-list@python.org> |
| In reply to | #35616 |
On Fri, Dec 28, 2012 at 12:16 AM, Omer Korat <animus.partum.universum@gmail.com> wrote:
> I see. In that case, all I have to do is make sure NLTK is available when I load the pickled objects. That pretty much solves my problem. Thanks!
> So it means pickle doesn't ever save the object's values, only how it was created?
>
> Say I have a large object that requires a lot of time to train on data. It means pickle doesn't save its values, so you have to train it every time anew? Is there no way to save its trained values?

It'll save instance data but not class data or code. So it'll save all that content, and it assumes that class data is either static or will be recreated appropriately during unpickling.

ChrisA
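A small sketch of that distinction, assuming a hypothetical `Model` class with one class attribute and one instance attribute (neither name comes from NLTK). It is written for Python 3 and expects to be run as a script, so that the class can be found again by name when unpickling.

```python
import pickle


class Model(object):
    # Class data: lives on the class and is not written to the pickle.
    default_rate = 0.1

    def __init__(self, weights):
        # Instance data: stored in self.__dict__ and written to the pickle.
        self.weights = weights


model = Model([0.6, 0.4])
payload = pickle.dumps(model, protocol=2)

# Change the class attribute after the pickle was made.
Model.default_rate = 0.5

restored = pickle.loads(payload)
print(restored.weights)       # instance data, taken from the pickle
print(restored.default_rate)  # class data, taken from the class as it is now
```

The attribute name `weights` appears in the payload while `default_rate` does not, which is one way to check what was actually saved.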
| From | Tim Roberts <timr@probo.com> |
|---|---|
| Date | 2012-12-28 21:41 -0800 |
| Message-ID | <4e0td81rm0gru7g5a3rmm6grr5aa2r101m@4ax.com> |
| In reply to | #35617 |
Omer Korat <animus.partum.universum@gmail.com> wrote:
>
>So it means pickle doesn't ever save the object's values, only how it was created?

You say that as though there were a difference between the two. There isn't. An object is just a dictionary of values. If you set an object member to a string, then that object's dictionary for that member name contains a string. It doesn't contain some alternative packed binary representation of a string.

>Say I have a large object that requires a lot of time to train on data. It
>means pickle doesn't save its values, so you have to train it every time
>anew? Is there no way to save its trained values?

When you say "train on data", what do you mean? If your training creates computed data in other members, those members and their values should also be saved in the pickle.
-- 
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.
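To illustrate the point about computed members, here is a sketch with a hypothetical `LabelCounter` class whose state is built up by method calls after construction. The class and its `observe` method are invented for this example; the code is Python 3 and assumes it is run as a script.

```python
import pickle


class LabelCounter(object):
    """Hypothetical object whose state is computed after construction."""

    def __init__(self):
        self.counts = {}

    def observe(self, label):
        # Each call updates data held in the instance dictionary.
        self.counts[label] = self.counts.get(label, 0) + 1


counter = LabelCounter()
for label in ["spam", "ham", "spam"]:
    counter.observe(label)

restored = pickle.loads(pickle.dumps(counter, protocol=2))

# The computed values travel with the pickle; nothing has to be recomputed.
print(restored.__dict__ == counter.__dict__)
print(restored.counts)
```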
| From | Omer Korat <animus.partum.universum@gmail.com> |
|---|---|
| Date | 2013-01-01 06:33 -0800 |
| Message-ID | <959b2a1c-ef26-4b6b-b855-fdf29cafc681@googlegroups.com> |
| In reply to | #35741 |
I am using the nltk.classify.MaxEntClassifier. This object has a set of labels, and a set of probabilities: P(label | features). It modifies this probability given data. So, for example, if you tell this object that the label L appears 60% of the time with the feature F, then P(L | F) = 0.6.

The point is, there is no way to access the probabilities directly. The object's 'classify' method uses these probabilities, but you can't call them as an object property.

In order to adjust probabilities, you have to call the object's 'train' method, and feed classified data in.

So is there any way to save a MaxEntClassifier object, with its classification probabilities, without having to call the 'train' method?
| From | Tim Roberts <timr@probo.com> |
|---|---|
| Date | 2013-01-01 11:14 -0800 |
| Message-ID | <jtc6e81mtotimn5v99f1q8ln82n6crc48s@4ax.com> |
| In reply to | #35890 |
Omer Korat <animus.partum.universum@gmail.com> wrote:
>
>I am using the nltk.classify.MaxEntClassifier. This object has a set of
>labels, and a set of probabilities: P(label | features). It modifies
>this probability given data. SO for example, if you tell this object
>that the label L appears 60% of the time with the feature F, then
>P(L | F) = 0.6.
>
>The point is, there is no way to access the probabilities directly.
>The object's 'classify' method uses these probabilities, but you can't
>call them as an object property.

Well, you have the source code, so you can certainly go look at the implementation and see what the data is based on.

>In order to adjust probabilities, you have to call the object's 'train'
>method, and feed classified data in.

The "train" method is not actually an object method, it's a class method. It doesn't use any existing probabilities -- it returns a NEW MaxEntClassifier based entirely on the training set.

>So is there any way to save a MaxEntClassifier object, with its
>classification probabilities, without having to call the 'train' method?

If you haven't called the "train" method, there IS no MaxEntClassifier object. Once you have called "train", you should be able to pickle the new MaxEntClassifier and fetch it back with its state intact.
-- 
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.
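The train-once, pickle, reload pattern can be sketched with a toy class. `ToyClassifier`, its `train` class method and its `prob` method are hypothetical stand-ins and do not reproduce NLTK's implementation or API; the relative-frequency estimate is chosen only so that the 60% example from the question comes out. Python 3, run as a script.

```python
import pickle


class ToyClassifier(object):
    """Hypothetical stand-in for a trained classifier; not NLTK's class."""

    def __init__(self, probabilities):
        # Maps (label, feature) to an estimate of P(label | feature).
        self._probabilities = probabilities

    @classmethod
    def train(cls, labeled_features):
        # A class method: it builds and returns a NEW classifier from
        # (feature, label) pairs, using plain relative frequencies.
        pair_counts = {}
        feature_counts = {}
        for feature, label in labeled_features:
            key = (label, feature)
            pair_counts[key] = pair_counts.get(key, 0) + 1
            feature_counts[feature] = feature_counts.get(feature, 0) + 1
        probabilities = {
            key: count / float(feature_counts[key[1]])
            for key, count in pair_counts.items()
        }
        return cls(probabilities)

    def prob(self, label, feature):
        return self._probabilities.get((label, feature), 0.0)


# Label L appears with feature F in 3 of 5 training examples.
training = [("F", "L")] * 3 + [("F", "M")] * 2
classifier = ToyClassifier.train(training)

# Save the trained object, then load it back without training again.
payload = pickle.dumps(classifier, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)
print(restored.prob("L", "F"))
```

The loading side still needs the class definition to be importable, but it never has to see the training data.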
| From | Omer Korat <animus.partum.universum@gmail.com> |
|---|---|
| Date | 2013-01-02 06:08 -0800 |
| Message-ID | <3cc49c16-d901-4c27-9ba8-afbd2630b700@googlegroups.com> |
| In reply to | #35911 |
Yeah, right. I didn't think about that. I'll check in the source how the data is stored. Thanks for helping sort it all out.
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2012-12-27 16:19 -0500 |
| Message-ID | <mailman.1370.1356643212.29569.python-list@python.org> |
| In reply to | #35611 |
On 12/27/2012 7:34 AM, Dave Angel wrote:

> Perhaps you'd rather see it in the Python docs.
>
> http://docs.python.org/2/library/pickle.html
> http://docs.python.org/3.3/library/pickle.html
>
> pickle <http://docs.python.org/2/library/pickle.html#module-pickle>can
> save and restore class instances transparently, however the class
> definition must be importable and live in the same module as when the
> object was stored.
> and
> Similarly, when class instances are pickled, their class’s code and data
> are not pickled along with them. Only the instance data are pickled.
> This is done on purpose, so you can fix bugs in a class or add methods
> to the class and still load objects that were created with an earlier
> version of the class.

I should point out that the above was probably written before the (partial) unification of types and classes in 2.2 (completed in 3.3). So 'class' is referring to 'Python-coded class' and 'code' is referring to '(compiled) Python code', and not machine code. Now, everything that pickle pickles is a 'class instance' and class code can be compiled from either Python or the interpreter's system language (C, Java, C#, others, or even Python itself).

-- 
Terry Jan Reedy
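A short Python 3 sketch of that point, under the assumption that a few built-in values are representative: their classes are implemented in the interpreter's own language in CPython, yet pickle handles them like any other class, storing the class by reference and the instance by value.

```python
import pickle

# Instances of classes implemented inside the interpreter.
samples = [1, 2.5, "text", [1, 2], {"a": 1}, (1, 2)]

# Each built-in class is pickled by reference and comes back as the same
# class object.
classes_ok = [pickle.loads(pickle.dumps(type(v))) is type(v) for v in samples]

# Each instance is pickled by value and comes back as an equal object.
values_ok = [pickle.loads(pickle.dumps(v)) == v for v in samples]

print(all(classes_ok), all(values_ok))
```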