Groups > comp.lang.python > #87328 > unrolled thread
| Started by | John Nagle <nagle@animats.com> |
|---|---|
| First post | 2015-03-12 12:55 -0700 |
| Last post | 2015-03-12 22:57 +0100 |
| Articles | 6 — 4 participants |
Python3 "pickle" vs. stdin/stdout - unable to get clean byte streams in Python 3 John Nagle <nagle@animats.com> - 2015-03-12 12:55 -0700
Re: Python3 "pickle" vs. stdin/stdout - unable to get clean byte streams in Python 3 Cameron Simpson <cs@zip.com.au> - 2015-03-13 08:56 +1100
Re: Python3 "pickle" vs. stdin/stdout - unable to get clean byte streams in Python 3 John Nagle <nagle@animats.com> - 2015-03-12 17:18 -0700
Re: Python3 "pickle" vs. stdin/stdout - unable to get clean byte streams in Python 3 John Nagle <nagle@animats.com> - 2015-03-12 23:05 -0700
Re: Python3 "pickle" vs. stdin/stdout - unable to get clean byte streams in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-13 19:43 +1100
Re: Python3 "pickle" vs. stdin/stdout - unable to get clean byte streams in Python 3 Peter Otten <__peter__@web.de> - 2015-03-12 22:57 +0100
| From | John Nagle <nagle@animats.com> |
|---|---|
| Date | 2015-03-12 12:55 -0700 |
| Subject | Python3 "pickle" vs. stdin/stdout - unable to get clean byte streams in Python 3 |
| Message-ID | <mdsqtq$7uu$1@dont-email.me> |
I have working code from Python 2 which uses "pickle"
to talk to a subprocess via stdin/stdout. I'm trying to
make that work in Python 3.
First, the subprocess Python is invoked with the "-d" option,
so stdin and stdout are supposed to be unbuffered binary streams.
That was enough in Python 2, but it's not enough in Python 3.
The subprocess and its connections are set up with
proc = subprocess.Popen(launchargs,stdin=subprocess.PIPE,
stdout=subprocess.PIPE, env=env)
...
self.reader = pickle.Unpickler(self.proc.stdout)
self.writer = pickle.Pickler(self.proc.stdin, 2)
after which I get
result = self.reader.load()
TypeError: 'str' does not support the buffer interface
That's as far as traceback goes, so I assume this is
disappearing into C code.
OK, I know I need a byte stream. I tried
self.reader = pickle.Unpickler(self.proc.stdout.buffer)
self.writer = pickle.Pickler(self.proc.stdin.buffer, 2)
That's not allowed. The "stdin" and "stdout" that are
fields of "proc" do not have "buffer". So I can't do that
in the parent process. In the child, though, where
stdin and stdout come from "sys", "sys.stdin.buffer" is valid.
That fixes the "'str' does not support the buffer interface"
error. But now I get the pickle error "Ran out of input"
on the process child side. Probably because there's a
str/bytes incompatibility somewhere.
So how do I get clean binary byte streams between parent
and child process?
John Nagle
| From | Cameron Simpson <cs@zip.com.au> |
|---|---|
| Date | 2015-03-13 08:56 +1100 |
| Message-ID | <mailman.302.1426197396.21433.python-list@python.org> |
| In reply to | #87328 |
On 12Mar2015 12:55, John Nagle <nagle@animats.com> wrote:
> I have working code from Python 2 which uses "pickle"
>to talk to a subprocess via stdin/stdout. I'm trying to
>make that work in Python 3.
> First, the subprocess Python is invoked with the "-d" option,
>so stdin and stdout are supposed to be unbuffered binary streams.
You shouldn't need to use unbuffered streams specifically. It should be enough to
.flush() the output stream (at whichever end) after you have written the pickle
data.
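A minimal sketch of the flush-after-write advice, assuming an OS pipe wrapped in buffered binary file objects stands in for the parent/child connection (both ends live in one process here, and the payload is made up for illustration):

```python
import os
import pickle

# An OS pipe wrapped in buffered binary file objects, standing in for the
# pipe between parent and child (both ends are in this one process).
read_fd, write_fd = os.pipe()
reader = os.fdopen(read_fd, "rb")
writer = os.fdopen(write_fd, "wb")

request = {"cmd": "rate", "domain": "example.com"}  # hypothetical payload
pickle.dump(request, writer, protocol=2)
writer.flush()  # push the buffered pickle bytes into the pipe

received = pickle.load(reader)  # returns once a whole pickle has been read
writer.close()
reader.close()
print(received == request)
```

Without the flush() call the pickled bytes could remain in the writer's buffer, and the reading side would wait for data that has not been sent yet.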
I'm skipping some of your discussion; I can see nothing wrong. I don't use
pickle itself so aside from saying that your use seems to conform to the python
3 doco I can't comment more deeply. That said, I do use subprocess a fair bit.
[...]
> result = self.reader.load()
>TypeError: 'str' does not support the buffer interface
>That's as far as traceback goes, so I assume this is
>disappearing into C code.
No line numbers at all? Or, I suppose, just the line number from your program
and nothing from the pickle module?
>OK, I know I need a byte stream. I tried
> self.reader = pickle.Unpickler(self.proc.stdout.buffer)
> self.writer = pickle.Pickler(self.proc.stdin.buffer, 2)
You should not need to care about these. They're not required.
>That's not allowed. The "stdin" and "stdout" that are
>fields of "proc" do not have "buffer". So I can't do that
>in the parent process. In the child, though, where
>stdin and stdout come from "sys", "sys.stdin.buffer" is valid.
But irrelevant. Besides, the stream buffer may not contain the whole pickle
data anyway; it will be empty before a read and quite possibly incomplete
afterwards. It is just a buffer.
>That fixes the "'str' does not support the buffer interface"
>error.
I'm not sure "fix" is the right characterisation here.
>But now I get the pickle error "Ran out of input"
>on the process child side. Probably because there's a
>str/bytes incompatibility somewhere.
No, probably because the buffer is only ever a snapshot of part of the stream.
str/bytes errors are more glaringly obvious.
>So how do I get clean binary byte streams between parent
>and child process?
This is where I'm confused: my experience is that subprocess.Popen gives you
binary streams; I always need to put an encoder/decoder on them to use text.
Did that just the other day.
BTW, this is on some UNIX variant? Should not be very relevant...
Further questions:
What does self.proc.stdout.__class__ say? And for stdin?
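One way to answer that question is a short script like the following. This is only a sketch: the child command ("pass") is a placeholder, and the class names I would expect on a current CPython 3 with the default buffering are an assumption worth checking on the affected machine.

```python
import subprocess
import sys

# Start a trivial child with pipes on both ends; no text mode is requested.
proc = subprocess.Popen(
    [sys.executable, "-c", "pass"],  # placeholder child command
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)

stdin_class = type(proc.stdin).__name__
stdout_class = type(proc.stdout).__name__
# Text wrappers expose .buffer; plain binary pipe objects do not.
has_buffer = hasattr(proc.stdout, "buffer")

proc.communicate()  # close the pipes and wait for the child
print(stdin_class, stdout_class, has_buffer)
```

If both classes come from the io module's buffered binary family and there is no .buffer attribute, the parent side is already dealing in bytes.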
Cheers,
Cameron Simpson <cs@zip.com.au>
My opinions are borrowed from someone who no longer needs them.
-- KatmanDu@uga.cc.uga.edu
| From | John Nagle <nagle@animats.com> |
|---|---|
| Date | 2015-03-12 17:18 -0700 |
| Message-ID | <mdtac0$65v$1@dont-email.me> |
| In reply to | #87334 |
On 3/12/2015 2:56 PM, Cameron Simpson wrote:
> On 12Mar2015 12:55, John Nagle <nagle@animats.com> wrote:
>> I have working code from Python 2 which uses "pickle" to talk to a
>> subprocess via stdin/stdout. I'm trying to make that work in Python
>> 3. First, the subprocess Python is invoked with the "-d" option, so
>> stdin and stdout are supposed to be unbuffered binary streams.
>
> You shouldn't need to use unbuffered streams specifically. It should
> be enough to .flush() the output stream (at whichever end) after you
> have written the pickle data.
Doing that.
It's a repeat-transaction thing. Main process sends pickled
item to subprocess, subprocess reads item, subprocess does work,
subprocess writes pickled item to parent. This repeats.
I call writer.clear_memo() and set reader.memo = {} at the
end of each cycle, to clear Pickle's cache. That all worked
fine in Python 2. Are there any known problems with reusing
Python 3 "pickle"s streams?
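The effect of clear_memo() on a reused Pickler can be seen without a subprocess. A minimal sketch, assuming an in-memory io.BytesIO stands in for the pipe; dump_messages is a hypothetical helper written for this illustration, not part of the code under discussion:

```python
import io
import pickle

def dump_messages(messages, clear_between=True):
    """Pickle each message with one shared Pickler; return each message's bytes."""
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, protocol=2)
    chunks = []
    for msg in messages:
        start = buf.tell()
        pickler.dump(msg)
        chunks.append(buf.getvalue()[start:])
        if clear_between:
            pickler.clear_memo()  # forget objects seen in earlier dumps
    return chunks

shared = ["same", "list", "object"]

# With clear_memo() every pickle is self-contained and loads on its own.
safe = dump_messages([shared, shared], clear_between=True)
decoded = [pickle.loads(chunk) for chunk in safe]

# Without it, the second dump only refers back to the first via the memo.
unsafe = dump_messages([shared, shared], clear_between=False)
print(len(safe[1]), len(unsafe[1]))
```

The shorter second pickle in the uncleared case is only meaningful to an Unpickler that has seen the first one, which is why both ends need to agree on when the memo is reset.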
The identical code works with Python 2.7.9; it's converted to Python
3 using "six" so I can run on both Python versions and look for
differences. I'm using Pickle format 2, for compatibility.
(Tried 0, the ASCII format; it didn't help.)
> I'm skipping some of your discussion; I can see nothing wrong. I
> don't use pickle itself so aside from saying that your use seems to
> conform to the python 3 docs I can't comment more deeply. That said,
> I do use subprocess a fair bit.
I'll have to put in more logging and see exactly what's going
over the pipes.
John Nagle
| From | John Nagle <nagle@animats.com> |
|---|---|
| Date | 2015-03-12 23:05 -0700 |
| Message-ID | <55027E35.60505@animats.com> |
| In reply to | #87342 |
On 3/12/2015 5:18 PM, John Nagle wrote:
> On 3/12/2015 2:56 PM, Cameron Simpson wrote:
>> On 12Mar2015 12:55, John Nagle <nagle@animats.com> wrote:
>>> I have working code from Python 2 which uses "pickle" to talk to a
>>> subprocess via stdin/stdout. I'm trying to make that work in Python
>>> 3.
I'm starting to think that the "cpickle" module, which Python 3
uses by default, has a problem. After the program has been
running for a while, I start seeing errors such as
File "C:\projects\sitetruth\InfoSiteRating.py", line 200, in scansite
if len(self.badbusinessinfo) > 0 : # if bad stuff
NameError: name 'len' is not defined
which ought to be impossible in Python, and
File "C:\projects\sitetruth\subprocesscall.py", line 129, in send
self.writer.dump(args) # send data
OSError: [Errno 22] Invalid argument
from somewhere deep inside CPickle.
I got
File "C:\projects\sitetruth\InfoSiteRating.py", line 223, in
get_rating_text
(ratingsmalliconurl, ratinglargiconurl, ratingalttext) =
DetailsPageBuilder.getratingiconinfo(rating)
NameError: name 'DetailsPageBuilder' is not defined
(That's an imported module. It worked earlier in the run.)
and finally, even after I deleted all .pyc files and all Python
cache directories:
Fatal Python error: GC object already tracked
Current thread 0x00001a14 (most recent call first):
File "C:\python34\lib\site-packages\pymysql\connections.py", line 411
in description
File "C:\python34\lib\site-packages\pymysql\connections.py", line 1248
in _get_descriptions
File "C:\python34\lib\site-packages\pymysql\connections.py", line 1182
in _read_result_packet
File "C:\python34\lib\site-packages\pymysql\connections.py", line 1132
in read
File "C:\python34\lib\site-packages\pymysql\connections.py", line 929
in _read_query_result
File "C:\python34\lib\site-packages\pymysql\connections.py", line 768
in query
File "C:\python34\lib\site-packages\pymysql\cursors.py", line 282 in
_query
File "C:\python34\lib\site-packages\pymysql\cursors.py", line 134 in
execute
File "C:\projects\sitetruth\domaincacheitem.py", line 128 in select
File "C:\projects\sitetruth\domaincache.py", line 30 in search
File "C:\projects\sitetruth\ratesite.py", line 31 in ratedomain
File "C:\projects\sitetruth\RatingProcess.py", line 68 in call
File "C:\projects\sitetruth\subprocesscall.py", line 140 in docall
File "C:\projects\sitetruth\subprocesscall.py", line 158 in run
File "C:\projects\sitetruth\RatingProcess.py", line 89 in main
File "C:\projects\sitetruth\RatingProcess.py", line 95 in <module>
That's a definite memory error.
So something is corrupting memory. Probably CPickle.
All my code is in Python. Every library module came in via "pip", into a
clean Python 3.4.3 (32 bit) installation on Win7/x86-64.
Currently installed packages:
beautifulsoup4 (4.3.2)
dnspython3 (1.12.0)
html5lib (0.999)
pip (6.0.8)
PyMySQL (0.6.6)
pyparsing (2.0.3)
setuptools (12.0.5)
six (1.9.0)
And it works fine with Python 2.7.9.
Is there some way to force the use of the pure Python pickle module?
My guess is that there's something about reusing "pickle" instances
that botches memory uses in CPython 3's C code for "cpickle".
John Nagle
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2015-03-13 19:43 +1100 |
| Message-ID | <5502a31d$0$12997$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #87356 |
John Nagle wrote:
> I'm starting to think that the "cpickle" module, which Python 3
> uses by default, has a problem. After the program has been
> running for a while, I start seeing errors such as
>
> File "C:\projects\sitetruth\InfoSiteRating.py", line 200, in scansite
> if len(self.badbusinessinfo) > 0 : # if bad stuff
> NameError: name 'len' is not defined
>
> which ought to be impossible in Python, and
"Impossible"?
py> len
<built-in function len>
py> import __builtin__ # use builtins in Python 3
py> del __builtin__.len
py> len
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'len' is not defined
Why something is deleting builtins len is a mystery. Sounds to me that your
Python installation is borked.
> File "C:\projects\sitetruth\subprocesscall.py", line 129, in send
> self.writer.dump(args) # send data
> OSError: [Errno 22] Invalid argument
>
> from somewhere deep inside CPickle.
Why do you say "deep inside CPickle"? The traceback says
C:\projects\sitetruth\subprocesscall.py
Is it possible you have accidentally shadowed the CPickle module with
something? What does this say?
import cPickle
print cPickle.__file__
Use _pickle in Python 3.
> I got
>
> File "C:\projects\sitetruth\InfoSiteRating.py", line 223, in
> get_rating_text
> (ratingsmalliconurl, ratinglargiconurl, ratingalttext) =
> DetailsPageBuilder.getratingiconinfo(rating)
> NameError: name 'DetailsPageBuilder' is not defined
> (That's an imported module. It worked earlier in the run.)
>
> and finally, even after I deleted all .pyc files and all Python
> cache directories:
>
> Fatal Python error: GC object already tracked
>
> Current thread 0x00001a14 (most recent call first):
> File "C:\python34\lib\site-packages\pymysql\connections.py", line 411
> in description
> File "C:\python34\lib\site-packages\pymysql\connections.py", line 1248
> in _get_descriptions
> File "C:\python34\lib\site-packages\pymysql\connections.py", line 1182
> in _read_result_packet
> File "C:\python34\lib\site-packages\pymysql\connections.py", line 1132
> in read
> File "C:\python34\lib\site-packages\pymysql\connections.py", line 929
> in _read_query_result
> File "C:\python34\lib\site-packages\pymysql\connections.py", line 768
> in query
> File "C:\python34\lib\site-packages\pymysql\cursors.py", line 282 in
> _query
> File "C:\python34\lib\site-packages\pymysql\cursors.py", line 134 in
> execute
> File "C:\projects\sitetruth\domaincacheitem.py", line 128 in select
> File "C:\projects\sitetruth\domaincache.py", line 30 in search
> File "C:\projects\sitetruth\ratesite.py", line 31 in ratedomain
> File "C:\projects\sitetruth\RatingProcess.py", line 68 in call
> File "C:\projects\sitetruth\subprocesscall.py", line 140 in docall
> File "C:\projects\sitetruth\subprocesscall.py", line 158 in run
> File "C:\projects\sitetruth\RatingProcess.py", line 89 in main
> File "C:\projects\sitetruth\RatingProcess.py", line 95 in <module>
>
> That's a definite memory error.
>
> So something is corrupting memory. Probably CPickle.
> All my code is in Python. Every library module came in via "pip", into a
> clean Python 3.4.3 (32 bit) installation on Win7/x86-64.
> Currently installed packages:
>
> beautifulsoup4 (4.3.2)
> dnspython3 (1.12.0)
> html5lib (0.999)
> pip (6.0.8)
> PyMySQL (0.6.6)
> pyparsing (2.0.3)
> setuptools (12.0.5)
> six (1.9.0)
>
> And it works fine with Python 2.7.9.
>
> Is there some way to force the use of the pure Python pickle module?
Try renaming the _pickle module. This works on Linux:
mv /usr/local/lib/python3.3/lib-dynload/_pickle.cpython-33m.so /usr/local/lib/python3.3/lib-dynload/_pickle.cpython-33m.so~
> My guess is that there's something about reusing "pickle" instances
> that botches memory uses in CPython 3's C code for "cpickle".
How are you reusing instances?
--
Steven
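A less invasive alternative to renaming the extension file, offered as a sketch rather than a recommendation: CPython's pickle.py keeps its pure-Python classes under the private names pickle._Pickler and pickle._Unpickler. They are underscore-prefixed, so relying on them is an assumption that suits a debugging session more than production code. The payload below is made up.

```python
import io
import pickle

# pickle._Pickler and pickle._Unpickler are the pure-Python classes defined in
# Lib/pickle.py; they are private names, so treat this as a debugging aid only.
buf = io.BytesIO()
pickle._Pickler(buf, protocol=2).dump({"rating": 5})  # hypothetical payload
buf.seek(0)
value = pickle._Unpickler(buf).load()

# The pure-Python class reports the "pickle" module as its home.
uses_pure_python = pickle._Pickler.__module__ == "pickle"
print(value, uses_pure_python)
```

If the failures persist with these classes swapped in, the C accelerator is unlikely to be the cause.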
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2015-03-12 22:57 +0100 |
| Message-ID | <mailman.303.1426197442.21433.python-list@python.org> |
| In reply to | #87328 |
John Nagle wrote:
> I have working code from Python 2 which uses "pickle"
> to talk to a subprocess via stdin/stdout. I'm trying to
> make that work in Python 3.
>
> First, the subprocess Python is invoked with the "-d" option,
> so stdin and stdout are supposed to be unbuffered binary streams.
> That was enough in Python 2, but it's not enough in Python 3.
>
> The subprocess and its connections are set up with
>
> proc = subprocess.Popen(launchargs,stdin=subprocess.PIPE,
> stdout=subprocess.PIPE, env=env)
>
> ...
> self.reader = pickle.Unpickler(self.proc.stdout)
> self.writer = pickle.Pickler(self.proc.stdin, 2)
>
> after which I get
>
> result = self.reader.load()
> TypeError: 'str' does not support the buffer interface
>
> That's as far as traceback goes, so I assume this is
> disappearing into C code.
>
> OK, I know I need a byte stream. I tried
>
> self.reader = pickle.Unpickler(self.proc.stdout.buffer)
> self.writer = pickle.Pickler(self.proc.stdin.buffer, 2)
>
> That's not allowed. The "stdin" and "stdout" that are
> fields of "proc" do not have "buffer". So I can't do that
> in the parent process. In the child, though, where
> stdin and stdout come from "sys", "sys.stdin.buffer" is valid.
> That fixes the "'str' does not support the buffer interface"
> error. But now I get the pickle error "Ran out of input"
> on the process child side. Probably because there's a
> str/bytes incompatibility somewhere.
>
> So how do I get clean binary byte streams between parent
> and child process?
I don't know what you have to do to rule out deadlocks when you use pipes
for both stdin and stdout, but binary streams are the default for
subprocess. Can you provide a complete example?
Anyway, here is a demo for two-way communication using the communicate()
method:
$ cat parent.py
import pickle
import subprocess
data = (5, 4.3, "üblich ähnlich nötig")
p = subprocess.Popen(
    ["python3", "child.py"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE)
result = p.communicate(pickle.dumps(data, protocol=2))[0]
print(pickle.loads(result))
$ cat child.py
import sys
import pickle
a, b, c = pickle.load(sys.stdin.buffer)
pickle.dump((a, b, c.upper()), sys.stdout.buffer)
$ python3 parent.py
(5, 4.3, 'ÜBLICH ÄHNLICH NÖTIG')
This is likely not what you want because here everything is buffered so that
continuous interaction is not possible.
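For continuous interaction, a request/reply loop with an explicit flush after every message is one option. The following is a minimal sketch, not the original poster's code: the child source is passed via -c so the example is self-contained, CHILD_SOURCE and run_requests are hypothetical names, and the lock-step exchange assumes messages small enough that neither pipe fills up.

```python
import pickle
import subprocess
import sys

# Hypothetical child: reads pickled requests from stdin until EOF and
# answers each one on stdout, flushing after every reply.
CHILD_SOURCE = r"""
import pickle, sys
reader = pickle.Unpickler(sys.stdin.buffer)
writer = pickle.Pickler(sys.stdout.buffer, 2)
while True:
    try:
        a, b, c = reader.load()
    except EOFError:
        break
    writer.dump((a, b, c.upper()))
    writer.clear_memo()
    sys.stdout.buffer.flush()
"""

def run_requests(requests):
    """Send each request to the child and collect one reply per request."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    writer = pickle.Pickler(proc.stdin, 2)
    reader = pickle.Unpickler(proc.stdout)
    replies = []
    try:
        for request in requests:
            writer.dump(request)
            writer.clear_memo()   # keep every pickle self-contained
            proc.stdin.flush()    # make sure the child actually sees it
            replies.append(reader.load())
    finally:
        proc.stdin.close()        # EOF tells the child to exit
        proc.wait()
        proc.stdout.close()
    return replies

replies = run_requests([(5, 4.3, "abc"), (1, 2.0, "xyz")])
print(replies)
```

In the parent the Popen pipe objects are used directly, while the child has to go through sys.stdin.buffer and sys.stdout.buffer because sys.stdin and sys.stdout are text streams in Python 3.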