

Newsgroup: comp.lang.python, thread #87328

Python3 "pickle" vs. stdin/stdout - unable to get clean byte streams in Python 3

Started by: John Nagle <nagle@animats.com>
First post: 2015-03-12 12:55 -0700
Last post: 2015-03-12 22:57 +0100
Articles: 6, 4 participants



Contents

  Python3 "pickle" vs. stdin/stdout - unable to get clean byte streams in Python 3 John Nagle <nagle@animats.com> - 2015-03-12 12:55 -0700
    Re: Python3 "pickle" vs. stdin/stdout - unable to get clean byte streams in Python 3 Cameron Simpson <cs@zip.com.au> - 2015-03-13 08:56 +1100
      Re: Python3 "pickle" vs. stdin/stdout - unable to get clean byte streams in Python 3 John Nagle <nagle@animats.com> - 2015-03-12 17:18 -0700
        Re: Python3 "pickle" vs. stdin/stdout - unable to get clean byte streams in Python 3 John Nagle <nagle@animats.com> - 2015-03-12 23:05 -0700
          Re: Python3 "pickle" vs. stdin/stdout - unable to get clean byte streams in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-03-13 19:43 +1100
    Re: Python3 "pickle" vs. stdin/stdout - unable to get clean byte streams in Python 3 Peter Otten <__peter__@web.de> - 2015-03-12 22:57 +0100

#87328 — Python3 "pickle" vs. stdin/stdout - unable to get clean byte streams in Python 3

From: John Nagle <nagle@animats.com>
Date: 2015-03-12 12:55 -0700
Subject: Python3 "pickle" vs. stdin/stdout - unable to get clean byte streams in Python 3
Message-ID: <mdsqtq$7uu$1@dont-email.me>
  I have working code from Python 2 which uses "pickle"
to talk to a subprocess via stdin/stdout.  I'm trying to
make that work in Python 3.

  First, the subprocess Python is invoked with the "-d" option,
so stdin and stdout are supposed to be unbuffered binary streams.
That was enough in Python 2, but it's not enough in Python 3.

The subprocess and its connections are set up with

  proc = subprocess.Popen(launchargs,stdin=subprocess.PIPE,
    stdout=subprocess.PIPE, env=env)

  ...
  self.reader = pickle.Unpickler(self.proc.stdout)
  self.writer = pickle.Pickler(self.proc.stdin, 2)

after which I get

  result = self.reader.load()
TypeError: 'str' does not support the buffer interface

That's as far as traceback goes, so I assume this is
disappearing into C code.

OK, I know I need a byte stream.  I tried

  self.reader = pickle.Unpickler(self.proc.stdout.buffer)
  self.writer = pickle.Pickler(self.proc.stdin.buffer, 2)

That's not allowed.  The "stdin" and "stdout" that are
fields of "proc" do not have "buffer".  So I can't do that
in the parent process.  In the child, though, where
stdin and stdout come from "sys", "sys.stdin.buffer" is valid.
That fixes the "'str' does not support the buffer interface"
error.  But now I get the pickle error "Ran out of input"
on the process child side.  Probably because there's a
str/bytes incompatibility somewhere.

So how do I get clean binary byte streams between parent
and child process?

			John Nagle



#87334

From: Cameron Simpson <cs@zip.com.au>
Date: 2015-03-13 08:56 +1100
Message-ID: <mailman.302.1426197396.21433.python-list@python.org>
In reply to: #87328
On 12Mar2015 12:55, John Nagle <nagle@animats.com> wrote:
>  I have working code from Python 2 which uses "pickle"
>to talk to a subprocess via stdin/stdout.  I'm trying to
>make that work in Python 3.
>  First, the subprocess Python is invoked with the "-d" option,
>so stdin and stdout are supposed to be unbuffered binary streams.

You shouldn't need to use unbuffered streams specifically. It should be enough to 
.flush() the output stream (at whichever end) after you have written the pickle 
data.
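A minimal sketch of the dump-then-flush pattern described above. To keep it self-contained, an os.pipe() pair inside one process stands in for the parent/child pipes (an assumption of the sketch; with subprocess the two ends would be proc.stdin and proc.stdout):

```python
import os
import pickle

read_fd, write_fd = os.pipe()   # stand-in for the parent/child pipe
with open(write_fd, "wb") as out_stream, open(read_fd, "rb") as in_stream:
    writer = pickle.Pickler(out_stream, 2)
    reader = pickle.Unpickler(in_stream)

    writer.dump(("request", 1))
    # The pickle is still sitting in out_stream's buffer; without this flush,
    # reader.load() would block waiting for bytes that were never sent.
    out_stream.flush()
    reply = reader.load()

print(reply)
```

No unbuffered mode is involved: the explicit flush() is what moves the bytes into the pipe.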

I'm skipping some of your discussion; I can see nothing wrong. I don't use 
pickle itself so aside from saying that your use seems to conform to the python 
3 docs I can't comment more deeply. That said, I do use subprocess a fair bit.

[...]
>  result = self.reader.load()
>TypeError: 'str' does not support the buffer interface
>That's as far as traceback goes, so I assume this is
>disappearing into C code.

No line numbers at all? Or, I suppose, just the line number from your program 
and nothing from the pickle module?

>OK, I know I need a byte stream.  I tried
>  self.reader = pickle.Unpickler(self.proc.stdout.buffer)
>  self.writer = pickle.Pickler(self.proc.stdin.buffer, 2)

You should not need to care about these. They're not required.

>That's not allowed.  The "stdin" and "stdout" that are
>fields of "proc" do not have "buffer".  So I can't do that
>in the parent process.  In the child, though, where
>stdin and stdout come from "sys", "sys.stdin.buffer" is valid.

But irrelevant. Besides, the stream buffer may not contain the whole pickle 
data anyway; it will be empty before a read and quite possibly incomplete 
afterwards. It is just a buffer.

>That fixes the "'str' does not support the buffer interface"
>error.

I'm not sure "fix" is the right characterisation here.

>But now I get the pickle error "Ran out of input"
>on the process child side.  Probably because there's a
>str/bytes incompatibility somewhere.

No, probably because the buffer is only ever a snapshot of part of the stream.

str/bytes errors are usually far more glaringly obvious.

>So how do I get clean binary byte streams between parent
>and child process?

This is where I'm confused: my experience is that subprocess.Popen gives you 
binary streams; I always need to put an encoder/decoder on them to use text.  
Did that just the other day.

BTW, this is on some UNIX variant? Should not be very relevant...

Further questions:

What does self.proc.stdout.__class__ say? And for stdin?
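A sketch of that check, using a hypothetical child that simply exits. In recent CPython 3 releases, with Popen's default buffering and no text-mode arguments, the parent's ends of the pipes should be io.BufferedWriter and io.BufferedReader objects: already binary, which would also explain why they have no .buffer attribute.

```python
import io
import subprocess
import sys

# Hypothetical child that exits immediately; only the pipe objects matter here.
proc = subprocess.Popen([sys.executable, "-c", "pass"],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE)
print(proc.stdin.__class__, proc.stdout.__class__)

# With the default bufsize and no text-mode options, both ends are byte streams.
stdin_is_binary = isinstance(proc.stdin, io.BufferedWriter)
stdout_is_binary = isinstance(proc.stdout, io.BufferedReader)
print(stdin_is_binary, stdout_is_binary)

proc.communicate()  # close our ends of the pipes and wait for the child
```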

Cheers,
Cameron Simpson <cs@zip.com.au>

My opinions are borrowed from someone who no longer needs them.
        -- KatmanDu@uga.cc.uga.edu



#87342

From: John Nagle <nagle@animats.com>
Date: 2015-03-12 17:18 -0700
Message-ID: <mdtac0$65v$1@dont-email.me>
In reply to: #87334
On 3/12/2015 2:56 PM, Cameron Simpson wrote:
> On 12Mar2015 12:55, John Nagle <nagle@animats.com> wrote:
>> I have working code from Python 2 which uses "pickle" to talk to a
>> subprocess via stdin/stdout.  I'm trying to make that work in Python
>> 3. First, the subprocess Python is invoked with the "-d" option, so
>> stdin and stdout are supposed to be unbuffered binary streams.
> 
> You shouldn't need to use unbuffered streams specifically. It should
> be enough to .flush() the output stream (at whichever end) after you
> have written the pickle data.

    Doing that.

    It's a repeat-transaction thing.  Main process sends pickled
item to subprocess, subprocess reads item, subprocess does work,
subprocess writes pickled item to parent.  This repeats.

    I call writer.clear_memo() and set reader.memo = {} at the
end of each cycle, to clear Pickle's cache.  That all worked
fine in Python 2.  Are there any known problems with reusing
Python 3 "pickle" streams?
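A sketch of such a repeat-transaction cycle, again with an os.pipe() pair standing in for the real pipes. It keeps one long-lived writer and calls clear_memo() after each transaction, as described above; on the reading side it swaps in a fresh unpickler per message via pickle.load() instead of assigning reader.memo = {}, so each message starts with an empty memo. The payload is hypothetical.

```python
import os
import pickle

shared = ["same", "list", "object"]   # hypothetical object sent in every message

read_fd, write_fd = os.pipe()          # stands in for the parent/child pipe
with open(write_fd, "wb") as out_stream, open(read_fd, "rb") as in_stream:
    writer = pickle.Pickler(out_stream, 2)   # one long-lived writer
    received = []
    for cycle in range(3):
        writer.dump({"cycle": cycle, "payload": shared})
        out_stream.flush()
        # A fresh unpickler per message starts with an empty memo.
        received.append(pickle.load(in_stream))
        # Forget objects pickled so far, so the next message does not
        # refer back to memo entries from this one.
        writer.clear_memo()

print([message["cycle"] for message in received])
```

Without the clear_memo() call, the second message would refer to `shared` by a memo index that only the previous unpickler knew about, and loading it with a fresh unpickler would fail.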

    The identical code works with Python 2.7.9; it's converted to Python
3 using "six" so I can run on both Python versions and look for
differences.  I'm using Pickle format 2, for compatibility.
(Tried 0, the ASCII format; it didn't help.)
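That protocol 0 made no difference is consistent with Python 3's behaviour: every protocol, including the "ASCII" one, produces bytes, so the stream must be binary either way. A short sketch (the payload is hypothetical):

```python
import io
import pickle

item = {"site": "example.com", "rating": 3}   # hypothetical payload

# Every protocol, including the "ASCII" protocol 0, yields bytes in Python 3.
kinds = {protocol: type(pickle.dumps(item, protocol=protocol)).__name__
         for protocol in (0, 2)}
print(kinds)

# A text stream therefore cannot receive pickle output, whatever the protocol.
try:
    pickle.Pickler(io.StringIO(), 0).dump(item)
    text_stream_ok = True
except TypeError:
    text_stream_ok = False
print("text stream accepted the pickle:", text_stream_ok)
```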

> I'm skipping some of your discussion; I can see nothing wrong. I
> don't use pickle itself so aside from saying that your use seems to
> conform to the python 3 docs I can't comment more deeply. That said,
> I do use subprocess a fair bit.

     I'll have to put in more logging and see exactly what's going
over the pipes.

				John Nagle



#87356

From: John Nagle <nagle@animats.com>
Date: 2015-03-12 23:05 -0700
Message-ID: <55027E35.60505@animats.com>
In reply to: #87342
On 3/12/2015 5:18 PM, John Nagle wrote:
> On 3/12/2015 2:56 PM, Cameron Simpson wrote:
>> On 12Mar2015 12:55, John Nagle <nagle@animats.com> wrote:
>>> I have working code from Python 2 which uses "pickle" to talk to a
>>> subprocess via stdin/stdout.  I'm trying to make that work in Python
>>> 3.

   I'm starting to think that the "cpickle" module, which Python 3
uses by default, has a problem. After the program has been
running for a while, I start seeing errors such as

  File "C:\projects\sitetruth\InfoSiteRating.py", line 200, in scansite
    if len(self.badbusinessinfo) > 0 :                  # if bad stuff
NameError: name 'len' is not defined

which ought to be impossible in Python, and

  File "C:\projects\sitetruth\subprocesscall.py", line 129, in send
    self.writer.dump(args)                          # send data
OSError: [Errno 22] Invalid argument

from somewhere deep inside CPickle.

I got

  File "C:\projects\sitetruth\InfoSiteRating.py", line 223, in get_rating_text
    (ratingsmalliconurl, ratinglargiconurl, ratingalttext) = DetailsPageBuilder.getratingiconinfo(rating)
NameError: name 'DetailsPageBuilder' is not defined
(That's an imported module.  It worked earlier in the run.)

and finally, even after I deleted all .pyc files and all Python
cache directories: 	

Fatal Python error: GC object already tracked

Current thread 0x00001a14 (most recent call first):
  File "C:\python34\lib\site-packages\pymysql\connections.py", line 411 in description
  File "C:\python34\lib\site-packages\pymysql\connections.py", line 1248 in _get_descriptions
  File "C:\python34\lib\site-packages\pymysql\connections.py", line 1182 in _read_result_packet
  File "C:\python34\lib\site-packages\pymysql\connections.py", line 1132 in read
  File "C:\python34\lib\site-packages\pymysql\connections.py", line 929 in _read_query_result
  File "C:\python34\lib\site-packages\pymysql\connections.py", line 768 in query
  File "C:\python34\lib\site-packages\pymysql\cursors.py", line 282 in _query
  File "C:\python34\lib\site-packages\pymysql\cursors.py", line 134 in execute
  File "C:\projects\sitetruth\domaincacheitem.py", line 128 in select
  File "C:\projects\sitetruth\domaincache.py", line 30 in search
  File "C:\projects\sitetruth\ratesite.py", line 31 in ratedomain
  File "C:\projects\sitetruth\RatingProcess.py", line 68 in call
  File "C:\projects\sitetruth\subprocesscall.py", line 140 in docall
  File "C:\projects\sitetruth\subprocesscall.py", line 158 in run
  File "C:\projects\sitetruth\RatingProcess.py", line 89 in main
  File "C:\projects\sitetruth\RatingProcess.py", line 95 in <module>

That's a definite memory error.

So something is corrupting memory.  Probably CPickle.

All my code is in Python. Every library module came in via "pip", into a
clean Python 3.4.3 (32 bit) installation on Win7/x86-64.
Currently installed packages:

beautifulsoup4 (4.3.2)
dnspython3 (1.12.0)
html5lib (0.999)
pip (6.0.8)
PyMySQL (0.6.6)
pyparsing (2.0.3)
setuptools (12.0.5)
six (1.9.0)

And it works fine with Python 2.7.9.

Is there some way to force the use of the pure Python pickle module?
My guess is that there's something about reusing "pickle" instances
that botches memory use in CPython 3's C code for "cpickle".

				John Nagle	



#87361

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2015-03-13 19:43 +1100
Message-ID: <5502a31d$0$12997$c3e8da3$5496439d@news.astraweb.com>
In reply to: #87356
John Nagle wrote:

>    I'm starting to think that the "cpickle" module, which Python 3
> uses by default, has a problem. After the program has been
> running for a while, I start seeing errors such as
> 
>   File "C:\projects\sitetruth\InfoSiteRating.py", line 200, in scansite
>     if len(self.badbusinessinfo) > 0 :                  # if bad stuff
> NameError: name 'len' is not defined
> 
> which ought to be impossible in Python, and

"Impossible"?

py> len
<built-in function len>
py> import __builtin__  # use builtins in Python 3
py> del __builtin__.len
py> len
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'len' is not defined
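The transcript above is Python 2. A sketch of the same demonstration in Python 3, using the builtins module the comment mentions and putting len back afterwards so the interpreter is left intact:

```python
import builtins

saved_len = builtins.len        # keep a reference so it can be restored
del builtins.len                # simulate whatever removed the builtin
try:
    len("abc")                  # lookup falls through globals to builtins and fails
    message = "len still found"
except NameError as exc:
    message = str(exc)
finally:
    builtins.len = saved_len    # put the builtin back
print(message)
```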


Why something would be deleting the builtin len is a mystery. It sounds to me
as if your Python installation is borked.


>   File "C:\projects\sitetruth\subprocesscall.py", line 129, in send
>     self.writer.dump(args)                          # send data
> OSError: [Errno 22] Invalid argument
> 
> from somewhere deep inside CPickle.

Why do you say "deep inside CPickle"? The traceback says 
C:\projects\sitetruth\subprocesscall.py

Is it possible you have accidentally shadowed the cPickle module with
something? What does this say?

import cPickle
print cPickle.__file__

Use _pickle in Python 3.
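A Python 3 counterpart of that check, as a sketch: it reports which module actually supplies pickle.Pickler and where pickle itself was loaded from, so a stray pickle.py earlier on sys.path would show up. On some builds _pickle is compiled into the interpreter and has no __file__, so the sketch checks sys.builtin_module_names first.

```python
import pickle
import sys

# "_pickle" means the C accelerator is in use; "pickle" means the pure-Python fallback.
impl_module = pickle.Pickler.__module__
print("pickle.Pickler is defined in:", impl_module)

if impl_module == "_pickle":
    import _pickle
    if "_pickle" in sys.builtin_module_names:
        location = "built into the interpreter"
    else:
        location = getattr(_pickle, "__file__", "unknown")
    print("_pickle loaded from:", location)

print("pickle loaded from:", pickle.__file__)
```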


> I got
> 
>   File "C:\projects\sitetruth\InfoSiteRating.py", line 223, in get_rating_text
>     (ratingsmalliconurl, ratinglargiconurl, ratingalttext) = DetailsPageBuilder.getratingiconinfo(rating)
> NameError: name 'DetailsPageBuilder' is not defined
> (That's an imported module.  It worked earlier in the run.)
> 
> and finally, even after I deleted all .pyc files and all Python
> cache directories:
> 
> Fatal Python error: GC object already tracked
> 
> Current thread 0x00001a14 (most recent call first):
>   File "C:\python34\lib\site-packages\pymysql\connections.py", line 411 in description
>   File "C:\python34\lib\site-packages\pymysql\connections.py", line 1248 in _get_descriptions
>   File "C:\python34\lib\site-packages\pymysql\connections.py", line 1182 in _read_result_packet
>   File "C:\python34\lib\site-packages\pymysql\connections.py", line 1132 in read
>   File "C:\python34\lib\site-packages\pymysql\connections.py", line 929 in _read_query_result
>   File "C:\python34\lib\site-packages\pymysql\connections.py", line 768 in query
>   File "C:\python34\lib\site-packages\pymysql\cursors.py", line 282 in _query
>   File "C:\python34\lib\site-packages\pymysql\cursors.py", line 134 in execute
>   File "C:\projects\sitetruth\domaincacheitem.py", line 128 in select
>   File "C:\projects\sitetruth\domaincache.py", line 30 in search
>   File "C:\projects\sitetruth\ratesite.py", line 31 in ratedomain
>   File "C:\projects\sitetruth\RatingProcess.py", line 68 in call
>   File "C:\projects\sitetruth\subprocesscall.py", line 140 in docall
>   File "C:\projects\sitetruth\subprocesscall.py", line 158 in run
>   File "C:\projects\sitetruth\RatingProcess.py", line 89 in main
>   File "C:\projects\sitetruth\RatingProcess.py", line 95 in <module>
> 
> That's a definite memory error.
>
> So something is corrupting memory.  Probably CPickle.


> All my code is in Python. Every library module came in via "pip", into a
> clean Python 3.4.3 (32 bit) installation on Win7/x86-64.
> Currently installed packages:
> 
> beautifulsoup4 (4.3.2)
> dnspython3 (1.12.0)
> html5lib (0.999)
> pip (6.0.8)
> PyMySQL (0.6.6)
> pyparsing (2.0.3)
> setuptools (12.0.5)
> six (1.9.0)
> 
> And it works fine with Python 2.7.9.
> 
> Is there some way to force the use of the pure Python pickle module?

Try renaming the _pickle module. This works on Linux:

mv /usr/local/lib/python3.3/lib-dynload/_pickle.cpython-33m.so /usr/local/lib/python3.3/lib-dynload/_pickle.cpython-33m.so~
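A route that avoids renaming files in the installation, as a sketch: CPython 3's pickle.py appears to keep its pure-Python classes under the private names _Pickler and _Unpickler. These are undocumented implementation details (an assumption worth re-checking on the version in use), but using them directly would look like this:

```python
import io
import pickle

# ASSUMPTION: _Pickler/_Unpickler are private, undocumented names for the
# pure-Python implementation in CPython 3's pickle.py; they may change.
PurePickler = pickle._Pickler
PureUnpickler = pickle._Unpickler

record = {"site": "example.com", "rating": 3}   # hypothetical payload

stream = io.BytesIO()
PurePickler(stream, 2).dump(record)   # same (file, protocol) arguments as pickle.Pickler
stream.seek(0)
restored = PureUnpickler(stream).load()
print(restored == record)
```

Substituting these for pickle.Pickler and pickle.Unpickler in the reader/writer setup would take the C accelerator out of the picture for testing, at some cost in speed.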


> My guess is that there's something about reusing "pickle" instances
> that botches memory use in CPython 3's C code for "cpickle".

How are you reusing instances?


-- 
Steven



#87335

From: Peter Otten <__peter__@web.de>
Date: 2015-03-12 22:57 +0100
Message-ID: <mailman.303.1426197442.21433.python-list@python.org>
In reply to: #87328
John Nagle wrote:

>   I have working code from Python 2 which uses "pickle"
> to talk to a subprocess via stdin/stdout.  I'm trying to
> make that work in Python 3.
> 
>   First, the subprocess Python is invoked with the "-d" option,
> so stdin and stdout are supposed to be unbuffered binary streams.
> That was enough in Python 2, but it's not enough in Python 3.
> 
> The subprocess and its connections are set up with
> 
>   proc = subprocess.Popen(launchargs,stdin=subprocess.PIPE,
>     stdout=subprocess.PIPE, env=env)
> 
>   ...
>   self.reader = pickle.Unpickler(self.proc.stdout)
>   self.writer = pickle.Pickler(self.proc.stdin, 2)
> 
> after which I get
> 
>   result = self.reader.load()
> TypeError: 'str' does not support the buffer interface
> 
> That's as far as traceback goes, so I assume this is
> disappearing into C code.
> 
> OK, I know I need a byte stream.  I tried
> 
>   self.reader = pickle.Unpickler(self.proc.stdout.buffer)
>   self.writer = pickle.Pickler(self.proc.stdin.buffer, 2)
> 
> That's not allowed.  The "stdin" and "stdout" that are
> fields of "proc" do not have "buffer".  So I can't do that
> in the parent process.  In the child, though, where
> stdin and stdout come from "sys", "sys.stdin.buffer" is valid.
> That fixes the "'str' does not support the buffer interface"
> error.  But now I get the pickle error "Ran out of input"
> on the process child side.  Probably because there's a
> str/bytes incompatibility somewhere.
> 
> So how do I get clean binary byte streams between parent
> and child process?

I don't know what you have to do to rule out deadlocks when you use pipes 
for both stdin and stdout, but binary streams are the default for 
subprocess. Can you provide a complete example?

Anyway, here is a demo for two-way communication using the communicate() 
method:

$ cat parent.py
import pickle
import subprocess

data = (5, 4.3, "üblich ähnlich nötig")

p = subprocess.Popen(
    ["python3", "child.py"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE)

result = p.communicate(pickle.dumps(data, protocol=2))[0]
print(pickle.loads(result))

$ cat child.py
import sys
import pickle

a, b, c = pickle.load(sys.stdin.buffer)
pickle.dump((a, b, c.upper()), sys.stdout.buffer)

$ python3 parent.py 
(5, 4.3, 'ÜBLICH ÄHNLICH NÖTIG')

This is likely not what you want because here everything is buffered so that 
continuous interaction is not possible.
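For the continuous back-and-forth that communicate() cannot do, one possible sketch (not from the thread; the child program and word list are hypothetical) keeps the child alive, flushes after every pickle on both sides, and stays in strict request/reply lockstep so neither side waits on bytes the other has not sent:

```python
import pickle
import subprocess
import sys

# Hypothetical child program, passed with -c so the sketch is one file.
# It serves requests until the parent closes its stdin.
CHILD_SOURCE = r'''
import pickle
import sys

reader = pickle.Unpickler(sys.stdin.buffer)    # binary layer under sys.stdin
writer = pickle.Pickler(sys.stdout.buffer, 2)  # binary layer under sys.stdout
while True:
    try:
        item = reader.load()
    except EOFError:           # parent closed the pipe ("Ran out of input")
        break
    writer.dump(item.upper())
    sys.stdout.buffer.flush()  # send the reply now, not at exit
    writer.clear_memo()
'''

proc = subprocess.Popen([sys.executable, "-c", CHILD_SOURCE],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE)
reader = pickle.Unpickler(proc.stdout)   # proc.stdout is already binary
writer = pickle.Pickler(proc.stdin, 2)   # proc.stdin is already binary

results = []
for word in ["üblich", "ähnlich", "nötig"]:
    writer.dump(word)
    proc.stdin.flush()         # the child blocks until these bytes arrive
    writer.clear_memo()
    results.append(reader.load())

proc.stdin.close()             # child sees EOF and leaves its loop
proc.wait()
print(results)
```

Closing proc.stdin is what ends the child: its next reader.load() raises EOFError, which the loop treats as the end of the conversation.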




