Groups > comp.lang.python > #72340 > unrolled thread

Python 3.2 has some deadly infection

Started by: Mark Lawrence <breamoreboy@yahoo.co.uk>
First post: 2014-05-31 17:10 +0100
Last post: 2014-06-03 14:22 -0400
Articles 20 on this page of 92 — 19 participants



Contents

  Python 3.2 has some deadly infection Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-31 17:10 +0100
    Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-05-31 22:55 +0300
    Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-01 02:26 +0000
      Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-01 12:43 +1000
      Re: Python 3.2 has some deadly infection Tim Delaney <timothy.c.delaney@gmail.com> - 2014-06-02 08:54 +1000
        Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-02 01:14 +0000
          Re: Python 3.2 has some deadly infection Tim Delaney <timothy.c.delaney@gmail.com> - 2014-06-02 12:23 +1000
            Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-01 19:46 -0700
          Re: Python 3.2 has some deadly infection Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> - 2014-06-02 07:45 +0000
          Re: Python 3.2 has some deadly infection Tim Delaney <timothy.c.delaney@gmail.com> - 2014-06-02 19:02 +1000
          Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-02 19:14 +1000
          Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-02 12:10 +0100
            Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-03 16:34 +0000
              Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-04 02:43 +1000
          Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-02 17:34 -0400
            Re: Python 3.2 has some deadly infection Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-06-03 17:16 +1200
              Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-03 02:21 -0400
              Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-03 15:18 +0100
                Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-04 13:08 +0000
                  Re: Python 3.2 has some deadly infection Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-06-05 14:01 +1200
                    Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 10:16 +0300
                      Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-05 17:30 +1000
                        Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 11:05 +0300
                          Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-05 18:36 +1000
                            Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 12:53 +0300
                              Re: Python 3.2 has some deadly infection wxjmfauth@gmail.com - 2014-06-05 05:43 -0700
                              Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-05 14:50 -0400
                                Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 23:21 +0300
                                  Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-05 18:09 -0400
                                  Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 23:13 +0000
                                    Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 02:30 +0300
                                      Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 09:39 +1000
                                      Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-05 22:08 -0400
                                      Re: Python 3.2 has some deadly infection Ethan Furman <ethan@stoneleaf.us> - 2014-06-05 20:47 -0700
                    Re: Python 3.2 has some deadly infection Steven D'Aprano <steve@pearwood.info> - 2014-06-05 08:34 +0000
                      Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 12:41 +0300
                        Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 06:37 -0700
                          Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 17:45 +0300
                            Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 15:33 +0000
                              Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 02:12 +1000
                                Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 09:54 -0700
                                  Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 03:36 +1000
                              Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 19:52 +0300
                                Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 03:28 +1000
                                  Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 15:35 -0700
                                    Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 08:52 +1000
                                      Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 20:11 -0700
                                        Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 13:20 +1000
                                          Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 20:32 -0700
                                Re: Python 3.2 has some deadly infection Akira Li <4kir4.1i@gmail.com> - 2014-06-06 12:03 +0400
                            Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-05 16:37 +0100
                              Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 16:16 +0000
                            Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 01:50 +1000
                            Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-05 17:17 +0100
                              Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 16:32 +0000
                                Re: Python 3.2 has some deadly infection Ethan Furman <ethan@stoneleaf.us> - 2014-06-06 07:40 -0700
                            Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 03:14 +1000
                            Re: Python 3.2 has some deadly infection Ian Kelly <ian.g.kelly@gmail.com> - 2014-06-05 11:16 -0600
                            Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-05 14:11 -0400
                              Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 21:30 +0300
                                Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 23:02 +0000
                                  Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 02:21 +0300
                                    Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-06 12:15 +0000
                                      Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 16:00 +0300
                                  Re: Python 3.2 has some deadly infection rurpy@yahoo.com - 2014-06-07 21:34 -0700
                                Re: Python 3.2 has some deadly infection Ethan Furman <ethan@stoneleaf.us> - 2014-06-06 06:24 -0700
                                  Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 17:10 +0300
                                    Re: Python 3.2 has some deadly infection Michael Torrie <torriem@gmail.com> - 2014-06-06 09:02 -0600
                                      Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 18:32 +0300
                                        Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 01:50 +1000
                                          Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 20:02 +0300
                                            Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-06 10:13 -0700
                                              Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 03:26 +1000
                                          Re: Python 3.2 has some deadly infection wxjmfauth@gmail.com - 2014-06-06 11:03 -0700
                                          Re: Python 3.2 has some deadly infection Denis McMahon <denismfmcmahon@gmail.com> - 2014-06-06 21:18 +0000
                                            Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 08:18 +1000
                                        Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-06 15:57 +0000
                                          Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-06 09:21 -0700
                                            Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 02:48 +1000
                                              Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-06 10:04 -0700
                                                Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 03:12 +1000
                                          Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 20:11 +0300
                                            Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 03:16 +1000
                                            Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 20:18 +0300
                                            Re: Python 3.2 has some deadly infection Ned Batchelder <ned@nedbatchelder.com> - 2014-06-06 13:33 -0400
                                Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 01:25 +1000
                                  Re: Python 3.2 has some deadly infection wxjmfauth@gmail.com - 2014-06-06 08:44 -0700
                                    Re: Python 3.2 has some deadly infection wxjmfauth@gmail.com - 2014-06-06 08:48 -0700
                            Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-06 12:56 +0100
                  Re: Python 3.2 has some deadly infection Akira Li <4kir4.1i@gmail.com> - 2014-06-05 06:49 +0400
              Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-04 00:25 +1000
              Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-03 14:22 -0400


#72684

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2014-06-05 10:16 +0300
Message-ID: <87a99r7rmx.fsf@elektro.pacujo.net>
In reply to: #72665

Gregory Ewing <greg.ewing@canterbury.ac.nz>:

> As a result, most unix programs, most of the time, deal
> with text on stdin and stdout.

Well, ok. But even accepting that premise, that "text" might not be what
Python3 considers "text".

For example, if your program reads in XML, JSON or Python, the parser
object might prefer to take it in as bytes and not have it predecoded by
sys.stdin.
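
A minimal sketch of that case, assuming Python 3.6 or later (where json.loads accepts bytes) and a hypothetical helper named parse_payload. The parser receives the raw bytes and works out the encoding itself, so having sys.stdin decode them first gains nothing:

```python
import json
import xml.etree.ElementTree as ET


def parse_payload(raw, kind):
    """Parse raw bytes without decoding them to str first (hypothetical helper)."""
    if kind == "json":
        # json.loads detects UTF-8, UTF-16 or UTF-32 from the bytes (Python 3.6+).
        return json.loads(raw)
    if kind == "xml":
        # ElementTree honours the encoding named in the XML declaration.
        return ET.fromstring(raw)
    raise ValueError("unknown kind: " + kind)


# In a real filter the bytes would come from sys.stdin.buffer.read().
doc = parse_payload('{"name": "Zoë"}'.encode("utf-16"), "json")
root = parse_payload(
    b'<?xml version="1.0" encoding="iso-8859-1"?><a>caf\xe9</a>', "xml"
)
```

Had the stream been predecoded with the wrong encoding, neither payload would have survived intact.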

> So, it makes sense for them to be text by default.

I'm not sure. That could lead to nasty surprises.

I've experienced analogous consternations when the "sort" utility hasn't
worked identically for identical input: it is heavily influenced by the
(spit, spit) locale. That's why 99.9% of your scripts should prefix
"sort" and "grep" with LC_ALL=C -- even when the input really is UTF-8.
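
For comparison, a small Python sketch of the ordering that LC_ALL=C selects: plain byte-value comparison, which gives the same result on every machine. The word list is made up for illustration; a locale-aware sort would use locale.strxfrm as the key instead, and its result depends on the locales installed.

```python
words = ["b", "A", "a", "B", "é"]

# Compare the UTF-8 bytes of each word, as sort does under LC_ALL=C.
byte_order = sorted(words, key=lambda w: w.encode("utf-8"))
```

All upper-case ASCII letters sort before all lower-case ones, and non-ASCII characters come last.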

Should I now take it further and prefix all Python programs with
LC_ALL=C? Probably not, since UTF-8 might cause sys.stdin to barf.

> And wherever there's text, there needs to be an encoding.

No problem there, only should sys.stdin and sys.stdout carry the
decoding/encoding out or should it be left for the program.


Marko


#72687

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-06-05 17:30 +1000
Message-ID: <mailman.10727.1401953433.18130.python-list@python.org>
In reply to: #72684

On Thu, Jun 5, 2014 at 5:16 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
> No problem there, only should sys.stdin and sys.stdout carry the
> decoding/encoding out or should it be left for the program.

The most normal thing to do with the standard streams is to have them
produce text, and as much as possible, you shouldn't have to go to
great lengths to make that work. If, in Python, I say print("Hello,
world!"), I expect that to produce a line of text on the screen,
without my code having to encode that to bytes, figure out what sort
of newline to add, etc, etc.
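
A rough sketch of the work the text layer does on the caller's behalf. The io.BytesIO object stands in for the real output file descriptor, and the encoding and newline convention are picked arbitrarily here; the real sys.stdout chooses them from the environment.

```python
import io

# Stand-in for the process's stdout: a byte sink wrapped in a text layer.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="utf-8", newline="\r\n")

print("Hello, wörld!", file=out)  # the caller only ever handles str
out.flush()

# The wrapper encoded the text and translated "\n" to the chosen newline.
encoded = raw.getvalue()
```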

Even if stdout isn't a tty, chances are you're still working with
text. Only an extreme few Unix programs actually manipulate binary
standard streams (some, like cat, will pipe binary through unchanged,
but even cat assumes text for options like -n); those few should be
the ones to have to worry about setting stdin and stdout to be binary.
In the same way that we have double-quoted strings being Unicode
strings, we should have print() and input() "naturally just work" with
Unicode, which means they should negotiate encodings with the system
without the programmer having to lift a finger.

ChrisA


#72689

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2014-06-05 11:05 +0300
Message-ID: <871tv37pdt.fsf@elektro.pacujo.net>
In reply to: #72687

Chris Angelico <rosuav@gmail.com>:

> If, in Python, I say print("Hello, world!"), I expect that to produce
> a line of text on the screen, without my code having to encode that to
> bytes, figure out what sort of newline to add, etc, etc.

That example in no way represents the typical Python program (if there
is one).

> Only an extreme few Unix programs actually manipulate binary standard
> streams

That's quite an assumption to make.

> we should have print() and input() "naturally just work" with Unicode

No problem there. I couldn't imagine using either function for anything
serious.


Marko


#72692

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-06-05 18:36 +1000
Message-ID: <mailman.10730.1401957426.18130.python-list@python.org>
In reply to: #72689

On Thu, Jun 5, 2014 at 6:05 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Chris Angelico <rosuav@gmail.com>:
>
>> If, in Python, I say print("Hello, world!"), I expect that to produce
>> a line of text on the screen, without my code having to encode that to
>> bytes, figure out what sort of newline to add, etc, etc.
>
> That example in no way represents the typical Python program (if there
> is one).

It's simpler than most, but use of print() is certainly quite common.
A naive search of .py files in my /usr came up with five thousand
instances of ' print(', and given that that search won't necessarily
find a Python 2 print statement (and I'm on Debian Wheezy, so Py2 is
the system Python), I think that's a fairly respectable figure.

>> Only an extreme few Unix programs actually manipulate binary standard
>> streams
>
> That's quite an assumption to make.

Okay. Start listing some. You have (de)compression programs like gzip,
which primarily work with files but can work with standard streams;
some image or movie manipulation programs (eg avconv) can also read
from stdin, although again, it's far more common to use files; cat
will happily transmit binary untouched, but all its options (at least
the ones I can see in my 'man cat') are for working with text.

What else do you have? Let's see... grep, sort, less/more, sed, awk,
these are all text manipulation programs. All your "give me info about
the system" programs (ls, mount, pwd, hostname, date.......) print
text to stdout. Some also read from stdin, like md5sum and related.

Piles and piles of programs that work with text. A small handful that
work with binary, and most of them are more commonly used directly
with files, not with pipes. The most common case is that it all be
text.

>> we should have print() and input() "naturally just work" with Unicode
>
> No problem there. I couldn't imagine using either function for anything
> serious.

I don't know about those exact functions, but I do know that there are
plenty of Python programs that use the console (take hg as one fairly
hefty example). Maybe input() isn't all that heavily used, but
certainly print() is a fine function. I can not only imagine using
them seriously, I *have used* them, and their equivalents in other
languages, seriously.

If the standard streams are so crucial, why are their most obvious
interfaces insignificant to you?

ChrisA


#72699

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2014-06-05 12:53 +0300
Message-ID: <87sinju1hf.fsf@elektro.pacujo.net>
In reply to: #72692

Chris Angelico <rosuav@gmail.com>:

> If the standard streams are so crucial, why are their most obvious
> interfaces insignificant to you?

I want the standard streams to consume and produce bytes. I do a lot of
system programming and connect processes to each other with socketpairs,
pipes and the like. I have dealt with plugin APIs that communicate over
stdin and stdout.

Python is clearly on a crusade to make *text* a first class system
entity. I don't believe that is possible (without casualties) in the
linux world. Python text should only exist inside string objects.


Marko


#72701

From: wxjmfauth@gmail.com
Date: 2014-06-05 05:43 -0700
Message-ID: <8637bf51-9909-45d4-a209-48347e533f8a@googlegroups.com>
In reply to: #72699

On Thursday, 5 June 2014 11:53:00 UTC+2, Marko Rauhamaa wrote:
> Chris Angelico <rosuav@gmail.com>:
>
> > If the standard streams are so crucial, why are their most obvious
> > interfaces insignificant to you?
>
> I want the standard streams to consume and produce bytes. I do a lot of
> system programming and connect processes to each other with socketpairs,
> pipes and the like. I have dealt with plugin APIs that communicate over
> stdin and stdout.
>
> Python is clearly on a crusade to make *text* a first class system
> entity. I don't believe that is possible (without casualties) in the
> linux world. Python text should only exist inside string objects.
>
> Marko

=====

Are you sure?

>>> timeit.repeat("(x*1000 + y)", setup="x = 'abc'; y = 'z'")
[0.9457552436453511, 0.9190932610143818, 0.9322044912393039]
>>> timeit.repeat("(x*1000 + y)", setup="x = 'abc'; y = '\u0fce'")
[2.5541921791045183, 2.52434366066052, 2.5337417948967413]
>>> timeit.repeat("(x*1000 + y)", setup="x = 'abc'.encode('utf-8'); y = 'z'.encode('utf-8')")
[0.9168235779232532, 0.8989583403075017, 0.8964204541650247]
>>> timeit.repeat("(x*1000 + y)", setup="x = 'abc'.encode('utf-8'); y = '\u0fce'.encode('utf-8')")
[0.9320969737165115, 0.9086006535332558, 0.9051715140790861]
>>> 
>>> 
>>> sys.getsizeof('abc'*1000 + '\u0fce')
6040
>>> sys.getsizeof(('abc'*1000 + '\u0fce').encode('utf-8'))
3020
>>>

jmf


#72748

From: Terry Reedy <tjreedy@udel.edu>
Date: 2014-06-05 14:50 -0400
Message-ID: <mailman.10757.1401996534.18130.python-list@python.org>
In reply to: #72699

On 6/5/2014 5:53 AM, Marko Rauhamaa wrote:
> Chris Angelico <rosuav@gmail.com>:
>
>> If the standard streams are so crucial, why are their most obvious
>> interfaces insignificant to you?
>
> I want the standard streams to consume and produce bytes.

Easy. Read the manual entry for stdxxx. "To write or read binary data 
from/to the standard streams, use the underlying binary buffer object. 
For example, to write bytes to stdout, use 
sys.stdout.buffer.write(b'abc')". To make it easy, use bound methods.

myfilter.py
----------
import sys
sysin = sys.stdin.buffer.read
sysout = sys.stdout.buffer.write
syserr = sys.stderr.buffer.write

<filter code with calls to sysin, sysout, syserr.>
---

The same trick of defining bound methods to save both writing and 
execution time is also useful for text filters when you use 
sys.stdin.read, etc, more than once in the text.
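
A minimal sketch of such a byte filter, written so that the bound methods are passed in as arguments. The function name upper_filter and the chunk size are made up for illustration; in a script the call would be upper_filter(sys.stdin.buffer.read, sys.stdout.buffer.write).

```python
import io


def upper_filter(read, write, chunk_size=4096):
    """Copy bytes from read() to write(), upper-casing ASCII letters."""
    while True:
        chunk = read(chunk_size)
        if not chunk:  # b'' signals end of input
            break
        write(chunk.upper())


# In-memory streams stand in for the standard streams here.
source = io.BytesIO(b"abc \xff\n")
sink = io.BytesIO()
upper_filter(source.read, sink.write)
```

The \xff byte is not valid UTF-8, yet it passes through untouched because nothing ever tries to decode it.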

When you try this, please report the result, either way.

> I do a lot of system programming and connect processes to each other
> with socketpairs, pipes and the like. I have dealt with plugin APIs
> that communicate over stdin and stdout.

Now you know how to do so on Python 3.

> Python is clearly on a crusade to make *text* a first class system
> entity. I don't believe that is possible (without casualties) in the
> linux world. Python text should only exist inside string objects.

You are clearly on a crusade to push a falsehood. Why?

On Windows and, I believe, Mac, utf-16 encoded text (C widechar type)
*is* a 'first class system entity'. The problem Python has with *nix is
getting text bytes from the system in an unknown or, worse,
wrongly-claimed encoding. The Python developers do their best to cope
with the differences and peculiarities of the systems it runs on.
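
One coping mechanism of this kind is the surrogateescape error handler (PEP 383), which Python 3 applies to file names and command-line arguments on POSIX systems. A short sketch with a made-up file name:

```python
raw = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8

# Undecodable bytes become lone surrogates instead of raising an error.
name = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_trip = name.encode("utf-8", errors="surrogateescape")
```

The resulting string is not printable as-is, but the original bytes are never lost.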

-- 
Terry Jan Reedy


#72757

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2014-06-05 23:21 +0300
Message-ID: <8738fjumy8.fsf@elektro.pacujo.net>
In reply to: #72748

Terry Reedy <tjreedy@udel.edu>:

> On 6/5/2014 5:53 AM, Marko Rauhamaa wrote:
>> Chris Angelico <rosuav@gmail.com>:
>>
>>> If the standard streams are so crucial, why are their most obvious
>>> interfaces insignificant to you?
>>
>> I want the standard streams to consume and produce bytes.
>
> Easy. Read the manual entry for stdxxx. "To write or read binary data
> from/to the standard streams, use the underlying binary buffer object.
> For example, to write bytes to stdout, use
> sys.stdout.buffer.write(b'abc')"

This note from the manual is a bit vague:

   Note that the streams can be replaced with objects (like io.StringIO)
   that do not support the buffer attribute or the detach() method

"Can be replaced" by who? By the Python developers? By me? By random
library calls?

Does it mean the buffer and detach are not guaranteed to stay with the
API?


Marko


#72773

From: Terry Reedy <tjreedy@udel.edu>
Date: 2014-06-05 18:09 -0400
Message-ID: <mailman.10775.1402006179.18130.python-list@python.org>
In reply to: #72757

On 6/5/2014 4:21 PM, Marko Rauhamaa wrote:
> Terry Reedy <tjreedy@udel.edu>:
>
>> On 6/5/2014 5:53 AM, Marko Rauhamaa wrote:
>>> Chris Angelico <rosuav@gmail.com>:
>>>
>>>> If the standard streams are so crucial, why are their most obvious
>>>> interfaces insignificant to you?
>>>
>>> I want the standard streams to consume and produce bytes.
>>
>> Easy. Read the manual entry for stdxxx. "To write or read binary data
>> from/to the standard streams, use the underlying binary buffer object.
>> For example, to write bytes to stdout, use
>> sys.stdout.buffer.write(b'abc')"
>
> This note from the manual is a bit vague:
>
>     Note that the streams can be replaced with objects (like io.StringIO)
>     that do not support the buffer attribute or the detach() method
>
> "Can be replaced" by who? By the Python developers? By me? By random
> library calls?

Fair question. The Python developers will not fiddle with stdxxx for 3rd 
party code on 3rd party systems. We do sometimes *temporarily* replace
the streams with StringIO, either directly or via test.support when 
testing Python itself or stdlib modules. That is done in Lib/test, and 
except for testing StringIO, it is only done as a convenience, not a 
necessity.

To test a binary stream filter, you would have to do something else, 
like read from and write to actual files on disk. Otherwise, you seem 
unlikely to sabotage yourself, even accidentally.

Random non-stdlib library calls could sabotage you. However, in my 
opinion, an imported 3rd party module should never modify std streams, 
with one exception. The exception would be a module whose entire purpose 
was to put the streams in a known state, as documented, and only if 
intentionally asked to.

Having said that, bound methods created (first) should work regardless 
of any subsequent manipulation of sys. Here is an experiment, run from 
an Idle editor.

import sys
sysout = sys.stdout.write
sys.stdout = None
sysout('works anyway\n')
 >>>
works anyway

(Of course, subsequent attempts to continue interactively fail. But that 
is not your use case.)
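
The same behaviour can be checked outside Idle with a sketch like the following, where an io.StringIO stands in for the console so that the result can be inspected and the real stream restored afterwards:

```python
import io
import sys

captured = io.StringIO()
saved = sys.stdout
sys.stdout = captured
try:
    sysout = sys.stdout.write  # bound to the StringIO object itself
    sys.stdout = None          # rebinding the name does not affect sysout
    sysout("works anyway\n")
finally:
    sys.stdout = saved         # restore the real stream
```

The bound method holds a reference to the object it was taken from, not to the name sys.stdout.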

-- 
Terry Jan Reedy


#72787

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2014-06-05 23:13 +0000
Message-ID: <5390f997$0$29978$c3e8da3$5496439d@news.astraweb.com>
In reply to: #72757

On Thu, 05 Jun 2014 23:21:35 +0300, Marko Rauhamaa wrote:

> Terry Reedy <tjreedy@udel.edu>:
> 
>> On 6/5/2014 5:53 AM, Marko Rauhamaa wrote:
>>> Chris Angelico <rosuav@gmail.com>:
>>>
>>>> If the standard streams are so crucial, why are their most obvious
>>>> interfaces insignificant to you?
>>>
>>> I want the standard streams to consume and produce bytes.
>>
>> Easy. Read the manual entry for stdxxx. "To write or read binary data
>> from/to the standard streams, use the underlying binary buffer object.
>> For example, to write bytes to stdout, use
>> sys.stdout.buffer.write(b'abc')"
> 
> This note from the manual is a bit vague:
> 
>    Note that the streams can be replaced with objects (like io.StringIO)
>    that do not support the buffer attribute or the detach() method
> 
> "Can be replaced" by who? By the Python developers? By me? By random
> library calls?

By you. sys.stdout and friends are writable. Any code you call may have 
replaced them with another file-like object, and you should honour that.

The API could have/should have been a little more friendly, but it's 
conceptually simple:

* Does sys.stdout have a buffer attribute? Then write raw bytes to
  the buffer.

* If not, then write raw bytes to sys.stdout.

* If either fails, then somebody has replaced stdout with something
  weird, and they deserve whatever horrible fate their damn fool
  move causes. It's not your responsibility to try to keep your
  application running under bizarre circumstances.
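
The steps above can be sketched as a small helper. The name write_bytes is hypothetical, and the in-memory streams stand in for whatever sys.stdout happens to be:

```python
import io
import sys


def write_bytes(data, stream=None):
    """Write raw bytes to stream.buffer if present, else to stream itself."""
    if stream is None:
        stream = sys.stdout
    # Text streams expose the underlying binary stream as .buffer;
    # binary replacements such as io.BytesIO have no such attribute.
    target = getattr(stream, "buffer", stream)
    target.write(data)
    target.flush()


text_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
write_bytes(b"abc", text_stream)    # goes to text_stream.buffer

binary_stream = io.BytesIO()
write_bytes(b"abc", binary_stream)  # no .buffer, written directly
```

If the replacement is something like io.StringIO, the write raises TypeError, which is the third case in the list.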



-- 
Steven D'Aprano
http://import-that.dreamwidth.org/


#72791

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2014-06-06 02:30 +0300
Message-ID: <87ioof53z1.fsf@elektro.pacujo.net>
In reply to: #72787

Steven D'Aprano <steve+comp.lang.python@pearwood.info>:

>> "Can be replaced" by who? By the Python developers? By me? By random
>> library calls?
>
> By you. sys.stdout and friends are writable. Any code you call may
> have replaced them with another file-like object, and you should
> honour that.

I can of course overwrite even sys and os and open and all. That hardly
merits mentioning in the API documentation.

What I'm afraid of is that the Python developers are reserving the right
to remove the buffer and detach attributes from the standard streams in
a future version. That would be terrible.

If it means some other module is allowed to commandeer the standard
streams, that would be bad as well.

Worst of all, I don't know why the caveat had to be there.

Or is it maybe because some python command line options could cause
buffer and detach not to be there? That would explain the caveat, but
still would be kinda sucky.


Marko


#72792

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-06-06 09:39 +1000
Message-ID: <mailman.10787.1402011598.18130.python-list@python.org>
In reply to: #72791

On Fri, Jun 6, 2014 at 9:30 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Steven D'Aprano <steve+comp.lang.python@pearwood.info>:
>
>>> "Can be replaced" by who? By the Python developers? By me? By random
>>> library calls?
>>
>> By you. sys.stdout and friends are writable. Any code you call may
>> have replaced them with another file-like object, and you should
>> honour that.
>
> I can of course overwrite even sys and os and open and all. That hardly
> merits mentioning in the API documentation.
>
> What I'm afraid of is that the Python developers are reserving the right
> to remove the buffer and detach attributes from the standard streams in
> a future version. That would be terrible.
>
> If it means some other module is allowed to commandeer the standard
> streams, that would be bad as well.
>
> Worst of all, I don't know why the caveat had to be there.
>
> Or is it maybe because some python command line options could cause
> buffer and detach not to be there? That would explain the caveat, but
> still would be kinda sucky.

It's more that replacing sys.std* is considered reasonably normal
(unlike, say, replacing sys.float_info, which would be a weird thing
to do); and you could replace them with something that doesn't have
those attributes. If you're running a top-level script and you never
import anything that changes the streams, you should be able to depend
on those always being there.
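
A short sketch of such a replacement, using contextlib.redirect_stdout (available from Python 3.4) to swap in an io.StringIO for the duration of a block:

```python
import contextlib
import io
import sys

replacement = io.StringIO()
with contextlib.redirect_stdout(replacement):
    # Inside the block sys.stdout is the StringIO, which has no .buffer.
    has_buffer = hasattr(sys.stdout, "buffer")
    print("captured")
```

Code that reaches for sys.stdout.buffer inside such a block gets an AttributeError, which is the situation the caveat in the manual describes.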

ChrisA


#72808

From: Terry Reedy <tjreedy@udel.edu>
Date: 2014-06-05 22:08 -0400
Message-ID: <mailman.10800.1402020549.18130.python-list@python.org>
In reply to: #72791

On 6/5/2014 7:30 PM, Marko Rauhamaa wrote:
> Steven D'Aprano <steve+comp.lang.python@pearwood.info>:
>
>>> "Can be replaced" by who? By the Python developers? By me? By random
>>> library calls?
>>
>> By you. sys.stdout and friends are writable. Any code you call may
>> have replaced them with another file-like object, and you should
>> honour that.
>
> I can of course overwrite even sys and os and open and all. That hardly
> merits mentioning in the API documentation.
>
> What I'm afraid of is that the Python developers are reserving the right
> to remove the buffer and detach attributes from the standard streams in
> a future version.

No, not at all.

> That would be terrible.

Agreed.

> If it means some other module is allowed to commandeer the standard
> streams, that would be bad as well.

I think that, for the most part, library modules should either open a 
file given a filename from outside or read from and write to open files 
handed to them from outside, but not hard-code the std streams. The 
module doc should say if the file (name or object) must be text or in 
particular binary.
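
A rough sketch of that convention, with a made-up filter (upper_filter is not from any real module): the function never touches sys.stdin or sys.stdout itself and its docstring says that both files must be binary.

```python
import io


def upper_filter(src, dst, chunk_size=65536):
    """Copy bytes from src to dst, upper-casing ASCII letters.

    src and dst must be *binary* file objects supplied by the caller,
    for example sys.stdin.buffer and sys.stdout.buffer, or io.BytesIO
    objects in a test.
    """
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk.upper())   # bytes.upper() only changes ASCII letters


src = io.BytesIO(b"spam \xff eggs\n")
dst = io.BytesIO()
upper_filter(src, dst)
result = dst.getvalue()
```

A command-line wrapper would call upper_filter(sys.stdin.buffer, sys.stdout.buffer); the library code itself stays independent of the standard streams.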

The warning is also a hint as to how to solve a problem, such as testing 
a binary filter. Assume the module reads from and writes to .buffer and 
has a main function. One approach, untested:

import sys, io, unittest
from mod import main  # the binary filter under test; "mod" is a placeholder

class Binstd:
    # Minimal stand-in for a std stream: only .buffer is provided.
    def __init__(self):
        self.buffer = io.BytesIO()

sys.stdin = Binstd()
sys.stdout = Binstd()

sys.stdin.buffer.write(b'test data')  # .buffer takes bytes, not str
sys.stdin.buffer.seek(0)
main()
out = sys.stdout.buffer.getvalue()
# test that out is as expected for the input
# seek to 0 and truncate for more tests
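
A self-contained variant of the same idea, written as a unittest case so that the real streams are restored after each test. The main function here is only a placeholder standing in for the module under test (an assumption; it just reverses its input), and the class names are invented for the sketch.

```python
import io
import sys
import unittest


class BinStd:
    """Stand-in for a standard stream: only the .buffer attribute exists."""

    def __init__(self, data=b""):
        self.buffer = io.BytesIO(data)


def main():
    """Placeholder for the real filter: write stdin's bytes reversed."""
    sys.stdout.buffer.write(sys.stdin.buffer.read()[::-1])


class FilterTest(unittest.TestCase):
    def setUp(self):
        # Capture the current streams now; restore them after the test.
        self.addCleanup(setattr, sys, "stdin", sys.stdin)
        self.addCleanup(setattr, sys, "stdout", sys.stdout)

    def test_reverses_bytes(self):
        sys.stdin = BinStd(b"test data")
        sys.stdout = BinStd()
        main()
        self.assertEqual(sys.stdout.buffer.getvalue(), b"atad tset")
```

Passing the input to the BinStd constructor avoids the separate write and seek(0) steps.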

> Worst of all, I don't know why the caveat had to be there.

Because the streams can be replaced for a variety of good reasons, as above.

> Or is it maybe because some python command line options could cause
> buffer and detach not to be there? That would explain the caveat, but
> still would be kinda sucky.

The doc set documents the Python command line options, as well as any that
are CPython specific.  It is possible that some implementation could add
one to open stdxyz in binary mode. CPython does not really need that.


-- 
Terry Jan Reedy



#72817

From: Ethan Furman <ethan@stoneleaf.us>
Date: 2014-06-05 20:47 -0700
Message-ID: <mailman.10805.1402029381.18130.python-list@python.org>
In reply to: #72791

On 06/05/2014 04:30 PM, Marko Rauhamaa wrote:
>
> What I'm afraid of is that the Python developers are reserving the right
> to remove the buffer and detach attributes from the standard streams in
> a future version.

Being afraid is silly.  If you have a question, ask it.

--
~Ethan~



#72691

From: Steven D'Aprano <steve@pearwood.info>
Date: 2014-06-05 08:34 +0000
Message-ID: <53902bb1$0$11109$c3e8da3@news.astraweb.com>
In reply to: #72665

On Thu, 05 Jun 2014 14:01:50 +1200, Gregory Ewing wrote:

> Steven D'Aprano wrote:
>> The whole concept of stdin and stdout is based on the idea of having a
>> console to read from and write to.
> 
> Not really; stdin and stdout are frequently connected to files, or pipes
> to other processes. The console, if it exists, just happens to be a
> convenient default value for them. Even on a system without a console,
> they're still a useful abstraction.

If you had kept reading my post, including the bits you cut out *wink*, 
you'd see that I did raise that same point. Having stdin and stdout 
trivially generalises to the idea of replacing them with other files, or 
pipes. But the idea of having standard input and standard output in the 
first place comes about because they are useful for the console. I gave 
the example of Mac, which didn't have a command-line interface at all, 
hence no console, no stdin, no stdout.

If a system had no command line interface (hence no consoles), why would 
you bother with a *standard* input file and output file that are never 
used?


> But we were talking about encodings, and whether stdin and stdout should
> be text or binary by default. Well, one of the design principles behind
> unix is to make use of plain text wherever possible. 

What's plain text? *half a wink*

It's a serious question. Some people think that "good ol' plain text" is 
EBCDIC, like IBM intended. To them, the letter "A" is synonymous with the 
byte 0xC1, and there's no need for an encoding (or so they think) because 
"A" *is* 0xC1.

Of course, people on ASCII systems know better: who needs encodings when 
it is a universal fact that "A" *is* 0x41?

*wink*
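
The point is easy to check from Python. This small sketch uses the standard cp037 codec as the EBCDIC example; the choice of that particular code page is an assumption made for the illustration.

```python
letter = "A"

ebcdic = letter.encode("cp037")    # cp037 is one of the EBCDIC code pages
ascii_ = letter.encode("ascii")

print(ebcdic.hex(), ascii_.hex())  # c1 41

# The same byte is a different character under a different encoding:
as_latin1 = b"\xc1".decode("latin-1")   # U+00C1, capital A with acute
as_ebcdic = b"\xc1".decode("cp037")     # plain "A"
```

Neither byte value *is* the letter; each is the letter only under a stated encoding.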


> Not just for stuff
> meant to be seen on the screen, but for stuff kept in files as well.
> 
> As a result, most unix programs, most of the time, deal with text on
> stdin and stdout. So, it makes sense for them to be text by default. And
> wherever there's text, there needs to be an encoding. This is true
> whether a console is involved or not.


Agreed.




-- 
Steven



#72697

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2014-06-05 12:41 +0300
Message-ID: <87wqcvu20h.fsf@elektro.pacujo.net>
In reply to: #72697

Steven D'Aprano <steve@pearwood.info>:

> But the idea of having standard input and standard output in the first
> place comes about because they are useful for the console.

I doubt that. Classic programs take input and produce output. Standard
input and output are the default input and output. The textbook Pascal
programs started:

   program myprogram(input, output);

> If a system had no command line interface (hence no consoles), why
> would you bother with a *standard* input file and output file that are
> never used?

Because programs are supposed to do useful work. They consume input and
produce output. That concept is older than computers themselves and is
used to define things like computation, algorithm, halting etc.

> On Thu, 05 Jun 2014 14:01:50 +1200, Gregory Ewing wrote:
>> But we were talking about encodings, and whether stdin and stdout
>> should be text or binary by default. Well, one of the design
>> principles behind unix is to make use of plain text wherever
>> possible.

No, one of the design principles behind unix is that all data is bytes:
memory, files, devices, sockets, pathnames. Yes, the
ASCII-is-good-for-everybody assumption has been there since the
beginning, but Python will not be able to hide the fact that there is no
text data (in the Python sense). There are only bytes.

UTF-8 beautifully gives text a second-class citizenship in unix/linux.
It will never be granted first-class citizenship, though.

>> As a result, most unix programs, most of the time, deal with text on
>> stdin and stdout. So, it makes sense for them to be text by default.
>> And wherever there's text, there needs to be an encoding. This is
>> true whether a console is involved or not.
>
> Agreed.

Disagreed strongly.

   tcpdump -s 0 -w - >error.pcap
   tar zxf - <python.tar.gz
   sha1sum <smile.jpg
   base64 -d <a.dat >a.exe
   wget ftp://micorsops.com/something.avi -O - | mplayer -cache 8192 -
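
For comparison, a Python 3 filter in the style of the sha1sum line has to go through the binary layer. A minimal sketch, with an invented function name, that works on any binary file object:

```python
import hashlib
import io


def sha1_hex(stream, chunk_size=65536):
    """Return the SHA-1 hex digest of everything readable from a binary stream."""
    digest = hashlib.sha1()
    # iter() with a sentinel keeps calling read() until it returns b"".
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Arbitrary bytes, including ones that are not valid text in UTF-8.
payload = bytes(range(256)) * 4
checksum = sha1_hex(io.BytesIO(payload), chunk_size=100)
```

In an actual script the argument would be sys.stdin.buffer rather than a BytesIO; reading from sys.stdin itself would attempt to decode the data as text.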

Unfortunately, the text/binary dichotomy breaks a beautiful principle in
Python as well. In numerous contexts, any file-like object will be
valid. Now there is no file-like object. Instead, you have
text-file-like objects and binary-file-like objects, which require
special attention since some operate on strings while others operate on
bytes.
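
To make the "special attention" concrete, here is a sketch of a helper that has to ask which kind of file-like object it was given. The function names are made up, and the isinstance check against io.TextIOBase is an assumption: a duck-typed object that does not inherit from the io base classes would be treated as binary.

```python
import io


def is_text_stream(f):
    """Best-effort check: True for text-mode file objects from the io module."""
    return isinstance(f, io.TextIOBase)


def write_line(f, line):
    """Write a str line to either kind of stream, encoding as UTF-8 if needed."""
    if is_text_stream(f):
        f.write(line + "\n")
    else:
        f.write((line + "\n").encode("utf-8"))


text, binary = io.StringIO(), io.BytesIO()
write_line(text, "caf\u00e9")
write_line(binary, "caf\u00e9")
```

Without the branch, writing a str to the BytesIO raises TypeError, and so does writing bytes to the StringIO.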


Marko



#72704

From: Rustom Mody <rustompmody@gmail.com>
Date: 2014-06-05 06:37 -0700
Message-ID: <7b3543f6-6f62-49c5-abdc-e2783fd6d629@googlegroups.com>
In reply to: #72697

On Thursday, June 5, 2014 3:11:34 PM UTC+5:30, Marko Rauhamaa wrote:
> Steven D'Aprano wrote:

> > But the idea of having standard input and standard output in the first
> > place comes about because they are useful for the console.

> I doubt that. Classic programs take input and produce output. Standard
> input and output are the default input and output. The textbook Pascal
> programs started:

>    program myprogram(input, output);

> > If a system had no command line interface (hence no consoles), why
> > would you bother with a *standard* input file and output file that are
> > never used?

> Because programs are supposed to do useful work. They consume input and
> produce output. That concept is older than computers themselves and is
> used to define things like computation, algorithm, halting etc.

> > On Thu, 05 Jun 2014 14:01:50 +1200, Gregory Ewing wrote:
> >> But we were talking about encodings, and whether stdin and stdout
> >> should be text or binary by default. Well, one of the design
> >> principles behind unix is to make use of plain text wherever
> >> possible.

> No, one of the design principles behind unix is that all data is bytes:
> memory, files, devices, sockets, pathnames. Yes, the
> ASCII-is-good-for-everybody assumption has been there since the
> beginning, but Python will not be able to hide the fact that there is no
> text data (in the Python sense). There are only bytes.

> UTF-8 beautifully gives text a second-class citizenship in unix/linux.
> It will never be granted first-class citizenship, though.

> >> As a result, most unix programs, most of the time, deal with text on
> >> stdin and stdout. So, it makes sense for them to be text by default.
> >> And wherever there's text, there needs to be an encoding. This is
> >> true whether a console is involved or not.
> > Agreed.

> Disagreed strongly.

>    tcpdump -s 0 -w - >error.pcap
>    tar zxf - <python.tar.gz
>    sha1sum <smile.jpg
>    base64 -d <a.dat >a.exe
>    wget ftp://micorsops.com/something.avi -O - | mplayer -cache 8192 -

> Unfortunately, the text/binary dichotomy breaks a beautiful principle in
> Python as well. In numerous contexts, any file-like object will be
> valid. Now there is no file-like object. Instead, you have
> text-file-like objects and binary-file-like objects, which require
> special attention since some operate on strings while others operate on
> bytes.


Pascal is for building pyramids — imposing, breathtaking, static
structures built by armies pushing heavy blocks into place. — Alan Perlis

Lisp is like a ball of mud. Add more and it's still a ball of mud
— it still looks like Lisp. — Guy Steele

There are two fundamental outlooks in computer science —
structuring and universality. And they pull in opposite
directions.

Universality happens when a data-structure can hold everything —
a universal data structure.

Some of the most significant advances in CS come from a universalist vision:

- von Neumann machine storing data+code in memory
- Turing-tape able to store arbitrary Turing machines (∴ universal TM)
- Lisp program ≡ Lisp data
- A stream of bytes can handle/represent everything in Unix — memory, files,
  devices, sockets, pathnames.

However after the allurement of universality is over, the
realization dawns that we have a mess — Lisp is a 'mud-ball'. At
which point people start needing to make distinctions — code and
data, different data-structures, type-systems etc. IOW imposing
structure on the mud-ball.

Taking a broad view, while structuring trades power for
order, it is universality that adds significant power.

Python is not as universal as Lisp — it has no homoiconicity.
But it is close enough in that any variable/data-structure can
contain any value.

What Marko is saying is that by imposing the structuring of
unicode on the outside (Unix) world of text=byte, significant power is lost.

This is also Armin's crib.

How significant that loss is, is yet to be seen…



#72708

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2014-06-05 17:45 +0300
Message-ID: <87oay7tnxt.fsf@elektro.pacujo.net>
In reply to: #72704

Rustom Mody <rustompmody@gmail.com>:

> What Marko is saying is that by imposing the structuring of unicode on
> the outside (Unix) world of text=byte, significant power is lost.

Mostly I'm saying Python3 will not be able to hide the fact that linux
data consists of bytes. It shouldn't even try. The linux OS outside the
Python process talks bytes, not strings.
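
One place where this shows through is file names, which on linux are arbitrary byte strings. The sketch below uses the surrogateescape error handler explicitly; on POSIX systems os.fsdecode and os.fsencode rely on the same handler. The file name is invented for the example.

```python
raw_name = b"report-\xff.txt"    # not valid UTF-8, yet a legal linux file name

# Undecodable bytes become lone surrogates instead of raising an error...
as_str = raw_name.decode("utf-8", "surrogateescape")

# ...and encoding with the same handler restores the original bytes exactly.
round_tripped = as_str.encode("utf-8", "surrogateescape")

# A strict decode of the same bytes fails.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
```

The str form is only a carrier for the bytes: printing it or encoding it strictly fails, so the bytes are still visible to the program.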

A different OS might have different assumptions.


Marko



#72710

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2014-06-05 15:33 +0000
Message-ID: <53908dd0$0$29978$c3e8da3$5496439d@news.astraweb.com>
In reply to: #72708

On Thu, 05 Jun 2014 17:45:34 +0300, Marko Rauhamaa wrote:

> Rustom Mody <rustompmody@gmail.com>:
> 
>> What Marko is saying is that by imposing the structuring of unicode on
>> the outside (Unix) world of text=byte, significant power is lost.
> 
> Mostly I'm saying Python3 will not be able to hide the fact that linux
> data consists of bytes. It shouldn't even try. The linux OS outside the
> Python process talks bytes, not strings.

Data on pretty much *all* computers consists of bytes, regardless of the 
language or operating system. There may be a few esoteric or ancient 
machines from the Dark Ages that aren't based on bytes, and even fewer 
that aren't based on bits (ancient Soviet era mainframes, if any of them 
still survive), but they aren't important. Someday esoteric non-byte 
machines, perhaps quantum computers, or machines based on DNA, or nano-
sized analog computers made of carbon atoms, say, will be important, but 
this is not that day. For now, bytes rule *everywhere*.

Nevertheless, there are important abstractions that are written on top of 
the bytes layer, and in the Unix and Linux world, the most important 
abstraction is *text*. In the Unix world, text formats and text 
processing are much more common in user-space apps than binary processing. 
Perhaps the definitive explanation and celebration of the Unix way is 
Eric Raymond's "The Art Of Unix Programming":

http://www.catb.org/esr/writings/taoup/html/ch05s01.html




-- 
Steven D'Aprano
http://import-that.dreamwidth.org/



#72723

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-06-06 02:12 +1000
Message-ID: <mailman.10743.1401984750.18130.python-list@python.org>
In reply to: #72710

On Fri, Jun 6, 2014 at 1:33 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> In the Unix world, text formats and text
> processing are much more common in user-space apps than binary processing.
> Perhaps the definitive explanation and celebration of the Unix way is
> Eric Raymond's "The Art Of Unix Programming":
>
> http://www.catb.org/esr/writings/taoup/html/ch05s01.html

Specifically, this from the opening paragraph:
"""
Text streams are a valuable universal format because they're easy for
human beings to read, write, and edit without specialized tools. These
formats are (or can be designed to be) transparent.
"""

He goes on to talk about network protocols, one of the best examples
of this. I've idly speculated at times about the possibility of
rewriting the Magic: The Gathering Online back-end with a view to
making it easier to work with. Among other changes, I'd be wanting to
make the client-server communication be plain text (an SMTP-style of
protocol), with an external layer of encryption (TLS). This would mean
that:

1) Internal testing can be done without TLS, making the communication
absolutely transparent, easy to debug, easy to watch, everything.
Adding TLS later would have zero impact on the critical code
internally - it's just a layer around the outside.
2) Upgrades to crypto can simply follow industry best-practice.
(Reminder, to anyone who might have been mad enough to consider this:
DO NOT roll your own crypto! Ever! Even if you use a good library for
the heavy lifting!)
3) A debug log of what the client has sent and received could be
included, even in production, at very low cost. You don't need to
decode packets and pretty-print them - you just take the lines of
text, maybe adorn or color them according to which were sent/received,
and dump them into a display box or log file somewhere.
4) The server is forced to acknowledge that the client might not be
the one it expected. Not only do you get better security that way, but
you could also call this a feature.
5) Therefore, you can debug the system with a simple TELNET or MUD
client (okay, most MUD clients don't do SSL, but you can use "openssl
s_client"). As someone who's debugged myriad issues using his trusty
MUD client, I consider this to be a *huge* advantage.

All it takes is a few simple rules, like: All communication is text,
encoded down the wire as UTF-8, and consists of lines (terminated by
U+000A) which consist of a word, a U+0020 space, and then parameters
to the command. There, that's a rigorous definition that covers
everything you'll need of it; compare with what Flash uses, by
default:

https://en.wikipedia.org/wiki/Action_Message_Format

Sure, it might be slightly more compact going down the wire; but what
do you really gain?

Text wins.
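
As a rough sketch, the line rules given above (UTF-8 on the wire, lines terminated by U+000A, a word, a U+0020 space, then parameters) fit in a few lines of Python. The function names and the PLAY/SAY/PING command words are invented for the example, and allowing a bare word with no parameters is an assumption that goes slightly beyond the rules as stated.

```python
def parse_line(raw):
    """Split one wire line (bytes ending in U+000A) into (word, parameters)."""
    if not raw.endswith(b"\n"):
        raise ValueError("line must be terminated by U+000A")
    text = raw[:-1].decode("utf-8")
    word, _, params = text.partition(" ")   # split at the first U+0020 only
    if not word:
        raise ValueError("line must start with a command word")
    return word, params


def format_line(word, params=""):
    """Inverse of parse_line: build the bytes to send down the wire."""
    if not word or " " in word or "\n" in word or "\n" in params:
        raise ValueError("invalid command word or parameters")
    text = word if not params else word + " " + params
    return (text + "\n").encode("utf-8")


word, params = parse_line(b"PLAY Forest\n")
wire = format_line("SAY", "h\u00e9llo world")
```

Because every message is a line of text, the debug log mentioned in point 3 is just the decoded lines themselves.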

ChrisA


