
Python 3.2 has some deadly infection

Started by: Mark Lawrence <breamoreboy@yahoo.co.uk>
First post: 2014-05-31 17:10 +0100
Last post: 2014-06-03 14:22 -0400
Articles 20 on this page of 92 — 19 participants



Contents

  Python 3.2 has some deadly infection Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-31 17:10 +0100
    Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-05-31 22:55 +0300
    Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-01 02:26 +0000
      Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-01 12:43 +1000
      Re: Python 3.2 has some deadly infection Tim Delaney <timothy.c.delaney@gmail.com> - 2014-06-02 08:54 +1000
        Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-02 01:14 +0000
          Re: Python 3.2 has some deadly infection Tim Delaney <timothy.c.delaney@gmail.com> - 2014-06-02 12:23 +1000
            Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-01 19:46 -0700
          Re: Python 3.2 has some deadly infection Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> - 2014-06-02 07:45 +0000
          Re: Python 3.2 has some deadly infection Tim Delaney <timothy.c.delaney@gmail.com> - 2014-06-02 19:02 +1000
          Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-02 19:14 +1000
          Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-02 12:10 +0100
            Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-03 16:34 +0000
              Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-04 02:43 +1000
          Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-02 17:34 -0400
            Re: Python 3.2 has some deadly infection Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-06-03 17:16 +1200
              Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-03 02:21 -0400
              Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-03 15:18 +0100
                Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-04 13:08 +0000
                  Re: Python 3.2 has some deadly infection Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-06-05 14:01 +1200
                    Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 10:16 +0300
                      Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-05 17:30 +1000
                        Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 11:05 +0300
                          Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-05 18:36 +1000
                            Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 12:53 +0300
                              Re: Python 3.2 has some deadly infection wxjmfauth@gmail.com - 2014-06-05 05:43 -0700
                              Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-05 14:50 -0400
                                Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 23:21 +0300
                                  Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-05 18:09 -0400
                                  Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 23:13 +0000
                                    Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 02:30 +0300
                                      Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 09:39 +1000
                                      Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-05 22:08 -0400
                                      Re: Python 3.2 has some deadly infection Ethan Furman <ethan@stoneleaf.us> - 2014-06-05 20:47 -0700
                    Re: Python 3.2 has some deadly infection Steven D'Aprano <steve@pearwood.info> - 2014-06-05 08:34 +0000
                      Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 12:41 +0300
                        Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 06:37 -0700
                          Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 17:45 +0300
                            Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 15:33 +0000
                              Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 02:12 +1000
                                Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 09:54 -0700
                                  Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 03:36 +1000
                              Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 19:52 +0300
                                Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 03:28 +1000
                                  Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 15:35 -0700
                                    Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 08:52 +1000
                                      Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 20:11 -0700
                                        Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 13:20 +1000
                                          Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 20:32 -0700
                                Re: Python 3.2 has some deadly infection Akira Li <4kir4.1i@gmail.com> - 2014-06-06 12:03 +0400
                            Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-05 16:37 +0100
                              Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 16:16 +0000
                            Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 01:50 +1000
                            Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-05 17:17 +0100
                              Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 16:32 +0000
                                Re: Python 3.2 has some deadly infection Ethan Furman <ethan@stoneleaf.us> - 2014-06-06 07:40 -0700
                            Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 03:14 +1000
                            Re: Python 3.2 has some deadly infection Ian Kelly <ian.g.kelly@gmail.com> - 2014-06-05 11:16 -0600
                            Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-05 14:11 -0400
                              Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 21:30 +0300
                                Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 23:02 +0000
                                  Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 02:21 +0300
                                    Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-06 12:15 +0000
                                      Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 16:00 +0300
                                  Re: Python 3.2 has some deadly infection rurpy@yahoo.com - 2014-06-07 21:34 -0700
                                Re: Python 3.2 has some deadly infection Ethan Furman <ethan@stoneleaf.us> - 2014-06-06 06:24 -0700
                                  Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 17:10 +0300
                                    Re: Python 3.2 has some deadly infection Michael Torrie <torriem@gmail.com> - 2014-06-06 09:02 -0600
                                      Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 18:32 +0300
                                        Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 01:50 +1000
                                          Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 20:02 +0300
                                            Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-06 10:13 -0700
                                              Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 03:26 +1000
                                          Re: Python 3.2 has some deadly infection wxjmfauth@gmail.com - 2014-06-06 11:03 -0700
                                          Re: Python 3.2 has some deadly infection Denis McMahon <denismfmcmahon@gmail.com> - 2014-06-06 21:18 +0000
                                            Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 08:18 +1000
                                        Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-06 15:57 +0000
                                          Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-06 09:21 -0700
                                            Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 02:48 +1000
                                              Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-06 10:04 -0700
                                                Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 03:12 +1000
                                          Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 20:11 +0300
                                            Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 03:16 +1000
                                            Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 20:18 +0300
                                            Re: Python 3.2 has some deadly infection Ned Batchelder <ned@nedbatchelder.com> - 2014-06-06 13:33 -0400
                                Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 01:25 +1000
                                  Re: Python 3.2 has some deadly infection wxjmfauth@gmail.com - 2014-06-06 08:44 -0700
                                    Re: Python 3.2 has some deadly infection wxjmfauth@gmail.com - 2014-06-06 08:48 -0700
                            Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-06 12:56 +0100
                  Re: Python 3.2 has some deadly infection Akira Li <4kir4.1i@gmail.com> - 2014-06-05 06:49 +0400
              Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-04 00:25 +1000
              Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-03 14:22 -0400



#72340 — Python 3.2 has some deadly infection

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2014-05-31 17:10 +0100
Subject: Python 3.2 has some deadly infection
Message-ID: <mailman.10509.1401552642.18130.python-list@python.org>

Some interesting comments here 
http://techtonik.rainforce.org/2014/05/python-32-has-some-deadly-infection.html 
so I'm simply asking for other opinions.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence




#72348

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2014-05-31 22:55 +0300
Message-ID: <87wqd1afl0.fsf@elektro.pacujo.net>
In reply to: #72340

Mark Lawrence <breamoreboy@yahoo.co.uk>:

> Some interesting comments here
> http://techtonik.rainforce.org/2014/05/python-32-has-some-deadly-infection.html
> so I'm simply asking for other opinions.

I read the article, but unfortunately I failed to see interesting
comments or opinions. There was some graphic, but it didn't say anything
to me, and the article didn't really seem to be making any argument
apart from the disappointed tone.


Marko



#72354

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2014-06-01 02:26 +0000
Message-ID: <538a8f48$0$29978$c3e8da3$5496439d@news.astraweb.com>
In reply to: #72340

On Sat, 31 May 2014 17:10:20 +0100, Mark Lawrence wrote:

> Some interesting comments here
> http://techtonik.rainforce.org/2014/05/python-32-has-some-deadly-infection.html
> so I'm simply asking for other opinions.

Oh, Anatoly Techtonik. He's quite notorious on python-dev for wanting to 
impose his wild and sometimes wacky processes on the entire community. 
Specific examples aren't coming to mind, and I'm too lazy to search the 
archives, so I'll just make one up to give you an idea of the flavour of 
his requests:

    "Twitter is the only way that developers can effectively
    communicate. We must shut down all the mailing lists and the
    bug tracker and move all communication immediately to 
    Twitter. And by we I mean you."
    [Not an actual quote.]

I've come to the conclusion that he occasionally has a point to his 
posts, but only at random by virtue of the scatter-gun technique. He's 
obviously widely read, but not deeply, and so he fires off a lot of
ill-thought-out but superficially attractive proposals. Just by chance a few
of them end up being interesting, but not *interesting enough* for somebody
else to do the work. At this point the ideas languish, because he refuses 
to sign a contributor agreement so the Python core developers cannot 
accept anything from him.

This blog post is a strong opinion about Python, but it isn't clear what 
that opinion *actually is*. His post is rambling and unfocused and 
incoherent ("art is the future"). He rails against having to write PEPs, 
and decries the lack of stats, summaries, analysis and comparison, 
utterly missing the point that the purpose of the PEP process is to 
provide those stats, summaries, analysis and comparison. Reading between 
the lines, I think what he means, deep down, is that *somebody else* 
ought to gather those stats and do the analysis to support his ideas, and 
not expect him to write the PEP.

He makes at least one factually wrong claim:

    "I thought that C/C++ must die, because really all major 
    security problems are because of it."
    [actual quote]

He's talking about buffer overflows. Buffer overflows have never been 
responsible for "all" major security problems. Even allowing for a little 
hyperbole, buffer overflows have not been responsible for the majority of 
major security problems for a very long time. It is not 1992 any more, 
and today the single largest source of security bugs is code injection
attacks. In Python terms that mostly means failure to sanitize SQL 
queries and the use of eval() and exec() on untrusted data.
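
To illustrate the SQL side of that point, here is a minimal sketch using the standard library's sqlite3 module. The `users` table, the two helper functions and the malicious input are all made up for illustration; the only claim is that string-built queries let input change the query, while parameterised queries do not.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable: the input is pasted straight into the SQL text.
    query = "SELECT name FROM users WHERE name = '%s'" % name
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Safer: the value is passed separately from the SQL text.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


# Hypothetical in-memory table, purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

malicious = "nobody' OR '1'='1"
unsafe_rows = find_user_unsafe(conn, malicious)  # matches every row
safe_rows = find_user_safe(conn, malicious)      # matches nothing
```

The unsafe version ends up running `... WHERE name = 'nobody' OR '1'='1'`, which is true for every row; the parameterised version just looks for a user with that odd literal name.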

http://cwe.mitre.org/top25/

Three of the top four software errors are forms of code injection: SQL 
injection, OS command injection, cross-site scripting. The classic C 
buffer overflow comes in at number 3, so it's not an inconsiderable cause 
of security vulnerabilities even today, but it is not even close to the 
only such cause.

See also http://www.sans.org/top25-software-errors/ 

Back to the blog post... it's 2014, Python 3.3 and 3.4 have come out, why 
is he talking about 3.2?

It's interesting that he starts off by stating his graph is meaningless:

    "They don't measure anything - just show some lines that 
    correlate to each other."

then immediately tries to draw a conclusion from those lines:

    "It looks like the peak of Python was on February 2011, 
    and since then there was a significant drop."

I've written about the difficulty of measuring language popularity in any 
meaningful way:

http://import-that.dreamwidth.org/1388.html
http://import-that.dreamwidth.org/2873.html

Anatoly has picked the TIOBE Index, but I don't know that this is the 
best measure of language popularity. According to it, Python is more 
popular than Javascript. I love Python, but really, more popular than 
Javascript? That feels wrong to me.

In any case, I think that a better explanation for the observed dip in 
Feb 2011 is not that Python 3.2 is infected (infected by what?) but 
*regression to the mean*. Regression to the mean is a statistical 
phenomenon which basically says that all else being equal, an extreme 
value is likely to be followed by a less extreme (closer to the average) 
value.
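
A small simulation makes the idea concrete. This is only a sketch and not TIOBE data: the scores are made-up Gaussian noise around a fixed mean of 5.0, and the 6.5 "spike" threshold is an arbitrary assumption.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Pretend monthly scores: constant true popularity plus random noise.
scores = [random.gauss(5.0, 1.0) for _ in range(10000)]

# Keep each unusually high month together with the month that follows it.
pairs = [(a, b) for a, b in zip(scores, scores[1:]) if a > 6.5]

avg_spike = sum(a for a, _ in pairs) / len(pairs)
avg_next = sum(b for _, b in pairs) / len(pairs)

print("average spike month:     %.2f" % avg_spike)
print("average following month: %.2f" % avg_next)
```

Nothing about the underlying popularity changes here, yet the month after a spike is on average much closer to 5.0 than the spike itself.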

Language popularity, as measured by TIOBE, is at least in part random. 
(Look at how wiggly the lines are. The wiggles represent random 
variation.) If by chance a language gets a spike in interest one month, 
it is less likely to repeat that spike the following month.

Because TIOBE's results contain so much random noise, they really ought 
to smooth them out by averaging the scores over a three month window, and 
show trend lines. They don't, I believe, because random hiccoughs in the 
data provide interest: "Last month, Java was overthrown from its #1
ranking by C. This month it has fought its way back to #1 again! Tune in
next month to see if C can repeat its stunning victory!!!"
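
The three-month smoothing suggested above could look roughly like this. The monthly ratings are invented for illustration, and `moving_average` is a hypothetical helper, not anything TIOBE publishes.

```python
def moving_average(values, window=3):
    """Trailing moving average; the first window-1 points have no value."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Made-up monthly ratings with a one-month spike in the third position.
monthly = [6.0, 6.3, 8.1, 6.2, 6.4, 6.6]
smoothed = moving_average(monthly, window=3)

for value in smoothed:
    print("%.2f" % value)
```

The spike of 8.1 is spread over three smoothed points, none of which rises above 6.9, so a single noisy month no longer looks like a peak followed by a collapse.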

I think that long term trend lines would be much less exciting but much 
more informative. Eyeballing the graph, it seems to me that Java and C++ 
are trending down, C is probably steady, and Objective C and Python 
trending up. If by chance there was a flurry of interest in Python for a 
month or two, and then things fell back to normal (regression to the 
mean), that might look like a slump.

But I digress... back to Anatoly's post. I think he reveals more about 
himself than Python:

    "When these little things sum up, you realize that you're 
    just wasting time trying to improve things that people 
    don't want to improve. They don't want to improve the process.
    They don't realize that the problem is not in the language, 
    but in the way they don't want to hear each other. Technology
    showed that people want to be heard, that they opinion should
    be 'accounted', not closed as 'won't fix', or 'works for
    me'. It is not a community process, when you rely on abilities
    of certain individuals to monitor and respond to all traffic
    and wishes, especially when they fail to do so."

On Python-Dev, this is Anatoly's repeated claim: the process is broken, 
because, well, it just is, okay? In my opinion, "the process is broken" is
Anatoly's shorthand for "I want to do things THIS way, and you won't let 
me. My way is SO OBVIOUSLY BRILLIANT that everybody, no matter their 
circumstances, will be immeasurably better off by switching to my process 
instead of the old way of doing things. Anyone who thinks differently is 
simply not paying attention. Didn't you hear how brilliant my process is?"

Anatoly does make a few concrete complaints about Python 3, or at least 
as concrete as he gets in this post:

"I expected Python 3 to be ready for the internet age" -- What does that 
mean? What makes him think it isn't?

"with cross-platform behavior preferred over system-dependent one" -- 
It's not clear how cross-platform behaviour has anything to do with the 
Internet age. Python has preferred cross-platform behaviour forever, 
except for those features and modules which are explicitly intended to be 
interfaces to system-dependent features. (E.g. a lot of functions in the 
os module are thin wrappers around OS features. Hence the name of the 
module.)

"with clear semantics to work with binary data" -- There are clear 
semantics to work with binary data: use bytes, and the struct module. 
Those features can be improved, and indeed Python 3.4 has improved them, 
and 3.5 is in the process of improving them further. But to suggest that 
Python doesn't have those clear semantics is simply false.
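
For what "bytes and the struct module" looks like in practice, here is a minimal sketch. The record layout (a 16-bit id, a 32-bit float and a 4-byte tag, all big-endian) is a hypothetical format chosen only for the example.

```python
import struct

# Hypothetical layout: big-endian unsigned 16-bit id, 32-bit float,
# and a 4-byte raw tag. ">" means big-endian with no padding.
RECORD = struct.Struct(">Hf4s")

packed = RECORD.pack(513, 1.5, b"DATA")      # a bytes object
record_id, value, tag = RECORD.unpack(packed)

print(len(packed), record_id, value, tag)
```

Everything on the binary side stays as bytes; no text encoding is involved unless you explicitly decode the tag.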

"with non-confusing error handling" -- How is Python 3's error handling 
confusing? It's the same error handling as Python 2. Where is the 
confusion?


TL;DR: Anatoly's blog post is long on disappointment and short on actual 
content. It feels to me that we could summarise his post as:

    I don't know what I want, I won't recognise it even if I saw
    it, but Python 3 isn't it. I blame others for not living up
    to my expectations for features I cannot describe and were 
    never promised.



-- 
Steven



#72356

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-06-01 12:43 +1000
Message-ID: <mailman.10516.1401590627.18130.python-list@python.org>
In reply to: #72354

On Sun, Jun 1, 2014 at 12:26 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> TL;DR: Anatoly's blog post is long on disappointment and short on actual
> content. It feels to me that we could summarise his post as:
>
>     I don't know what I want, I won't recognise it even if I saw
>     it, but Python 3 isn't it. I blame others for not living up
>     to my expectations for features I cannot describe and were
>     never promised.

I think that summary is accurate. When Mark posted this last night
(okay, it was last night for me, probably not for most of you), I
tried to read the post and figure out what he was actually saying...
and failed. Gave up on it and moved on. Got better things to do with
my life... like, I dunno, actually writing code, which seems to be
something that people who whine in blog posts don't do.

ChrisA



#72386

From: Tim Delaney <timothy.c.delaney@gmail.com>
Date: 2014-06-02 08:54 +1000
Message-ID: <mailman.10531.1401663275.18130.python-list@python.org>
In reply to: #72354


On 1 June 2014 12:26, Steven D'Aprano <steve+comp.lang.python@pearwood.info>
wrote:

>
> "with cross-platform behavior preferred over system-dependent one" --
> It's not clear how cross-platform behaviour has anything to do with the
> Internet age. Python has preferred cross-platform behaviour forever,
> except for those features and modules which are explicitly intended to be
> interfaces to system-dependent features. (E.g. a lot of functions in the
> os module are thin wrappers around OS features. Hence the name of the
> module.)
>

There is the behaviour of defaulting input and output to the system
encoding. I personally think we would all be better off if Python (and
Java, and many other languages) defaulted to UTF-8. This hopefully would
eventually have the effect of producers changing to output UTF-8 by
default, and consumers learning to manually specify an encoding when it's
not UTF-8 (due to invalid codepoints).
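
A short sketch of why a UTF-8 default tends to surface wrong guesses: bytes in a legacy encoding are usually not valid UTF-8, so strict decoding fails loudly instead of silently producing garbage. The helper name and the latin-1 fallback are assumptions made for the example, not a recommendation.

```python
def decode_utf8_or_fallback(data, fallback="latin-1"):
    """Try strict UTF-8 first; report which encoding was actually used."""
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # The consumer has to name some other encoding explicitly.
        return data.decode(fallback), fallback


utf8_bytes = "αβγ".encode("utf-8")
greek_bytes = "αβγ".encode("iso-8859-7")   # b'\xe1\xe2\xe3'

print(decode_utf8_or_fallback(utf8_bytes))
print(decode_utf8_or_fallback(greek_bytes))
print(decode_utf8_or_fallback(greek_bytes, fallback="iso-8859-7"))
```

The ISO-8859-7 bytes are rejected by the UTF-8 decoder, so the caller is forced to supply an encoding; only the right one gives back the Greek text.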

I'm currently working on a product that interacts with lots of other
products. These other products can be using any encoding - but most of the
functions that interact with I/O assume the system default encoding of the
machine that is collecting the data. The product has been in production for
nearly a decade, so there's a lot of pushback against changes deep in the
code for fear that it will break working systems. The fact that they are
working largely by accident appears to escape them ...

FWIW, changing to use iso-latin-1 by default would be the most sensible
option (effectively treating everything as bytes), with the option for
another encoding if/when more information is known (e.g. there's often a
call to return the encoding, and the output of that call is guaranteed to
be ASCII).

Tim Delaney



#72389

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2014-06-02 01:14 +0000
Message-ID: <538bcfff$0$29978$c3e8da3$5496439d@news.astraweb.com>
In reply to: #72386

On Mon, 02 Jun 2014 08:54:33 +1000, Tim Delaney wrote:

> On 1 June 2014 12:26, Steven D'Aprano
> <steve+comp.lang.python@pearwood.info> wrote:
> 
> 
>> "with cross-platform behavior preferred over system-dependent one" --
>> It's not clear how cross-platform behaviour has anything to do with the
>> Internet age. Python has preferred cross-platform behaviour forever,
>> except for those features and modules which are explicitly intended to
>> be interfaces to system-dependent features. (E.g. a lot of functions in
>> the os module are thin wrappers around OS features. Hence the name of
>> the module.)
>>
>>
> There is the behaviour of defaulting input and output to the system
> encoding. 

That's a tricky one, but I think on balance that is a case where 
defaulting to the system encoding is the right thing to do. Input and output
occur on the local system you are running on, which by definition isn't
cross-platform. (Non-local I/O is possible, but requires work -- it 
doesn't just happen.)


> I personally think we would all be better off if Python (and
> Java, and many other languages) defaulted to UTF-8. This hopefully would
> eventually have the effect of producers changing to output UTF-8 by
> default, and consumers learning to manually specify an encoding when
> it's not UTF-8 (due to invalid codepoints).

UTF-8 everywhere should be our ultimate aim. Then we can forget about 
legacy encodings except when digging out ancient documents from archived 
floppy disks :-)


> I'm currently working on a product that interacts with lots of other
> products. These other products can be using any encoding - but most of
> the functions that interact with I/O assume the system default encoding
> of the machine that is collecting the data. The product has been in
> production for nearly a decade, so there's a lot of pushback against
> changes deep in the code for fear that it will break working systems.
> The fact that they are working largely by accident appears to escape
> them ...
> 
> FWIW, changing to use iso-latin-1 by default would be the most sensible
> option (effectively treating everything as bytes), with the option for
> another encoding if/when more information is known (e.g. there's often a
> call to return the encoding, and the output of that call is guaranteed
> to be ASCII).

Python 2 does what you suggest, and it is *broken*. Python 2.7 creates 
moji-bake, while Python 3 gets it right:


[steve@ando ~]$ python2.7 -c "print u'δжç'"
Î´Ð¶Ã§
[steve@ando ~]$ python3.3 -c "print(u'δжç')"
δжç


Latin-1 is one of those legacy encodings which needs to die, not to be 
entrenched as the default. My terminal uses UTF-8 by default (as it 
should), and if I use the terminal to input "δжç", Python ought to see 
what I input, not Latin-1 moji-bake.

If I were to use Windows with a legacy code page, then I couldn't even 
enter "δжç" on the command line since none of the legacy encodings 
support that set of characters at the same time. I don't know exactly 
what I would get if I tried (say, by copying and pasting text from a 
Unicode-aware application), but I'd see that it was weird *in the shell* 
before it even reaches Python.

On the other hand, if I were to input something supported by the legacy 
encoding, let's say I entered "αβγ" while using ISO-8859-7 (Greek), then 
Python ought to see "αβγ" and not moji-bake:

py> b = "αβγ".encode('iso-8859-7')  # what the shell generates
py> b.decode('latin-1')  # what Python interprets those bytes as
'áâã'


Defaulting to the system encoding means that Python input and output just 
works, to the degree that input and output on your system just works. If 
your system is crippled by the use of a legacy encoding, then Python will 
at least be *no worse* than your system.
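
For anyone who wants to see what their own system default is, here is a minimal sketch. In the Python 3 releases discussed in this thread, open() without an encoding argument uses the locale's preferred encoding; passing encoding explicitly is the way to opt out of that. The temporary file name is just a placeholder.

```python
import locale
import os
import tempfile

# What text-mode I/O falls back to when no encoding is given.
system_encoding = locale.getpreferredencoding(False)
print("system default encoding:", system_encoding)

# Being explicit makes the result independent of the machine's locale.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("δжç")

with open(path, encoding="utf-8") as f:
    text = f.read()

with open(path, "rb") as f:
    raw = f.read()   # the bytes actually stored on disk

print(text, len(raw))
```

Relying on the default is fine for data that lives and dies on one machine; the explicit form is for data that has to survive a trip to another one.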



-- 
Steven D'Aprano
http://import-that.dreamwidth.org/



#72390

From: Tim Delaney <timothy.c.delaney@gmail.com>
Date: 2014-06-02 12:23 +1000
Message-ID: <mailman.10534.1401676125.18130.python-list@python.org>
In reply to: #72389


On 2 June 2014 11:14, Steven D'Aprano <steve+comp.lang.python@pearwood.info>
wrote:

> On Mon, 02 Jun 2014 08:54:33 +1000, Tim Delaney wrote:
> > I'm currently working on a product that interacts with lots of other
> > products. These other products can be using any encoding - but most of
> > the functions that interact with I/O assume the system default encoding
> > of the machine that is collecting the data. The product has been in
> > production for nearly a decade, so there's a lot of pushback against
> > changes deep in the code for fear that it will break working systems.
> > The fact that they are working largely by accident appears to escape
> > them ...
> >
> > FWIW, changing to use iso-latin-1 by default would be the most sensible
> > option (effectively treating everything as bytes), with the option for
> > another encoding if/when more information is known (e.g. there's often a
> > call to return the encoding, and the output of that call is guaranteed
> > to be ASCII).
>
> Python 2 does what you suggest, and it is *broken*. Python 2.7 creates
> moji-bake, while Python 3 gets it right:
>

The purpose of my example was to show a case where no thought was put into
encodings - the assumption was that the system encoding and the remote
system encoding would be the same. This is most definitely not the case a
lot of the time.

I also should have been more clear that *in the particular situation I was
talking about* iso-latin-1 as default would be the right thing to do, not
in the general case. Quite often we won't know the correct encoding until
we've executed a command via ssh - iso-latin-1 will allow us to extract the
info we need (which will generally be 7-bit ASCII) without the possibility
of an invalid encoding. Sure we may get mojibake, but that's better than
the alternative when we don't yet know the correct encoding.
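
Roughly, the two-stage approach looks like the sketch below. The `charset=` header line is a made-up stand-in for whatever the remote command really reports, and `decode_when_encoding_known` is a hypothetical helper, not code from the product.

```python
import re


def decode_when_encoding_known(raw):
    """Decode provisionally as latin-1, then redo it once the charset is found."""
    provisional = raw.decode("latin-1")      # cannot fail for any byte string
    match = re.search(r"charset=([A-Za-z0-9_\-]+)", provisional)
    if match is None:
        return provisional                   # possibly mojibake, but usable
    # latin-1 is reversible, so the original bytes can be recovered exactly.
    return provisional.encode("latin-1").decode(match.group(1))


# Hypothetical remote output: an ASCII header followed by Greek text.
raw = b"charset=iso-8859-7\n" + "αβγ".encode("iso-8859-7")

print(repr(raw.decode("latin-1")))           # header readable, payload garbled
print(repr(decode_when_encoding_known(raw)))
```

The ASCII part is readable under any ASCII-compatible guess, which is all that is needed to find out the real encoding.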


> Latin-1 is one of those legacy encodings which needs to die, not to be
> entrenched as the default. My terminal uses UTF-8 by default (as it
> should), and if I use the terminal to input "δжç", Python ought to see
> what I input, not Latin-1 moji-bake.
>

For some purposes, there needs to be a way to treat an arbitrary stream of
bytes as an arbitrary stream of 8-bit characters. iso-latin-1 is a
convenient way to do that. It's not the only way, but settling on it and
being consistent is better than not having a way.
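
The property being relied on can be checked in a few lines. This sketch only demonstrates that latin-1 maps every byte value to the code point with the same number, so a decode followed by an encode gives back the original bytes.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

as_text = all_bytes.decode("latin-1")        # never raises
round_tripped = as_text.encode("latin-1")    # recovers the original bytes
code_points = [ord(ch) for ch in as_text]

print(round_tripped == all_bytes)
print(code_points == list(range(256)))
```

Other 8-bit encodings either leave some byte values undefined or map them to different code points, which is why latin-1 is the usual choice for this trick.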

Tim Delaney



#72391

From: Rustom Mody <rustompmody@gmail.com>
Date: 2014-06-01 19:46 -0700
Message-ID: <7bdd7967-deed-4bbc-b177-0a0202b2ab7f@googlegroups.com>
In reply to: #72390

On Monday, June 2, 2014 7:53:05 AM UTC+5:30, Tim Delaney wrote:
> On 2 June 2014 11:14, Steven D'Aprano <steve+comp....@pearwood.info> wrote:
>>  Latin-1 is one of those legacy encodings which needs to die, not to be
>> entrenched as the default. My terminal uses UTF-8 by default (as it
>> should), and if I use the terminal to input "δжç", Python ought to see
>> what I input, not Latin-1 moji-bake.

> For some purposes, there needs to be a way to treat an arbitrary
> stream of bytes as an arbitrary stream of 8-bit
> characters. iso-latin-1 is a convenient way to do that. It's not the
> only way, but settling on it and being consistent is better than not
> having a way.

Here is a quote from the Oracle docs:

http://docs.oracle.com/cd/E23824_01/html/E26033/glmbx.html#glmar

| The C locale, also known as the POSIX locale, is the POSIX system
| default locale for all POSIX-compliant systems.

In more layman's terms:

| ASCII, also known as the 'Unix locale', is the default for all *nix
| compliant systems

which is a key aspect of what I've called 'The UNIX Assumption':
http://blog.languager.org/2014/04/unicode-and-unix-assumption.html



#72397

From: Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de>
Date: 2014-06-02 07:45 +0000
Message-ID: <mailman.10546.1401695176.18130.python-list@python.org>
In reply to: #72389
Tim Delaney <timothy.c.delaney <at> gmail.com> writes:

> I also should have been more clear that *in the particular situation I was
> talking about* iso-latin-1 as default would be the right thing to do, not in
> the general case. Quite often we won't know the correct encoding until we've
> executed a command via ssh - iso-latin-1 will allow us to extract the info
> we need (which will generally be 7-bit ASCII) without the possibility of an
> invalid encoding. Sure we may get mojibake, but that's better than the
> alternative when we don't yet know the correct encoding.
>
> > Latin-1 is one of those legacy encodings which needs to die, not to be
> > entrenched as the default. My terminal uses UTF-8 by default (as it
> > should), and if I use the terminal to input "δжç", Python ought to see
> > what I input, not Latin-1 moji-bake.
>
> For some purposes, there needs to be a way to treat an arbitrary stream of
> bytes as an arbitrary stream of 8-bit characters. iso-latin-1 is a
> convenient way to do that.
>

For that purpose, Python3 has the bytes() type. Read the data as is, then
decode it to a string once you figured out its encoding.
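
A minimal sketch of that approach, assuming a made-up format in which an ASCII `charset=` line precedes the payload (the function name `decode_when_known` is hypothetical): the data stays as `bytes`, the string-like operations needed to find the encoding work directly on it, and decoding happens exactly once.

```python
def decode_when_known(raw: bytes) -> str:
    """Keep data as bytes until the encoding is known, then decode once."""
    header, _, payload = raw.partition(b"\n")
    if not header.startswith(b"charset="):
        raise ValueError("no charset header found")
    # The encoding name itself is assumed to be plain ASCII.
    encoding = header[len(b"charset="):].decode("ascii")
    return payload.decode(encoding)


sample = b"charset=koi8-r\n" + "привет".encode("koi8-r")
print(decode_when_known(sample))
```

No provisional encoding is involved, so there is no intermediate mojibake to clean up afterwards.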

Wolfgang




#72400

From: Tim Delaney <timothy.c.delaney@gmail.com>
Date: 2014-06-02 19:02 +1000
Message-ID: <mailman.10548.1401699744.18130.python-list@python.org>
In reply to: #72389


On 2 June 2014 17:45, Wolfgang Maier
<wolfgang.maier@biologie.uni-freiburg.de> wrote:

> Tim Delaney <timothy.c.delaney <at> gmail.com> writes:
>
> > For some purposes, there needs to be a way to treat an arbitrary stream
> > of bytes as an arbitrary stream of 8-bit characters. iso-latin-1 is a
> > convenient way to do that.
> >
>
> For that purpose, Python3 has the bytes() type. Read the data as is, then
> decode it to a string once you figured out its encoding.
>

I know that, you know that. Convincing other people of that is the
difficulty.

I probably should have mentioned it, but in my case it's not even Python
(Java). It's exactly the same principle - an assumption was made that has
become entrenched due to the fear of breakage. If they'd been forced to
think about encodings up-front, it shouldn't have been an issue, which was
the point I was trying to make.

In Java, it's much worse. At least with Python you can perform string-like
operations on bytes. In Java you have to convert it to characters before
you can really do anything with it, so people just use the default encoding
all the time - especially if they want the convenience of line-by-line
reading using BufferedReader ...

Tim Delaney



#72402

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-06-02 19:14 +1000
Message-ID: <mailman.10551.1401700886.18130.python-list@python.org>
In reply to: #72389
On Mon, Jun 2, 2014 at 7:02 PM, Tim Delaney <timothy.c.delaney@gmail.com> wrote:
> In Java, it's much worse. At least with Python you can perform string-like
> operations on bytes. In Java you have to convert it to characters before you
> can really do anything with it, so people just use the default encoding all
> the time - especially if they want the convenience of line-by-line reading
> using BufferedReader ...

What exactly is "line-by-line reading" with bytes? As I understand it,
lines are defined by characters. If you mean "reading a stream of
bytes and dividing it on 0x0A", then surely you can do that, but that
assumes an ASCII-compatible encoding.
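
A small sketch of that caveat (illustrative only): splitting on the byte 0x0A gives sensible lines for UTF-8, where that byte can only ever be a line feed, but cuts UTF-16 code units in half.

```python
text = "first\nsecond\n"

# UTF-8 is ASCII-compatible: byte 0x0A only ever means a line feed.
utf8_lines = text.encode("utf-8").split(b"\n")

# UTF-16 is not: each line feed is the two bytes 0x0A 0x00 (little-endian),
# so splitting on the single byte 0x0A leaves a stray 0x00 at the start of
# the next piece and an odd number of bytes in it.
utf16_lines = text.encode("utf-16-le").split(b"\n")

print(utf8_lines)
print(utf16_lines)
```

The second piece of the UTF-16 result can no longer be decoded cleanly, which is why byte-level line splitting only makes sense when the encoding is known to be ASCII-compatible.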

ChrisA



#72405

From: Robin Becker <robin@reportlab.com>
Date: 2014-06-02 12:10 +0100
Message-ID: <mailman.10554.1401707464.18130.python-list@python.org>
In reply to: #72389
............
>
> I probably should have mentioned it, but in my case it's not even Python
> (Java). It's exactly the same principle - an assumption was made that has
> become entrenched due to the fear of breakage. If they'd been forced to
> think about encodings up-front, it shouldn't have been an issue, which was
> the point I was trying to make.
>
there seems to be an implicit assumption in python land that encoded strings are 
the norm. On virtually every computer I encounter that assumption is wrong. The 
vast majority of bytes in most computers is not something that can be easily 
printed out for humans to read. I suppose some clever pythonista can figure out 
an encoding to read my .o / .so etc  files, but they are practically meaningless 
to a unicode program today. Same goes for most image formats and media files. 
Browsers routinely encounter mis/un-encoded pages.

> In Java, it's much worse. At least with Python you can perform string-like
> operations on bytes. In Java you have to convert it to characters before
> you can really do anything with it, so people just use the default encoding
> all the time - especially if they want the convenience of line-by-line
> reading using BufferedReader ...
..


In python I would have preferred for bytes to remain the default io mechanism, 
at least that would allow me to decide if I need any decoding.

As the cat example

http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/

showed these extra assumptions are sometimes really in the way.
-- 
Robin Becker



#72533

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2014-06-03 16:34 +0000
Message-ID: <538df925$0$29978$c3e8da3$5496439d@news.astraweb.com>
In reply to: #72405
On Mon, 02 Jun 2014 12:10:48 +0100, Robin Becker wrote:

> there seems to be an implicit assumption in python land that encoded
> strings are the norm. On virtually every computer I encounter that
> assumption is wrong. The vast majority of bytes in most computers is not
> something that can be easily printed out for humans to read. I suppose
> some clever pythonista can figure out an encoding to read my .o / .so
> etc  files, but they are practically meaningless to a unicode program
> today. Same goes for most image formats and media files. Browsers
> routinely encounter mis/un-encoded pages.

If you include image, video and sound files, you are probably correct 
that most content of files is binary.

Outside of those three kinds of files, I would expect that *by far* the 
single largest kind of file is text. Some text is wrapped in a binary 
layer, e.g. .doc, .odt, etc. but an awful lot of it is good old human 
readable text, including web pages (html) and XML.

Every programming language I know of defaults to opening files in text 
mode rather than binary mode. There may be exceptions, but reading and 
writing text is ubiquitous while writing .o and .so files is not.


> In python I would have preferred for bytes to remain the default io
> mechanism, at least that would allow me to decide if I need any
> decoding.

That implies that you're opening files in binary mode by default. It also 
implies that even something as trivial as writing the string "Hello 
World" to a file (stdout is a file) is impossible until you've learned 
about encodings and know which encoding you need. I really don't think 
that's a good plan, for any language, but especially a language like 
Python which is intended for beginners as well as experts.

The Python 2 approach, where stdout is binary but tries really hard to 
pretend to be a superset of ASCII, is simply broken. It works well for 
trivial examples, while breaking in surprising and hard-to-diagnose ways 
in others. It violates the Zen ("Errors should never pass silently. 
Unless explicitly silenced."): instead of raising an error, it silently 
fails and gives moji-bake:

[steve@ando ~]$ python2.7 -c "import sys; sys.stdout.write(u'ñβж\n')"
Ã±Î²Ð¶

Changing to print doesn't help:

[steve@ando ~]$ python2.7 -c "print u'ñβж'"
Ã±Î²Ð¶


Python 3 works correctly, whether you use print or sys.stdout:

[steve@ando ~]$ python3.3 -c "import sys; sys.stdout.write(u'ñβж\n')"
ñβж

(although I haven't tested it on Windows).
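
For readers unfamiliar with the term, one common way moji-bake arises can be reproduced in Python 3 with a short sketch. Whether this is exactly the mechanism at work in the Python 2.7 transcript above is an assumption; the sketch only shows UTF-8 bytes being misread as Latin-1.

```python
original = "ñβж"

# Encode as UTF-8, then misread the bytes as Latin-1: each of these
# non-ASCII characters turns into two unrelated characters.
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)

# Because Latin-1 is lossless, this particular damage is reversible.
assert garbled.encode("latin-1").decode("utf-8") == original
```

The reversibility holds only while the garbled text is kept intact; once it has been stored, truncated, or re-encoded by something else, recovery is no longer guaranteed.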





-- 
Steven D'Aprano
http://import-that.dreamwidth.org/



#72535

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-06-04 02:43 +1000
Message-ID: <mailman.10634.1401813821.18130.python-list@python.org>
In reply to: #72533
On Wed, Jun 4, 2014 at 2:34 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> Outside of those three kinds of files, I would expect that *by far* the
> single largest kind of file is text. Some text is wrapped in a binary
> layer, e.g. .doc, .odt, etc. but an awful lot of it is good old human
> readable text, including web pages (html) and XML.

In terms of file I/O in Python, text wrapped in a binary layer has to
be treated as binary, not text. There's no difference between a JPEG
file that has some textual EXIF information and an ODT file that's a
whole lot of zipped up text; both of them have to be read as binary,
then unpacked according to the container's specs, and then the text
portion decoded according to an encoding like UTF-8.
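
The three steps (read as binary, unpack the container, decode the text) can be sketched with the standard `zipfile` module. The in-memory archive below merely stands in for an ODT-style container; the member name and XML are illustrative and do not form a valid ODT document.

```python
import io
import zipfile

# Build a tiny in-memory zip standing in for an ODT-style container.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("content.xml", "<p>naïve café</p>".encode("utf-8"))

# Step 1: treat the container as binary data.
container = io.BytesIO(buf.getvalue())
# Step 2: unpack according to the container format.
with zipfile.ZipFile(container) as zf:
    member = zf.read("content.xml")       # still bytes
# Step 3: decode the text portion with the encoding the format specifies.
text = member.decode("utf-8")
print(text)
```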

But you're quite right that a large proportion of files out there
really are text.

ChrisA



#72444

From: Terry Reedy <tjreedy@udel.edu>
Date: 2014-06-02 17:34 -0400
Message-ID: <mailman.10575.1401744891.18130.python-list@python.org>
In reply to: #72389
On 6/2/2014 7:10 AM, Robin Becker wrote:

> there seems to be an implicit assumption in python land that encoded
> strings are the norm.

I don't know why you say that. To have a stream of bytes interpreted as 
characters, open in text mode and give the encoding. Otherwise, open in 
binary mode and apply whatever encoding you want. Image programs like 
Pil or Pillow assume that bytes have image encodings. Same idea.
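
As a minimal illustration of the two modes (the temporary file and its contents are made up for the example), the same file can be read either as decoded text or as raw bytes:

```python
import os
import tempfile

data = "ñβж\n".encode("utf-8")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

try:
    # Text mode: the file object decodes for you, using the encoding given.
    with open(path, "rt", encoding="utf-8") as f:
        as_text = f.read()
    # Binary mode: you get the bytes and decide what they mean yourself.
    with open(path, "rb") as f:
        as_bytes = f.read()
finally:
    os.remove(path)

print(repr(as_text), as_bytes)
```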

 > On virtually every computer I encounter that assumption is wrong.

Except for the std streams (see below), it is also not part of Python.

I will just point out that bytes are given meaning by encoding meaning 
into them. Unicode attempts to reduce the hundreds of text encodings to 
just a few, and mostly to just one for external storage and transmission.

> In python I would have preferred for bytes to remain the default io

Do you really think that defaulting the open mode to 'rb' rather than 
'rt' would be a better choice for newbies?

> mechanism, at least that would allow me to decide if I need any decoding.

Assuming that 'rb' is actually needed more than 'rt' for you in 
particular, is it really such a burden to give a mode more often than not?

> As the cat example
> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
> showed these extra assumptions are sometimes really in the way.

This example is *only* about the *pre-opened* stdxyz streams. Python 
uses these to read characters from the keyboard and print characters to 
the screen in input, print, and the interactive interpreter. So they are 
open in text mode (which wraps binary read and write). The developers, 
knowing that people can and do write batch mode programs that avoid 
input and print, gave a documented way to convert the streams back to 
binary. (See the sys doc.)
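
For a batch-mode filter, a minimal sketch might use the `buffer` attribute of the standard streams, which the sys documentation describes as the underlying binary layer. The function name `cat` and the in-memory demonstration are illustrative; this is not Armin's code.

```python
import io
import shutil
import sys


def cat(src=None, dst=None):
    """Copy bytes from src to dst; default to the binary layer of stdin/stdout."""
    src = sys.stdin.buffer if src is None else src
    dst = sys.stdout.buffer if dst is None else dst
    shutil.copyfileobj(src, dst)
    dst.flush()


# Demonstration with in-memory streams: bytes that are not valid UTF-8
# pass through untouched because nothing is ever decoded.
source = io.BytesIO(b"\xff\xfe raw bytes\n")
sink = io.BytesIO()
cat(source, sink)
```

Calling `cat()` with no arguments in a script would turn it into a simple byte-for-byte filter, without replacing `sys.stdin` or `sys.stdout` themselves.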

The issue Armin ran into is this. He wrote a library module that makes 
sure the streams are binary. Someone else does the same. A program 
imports both modules, in either order. The conversion method referenced 
above raises an exception if one attempts to convert an already converted 
stream. Much of the extra code Armin published detects whether the stream 
is already binary or needs conversion.

The obvious solution is to enhance the conversion method so that one may 
say 'convert if needed, otherwise just pass'.
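
A 'convert if needed' helper could look roughly like the sketch below. The name `as_binary` is hypothetical and this is not an existing standard-library function; it assumes that text wrappers expose a `buffer` attribute while binary streams do not, which holds for `io.TextIOWrapper` and the standard buffered classes.

```python
import io


def as_binary(stream):
    """Return a binary stream for `stream`, whether or not it is already binary."""
    # Text wrappers such as io.TextIOWrapper expose the binary layer as
    # `.buffer`; binary streams have no such attribute and are returned as-is.
    return getattr(stream, "buffer", stream)


text_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
once = as_binary(text_stream)
twice = as_binary(once)    # calling it again is harmless
assert once is twice
```

Because the helper returns a stream instead of rebinding `sys.stdin` or `sys.stdout`, two libraries can both call it without tripping over each other.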

-- 
Terry Jan Reedy



#72468

From: Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date: 2014-06-03 17:16 +1200
Message-ID: <bv540tFca0mU1@mid.individual.net>
In reply to: #72444
Terry Reedy wrote:
> The issue Armin ran into is this. He wrote a library module that makes 
> sure the streams are binary.

Seems to me he made a mistake right there. A library should
*not* be making global changes like that. It can obtain
binary streams from stdin and stdout for its own use, but
it shouldn't stuff them back into sys.stdin and sys.stdout.

If he had trouble because another library did that, then
that library is broken, not Python.

-- 
Greg



#72478

From: Terry Reedy <tjreedy@udel.edu>
Date: 2014-06-03 02:21 -0400
Message-ID: <mailman.10598.1401776514.18130.python-list@python.org>
In reply to: #72468
On 6/3/2014 1:16 AM, Gregory Ewing wrote:
> Terry Reedy wrote:
>> The issue Armin ran into is this. He wrote a library module that makes
>> sure the streams are binary.
>
> Seems to me he made a mistake right there. A library should
> *not* be making global changes like that. It can obtain
> binary streams from stdin and stdout for its own use, but
> it shouldn't stuff them back into sys.stdin and sys.stdout.
>
> If he had trouble because another library did that, then
> that library is broken, not Python.

I agree. The example in Armin's blog rant was an application, an empty 
unix filter (ie, simplified cat clone). For that example, the complex 
code he posted (to show how awful Python 3 is) is unneeded. When I asked 
why he did not directly use the fix in the doc, without the 
scaffolding, he switched to the 'library' module explanation.

The problem is that causal readers like Robin sometimes jump from 'In 
Python 3, it can be hard to do something one really ought not to do' to 
'Binary I/O is hard in Python 3' -- which it is not.

-- 
Terry Jan Reedy



#72521

From: Robin Becker <robin@reportlab.com>
Date: 2014-06-03 15:18 +0100
Message-ID: <mailman.10625.1401805111.18130.python-list@python.org>
In reply to: #72468
........
> The problem is that causal readers like Robin sometimes jump from 'In Python 3,
> it can be hard to do something one really ought not to do' to 'Binary I/O is
> hard in Python 3' -- which it is not.
>
I'm fairly causal, and I did understand that the rant was a bit over the top. 
For fairly practical reasons I have always regarded the std streams as allowing 
binary data, and I have always objected to having to open files in Python with 
a 't' or 'b' mode to cope with line-ending issues.

Isn't it a bit old fashioned to think everything is connected to a console?

I think the idea that we only give meaning to binary data using encodings is a 
bit limiting. A zip or gif file has structure, but I don't think it's reasonable 
to regard such a file as having an encoding in the python unicode sense.
-- 
Robin Becker



#72635

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2014-06-04 13:08 +0000
Message-ID: <538f1a61$0$29978$c3e8da3$5496439d@news.astraweb.com>
In reply to: #72521
On Tue, 03 Jun 2014 15:18:19 +0100, Robin Becker wrote:

> ........
>> The problem is that causal readers like Robin sometimes jump from 'In
>> Python 3, it can be hard to do something one really ought not to do' to
>> 'Binary I/O is hard in Python 3' -- which is is not.
>>
> I'm fairly causal and I did understand that the rant was a bit over the
> top for fairly practical reasons I have always regarded the std streams
> as allowing binary data and always objected to having to open files in
> python with  a 't' or 'b' mode to cope with line ending issues.
> 
> Isn't it a bit old fashioned to think everything is connected to a
> console?

The whole concept of stdin and stdout is based on the idea of having a 
console to read from and write to. Otherwise, what would be the point? 
Classic Mac (pre OS X) had no command line interface, and nothing 
even remotely like stdin and stdout. But once you have a console, stdin, 
stdout, and stderr become useful. And once you have them, then you can 
extend the concept using redirection and pipes. But fundamentally, stdin 
and stdout are about consoles.


> I think the idea that we only give meaning to binary data using
> encodings is a bit limiting. A zip or gif file has structure, but I
> don't think it's reasonable to regard such a file as having an encoding
> in the python unicode sense.

In the Unicode sense? Of course not, that would be silly.

The concept of encodings is bigger than just text, and in that sense zip 
compression is an encoding which encodes non-random data into a different 
format which generally takes up less space.
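
This broader sense of "encoding" is visible in Python itself, as the following sketch shows: `zlib` performs the transformation, and the `codecs` module exposes the same thing as a bytes-to-bytes codec. The sample data is arbitrary.

```python
import codecs
import zlib

data = b"non-random data " * 64

# Compression as an encoding: a reversible transformation of bytes
# that usually yields fewer bytes for non-random input.
packed = zlib.compress(data)
assert zlib.decompress(packed) == data
print(len(data), "->", len(packed))

# The codecs module offers the same transformation under a codec name.
assert codecs.decode(codecs.encode(data, "zlib_codec"), "zlib_codec") == data
```

Unlike a text encoding, this codec maps bytes to bytes, so it never produces a `str`.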



-- 
Steven D'Aprano
http://import-that.dreamwidth.org/



#72665

From: Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date: 2014-06-05 14:01 +1200
Message-ID: <bva1ccFdr03U1@mid.individual.net>
In reply to: #72635
Steven D'Aprano wrote:
> The whole concept of stdin and stdout is based on the idea of having a 
> console to read from and write to.

Not really; stdin and stdout are frequently connected to
files, or pipes to other processes. The console, if it
exists, just happens to be a convenient default value for
them. Even on a system without a console, they're still
a useful abstraction.

But we were talking about encodings, and whether stdin
and stdout should be text or binary by default. Well,
one of the design principles behind unix is to make use
of plain text wherever possible. Not just for stuff
meant to be seen on the screen, but for stuff kept in
files as well.

As a result, most unix programs, most of the time, deal
with text on stdin and stdout. So, it makes sense for
them to be text by default. And wherever there's text,
there needs to be an encoding. This is true whether
a console is involved or not.
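
A small sketch of that last point, assuming a Python 3.5+ interpreter is available as `sys.executable`: a child process writes text to a pipe, with no console in sight, and the parent still has to know the encoding to turn the bytes back into text. The `PYTHONIOENCODING` environment variable is used here to pin the child's stream encoding.

```python
import os
import subprocess
import sys

# Child prints non-ASCII text; escapes keep the command line itself ASCII.
child_code = r"print('\xf1\u03b2\u0436')"

# PYTHONIOENCODING tells the child which encoding to use for its std streams.
env = dict(os.environ, PYTHONIOENCODING="utf-8")
raw = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE, env=env, check=True,
).stdout

# No console is involved, yet the pipe still carries encoded text.
print(raw)
text = raw.decode("utf-8").strip()
```

If the parent guessed a different encoding than the child used, the result would be either an exception or moji-bake, which is the same problem the thread discusses for consoles.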

-- 
Greg
