| Started by | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| First post | 2014-05-31 17:10 +0100 |
| Last post | 2014-06-03 14:22 -0400 |
| Articles | 20 on this page of 92 — 19 participants |
Python 3.2 has some deadly infection Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-31 17:10 +0100
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-05-31 22:55 +0300
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-01 02:26 +0000
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-01 12:43 +1000
Re: Python 3.2 has some deadly infection Tim Delaney <timothy.c.delaney@gmail.com> - 2014-06-02 08:54 +1000
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-02 01:14 +0000
Re: Python 3.2 has some deadly infection Tim Delaney <timothy.c.delaney@gmail.com> - 2014-06-02 12:23 +1000
Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-01 19:46 -0700
Re: Python 3.2 has some deadly infection Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> - 2014-06-02 07:45 +0000
Re: Python 3.2 has some deadly infection Tim Delaney <timothy.c.delaney@gmail.com> - 2014-06-02 19:02 +1000
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-02 19:14 +1000
Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-02 12:10 +0100
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-03 16:34 +0000
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-04 02:43 +1000
Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-02 17:34 -0400
Re: Python 3.2 has some deadly infection Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-06-03 17:16 +1200
Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-03 02:21 -0400
Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-03 15:18 +0100
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-04 13:08 +0000
Re: Python 3.2 has some deadly infection Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-06-05 14:01 +1200
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 10:16 +0300
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-05 17:30 +1000
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 11:05 +0300
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-05 18:36 +1000
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 12:53 +0300
Re: Python 3.2 has some deadly infection wxjmfauth@gmail.com - 2014-06-05 05:43 -0700
Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-05 14:50 -0400
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 23:21 +0300
Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-05 18:09 -0400
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 23:13 +0000
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 02:30 +0300
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 09:39 +1000
Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-05 22:08 -0400
Re: Python 3.2 has some deadly infection Ethan Furman <ethan@stoneleaf.us> - 2014-06-05 20:47 -0700
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve@pearwood.info> - 2014-06-05 08:34 +0000
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 12:41 +0300
Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 06:37 -0700
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 17:45 +0300
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 15:33 +0000
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 02:12 +1000
Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 09:54 -0700
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 03:36 +1000
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 19:52 +0300
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 03:28 +1000
Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 15:35 -0700
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 08:52 +1000
Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 20:11 -0700
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 13:20 +1000
Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 20:32 -0700
Re: Python 3.2 has some deadly infection Akira Li <4kir4.1i@gmail.com> - 2014-06-06 12:03 +0400
Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-05 16:37 +0100
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 16:16 +0000
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 01:50 +1000
Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-05 17:17 +0100
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 16:32 +0000
Re: Python 3.2 has some deadly infection Ethan Furman <ethan@stoneleaf.us> - 2014-06-06 07:40 -0700
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 03:14 +1000
Re: Python 3.2 has some deadly infection Ian Kelly <ian.g.kelly@gmail.com> - 2014-06-05 11:16 -0600
Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-05 14:11 -0400
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 21:30 +0300
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 23:02 +0000
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 02:21 +0300
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-06 12:15 +0000
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 16:00 +0300
Re: Python 3.2 has some deadly infection rurpy@yahoo.com - 2014-06-07 21:34 -0700
Re: Python 3.2 has some deadly infection Ethan Furman <ethan@stoneleaf.us> - 2014-06-06 06:24 -0700
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 17:10 +0300
Re: Python 3.2 has some deadly infection Michael Torrie <torriem@gmail.com> - 2014-06-06 09:02 -0600
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 18:32 +0300
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 01:50 +1000
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 20:02 +0300
Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-06 10:13 -0700
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 03:26 +1000
Re: Python 3.2 has some deadly infection wxjmfauth@gmail.com - 2014-06-06 11:03 -0700
Re: Python 3.2 has some deadly infection Denis McMahon <denismfmcmahon@gmail.com> - 2014-06-06 21:18 +0000
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 08:18 +1000
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-06 15:57 +0000
Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-06 09:21 -0700
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 02:48 +1000
Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-06 10:04 -0700
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 03:12 +1000
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 20:11 +0300
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 03:16 +1000
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 20:18 +0300
Re: Python 3.2 has some deadly infection Ned Batchelder <ned@nedbatchelder.com> - 2014-06-06 13:33 -0400
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 01:25 +1000
Re: Python 3.2 has some deadly infection wxjmfauth@gmail.com - 2014-06-06 08:44 -0700
Re: Python 3.2 has some deadly infection wxjmfauth@gmail.com - 2014-06-06 08:48 -0700
Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-06 12:56 +0100
Re: Python 3.2 has some deadly infection Akira Li <4kir4.1i@gmail.com> - 2014-06-05 06:49 +0400
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-04 00:25 +1000
Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-03 14:22 -0400
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2014-05-31 17:10 +0100 |
| Subject | Python 3.2 has some deadly infection |
| Message-ID | <mailman.10509.1401552642.18130.python-list@python.org> |
Some interesting comments here
http://techtonik.rainforce.org/2014/05/python-32-has-some-deadly-infection.html
so I'm simply asking for other opinions.
--
My fellow Pythonistas, ask not what our language can do for you, ask what
you can do for our language.
Mark Lawrence
| From | Marko Rauhamaa <marko@pacujo.net> |
|---|---|
| Date | 2014-05-31 22:55 +0300 |
| Message-ID | <87wqd1afl0.fsf@elektro.pacujo.net> |
| In reply to | #72340 |
Mark Lawrence <breamoreboy@yahoo.co.uk>:
> Some interesting comments here
> http://techtonik.rainforce.org/2014/05/python-32-has-some-deadly-infection.html
> so I'm simply asking for other opinions.

I read the article, but unfortunately I failed to see interesting
comments or opinions. There was some graphic, but it didn't say anything
to me, and the article didn't really seem to be making any argument
apart from the disappointed tone.

Marko
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2014-06-01 02:26 +0000 |
| Message-ID | <538a8f48$0$29978$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #72340 |
On Sat, 31 May 2014 17:10:20 +0100, Mark Lawrence wrote:
> Some interesting comments here
> http://techtonik.rainforce.org/2014/05/python-32-has-some-deadly-infection.html
> so I'm simply asking for other opinions.
Oh, Anatoly Techtonik. He's quite notorious on python-dev for wanting to
impose his wild and sometimes wacky processes on the entire community.
Specific examples aren't coming to mind, and I'm too lazy to search the
archives, so I'll just make one up to give you an idea of the flavour of
his requests:
"Twitter is the only way that developers can effectively
communicate. We must shut down all the mailing lists and the
bug tracker and move all communication immediately to
Twitter. And by we I mean you."
[Not an actual quote.]
I've come to the conclusion that he occasionally has a point to his
posts, but only at random by virtue of the scatter-gun technique. He's
obviously widely read, but not deeply, and so he fires off a lot of
ill-thought-out but superficially attractive proposals. Just by chance a few
of them end up being interesting, but not *interesting enough* for somebody
else to do the work. At this point the ideas languish, because he refuses
to sign a contributor agreement so the Python core developers cannot
accept anything from him.
This blog post is a strong opinion about Python, but it isn't clear what
that opinion *actually is*. His post is rambling and unfocused and
incoherent ("art is the future"). He rails against having to write PEPs,
and decries the lack of stats, summaries, analysis and comparison,
utterly missing the point that the purpose of the PEP process is to
provide those stats, summaries, analysis and comparison. Reading between
the lines, I think what he means, deep down, is that *somebody else*
ought to gather those stats and do the analysis to support his ideas, and
not expect him to write the PEP.
He makes at least one factually wrong claim:
"I thought that C/C++ must die, because really all major
security problems are because of it."
[actual quote]
He's talking about buffer overflows. Buffer overflows have never been
responsible for "all" major security problems. Even allowing for a little
hyperbole, buffer overflows have not been responsible for the majority of
major security problems for a very long time. It is not 1992 any more,
and today the single largest source of security bugs is code injection
attacks. In Python terms that mostly means failure to sanitize SQL
queries and the use of eval() and exec() on untrusted data.
http://cwe.mitre.org/top25/
Three of the top four software errors are forms of code injection: SQL
injection, OS command injection, cross-site scripting. The classic C
buffer overflow comes in at number 3, so it's not an inconsiderable cause
of security vulnerabilities even today, but it is not even close to the
only such cause.
See also http://www.sans.org/top25-software-errors/
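To make the code-injection point concrete, here is a minimal sketch using the standard library's sqlite3 module. The table, the column names, and the hostile input are all made up for illustration; the only point is the difference between pasting input into the SQL text and passing it as a bound parameter.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

# Attacker-controlled input that closes the quote and adds a condition.
hostile = "nobody' OR '1'='1"

# Unsafe: the input is pasted into the SQL text and becomes part of the query.
unsafe_sql = "SELECT secret FROM users WHERE name = '%s'" % hostile
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: a placeholder keeps the input as data, never as SQL.
safe_rows = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (hostile,)
).fetchall()

print(len(unsafe_rows))  # the injected OR clause matches every row
print(safe_rows)         # nobody is literally named "nobody' OR '1'='1"
conn.close()
```

The same reasoning applies to eval() and exec(): once untrusted text is treated as code rather than data, the language it is written in no longer matters.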
Back to the blog post... it's 2014, Python 3.3 and 3.4 have come out, why
is he talking about 3.2?
It's interesting that he starts off by stating his graph is meaningless:
"They don't measure anything - just show some lines that
correlate to each other."
then immediately tries to draw a conclusion from those lines:
"It looks like the peak of Python was on February 2011,
and since then there was a significant drop."
I've written about the difficulty of measuring language popularity in any
meaningful way:
http://import-that.dreamwidth.org/1388.html
http://import-that.dreamwidth.org/2873.html
Anatoly has picked the TIOBE Index, but I don't know that this is the
best measure of language popularity. According to it, Python is more
popular than Javascript. I love Python, but really, more popular than
Javascript? That feels wrong to me.
In any case, I think that a better explanation for the observed dip in
Feb 2011 is not that Python 3.2 is infected (infected by what?) but
*regression to the mean*. Regression to the mean is a statistical
phenomenon which basically says that all else being equal, an extreme
value is likely to be followed by a less extreme (closer to the average)
value.
Language popularity, as measured by TIOBE, is at least in part random.
(Look at how wiggly the lines are. The wiggles represent random
variation.) If by chance a language gets a spike in interest one month,
it is less likely to repeat that spike the following month.
Because TIOBE's results contain so much random noise, they really ought
to smooth them out by averaging the scores over a three month window, and
show trend lines. They don't, I believe, because random hiccoughs in the
data provide interest: "Last month, Java was overthrown from its #1
ranking by C. This month it has fought its way back to #1 again! Tune in
next month to see if C can repeat its stunning victory!!!"
I think that long term trend lines would be much less exciting but much
more informative. Eyeballing the graph, it seems to me that Java and C++
are trending down, C is probably steady, and Objective C and Python
trending up. If by chance there was a flurry of interest in Python for a
month or two, and then things fell back to normal (regression to the
mean), that might look like a slump.
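The three-month smoothing suggested above can be sketched in a few lines. The scores below are invented numbers, not TIOBE data, and `moving_average` is a hypothetical helper written for this illustration.

```python
def moving_average(values, window=3):
    """Return the trailing moving average of `values` over `window` points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Made-up monthly ratings with a one-month spike in the third position.
scores = [6.0, 6.2, 7.5, 6.1, 6.3, 6.2]
smoothed = moving_average(scores, window=3)

print(smoothed)
# The spread of the smoothed series is much smaller than the raw one,
# so a single unusual month no longer looks like a peak followed by a slump.
print(max(scores) - min(scores), max(smoothed) - min(smoothed))
```

A one-month spike followed by a return to normal is flattened out, which is exactly why smoothed trend lines are less exciting and more informative.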
But I digress... back to Anatoly's post. I think he reveals more about
himself than Python:
"When these little things sum up, you realize that you're
just wasting time trying to improve things that people
don't want to improve. They don't want to improve the process.
They don't realize that the problem is not in the language,
but in the way they don't want to hear each other. Technology
showed that people want to be heard, that they opinion should
be accounted, not closed as won't fix, or works for
me. It is not a community process, when you rely on abilities
of certain individuals to monitor and respond to all traffic
and wishes, especially when they fail to do so."
On Python-Dev, this is Anatoly's repeated claim: the process is broken,
because well it just is okay. In my opinion, "the process is broken" is
Anatoly's shorthand for "I want to do things THIS way, and you won't let
me. My way is SO OBVIOUSLY BRILLIANT that everybody, no matter their
circumstances, will be immeasurably better off by switching to my process
instead of the old way of doing things. Anyone who thinks differently is
simply not paying attention. Didn't you hear how brilliant my process is?"
Anatoly does make a few concrete complaints about Python 3, or at least
as concrete as he gets in this post:
"I expected Python 3 to be ready for the internet age" -- What does that
mean? What makes him think it isn't?
"with cross-platform behavior preferred over system-dependent one" --
It's not clear how cross-platform behaviour has anything to do with the
Internet age. Python has preferred cross-platform behaviour forever,
except for those features and modules which are explicitly intended to be
interfaces to system-dependent features. (E.g. a lot of functions in the
os module are thin wrappers around OS features. Hence the name of the
module.)
"with clear semantics to work with binary data" -- There are clear
semantics to work with binary data: use bytes, and the struct module.
Those features can be improved, and indeed Python 3.4 has improved them,
and 3.5 is in the process of improving them further. But to suggest that
Python doesn't have those clear semantics is simply false.
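As a small illustration of "use bytes, and the struct module", here is a sketch that packs and unpacks a binary record. The record layout (a 16-bit id, a 32-bit length and a 4-byte tag, all big-endian) is an assumption invented for this example.

```python
import struct

# Hypothetical layout: big-endian unsigned short, unsigned int, 4 raw bytes.
RECORD = struct.Struct(">HI4s")

packed = RECORD.pack(513, 1024, b"DATA")
record_id, length, tag = RECORD.unpack(packed)

print(RECORD.size)            # number of bytes in one record
print(packed)                 # a bytes object, not text
print(record_id, length, tag)
print(packed[0])              # indexing bytes gives an int in Python 3
```

Nothing here involves an encoding: the data stays bytes from start to finish, and text only enters the picture if you explicitly decode a field.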
"with non-confusing error handling" -- How is Python 3's error handling
confusing? It's the same error handling as Python 2. Where is the
confusion?
TL;DR: Anatoly's blog post is long on disappointment and short on actual
content. It feels to me that we could summarise his post as:
I don't know what I want, I won't recognise it even if I saw
it, but Python 3 isn't it. I blame others for not living up
to my expectations for features I cannot describe and were
never promised.
--
Steven
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-06-01 12:43 +1000 |
| Message-ID | <mailman.10516.1401590627.18130.python-list@python.org> |
| In reply to | #72354 |
On Sun, Jun 1, 2014 at 12:26 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> TL;DR: Anatoly's blog post is long on disappointment and short on actual
> content. It feels to me that we could summarise his post as:
>
>     I don't know what I want, I won't recognise it even if I saw
>     it, but Python 3 isn't it. I blame others for not living up
>     to my expectations for features I cannot describe and were
>     never promised.

I think that summary is accurate. When Mark posted this last night
(okay, it was last night for me, probably not for most of you), I tried
to read the post and figure out what he was actually saying... and
failed. Gave up on it and moved on. Got better things to do with my
life... like, I dunno, actually writing code, which seems to be
something that people who whine in blog posts don't do.

ChrisA
| From | Tim Delaney <timothy.c.delaney@gmail.com> |
|---|---|
| Date | 2014-06-02 08:54 +1000 |
| Message-ID | <mailman.10531.1401663275.18130.python-list@python.org> |
| In reply to | #72354 |
On 1 June 2014 12:26, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> "with cross-platform behavior preferred over system-dependent one" --
> It's not clear how cross-platform behaviour has anything to do with the
> Internet age. Python has preferred cross-platform behaviour forever,
> except for those features and modules which are explicitly intended to be
> interfaces to system-dependent features. (E.g. a lot of functions in the
> os module are thin wrappers around OS features. Hence the name of the
> module.)

There is the behaviour of defaulting input and output to the system
encoding. I personally think we would all be better off if Python (and
Java, and many other languages) defaulted to UTF-8. This hopefully would
eventually have the effect of producers changing to output UTF-8 by
default, and consumers learning to manually specify an encoding when
it's not UTF-8 (due to invalid codepoints).

I'm currently working on a product that interacts with lots of other
products. These other products can be using any encoding - but most of
the functions that interact with I/O assume the system default encoding
of the machine that is collecting the data. The product has been in
production for nearly a decade, so there's a lot of pushback against
changes deep in the code for fear that it will break working systems.
The fact that they are working largely by accident appears to escape
them ...

FWIW, changing to use iso-latin-1 by default would be the most sensible
option (effectively treating everything as bytes), with the option for
another encoding if/when more information is known (e.g. there's often a
call to return the encoding, and the output of that call is guaranteed
to be ASCII).

Tim Delaney
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2014-06-02 01:14 +0000 |
| Message-ID | <538bcfff$0$29978$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #72386 |
On Mon, 02 Jun 2014 08:54:33 +1000, Tim Delaney wrote:
> On 1 June 2014 12:26, Steven D'Aprano
> <steve+comp.lang.python@pearwood.info> wrote:
>
>
>> "with cross-platform behavior preferred over system-dependent one" --
>> It's not clear how cross-platform behaviour has anything to do with the
>> Internet age. Python has preferred cross-platform behaviour forever,
>> except for those features and modules which are explicitly intended to
>> be interfaces to system-dependent features. (E.g. a lot of functions in
>> the os module are thin wrappers around OS features. Hence the name of
>> the module.)
>>
>>
> There is the behaviour of defaulting input and output to the system
> encoding.
That's a tricky one, but I think on balance that is a case where
defaulting to the system encoding is the right thing to do. Input and output
occur on the local system you are running on, which by definition isn't
cross-platform. (Non-local I/O is possible, but requires work -- it
doesn't just happen.)
> I personally think we would all be better off if Python (and
> Java, and many other languages) defaulted to UTF-8. This hopefully would
> eventually have the effect of producers changing to output UTF-8 by
> default, and consumers learning to manually specify an encoding when
> it's not UTF-8 (due to invalid codepoints).
UTF-8 everywhere should be our ultimate aim. Then we can forget about
legacy encodings except when digging out ancient documents from archived
floppy disks :-)
> I'm currently working on a product that interacts with lots of other
> products. These other products can be using any encoding - but most of
> the functions that interact with I/O assume the system default encoding
> of the machine that is collecting the data. The product has been in
> production for nearly a decade, so there's a lot of pushback against
> changes deep in the code for fear that it will break working systems.
> The fact that they are working largely by accident appears to escape
> them ...
>
> FWIW, changing to use iso-latin-1 by default would be the most sensible
> option (effectively treating everything as bytes), with the option for
> another encoding if/when more information is known (e.g. there's often a
> call to return the encoding, and the output of that call is guaranteed
> to be ASCII).
Python 2 does what you suggest, and it is *broken*. Python 2.7 creates
moji-bake, while Python 3 gets it right:
[steve@ando ~]$ python2.7 -c "print u'δжç'"
Î´Ð¶Ã§
[steve@ando ~]$ python3.3 -c "print(u'δжç')"
δжç
Latin-1 is one of those legacy encodings which needs to die, not to be
entrenched as the default. My terminal uses UTF-8 by default (as it
should), and if I use the terminal to input "δжç", Python ought to see
what I input, not Latin-1 moji-bake.
If I were to use Windows with a legacy code page, then I couldn't even
enter "δжç" on the command line since none of the legacy encodings
support that set of characters at the same time. I don't know exactly
what I would get if I tried (say, by copying and pasting text from a
Unicode-aware application), but I'd see that it was weird *in the shell*
before it even reaches Python.
On the other hand, if I were to input something supported by the legacy
encoding, let's say I entered "αβγ" while using ISO-8859-7 (Greek), then
Python ought to see "αβγ" and not moji-bake:
py> b = "αβγ".encode('iso-8859-7') # what the shell generates
py> b.decode('latin-1') # what Python interprets those bytes as
'áâã'
Defaulting to the system encoding means that Python input and output just
works, to the degree that input and output on your system just works. If
your system is crippled by the use of a legacy encoding, then Python will
at least be *no worse* than your system.
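For readers who want to see what "the system encoding" is on their own machine, and how to override it when the data is known to use something else, here is a minimal sketch. The file name is arbitrary, and the reported default will differ from system to system, so no particular value is assumed.

```python
import locale
import os
import sys
import tempfile

# What this interpreter would use when no encoding is given explicitly.
default_encoding = locale.getpreferredencoding(False)
print("preferred encoding:", default_encoding)
print("stdout encoding:", sys.stdout.encoding)

text = "αβγ"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "greek.txt")  # arbitrary file name

    # Being explicit makes the result independent of the system default.
    with open(path, "w", encoding="iso-8859-7") as f:
        f.write(text)

    # The bytes on disk are the ISO-8859-7 bytes, one per character.
    with open(path, "rb") as f:
        raw = f.read()

    # Reading back with the same explicit encoding recovers the text.
    with open(path, encoding="iso-8859-7") as f:
        round_tripped = f.read()

print(raw, round_tripped)
```

Relying on the default is reasonable for data that really is local; an explicit `encoding=` argument is the escape hatch for everything else.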
--
Steven D'Aprano
http://import-that.dreamwidth.org/
| From | Tim Delaney <timothy.c.delaney@gmail.com> |
|---|---|
| Date | 2014-06-02 12:23 +1000 |
| Message-ID | <mailman.10534.1401676125.18130.python-list@python.org> |
| In reply to | #72389 |
On 2 June 2014 11:14, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> On Mon, 02 Jun 2014 08:54:33 +1000, Tim Delaney wrote:
> > I'm currently working on a product that interacts with lots of other
> > products. These other products can be using any encoding - but most of
> > the functions that interact with I/O assume the system default encoding
> > of the machine that is collecting the data. The product has been in
> > production for nearly a decade, so there's a lot of pushback against
> > changes deep in the code for fear that it will break working systems.
> > The fact that they are working largely by accident appears to escape
> > them ...
> >
> > FWIW, changing to use iso-latin-1 by default would be the most sensible
> > option (effectively treating everything as bytes), with the option for
> > another encoding if/when more information is known (e.g. there's often a
> > call to return the encoding, and the output of that call is guaranteed
> > to be ASCII).
>
> Python 2 does what you suggest, and it is *broken*. Python 2.7 creates
> moji-bake, while Python 3 gets it right:

The purpose of my example was to show a case where no thought was put
into encodings - the assumption was that the system encoding and the
remote system encoding would be the same. This is most definitely not
the case a lot of the time.

I also should have been more clear that *in the particular situation I
was talking about* iso-latin-1 as default would be the right thing to
do, not in the general case. Quite often we won't know the correct
encoding until we've executed a command via ssh - iso-latin-1 will allow
us to extract the info we need (which will generally be 7-bit ASCII)
without the possibility of an invalid encoding. Sure we may get
mojibake, but that's better than the alternative when we don't yet know
the correct encoding.

> Latin-1 is one of those legacy encodings which needs to die, not to be
> entrenched as the default. My terminal uses UTF-8 by default (as it
> should), and if I use the terminal to input "δжç", Python ought to see
> what I input, not Latin-1 moji-bake.

For some purposes, there needs to be a way to treat an arbitrary stream
of bytes as an arbitrary stream of 8-bit characters. iso-latin-1 is a
convenient way to do that. It's not the only way, but settling on it and
being consistent is better than not having a way.

Tim Delaney
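The property Tim is relying on can be checked directly: Latin-1 maps each byte value to the code point with the same number, so decoding never fails and re-encoding returns the original bytes. The sketch below shows that, plus the trade-off he mentions (ASCII fields survive, anything else may turn into mojibake). The `payload` format is a made-up example, not taken from his product.

```python
every_byte = bytes(range(256))

# Latin-1 maps byte N to code point N, so decoding cannot fail...
as_text = every_byte.decode("latin-1")
# ...and encoding gives back exactly the original bytes.
round_trip_ok = as_text.encode("latin-1") == every_byte

# UTF-8, by contrast, rejects byte sequences that are not valid UTF-8.
try:
    every_byte.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Hypothetical payload: an ASCII field followed by UTF-8 encoded text.
payload = b"status=OK name=" + "αβγ".encode("utf-8")
view = payload.decode("latin-1")
status = view.split(" ")[0]  # the ASCII part is readable as-is

print(round_trip_ok, utf8_ok)
print(status)
print(view)  # the non-ASCII tail is mojibake, but no information is lost
```

Because the round trip is lossless, the original bytes can be recovered later with `view.encode("latin-1")` and decoded properly once the real encoding is known.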
| From | Rustom Mody <rustompmody@gmail.com> |
|---|---|
| Date | 2014-06-01 19:46 -0700 |
| Message-ID | <7bdd7967-deed-4bbc-b177-0a0202b2ab7f@googlegroups.com> |
| In reply to | #72390 |
On Monday, June 2, 2014 7:53:05 AM UTC+5:30, Tim Delaney wrote:
> On 2 June 2014 11:14, Steven D'Aprano <steve+comp....@pearwood.info> wrote:
>> Latin-1 is one of those legacy encodings which needs to die, not to be
>> entrenched as the default. My terminal uses UTF-8 by default (as it
>> should), and if I use the terminal to input "δжç", Python ought to see
>> what I input, not Latin-1 moji-bake.
> For some purposes, there needs to be a way to treat an arbitrary
> stream of bytes as an arbitrary stream of 8-bit
> characters. iso-latin-1 is a convenient way to do that. It's not the
> only way, but settling on it and being consistent is better than not
> having a way.

Here is a quote from the Oracle docs:
http://docs.oracle.com/cd/E23824_01/html/E26033/glmbx.html#glmar

| The C locale, also known as the POSIX locale, is the POSIX system
| default locale for all POSIX-compliant systems.

In more layman language

| ASCII also known as the 'Unix locale' is the default for all *nix
| compliant systems

which is a key aspect of what I've called 'The UNIX Assumption':
http://blog.languager.org/2014/04/unicode-and-unix-assumption.html
| From | Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> |
|---|---|
| Date | 2014-06-02 07:45 +0000 |
| Message-ID | <mailman.10546.1401695176.18130.python-list@python.org> |
| In reply to | #72389 |
Tim Delaney <timothy.c.delaney <at> gmail.com> writes:
> I also should have been more clear that *in the particular situation I
> was talking about* iso-latin-1 as default would be the right thing to
> do, not in the general case. Quite often we won't know the correct
> encoding until we've executed a command via ssh - iso-latin-1 will allow
> us to extract the info we need (which will generally be 7-bit ASCII)
> without the possibility of an invalid encoding. Sure we may get
> mojibake, but that's better than the alternative when we don't yet know
> the correct encoding.
>
> > Latin-1 is one of those legacy encodings which needs to die, not to be
> > entrenched as the default. My terminal uses UTF-8 by default (as it
> > should), and if I use the terminal to input "δжç", Python ought to see
> > what I input, not Latin-1 moji-bake.
>
> For some purposes, there needs to be a way to treat an arbitrary stream
> of bytes as an arbitrary stream of 8-bit characters. iso-latin-1 is a
> convenient way to do that.

For that purpose, Python3 has the bytes() type. Read the data as is, then
decode it to a string once you have figured out its encoding.

Wolfgang
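Wolfgang's suggestion can be sketched as follows. The wire format is an assumption made up for this example: the first line is an ASCII encoding name and the rest is the body in that encoding. `decode_with_declared_encoding` is a hypothetical helper, not part of any library.

```python
def decode_with_declared_encoding(raw: bytes) -> str:
    """Split off an ASCII header naming the encoding, then decode the body."""
    header, _, body = raw.partition(b"\n")
    # The header is guaranteed ASCII in this made-up format, so this is safe.
    encoding = header.decode("ascii").strip()
    # Only now, with the encoding known, is the body turned into text.
    return body.decode(encoding)


# Simulated input: Greek text sent by a system that uses ISO-8859-7.
raw = b"iso-8859-7\n" + "αβγ".encode("iso-8859-7")
decoded = decode_with_declared_encoding(raw)

print(decoded)
```

The data stays as bytes until the encoding is known, so there is no intermediate mojibake step to undo.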
| From | Tim Delaney <timothy.c.delaney@gmail.com> |
|---|---|
| Date | 2014-06-02 19:02 +1000 |
| Message-ID | <mailman.10548.1401699744.18130.python-list@python.org> |
| In reply to | #72389 |
On 2 June 2014 17:45, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:

> Tim Delaney <timothy.c.delaney <at> gmail.com> writes:
>> For some purposes, there needs to be a way to treat an arbitrary stream of
>> bytes as an arbitrary stream of 8-bit characters. iso-latin-1 is a
>> convenient way to do that.
>
> For that purpose, Python3 has the bytes() type. Read the data as is, then
> decode it to a string once you figured out its encoding.

I know that, you know that. Convincing other people of that is the difficulty.

I probably should have mentioned it, but in my case it's not even Python (Java). It's exactly the same principle - an assumption was made that has become entrenched due to the fear of breakage. If they'd been forced to think about encodings up-front, it shouldn't have been an issue, which was the point I was trying to make.

In Java, it's much worse. At least with Python you can perform string-like operations on bytes. In Java you have to convert it to characters before you can really do anything with it, so people just use the default encoding all the time - especially if they want the convenience of line-by-line reading using BufferedReader ...

Tim Delaney
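To illustrate the "string-like operations on bytes" point, here is a hedged sketch. The sample data imitates the output of a remote `locale` command, and `charset_from_locale_output` is a hypothetical helper, not something from the thread. It pulls the 7-bit ASCII information out of undecoded bytes and only then decodes the small part it needs.

```python
# Hypothetical output of a remote `locale` command, captured as raw bytes.
output = b'LANG=en_AU.UTF-8\nLC_CTYPE="en_AU.UTF-8"\n'


def charset_from_locale_output(data: bytes, default: str = "ascii") -> str:
    """Find the charset suffix of the LANG line using only bytes operations."""
    for line in data.splitlines():
        if line.startswith(b"LANG=") and b"." in line:
            # The charset name itself is plain ASCII, so decoding it is safe.
            return line.rpartition(b".")[2].decode("ascii")
    return default


print(charset_from_locale_output(output))  # UTF-8
```

This simplified parser ignores locale modifiers such as `@euro`; it is only meant to show that `splitlines`, `startswith` and `rpartition` work on bytes without any decoding step.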
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-06-02 19:14 +1000 |
| Message-ID | <mailman.10551.1401700886.18130.python-list@python.org> |
| In reply to | #72389 |
On Mon, Jun 2, 2014 at 7:02 PM, Tim Delaney <timothy.c.delaney@gmail.com> wrote:
> In Java, it's much worse. At least with Python you can perform string-like
> operations on bytes. In Java you have to convert it to characters before you
> can really do anything with it, so people just use the default encoding all
> the time - especially if they want the convenience of line-by-line reading
> using BufferedReader ...

What exactly is "line-by-line reading" with bytes? As I understand it, lines are defined by characters. If you mean "reading a stream of bytes and dividing it on 0x0A", then surely you can do that, but that assumes an ASCII-compatible encoding.

ChrisA
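The ASCII-compatibility caveat can be shown with a short sketch. The sample text is arbitrary; UTF-16-LE is picked here simply as one well-known encoding that is not ASCII-compatible.

```python
text = "one\ntwo"

# ASCII-compatible encodings keep 0x0A meaning "newline", so a byte split works.
utf8_lines = text.encode("utf-8").split(b"\n")
assert [chunk.decode("utf-8") for chunk in utf8_lines] == ["one", "two"]

# UTF-16 uses two bytes per code unit. Splitting on the single byte 0x0A cuts
# the newline's code unit in half, so the second piece starts with a stray
# 0x00 and no longer matches the encoded form of "two".
utf16_chunks = text.encode("utf-16-le").split(b"\n")
assert utf16_chunks == [b"o\x00n\x00e\x00", b"\x00t\x00w\x00o\x00"]
assert utf16_chunks[1] != "two".encode("utf-16-le")

# The byte 0x0A can also occur inside a character that is not a newline at
# all, for example U+010A encoded as UTF-16-LE.
assert "\u010a".encode("utf-16-le") == b"\n\x01"
```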
| From | Robin Becker <robin@reportlab.com> |
|---|---|
| Date | 2014-06-02 12:10 +0100 |
| Message-ID | <mailman.10554.1401707464.18130.python-list@python.org> |
| In reply to | #72389 |
............
>
> I probably should have mentioned it, but in my case it's not even Python
> (Java). It's exactly the same principle - an assumption was made that has
> become entrenched due to the fear of breakage. If they'd been forced to
> think about encodings up-front, it shouldn't have been an issue, which was
> the point I was trying to make.
>

there seems to be an implicit assumption in python land that encoded strings are the norm. On virtually every computer I encounter that assumption is wrong. The vast majority of bytes in most computers is not something that can be easily printed out for humans to read. I suppose some clever pythonista can figure out an encoding to read my .o / .so etc files, but they are practically meaningless to a unicode program today. Same goes for most image formats and media files. Browsers routinely encounter mis/un-encoded pages.

> In Java, it's much worse. At least with Python you can perform string-like
> operations on bytes. In Java you have to convert it to characters before
> you can really do anything with it, so people just use the default encoding
> all the time - especially if they want the convenience of line-by-line
> reading using BufferedReader ...

..
In python I would have preferred for bytes to remain the default io mechanism, at least that would allow me to decide if I need any decoding.

As the cat example

http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/

showed these extra assumptions are sometimes really in the way.

-- 
Robin Becker
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2014-06-03 16:34 +0000 |
| Message-ID | <538df925$0$29978$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #72405 |
On Mon, 02 Jun 2014 12:10:48 +0100, Robin Becker wrote:

> there seems to be an implicit assumption in python land that encoded
> strings are the norm. On virtually every computer I encounter that
> assumption is wrong. The vast majority of bytes in most computers is not
> something that can be easily printed out for humans to read. I suppose
> some clever pythonista can figure out an encoding to read my .o / .so
> etc files, but they are practically meaningless to a unicode program
> today. Same goes for most image formats and media files. Browsers
> routinely encounter mis/un-encoded pages.

If you include image, video and sound files, you are probably correct that most content of files is binary.

Outside of those three kinds of files, I would expect that *by far* the single largest kind of file is text. Some text is wrapped in a binary layer, e.g. .doc, .odt, etc. but an awful lot of it is good old human readable text, including web pages (html) and XML.

Every programming language I know of defaults to opening files in text mode rather than binary mode. There may be exceptions, but reading and writing text is ubiquitous while writing .o and .so files is not.

> In python I would have preferred for bytes to remain the default io
> mechanism, at least that would allow me to decide if I need any
> decoding.

That implies that you're opening files in binary mode by default. It also implies that even something as trivial as writing the string "Hello World" to a file (stdout is a file) is impossible until you've learned about encodings and know which encoding you need. I really don't think that's a good plan, for any language, but especially a language like Python which is intended for beginners as well as experts.

The Python 2 approach, where stdout is binary but tries really hard to pretend to be a superset of ASCII, is simply broken. It works well for trivial examples, while breaking in surprising and hard-to-diagnose ways in others. It violates the Zen (errors should not be ignored unless explicitly silenced), instead silently failing and giving moji-bake:

    [steve@ando ~]$ python2.7 -c "import sys; sys.stdout.write(u'ñβж\n')"
    ñβж

Changing to print doesn't help:

    [steve@ando ~]$ python2.7 -c "print u'ñβж'"
    ñβж

Python 3 works correctly, whether you use print or sys.stdout:

    [steve@ando ~]$ python3.3 -c "import sys; sys.stdout.write(u'ñβж\n')"
    ñβж

(although I haven't tested it on Windows).

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/
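The trade-off Steven describes can be sketched with an in-memory buffer standing in for a binary stdout. The names `binary_out` and `text_out` are made up for this example, and UTF-8 is an assumed choice of encoding. With a bytes-only stream the caller must encode every string; a text layer records the encoding once and applies it on each write.

```python
import io

# A stand-in for a binary stdout: an in-memory byte buffer.
binary_out = io.BytesIO()

# Binary streams accept only bytes, so the caller has to pick an encoding
# before even "Hello World" can be written.
binary_out.write("Hello World\n".encode("utf-8"))

# A text layer is told the encoding once and then encodes on every write.
text_out = io.TextIOWrapper(binary_out, encoding="utf-8", newline="\n")
text_out.write("ñβж\n")
text_out.flush()  # push the encoded text down into the byte buffer

result = binary_out.getvalue()
assert result == "Hello World\nñβж\n".encode("utf-8")
```

In Python 3 the real `sys.stdout` is normally such a text wrapper, with the underlying bytes stream available as `sys.stdout.buffer`.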
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-06-04 02:43 +1000 |
| Message-ID | <mailman.10634.1401813821.18130.python-list@python.org> |
| In reply to | #72533 |
On Wed, Jun 4, 2014 at 2:34 AM, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> Outside of those three kinds of files, I would expect that *by far* the
> single largest kind of file is text. Some text is wrapped in a binary
> layer, e.g. .doc, .odt, etc. but an awful lot of it is good old human
> readable text, including web pages (html) and XML.

In terms of file I/O in Python, text wrapped in a binary layer has to be treated as binary, not text. There's no difference between a JPEG file that has some textual EXIF information and an ODT file that's a whole lot of zipped up text; both of them have to be read as binary, then unpacked according to the container's specs, and then the text portion decoded according to an encoding like UTF-8.

But you're quite right that a large proportion of files out there really are text.

ChrisA
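The three steps (read as binary, unpack the container, decode the text) might look roughly like this. The archive is built in memory purely for illustration; it borrows the member name `content.xml` but is not a valid ODT document, and UTF-8 is assumed for the text portion.

```python
import io
import zipfile

# Build a tiny stand-in container in memory (not a real ODT file).
container = io.BytesIO()
with zipfile.ZipFile(container, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("content.xml", "<p>ñβж</p>".encode("utf-8"))

# Step 1: the container as a whole is binary data.
raw_container = container.getvalue()

# Step 2: unpack it according to the container format (zip here).
with zipfile.ZipFile(io.BytesIO(raw_container)) as archive:
    member_bytes = archive.read("content.xml")

# Step 3: only now decode the text portion with the appropriate encoding.
document_text = member_bytes.decode("utf-8")
assert document_text == "<p>ñβж</p>"
```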
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2014-06-02 17:34 -0400 |
| Message-ID | <mailman.10575.1401744891.18130.python-list@python.org> |
| In reply to | #72389 |
On 6/2/2014 7:10 AM, Robin Becker wrote:

> there seems to be an implicit assumption in python land that encoded
> strings are the norm.

I don't know why you say that. To have a stream of bytes interpreted as characters, open in text mode and give the encoding. Otherwise, open in binary mode and apply whatever encoding you want. Image programs like PIL or Pillow assume that bytes have image encodings. Same idea.

> On virtually every computer I encounter that assumption is wrong.

Except for the std streams (see below), it is also not part of Python. I will just point out that bytes are given meaning by encoding meaning into them. Unicode attempts to reduce the hundreds of text encodings to just a few, and mostly to just one for external storage and transmission.

> In python I would have preferred for bytes to remain the default io

Do you really think that defaulting the open mode to 'rb' rather than 'rt' would be a better choice for newbies?

> mechanism, at least that would allow me to decide if I need any decoding.

Assuming that 'rb' is actually needed more than 'rt' for you in particular, is it really such a burden to give a mode more often than not?

> As the cat example
> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
> showed these extra assumptions are sometimes really in the way.

This example is *only* about the *pre-opened* stdxyz streams. Python uses these to read characters from the keyboard and print characters to the screen in input, print, and the interactive interpreter. So they are open in text mode (which wraps binary read and write). The developers, knowing that people can and do write batch mode programs that avoid input and print, gave a documented way to convert the streams back to binary. (See the sys doc.)

The issue Armin ran into is this. He wrote a library module that makes sure the streams are binary. Someone else does the same. A program imports both modules, in either order. The conversion method referenced above raises an exception if one attempts to convert an already converted stream. Much of the extra code Armin published detects whether the stream is already binary or needs conversion.

The obvious solution is to enhance the conversion method so that one may say 'convert if needed, otherwise just pass'.

-- 
Terry Jan Reedy
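A 'convert if needed, otherwise just pass' step could be sketched along these lines. `as_binary` is a hypothetical helper written for this example; it is not the conversion method from the sys documentation mentioned above, and it relies on the `.buffer` attribute that text wrappers such as `sys.stdout` normally provide.

```python
import io


def as_binary(stream):
    """Return a binary view of `stream`, whether it is text or already binary."""
    if isinstance(stream, io.TextIOBase):
        # Text wrappers such as sys.stdout normally expose the underlying
        # binary stream as `.buffer`; this is not guaranteed for every text
        # stream (io.StringIO, for instance, has no such attribute).
        return stream.buffer
    # Already binary: nothing to convert, so applying this twice is harmless.
    return stream
```

Because the helper returns a stream rather than modifying one, two modules can both call `as_binary(sys.stdout)` in any order without tripping over each other. Any text already written through the wrapper should be flushed before bytes are written to the binary stream, otherwise output may appear out of order.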
| From | Gregory Ewing <greg.ewing@canterbury.ac.nz> |
|---|---|
| Date | 2014-06-03 17:16 +1200 |
| Message-ID | <bv540tFca0mU1@mid.individual.net> |
| In reply to | #72444 |
Terry Reedy wrote:
> The issue Armin ran into is this. He wrote a library module that makes
> sure the streams are binary.

Seems to me he made a mistake right there. A library should *not* be making global changes like that. It can obtain binary streams from stdin and stdout for its own use, but it shouldn't stuff them back into sys.stdin and sys.stdout.

If he had trouble because another library did that, then that library is broken, not Python.

-- 
Greg
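As a rough sketch of what Greg suggests, a byte-copying filter can look up the binary streams for its own use and leave `sys.stdin` and `sys.stdout` untouched. The function name `cat` and its optional parameters are choices made for this example, not code from the blog post under discussion.

```python
import shutil
import sys


def cat(src=None, dst=None):
    """Copy bytes from src to dst without ever decoding them."""
    # Obtain the binary streams locally; sys.stdin and sys.stdout themselves
    # remain text streams for every other module in the program.
    src = sys.stdin.buffer if src is None else src
    dst = sys.stdout.buffer if dst is None else dst
    shutil.copyfileobj(src, dst)
    dst.flush()
```

Calling `cat()` with no arguments copies standard input to standard output byte for byte; passing explicit binary file objects makes the function easy to exercise without touching the real streams.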
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2014-06-03 02:21 -0400 |
| Message-ID | <mailman.10598.1401776514.18130.python-list@python.org> |
| In reply to | #72468 |
On 6/3/2014 1:16 AM, Gregory Ewing wrote:
> Terry Reedy wrote:
>> The issue Armin ran into is this. He wrote a library module that makes
>> sure the streams are binary.
>
> Seems to me he made a mistake right there. A library should
> *not* be making global changes like that. It can obtain
> binary streams from stdin and stdout for its own use, but
> it shouldn't stuff them back into sys.stdin and sys.stdout.
>
> If he had trouble because another library did that, then
> that library is broken, not Python.

I agree. The example in Armin's blog rant was an application, an empty unix filter (ie, simplified cat clone). For that example the complex code he posted to show how awful Python 3 is is unneeded. When I asked why he did not directly use the fix in the doc, without the scaffolding, he switched to the 'library' module explanation.

The problem is that causal readers like Robin sometimes jump from 'In Python 3, it can be hard to do something one really ought not to do' to 'Binary I/O is hard in Python 3' -- which it is not.

-- 
Terry Jan Reedy
| From | Robin Becker <robin@reportlab.com> |
|---|---|
| Date | 2014-06-03 15:18 +0100 |
| Message-ID | <mailman.10625.1401805111.18130.python-list@python.org> |
| In reply to | #72468 |
........
> The problem is that causal readers like Robin sometimes jump from 'In Python 3,
> it can be hard to do something one really ought not to do' to 'Binary I/O is
> hard in Python 3' -- which it is not.
>

I'm fairly causal and I did understand that the rant was a bit over the top. For fairly practical reasons I have always regarded the std streams as allowing binary data and always objected to having to open files in python with a 't' or 'b' mode to cope with line ending issues.

Isn't it a bit old fashioned to think everything is connected to a console?

I think the idea that we only give meaning to binary data using encodings is a bit limiting. A zip or gif file has structure, but I don't think it's reasonable to regard such a file as having an encoding in the python unicode sense.

-- 
Robin Becker
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2014-06-04 13:08 +0000 |
| Message-ID | <538f1a61$0$29978$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #72521 |
On Tue, 03 Jun 2014 15:18:19 +0100, Robin Becker wrote:

> ........
>> The problem is that causal readers like Robin sometimes jump from 'In
>> Python 3, it can be hard to do something one really ought not to do' to
>> 'Binary I/O is hard in Python 3' -- which it is not.
>>
> I'm fairly causal and I did understand that the rant was a bit over the
> top. For fairly practical reasons I have always regarded the std streams
> as allowing binary data and always objected to having to open files in
> python with a 't' or 'b' mode to cope with line ending issues.
>
> Isn't it a bit old fashioned to think everything is connected to a
> console?

The whole concept of stdin and stdout is based on the idea of having a console to read from and write to. Otherwise, what would be the point? Classic Mac (pre OS X) had no command line interface, and nothing even remotely like stdin and stdout. But once you have a console, stdin, stdout, and stderr become useful. And once you have them, then you can extend the concept using redirection and pipes. But fundamentally, stdin and stdout are about consoles.

> I think the idea that we only give meaning to binary data using
> encodings is a bit limiting. A zip or gif file has structure, but I
> don't think it's reasonable to regard such a file as having an encoding
> in the python unicode sense.

In the Unicode sense? Of course not, that would be silly. The concept of encodings is bigger than just text, and in that sense zip compression is an encoding which encodes non-random data into a different format which generally takes up less space.

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/
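Compression as a bytes-to-bytes encoding, in the broader sense used above, can be illustrated with the standard `zlib` module, which implements the DEFLATE method commonly used inside zip files. The sample data is invented and deliberately repetitive so that it compresses well.

```python
import zlib

# Highly repetitive (non-random) data, which compresses well.
original = b"spam and eggs, " * 100

# "Encode": transform the data into a different, usually smaller, byte format.
encoded = zlib.compress(original)
assert len(encoded) < len(original)

# "Decode": the transformation is reversible, just like a text codec.
assert zlib.decompress(encoded) == original
```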
| From | Gregory Ewing <greg.ewing@canterbury.ac.nz> |
|---|---|
| Date | 2014-06-05 14:01 +1200 |
| Message-ID | <bva1ccFdr03U1@mid.individual.net> |
| In reply to | #72635 |
Steven D'Aprano wrote:
> The whole concept of stdin and stdout is based on the idea of having a
> console to read from and write to.

Not really; stdin and stdout are frequently connected to files, or pipes to other processes. The console, if it exists, just happens to be a convenient default value for them. Even on a system without a console, they're still a useful abstraction.

But we were talking about encodings, and whether stdin and stdout should be text or binary by default. Well, one of the design principles behind unix is to make use of plain text wherever possible. Not just for stuff meant to be seen on the screen, but for stuff kept in files as well.

As a result, most unix programs, most of the time, deal with text on stdin and stdout. So, it makes sense for them to be text by default. And wherever there's text, there needs to be an encoding. This is true whether a console is involved or not.

-- 
Greg
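A small sketch of "wherever there's text, there needs to be an encoding", using an operating-system pipe with no console involved. UTF-8 is an assumed choice here, and the single-process setup is only for illustration; real pipes usually connect two separate programs.

```python
import os

read_fd, write_fd = os.pipe()

# Writer side: a text stream layered over the pipe, with an explicit encoding.
with open(write_fd, "w", encoding="utf-8", newline="\n") as writer:
    writer.write("ñβж\n")

# Reader side: the pipe itself only ever carries bytes.
with open(read_fd, "rb") as reader:
    payload = reader.read()

# The reader gets the text back only by applying the same encoding.
assert payload == "ñβж\n".encode("utf-8")
assert payload.decode("utf-8") == "ñβж\n"
```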