

Newsgroup: comp.lang.python (thread #71389)

Everything you did not want to know about Unicode in Python 3

Started by: Mark Lawrence <breamoreboy@yahoo.co.uk>
First post: 2014-05-12 16:19 +0100
Last post: 2014-05-14 09:56 -0600
Articles: 20 on this page of 72; 25 participants



Contents

  Everything you did not want to know about Unicode in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-12 16:19 +0100
    Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-12 17:47 +0000
      Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-12 12:31 -0600
      Re: Everything you did not want to know about Unicode in Python 3 MRAB <python@mrabarnett.plus.com> - 2014-05-12 20:42 +0100
      Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-12 16:16 -0600
      Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 09:42 +1000
      Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-13 01:18 +0000
        Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 11:39 +1000
          Re: Everything you did not want to know about Unicode in Python 3 alex23 <wuwei23@gmail.com> - 2014-05-13 16:25 +1000
            Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 16:32 +1000
        Re: Everything you did not want to know about Unicode in Python 3 Mark H Harris <harrismh777@gmail.com> - 2014-05-12 20:58 -0500
        Re: Everything you did not want to know about Unicode in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-13 03:33 +0100
        Re: Everything you did not want to know about Unicode in Python 3 Rustom Mody <rustompmody@gmail.com> - 2014-05-12 22:10 -0700
          Re: Everything you did not want to know about Unicode in Python 3 Mark H Harris <harrismh777@gmail.com> - 2014-05-13 00:39 -0500
            Re: Everything you did not want to know about Unicode in Python 3 Gene Heskett <gheskett@wdtv.com> - 2014-05-13 01:45 -0400
            Re: Everything you did not want to know about Unicode in Python 3 Ben Finney <ben@benfinney.id.au> - 2014-05-13 16:03 +1000
            Re: Everything you did not want to know about Unicode in Python 3 Rustom Mody <rustompmody@gmail.com> - 2014-05-12 23:09 -0700
            Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 16:18 +1000
              Re: Everything you did not want to know about Unicode in Python 3 Mark H Harris <harrismh777@gmail.com> - 2014-05-13 01:32 -0500
              Re: Everything you did not want to know about Unicode in Python 3 Roy Smith <roy@panix.com> - 2014-05-13 07:20 -0400
                Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-13 13:39 +0000
                  Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 23:43 +1000
                    Re: Everything you did not want to know about Unicode in Python 3 Rustom Mody <rustompmody@gmail.com> - 2014-05-13 07:30 -0700
                      Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-14 00:36 +1000
                  Re: Everything you did not want to know about Unicode in Python 3 Grant Edwards <invalid@invalid.invalid> - 2014-05-13 13:51 +0000
                    Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-13 14:42 +0000
                      Re: Everything you did not want to know about Unicode in Python 3 Grant Edwards <invalid@invalid.invalid> - 2014-05-13 15:21 +0000
                      Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-13 23:53 +0000
                        Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-14 10:08 +1000
                          Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-14 12:42 +0000
                            Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-14 22:52 +1000
                            Re: Everything you did not want to know about Unicode in Python 3 Grant Edwards <invalid@invalid.invalid> - 2014-05-16 14:46 +0000
                              Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-17 01:07 +0000
                                Re: Everything you did not want to know about Unicode in Python 3 Marko Rauhamaa <marko@pacujo.net> - 2014-05-17 07:19 +0300
                                  Re: Everything you did not want to know about Unicode in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-17 09:35 +0100
                                  Re: Everything you did not want to know about Unicode in Python 3 Robert Kern <robert.kern@gmail.com> - 2014-05-17 10:29 +0100
                                    Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-17 14:15 +0000
                                      Re: Everything you did not want to know about Unicode in Python 3 Robert Kern <robert.kern@gmail.com> - 2014-05-17 22:01 +0100
                                Re: Everything you did not want to know about Unicode in Python 3 Robert Kern <robert.kern@gmail.com> - 2014-05-17 09:57 +0100
                                  Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-17 12:07 +0000
                                    Re: Everything you did not want to know about Unicode in Python 3 Robert Kern <robert.kern@gmail.com> - 2014-05-17 22:07 +0100
                                Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-17 19:18 +1000
                                Re: Everything you did not want to know about Unicode in Python 3 Ben Finney <ben@benfinney.id.au> - 2014-05-17 21:05 +1000
                        [OT] Copyright statements and why they can be useful (was: Everything you did not want to know about Unicode in Python 3) Ben Finney <ben@benfinney.id.au> - 2014-05-14 11:01 +1000
                        Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-14 09:07 -0600
                  Re: Everything you did not want to know about Unicode in Python 3 Dave Angel <davea@davea.name> - 2014-05-13 21:56 -0400
              Re: Everything you did not want to know about Unicode in Python 3 Grant Edwards <invalid@invalid.invalid> - 2014-05-13 13:49 +0000
        Re: Everything you did not want to know about Unicode in Python 3 gregor <gregor@ediwo.com> - 2014-05-13 09:27 +0200
        Re: Everything you did not want to know about Unicode in Python 3 Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-05-13 10:08 +0200
          Re: Everything you did not want to know about Unicode in Python 3 Marko Rauhamaa <marko@pacujo.net> - 2014-05-13 11:25 +0300
            Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 18:38 +1000
              Re: Everything you did not want to know about Unicode in Python 3 Marko Rauhamaa <marko@pacujo.net> - 2014-05-13 12:06 +0300
                Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 19:29 +1000
                Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve@pearwood.info> - 2014-05-13 09:44 +0000
              Re: Everything you did not want to know about Unicode in Python 3 Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-05-13 11:38 +0200
            Re: Everything you did not want to know about Unicode in Python 3 Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-05-13 11:46 +0200
              Re: Everything you did not want to know about Unicode in Python 3 Marko Rauhamaa <marko@pacujo.net> - 2014-05-13 12:59 +0300
            Re: Everything you did not want to know about Unicode in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-13 14:30 +0100
            Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 23:37 +1000
            Re: Everything you did not want to know about Unicode in Python 3 Skip Montanaro <skip@pobox.com> - 2014-05-13 09:02 -0500
          Re: Everything you did not want to know about Unicode in Python 3 wxjmfauth@gmail.com - 2014-05-14 00:00 -0700
        Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-13 11:19 +0000
          Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-13 10:08 -0600
            Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-14 00:10 +0000
              Re: Everything you did not want to know about Unicode in Python 3 Ethan Furman <ethan@stoneleaf.us> - 2014-05-13 17:53 -0700
              Re: Everything you did not want to know about Unicode in Python 3 Terry Reedy <tjreedy@udel.edu> - 2014-05-14 17:47 -0400
              Re: Everything you did not want to know about Unicode in Python 3 Antoine Pitrou <antoine@python.org> - 2014-05-16 11:50 +0000
                Re: Everything you did not want to know about Unicode in Python 3 wxjmfauth@gmail.com - 2014-05-16 06:20 -0700
            Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-14 12:38 +0000
          Re: Everything you did not want to know about Unicode in Python 3 Robin Becker <robin@reportlab.com> - 2014-05-14 16:30 +0100
          Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-14 09:56 -0600



#71389 — Everything you did not want to know about Unicode in Python 3

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2014-05-12 16:19 +0100
Subject: Everything you did not want to know about Unicode in Python 3
Message-ID: <mailman.9915.1399907977.18130.python-list@python.org>

This was *NOT* written by our resident unicode expert 
http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/

Posted as I thought it would make a rather pleasant change from 
interminable threads about names vs values vs variables vs objects.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence

---
This email is free from viruses and malware because avast! Antivirus protection is active.
http://www.avast.com



#71397

From: alister <alister.nospam.ware@ntlworld.com>
Date: 2014-05-12 17:47 +0000
Message-ID: <8P7cv.78617$Sp6.8377@fx15.am4>
In reply to: #71389

On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote:

> This was *NOT* written by our resident unicode expert
> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
> 
> Posted as I thought it would make a rather pleasant change from
> interminable threads about names vs values vs variables vs objects.

Surely those example programs are not the pythonic way to do things or 
am I missing something?

if those code samples are anything to go by this guy makes JMF look 
sensible.



-- 
The Heineken Uncertainty Principle:
	You can never be sure how many beers you had last night.



#71398

From: Ian Kelly <ian.g.kelly@gmail.com>
Date: 2014-05-12 12:31 -0600
Message-ID: <mailman.9918.1399919514.18130.python-list@python.org>
In reply to: #71397

On Mon, May 12, 2014 at 11:47 AM, alister
<alister.nospam.ware@ntlworld.com> wrote:
> On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote:
>
>> This was *NOT* written by our resident unicode expert
>> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
>>
>> Posted as I thought it would make a rather pleasant change from
>> interminable threads about names vs values vs variables vs objects.
>
> Surely those example programs are not the pythonic way to do things or
> am I missing something?

The _is_binary_reader and _is_binary_writer functions look like they
could be simplified by calling isinstance on the io object itself
against io.TextIOBase, io.BufferedIOBase or io.RawIOBase, rather than
doing those odd 0-length reads and writes.  And then perhaps those
exception-swallowing try-excepts wouldn't be necessary.  But perhaps
there's a non-obvious reason why it's written the way it is.
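
A minimal sketch of the isinstance-based check described above, assuming the stream objects come from the standard io class hierarchy. `is_binary_stream` is a hypothetical name, not the helper from Armin's script:

```python
import io


def is_binary_stream(stream):
    """Classify a stream by its place in the io class hierarchy.

    Text streams (io.TextIOBase) yield str; buffered and raw streams
    (io.BufferedIOBase, io.RawIOBase) yield bytes. Anything else, such as
    a duck-typed file-like object, is reported as not binary.
    """
    if isinstance(stream, io.TextIOBase):
        return False
    return isinstance(stream, (io.BufferedIOBase, io.RawIOBase))


# sys.stdin is a text wrapper; sys.stdin.buffer is the binary layer under it.
# In-memory streams show the same distinction without touching the terminal:
assert is_binary_stream(io.BytesIO())
assert not is_binary_stream(io.StringIO())
```

One possible (unconfirmed) reason for the probing approach is that file-like objects which do not inherit from these io base classes would be misclassified by a pure isinstance check.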

And there appears to be a bug where everything *except* the filename
'-' is treated as stdin, so the script probably hasn't been tested at
all.
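
For comparison, the conventional UNIX handling, where only the name `-` means standard input, might look like this. It is a sketch; `open_input` is a hypothetical helper, not code from the script under discussion:

```python
import sys


def open_input(name):
    """Return a binary stream for one command-line operand.

    By UNIX convention "-" means standard input; every other name is
    opened as an ordinary file in binary mode.
    """
    if name == "-":
        return sys.stdin.buffer  # the byte layer beneath the text stream
    return open(name, "rb")
```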

> if those code samples are anything to go by this guy makes JMF look
> sensible.

This is an ad hominem.  Just because his code sucks doesn't mean he's
wrong about the state of Unicode and UNIX in Python 3.



#71400

From: MRAB <python@mrabarnett.plus.com>
Date: 2014-05-12 20:42 +0100
Message-ID: <mailman.9919.1399923932.18130.python-list@python.org>
In reply to: #71397

On 2014-05-12 19:31, Ian Kelly wrote:
> On Mon, May 12, 2014 at 11:47 AM, alister
> <alister.nospam.ware@ntlworld.com> wrote:
>> On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote:
>>
>>> This was *NOT* written by our resident unicode expert
>>> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
>>>
>>> Posted as I thought it would make a rather pleasant change from
>>> interminable threads about names vs values vs variables vs objects.
>>
>> Surely those example programs are not the pythonic way to do things or
>> am I missing something?
>
> The _is_binary_reader and _is_binary_writer functions look like they
> could be simplified by calling isinstance on the io object itself
> against io.TextIOBase, io.BufferedIOBase or io.RawIOBase, rather than
> doing those odd 0-length reads and writes.  And then perhaps those
> exception-swallowing try-excepts wouldn't be necessary.  But perhaps
> there's a non-obvious reason why it's written the way it is.
>
How about checking sys.stdin.mode and sys.stdout.mode?

> And there appears to be a bug where everything *except* the filename
> '-' is treated as stdin, so the script probably hasn't been tested at
> all.
>
>> if those code samples are anything to go by this guy makes JMF look
>> sensible.
>
> This is an ad hominem.  Just because his code sucks doesn't mean he's
> wrong about the state of Unicode and UNIX in Python 3.
>



#71404

From: Ian Kelly <ian.g.kelly@gmail.com>
Date: 2014-05-12 16:16 -0600
Message-ID: <mailman.9923.1399936290.18130.python-list@python.org>
In reply to: #71397

On Mon, May 12, 2014 at 1:42 PM, MRAB <python@mrabarnett.plus.com> wrote:
> How about checking sys.stdin.mode and sys.stdout.mode?

Seems to work, but I notice that the docs only define the mode
attribute for the FileIO class, which sys.stdin and sys.stdout are not
instances of.
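
A sketch of the `mode`-based heuristic MRAB suggested, written defensively because, as noted above, `mode` is not documented for every stream class. `looks_binary_by_mode` is a hypothetical name:

```python
def looks_binary_by_mode(stream):
    """Guess whether `stream` is binary from its 'mode' string, if any.

    Files returned by open() report modes such as 'r' or 'rb', but the
    attribute is not guaranteed on every stream, so objects without a
    string mode are treated as not binary.
    """
    mode = getattr(stream, "mode", None)
    return isinstance(mode, str) and "b" in mode
```

CPython's sys.stdin and sys.stdout do carry a mode attribute, which is why the check appears to work there; the isinstance approach does not depend on it.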



#71407

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-05-13 09:42 +1000
Message-ID: <mailman.9925.1399938136.18130.python-list@python.org>
In reply to: #71397

On Tue, May 13, 2014 at 4:31 AM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
> Just because his code sucks doesn't mean he's
> wrong about the state of Unicode and UNIX in Python 3.

Uhm... I think wrongness of code is generally fairly indicative of
wrongness of thinking :) If I write a rant about how Python's list
type sucks and it turns out my code is using it like a cons cell and
never putting more than two elements into a list, then you would
accurately conclude that I'm wrong about the state of data type
support in Python.

I don't have a problem with someone coming to the list here with
misconceptions. That's what discussions are for. But rants like that,
on blogs, I quickly get weary of reading. The tone is always "Look
what's so wrong", not inviting dialogue, and I can't be bothered
digging into the details to compose a full response. Chances are the
author's (a) not looking at what 3.4 and what's happened to improve
things (and certainly not 3.5 and what's going to happen), and (b) not
listening to responses anyway.

ChrisA



#71416

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2014-05-13 01:18 +0000
Message-ID: <537172eb$0$29980$c3e8da3$5496439d@news.astraweb.com>
In reply to: #71397

On Mon, 12 May 2014 17:47:48 +0000, alister wrote:

> On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote:
> 
>> This was *NOT* written by our resident unicode expert
>> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
>> 
>> Posted as I thought it would make a rather pleasant change from
>> interminable threads about names vs values vs variables vs objects.
> 
> Surely those example programs are not the pythonic way to do things or
> am I missing something?

Feel free to show us your version of "cat" for Python then. Feel free to 
target any version you like. Don't forget to test it against files with 
names and content that:

- aren't valid UTF-8;

- are valid UTF-8, but not valid in the local encoding.
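
One way to approach that challenge in Python 3 is to never decode file contents and to copy raw bytes to the binary layer of stdout. This is a minimal sketch, not Armin's code; `cat` is a hypothetical function, and it assumes (per PEP 383) that on POSIX Python 3 decodes undecodable bytes in `sys.argv` file names with the 'surrogateescape' error handler, so that `open()` re-encodes them to the original bytes:

```python
import shutil
import sys


def cat(paths, out=None):
    """Concatenate files to `out` (default: stdout's binary buffer).

    Contents are copied as bytes and never decoded, so data that is not
    valid UTF-8, or not valid in the locale encoding, passes through intact.
    """
    if out is None:
        out = sys.stdout.buffer
    for path in paths:
        if path == "-":
            shutil.copyfileobj(sys.stdin.buffer, out)
        else:
            # On POSIX, str paths from sys.argv may hold lone surrogates that
            # stand in for undecodable bytes; open() maps them back to bytes.
            with open(path, "rb") as f:
                shutil.copyfileobj(f, out)
    out.flush()
```

Used as a script entry point, this would be called as `cat(sys.argv[1:] or ["-"])`.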



> if those code samples are anything to go by this guy makes JMF look
> sensible.

Armin Ronacher is an extremely experienced and knowledgeable Python 
developer, and a Python core developer. He might be wrong, but he's not 
*obviously* wrong.

Unicode is hard, not because Unicode is hard, but because of legacy 
problems. I can create a file on a machine that uses ISO-8859-7 for the 
file name, put Shift-JIS encoded text inside it, transfer it to a 
machine that uses Windows-1251 as the file system encoding, then SSH into 
that machine from a system using Big5, and try to make sense of it. If 
everybody used UTF-8 any time data touched a disk or network, we'd be 
laughing. It would all be so simple.
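
To make the legacy problem concrete: the same bytes mean different things under the encodings mentioned above, and may not decode as UTF-8 at all. A small illustration; the byte values are arbitrary examples and `decode_or_none` is a hypothetical helper:

```python
def decode_or_none(data, encoding):
    """Decode `data` with `encoding`, returning None if the bytes are invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


raw = b"\xe1\xe2"  # two bytes as they might sit in a file name on disk

readings = {enc: decode_or_none(raw, enc)
            for enc in ("iso-8859-7", "cp1251", "utf-8")}
# readings == {'iso-8859-7': 'αβ', 'cp1251': 'бв', 'utf-8': None}
```

Nothing in the bytes says which reading was intended; that information lives only in whatever convention the originating machine used.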

Reading Armin's post, I think that all that is needed to simplify his 
Python 3 version is:

- have a bytes version of sys.argv (bargv? argvb?) and read 
  the file names from that;

- have a simple way to write bytes to stdout and stderr.

Most programs won't need either of those, but file system utilities will.
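
Neither helper exists under those names, but both can be approximated. The Python 3 documentation for `sys.argv` notes that on UNIX the arguments are decoded with the file system encoding and the 'surrogateescape' error handler, and that `os.fsencode()` recovers the original bytes; `sys.stdout.buffer` already accepts bytes. A sketch, with `argv_bytes` and `write_bytes` as hypothetical names:

```python
import os
import sys


def argv_bytes():
    """Return sys.argv as the raw bytes the OS passed in (POSIX)."""
    return [os.fsencode(arg) for arg in sys.argv]


def write_bytes(data, stream=None):
    """Write raw bytes through a text stream's underlying binary buffer."""
    if stream is None:
        stream = sys.stdout
    stream.flush()          # push out pending text first so output stays ordered
    stream.buffer.write(data)
    stream.buffer.flush()


# The mechanism behind argv_bytes(), shown with an explicit codec: an
# undecodable byte becomes a lone surrogate and converts back unchanged.
name = b"caf\xe9"                                 # Latin-1 bytes, not valid UTF-8
text = name.decode("utf-8", "surrogateescape")    # 'caf\udce9'
assert text.encode("utf-8", "surrogateescape") == name
```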



-- 
Steven D'Aprano
http://import-that.dreamwidth.org/



#71418

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-05-13 11:39 +1000
Message-ID: <mailman.9934.1399945166.18130.python-list@python.org>
In reply to: #71416

On Tue, May 13, 2014 at 11:18 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> Reading Armin's post, I think that all that is needed to simplify his
> Python 3 version is:
>
> - have a bytes version of sys.argv (bargv? argvb?) and read
>   the file names from that;

argb? :)

> - have a simple way to write bytes to stdout and stderr.

I'm not sure how that goes with I/O redirection, but sure.

ChrisA



#71438

From: alex23 <wuwei23@gmail.com>
Date: 2014-05-13 16:25 +1000
Message-ID: <lksdru$p5k$1@dont-email.me>
In reply to: #71418

On 13/05/2014 11:39 AM, Chris Angelico wrote:
> On Tue, May 13, 2014 at 11:18 AM, Steven D'Aprano
> <steve+comp.lang.python@pearwood.info> wrote:
>> - have a bytes version of sys.argv (bargv? argvb?) and read
>>    the file names from that;
>
> argb? :)

I tried and failed to come up with an "argy bargy" joke here so decided 
to go for a meta-reference instead.



#71440

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-05-13 16:32 +1000
Message-ID: <mailman.9941.1399962736.18130.python-list@python.org>
In reply to: #71438

On Tue, May 13, 2014 at 4:25 PM, alex23 <wuwei23@gmail.com> wrote:
> On 13/05/2014 11:39 AM, Chris Angelico wrote:
>>
>> On Tue, May 13, 2014 at 11:18 AM, Steven D'Aprano
>> <steve+comp.lang.python@pearwood.info> wrote:
>>>
>>> - have a bytes version of sys.argv (bargv? argvb?) and read
>>>    the file names from that;
>>
>>
>> argb? :)
>
>
> I tried and failed to come up with an "argy bargy" joke here so decided to
> go for a meta-reference instead.

I'm just waiting for someone to have need for arguments in both
network byte order and host byte order. The latter, of course, would
be "argh".

ChrisA



#71420

From: Mark H Harris <harrismh777@gmail.com>
Date: 2014-05-12 20:58 -0500
Message-ID: <lkru8l$t30$1@speranza.aioe.org>
In reply to: #71416

On 5/12/14 8:18 PM, Steven D'Aprano wrote:
> Unicode is hard, not because Unicode is hard, but because of legacy
> problems.

Yes.  To put a finer point on that, Unicode (which is only a 
specification constantly being improved upon) is harder to implement 
when it hasn't been on the design board from the ground up; Python in 
this case.

Julia has Unicode support from the ground up, and it was easier for 
those guys to implement (in beta release) than for the Python crew when 
they undertook the Unicode work that had to be done for Python3.x (just 
an observation).

Anytime there are legacy code issues, regression testing problems, and a 
host of domain issues that weren't thought through from the get-go there 
are going to be more problematic hurdles; not to mention bugs.

Having said that, I still think Unicode is somewhat harder than you're 
admitting.

marcus



#71422

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2014-05-13 03:33 +0100
Message-ID: <mailman.9935.1399948395.18130.python-list@python.org>
In reply to: #71416

On 13/05/2014 02:18, Steven D'Aprano wrote:
> On Mon, 12 May 2014 17:47:48 +0000, alister wrote:
>
>> On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote:
>>
>>> This was *NOT* written by our resident unicode expert
>>> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
>>>
>>> Posted as I thought it would make a rather pleasant change from
>>> interminable threads about names vs values vs variables vs objects.
>>
>> Surely those example programs are not the pythonic way to do things or
>> am I missing something?
>
> Feel free to show us your version of "cat" for Python then. Feel free to
> target any version you like. Don't forget to test it against files with
> names and content that:
>
> - aren't valid UTF-8;
>
> - are valid UTF-8, but not valid in the local encoding.
>
>
>
>> if those code samples are anything to go by this guy makes JMF look
>> sensible.
>
> Armin Ronacher is an extremely experienced and knowledgeable Python
> developer, and a Python core developer. He might be wrong, but he's not
> *obviously* wrong.
>
> Unicode is hard, not because Unicode is hard, but because of legacy
> problems. I can create a file on a machine that uses ISO-8859-7 for the
> file name, put Shift-JIS encoded text inside it, transfer it to a
> machine that uses Windows-1251 as the file system encoding, then SSH into
> that machine from a system using Big5, and try to make sense of it. If
> everybody used UTF-8 any time data touched a disk or network, we'd be
> laughing. It would all be so simple.
>
> Reading Armin's post, I think that all that is needed to simplify his
> Python 3 version is:
>
> - have a bytes version of sys.argv (bargv? argvb?) and read
>    the file names from that;
>
> - have a simple way to write bytes to stdout and stderr.
>
> Most programs won't need either of those, but file system utilities will.
>

I think http://bugs.python.org/issue8776 and 
http://bugs.python.org/issue8775 are relevant but both were placed in 
the small round filing cabinet.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence



#71425

From: Rustom Mody <rustompmody@gmail.com>
Date: 2014-05-12 22:10 -0700
Message-ID: <82899649-014a-4309-b06e-b981fc6921fa@googlegroups.com>
In reply to: #71416

On Tuesday, May 13, 2014 6:48:35 AM UTC+5:30, Steven D'Aprano wrote:
> On Mon, 12 May 2014 17:47:48 +0000, alister wrote:
> 
> > Surely those example programs are not the pythonic way to do things or
> > am I missing something?
> 
> 
> 
> Feel free to show us your version of "cat" for Python then. Feel free to 
> target any version you like. Don't forget to test it against files with 
> names and content that:
> 
> 
> - aren't valid UTF-8;
> 
> 
> - are valid UTF-8, but not valid in the local encoding.

Thanks for a non-defensive appraisal!

> 
> 
> > if those code samples are anything to go by this guy makes JMF look
> > sensible.
> 
> 
> 
> Armin Ronacher is an extremely experienced and knowledgeable Python 
> developer, and a Python core developer. He might be wrong, but he's not 
> *obviously* wrong.
> 
> 
> 
> Unicode is hard, not because Unicode is hard, but because of legacy 
> problems. I can create a file on a machine that uses ISO-8859-7 for the 
> file name, put Shift-JIS encoded text inside it, transfer it to a 
> machine that uses Windows-1251 as the file system encoding, then SSH into 
> that machine from a system using Big5, and try to make sense of it. If 
> everybody used UTF-8 any time data touched a disk or network, we'd be 
> laughing. It would all be so simple.

I think the most helpful way forward is to accept two things:
a. Unicode is a headache
b. No-unicode is a non-option

> 
> 
> 
> Reading Armin's post, I think that all that is needed to simplify his 
> Python 3 version is:
> 
> 
> 
> - have a bytes version of sys.argv (bargv? argvb?) and read 
>   the file names from that;
> 
> - have a simple way to write bytes to stdout and stderr.
> 
> 
> Most programs won't need either of those, but file system utilities will.

About the technical merits of Armin's post and your suggestions, I've 
nothing to say, since I am an ignoramus on (the mechanics of) unicode

[Consider me an eager, early, ignorant adopter :-) ]

It's, however, good to note that unicode is rather unique in the history
not just of IT/CS but of humanity, in the sense that no one (to the best
of my knowledge) has ever tried to come up with an all-encompassing umbrella
for all humanity's scripts/writing systems etc.

So hiccups and mistakes are only to be expected.  The absence of these would
be much more surprising!



#71428

From: Mark H Harris <harrismh777@gmail.com>
Date: 2014-05-13 00:39 -0500
Message-ID: <lksb5p$njf$1@speranza.aioe.org>
In reply to: #71425

On 5/13/14 12:10 AM, Rustom Mody wrote:
> I think the most helpful way forward is to accept two things:
> a. Unicode is a headache
> b. No-unicode is a non-option

QOTW    (so far...)



#71430

From: Gene Heskett <gheskett@wdtv.com>
Date: 2014-05-13 01:45 -0400
Message-ID: <mailman.9936.1399960355.18130.python-list@python.org>
In reply to: #71428

On Tuesday 13 May 2014 01:39:06 Mark H Harris did opine
And Gene did reply:
> On 5/13/14 12:10 AM, Rustom Mody wrote:
> > I think the most helpful way forward is to accept two things:
> > a. Unicode is a headache
> > b. No-unicode is a non-option
> 
> QOTW    (so far...)

But it's early yet, only Tuesday & it's just barely started... :)

Cheers, Gene
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page <http://geneslinuxbox.net:6309/gene>
US V Castleman, SCOTUS, Mar 2014 is grounds for Impeaching SCOTUS



#71434

From: Ben Finney <ben@benfinney.id.au>
Date: 2014-05-13 16:03 +1000
Message-ID: <mailman.9938.1399961037.18130.python-list@python.org>
In reply to: #71428

Gene Heskett <gheskett@wdtv.com> writes:

> On Tuesday 13 May 2014 01:39:06 Mark H Harris did opine

> > QOTW    (so far...)
>
> But it's early yet, only Tuesday & it's just barely started... :)

Says who? For some of us, Tuesday is approaching sunset.

(It's always a good day to remind people that the rest of the world
exists.)

-- 
 \     “Reality must take precedence over public relations, for nature |
  `\       cannot be fooled.” —Richard P. Feynman, _Rogers' Commission |
_o__)                       Report into the Challenger Crash_, 1986-06 |
Ben Finney



#71436

From: Rustom Mody <rustompmody@gmail.com>
Date: 2014-05-12 23:09 -0700
Message-ID: <72d4f4e7-1bbd-4ceb-8e7f-d8ca18e1c1b2@googlegroups.com>
In reply to: #71428

On Tuesday, May 13, 2014 11:09:06 AM UTC+5:30, Mark H. Harris wrote:
> On 5/13/14 12:10 AM, Rustom Mody wrote:
> 
> > I think the most helpful way forward is to accept two things:
> > a. Unicode is a headache
> > b. No-unicode is a non-option
> 
> 
> QOTW    (so far...)

I said that getting unicode right straight off is unrealistic.

I should have added this:
Armin makes a (sarcastic?) dig about the fact that python (3) goofs because
it's mismatched with the assumptions of unix.

| UNIX is bytes, has been defined that way and will always be that way. To 

| Unicode on UNIX is only madness if you force it on everything. But that's not 
| how Unicode on UNIX works. UNIX does not have a distinction between unicode 
| and byte APIs. They are one and the same which makes them easy to deal with.]

| Python 3 takes a very difference stance on Unicode than UNIX does. Python 3 
| says: everything is Unicode ...

This may be right...
Or it may be the other way round as I claim at 
http://blog.languager.org/2014/04/unicode-and-unix-assumption.html

At this point I don't believe that anyone is very clear on what is the
right way and what is the wrong way.



#71437

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-05-13 16:18 +1000
Message-ID: <mailman.9939.1399961928.18130.python-list@python.org>
In reply to: #71428

On Tue, May 13, 2014 at 4:03 PM, Ben Finney <ben@benfinney.id.au> wrote:
> (It's always a good day to remind people that the rest of the world
> exists.)

Ironic that this should come up in a discussion on Unicode, given that
Unicode's fundamental purpose is to welcome that whole rest of the
world instead of yelling "LALALALALA America is everything" and
pretending that ASCII, or Latin-1, or something, is all you need.

ChrisA
Currently enjoying "Monday Night Flagging" on Threshold RPG... at 4pm
on Tuesday.



#71441

From: Mark H Harris <harrismh777@gmail.com>
Date: 2014-05-13 01:32 -0500
Message-ID: <5371BC77.4090106@gmail.com>
In reply to: #71437

On 5/13/14 1:18 AM, Chris Angelico wrote:
> instead of yelling "LALALALALA America is everything" and
> pretending that ASCII, or Latin-1, or something, is all you need.
>

... it isn't?



LALALALALALALALALA   :))
