Re: Everything you did not want to know about Unicode in Python 3

Newsgroups comp.lang.python
Date 2014-05-12 22:10 -0700
References <mailman.9915.1399907977.18130.python-list@python.org> <8P7cv.78617$Sp6.8377@fx15.am4> <537172eb$0$29980$c3e8da3$5496439d@news.astraweb.com>
Message-ID <82899649-014a-4309-b06e-b981fc6921fa@googlegroups.com>
Subject Re: Everything you did not want to know about Unicode in Python 3
From Rustom Mody <rustompmody@gmail.com>


On Tuesday, May 13, 2014 6:48:35 AM UTC+5:30, Steven D'Aprano wrote:
> On Mon, 12 May 2014 17:47:48 +0000, alister wrote:
> 
> > Surely those example programs are not the Pythonic way to do things, or
> > am I missing something?
> 
> 
> 
> Feel free to show us your version of "cat" for Python then. Feel free to 
> target any version you like. Don't forget to test it against files with 
> names and content that:
> 
> 
> - aren't valid UTF-8;
> 
> 
> - are valid UTF-8, but not valid in the local encoding.

Thanks for a non-defensive appraisal!

> 
> 
> > if those code samples are anything to go by this guy makes JMF look
> > sensible.
> 
> 
> 
> Armin Ronacher is an extremely experienced and knowledgeable Python 
> developer, and a Python core developer. He might be wrong, but he's not 
> *obviously* wrong.
> 
> 
> 
> Unicode is hard, not because Unicode is hard, but because of legacy 
> problems. I can create a file on a machine that uses ISO-8859-7 for the 
> file name, put Shift-JIS encoded text inside it, transfer it to a 
> machine that uses Windows-1251 as the file system encoding, then SSH into 
> that machine from a system using Big5, and try to make sense of it. If 
> everybody used UTF-8 any time data touched a disk or network, we'd be 
> laughing. It would all be so simple.
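
To make the legacy problem concrete, here is a minimal sketch of one step in that chain: text written as Shift-JIS and then read on a machine that assumes Windows-1251. The sample string and variable names are my own assumptions for illustration, not anything from Armin's post.

```python
# Illustration only: the sample text and the names below are assumptions.
original = "日本語"

# The writer stores the text as Shift-JIS bytes.
raw = original.encode("shift_jis")

# The reader wrongly assumes Windows-1251 and gets mojibake, not an error.
garbled = raw.decode("cp1251")

# The bytes themselves were never damaged, so undoing the wrong decode
# and applying the right one brings the text back.
recovered = garbled.encode("cp1251").decode("shift_jis")

print(repr(garbled))
print(recovered == original)
```

Note that nothing fails at the point of the mistake: the wrong decode quietly succeeds, which is a large part of why these bugs are hard to spot.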

I think the most helpful way forward is to accept two things:
a. Unicode is a headache
b. No-Unicode is a non-option

> 
> 
> 
> Reading Armin's post, I think that all that is needed to simplify his 
> Python 3 version is:
> 
> 
> 
> - have a bytes version of sys.argv (bargv? argvb?) and read 
>   the file names from that;
> 
> - have a simple way to write bytes to stdout and stderr.
> 
> 
> Most programs won't need either of those, but file system utilities will.
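
For reference, a rough sketch of how those two suggestions can be approximated with facilities Python 3 already has: os.fsencode() to turn the decoded sys.argv back into bytes, and sys.stdout.buffer to write bytes. The function names bytes_argv, cat and main are hypothetical, chosen only for this sketch; this is not Armin's code and not a proposed API.

```python
import os
import shutil
import sys


def bytes_argv(argv=None):
    """Hypothetical stand-in for the suggested 'bargv': arguments as bytes."""
    if argv is None:
        argv = sys.argv
    # os.fsencode() reverses the decoding Python applied to the arguments
    # at startup; on POSIX it uses the surrogateescape error handler, so
    # undecodable bytes in a file name survive the round trip.
    return [os.fsencode(arg) for arg in argv]


def cat(paths, out):
    """Copy each file to 'out' as raw bytes, never decoding the content."""
    for path in paths:
        with open(path, "rb") as src:
            shutil.copyfileobj(src, out)


def main():
    # sys.stdout.buffer is the binary layer underneath the text-mode stdout.
    out = sys.stdout.buffer
    cat(bytes_argv()[1:], out)
    out.flush()
```

Calling main() from a script gives a bare-bones "cat" that treats both file names and file contents as bytes. Error reporting on stderr, the other half of the second suggestion, is left out here.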

About the technical merits of Armin's post and your suggestions, I've
nothing to say, since I am an ignoramus on (the mechanics of) Unicode.

[Consider me an eager, early, ignorant adopter :-) ]

It is, however, worth noting that Unicode is unique in the history
not just of IT/CS but of humanity: no one (to the best of my knowledge)
has ever before tried to come up with an all-encompassing umbrella
for all of humanity's scripts and writing systems.

So hiccups and mistakes are only to be expected.  The absence of these would
be much more surprising!
