
Re: Unicode 7

Newsgroups comp.lang.python
Date 2014-05-01 19:29 -0700
References (1 earlier) <mailman.9579.1398794381.18130.python-list@python.org> <ac9b2a50-3b5d-4ee8-8954-9f0f1ab490b6@googlegroups.com> <5361d4f9$0$11109$c3e8da3@news.astraweb.com> <82067b83-a6f5-4b16-b012-385535ea5607@googlegroups.com> <mailman.9637.1398983969.18130.python-list@python.org>
Message-ID <8c30f6fc-8493-419b-a4c8-dfe4a9d30de0@googlegroups.com> (permalink)
Subject Re: Unicode 7
From Rustom Mody <rustompmody@gmail.com>



On Friday, May 2, 2014 4:08:35 AM UTC+5:30, Terry Reedy wrote:
> On 5/1/2014 2:04 PM, Rustom Mody wrote:

> >>> Since it's Unicode-troll time, here's my contribution
> >>> http://blog.languager.org/2014/04/unicode-and-unix-assumption.html

> I will not comment on the Unix-assumption part, but I think you go wrong 
> with this:  "Unicode is a Headache". The major headache is that unicode 
> and its very few encodings are not universally used. The headache is all 
> the non-unicode legacy encodings still being used. So you better title 
> this section 'Non-Unicode is a Headache'.

> The first sentence is this misleading tautology: "With ASCII, data is 
> ASCII whether its file, core, terminal, or network; ie "ABC" is 
> 65,66,67." Let me translate: "If all text is ASCII encoded, then text 
> data is ASCII, whether ..." But it was never the case that all text was 
> ASCII encoded. IBM used 6-bit BCDIC and then 8-bit EBCDIC and I believe 
> still uses the latter. Other mainframe makers used other encodings of 
> A-Z + 0-9 + symbols + control codes. The all-ASCII paradise was never 
> universal. You could have just as well said "With EBCDIC, data is 
> EBCDIC, whether ..."
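
To make the quoted point concrete, here is a minimal sketch in Python 3. It assumes cp037, one of the EBCDIC code pages shipped with Python's standard codecs, as a stand-in for "EBCDIC"; real mainframes used several variants, so the exact byte values are specific to that code page.

```python
text = "ABC"

# ASCII: the familiar 65, 66, 67.
ascii_bytes = text.encode("ascii")
print(list(ascii_bytes))   # [65, 66, 67]

# EBCDIC, code page 037 (an assumption; chosen only as one example variant).
ebcdic_bytes = text.encode("cp037")
print(list(ebcdic_bytes))  # [193, 194, 195]

# Same text, different bytes: each encoding only round-trips with its own table.
assert ascii_bytes != ebcdic_bytes
assert ebcdic_bytes.decode("cp037") == text
```

So "ABC is 65,66,67" holds only once everyone has agreed on ASCII, which is the same kind of agreement that UTF-8 asks for today.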

> https://en.wikipedia.org/wiki/Ascii
> https://en.wikipedia.org/wiki/EBCDIC

> A crucial step in the spread of ASCII was its use for microcomputers, 
> including the IBM PC. The latter was considered a toy by the mainframe 
> guys. If they had known that PCs would partly take over the computing 
> world, they might have suggested or insisted that it use EBCDIC.

> "With unicode there are:
>      encodings"
> where 'encodings' is linked to
> https://en.wikipedia.org/wiki/Character_encodings_in_HTML

> If html 'always' used utf-8 (like xml), as has become common but not 
> universal, all of the problems with *non-unicode* character sets and 
> encodings would disappear. The pre-unicode declarations could then 
> disappear. More truthful: "without unicode there are 100s of encodings 
> and with unicode only 3 that we should worry about."

> "in-memory formats"

> These are not the concern of the using programmer as long as they do not 
> introduce bugs or limitations (as do all the languages stuck on UCS-2 
> and many using UTF-16, including old Python narrow builds). Using what 
> should generally be the universal transmission format, UTF-8, as the 
> internal format means either losing indexing and slicing, having those 
> operations slow from O(1) to O(len(string)), or adding an index table 
> that is not part of the unicode standard. Using UTF-32 avoids the above 
> but usually wastes space -- up to 75%.
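
A rough sketch of both trade-offs in Python 3 follows. The helper `utf8_index` is hypothetical, written only for illustration; it is not how any particular language implements indexing. It finds the n-th code point in UTF-8 bytes by scanning from the start, which is where the O(len(string)) cost comes from.

```python
def utf8_index(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-8 data by scanning from the start."""
    if n < 0:
        raise IndexError("negative indexes are not supported in this sketch")
    count = -1
    start = None
    for pos, byte in enumerate(data):
        # Bytes of the form 10xxxxxx are continuation bytes;
        # any other byte starts a new code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == n:
                start = pos
            elif count == n + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("code point index out of range")
    return data[start:].decode("utf-8")


data = "a\u20acb".encode("utf-8")   # 'a', EURO SIGN, 'b' -> 1 + 3 + 1 bytes
assert utf8_index(data, 1) == "\u20ac"

# Space cost of UTF-32 for pure-ASCII text: 4 bytes per character instead of 1.
ascii_text = "hello world"
utf8_len = len(ascii_text.encode("utf-8"))
utf32_len = len(ascii_text.encode("utf-32-le"))   # "-le" avoids the 4-byte BOM
wasted = 1 - utf8_len / utf32_len
print(utf8_len, utf32_len, wasted)
```

For ASCII-only text the wasted fraction comes out at 0.75, which is the "up to 75%" figure quoted above.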

> "strange beasties like python's FSR"

> Have you really let yourself be poisoned by JMF's bizarre rants? The FSR 
> is an *internal optimization* that benefits most unicode operations that 
> people actually perform. It uses UTF-32 by default but adapts to the 
> strings users create by compressing the internal format. The compression 
> is trivial -- simply dropping leading null bytes common to all 
> characters -- so each character is still readable as is. The string 
> header records how many bytes are left.  Is the idea of algorithms that 
> adapt to inputs really strange to you?
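
The adaptive width can be observed from Python itself. This is only a sketch, and it assumes CPython 3.3 or later: `sys.getsizeof` reports implementation details, and the helper `bytes_per_char` is a hypothetical name introduced here. It estimates per-character storage by comparing two strings of different lengths, so the fixed header cancels out.

```python
import sys


def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Estimate storage per character from the size difference of two strings."""
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


print(bytes_per_char("a"))           # ASCII character
print(bytes_per_char("\u20ac"))      # EURO SIGN, inside the BMP
print(bytes_per_char("\U0001F40D"))  # SNAKE, outside the BMP

# Whatever the internal width, indexing and len() count code points.
s = "a\u20ac\U0001F40D"
assert len(s) == 3
assert s[1] == "\u20ac"
```

On CPython with the FSR the three calls should report 1, 2 and 4 bytes respectively, while the last two assertions hold regardless of the storage chosen.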

> Like good adaptive algorithms, the FSR is invisible to the user except 
> for reducing space or time or maybe both. Unicode operations are 
> otherwise the same as with previous wide builds. People who used to use 
> narrow-builds also benefit from bug elimination. The only 'headaches' 
> involved might have been those of the developers who optimized previous 
> wide builds.

> CPython has many other functions with special-case optimizations and 
> 'fast paths' for common, simple cases. For instance, (some? all?) number 
> operations are optimized for pairs of integers.  Do you call these 
> 'strange beasties'?

Here is an instance of someone who would like a certain optimization to be
dis-able-able

https://mail.python.org/pipermail/python-list/2014-February/667169.html

To the best of my knowledge it's nothing to do with unicode or with jmf.

Why, if optimizations are always desirable, do C compilers have
-O0, -O1, -O2, -O3 and zillions of more specific flags?

JFTR I have no issue with FSR.  What we have to hand to jmf - willingly
or otherwise - is that many more people have heard of FSR thanks to him. [I am one of them]

I don't even know whether jmf has a real
technical (as he calls it 'mathematical') issue or whether it's entirely political:

"Why should I pay more for a EURO sign than a $ sign?"

Well, perhaps that is more related to the exchange rate than to Python!
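
For what it's worth, the "price" of the two signs can be measured. The sketch below only compares encoded sizes in the three Unicode encoding forms; it says nothing about whether the difference matters in practice.

```python
signs = {"DOLLAR SIGN": "$", "EURO SIGN": "\u20ac"}
encodings = ("utf-8", "utf-16-le", "utf-32-le")  # "-le" variants add no BOM

# Encoded size in bytes of each sign under each encoding.
cost = {
    name: {enc: len(ch.encode(enc)) for enc in encodings}
    for name, ch in signs.items()
}

for name, sizes in cost.items():
    print(name, sizes)
```

In UTF-8 the dollar sign takes 1 byte and the euro sign 3; in UTF-16 and UTF-32 they cost the same. So any variable-width encoding charges more for the euro, not just Python.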



