
Unicode 7

Started by	wxjmfauth@gmail.com
First post	2014-04-29 10:37 -0700
Last post	2014-04-30 23:00 -0700
Articles	20 on this page of 56 — 16 participants


  Unicode 7 wxjmfauth@gmail.com - 2014-04-29 10:37 -0700
    Re: Unicode 7 Tim Chase <python.list@tim.thechases.com> - 2014-04-29 12:59 -0500
      Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-04-29 21:53 -0700
        Re: Unicode 7 Steven D'Aprano <steve@pearwood.info> - 2014-05-01 05:00 +0000
          Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-01 11:04 -0700
            Re: Unicode 7 Terry Reedy <tjreedy@udel.edu> - 2014-05-01 18:38 -0400
              Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-01 19:29 -0700
                Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-01 19:39 -0700
                Re: Unicode 7 Chris Angelico <rosuav@gmail.com> - 2014-05-02 13:01 +1000
                  Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-01 20:16 -0700
                Re: Unicode 7 Terry Reedy <tjreedy@udel.edu> - 2014-05-02 01:05 -0400
              Re: Unicode 7 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-02 03:15 +0000
            Re: Unicode 7 MRAB <python@mrabarnett.plus.com> - 2014-05-02 00:33 +0100
              Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-01 19:02 -0700
                Re: Unicode 7 Ben Finney <ben@benfinney.id.au> - 2014-05-02 12:39 +1000
                  Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-01 19:59 -0700
                Re: Unicode 7 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-02 08:45 +0000
                  Re: Unicode 7 Chris Angelico <rosuav@gmail.com> - 2014-05-02 19:08 +1000
                    Re: Unicode 7 Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2014-05-02 13:04 +0300
                  Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-02 03:39 -0700
                    Re: Unicode 7 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-02 11:55 +0000
                      Re: Unicode 7 Marko Rauhamaa <marko@pacujo.net> - 2014-05-02 15:19 +0300
                        Re: Unicode 7 Ben Finney <ben@benfinney.id.au> - 2014-05-03 07:07 +1000
                          Re: Unicode 7 Roy Smith <roy@panix.com> - 2014-05-02 17:13 -0400
                      Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-02 09:03 -0700
                      Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-02 09:50 -0700
                        Re: Unicode 7 Michael Torrie <torriem@gmail.com> - 2014-05-02 11:39 -0600
                        Re: Unicode 7 Ned Batchelder <ned@nedbatchelder.com> - 2014-05-02 13:46 -0400
                        Re: Unicode 7 Peter Otten <__peter__@web.de> - 2014-05-02 20:07 +0200
                          Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-02 17:58 -0700
                            Re: Unicode 7 Ned Batchelder <ned@nedbatchelder.com> - 2014-05-02 21:18 -0400
                              Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-02 18:42 -0700
                                Re: Unicode 7 Chris Angelico <rosuav@gmail.com> - 2014-05-03 11:54 +1000
                                  Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-02 19:02 -0700
                            Re: Unicode 7 Chris Angelico <rosuav@gmail.com> - 2014-05-03 11:15 +1000
                            Re: Unicode 7 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-03 02:02 +0000
                              Re: Unicode 7 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-03 02:04 +0000
                              Re: Unicode 7 Chris Angelico <rosuav@gmail.com> - 2014-05-03 12:17 +1000
                            Re: Unicode 7 Terry Reedy <tjreedy@udel.edu> - 2014-05-02 22:19 -0400
                      Re: Unicode 7 Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2014-05-03 12:57 -0400
                  Re: Unicode 7 Tim Chase <python.list@tim.thechases.com> - 2014-05-02 07:58 -0500
                Re: Unicode 7 MRAB <python@mrabarnett.plus.com> - 2014-05-02 17:52 +0100
            Re: Unicode 7 Terry Reedy <tjreedy@udel.edu> - 2014-05-02 00:16 -0400
              Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-01 21:42 -0700
                Re: Unicode 7 Chris Angelico <rosuav@gmail.com> - 2014-05-02 14:54 +1000
                Re: Unicode 7 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-02 08:08 +0000
                  Re: Unicode 7 Chris Angelico <rosuav@gmail.com> - 2014-05-02 19:01 +1000
                    Re: Unicode 7 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-02 11:52 +0000
                  Re: Unicode 7 Ben Finney <ben@benfinney.id.au> - 2014-05-02 19:16 +1000
                    Re: Unicode 7 Marko Rauhamaa <marko@pacujo.net> - 2014-05-02 13:05 +0300
                  Re: Unicode 7 Chris Angelico <rosuav@gmail.com> - 2014-05-02 19:24 +1000
                  Re: Unicode 7 MRAB <python@mrabarnett.plus.com> - 2014-05-02 18:07 +0100
    Re: Unicode 7 MRAB <python@mrabarnett.plus.com> - 2014-04-29 19:12 +0100
      Re: Unicode 7 wxjmfauth@gmail.com - 2014-04-30 00:06 -0700
        Re: Unicode 7 Tim Chase <python.list@tim.thechases.com> - 2014-04-30 13:48 -0500
          Re: Unicode 7 wxjmfauth@gmail.com - 2014-04-30 23:00 -0700

Page 2 of 3 — ← Prev page 1 [2] 3 Next page →

#70864

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2014-05-02 11:55 +0000
Message-ID	<536387b8$0$29965$c3e8da3$5496439d@news.astraweb.com>
In reply to	#70862

On Fri, 02 May 2014 03:39:34 -0700, Rustom Mody wrote:

> On Friday, May 2, 2014 2:15:41 PM UTC+5:30, Steven D'Aprano wrote:
>> On Thu, 01 May 2014 19:02:48 -0700, Rustom Mody wrote:
>> > - Worst of all what we
>> > *dont* see -- how many others dont see what we see?
> 
>> Again, this a deficiency of the font. There are very few code points in
>> Unicode which are intended to be invisible, e.g. space, newline, zero-
>> width joiner, control characters, etc., but they ought to be equally
>> invisible to everyone. No printable character should ever be invisible
>> in any decent font.
> 
> Thats not what I meant.
> 
> I wrote http://blog.languager.org/2014/04/unicoded-python.html
>  – mostly on a debian box.
> Later on seeing it on a less heavily setup ubuntu box, I see
>  ⟮ ⟯ ⟬ ⟭ ⦇ ⦈ ⦉ ⦊
> have become 'missing-glyph' boxes.
> 
> It leads me ask, how much else of what I am writing, some random reader
> has simply not seen?
> Quite simply we can never know – because most are going to go away
> saying "mojibaked/garbled rubbish"
> 
> Speaking of what you understood of what I said: Yes invisible chars is
> another problem I was recently bitten by. I pasted something from google
> into emacs' org mode. Following that link again I kept getting a broken
> link.
> 
> Until I found that the link had an invisible char
> 
> The problem was that emacs was faithfully rendering that char according
> to standard, ie invisibly!

And you've never been bitten by an invisible control character in ASCII 
text? You've lived a sheltered life!

Nothing you are describing is unique to Unicode.
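
For what it's worth, stray invisible characters like the one in the pasted link can be hunted down mechanically. A minimal sketch for Python 3, assuming it is enough to flag code points in the Unicode "control" (Cc) and "format" (Cf) categories; the helper name find_invisible and the sample link are made up for illustration:

```python
import unicodedata

def find_invisible(text):
    """Return (index, 'U+XXXX', category) for each control or format character."""
    hits = []
    for index, ch in enumerate(text):
        category = unicodedata.category(ch)
        if category in ("Cc", "Cf"):
            hits.append((index, "U+%04X" % ord(ch), category))
    return hits

# Hypothetical pasted link with a ZERO WIDTH SPACE hiding before the path.
link = "http://example.com/\u200bpage"
print(find_invisible(link))
```

Note that this also reports ordinary tabs and newlines, since they are control characters too; filter those out if they are expected in the input.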


-- 
Steven D'Aprano
http://import-that.dreamwidth.org/


#70865

From	Marko Rauhamaa <marko@pacujo.net>
Date	2014-05-02 15:19 +0300
Message-ID	<878uqkjru6.fsf@elektro.pacujo.net>
In reply to	#70864

Steven D'Aprano <steve+comp.lang.python@pearwood.info>:

> And you've never been bitten by an invisible control character in
> ASCII text? You've lived a sheltered life!

That reminds me: " " (nonbreakable space) is often used between numbers
and units, for example.


Marko


#70876

From	Ben Finney <ben@benfinney.id.au>
Date	2014-05-03 07:07 +1000
Message-ID	<mailman.9659.1399064866.18130.python-list@python.org>
In reply to	#70865

Marko Rauhamaa <marko@pacujo.net> writes:

> That reminds me: " " [U+00A0 NON-BREAKING SPACE] is often used between
> numbers and units, for example.

The non-breaking space (“ ” U+00A0) is frequently used in text to keep
conceptually inseparable text such as “100 km” from automatic word
breaks <URL:https://en.wikipedia.org/wiki/Non-breaking_space>.

Because of established, conflicting conventions for separating groups of
digits (“1,234.00” in many countries; “1.234,00” in many others)
<URL:https://en.wikipedia.org/wiki/Thousands_separator#Digit_grouping>,
the “ ” U+2009 THIN SPACE <URL:https://en.wikipedia.org/wiki/Thin_Space>
is recommended for separating digit groups (e.g. “1 234 567 m”)
<URL:https://en.wikipedia.org/wiki/SI_units#General_rules>.
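
A minimal Python 3 sketch of the two conventions just described, assuming integer quantities only; the helper names group_digits and with_unit are invented for this example:

```python
import unicodedata

NBSP = "\u00a0"        # NO-BREAK SPACE, between number and unit
THIN_SPACE = "\u2009"  # THIN SPACE, between digit groups

def group_digits(n):
    """Format an integer with thin spaces between groups of three digits."""
    return format(n, ",").replace(",", THIN_SPACE)

def with_unit(n, unit):
    """Join the grouped number and its unit with a non-breaking space."""
    return group_digits(n) + NBSP + unit

distance = with_unit(1234567, "m")
print(distance)
# Show which separator characters ended up in the string.
print([unicodedata.name(ch) for ch in distance if not ch.isalnum()])
```

Going through format(n, ",") and then swapping the comma keeps the grouping logic in the standard library rather than reimplementing it by hand.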

-- 
 \           “We spend the first twelve months of our children's lives |
  `\          teaching them to walk and talk and the next twelve years |
_o__)           telling them to sit down and shut up.” —Phyllis Diller |
Ben Finney


#70877

From	Roy Smith <roy@panix.com>
Date	2014-05-02 17:13 -0400
Message-ID	<roy-661A3F.17135502052014@news.panix.com>
In reply to	#70876

In article <mailman.9659.1399064866.18130.python-list@python.org>,
 Ben Finney <ben@benfinney.id.au> wrote:

> The non-breaking space (“ ” U+00A0) is frequently used in text to keep
> conceptually inseparable text such as “100 km” from automatic word
> breaks <URL:https://en.wikipedia.org/wiki/Non-breaking_space>.

Which, by the way, argparse doesn't honor...

http://bugs.python.org/issue16623


#70867

From	Rustom Mody <rustompmody@gmail.com>
Date	2014-05-02 09:03 -0700
Message-ID	<4e348f0a-9c31-447e-8438-9282942a08b2@googlegroups.com>
In reply to	#70864

On Friday, May 2, 2014 5:25:37 PM UTC+5:30, Steven D'Aprano wrote:
> On Fri, 02 May 2014 03:39:34 -0700, Rustom Mody wrote:

> > On Friday, May 2, 2014 2:15:41 PM UTC+5:30, Steven D'Aprano wrote:
> >> On Thu, 01 May 2014 19:02:48 -0700, Rustom Mody wrote:
> >> > - Worst of all what we
> >> > *dont* see -- how many others dont see what we see?
> >> Again, this a deficiency of the font. There are very few code points in
> >> Unicode which are intended to be invisible, e.g. space, newline, zero-
> >> width joiner, control characters, etc., but they ought to be equally
> >> invisible to everyone. No printable character should ever be invisible
> >> in any decent font.
> > Thats not what I meant.
> > I wrote http://blog.languager.org/2014/04/unicoded-python.html
> >  – mostly on a debian box.
> > Later on seeing it on a less heavily setup ubuntu box, I see
> >  ⟮ ⟯ ⟬ ⟭ ⦇ ⦈ ⦉ ⦊
> > have become 'missing-glyph' boxes.
> > It leads me ask, how much else of what I am writing, some random reader
> > has simply not seen?
> > Quite simply we can never know – because most are going to go away
> > saying "mojibaked/garbled rubbish"
> > Speaking of what you understood of what I said: Yes invisible chars is
> > another problem I was recently bitten by. I pasted something from google
> > into emacs' org mode. Following that link again I kept getting a broken
> > link.
> > Until I found that the link had an invisible char
> > The problem was that emacs was faithfully rendering that char according
> > to standard, ie invisibly!

> And you've never been bitten by an invisible control character in ASCII 
> text? You've lived a sheltered life!

For control characters I've seen:
- garbage (the ASCII equiv of mojibake)
- Straight ^A^B^C
- Maybe their names NUL,SOH,STX,ETX,EOT,ENQ,ACK…
- Or maybe just a little dot .
- More pathological behavior: a control sequence putting the
  terminal into some other mode

But I don't ever remember seeing a control character become
invisible (except [ \t\n\f])
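
The "Straight ^A^B^C" rendering in the list above is easy to reproduce. A rough Python 3 sketch of caret notation for ASCII control characters; the function name caret is an assumption, and this version deliberately makes tab and newline visible as well:

```python
def caret(text):
    """Render ASCII control characters in caret notation (^A, ^B, ... ^?)."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x20:
            # Control codes 0x00-0x1F map onto '@', 'A', ..., '_'.
            out.append("^" + chr(code + 0x40))
        elif code == 0x7F:
            # DEL is conventionally written ^?
            out.append("^?")
        else:
            out.append(ch)
    return "".join(out)

print(caret("a\x01b\x02c\x07"))
```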


#70869

From	Rustom Mody <rustompmody@gmail.com>
Date	2014-05-02 09:50 -0700
Message-ID	<5e91529c-c03f-44ee-a610-5697fea167b2@googlegroups.com>
In reply to	#70864

On Friday, May 2, 2014 5:25:37 PM UTC+5:30, Steven D'Aprano wrote:
> On Fri, 02 May 2014 03:39:34 -0700, Rustom Mody wrote:

> > On Friday, May 2, 2014 2:15:41 PM UTC+5:30, Steven D'Aprano wrote:
> >> On Thu, 01 May 2014 19:02:48 -0700, Rustom Mody wrote:
> >> > - Worst of all what we
> >> > *dont* see -- how many others dont see what we see?
> >> Again, this a deficiency of the font. There are very few code points in
> >> Unicode which are intended to be invisible, e.g. space, newline, zero-
> >> width joiner, control characters, etc., but they ought to be equally
> >> invisible to everyone. No printable character should ever be invisible
> >> in any decent font.
> > Thats not what I meant.
> > I wrote http://blog.languager.org/2014/04/unicoded-python.html
> >  – mostly on a debian box.
> > Later on seeing it on a less heavily setup ubuntu box, I see
> >  ⟮ ⟯ ⟬ ⟭ ⦇ ⦈ ⦉ ⦊
> > have become 'missing-glyph' boxes.
> > It leads me ask, how much else of what I am writing, some random reader
> > has simply not seen?
> > Quite simply we can never know – because most are going to go away
> > saying "mojibaked/garbled rubbish"
> > Speaking of what you understood of what I said: Yes invisible chars is
> > another problem I was recently bitten by. I pasted something from google
> > into emacs' org mode. Following that link again I kept getting a broken
> > link.
> > Until I found that the link had an invisible char
> > The problem was that emacs was faithfully rendering that char according
> > to standard, ie invisibly!

> And you've never been bitten by an invisible control character in ASCII 
> text? You've lived a sheltered life!

> Nothing you are describing is unique to Unicode.

Just noticed a small thing in which Python does a bit better than Haskell:
$ ghci
Prelude> let (ﬁne, fine) = (1,2)
Prelude> (ﬁne, fine)
(1,2)
Prelude> 

In case it's not apparent, the fi in the first fine is a ligature.

Python just barfs:

>>> ﬁne = 1
  File "<stdin>", line 1
    ﬁne = 1
    ^
SyntaxError: invalid syntax
>>> 

The point of that example is to show that Unicode gives all kinds of
"Aaah! Gotcha!!" opportunities that just don't exist in the old world.
Python may have got this one right but there are surely dozens of others.

On the other hand, I see more eagerness for Unicode source text in the
Haskell world, e.g.:

https://github.com/i-tu/Hasklig
http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#unicode-syntax
http://www.haskell.org/haskellwiki/Unicode-symbols
http://hackage.haskell.org/package/base-unicode-symbols

Some music 𝄞 𝄢 ♭ 𝄱 to appease the utf-8 gods


#70872

From	Michael Torrie <torriem@gmail.com>
Date	2014-05-02 11:39 -0600
Message-ID	<mailman.9656.1399052391.18130.python-list@python.org>
In reply to	#70869

On 05/02/2014 10:50 AM, Rustom Mody wrote:
> Python just barfs:
> 
>>>> ﬁne = 1
>   File "<stdin>", line 1
>     ﬁne = 1
>     ^
> SyntaxError: invalid syntax
>>>>
> 
> The point of that example is to show that unicode gives all kind of 
> "Aaah! Gotcha!!" opportunities that just dont exist in the old world.
> Python may have got this one right but there are surely dozens of others.

Except that it doesn't.  This has nothing to do with unicode handling.
It has everything to do with what defines an identifier in Python.  This
is no different than someone wondering why they can't start an
identifier in Python 1.x with a number or punctuation mark.


#70873

From	Ned Batchelder <ned@nedbatchelder.com>
Date	2014-05-02 13:46 -0400
Message-ID	<mailman.9657.1399052782.18130.python-list@python.org>
In reply to	#70869

On 5/2/14 12:50 PM, Rustom Mody wrote:
> Just noticed a small thing in which python does a bit better than haskell:
> $ ghci
> let (ﬁne, fine) = (1,2)
> Prelude> (ﬁne, fine)
> (1,2)
> Prelude>
>
> In case its not apparent, the fi in the first fine is a ligature.
>
> Python just barfs:
>
>>>> ﬁne = 1
>    File "<stdin>", line 1
>      ﬁne = 1
>      ^
> SyntaxError: invalid syntax
>>>>

Surely by now we could at least be explicit about which version of 
Python we are talking about?

   $ python2.7
   Python 2.7.2 (default, Oct 11 2012, 20:14:37)
   [GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
   Type "help", "copyright", "credits" or "license" for more information.
   >>> ﬁne = 1
     File "<stdin>", line 1
       ﬁne = 1
       ^
   SyntaxError: invalid syntax
   >>> ^D
   $ python3.4
   Python 3.4.0b1 (default, Dec 16 2013, 21:05:22)
   [GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin
   Type "help", "copyright", "credits" or "license" for more information.
   >>> ﬁne = 1
   >>> ﬁne
   1

In Python 2 identifiers must be ASCII.  Python 3 allows many Unicode 
characters in identifiers (see PEP 3131 for details: 
http://legacy.python.org/dev/peps/pep-3131/)
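
Under Python 3 the rule can be checked without going through the parser at all, since str.isidentifier() applies the same identifier syntax. A small sketch; the list of candidate names is just an illustration:

```python
# Candidate names: plain ASCII, the fi ligature, an accented letter,
# and two strings that are not identifiers in any Python version.
candidates = ["fine", "\ufb01ne", "na\u00efve", "1abc", "a-b"]

results = {name: name.isidentifier() for name in candidates}
for name, ok in results.items():
    print(ascii(name), ok)
```

This only answers "is it a syntactically valid identifier?"; it says nothing about keywords or about how the name is normalized afterwards.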

-- 
Ned Batchelder, http://nedbatchelder.com


#70874

From	Peter Otten <__peter__@web.de>
Date	2014-05-02 20:07 +0200
Message-ID	<mailman.9658.1399054040.18130.python-list@python.org>
In reply to	#70869

Rustom Mody wrote:

> Just noticed a small thing in which python does a bit better than haskell:
> $ ghci
> let (ﬁne, fine) = (1,2)
> Prelude> (ﬁne, fine)
> (1,2)
> Prelude>
> 
> In case its not apparent, the fi in the first fine is a ligature.
> 
> Python just barfs:

Not Python 3:

Python 3.3.2+ (default, Feb 28 2014, 00:52:16) 
[GCC 4.8.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> (ﬁne, fine) = (1,2)
>>> (ﬁne, fine)
(2, 2)

No copy-and-paste errors involved:

>>> eval("\ufb01ne")
2
>>> eval(b"fine".decode("ascii"))
2
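
The same effect can be seen by looking at what actually lands in the namespace. A minimal Python 3 sketch, using a throwaway dict (here called namespace, an arbitrary name) as the globals for exec():

```python
namespace = {}
exec("\ufb01ne = 1", namespace)      # assign using the ligature spelling
exec("fine = fine + 1", namespace)   # update using the plain ASCII spelling

# exec() adds '__builtins__' to the dict, so filter out dunder keys.
names = [key for key in namespace if not key.startswith("__")]
print(names, namespace["fine"])
```

Only one key is created, spelled with plain ASCII letters, and both statements operate on it.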


#70880

From	Rustom Mody <rustompmody@gmail.com>
Date	2014-05-02 17:58 -0700
Message-ID	<432508d1-984d-4c07-890b-31a7058429c6@googlegroups.com>
In reply to	#70874

On Friday, May 2, 2014 11:37:02 PM UTC+5:30, Peter Otten wrote:
> Rustom Mody wrote:

> > Just noticed a small thing in which python does a bit better than haskell:
> > $ ghci
> > let (ﬁne, fine) = (1,2)
> > Prelude> (ﬁne, fine)
> > (1,2)
> > In case its not apparent, the fi in the first fine is a ligature.
> > Python just barfs:

> Not Python 3:

> Python 3.3.2+ (default, Feb 28 2014, 00:52:16) 
> [GCC 4.8.1] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> (ﬁne, fine) = (1,2)
> >>> (ﬁne, fine)
> (2, 2)

> No copy-and-paste errors involved:

> >>> eval("\ufb01ne")
> 2
> >>> eval(b"fine".decode("ascii"))
> 2

Aah! Thanks Peter (and Ned and Michael) — 2-3 confusion — my bad.

I am confused about the tone however:
You think this

>>> (ﬁne, fine) = (1,2) # and no issue about it

is fine?


#70881

From	Ned Batchelder <ned@nedbatchelder.com>
Date	2014-05-02 21:18 -0400
Message-ID	<mailman.9661.1399079916.18130.python-list@python.org>
In reply to	#70880

On 5/2/14 8:58 PM, Rustom Mody wrote:
> On Friday, May 2, 2014 11:37:02 PM UTC+5:30, Peter Otten wrote:
>> Rustom Mody wrote:
>
>>> Just noticed a small thing in which python does a bit better than haskell:
>>> $ ghci
>>> let (ﬁne, fine) = (1,2)
>>> Prelude> (ﬁne, fine)
>>> (1,2)
>>> In case its not apparent, the fi in the first fine is a ligature.
>>> Python just barfs:
>
>> Not Python 3:
>
>> Python 3.3.2+ (default, Feb 28 2014, 00:52:16)
>> [GCC 4.8.1] on linux
>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> (ﬁne, fine) = (1,2)
>>>>> (ﬁne, fine)
>> (2, 2)
>
>> No copy-and-paste errors involved:
>
>>>>> eval("\ufb01ne")
>> 2
>>>>> eval(b"fine".decode("ascii"))
>> 2
>
> Aah! Thanks Peter (and Ned and Michael) — 2-3 confusion — my bad.
>
> I am confused about the tone however:
> You think this
>
>>>> (ﬁne, fine) = (1,2) # and no issue about it
>
> is fine?
>
>

Can you be more explicit?  It seems like you think it isn't fine.  Why 
not?  What bothers you about it?  Should there be an issue?

-- 
Ned Batchelder, http://nedbatchelder.com


#70883

From	Rustom Mody <rustompmody@gmail.com>
Date	2014-05-02 18:42 -0700
Message-ID	<eff49eb5-f631-4adf-b5cd-8c31759669db@googlegroups.com>
In reply to	#70881

On Saturday, May 3, 2014 6:48:21 AM UTC+5:30, Ned Batchelder wrote:
> On 5/2/14 8:58 PM, Rustom Mody wrote:
> > On Friday, May 2, 2014 11:37:02 PM UTC+5:30, Peter Otten wrote:
> >> Rustom Mody wrote:
> >>> Just noticed a small thing in which python does a bit better than haskell:
> >>> $ ghci
> >>> let (ﬁne, fine) = (1,2)
> >>> Prelude> (ﬁne, fine)
> >>> (1,2)
> >>> In case its not apparent, the fi in the first fine is a ligature.
> >>> Python just barfs:
> >> Not Python 3:
> >> Python 3.3.2+ (default, Feb 28 2014, 00:52:16)
> >> [GCC 4.8.1] on linux
> >> Type "help", "copyright", "credits" or "license" for more information.
> >>>>> (ﬁne, fine) = (1,2)
> >>>>> (ﬁne, fine)
> >> (2, 2)
> >> No copy-and-paste errors involved:
> >>>>> eval("\ufb01ne")
> >> 2
> >>>>> eval(b"fine".decode("ascii"))
> >> 2
> > Aah! Thanks Peter (and Ned and Michael) — 2-3 confusion — my bad.
> > I am confused about the tone however:
> > You think this
> >>>> (ﬁne, fine) = (1,2) # and no issue about it
> > is fine?

> Can you be more explicit?  It seems like you think it isn't fine.  Why 
> not?  What bothers you about it?  Should there be an issue?

Two identifiers that to some programmers
- can look the same
- and not to others
- and that the language treats as different

is not fine (or ﬁne) to me.

Putting them together as I did is summarizing the problem.

Think of them textually widely separated.
And the code (un)serendipitously 'working' (ie not giving NameErrors)


#70884

From	Chris Angelico <rosuav@gmail.com>
Date	2014-05-03 11:54 +1000
Message-ID	<mailman.9662.1399082056.18130.python-list@python.org>
In reply to	#70883

On Sat, May 3, 2014 at 11:42 AM, Rustom Mody <rustompmody@gmail.com> wrote:
> Two identifiers that to some programmers
> - can look the same
> - and not to others
> - and that the language treats as different
>
> is not fine (or ﬁne) to me.

The language treats them as the same, though.

ChrisA


#70885

From	Rustom Mody <rustompmody@gmail.com>
Date	2014-05-02 19:02 -0700
Message-ID	<4c2e056d-eeda-4e2c-9eaa-c411212628d0@googlegroups.com>
In reply to	#70884

On Saturday, May 3, 2014 7:24:08 AM UTC+5:30, Chris Angelico wrote:
> On Sat, May 3, 2014 at 11:42 AM, Rustom Mody wrote:
> > Two identifiers that to some programmers
> > - can look the same
> > - and not to others
> > - and that the language treats as different
> > is not fine (or ﬁne) to me.

> The language treats them as the same, though.

Whoops! I seem to be goofing a lot today

Saw Peter's

>>> (ﬁne, fine) = (1,2) 

Didn't notice his next line
>>> (ﬁne, fine)
(2, 2) 

So then I am back to my original point:

Python is giving better behavior than Haskell in this regard!

[Earlier reached this conclusion via a wrong path]


#70882

From	Chris Angelico <rosuav@gmail.com>
Date	2014-05-03 11:15 +1000
Message-ID	<mailman.9660.1399079758.18130.python-list@python.org>
In reply to	#70880

On Sat, May 3, 2014 at 10:58 AM, Rustom Mody <rustompmody@gmail.com> wrote:
> You think this
>
>>>> (ﬁne, fine) = (1,2) # and no issue about it
>
> is fine?

Not sure which part you're objecting to. Are you saying that this
should be an error:

>>> a, a = 1, 2 # simple ASCII identifier used twice

or that Python should take the exact sequence of codepoints, rather
than normalizing?

Python 3.5.0a0 (default:6a0def54c63d, Mar 26 2014, 01:11:09)
[GCC 4.7.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> ﬁne = 1
>>> vars()
{'__package__': None, '__spec__': None, '__doc__': None, 'fine': 1,
'__loader__': <class '_frozen_importlib.BuiltinImporter'>,
'__builtins__': <module 'builtins' (built-in)>, '__name__':
'__main__'}

As regards normalization, I would be happy with either "keep it
exactly as you provided" or "normalize according to <insert Unicode
standard normalization here>", as long as it's consistent. It's like
what happens with SQL identifiers: according to the standard, an
unquoted name should be uppercased, but some databases instead
lowercase them. It doesn't break code (modulo quoted names, not
applicable here), as long as it's consistent.

(My reading of PEP 3131 is that NFKC is used; is that what's
implemented, or was that a temporary measure and/or something for Py2
to consider?)

ChrisA


#70886

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2014-05-03 02:02 +0000
Message-ID	<53644e38$0$29965$c3e8da3$5496439d@news.astraweb.com>
In reply to	#70880

On Fri, 02 May 2014 17:58:51 -0700, Rustom Mody wrote:

> I am confused about the tone however: You think this
> 
>>>> (ﬁne, fine) = (1,2) # and no issue about it
> 
> is fine?

It's no worse than any other obfuscated variable name:

MOOSE, MO0SE, M0OSE = 1, 2, 3
xl, x1 = 1, 2

If you know your victim is reading source code in Ariel font, "rn" and 
"m" are virtually indistinguishable except at very large sizes.

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/


#70887

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2014-05-03 02:04 +0000
Message-ID	<53644ebf$0$29965$c3e8da3$5496439d@news.astraweb.com>
In reply to	#70886

On Sat, 03 May 2014 02:02:32 +0000, Steven D'Aprano wrote:

> On Fri, 02 May 2014 17:58:51 -0700, Rustom Mody wrote:
> 
>> I am confused about the tone however: You think this
>> 
>>>>> (ﬁne, fine) = (1,2) # and no issue about it
>> 
>> is fine?
> 
> 
> It's no worse than any other obfuscated variable name:
> 
> MOOSE, MO0SE, M0OSE = 1, 2, 3
> xl, x1 = 1, 2
> 
> If you know your victim is reading source code in Ariel font, "rn" and
> "m" are virtually indistinguishable except at very large sizes.


Ooops! I too missed that Python normalises the name ﬁne to fine, so in 
fact this is not a case of obfuscation. 



-- 
Steven D'Aprano
http://import-that.dreamwidth.org/


#70888

From	Chris Angelico <rosuav@gmail.com>
Date	2014-05-03 12:17 +1000
Message-ID	<mailman.9663.1399083429.18130.python-list@python.org>
In reply to	#70886

On Sat, May 3, 2014 at 12:02 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> If you know your victim is reading source code in Ariel font, "rn" and
> "m" are virtually indistinguishable except at very large sizes.

I kinda like the idea of naming it after a bratty teenager who rebels
against her father and runs away from home, but normally the font's
called Arial. :)

ChrisA


#70889

From	Terry Reedy <tjreedy@udel.edu>
Date	2014-05-02 22:19 -0400
Message-ID	<mailman.9664.1399083661.18130.python-list@python.org>
In reply to	#70880

On 5/2/2014 9:15 PM, Chris Angelico wrote:

> (My reading of PEP 3131 is that NFKC is used; is that what's
> implemented, or was that a temporary measure and/or something for Py2
> to consider?)

The 3.4 docs say "The syntax of identifiers in Python is based on the 
Unicode standard annex UAX-31, with elaboration and changes as defined 
below; see also PEP 3131 for further details."
...
"All identifiers are converted into the normal form NFKC while parsing; 
comparison of identifiers is based on NFKC."

Without reading UAX-31, I don't know how much was changed, but I suspect 
not much. In any case, the current rules are intended and very unlikely 
to change as that would break code going either forward or back for 
little purpose.
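
To see why the choice of NFKC in particular matters for the ligature example, here is a short Python 3 sketch comparing the four standard normalization forms via unicodedata.normalize(); the variable names are arbitrary:

```python
import unicodedata

ligature_name = "\ufb01ne"  # 'fine' spelled with U+FB01 LATIN SMALL LIGATURE FI

normalized = {}
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized[form] = unicodedata.normalize(form, ligature_name)
    print(form, ascii(normalized[form]), normalized[form] == "fine")
```

The canonical forms (NFC, NFD) leave the ligature alone; only the compatibility forms (NFKC, NFKD) fold it into the two letters "fi", which is why the two spellings end up as one identifier.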

-- 
Terry Jan Reedy


#70897

From	Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date	2014-05-03 12:57 -0400
Message-ID	<mailman.9666.1399136294.18130.python-list@python.org>
In reply to	#70864

On 02 May 2014 11:55:37 GMT, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> declaimed the following:

>And you've never been bitten by an invisible control character in ASCII 
>text? You've lived a sheltered life!
>
	Xerox Sigma CP/V would even permit them in file names (though the
system was EBCDIC, not ASCII -- just feeding lots of ASCII terminals).

	Think of the pain someone would have trying to figure out where in a 32
character file name the <BEL> was positioned. Even on a 1200bps serial
line, one couldn't really determine between which printable characters the
terminal beeped while listing the directory.

-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/
