| Started by | wxjmfauth@gmail.com |
|---|---|
| First post | 2014-04-29 10:37 -0700 |
| Last post | 2014-04-30 23:00 -0700 |
| Articles | 20 on this page of 56 — 16 participants |
Unicode 7 wxjmfauth@gmail.com - 2014-04-29 10:37 -0700
Re: Unicode 7 Tim Chase <python.list@tim.thechases.com> - 2014-04-29 12:59 -0500
Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-04-29 21:53 -0700
Re: Unicode 7 Steven D'Aprano <steve@pearwood.info> - 2014-05-01 05:00 +0000
Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-01 11:04 -0700
Re: Unicode 7 Terry Reedy <tjreedy@udel.edu> - 2014-05-01 18:38 -0400
Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-01 19:29 -0700
Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-01 19:39 -0700
Re: Unicode 7 Chris Angelico <rosuav@gmail.com> - 2014-05-02 13:01 +1000
Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-01 20:16 -0700
Re: Unicode 7 Terry Reedy <tjreedy@udel.edu> - 2014-05-02 01:05 -0400
Re: Unicode 7 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-02 03:15 +0000
Re: Unicode 7 MRAB <python@mrabarnett.plus.com> - 2014-05-02 00:33 +0100
Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-01 19:02 -0700
Re: Unicode 7 Ben Finney <ben@benfinney.id.au> - 2014-05-02 12:39 +1000
Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-01 19:59 -0700
Re: Unicode 7 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-02 08:45 +0000
Re: Unicode 7 Chris Angelico <rosuav@gmail.com> - 2014-05-02 19:08 +1000
Re: Unicode 7 Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2014-05-02 13:04 +0300
Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-02 03:39 -0700
Re: Unicode 7 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-02 11:55 +0000
Re: Unicode 7 Marko Rauhamaa <marko@pacujo.net> - 2014-05-02 15:19 +0300
Re: Unicode 7 Ben Finney <ben@benfinney.id.au> - 2014-05-03 07:07 +1000
Re: Unicode 7 Roy Smith <roy@panix.com> - 2014-05-02 17:13 -0400
Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-02 09:03 -0700
Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-02 09:50 -0700
Re: Unicode 7 Michael Torrie <torriem@gmail.com> - 2014-05-02 11:39 -0600
Re: Unicode 7 Ned Batchelder <ned@nedbatchelder.com> - 2014-05-02 13:46 -0400
Re: Unicode 7 Peter Otten <__peter__@web.de> - 2014-05-02 20:07 +0200
Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-02 17:58 -0700
Re: Unicode 7 Ned Batchelder <ned@nedbatchelder.com> - 2014-05-02 21:18 -0400
Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-02 18:42 -0700
Re: Unicode 7 Chris Angelico <rosuav@gmail.com> - 2014-05-03 11:54 +1000
Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-02 19:02 -0700
Re: Unicode 7 Chris Angelico <rosuav@gmail.com> - 2014-05-03 11:15 +1000
Re: Unicode 7 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-03 02:02 +0000
Re: Unicode 7 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-03 02:04 +0000
Re: Unicode 7 Chris Angelico <rosuav@gmail.com> - 2014-05-03 12:17 +1000
Re: Unicode 7 Terry Reedy <tjreedy@udel.edu> - 2014-05-02 22:19 -0400
Re: Unicode 7 Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2014-05-03 12:57 -0400
Re: Unicode 7 Tim Chase <python.list@tim.thechases.com> - 2014-05-02 07:58 -0500
Re: Unicode 7 MRAB <python@mrabarnett.plus.com> - 2014-05-02 17:52 +0100
Re: Unicode 7 Terry Reedy <tjreedy@udel.edu> - 2014-05-02 00:16 -0400
Re: Unicode 7 Rustom Mody <rustompmody@gmail.com> - 2014-05-01 21:42 -0700
Re: Unicode 7 Chris Angelico <rosuav@gmail.com> - 2014-05-02 14:54 +1000
Re: Unicode 7 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-02 08:08 +0000
Re: Unicode 7 Chris Angelico <rosuav@gmail.com> - 2014-05-02 19:01 +1000
Re: Unicode 7 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-02 11:52 +0000
Re: Unicode 7 Ben Finney <ben@benfinney.id.au> - 2014-05-02 19:16 +1000
Re: Unicode 7 Marko Rauhamaa <marko@pacujo.net> - 2014-05-02 13:05 +0300
Re: Unicode 7 Chris Angelico <rosuav@gmail.com> - 2014-05-02 19:24 +1000
Re: Unicode 7 MRAB <python@mrabarnett.plus.com> - 2014-05-02 18:07 +0100
Re: Unicode 7 MRAB <python@mrabarnett.plus.com> - 2014-04-29 19:12 +0100
Re: Unicode 7 wxjmfauth@gmail.com - 2014-04-30 00:06 -0700
Re: Unicode 7 Tim Chase <python.list@tim.thechases.com> - 2014-04-30 13:48 -0500
Re: Unicode 7 wxjmfauth@gmail.com - 2014-04-30 23:00 -0700
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2014-05-02 11:55 +0000 |
| Message-ID | <536387b8$0$29965$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #70862 |
On Fri, 02 May 2014 03:39:34 -0700, Rustom Mody wrote:
> On Friday, May 2, 2014 2:15:41 PM UTC+5:30, Steven D'Aprano wrote:
>> On Thu, 01 May 2014 19:02:48 -0700, Rustom Mody wrote:
>> > - Worst of all what we
>> > *dont* see -- how many others dont see what we see?
>
>> Again, this a deficiency of the font. There are very few code points in
>> Unicode which are intended to be invisible, e.g. space, newline, zero-
>> width joiner, control characters, etc., but they ought to be equally
>> invisible to everyone. No printable character should ever be invisible
>> in any decent font.
>
> Thats not what I meant.
>
> I wrote http://blog.languager.org/2014/04/unicoded-python.html
> – mostly on a debian box.
> Later on seeing it on a less heavily setup ubuntu box, I see
> ⟮ ⟯ ⟬ ⟭ ⦇ ⦈ ⦉ ⦊
> have become 'missing-glyph' boxes.
>
> It leads me ask, how much else of what I am writing, some random reader
> has simply not seen?
> Quite simply we can never know – because most are going to go away
> saying "mojibaked/garbled rubbish"
>
> Speaking of what you understood of what I said: Yes invisible chars is
> another problem I was recently bitten by. I pasted something from google
> into emacs' org mode. Following that link again I kept getting a broken
> link.
>
> Until I found that the link had an invisible char
>
> The problem was that emacs was faithfully rendering that char according
> to standard, ie invisibly!
And you've never been bitten by an invisible control character in ASCII
text? You've lived a sheltered life!
Nothing you are describing is unique to Unicode.
--
Steven D'Aprano
http://import-that.dreamwidth.org/
| From | Marko Rauhamaa <marko@pacujo.net> |
|---|---|
| Date | 2014-05-02 15:19 +0300 |
| Message-ID | <878uqkjru6.fsf@elektro.pacujo.net> |
| In reply to | #70864 |
Steven D'Aprano <steve+comp.lang.python@pearwood.info>:
> And you've never been bitten by an invisible control character in
> ASCII text? You've lived a sheltered life!
That reminds me: " " (nonbreakable space) is often used between numbers
and units, for example.
Marko
| From | Ben Finney <ben@benfinney.id.au> |
|---|---|
| Date | 2014-05-03 07:07 +1000 |
| Message-ID | <mailman.9659.1399064866.18130.python-list@python.org> |
| In reply to | #70865 |
Marko Rauhamaa <marko@pacujo.net> writes:
> That reminds me: " " [U+00A0 NON-BREAKING SPACE] is often used between
> numbers and units, for example.
The non-breaking space (“ ” U+00A0) is frequently used in text to keep
conceptually inseparable text such as “100 km” from automatic word
breaks <URL:https://en.wikipedia.org/wiki/Non-breaking_space>.
Because of established, conflicting conventions for separating groups of
digits (“1,234.00” in many countries; “1.234,00” in many others)
<URL:https://en.wikipedia.org/wiki/Thousands_separator#Digit_grouping>,
the “ ” U+2009 THIN SPACE <URL:https://en.wikipedia.org/wiki/Thin_Space>
is recommended for separating digit groups (e.g. “1 234 567 m”)
<URL:https://en.wikipedia.org/wiki/SI_units#General_rules>.
--
 \     “We spend the first twelve months of our children's lives |
  `\     teaching them to walk and talk and the next twelve years |
_o__)     telling them to sit down and shut up.” —Phyllis Diller |
Ben Finney
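A minimal Python 3 sketch of the two spaces described in this post, using only the standard library. The variable names (`NBSP`, `THIN_SPACE`, `distance`, `grouped`) are illustrative and do not come from the thread.

```python
import unicodedata

NBSP = "\u00a0"        # NO-BREAK SPACE
THIN_SPACE = "\u2009"  # THIN SPACE

# Keep a number and its unit together.
distance = "100" + NBSP + "km"

# Group digits with thin spaces, then attach the unit with a no-break space.
grouped = THIN_SPACE.join(["1", "234", "567"]) + NBSP + "m"

# Show the code point and official Unicode name of each space.
for ch in (NBSP, THIN_SPACE):
    print("U+%04X" % ord(ch), unicodedata.name(ch))

# Both characters count as whitespace for str.split() with no arguments,
# so naive tokenising still separates the number from its unit.
print(distance.split())
print(grouped.split())
```

The last two lines are the practical catch: "non-breaking" is a hint to line-wrapping software, and Python's default whitespace splitting does not treat it specially.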
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2014-05-02 17:13 -0400 |
| Message-ID | <roy-661A3F.17135502052014@news.panix.com> |
| In reply to | #70876 |
In article <mailman.9659.1399064866.18130.python-list@python.org>,
Ben Finney <ben@benfinney.id.au> wrote:
> The non-breaking space (“ ” U+00A0) is frequently used in text to keep
> conceptually inseparable text such as “100 km” from automatic word
> breaks <URL:https://en.wikipedia.org/wiki/Non-breaking_space>.
Which, by the way, argparse doesn't honor...
http://bugs.python.org/issue16623
| From | Rustom Mody <rustompmody@gmail.com> |
|---|---|
| Date | 2014-05-02 09:03 -0700 |
| Message-ID | <4e348f0a-9c31-447e-8438-9282942a08b2@googlegroups.com> |
| In reply to | #70864 |
On Friday, May 2, 2014 5:25:37 PM UTC+5:30, Steven D'Aprano wrote:
> On Fri, 02 May 2014 03:39:34 -0700, Rustom Mody wrote:
> > On Friday, May 2, 2014 2:15:41 PM UTC+5:30, Steven D'Aprano wrote:
> >> On Thu, 01 May 2014 19:02:48 -0700, Rustom Mody wrote:
> >> > - Worst of all what we
> >> > *dont* see -- how many others dont see what we see?
> >> Again, this a deficiency of the font. There are very few code points in
> >> Unicode which are intended to be invisible, e.g. space, newline, zero-
> >> width joiner, control characters, etc., but they ought to be equally
> >> invisible to everyone. No printable character should ever be invisible
> >> in any decent font.
> > Thats not what I meant.
> > I wrote http://blog.languager.org/2014/04/unicoded-python.html
> > – mostly on a debian box.
> > Later on seeing it on a less heavily setup ubuntu box, I see
> > ⟮ ⟯ ⟬ ⟭ ⦇ ⦈ ⦉ ⦊
> > have become 'missing-glyph' boxes.
> > It leads me ask, how much else of what I am writing, some random reader
> > has simply not seen?
> > Quite simply we can never know – because most are going to go away
> > saying "mojibaked/garbled rubbish"
> > Speaking of what you understood of what I said: Yes invisible chars is
> > another problem I was recently bitten by. I pasted something from google
> > into emacs' org mode. Following that link again I kept getting a broken
> > link.
> > Until I found that the link had an invisible char
> > The problem was that emacs was faithfully rendering that char according
> > to standard, ie invisibly!
> And you've never been bitten by an invisible control character in ASCII
> text? You've lived a sheltered life!
For control characters Ive seen:
- garbage (the ASCII equiv of mojibake)
- Straight ^A^B^C
- Maybe their names NUL,SOH,STX,ETX,EOT,ENQ,ACK…
- Or maybe just a little dot .
- More pathological behavior: a control sequence putting the terminal
  into some other mode
But I dont ever remember seeing a control character become invisible
(except [ \t\n\f])
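One way to hunt for the kind of invisible character described here is to scan a string for code points in the Unicode "control" (Cc) and "format" (Cf) categories. The following is only a sketch under that assumption: the helper name `reveal_invisibles` and the sample URL are hypothetical, and the thread does not say which character was actually hiding in the pasted link.

```python
import unicodedata


def reveal_invisibles(text):
    """Return (index, code point, name) for each control or format character."""
    found = []
    for index, ch in enumerate(text):
        # Cc = control characters, Cf = format characters (zero-width etc.)
        if unicodedata.category(ch) in ("Cc", "Cf"):
            name = unicodedata.name(ch, "<unnamed>")  # controls have no name
            found.append((index, "U+%04X" % ord(ch), name))
    return found


# Hypothetical link with a ZERO WIDTH SPACE pasted into the path.
url = "http://example.com/\u200bpage"
print(reveal_invisibles(url))

# ascii() is a quick alternative: it escapes every non-ASCII character.
print(ascii(url))
```

Note that this also reports ordinary tabs and newlines, since they are control characters too; a real checker would probably whitelist those.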
| From | Rustom Mody <rustompmody@gmail.com> |
|---|---|
| Date | 2014-05-02 09:50 -0700 |
| Message-ID | <5e91529c-c03f-44ee-a610-5697fea167b2@googlegroups.com> |
| In reply to | #70864 |
On Friday, May 2, 2014 5:25:37 PM UTC+5:30, Steven D'Aprano wrote:
> On Fri, 02 May 2014 03:39:34 -0700, Rustom Mody wrote:
> > On Friday, May 2, 2014 2:15:41 PM UTC+5:30, Steven D'Aprano wrote:
> >> On Thu, 01 May 2014 19:02:48 -0700, Rustom Mody wrote:
> >> > - Worst of all what we
> >> > *dont* see -- how many others dont see what we see?
> >> Again, this a deficiency of the font. There are very few code points in
> >> Unicode which are intended to be invisible, e.g. space, newline, zero-
> >> width joiner, control characters, etc., but they ought to be equally
> >> invisible to everyone. No printable character should ever be invisible
> >> in any decent font.
> > Thats not what I meant.
> > I wrote http://blog.languager.org/2014/04/unicoded-python.html
> > – mostly on a debian box.
> > Later on seeing it on a less heavily setup ubuntu box, I see
> > ⟮ ⟯ ⟬ ⟭ ⦇ ⦈ ⦉ ⦊
> > have become 'missing-glyph' boxes.
> > It leads me ask, how much else of what I am writing, some random reader
> > has simply not seen?
> > Quite simply we can never know – because most are going to go away
> > saying "mojibaked/garbled rubbish"
> > Speaking of what you understood of what I said: Yes invisible chars is
> > another problem I was recently bitten by. I pasted something from google
> > into emacs' org mode. Following that link again I kept getting a broken
> > link.
> > Until I found that the link had an invisible char
> > The problem was that emacs was faithfully rendering that char according
> > to standard, ie invisibly!
> And you've never been bitten by an invisible control character in ASCII
> text? You've lived a sheltered life!
> Nothing you are describing is unique to Unicode.
Just noticed a small thing in which python does a bit better than haskell:
$ ghci
let (ﬁne, fine) = (1,2)
Prelude> (ﬁne, fine)
(1,2)
Prelude>
In case its not apparent, the fi in the first fine is a ligature.
Python just barfs:
>>> ﬁne = 1
File "<stdin>", line 1
ﬁne = 1
^
SyntaxError: invalid syntax
>>>
The point of that example is to show that unicode gives all kind of
"Aaah! Gotcha!!" opportunities that just dont exist in the old world.
Python may have got this one right but there are surely dozens of others.
On the other hand I see more eagerness for unicode source-text there
eg.
https://github.com/i-tu/Hasklig
http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#unicode-syntax
http://www.haskell.org/haskellwiki/Unicode-symbols
http://hackage.haskell.org/package/base-unicode-symbols
Some music 𝄞 𝄢 ♭ 𝄱 to appease the utf-8 gods
| From | Michael Torrie <torriem@gmail.com> |
|---|---|
| Date | 2014-05-02 11:39 -0600 |
| Message-ID | <mailman.9656.1399052391.18130.python-list@python.org> |
| In reply to | #70869 |
On 05/02/2014 10:50 AM, Rustom Mody wrote:
> Python just barfs:
>
>>>> ﬁne = 1
> File "<stdin>", line 1
> ﬁne = 1
> ^
> SyntaxError: invalid syntax
>>>>
>
> The point of that example is to show that unicode gives all kind of
> "Aaah! Gotcha!!" opportunities that just dont exist in the old world.
> Python may have got this one right but there are surely dozens of others.
Except that it doesn't. This has nothing to do with unicode handling.
It has everything to do with what defines an identifier in Python. This
is no different than someone wondering why they can't start an
identifier in Python 1.x with a number or punctuation mark.
| From | Ned Batchelder <ned@nedbatchelder.com> |
|---|---|
| Date | 2014-05-02 13:46 -0400 |
| Message-ID | <mailman.9657.1399052782.18130.python-list@python.org> |
| In reply to | #70869 |
On 5/2/14 12:50 PM, Rustom Mody wrote:
> Just noticed a small thing in which python does a bit better than haskell:
> $ ghci
> let (ﬁne, fine) = (1,2)
> Prelude> (ﬁne, fine)
> (1,2)
> Prelude>
>
> In case its not apparent, the fi in the first fine is a ligature.
>
> Python just barfs:
>
>>>> >>>ﬁne = 1
> File "<stdin>", line 1
> ﬁne = 1
> ^
> SyntaxError: invalid syntax
>>>> >>>
Surely by now we could at least be explicit about which version of
Python we are talking about?
$ python2.7
Python 2.7.2 (default, Oct 11 2012, 20:14:37)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on
darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> ﬁne = 1
File "<stdin>", line 1
ﬁne = 1
^
SyntaxError: invalid syntax
>>> ^D
$ python3.4
Python 3.4.0b1 (default, Dec 16 2013, 21:05:22)
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> ﬁne = 1
>>> fine
1
In Python 2 identifiers must be ASCII. Python 3 allows many Unicode
characters in identifiers (see PEP 3131 for details:
http://legacy.python.org/dev/peps/pep-3131/)
--
Ned Batchelder, http://nedbatchelder.com
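For readers on Python 3, `str.isidentifier()` gives a quick way to ask whether a given string would be accepted as an identifier, without going through the interactive prompt. This is a small sketch; the list of candidate names is made up for illustration, and the method does not exist on Python 2 byte strings.

```python
# Candidate names, written with escapes so the ligature is unambiguous.
candidates = [
    "fine",        # plain ASCII
    "\ufb01ne",    # starts with U+FB01 LATIN SMALL LIGATURE FI
    "na\u00efve",  # contains U+00EF, a non-ASCII letter
    "1abc",        # starts with a digit
]

for name in candidates:
    # ascii() keeps the output readable on terminals without the glyphs.
    print(ascii(name), name.isidentifier())
```

Only the last candidate is rejected, which matches the Python 3 session above: the ligature spelling is a legal identifier there.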
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2014-05-02 20:07 +0200 |
| Message-ID | <mailman.9658.1399054040.18130.python-list@python.org> |
| In reply to | #70869 |
Rustom Mody wrote:
> Just noticed a small thing in which python does a bit better than haskell:
> $ ghci
> let (ﬁne, fine) = (1,2)
> Prelude> (ﬁne, fine)
> (1,2)
> Prelude>
>
> In case its not apparent, the fi in the first fine is a ligature.
>
> Python just barfs:
Not Python 3:
Python 3.3.2+ (default, Feb 28 2014, 00:52:16)
[GCC 4.8.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> (ﬁne, fine) = (1,2)
>>> (ﬁne, fine)
(2, 2)
No copy-and-paste errors involved:
>>> eval("\ufb01ne")
2
>>> eval(b"fine".decode("ascii"))
2
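The same effect can be shown without relying on what the terminal displays, by running the two assignments in a scratch namespace and inspecting the keys afterwards. A minimal Python 3 sketch; the `namespace` dict and the use of `exec` are illustrative choices, not part of the session above.

```python
# Execute two assignments: one spelled with the ligature, one in ASCII.
namespace = {}
exec("\ufb01ne = 1\nfine = 2", namespace)

# exec() adds '__builtins__' to the dict; ignore dunder entries.
names = [key for key in namespace if not key.startswith("__")]

print(names)              # a single name, spelled in plain ASCII
print(namespace["fine"])  # the second assignment overwrote the first
```

Both spellings end up bound to one key, which is why the tuple assignment in the session yields `(2, 2)`.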
| From | Rustom Mody <rustompmody@gmail.com> |
|---|---|
| Date | 2014-05-02 17:58 -0700 |
| Message-ID | <432508d1-984d-4c07-890b-31a7058429c6@googlegroups.com> |
| In reply to | #70874 |
On Friday, May 2, 2014 11:37:02 PM UTC+5:30, Peter Otten wrote:
> Rustom Mody wrote:
> > Just noticed a small thing in which python does a bit better than haskell:
> > $ ghci
> > let (ﬁne, fine) = (1,2)
> > Prelude> (ﬁne, fine)
> > (1,2)
> > In case its not apparent, the fi in the first fine is a ligature.
> > Python just barfs:
> Not Python 3:
> Python 3.3.2+ (default, Feb 28 2014, 00:52:16)
> [GCC 4.8.1] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> (ﬁne, fine) = (1,2)
> >>> (ﬁne, fine)
> (2, 2)
> No copy-and-paste errors involved:
> >>> eval("\ufb01ne")
> 2
> >>> eval(b"fine".decode("ascii"))
> 2
Aah! Thanks Peter (and Ned and Michael) — 2-3 confusion — my bad.
I am confused about the tone however:
You think this
>>> (ﬁne, fine) = (1,2) # and no issue about it
is fine?
| From | Ned Batchelder <ned@nedbatchelder.com> |
|---|---|
| Date | 2014-05-02 21:18 -0400 |
| Message-ID | <mailman.9661.1399079916.18130.python-list@python.org> |
| In reply to | #70880 |
On 5/2/14 8:58 PM, Rustom Mody wrote:
> On Friday, May 2, 2014 11:37:02 PM UTC+5:30, Peter Otten wrote:
>> Rustom Mody wrote:
>
>>> Just noticed a small thing in which python does a bit better than haskell:
>>> $ ghci
>>> let (ﬁne, fine) = (1,2)
>>> Prelude> (ﬁne, fine)
>>> (1,2)
>>> In case its not apparent, the fi in the first fine is a ligature.
>>> Python just barfs:
>
>> Not Python 3:
>
>> Python 3.3.2+ (default, Feb 28 2014, 00:52:16)
>> [GCC 4.8.1] on linux
>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> (ﬁne, fine) = (1,2)
>>>>> (ﬁne, fine)
>> (2, 2)
>
>> No copy-and-paste errors involved:
>
>>>>> eval("\ufb01ne")
>> 2
>>>>> eval(b"fine".decode("ascii"))
>> 2
>
> Aah! Thanks Peter (and Ned and Michael) — 2-3 confusion — my bad.
>
> I am confused about the tone however:
> You think this
>
>>>> (ﬁne, fine) = (1,2) # and no issue about it
>
> is fine?
>
>
Can you be more explicit? It seems like you think it isn't fine. Why
not? What bothers you about it? Should there be an issue?
--
Ned Batchelder, http://nedbatchelder.com
| From | Rustom Mody <rustompmody@gmail.com> |
|---|---|
| Date | 2014-05-02 18:42 -0700 |
| Message-ID | <eff49eb5-f631-4adf-b5cd-8c31759669db@googlegroups.com> |
| In reply to | #70881 |
On Saturday, May 3, 2014 6:48:21 AM UTC+5:30, Ned Batchelder wrote:
> On 5/2/14 8:58 PM, Rustom Mody wrote:
> > On Friday, May 2, 2014 11:37:02 PM UTC+5:30, Peter Otten wrote:
> >> Rustom Mody wrote:
> >>> Just noticed a small thing in which python does a bit better than haskell:
> >>> $ ghci
> >>> let (ﬁne, fine) = (1,2)
> >>> Prelude> (ﬁne, fine)
> >>> (1,2)
> >>> In case its not apparent, the fi in the first fine is a ligature.
> >>> Python just barfs:
> >> Not Python 3:
> >> Python 3.3.2+ (default, Feb 28 2014, 00:52:16)
> >> [GCC 4.8.1] on linux
> >> Type "help", "copyright", "credits" or "license" for more information.
> >>>>> (ﬁne, fine) = (1,2)
> >>>>> (ﬁne, fine)
> >> (2, 2)
> >> No copy-and-paste errors involved:
> >>>>> eval("\ufb01ne")
> >> 2
> >>>>> eval(b"fine".decode("ascii"))
> >> 2
> > Aah! Thanks Peter (and Ned and Michael) — 2-3 confusion — my bad.
> > I am confused about the tone however:
> > You think this
> >>>> (ﬁne, fine) = (1,2) # and no issue about it
> > is fine?
> Can you be more explicit? It seems like you think it isn't fine. Why
> not? What bothers you about it? Should there be an issue?
Two identifiers that to some programmers
- can look the same
- and not to others
- and that the language treats as different
is not fine (or fine) to me.
Putting them together as I did is summarizing the problem.
Think of them textually widely separated.
And the code (un)serendipitously 'working' (ie not giving NameErrors)
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-05-03 11:54 +1000 |
| Message-ID | <mailman.9662.1399082056.18130.python-list@python.org> |
| In reply to | #70883 |
On Sat, May 3, 2014 at 11:42 AM, Rustom Mody <rustompmody@gmail.com> wrote:
> Two identifiers that to some programmers
> - can look the same
> - and not to others
> - and that the language treats as different
>
> is not fine (or fine) to me.
The language treats them as the same, though.
ChrisA
| From | Rustom Mody <rustompmody@gmail.com> |
|---|---|
| Date | 2014-05-02 19:02 -0700 |
| Message-ID | <4c2e056d-eeda-4e2c-9eaa-c411212628d0@googlegroups.com> |
| In reply to | #70884 |
On Saturday, May 3, 2014 7:24:08 AM UTC+5:30, Chris Angelico wrote:
> On Sat, May 3, 2014 at 11:42 AM, Rustom Mody wrote:
> > Two identifiers that to some programmers
> > - can look the same
> > - and not to others
> > - and that the language treats as different
> > is not fine (or fine) to me.
> The language treats them as the same, though.
Whoops! I seem to be goofing a lot today
Saw Peter's
>>> (ﬁne, fine) = (1,2)
Didn't notice his next line
>>> (ﬁne, fine)
(2, 2)
So then I am back to my original point:
Python is giving better behavior than Haskell in this regard!
[Earlier reached this conclusion via a wrong path]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-05-03 11:15 +1000 |
| Message-ID | <mailman.9660.1399079758.18130.python-list@python.org> |
| In reply to | #70880 |
On Sat, May 3, 2014 at 10:58 AM, Rustom Mody <rustompmody@gmail.com> wrote:
> You think this
>
>>>> (ﬁne, fine) = (1,2) # and no issue about it
>
> is fine?
Not sure which part you're objecting to. Are you saying that this
should be an error:
>>> a, a = 1, 2 # simple ASCII identifier used twice
or that Python should take the exact sequence of codepoints, rather
than normalizing?
Python 3.5.0a0 (default:6a0def54c63d, Mar 26 2014, 01:11:09)
[GCC 4.7.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> ﬁne = 1
>>> vars()
{'__package__': None, '__spec__': None, '__doc__': None, 'fine': 1,
'__loader__': <class '_frozen_importlib.BuiltinImporter'>,
'__builtins__': <module 'builtins' (built-in)>, '__name__':
'__main__'}
As regards normalization, I would be happy with either "keep it
exactly as you provided" or "normalize according to <insert Unicode
standard normalization here>", as long as it's consistent. It's like
what happens with SQL identifiers: according to the standard, an
unquoted name should be uppercased, but some databases instead
lowercase them. It doesn't break code (modulo quoted names, not
applicable here), as long as it's consistent.
(My reading of PEP 3131 is that NFKC is used; is that what's
implemented, or was that a temporary measure and/or something for Py2
to consider?)
ChrisA
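To make the "which normalization" question concrete, the four standard Unicode normalization forms can be applied to the ligature spelling with `unicodedata.normalize`. This is an illustrative Python 3 sketch; the variable name `ligature_name` is an assumption, and it only shows what each form does to this one string, not what any particular interpreter build implements.

```python
import unicodedata

# "fine" spelled with U+FB01 LATIN SMALL LIGATURE FI as its first character.
ligature_name = "\ufb01ne"

results = {}
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    results[form] = unicodedata.normalize(form, ligature_name)
    # ascii() makes it obvious whether the ligature survived.
    print(form, ascii(results[form]))
```

The canonical forms (NFC, NFD) leave the ligature alone, while the compatibility forms (NFKC, NFKD) expand it to the two ASCII letters, so the choice between "keep exactly" and "normalize" is visible for this identifier.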
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2014-05-03 02:02 +0000 |
| Message-ID | <53644e38$0$29965$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #70880 |
On Fri, 02 May 2014 17:58:51 -0700, Rustom Mody wrote:
> I am confused about the tone however: You think this
>
>>>> (ﬁne, fine) = (1,2) # and no issue about it
>
> is fine?
It's no worse than any other obfuscated variable name:
MOOSE, MO0SE, M0OSE = 1, 2, 3
xl, x1 = 1, 2
If you know your victim is reading source code in Ariel font, "rn" and
"m" are virtually indistinguishable except at very large sizes.
--
Steven D'Aprano
http://import-that.dreamwidth.org/
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2014-05-03 02:04 +0000 |
| Message-ID | <53644ebf$0$29965$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #70886 |
On Sat, 03 May 2014 02:02:32 +0000, Steven D'Aprano wrote:
> On Fri, 02 May 2014 17:58:51 -0700, Rustom Mody wrote:
>
>> I am confused about the tone however: You think this
>>
>>>>> (ﬁne, fine) = (1,2) # and no issue about it
>>
>> is fine?
>
>
> It's no worse than any other obfuscated variable name:
>
> MOOSE, MO0SE, M0OSE = 1, 2, 3
> xl, x1 = 1, 2
>
> If you know your victim is reading source code in Ariel font, "rn" and
> "m" are virtually indistinguishable except at very large sizes.
Ooops! I too missed that Python normalises the name ﬁne to fine, so in
fact this is not a case of obfuscation.
--
Steven D'Aprano
http://import-that.dreamwidth.org/
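The distinction drawn in this correction can be checked directly: NFKC folds the ligature spelling into the ASCII one, but it does nothing for ASCII look-alikes such as `MO0SE` or `x1`. A minimal Python 3 sketch, assuming NFKC as the normalization form; the helper `nfkc` and the list name `lookalikes` are hypothetical.

```python
import unicodedata


def nfkc(name):
    """Normalize a candidate identifier to NFKC."""
    return unicodedata.normalize("NFKC", name)


# ASCII names that merely look similar in some fonts.
lookalikes = ["MOOSE", "MO0SE", "M0OSE", "xl", "x1"]
folded = {nfkc(name) for name in lookalikes}

# All five stay distinct: normalization does not touch plain ASCII.
print(len(folded) == len(lookalikes))

# The ligature spelling, by contrast, collapses to the ASCII spelling.
print(nfkc("\ufb01ne") == "fine")
```

So the obfuscation examples remain a font and code-review problem, while the ligature case is resolved by normalization.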
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-05-03 12:17 +1000 |
| Message-ID | <mailman.9663.1399083429.18130.python-list@python.org> |
| In reply to | #70886 |
On Sat, May 3, 2014 at 12:02 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> If you know your victim is reading source code in Ariel font, "rn" and
> "m" are virtually indistinguishable except at very large sizes.
I kinda like the idea of naming it after a bratty teenager who rebels
against her father and runs away from home, but normally the font's
called Arial. :)
ChrisA
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2014-05-02 22:19 -0400 |
| Message-ID | <mailman.9664.1399083661.18130.python-list@python.org> |
| In reply to | #70880 |
On 5/2/2014 9:15 PM, Chris Angelico wrote:
> (My reading of PEP 3131 is that NFKC is used; is that what's
> implemented, or was that a temporary measure and/or something for Py2
> to consider?)
The 3.4 docs say
"The syntax of identifiers in Python is based on the Unicode standard
annex UAX-31, with elaboration and changes as defined below; see also
PEP 3131 for further details."
...
"All identifiers are converted into the normal form NFKC while parsing;
comparison of identifiers is based on NFKC."
Without reading UAX-31, I don't know how much was changed, but I suspect
not much. In any case, the current rules are intended and very unlikely
to change as that would break code going either forward or back for
little purpose.
--
Terry Jan Reedy
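The "while parsing" part of the quoted documentation can be observed from Python 3 itself. The sketch below parses a one-line assignment with the `ast` module and looks at the stored name, then contrasts that with `setattr`, which takes a runtime string and therefore never passes through the parser. The names `tree`, `target` and `holder` are illustrative only.

```python
import ast
import types

# Parse an assignment whose target is spelled with the U+FB01 ligature.
tree = ast.parse("\ufb01ne = 1")
target = tree.body[0].targets[0]

# The parser has already stored the NFKC-normalized spelling.
print(ascii(target.id))

# Strings handed to setattr()/getattr() at runtime are not source code,
# so they are used exactly as given.
holder = types.SimpleNamespace()
setattr(holder, "\ufb01ne", 1)
print(hasattr(holder, "fine"))
print([ascii(key) for key in vars(holder)])
```

The second half is a side effect worth knowing about: an attribute created through `setattr` with the ligature spelling cannot be reached by writing the ligature in source code, because the source spelling is normalized and the stored key is not.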
| From | Dennis Lee Bieber <wlfraed@ix.netcom.com> |
|---|---|
| Date | 2014-05-03 12:57 -0400 |
| Message-ID | <mailman.9666.1399136294.18130.python-list@python.org> |
| In reply to | #70864 |
On 02 May 2014 11:55:37 GMT, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> declaimed the following:
>And you've never been bitten by an invisible control character in ASCII
>text? You've lived a sheltered life!
>
Xerox Sigma CP/V would even permit them in file names (though the
system was EBCDIC, not ASCII -- just feeding lots of ASCII terminals).
Think of the pain someone would have trying to figure out where in a 32
character file name the <BEL> was positioned. Even on a 1200bps serial
line, one couldn't really determine between which printable characters the
terminal beeped while listing the directory.
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/