


Re: unicode as valid naming symbols

From Ned Batchelder <ned@nedbatchelder.com>
Subject Re: unicode as valid naming symbols
Date 2014-04-01 09:33 -0400
References (11 earlier) <CALwzid==eS-QryN8dhoDaREG4QTD1xa-NacH2KMR8Tfe3Sg-Pw@mail.gmail.com> <533A96E9.1030107@rece.vub.ac.be> <CAPTjJmqUbOf5VAgS2zYSk9ah=uq-e_n_gdXTS=TZz5VTt5eKkQ@mail.gmail.com> <533AAA13.4010309@rece.vub.ac.be> <CAPTjJmopX9Q3i1x39eWKAkq83tHQ7UMUxjDjkwA-36sVaAQkPA@mail.gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.8801.1396359227.18130.python-list@python.org>



On 4/1/14 9:00 AM, Chris Angelico wrote:
> On Tue, Apr 1, 2014 at 10:59 PM, Antoon Pardon
> <antoon.pardon@rece.vub.ac.be> wrote:
>> On 01-04-14 12:58, Chris Angelico wrote:
>>> But because, in the future, Python may choose to create new operators,
>>> the simplest and safest way to ensure safety is to put a boundary on
>>> what can be operators and what can be names; Unicode character classes
>>> are perfect for this. It's also possible that all Unicode whitespace
>>> characters might become legal for indentation and separation (maybe
>>> they are already??), so obviously they're ruled out as identifiers;
>>> anyway, I honestly do not think people would want to use U+2007 FIGURE
>>> SPACE inside a name. So if we deny whitespace, and accept letters and
>>> digits, it makes good sense to deny mathematical symbols so as to keep
>>> them available for operators. (It also makes reasonable sense to
>>> *permit* mathematical symbols, thus allowing you to use them for
>>> functions/methods, in the same way that you can use "n", "o", and "t",
>>> but not "not"; but with word operators, the entire word has to be used
>>> as-is before it's a collision - with a symbolic one, any instance of
>>> that symbol inside a name will change parsing entirely. It's a
>>> trade-off, and Python's made a decision one way and not the other.)
>>
>> This mostly makes sense to me. The only caveat I have is that since we
>> also allow _ (U+005F LOW LINE) in names, and it belongs to the category
>> <punctuation, connector>, we should allow other symbols within this
>> category in a name.
>>
>> But I confess that is mostly personal taste, since I find names_like_this
>> ugly. Names-like-this look better to me but that wouldn't be workable
>> in Python. But maybe there is some connector that would be aesthetically
>> pleasing and would not cause other problems.
>
> That's reasonable. The Pc category doesn't have much in it:
>
> http://www.fileformat.info/info/unicode/category/Pc/list.htm
>
> If the definition of "characters permitted in identifiers" is derived
> exclusively from the Unicode categories, including Pc would make fine
> sense. Probably the definition should be: First character is L* or Pc,
> subsequent characters are L*, N*, or Pc, and either Mn or M*
> (combining characters). Or something like that.
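Whether the Pc characters Antoon and Chris mention are already accepted can be checked from Python itself. The sketch below uses only the standard `sys` and `unicodedata` modules; `pc_characters` and `accepted_positions` are hypothetical helper names introduced for illustration, and the exact list depends on the Unicode database version of the interpreter you run it on.

```python
import sys
import unicodedata


def pc_characters():
    """Return every code point whose general category is Pc (connector punctuation)."""
    return [chr(cp) for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == "Pc"]


def accepted_positions(ch):
    """Report whether ch is accepted as the first character of a name,
    and whether it is accepted after an ordinary letter."""
    return {
        "start": (ch + "a").isidentifier(),
        "continue": ("a" + ch).isidentifier(),
    }


if __name__ == "__main__":
    print("Unicode database version:", unicodedata.unidata_version)
    for ch in pc_characters():
        name = unicodedata.name(ch, "<unnamed>")
        print(f"U+{ord(ch):04X} {name}: {accepted_positions(ch)}")
```

On a recent CPython the underscore should be the only Pc character accepted in the first position, while every Pc character (for example U+203F UNDERTIE) should be accepted after the first, so `a‿b` is already a legal name.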

Maybe I'm misunderstanding the discussion... It seems like we're talking 
about a hypothetical definition of identifiers based on Unicode 
character categories, but there's no need: Python 3 has defined 
precisely that.  From the docs 
(https://docs.python.org/3/reference/lexical_analysis.html#identifiers):

---<snip>---------

Python 3.0 introduces additional characters from outside the ASCII range 
(see PEP 3131). For these characters, the classification uses the 
version of the Unicode Character Database as included in the unicodedata 
module.

Identifiers are unlimited in length. Case is significant.

identifier   ::=  xid_start xid_continue*
id_start     ::=  <all characters in general categories Lu, Ll, Lt, Lm,
                  Lo, Nl, the underscore, and characters with the
                  Other_ID_Start property>
id_continue  ::=  <all characters in id_start, plus characters in the
                  categories Mn, Mc, Nd, Pc and others with the
                  Other_ID_Continue property>
xid_start    ::=  <all characters in id_start whose NFKC normalization
                  is in "id_start xid_continue*">
xid_continue ::=  <all characters in id_continue whose NFKC
                  normalization is in "id_continue*">

The Unicode category codes mentioned above stand for:

     Lu - uppercase letters
     Ll - lowercase letters
     Lt - titlecase letters
     Lm - modifier letters
     Lo - other letters
     Nl - letter numbers
     Mn - nonspacing marks
     Mc - spacing combining marks
     Nd - decimal numbers
     Pc - connector punctuations
     Other_ID_Start - explicit list of characters in PropList.txt to
                      support backwards compatibility
     Other_ID_Continue - likewise

All identifiers are converted into the normal form NFKC while parsing; 
comparison of identifiers is based on NFKC.

---<end snip>-----
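To see how closely the general categories alone track the documented rule, here is a minimal sketch that hard-codes the category lists from the grammar above and compares the result with the built-in `str.isidentifier()`. The helper names are hypothetical, and the approximation deliberately skips the Other_ID_Start/Other_ID_Continue lists (which `unicodedata` does not expose) and the NFKC-based xid_ refinement, so those are exactly the places where it can disagree with Python.

```python
import unicodedata

# Categories listed for id_start and id_continue in the quoted grammar.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_CATEGORIES = ID_START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def category_is_id_start(ch):
    """Approximate id_start: the underscore or one of the letter categories."""
    return ch == "_" or unicodedata.category(ch) in ID_START_CATEGORIES


def category_is_id_continue(ch):
    """Approximate id_continue; the underscore is Pc, so it is already covered."""
    return unicodedata.category(ch) in ID_CONTINUE_CATEGORIES


def category_isidentifier(name):
    """Category-only approximation of the identifier rule.

    Ignores Other_ID_Start / Other_ID_Continue and the NFKC (xid_)
    refinement, so it can disagree with str.isidentifier().
    """
    if not name:
        return False
    return category_is_id_start(name[0]) and all(
        category_is_id_continue(ch) for ch in name[1:]
    )


if __name__ == "__main__":
    samples = ["names_like_this", "names-like-this", "\u03bbx", "x\u2211", "\u2118"]
    for s in samples:
        print(f"{s!r:20} approx={category_isidentifier(s)!s:5} "
              f"isidentifier={s.isidentifier()}")
```

Both checks should agree on the first four samples: U+2211 N-ARY SUMMATION is category Sm, so it is rejected in a name and stays available as a potential operator in the sense Chris describes. U+2118 SCRIPT CAPITAL P is also Sm, yet `str.isidentifier()` accepts it because it is on the Other_ID_Start backwards-compatibility list, which the category-only version cannot see.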

>
> ChrisA
>
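The last sentence of the quoted documentation (identifiers are NFKC-normalized while parsing) has a practical consequence: spellings that differ only in compatibility characters name the same variable. A small sketch using only `unicodedata` and `exec`; the two sample characters are chosen for illustration.

```python
import unicodedata

FULLWIDTH_X = "\uff58"   # FULLWIDTH LATIN SMALL LETTER X
FI_LIGATURE = "\ufb01"   # LATIN SMALL LIGATURE FI

# NFKC folds both compatibility characters to plain ASCII.
assert unicodedata.normalize("NFKC", FULLWIDTH_X) == "x"
assert unicodedata.normalize("NFKC", FI_LIGATURE + "nd") == "find"

namespace = {}
# The parser normalizes identifiers, so this source binds "x" and "find".
exec(f"{FULLWIDTH_X} = 42\n{FI_LIGATURE}nd = 'ligature'", namespace)

print(namespace["x"])            # 42
print(namespace["find"])         # ligature
print(FULLWIDTH_X in namespace)  # False: the raw spelling was never stored
```

Because the normalization happens at parse time, runtime lookups with an unnormalized string (for example through `getattr` or `globals()`) should not find names that were written with compatibility characters in source code.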


-- 
Ned Batchelder, http://nedbatchelder.com


