Groups > comp.lang.python > #69049 > unrolled thread

unicode as valid naming symbols

Started by	Mark H Harris <harrismh777@gmail.com>
First post	2014-03-25 13:30 -0500
Last post	2014-03-25 22:26 -0400
Articles	20 on this page of 75 — 22 participants


  unicode as valid naming symbols Mark H Harris <harrismh777@gmail.com> - 2014-03-25 13:30 -0500
    Re: unicode as valid naming symbols wxjmfauth@gmail.com - 2014-03-25 11:52 -0700
      Re: unicode as valid naming symbols Mark H Harris <harrismh777@gmail.com> - 2014-03-25 14:24 -0500
      Re: unicode as valid naming symbols Rustom Mody <rustompmody@gmail.com> - 2014-03-25 19:16 -0700
    Re: unicode as valid naming symbols MRAB <python@mrabarnett.plus.com> - 2014-03-25 19:24 +0000
      Re: unicode as valid naming symbols Mark H Harris <harrismh777@gmail.com> - 2014-03-25 14:29 -0500
        Re: unicode as valid naming symbols Marko Rauhamaa <marko@pacujo.net> - 2014-03-25 21:48 +0200
          Re: unicode as valid naming symbols Skip Montanaro <skip@pobox.com> - 2014-03-25 14:54 -0500
          Re: unicode as valid naming symbols Cameron Simpson <cs@zip.com.au> - 2014-03-26 09:16 +1100
        Re: unicode as valid naming symbols Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-25 13:49 -0600
        Re: unicode as valid naming symbols Tim Chase <python.list@tim.thechases.com> - 2014-03-25 15:29 -0500
        Re: unicode as valid naming symbols Ethan Furman <ethan@stoneleaf.us> - 2014-03-25 15:47 -0700
        Re: unicode as valid naming symbols Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-25 23:58 +0000
          Re: unicode as valid naming symbols Mark H Harris <harrismh777@gmail.com> - 2014-03-27 10:28 -0500
            Re: unicode as valid naming symbols Rustom Mody <rustompmody@gmail.com> - 2014-03-27 08:51 -0700
              Re: unicode as valid naming symbols Mark H Harris <harrismh777@gmail.com> - 2014-03-27 11:03 -0500
                Re: unicode as valid naming symbols Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-28 12:45 +1300
              Re: unicode as valid naming symbols MRAB <python@mrabarnett.plus.com> - 2014-03-27 17:17 +0000
                Re: unicode as valid naming symbols Rustom Mody <rustompmody@gmail.com> - 2014-03-27 10:53 -0700
            Re: unicode as valid naming symbols Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-27 10:22 -0600
              Re: unicode as valid naming symbols Rustom Mody <rustompmody@gmail.com> - 2014-03-27 10:41 -0700
            Re: unicode as valid naming symbols Chris Angelico <rosuav@gmail.com> - 2014-03-28 03:23 +1100
            Re: unicode as valid naming symbols Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2014-03-31 11:55 +0200
            Re: unicode as valid naming symbols Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-31 11:40 -0600
            Re: unicode as valid naming symbols Tim Chase <python.list@tim.thechases.com> - 2014-03-31 13:02 -0500
            Re: unicode as valid naming symbols Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-31 12:10 -0600
            Re: unicode as valid naming symbols Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2014-03-31 21:31 +0200
            Re: unicode as valid naming symbols Terry Reedy <tjreedy@udel.edu> - 2014-03-31 16:12 -0400
            Re: unicode as valid naming symbols Terry Reedy <tjreedy@udel.edu> - 2014-03-31 16:15 -0400
              Re: unicode as valid naming symbols Marko Rauhamaa <marko@pacujo.net> - 2014-03-31 23:34 +0300
            Re: unicode as valid naming symbols Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-31 18:47 -0600
            Re: unicode as valid naming symbols David Hutto <dwightdhutto@gmail.com> - 2014-03-31 23:58 -0400
            Re: unicode as valid naming symbols David Hutto <dwightdhutto@gmail.com> - 2014-04-01 00:11 -0400
            Re: unicode as valid naming symbols Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2014-04-01 10:19 +0200
            Re: unicode as valid naming symbols Ian Kelly <ian.g.kelly@gmail.com> - 2014-04-01 03:18 -0600
              Re: unicode as valid naming symbols Marko Rauhamaa <marko@pacujo.net> - 2014-04-01 12:32 +0300
                Re: unicode as valid naming symbols Ian Kelly <ian.g.kelly@gmail.com> - 2014-04-01 03:58 -0600
                  Re: unicode as valid naming symbols Marko Rauhamaa <marko@pacujo.net> - 2014-04-01 15:02 +0300
                    Re: unicode as valid naming symbols Chris Angelico <rosuav@gmail.com> - 2014-04-01 23:54 +1100
                      Re: unicode as valid naming symbols Marko Rauhamaa <marko@pacujo.net> - 2014-04-01 16:16 +0300
                        Re: unicode as valid naming symbols Chris Angelico <rosuav@gmail.com> - 2014-04-02 00:32 +1100
                          Re: unicode as valid naming symbols Marko Rauhamaa <marko@pacujo.net> - 2014-04-01 18:59 +0300
                            Re: unicode as valid naming symbols Rustom Mody <rustompmody@gmail.com> - 2014-04-01 19:58 -0700
                              Re: unicode as valid naming symbols Rustom Mody <rustompmody@gmail.com> - 2014-04-01 20:16 -0700
                                Re: unicode as valid naming symbols Marko Rauhamaa <marko@pacujo.net> - 2014-04-02 08:55 +0300
                Re: unicode as valid naming symbols Chris Angelico <rosuav@gmail.com> - 2014-04-01 21:39 +1100
            Re: unicode as valid naming symbols Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2014-04-01 12:37 +0200
            Re: unicode as valid naming symbols Chris Angelico <rosuav@gmail.com> - 2014-04-01 21:58 +1100
            Re: unicode as valid naming symbols Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2014-04-01 13:59 +0200
              Re: unicode as valid naming symbols Roy Smith <roy@panix.com> - 2014-04-01 08:29 -0400
                Re: unicode as valid naming symbols Chris Angelico <rosuav@gmail.com> - 2014-04-02 00:08 +1100
                  Re: unicode as valid naming symbols Rustom Mody <rustompmody@gmail.com> - 2014-04-01 06:34 -0700
            Re: unicode as valid naming symbols Chris Angelico <rosuav@gmail.com> - 2014-04-02 00:00 +1100
            Re: unicode as valid naming symbols Ned Batchelder <ned@nedbatchelder.com> - 2014-04-01 09:33 -0400
            Re: unicode as valid naming symbols Chris Angelico <rosuav@gmail.com> - 2014-04-02 00:44 +1100
              Re: unicode as valid naming symbols Rustom Mody <rustompmody@gmail.com> - 2014-04-01 06:58 -0700
            Re: unicode as valid naming symbols Ian Kelly <ian.g.kelly@gmail.com> - 2014-04-01 09:53 -0600
        Re: unicode as valid naming symbols MRAB <python@mrabarnett.plus.com> - 2014-03-26 02:56 +0000
        Re: unicode as valid naming symbols Chris Angelico <rosuav@gmail.com> - 2014-03-26 14:09 +1100
        Re: unicode as valid naming symbols Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2014-03-26 09:25 +0100
        Re: unicode as valid naming symbols Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2014-03-26 09:52 +0100
        Re: unicode as valid naming symbols Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-26 10:37 -0600
        Re: unicode as valid naming symbols Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2014-03-27 10:36 +0100
          Re: unicode as valid naming symbols Rustom Mody <rustompmody@gmail.com> - 2014-03-27 08:10 -0700
            Re: unicode as valid naming symbols Tim Chase <python.list@tim.thechases.com> - 2014-03-27 10:34 -0500
            Re: unicode as valid naming symbols random832@fastmail.us - 2014-03-28 14:55 -0400
              Re: unicode as valid naming symbols Rustom Mody <rustompmody@gmail.com> - 2014-03-28 22:00 -0700
                Re: unicode as valid naming symbols Chris Angelico <rosuav@gmail.com> - 2014-03-29 16:12 +1100
                Re: unicode as valid naming symbols Ben Finney <ben+python@benfinney.id.au> - 2014-03-29 16:32 +1100
                Re: unicode as valid naming symbols Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2014-03-29 14:11 -0400
                Re: unicode as valid naming symbols Chris Angelico <rosuav@gmail.com> - 2014-03-30 09:01 +1100
                  Re: unicode as valid naming symbols Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-30 19:16 +1300
      Re: unicode as valid naming symbols Mark H Harris <harrismh777@gmail.com> - 2014-03-25 14:29 -0500
    Re:unicode as valid naming symbols Dave Angel <davea@davea.name> - 2014-03-25 15:45 -0400
    Re: unicode as valid naming symbols Terry Reedy <tjreedy@udel.edu> - 2014-03-25 22:26 -0400


#69512

From	Chris Angelico <rosuav@gmail.com>
Date	2014-04-02 00:32 +1100
Message-ID	<mailman.8800.1396359183.18130.python-list@python.org>
In reply to	#69511

On Wed, Apr 2, 2014 at 12:16 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
> I implemented the loops in the scheme way. Recursion is how iteration is
> done by the Believers. Traditional looping structures are available to
> scheme, but if you felt the need for them, you might as well program in
> Python.

Then I'm happily a pagan who uses while loops instead of recursion.
Why should every loop become a named function?

      find_divisor: for ( factor = 2 ; i%factor ; factor++ )
      {
        if ( factor == i )
        {
           printf("%d\n",i);
           count--;
           break;
        }
      }

Does that label add anything? If you really need to put a name to
every loop you ever write, there's something wrong with the code; some
loops' purposes should be patently obvious by their body. All you do
is add duplicate information that might be wrong.

ChrisA


#69523

From	Marko Rauhamaa <marko@pacujo.net>
Date	2014-04-01 18:59 +0300
Message-ID	<874n2dt50g.fsf@elektro.pacujo.net>
In reply to	#69512

Chris Angelico <rosuav@gmail.com>:

> On Wed, Apr 2, 2014 at 12:16 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
>> I implemented the loops in the scheme way. Recursion is how iteration
>> is done by the Believers.
>
> Then I'm happily a pagan who uses while loops instead of recursion.
> Why should every loop become a named function?

Every language has its idioms. The principal aesthetic motivation for
named-let loops is the avoidance of (set!), I think. Secondarily, you
get to shift gears in the middle of your loops; something you can often,
but not always, accomplish in Python with break, return and continue.
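
For the Python side of that comparison, here is a minimal sketch of leaving an inner loop early with break and using the loop's else clause to detect that it ran to completion. It is not the Scheme program discussed upthread, which is not reproduced here; the function name first_primes and the prime-listing task are assumptions chosen only for illustration.

```python
def first_primes(count):
    """Return the first `count` primes using an outer and an inner loop."""
    primes = []
    candidate = 2
    while len(primes) < count:            # outer loop: one candidate per pass
        for factor in primes:             # inner loop: trial division
            if candidate % factor == 0:
                break                     # composite: abandon the inner loop
        else:
            primes.append(candidate)      # inner loop ended without break
        candidate += 1
    return primes


print(first_primes(5))
```

The for/else form covers the common "restart the outer loop" case; jumping out of several nesting levels at once usually means moving the loops into a function and using return.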

Don't get me wrong. Python has its own idioms and avoiding loops in
Python would be equally blasphemous. In C++ you avoid void pointers like
the plague, in C you celebrate them.

Marko


#69534

From	Rustom Mody <rustompmody@gmail.com>
Date	2014-04-01 19:58 -0700
Message-ID	<b6d2ce2f-5cf2-4aae-9b49-1c71c512de5e@googlegroups.com>
In reply to	#69523

On Tuesday, April 1, 2014 9:29:27 PM UTC+5:30, Marko Rauhamaa wrote:
> Chris Angelico :

> > On Wed, Apr 2, 2014 at 12:16 AM, Marko Rauhamaa wrote:
> >> I implemented the loops in the scheme way. Recursion is how iteration
> >> is done by the Believers.
> > Then I'm happily a pagan who uses while loops instead of recursion.
> > Why should every loop become a named function?

> Every language has its idioms. The principal aesthetic motivation for
> named-let loops is the avoidance of (set!), I think. Secondarily, you
> get to shift gears in the middle of your loops; something you can often,
> but not always, accomplish in Python with break, return and continue.

You are forgetting the main point: In scheme, in a named-let, the name
chosen was very often 'loop' (if I remember the PC scheme manuals
correctly).  IOW if you had a dozen loops implemented with
named-letted-tail-recursion, you could call all of them 'loop'.  How
is that different from calling all of them 'while' or 'for' ?

> Don't get me wrong. Python has its own idioms and avoiding loops in
> Python would be equally blasphemous. In C++ you avoid void pointers like
> the plague, in C you celebrate them.

Yeah... I guess that is the issue.
People brought up on imperative (which includes OO) programming think
recursion and iteration are fundamentally different, just as assembly
language programmers think of memory and registers as fundamentally
different. They are, but if you are a C programmer the distinction is
irrelevant 99% of the time!

The pattern continues downward... For an assembly language programmer, the
distinction between memory and cache memory is not one he needs to make
99% of the time. Not so for the hardware engineer.


#69535

From	Rustom Mody <rustompmody@gmail.com>
Date	2014-04-01 20:16 -0700
Message-ID	<ba7a9cb4-8203-430c-9bd3-2a5ecebaac22@googlegroups.com>
In reply to	#69534

On Wednesday, April 2, 2014 8:28:02 AM UTC+5:30, Rustom Mody wrote:
> On Tuesday, April 1, 2014 9:29:27 PM UTC+5:30, Marko Rauhamaa wrote:
> > Chris Angelico :

> > > On Wed, Apr 2, 2014 at 12:16 AM, Marko Rauhamaa wrote:
> > >> I implemented the loops in the scheme way. Recursion is how iteration
> > >> is done by the Believers.
> > > Then I'm happily a pagan who uses while loops instead of recursion.
> > > Why should every loop become a named function?

> > Every language has its idioms. The principal aesthetic motivation for
> > named-let loops is the avoidance of (set!), I think. Secondarily, you
> > get to shift gears in the middle of your loops; something you can often,
> > but not always, accomplish in Python with break, return and continue.

> You are forgetting the main point: In scheme, in a named-let, the name
> chosen was very often 'loop' (if I remember the PC scheme manuals
> correctly).  IOW if you had a dozen loops implemented with
> named-letted-tail-recursion, you could call all of them 'loop'.  How
> is that different from calling all of them 'while' or 'for' ?

Umm... I see from your prime number example that there are nested loops
in which sometimes you restart the inner and sometimes the outer.
So you could not possibly call both of them 'loop' :-).

So "you could call all of them 'loop'" is an overstatement.
"A good many" may be more appropriate?


#69539

From	Marko Rauhamaa <marko@pacujo.net>
Date	2014-04-02 08:55 +0300
Message-ID	<87d2h09swm.fsf@elektro.pacujo.net>
In reply to	#69535

Rustom Mody <rustompmody@gmail.com>:

> On Wednesday, April 2, 2014 8:28:02 AM UTC+5:30, Rustom Mody wrote:
>> In scheme, in a named-let, the name
>> chosen was very often 'loop'
>
> Umm... I see from your prime number example that there are nested
> loops in which sometimes you restart the inner and sometimes the
> outer. So you could not possibly call both of them 'loop' :-).

Correct. I could call them "inner" and "outer". After all, the code uses
variables like "i", "c" and "n".

However, it doesn't hurt to use variable/function/loop names that convey
meaning.

Marko


#69502

From	Chris Angelico <rosuav@gmail.com>
Date	2014-04-01 21:39 +1100
Message-ID	<mailman.8793.1396348750.18130.python-list@python.org>
In reply to	#69499

On Tue, Apr 1, 2014 at 8:58 PM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
> Setting aside the fact that C doesn't have anonymous functions, I'll
> approximate it as best I can:
>
> static int n = 3;
>
> int f()
> {
>     return n;
> }
>
> int main()
> {
>     n = 7;
>     return f();
> }
>
> C: 10
> Scheme: 20

And the less trivial the example, the more difference you'll see. This
is Scheme inside LilyPond, a little translator that lets me input
lyrics in a tidy way that can then be turned into something that works
well with both MIDI Karaoke and printed score:

#(define (bang2slashn lst) (
    cond ((null? lst) 0)
    (else (begin
        (if (equal? (ly:music-property (car lst) 'name) 'LyricEvent)
            (let ((txt (ly:music-property (car lst) 'text)))
            (if (equal? (string-ref txt 0) #\!) (begin
            ; Debugging display
            ; (display (ly:music-property (car lst) 'name)) (display " - ") (display txt) (newline)
            ; Prepend a newline instead of the exclamation mark - works for both MIDI Karaoke and page layout
            (ly:music-set-property! (car lst) 'text (string-append "\n" (substring txt 1 (string-length txt))))
        ))))
        (bang2slashn (ly:music-property (car lst) 'elements))
        (bang2slashn (cdr lst))
    ))
))
% Call the above recursive function
lyr=#(define-music-function (parser location lyrics) (ly:music?)
    (bang2slashn (ly:music-property lyrics 'elements))
    lyrics
)

Now, this was written by a non-Scheme programmer, so it's not going to
be optimal code, but I doubt it's going to lose a huge number of
parentheses. Not counting the commented-out debugging line, that's 41
pairs of them in a short but non-trivial piece of code. Translating it
to C isn't easy, in the same way that it's hard to explain how to do
client-side web form validation in Lua; but here's an attempt. It
assumes a broadly C-like structure to LilyPond (eg that the elements
are passed as a structure; they are a tree already, as you can see by
the double-recursive function above), which is of course not the case,
but here goes:

void bang2slashn(struct element *lst)
{
    while (lst)
    {
        if (!strcmp(lst->name, "LyricEvent"))
        {
            char *text = music_property(lst, "text");
            /* Okay, C doesn't have string manipulation, so I cheat here */
            /* If this were C++ or Pike, some notation nearer to the original would work */
            if (*text == '!') music_set_property(lst, "text", "\n" + text[1..]);
        }
        bang2slashn(lst->elements);
        lst = lst->next;
    }
}

DEFINE_MUSIC_FUNCTION(PARSER_LOCATION_LYRICS, bang2slashn);

That's nine pair parens, three braces, and one square. I assume a lot
about the supposed C-like interface to LilyPond, but I think anyone
who knows both C and Scheme would agree that I haven't been
horrendously unfair in the translation. (Though I will accept an
alternate implementation of the Scheme version. If you can cut it down
to just 26 pair parens, you'll achieve the 2:1 ratio that Ian
mentioned. And if you can cut it down to 13 pairs, you've equalled my
count.) The only way to have the C figure come up approximately equal
is to count a semicolon as if it were a pair of parens - Scheme has an
extra set of parens doing the job of separating one function call from
another. But that adds only another 5, bringing C up to a total of 18
(plus a few more if I used functions to do my string manipulation, so
let's say about 20-25) where Scheme is still at roughly twice that.

ChrisA


#69501

From	Antoon Pardon <antoon.pardon@rece.vub.ac.be>
Date	2014-04-01 12:37 +0200
Message-ID	<mailman.8792.1396348682.18130.python-list@python.org>
In reply to	#69195

On 01-04-14 11:18, Ian Kelly wrote:
> On Tue, Apr 1, 2014 at 2:19 AM, Antoon Pardon
> <antoon.pardon@rece.vub.ac.be> wrote:
>> On 01-04-14 02:47, Ian Kelly wrote:
>>
>>> Well, this is the path taken by APL.  It has its supporters.  It's not
>>> known for being readable.
>> No that is not the path taken by APL. AFAICS identifiers in APL are just
>> like identifiers in python. The path taken by APL was that there were
>> a lot more operators available that used non-alphanumeric characters.
>>
>> AFAICS APL programs tend to be unreadable because they are mostly written
>> in a very concise style.
>>
>> I think this is more the path taken by lisp-like languages where '+' is
>> a name just like 'alpha' or 'r2d2'. In scheme I can just do the following.
>>
>> (define √ sqrt)
>> (√ 4)
> You're still using the symbol as the name of an operation, though, so
> I see no practical difference from the APL style.  The operation just
> happens to be user-defined rather than built-in.

Python also uses symbols for names of operations, like '+'. And when
someone suggested python might consider increasing the number of
operations and gave some symbols for those extra operations, nobody
suggested that would make python unreadable, though it would be far
more like the path taken by APL than what we are discussing now.

But the idea we are discussing here has nothing to do with introducing
more operators and using symbolic characters for them, and as such wouldn't
make python more APL-like. You only bring up APL because it uses a number
of unfamiliar symbols and you attribute the unreadability of APL programs
mostly to that. But APL doesn't have the functionality we are talking
about here. So we are not talking about the path taken by APL.

-- 
Antoon Pardon


#69503

From	Chris Angelico <rosuav@gmail.com>
Date	2014-04-01 21:58 +1100
Message-ID	<mailman.8794.1396349898.18130.python-list@python.org>
In reply to	#69195

On Tue, Apr 1, 2014 at 9:37 PM, Antoon Pardon
<antoon.pardon@rece.vub.ac.be> wrote:
> Python also uses symbols for names of operations, like '+'. And when
> someone suggested python might consider increasing the number of
> operations and gave some symbols for those extra operations, nobody
> suggested that would make python unreadable, though it would be far
> more like the path taken by APL than what we are discussing now.

Actually, people did. But mainly the thread (look up "Time we switched
to unicode?") went off looking at how hard it'd be to type those
operators, and therefore the more serious point that there would
either be hard-to-type language elements or duplicate syntactic tokens
("lambda" as well as "λ", etc). That isn't an issue with names,
because any name has only one, well, name. If you choose to use both
"alpha" and "α" as names, that's fine, and they're distinct names. You
can make your code unreadable, and it doesn't impact my code at all.
Language-level features like operators have stronger concerns.

But because, in the future, Python may choose to create new operators,
the simplest and safest way to ensure safety is to put a boundary on
what can be operators and what can be names; Unicode character classes
are perfect for this. It's also possible that all Unicode whitespace
characters might become legal for indentation and separation (maybe
they are already??), so obviously they're ruled out as identifiers;
anyway, I honestly do not think people would want to use U+2007 FIGURE
SPACE inside a name. So if we deny whitespace, and accept letters and
digits, it makes good sense to deny mathematical symbols so as to keep
them available for operators. (It also makes reasonable sense to
*permit* mathematical symbols, thus allowing you to use them for
functions/methods, in the same way that you can use "n", "o", and "t",
but not "not"; but with word operators, the entire word has to be used
as-is before it's a collision - with a symbolic one, any instance of
that symbol inside a name will change parsing entirely. It's a
trade-off, and Python's made a decision one way and not the other.)
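
To see where that boundary falls in practice, here is a small sketch that prints each character's Unicode general category next to the verdict of str.isidentifier(). It assumes Python 3; the helper name report and the sample strings are made up for illustration.

```python
import unicodedata


def report(text):
    """Return (accepted as an identifier?, category of each character)."""
    return text.isidentifier(), [unicodedata.category(ch) for ch in text]


# Hypothetical sample names: a plain word, GREEK SMALL LETTER ALPHA,
# SQUARE ROOT, and a name containing U+2007 FIGURE SPACE.
for sample in ["alpha", "\u03b1", "\u221a", "a\u2007b"]:
    print(ascii(sample), report(sample))
```

The letter (category Ll) is accepted, while the math symbol (Sm) and the name containing a space separator (Zs) are rejected, which matches the split between names and potential operators described above.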

ChrisA


#69506

From	Antoon Pardon <antoon.pardon@rece.vub.ac.be>
Date	2014-04-01 13:59 +0200
Message-ID	<mailman.8796.1396354601.18130.python-list@python.org>
In reply to	#69195

On 01-04-14 12:58, Chris Angelico wrote:
> But because, in the future, Python may choose to create new operators,
> the simplest and safest way to ensure safety is to put a boundary on
> what can be operators and what can be names; Unicode character classes
> are perfect for this. It's also possible that all Unicode whitespace
> characters might become legal for indentation and separation (maybe
> they are already??), so obviously they're ruled out as identifiers;
> anyway, I honestly do not think people would want to use U+2007 FIGURE
> SPACE inside a name. So if we deny whitespace, and accept letters and
> digits, it makes good sense to deny mathematical symbols so as to keep
> them available for operators. (It also makes reasonable sense to
> *permit* mathematical symbols, thus allowing you to use them for
> functions/methods, in the same way that you can use "n", "o", and "t",
> but not "not"; but with word operators, the entire word has to be used
> as-is before it's a collision - with a symbolic one, any instance of
> that symbol inside a name will change parsing entirely. It's a
> trade-off, and Python's made a decision one way and not the other.)

This mostly makes sense to me. The only caveat I have is that since we
also allow _ (U+005F LOW LINE) in names, which belongs to the category
<punctuation, connector>, we should allow other symbols within this
category in a name.
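
Whether a given interpreter already does this is easy to probe. The sketch below assumes Python 3 and picks U+203F UNDERTIE as an arbitrary member of the connector-punctuation (Pc) category; the names UNDERTIE and is_connector are invented for the example. As far as I can tell from PEP 3131, Pc characters are accepted inside a name but, apart from the underscore, not as its first character.

```python
import unicodedata

UNDERTIE = "\u203f"   # assumption: an arbitrary Pc character other than "_"


def is_connector(ch):
    """True if ch is in the Unicode category Pc (punctuation, connector)."""
    return unicodedata.category(ch) == "Pc"


joined = UNDERTIE.join(["names", "like", "this"])

print(is_connector("_"), is_connector(UNDERTIE))
print(ascii(joined), joined.isidentifier())
print((UNDERTIE + "name").isidentifier())
```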

But I confess that is mostly personal taste, since I find names_like_this
ugly. Names-like-this look better to me but that wouldn't be workable
in python. But maybe there is some connector that would be aesthetically
pleasing and would not cause other problems.

-- 
Antoon Pardon


#69507

From	Roy Smith <roy@panix.com>
Date	2014-04-01 08:29 -0400
Message-ID	<roy-677855.08291301042014@news.panix.com>
In reply to	#69506

In article <mailman.8796.1396354601.18130.python-list@python.org>,
 Antoon Pardon <antoon.pardon@rece.vub.ac.be> wrote:

> On 01-04-14 12:58, Chris Angelico wrote:
> > But because, in the future, Python may choose to create new operators,
> > the simplest and safest way to ensure safety is to put a boundary on
> > what can be operators and what can be names; Unicode character classes
> > are perfect for this. It's also possible that all Unicode whitespace
> > characters might become legal for indentation and separation (maybe
> > they are already??), so obviously they're ruled out as identifiers;
> > anyway, I honestly do not think people would want to use U+2007 FIGURE
> > SPACE inside a name. So if we deny whitespace, and accept letters and
> > digits, it makes good sense to deny mathematical symbols so as to keep
> > them available for operators. (It also makes reasonable sense to
> > *permit* mathematical symbols, thus allowing you to use them for
> > functions/methods, in the same way that you can use "n", "o", and "t",
> > but not "not"; but with word operators, the entire word has to be used
> > as-is before it's a collision - with a symbolic one, any instance of
> > that symbol inside a name will change parsing entirely. It's a
> > trade-off, and Python's made a decision one way and not the other.)
> 
> This mostly makes sense to me. The only caveat I have is that since we
> also allow _ (U+005F LOW LINE) in names, which belongs to the category
> <punctuation, connector>, we should allow other symbols within this
> category in a name.
> 
> But I confess that is mostly personal taste, since I find names_like_this
> ugly. Names-like-this look better to me but that wouldn't be workable
> in python. But maybe there is some connector that would be aesthetically
> pleasing and would not cause other problems.

Semi-seriously, let me suggest (names like this).  It's not valid syntax 
now, so it can't break any existing code.  It reuses existing 
punctuation in a way which is a logical extension of its traditional 
meaning, i.e. "group these things together".


#69510

From	Chris Angelico <rosuav@gmail.com>
Date	2014-04-02 00:08 +1100
Message-ID	<mailman.8799.1396357704.18130.python-list@python.org>
In reply to	#69507

On Tue, Apr 1, 2014 at 11:29 PM, Roy Smith <roy@panix.com> wrote:
>> But I confess that is mostly personal taste, since I find names_like_this
>> ugly. Names-like-this look better to me but that wouldn't be workable
>> in python. But maybe there is some connector that would be aesthetically
>> pleasing and would not cause other problems.
>
> Semi-seriously, let me suggest (names like this).  It's not valid syntax
> now, so it can't break any existing code.  It reuses existing
> punctuation in a way which is a logical extension of its traditional
> meaning, i.e. "group these things together".

I'd really rather not have a drastically different concept of "name"
from every other language's definition! Reading over COBOL code is
confusing in ways that reading, say, Ruby code isn't; the ? and !
suffixes aren't nearly as confusing as:

http://www.math-cs.gordon.edu/courses/cs323/COBOL/cobol.html
"""
COBOL identifers are 1-30 alphanumeric characters, at least one of
which must be non-numeric.
In certain contexts it is permissible to use a totally numeric
identifier; however, that usage
is discouraged.  Hyphens may be included in an identifier anywhere
except the first of last
character.
"""

Hyphens in names! Ugh! That means subtraction! :)

But there is a solution! You can have *anything you want* in your
identifiers. Watch:

v = {}

v["names like this"] = 42
print(v["names like this"])

Yes, that's a five-character delimiter/marker. But it works!!

ChrisA


#69514

From	Rustom Mody <rustompmody@gmail.com>
Date	2014-04-01 06:34 -0700
Message-ID	<c50f8129-cc48-4981-bb60-43e680bdd155@googlegroups.com>
In reply to	#69510

On Tuesday, April 1, 2014 6:38:14 PM UTC+5:30, Chris Angelico wrote:
> On Tue, Apr 1, 2014 at 11:29 PM, Roy Smith wrote:
> >> But I confess that is mostly personal taste, since I find names_like_this
> >> ugly. Names-like-this look better to me but that wouldn't be workable
> >> in python. But maybe there is some connector that would be aesthetically
> >> pleasing and would not cause other problems.
> > Semi-seriously, let me suggest (names like this).  It's not valid syntax
> > now, so it can't break any existing code.  It reuses existing
> > punctuation in a way which is a logical extension of its traditional
> > meaning, i.e. "group these things together".

> I'd really rather not have a drastically different concept of "name"
> from every other language's definition! Reading over COBOL code is
> confusing in ways that reading, say, Ruby code isn't; the ? and !
> suffixes aren't nearly as confusing as:

> http://www.math-cs.gordon.edu/courses/cs323/COBOL/cobol.html
> """
> COBOL identifers are 1-30 alphanumeric characters, at least one of
> which must be non-numeric.
> In certain contexts it is permissible to use a totally numeric
> identifier; however, that usage
> is discouraged.  Hyphens may be included in an identifier anywhere
> except the first of last
> character.
> """

> Hyphens in names! Ugh! That means subtraction! :)

Just temporarily switch to a domain other than programming --
one that has not been under the absolute hegemony of ASCII for 40 years
and you may get different results -- See 1st item from here:
http://searchengineland.com/9-seo-quirks-you-should-be-aware-of-146465


#69509

From	Chris Angelico <rosuav@gmail.com>
Date	2014-04-02 00:00 +1100
Message-ID	<mailman.8798.1396357260.18130.python-list@python.org>
In reply to	#69195

On Tue, Apr 1, 2014 at 10:59 PM, Antoon Pardon
<antoon.pardon@rece.vub.ac.be> wrote:
> On 01-04-14 12:58, Chris Angelico wrote:
>> But because, in the future, Python may choose to create new operators,
>> the simplest and safest way to ensure safety is to put a boundary on
>> what can be operators and what can be names; Unicode character classes
>> are perfect for this. It's also possible that all Unicode whitespace
>> characters might become legal for indentation and separation (maybe
>> they are already??), so obviously they're ruled out as identifiers;
>> anyway, I honestly do not think people would want to use U+2007 FIGURE
>> SPACE inside a name. So if we deny whitespace, and accept letters and
>> digits, it makes good sense to deny mathematical symbols so as to keep
>> them available for operators. (It also makes reasonable sense to
>> *permit* mathematical symbols, thus allowing you to use them for
>> functions/methods, in the same way that you can use "n", "o", and "t",
>> but not "not"; but with word operators, the entire word has to be used
>> as-is before it's a collision - with a symbolic one, any instance of
>> that symbol inside a name will change parsing entirely. It's a
>> trade-off, and Python's made a decision one way and not the other.)
>
> This mostly makes sense to me. The only caveat I have is that since we
> also allow _ (U+005F LOW LINE) in names which belongs to the category
> <punctuation, connector>, we should allow other symbols within this
> category in a name.
>
> But I confess that is mostly personal taste, since I find names_like_this
> ugly. Names-like-this look better to me but that wouldn't be workable
> in Python. But maybe there is some connector that would be aesthetically
> pleasing and not causing other problems.

That's reasonable. The Pc category doesn't have much in it:

http://www.fileformat.info/info/unicode/category/Pc/list.htm

If the definition of "characters permitted in identifiers" is derived
exclusively from the Unicode categories, including Pc would make fine
sense. Probably the definition should be: First character is L* or Pc,
subsequent characters are L*, N*, or Pc, and either Mn or M*
(combining characters). Or something like that.
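
A minimal sketch of the category-only rule suggested above, for comparison with what Python actually does. The function name is_name_by_category is hypothetical and only illustrates the proposal (first character L* or Pc, later characters L*, N*, Pc, or M*); it is not Python's real identifier rule, which str.isidentifier() reports.

```python
import unicodedata


def is_name_by_category(name: str) -> bool:
    """Hypothetical check for the proposed rule, based only on general categories."""
    if not name:
        return False

    def start_ok(ch: str) -> bool:
        cat = unicodedata.category(ch)
        return cat.startswith("L") or cat == "Pc"

    def continue_ok(ch: str) -> bool:
        cat = unicodedata.category(ch)
        return cat[0] in "LNM" or cat == "Pc"

    return start_ok(name[0]) and all(continue_ok(ch) for ch in name[1:])


# Print the proposed rule next to Python's own answer for a few candidates.
# "\u203f" is UNDERTIE (category Pc), "\u221a" is SQUARE ROOT (category Sm).
for candidate in ["spam", "_spam", "spam1", "1spam", "a\u203fb", "\u221ax"]:
    print(repr(candidate), is_name_by_category(candidate), candidate.isidentifier())
```

The two columns can disagree, since the sketch ignores NFKC normalization and the Other_ID_Start / Other_ID_Continue lists.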

ChrisA


#69513

From	Ned Batchelder <ned@nedbatchelder.com>
Date	2014-04-01 09:33 -0400
Message-ID	<mailman.8801.1396359227.18130.python-list@python.org>
In reply to	#69195

On 4/1/14 9:00 AM, Chris Angelico wrote:
> On Tue, Apr 1, 2014 at 10:59 PM, Antoon Pardon
> <antoon.pardon@rece.vub.ac.be> wrote:
>> On 01-04-14 12:58, Chris Angelico wrote:
>>> But because, in the future, Python may choose to create new operators,
>>> the simplest and safest way to ensure safety is to put a boundary on
>>> what can be operators and what can be names; Unicode character classes
>>> are perfect for this. It's also possible that all Unicode whitespace
>>> characters might become legal for indentation and separation (maybe
>>> they are already??), so obviously they're ruled out as identifiers;
>>> anyway, I honestly do not think people would want to use U+2007 FIGURE
>>> SPACE inside a name. So if we deny whitespace, and accept letters and
>>> digits, it makes good sense to deny mathematical symbols so as to keep
>>> them available for operators. (It also makes reasonable sense to
>>> *permit* mathematical symbols, thus allowing you to use them for
>>> functions/methods, in the same way that you can use "n", "o", and "t",
>>> but not "not"; but with word operators, the entire word has to be used
>>> as-is before it's a collision - with a symbolic one, any instance of
>>> that symbol inside a name will change parsing entirely. It's a
>>> trade-off, and Python's made a decision one way and not the other.)
>>
>> This mostly makes sense to me. The only caveat I have is that since we
>> also allow _ (U+005F LOW LINE) in names which belongs to the category
>> <punctuation, connector>, we should allow other symbols within this
>> category in a name.
>>
>> But I confess that is mostly personal taste, since I find names_like_this
>> ugly. Names-like-this look better to me but that wouldn't be workable
>> in Python. But maybe there is some connector that would be aesthetically
>> pleasing and not causing other problems.
>
> That's reasonable. The Pc category doesn't have much in it:
>
> http://www.fileformat.info/info/unicode/category/Pc/list.htm
>
> If the definition of "characters permitted in identifiers" is derived
> exclusively from the Unicode categories, including Pc would make fine
> sense. Probably the definition should be: First character is L* or Pc,
> subsequent characters are L*, N*, or Pc, and either Mn or M*
> (combining characters). Or something like that.

Maybe I'm misunderstanding the discussion... It seems like we're talking 
about a hypothetical definition of identifiers based on Unicode 
character categories, but there's no need: Python 3 has defined 
precisely that.  From the docs 
(https://docs.python.org/3/reference/lexical_analysis.html#identifiers):

---<snip>---------

Python 3.0 introduces additional characters from outside the ASCII range 
(see PEP 3131). For these characters, the classification uses the 
version of the Unicode Character Database as included in the unicodedata 
module.

Identifiers are unlimited in length. Case is significant.

identifier   ::=  xid_start xid_continue*
id_start     ::=  <all characters in general categories Lu, Ll, Lt, Lm, 
Lo, Nl, the underscore, and characters with the Other_ID_Start property>
id_continue  ::=  <all characters in id_start, plus characters in the 
categories Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
xid_start    ::=  <all characters in id_start whose NFKC normalization 
is in "id_start xid_continue*">
xid_continue ::=  <all characters in id_continue whose NFKC 
normalization is in "id_continue*">

The Unicode category codes mentioned above stand for:

     Lu - uppercase letters
     Ll - lowercase letters
     Lt - titlecase letters
     Lm - modifier letters
     Lo - other letters
     Nl - letter numbers
     Mn - nonspacing marks
     Mc - spacing combining marks
     Nd - decimal numbers
     Pc - connector punctuations
     Other_ID_Start - explicit list of characters in PropList.txt to 
support backwards compatibility
     Other_ID_Continue - likewise

All identifiers are converted into the normal form NFKC while parsing; 
comparison of identifiers is based on NFKC.

---<end snip>-----
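
The last sentence of the quoted docs (identifiers are converted to NFKC while parsing) can be observed directly. The following is only a small sketch, assuming CPython 3; the namespace dict and the choice of U+FB01 LATIN SMALL LIGATURE FI are illustrative and not taken from the docs.

```python
import unicodedata

# The source text below spells the name with the single character U+FB01.
namespace = {}
exec("\ufb01 = 1", namespace)

# List the names that were actually bound, skipping the __builtins__ entry.
bound_names = sorted(k for k in namespace if not k.startswith("__"))
print(bound_names)

# The NFKC form of the ligature, which is what the parser stores as the name.
print(unicodedata.normalize("NFKC", "\ufb01"))
```

So two spellings that differ in the source file can refer to the same variable once the parser has normalized them.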

>
> ChrisA
>


-- 
Ned Batchelder, http://nedbatchelder.com


#69515

From	Chris Angelico <rosuav@gmail.com>
Date	2014-04-02 00:44 +1100
Message-ID	<mailman.8802.1396359858.18130.python-list@python.org>
In reply to	#69195

On Wed, Apr 2, 2014 at 12:33 AM, Ned Batchelder <ned@nedbatchelder.com> wrote:
> Maybe I'm misunderstanding the discussion... It seems like we're talking
> about a hypothetical definition of identifiers based on Unicode character
> categories, but there's no need: Python 3 has defined precisely that.  From
> the docs
> (https://docs.python.org/3/reference/lexical_analysis.html#identifiers):
>

"Python 3.0 introduces **additional characters** from outside the
ASCII range" - emphasis mine.

Python currently has - at least, per that documentation - a hybrid
system with ASCII characters defined in the classic way, and non-ASCII
characters defined by their Unicode character classes. I'm talking
about a system that's _purely_ defined by Unicode character classes.
It may turn out that the class list exactly compasses the ASCII
characters listed, though, in which case you'd be right: it's not
hypothetical.

In any case, Pc is included, which I should have checked beforehand.
So that part is, as you say, not hypothetical. Go for it! Use 'em.

ChrisA


#69516

From	Rustom Mody <rustompmody@gmail.com>
Date	2014-04-01 06:58 -0700
Message-ID	<d9f78a2a-94a9-41f4-8c5b-a7c519796946@googlegroups.com>
In reply to	#69515

On Tuesday, April 1, 2014 7:14:15 PM UTC+5:30, Chris Angelico wrote:
> On Wed, Apr 2, 2014 at 12:33 AM, Ned Batchelder  wrote:
> > Maybe I'm misunderstanding the discussion... It seems like we're talking
> > about a hypothetical definition of identifiers based on Unicode character
> > categories, but there's no need: Python 3 has defined precisely that.  From
> > the docs
> > (https://docs.python.org/3/reference/lexical_analysis.html#identifiers):

> "Python 3.0 introduces **additional characters** from outside the
> ASCII range" - emphasis mine.

> Python currently has - at least, per that documentation - a hybrid
> system with ASCII characters defined in the classic way, and non-ASCII
> characters defined by their Unicode character classes. I'm talking
> about a system that's _purely_ defined by Unicode character classes.
> It may turn out that the class list exactly compasses the ASCII
> characters listed, though, in which case you'd be right: it's not
> hypothetical.

> In any case, Pc is included, which I should have checked beforehand.
> So that part is, as you say, not hypothetical. Go for it! Use 'em.

Dunno if you really mean it or are just saying...

Steven gave the example the other day of confusing the identifiers
A and А. There must be easily hundreds (thousands?) of other such confusables.

So you think that's nice and APL(-ese), Scheme(-ish) is not...???

Confused by your stand...

Personally I don't believe that Unicode has been designed with
programming languages in mind.

Assuming that Unicode categories will naturally and easily fit
programming language lexical/syntax categories is rather naive.
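
To make the confusable example concrete, here is a short sketch using only the standard unicodedata module. The variable names are illustrative. It shows that Latin A and Cyrillic А are both valid one-character identifiers and that NFKC normalization does not fold them together, so they stay distinct names.

```python
import unicodedata

latin_a = "A"          # U+0041
cyrillic_a = "\u0410"  # U+0410, renders almost identically in many fonts

# Code point, official name, general category, and identifier validity.
for ch in (latin_a, cyrillic_a):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          unicodedata.category(ch), ch.isidentifier())

# NFKC is the only folding the parser applies; check whether it merges the two.
same_after_nfkc = (unicodedata.normalize("NFKC", latin_a)
                   == unicodedata.normalize("NFKC", cyrillic_a))
print(same_after_nfkc)
```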


#69522

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2014-04-01 09:53 -0600
Message-ID	<mailman.8807.1396367664.18130.python-list@python.org>
In reply to	#69195

On Tue, Apr 1, 2014 at 7:44 AM, Chris Angelico <rosuav@gmail.com> wrote:
> On Wed, Apr 2, 2014 at 12:33 AM, Ned Batchelder <ned@nedbatchelder.com> wrote:
>> Maybe I'm misunderstanding the discussion... It seems like we're talking
>> about a hypothetical definition of identifiers based on Unicode character
>> categories, but there's no need: Python 3 has defined precisely that.  From
>> the docs
>> (https://docs.python.org/3/reference/lexical_analysis.html#identifiers):
>>
>
> "Python 3.0 introduces **additional characters** from outside the
> ASCII range" - emphasis mine.
>
> Python currently has - at least, per that documentation - a hybrid
> system with ASCII characters defined in the classic way, and non-ASCII
> characters defined by their Unicode character classes. I'm talking
> about a system that's _purely_ defined by Unicode character classes.
> It may turn out that the class list exactly compasses the ASCII
> characters listed, though, in which case you'd be right: it's not
> hypothetical.

The only ASCII character not encompassed is that _ is explicitly
permitted to start an identifier (for obvious reasons) whereas
characters in Pc are more generally only permitted to continue
identifiers.
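
This claim is easy to check from the interpreter. A rough sketch, assuming CPython 3 and using only str.isidentifier() and unicodedata; the variable names and the choice of U+203F UNDERTIE as a sample Pc character are mine.

```python
import string
import unicodedata

ascii_chars = [chr(i) for i in range(128)]

# ASCII characters Python accepts as the first / a later character of a name.
can_start = {c for c in ascii_chars if c.isidentifier()}
can_continue = {c for c in ascii_chars if ("a" + c).isidentifier()}

# The classic ASCII definition: letters and underscore, then also digits.
classic_start = set(string.ascii_letters + "_")
classic_continue = classic_start | set(string.digits)

print(can_start == classic_start, can_continue == classic_continue)

# The underscore is itself in category Pc but is special-cased as a start character.
print(unicodedata.category("_"))

# Another Pc character may continue a name but may not start one.
undertie = "\u203f"
print(unicodedata.category(undertie),
      ("a" + undertie + "b").isidentifier(),
      (undertie + "a").isidentifier())
```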

There are also explicit lists of extra permitted characters in
PropList.txt for backward compatibility (once a character is
permitted, it should remain permitted even if its Unicode category
changes).  There are currently 4 extra starting characters and 12
extra continuing characters, but none of these are ASCII.


#69097

From	MRAB <python@mrabarnett.plus.com>
Date	2014-03-26 02:56 +0000
Message-ID	<mailman.8560.1395802562.18130.python-list@python.org>
In reply to	#69057

On 2014-03-25 22:47, Ethan Furman wrote:
> On 03/25/2014 12:29 PM, Mark H Harris wrote:
>> On 3/25/14 2:24 PM, MRAB wrote:
>>> It's explained in PEP 3131.
>>>
>>> Basically, a name should start with a letter (this has been extended
>>> to include Chinese characters, etc) or an underscore.
>>>
>>> λ is classified as Lowercase_Letter.
>>>
>>> √ is classified as Math_Symbol.
>>
>>     Thanks much!  I'll note that for improvements. Any unicode symbol (that is not a number) should be allowed as an
>> identifier.
>
> No, it shouldn't.  Doing so would mean we could not use √ as the square root operator in the future.
>
Or as a root operator, e.g. 3 √ x (the cube root of x).

> Identifiers are made up of letters, numbers, and the underscore.  Considering all the unicode letters and unicode
> numbers out there, you shouldn't be lacking for names.
>
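
For reference, the two classifications quoted above can be looked up with the standard unicodedata module. This is just a quick sketch; the loop and variable names are illustrative.

```python
import unicodedata

lam = "\u03bb"   # λ
root = "\u221a"  # √

# Official name, two-letter general category, and whether Python accepts
# the character on its own as an identifier.
for ch in (lam, root):
    print(unicodedata.name(ch), unicodedata.category(ch), ch.isidentifier())
```

"Ll" is the short code for Lowercase_Letter and "Sm" is the short code for Math_Symbol.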


#69098

From	Chris Angelico <rosuav@gmail.com>
Date	2014-03-26 14:09 +1100
Message-ID	<mailman.8561.1395803386.18130.python-list@python.org>
In reply to	#69057

On Wed, Mar 26, 2014 at 1:56 PM, MRAB <python@mrabarnett.plus.com> wrote:
>> No, it shouldn't.  Doing so would mean we could not use √ as the square
>> root operator in the future.
>>
> Or as a root operator, e.g. 3 √ x (the cube root of x).

Or both! It could be like unary negation and binary subtraction.

ChrisA


#69106

From	Antoon Pardon <antoon.pardon@rece.vub.ac.be>
Date	2014-03-26 09:25 +0100
Message-ID	<mailman.8564.1395822345.18130.python-list@python.org>
In reply to	#69057

On 25-03-14 23:47, Ethan Furman wrote:
> On 03/25/2014 12:29 PM, Mark H Harris wrote:
>> On 3/25/14 2:24 PM, MRAB wrote:
>>> It's explained in PEP 3131.
>>>
>>> Basically, a name should start with a letter (this has been extended
>>> to include Chinese characters, etc) or an underscore.
>>>
>>> λ is classified as Lowercase_Letter.
>>>
>>> √ is classified as Math_Symbol.
>>
>>     Thanks much!  I'll note that for improvements. Any unicode symbol
>> (that is not a number) should be allowed as an
>> identifier.
>
> No, it shouldn't.  Doing so would mean we could not use √ as the
> square root operator in the future.

And what advantage would that bring over just using it as a function?

-- 
Antoon Pardon
