comp.lang.python, thread #106266 (unrolled)
| Started by | Michael Okuntsov <okuntsov.mikhail@yandex.ru> |
|---|---|
| First post | 2016-04-02 03:48 +0600 |
| Last post | 2016-04-04 17:19 -0600 |
| Articles | 20 on this page of 110 — 29 participants |
[beginner] What's wrong? Michael Okuntsov <okuntsov.mikhail@yandex.ru> - 2016-04-02 03:48 +0600
Re: [beginner] What's wrong? Michael Okuntsov <okuntsov.mikhail@yandex.ru> - 2016-04-02 04:10 +0600
Re: [beginner] What's wrong? sohcahtoa82@gmail.com - 2016-04-01 15:44 -0700
Re: [beginner] What's wrong? Random832 <random832@fastmail.com> - 2016-04-02 00:27 -0400
Re: [beginner] What's wrong? Michael Selik <michael.selik@gmail.com> - 2016-04-02 05:36 +0000
Re: [beginner] What's wrong? William Ray Wing <wrw@mac.com> - 2016-04-02 00:54 -0400
Re: [beginner] What's wrong? Chris Angelico <rosuav@gmail.com> - 2016-04-02 19:15 +1100
Re: [beginner] What's wrong? Michael Selik <michael.selik@gmail.com> - 2016-04-02 14:48 +0000
Re: [beginner] What's wrong? Chris Angelico <rosuav@gmail.com> - 2016-04-03 01:55 +1100
Re: [beginner] What's wrong? Marko Rauhamaa <marko@pacujo.net> - 2016-04-02 18:07 +0300
Re: [beginner] What's wrong? Chris Angelico <rosuav@gmail.com> - 2016-04-03 02:36 +1100
Re: [beginner] What's wrong? Steven D'Aprano <steve@pearwood.info> - 2016-04-03 02:06 +1000
Re: [beginner] What's wrong? Marko Rauhamaa <marko@pacujo.net> - 2016-04-02 19:44 +0300
Re: [beginner] What's wrong? Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2016-04-02 19:12 +0200
Re: [beginner] What's wrong? Rustom Mody <rustompmody@gmail.com> - 2016-04-02 10:28 -0700
Re: [beginner] What's wrong? Marko Rauhamaa <marko@pacujo.net> - 2016-04-02 21:43 +0300
Re: [beginner] What's wrong? Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2016-04-03 13:47 +0200
Re: [beginner] What's wrong? Rustom Mody <rustompmody@gmail.com> - 2016-04-03 07:30 -0700
Re: [beginner] What's wrong? Dan Sommers <dan@tombstonezero.net> - 2016-04-03 15:25 +0000
Re: [beginner] What's wrong? Rustom Mody <rustompmody@gmail.com> - 2016-04-03 08:39 -0700
Re: [beginner] What's wrong? Dan Sommers <dan@tombstonezero.net> - 2016-04-03 16:22 +0000
Re: [beginner] What's wrong? Chris Angelico <rosuav@gmail.com> - 2016-04-04 02:44 +1000
Re: [beginner] What's wrong? Rustom Mody <rustompmody@gmail.com> - 2016-04-03 10:18 -0700
Re: [beginner] What's wrong? Chris Angelico <rosuav@gmail.com> - 2016-04-04 03:35 +1000
Re: [beginner] What's wrong? Dan Sommers <dan@tombstonezero.net> - 2016-04-03 18:26 +0000
Re: [beginner] What's wrong? Rustom Mody <rustompmody@gmail.com> - 2016-04-03 08:46 -0700
Re: [beginner] What's wrong? Larry Martell <larry.martell@gmail.com> - 2016-04-03 11:55 -0400
Re: [beginner] What's wrong? Chris Angelico <rosuav@gmail.com> - 2016-04-04 01:53 +1000
Re: [beginner] What's wrong? Rustom Mody <rustompmody@gmail.com> - 2016-04-03 09:49 -0700
Re: [beginner] What's wrong? Dan Sommers <dan@tombstonezero.net> - 2016-04-03 18:32 +0000
Re: [beginner] What's wrong? Dan Sommers <dan@tombstonezero.net> - 2016-04-03 16:07 +0000
Re: [beginner] What's wrong? Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2016-04-06 21:56 +0200
Unicode normalisation [was Re: [beginner] What's wrong?] Steven D'Aprano <steve@pearwood.info> - 2016-04-07 11:37 +1000
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Marko Rauhamaa <marko@pacujo.net> - 2016-04-07 09:36 +0300
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Peter Pearson <pkpearson@nowhere.invalid> - 2016-04-07 16:51 +0000
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Rustom Mody <rustompmody@gmail.com> - 2016-04-07 21:43 -0700
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Rustom Mody <rustompmody@gmail.com> - 2016-04-07 21:47 -0700
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Chris Angelico <rosuav@gmail.com> - 2016-04-08 14:54 +1000
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Rustom Mody <rustompmody@gmail.com> - 2016-04-08 10:51 -0700
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Steven D'Aprano <steve@pearwood.info> - 2016-04-08 16:00 +1000
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Chris Angelico <rosuav@gmail.com> - 2016-04-08 16:13 +1000
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Peter Pearson <pkpearson@nowhere.invalid> - 2016-04-08 17:21 +0000
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Marko Rauhamaa <marko@pacujo.net> - 2016-04-08 20:44 +0300
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Chris Angelico <rosuav@gmail.com> - 2016-04-09 03:50 +1000
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Peter Pearson <pkpearson@nowhere.invalid> - 2016-04-08 18:03 +0000
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Rustom Mody <rustompmody@gmail.com> - 2016-04-08 11:17 -0700
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Rustom Mody <rustompmody@gmail.com> - 2016-04-08 11:20 -0700
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Rustom Mody <rustompmody@gmail.com> - 2016-04-08 11:04 -0700
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2016-04-08 20:20 -0400
Re: Unicode normalisation [was Re: [beginner] What's wrong?] alister <alister.ware@ntlworld.com> - 2016-04-09 08:30 +0000
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Ben Bacarisse <ben.usenet@bsb.me.uk> - 2016-04-09 14:43 +0100
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Ben Bacarisse <ben.usenet@bsb.me.uk> - 2016-04-09 15:34 +0100
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2016-04-09 14:30 -0400
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Rustom Mody <rustompmody@gmail.com> - 2016-04-09 09:08 -0700
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Ben Bacarisse <ben.usenet@bsb.me.uk> - 2016-04-09 19:27 +0100
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-04-09 20:25 +0100
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Stephen Hansen <me@ixokai.io> - 2016-04-09 12:45 -0700
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2016-04-10 20:35 +1200
QWERTY was not designed to intentionally slow typists down (was: Unicode normalisation [was Re: [beginner] What's wrong?]) Ben Finney <ben+python@benfinney.id.au> - 2016-04-09 10:43 +1000
Re: QWERTY was not designed to intentionally slow typists down (was: Unicode normalisation [was Re: [beginner] What's wrong?]) Steven D'Aprano <steve@pearwood.info> - 2016-04-09 13:28 +1000
Re: QWERTY was not designed to intentionally slow typists down (was: Unicode normalisation [was Re: [beginner] What's wrong?]) Random832 <random832@fastmail.com> - 2016-04-09 11:44 -0400
Re: QWERTY was not designed to intentionally slow typists down (was: Unicode normalisation [was Re: [beginner] What's wrong?]) Random832 <random832@fastmail.com> - 2016-04-09 11:53 -0400
Re: QWERTY was not designed to intentionally slow typists down (was: Unicode normalisation [was Re: [beginner] What's wrong?]) Steven D'Aprano <steve@pearwood.info> - 2016-04-18 11:39 +1000
Re: QWERTY was not designed to intentionally slow typists down (was: Unicode normalisation [was Re: [beginner] What's wrong?]) Random832 <random832@fastmail.com> - 2016-04-17 22:01 -0400
Re: QWERTY was not designed to intentionally slow typists down (was: Unicode normalisation [was Re: [beginner] What's wrong?]) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-04-18 17:21 +1000
Re: QWERTY was not designed to intentionally slow typists down Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2016-04-18 21:17 +1200
Re: QWERTY was not designed to intentionally slow typists down (was: Unicode normalisation [was Re: [beginner] What's wrong?]) Chris Angelico <rosuav@gmail.com> - 2016-04-18 12:09 +1000
Re: QWERTY was not designed to intentionally slow typists down Michael Torrie <torriem@gmail.com> - 2016-04-17 21:50 -0600
Re: QWERTY was not designed to intentionally slow typists down (was: Unicode normalisation [was Re: [beginner] What's wrong?]) Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2016-04-18 00:06 -0400
Re: QWERTY was not designed to intentionally slow typists down (was: Unicode normalisation [was Re: [beginner] What's wrong?]) Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2016-04-09 14:52 -0400
Re: QWERTY was not designed to intentionally slow typists down (was: Unicode normalisation [was Re: [beginner] What's wrong?]) pyotr filipivich <phamp@mindspring.com> - 2016-04-09 20:09 -0700
Re: QWERTY was not designed to intentionally slow typists down (was: Unicode normalisation [was Re: [beginner] What's wrong?]) Ian Kelly <ian.g.kelly@gmail.com> - 2016-04-10 07:43 -0600
Re: QWERTY was not designed to intentionally slow typists down (was: Unicode normalisation [was Re: [beginner] What's wrong?]) pyotr filipivich <phamp@mindspring.com> - 2016-04-10 19:14 -0700
Re: QWERTY was not designed to intentionally slow typists down Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-04-09 20:13 +0100
Re: QWERTY was not designed to intentionally slow typists down alister <alister.ware@ntlworld.com> - 2016-04-09 20:22 +0000
Re: QWERTY was not designed to intentionally slow typists down Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-04-09 22:23 +0100
Re: QWERTY was not designed to intentionally slow typists down Tim Golden <mail@timgolden.me.uk> - 2016-04-09 22:51 +0100
Re: QWERTY was not designed to intentionally slow typists down Tim Golden <mail@timgolden.me.uk> - 2016-04-09 20:25 +0100
Re: QWERTY was not designed to intentionally slow typists down Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-04-09 20:36 +0100
Re: QWERTY was not designed to intentionally slow typists down Ethan Furman <ethan@stoneleaf.us> - 2016-04-09 14:33 -0700
RE: [E] QWERTY was not designed to intentionally slow typists down (was: Unicode normalisation [was Re: [beginner] What's wrong?]) "Coll-Barth, Michael" <Michael.Coll-Barth@VerizonWireless.com> - 2016-04-09 13:31 -0400
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Steven D'Aprano <steve@pearwood.info> - 2016-04-09 04:44 +1000
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Marko Rauhamaa <marko@pacujo.net> - 2016-04-08 21:55 +0300
Re: Unicode normalisation [was Re: [beginner] What's wrong?] Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2016-04-10 21:25 +1200
Re: [beginner] What's wrong? Steven D'Aprano <steve@pearwood.info> - 2016-04-03 09:49 +1000
Re: [beginner] What's wrong? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-04-03 01:26 +0100
Re: [beginner] What's wrong? Rustom Mody <rustompmody@gmail.com> - 2016-04-03 07:52 -0700
Re: [beginner] What's wrong? Michael Okuntsov <okuntsov.mikhail@yandex.ru> - 2016-04-03 22:24 +0600
Re: [beginner] What's wrong? Chris Angelico <rosuav@gmail.com> - 2016-04-04 02:28 +1000
Re: [beginner] What's wrong? Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2016-04-03 16:57 +1200
Re: [beginner] What's wrong? Steven D'Aprano <steve@pearwood.info> - 2016-04-03 15:34 +1000
Re: [beginner] What's wrong? Terry Reedy <tjreedy@udel.edu> - 2016-04-02 15:07 -0400
Re: [beginner] What's wrong? Marko Rauhamaa <marko@pacujo.net> - 2016-04-02 22:36 +0300
Re: [beginner] What's wrong? Michael Selik <michael.selik@gmail.com> - 2016-04-02 21:42 +0000
Re: [beginner] What's wrong? Steven D'Aprano <steve@pearwood.info> - 2016-04-03 10:48 +1000
Re: [beginner] What's wrong? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-04-03 02:04 +0100
Re: [beginner] What's wrong? alister <alister.ware@ntlworld.com> - 2016-04-03 12:37 +0000
Re: [beginner] What's wrong? Terry Reedy <tjreedy@udel.edu> - 2016-04-02 14:59 -0400
Re: [beginner] What's wrong? Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2016-04-03 16:43 +1200
Re: [beginner] What's wrong? Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2016-04-02 12:31 -0400
Re: [beginner] What's wrong? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-04-03 00:58 +0100
Re: [beginner] What's wrong? sohcahtoa82@gmail.com - 2016-04-08 15:59 -0700
Re: [beginner] What's wrong? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-04-09 00:07 +0100
Re: [beginner] What's wrong? Michael Torrie <torriem@gmail.com> - 2016-04-02 16:49 -0600
Re: [beginner] What's wrong? Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2016-04-03 10:12 +0200
Re: [beginner] What's wrong? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-04-04 15:04 +0100
Re: [beginner] What's wrong? BartC <bc@freeuk.com> - 2016-04-04 15:51 +0100
From email addresses sometimes strange on this list - was Re: [beginner] What's wrong? Michael Torrie <torriem@gmail.com> - 2016-04-04 16:55 -0600
Re: From email addresses sometimes strange on this list - was Re: [beginner] What's wrong? Chris Angelico <rosuav@gmail.com> - 2016-04-05 08:58 +1000
Re: From email addresses sometimes strange on this list - was Re: [beginner] What's wrong? Michael Torrie <torriem@gmail.com> - 2016-04-04 17:19 -0600
| From | Dan Sommers <dan@tombstonezero.net> |
|---|---|
| Date | 2016-04-03 16:22 +0000 |
| Message-ID | <ndrg0v$9va$3@dont-email.me> |
| In reply to | #106367 |
On Sun, 03 Apr 2016 08:39:02 -0700, Rustom Mody wrote:

> On Sunday, April 3, 2016 at 8:58:59 PM UTC+5:30, Dan Sommers wrote:
>> On Sun, 03 Apr 2016 07:30:47 -0700, Rustom Mody wrote:
>>
>> > So here are some examples to illustrate what I am saying:
>>
>> [A vs a, A vs A, flag vs flag, etc.]
> <snip>
>> I understand that in some use cases, flag and flag represent the same
>> English word, but please don't extend that to identifiers in my
>> software.

> I wonder once again if you are getting my point opposite to the one I
> am making. With ASCII there were problems like O vs 0 -- niggling but
> small.
>
> With Unicode its a gigantic pandora box. Python by allowing unicode
> identifiers without restraint has made grief for unsuspecting
> programmers.

What about the A vs a case, which comes up even with ASCII-only
characters? If those are the same, then I, as a reader of Python code,
have to understand all the rules about ß (which I think have changed
over time), and potentially þ and others.

> That is why my original suggestion that there should have been alongside this
> 'brave new world', a pragma wherein a programmer can EXPLICITLY declare
> #language Greek
> Then he is knowingly opting into possible clashes between A and Α
> But not between A and А.

If I declared #language Greek, then I'd expect an identifier like A to
be rejected by the compiler. That said, I don't know if that sort of
distinction is as clear cut in every language supported by Unicode.

And just to cause trouble (because that's the way I feel today), can I
declare

#γλώσσα Ελληνική

;-)

> [And if you think the above is a philosophical disquisition on
> Aristotle's law of identity: "A is A" you just proved my point that
> unconstrained Unicode identifiers is a mess]

Can we take a "we're all adults here" approach? For the same reason
that adults don't use identifiers like xl0, x10, xlO, and xl0 anywhere
near each other, shouldn't we also not use A and A anywhere near each
other? I certainly don't want the language itself to [try to] reject
x10 and xIO because they look too much alike in many fonts.
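Python has no "#language" pragma of the kind Rustom describes. As a minimal sketch of how such a check might behave, the code below tokenizes source text and reports identifiers containing letters from scripts outside an allowed set. It assumes that the first word of a character's Unicode name ("LATIN", "GREEK", "CYRILLIC") is a good enough stand-in for its script, which is only a heuristic; `scripts_of` and `foreign_identifiers` are hypothetical helpers, not part of any library. Keywords such as `for` are themselves Latin, so LATIN has to stay in the allowed set.

```python
from __future__ import annotations

import io
import tokenize
import unicodedata


def scripts_of(identifier: str) -> set[str]:
    """Rough script of each letter: the first word of its Unicode name.

    Heuristic only ("LATIN", "GREEK", "CYRILLIC", ...); the stdlib
    unicodedata module does not expose the real Unicode Script property.
    """
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in identifier if ch.isalpha()}


def foreign_identifiers(source: str, allowed: set[str]) -> list[tuple[int, str, set[str]]]:
    """Return (line, name, unexpected scripts) for every NAME token outside `allowed`."""
    problems = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            unexpected = scripts_of(tok.string) - allowed
            if unexpected:
                problems.append((tok.start[0], tok.string, unexpected))
    return problems


# Latin A, Greek Alpha and Cyrillic A, written as escapes so they cannot be confused.
sample = "A = 1\n\u0391 = 2\n\u0410 = 3\n"
for line, name, scripts in foreign_identifiers(sample, {"LATIN", "GREEK"}):
    print(line, ascii(name), sorted(scripts))  # only line 3, the Cyrillic letter
```

A production tool would more plausibly rely on the real Script property and the confusables data from Unicode's security recommendations, neither of which the standard library ships.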
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2016-04-04 02:44 +1000 |
| Message-ID | <mailman.400.1459701865.28225.python-list@python.org> |
| In reply to | #106375 |
On Mon, Apr 4, 2016 at 2:22 AM, Dan Sommers <dan@tombstonezero.net> wrote:
> What about the A vs a case, which comes up even with ASCII-only
> characters? If those are the same, then I, as a reader of Python code,
> have to understand all the rules about ß (which I think have changed
> over time), and potentially þ and others.

And Iİıi, and Σσς, and (if you want completeness) ſ too. And various other case conversion rules. It's not possible to case-fold perfectly without knowing what language something is.

This, coupled with the extremely useful case distinction between "Classes" and "instances", means I'm very much glad Python is case sensitive. "base = Base()" is perfectly legal and meaningful, no matter what language you translate those words into (well, as long as it's bicameral - otherwise you need to adorn one of them somehow, but you'd have to anyway).

ChrisA
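Chris's point about case folding can be checked with the standard library. The sketch below assumes a hypothetical case-insensitive language that compares names by NFKC normalisation followed by `str.casefold()`; the `naive_ci_key` helper is illustrative only. It shows the German and long-s collisions, the Turkish letters that default (language-independent) folding cannot pair the Turkish way, and the context-sensitive Greek final sigma.

```python
import unicodedata


def naive_ci_key(name: str) -> str:
    """Hypothetical lookup key for a case-insensitive language:
    NFKC normalisation followed by Unicode's default case folding,
    which deliberately ignores language-specific (e.g. Turkish) rules."""
    return unicodedata.normalize("NFKC", name).casefold()


# German sharp s folds to "ss", so these two names would collide.
print(naive_ci_key("straße") == naive_ci_key("STRASSE"))  # True

# Long s becomes a plain s once normalised and folded.
print(naive_ci_key("ſum") == naive_ci_key("sum"))  # True

# Turkish pairs I/ı and İ/i; default folding pairs I/i and leaves ı alone.
print(naive_ci_key("ı") == naive_ci_key("I"))  # False
print([hex(ord(c)) for c in naive_ci_key("İ")])  # ['0x69', '0x307']

# Greek: lower() applies the context-sensitive final-sigma rule, casefold() does not.
print("ΟΔΟΣ".lower(), "ΟΔΟΣ".casefold())  # οδος οδοσ


# Python itself is case-sensitive, so the class/instance idiom just works.
class Base:
    pass


base = Base()
print(isinstance(base, Base), base is not Base)  # True True
```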
| From | Rustom Mody <rustompmody@gmail.com> |
|---|---|
| Date | 2016-04-03 10:18 -0700 |
| Message-ID | <257887ea-df88-4229-b045-57d50b7e60b1@googlegroups.com> |
| In reply to | #106375 |
On Sunday, April 3, 2016 at 9:56:24 PM UTC+5:30, Dan Sommers wrote:
> On Sun, 03 Apr 2016 08:39:02 -0700, Rustom Mody wrote:
>
> > On Sunday, April 3, 2016 at 8:58:59 PM UTC+5:30, Dan Sommers wrote:
> >> On Sun, 03 Apr 2016 07:30:47 -0700, Rustom Mody wrote:
> >>
> >> > So here are some examples to illustrate what I am saying:
> >>
> >> [A vs a, A vs A, flag vs flag, etc.]
> > <snip>
> >> I understand that in some use cases, flag and flag represent the same
> >> English word, but please don't extend that to identifiers in my
> >> software.
>
> > I wonder once again if you are getting my point opposite to the one I
> > am making. With ASCII there were problems like O vs 0 -- niggling but
> > small.
> >
> > With Unicode its a gigantic pandora box. Python by allowing unicode
> > identifiers without restraint has made grief for unsuspecting
> > programmers.
>
> What about the A vs a case, which comes up even with ASCII-only
> characters? If those are the same, then I, as a reader of Python code,
> have to understand all the rules about ß (which I think have changed
> over time), and potentially þ and others.
Don't get your point.
If you know German then these rules should be clear enough to you
If not you've probably got bigger problems reading that code anyway
As an illustration, here is Marko's code from a few posts back:

for oppilas in luokka:
    if oppilas.hylätty():
        oppilas.ilmoita(oppilas.koetulokset)

Does it make sense to you?
>
> > That is why my original suggestion that there should have been alongside this
> > 'brave new world', a pragma wherein a programmer can EXPLICITLY declare
> > #language Greek
> > Then he is knowingly opting into possible clashes between A and Α
> > But not between A and А.
>
> If I declared #language Greek, then I'd expect an identifier like A to
> be rejected by the compiler. That said, I don't know if that sort of
> distinction is as clear cut in every language supported by Unicode.
>
> And just to cause trouble (because that's the way I feel today), can I
> declare
>
> #γλώσσα Ελληνική
>
> ;-)
>
> > [And if you think the above is a philosophical disquisition on
> > Aristotle's law of identity: "A is A" you just proved my point that
> > unconstrained Unicode identifiers is a mess]
>
> Can we take a "we're all adults here" approach?
Who's the 'we' we are talking about?
> For the same reason
> that adults don't use identifiers like xl0, x10, xlO, and xl0 anywhere
> near each other, shouldn't we also not use A and A anywhere near each
> other? I certainly don't want the language itself to [try to] reject
> x10 and xIO because they look too much alike in many fonts.
When Kernighan and Ritchie wrote C there was no problem with gets.
Then suddenly, decades later the problem exploded.
What happened?
Here's an analysis:
Security means two almost completely unrelated concepts
- protection against shooting oneself in the foot (remember the 'protected'
keyword of C++ ?)
- protection against intelligent, capable, motivated criminals
Let's call them security-s (against stupidity) and security-c (against criminals)
Security-c didn't figure because computers were anyway physically secured and
there was not much internet to speak of.
gets was provided exactly on your principle of 'consenting-adults' -- if you
use it you know what you are using.
Then suddenly computers became net-facing and their servers could be written by
'consenting' (to whom?) adults using gets.
Voila -- Security has just become a lucrative profession!
I believe python's situation of laissez-faire unicode is similarly trouble-inviting.
While I personally don't know enough about security to be able to demonstrate a
full sequence of events, here's a little fun I had with Chris:
https://mail.python.org/pipermail/python-list/2014-May/672413.html
Do you not think this could be tailored into something more sinister and
dangerous?
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2016-04-04 03:35 +1000 |
| Message-ID | <mailman.403.1459704960.28225.python-list@python.org> |
| In reply to | #106380 |
On Mon, Apr 4, 2016 at 3:18 AM, Rustom Mody <rustompmody@gmail.com> wrote:
> While I personally don't know enough about security to be able to demonstrate a
> full sequence of events, here's a little fun I had with Chris:
>
> https://mail.python.org/pipermail/python-list/2014-May/672413.html
>
> Do you not think this could be tailored into something more sinister and
> dangerous?

I honestly don't know what you're proving there. You didn't import a file called "1.py"; you just created a file with a non-ASCII name and used a non-ASCII identifier to import it. In other words, you did exactly what Unicode should allow: names in any language.

ChrisA
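A minimal sketch of the behaviour Chris describes, assuming a filesystem that stores the Cyrillic file name as given (the name `модуль`, Russian for "module", and the `VALUE` constant are chosen purely for illustration): a module file with a non-ASCII name is imported through an equally non-ASCII identifier, with no special syntax.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A module file whose name is Cyrillic ("модуль" means "module").
    Path(tmp, "модуль.py").write_text("VALUE = 42\n", encoding="utf-8")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # let the path finders see the file just created
    try:
        import модуль  # a non-ASCII identifier naming a non-ASCII file

        value = модуль.VALUE
        print(value)  # 42
    finally:
        # Undo the side effects so the temporary directory can be removed cleanly.
        sys.path.remove(tmp)
        sys.modules.pop("модуль", None)
```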
| From | Dan Sommers <dan@tombstonezero.net> |
|---|---|
| Date | 2016-04-03 18:26 +0000 |
| Message-ID | <ndrn97$9va$4@dont-email.me> |
| In reply to | #106380 |
On Sun, 03 Apr 2016 10:18:45 -0700, Rustom Mody wrote:
> On Sunday, April 3, 2016 at 9:56:24 PM UTC+5:30, Dan Sommers wrote:
>> On Sun, 03 Apr 2016 08:39:02 -0700, Rustom Mody wrote:
>>
>> > On Sunday, April 3, 2016 at 8:58:59 PM UTC+5:30, Dan Sommers wrote:
>> >> On Sun, 03 Apr 2016 07:30:47 -0700, Rustom Mody wrote:
>> >>
>> >> > So here are some examples to illustrate what I am saying:
>> >>
>> >> [A vs a, A vs A, flag vs flag, etc.]
>> > <snip>
>> >> I understand that in some use cases, flag and flag represent the same
>> >> English word, but please don't extend that to identifiers in my
>> >> software.
>>
>> > I wonder once again if you are getting my point opposite to the one I
>> > am making. With ASCII there were problems like O vs 0 -- niggling but
>> > small.
>> >
>> > With Unicode its a gigantic pandora box. Python by allowing unicode
>> > identifiers without restraint has made grief for unsuspecting
>> > programmers.
>>
>> What about the A vs a case, which comes up even with ASCII-only
>> characters? If those are the same, then I, as a reader of Python code,
>> have to understand all the rules about ß (which I think have changed
>> over time), and potentially þ and others.
>
> Don't get your point.
> If you know German then these rules should be clear enough to you
> If not you've probably got bigger problems reading that code anyway
My point is that case sensitivity is good. I was disagreeing with your
point about Scheme getting A vs a "right" and Python and C and Unix
getting it "wrong."
My larger point, and my experience, is that case sensitivity is easier
to handle than case insensitivity. Most of the time, the same
letter's capital and small renditions look different from each other (A
vs a, Q vs q, and even Þ and þ is no worse than O and o), and there are
no context sensitive conversion rules to worry about.
> As an illustration, here is Marko's code from a few posts back:
>
> for oppilas in luokka:
>     if oppilas.hylätty():
>         oppilas.ilmoita(oppilas.koetulokset)
>
> Does it make sense to you?
It makes enough sense to recognize the idiom: for each item in a
collection that satisfies a predicate, call a method on the item.
My point here is that while the identifiers themselves can be enormously
helpful to someone seeing a block of code for the first time or
maintaining it five years later, it's just as important to recognize
quickly that one identifier is not the same as another one, or that a
particular identifier only appears once or only in certain syntactical
constructs.
If the above code were written a little differently, we'd be having a
completely different discussion:
for list in object:
    if list.clear():
        list.pop(list.append)
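Marko's loop runs as written once the names it uses exist. A sketch with a hypothetical `Oppilas` ("pupil") class; the Finnish glosses in the comments, the pass mark of 50, and the `lähetetyt` ("sent") list that stands in for real notifications are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PASS_MARK = 50  # hypothetical pass mark, purely for illustration

lähetetyt: list[tuple[str, list[int]]] = []  # "sent": notifications are recorded here


@dataclass
class Oppilas:  # "pupil"
    nimi: str  # "name"
    koetulokset: list[int] = field(default_factory=list)  # "exam results"

    def hylätty(self) -> bool:  # "failed": no results, or an average below the pass mark
        if not self.koetulokset:
            return True
        return sum(self.koetulokset) / len(self.koetulokset) < PASS_MARK

    def ilmoita(self, tulokset: list[int]) -> None:  # "notify": record instead of sending mail
        lähetetyt.append((self.nimi, tulokset))


luokka = [Oppilas("Aino", [80, 72]), Oppilas("Pekka", [40, 45])]  # "class" (of pupils)

# Marko's loop, unchanged:
for oppilas in luokka:
    if oppilas.hylätty():
        oppilas.ilmoita(oppilas.koetulokset)

print(lähetetyt)  # [('Pekka', [40, 45])]
```

Read this way, the loop is exactly Dan's idiom: iterate over a collection, test a predicate, and call a method on the items that satisfy it.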
>> Can we take a "we're all adults here" approach?
>
> Who's the 'we' we are talking about?
The community, who has accepted Python as a case-sensitive language and
knows better than to use identifiers that look too much alike or are
otherwise deliberately misleading.
>> For the same reason
>> that adults don't use identifiers like xl0, x10, xlO, and xl0 anywhere
>> near each other, shouldn't we also not use A and A anywhere near each
>> other? I certainly don't want the language itself to [try to] reject
>> x10 and xIO because they look too much alike in many fonts.
>
> When Kernighan and Ritchie wrote C there was no problem with gets.
> Then suddenly, decades later the problem exploded.
When Kernighan and Ritchie wrote C there *was* a problem with gets.
> What happened?
The problem was no longer isolated to taking down one Unix process or a
single machine, or discovering passwords on that one machine.
> Here's an analysis:
> Security means two almost completely unrelated concepts
> - protection against shooting oneself in the foot (remember the 'protected'
> keyword of C++ ?)
> - protection against intelligent, capable, motivated criminals
> Let's call them security-s (against stupidity) and security-c (against criminals)
>
> Security-c didn't figure because computers were anyway physically secured and
> there was not much internet to speak of.
> gets was provided exactly on your principle of 'consenting-adults' -- if you
> use it you know what you are using.
>
> Then suddenly computers became net-facing and their servers could be
> written by 'consenting' (to whom?) adults using gets.
>
> Voila -- Security has just become a lucrative profession!
I can't prevent insecure web servers, or unknowing users. Allowing or
disallowing A and A and А to coexist in the source code doesn't matter.
> I believe python's situation of laissez-faire unicode is similarly
> trouble-inviting.
I'm not sure I agree, but I didn't see timing attacks on cryptographic
algorithms or devices reading passwords from air-gapped computers
coming, either.
I do know that complexity is also a source of bugs and security risks.
Allowing or disallowing certain unicode code points in identifiers, and
declaring that identifiers consisting of the same sequence of code
points are the same, is way less complex than getting something else
(even something as "simple" as case-insensitivity) right for all cases.
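Python's actual rule is close to the one Dan describes: `str.isidentifier()` decides which code points may appear in a name, and names are then compared code point by code point after NFKC normalisation (PEP 3131). NFKC does not map the Latin, Greek, or Cyrillic capital A onto one another, so a sketch like the following defines three separate variables:

```python
namespace: dict = {}
# Latin A, Greek Alpha, Cyrillic A: three distinct code points, all valid in identifiers.
source = "A = 1\n\u0391 = 2\n\u0410 = 3\n"
exec(source, namespace)

names = sorted(name for name in namespace if name != "__builtins__")
print([f"U+{ord(name):04X}" for name in names])  # ['U+0041', 'U+0391', 'U+0410']
print([namespace[name] for name in names])  # [1, 2, 3]
print(all(name.isidentifier() for name in names))  # True
```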
> While I personally don't know enough about security to be able to demonstrate a
> full sequence of events, here's a little fun I had with Chris:
>
> https://mail.python.org/pipermail/python-list/2014-May/672413.html
>
> Do you not think this could be tailored into something more sinister and
> dangerous?
| From | Rustom Mody <rustompmody@gmail.com> |
|---|---|
| Date | 2016-04-03 08:46 -0700 |
| Message-ID | <a3abadcc-a9f9-49ea-b395-15e1c32c9618@googlegroups.com> |
| In reply to | #106366 |
On Sunday, April 3, 2016 at 8:58:59 PM UTC+5:30, Dan Sommers wrote:
> Yes, it's marginally annoying, and a security hole waiting to happen,
> that A and A often look very much alike.

"A security hole waiting to happen" = "Marginally annoying"

Frankly I find this juxtaposition alarming

Personal note: I once was idiot enough to have root with password root123
and transferring some files to a friend ... over ssh...
Lost my entire installation in a matter of minutes
| From | Larry Martell <larry.martell@gmail.com> |
|---|---|
| Date | 2016-04-03 11:55 -0400 |
| Message-ID | <mailman.395.1459698984.28225.python-list@python.org> |
| In reply to | #106368 |
On Sun, Apr 3, 2016 at 11:46 AM, Rustom Mody <rustompmody@gmail.com> wrote:
> Personal note: I once was idiot enough to have root with password root123

I changed my password to "incorrect," so whenever I forget it the computer will say, "Your password is incorrect."
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2016-04-04 01:53 +1000 |
| Message-ID | <mailman.396.1459699225.28225.python-list@python.org> |
| In reply to | #106368 |
On Mon, Apr 4, 2016 at 1:46 AM, Rustom Mody <rustompmody@gmail.com> wrote:
> On Sunday, April 3, 2016 at 8:58:59 PM UTC+5:30, Dan Sommers wrote:
>> Yes, it's marginally annoying, and a security hole waiting to happen,
>> that A and A often look very much alike.
>
>
> "A security hole waiting to happen" = "Marginally annoying"
>
> Frankly I find this juxtaposition alarming
>
> Personal note: I once was idiot enough to have root with password root123
> and transferring some files to a friend ... over ssh...
> Lost my entire installation in a matter of minutes

Exactly why did you have root ssh access with a password?

ChrisA
| From | Rustom Mody <rustompmody@gmail.com> |
|---|---|
| Date | 2016-04-03 09:49 -0700 |
| Message-ID | <ba4e0e23-c12a-4d51-8a86-17b3b7e19eef@googlegroups.com> |
| In reply to | #106370 |
On Sunday, April 3, 2016 at 9:30:40 PM UTC+5:30, Chris Angelico wrote:
> Exactly why did you have root ssh access with a password?

Umm... Don't exactly remember. Probably it was not strictly necessary.
Combination of carelessness, stupidity, hurry....

Brings me to...

On Sunday, April 3, 2016 at 9:41:11 PM UTC+5:30, Dan Sommers wrote:
> On Sun, 03 Apr 2016 08:46:59 -0700, Rustom Mody wrote:
>
> > On Sunday, April 3, 2016 at 8:58:59 PM UTC+5:30, Dan Sommers wrote:
> >> Yes, it's marginally annoying, and a security hole waiting to happen,
> >> that A and A often look very much alike.
> >
> > "A security hole waiting to happen" = "Marginally annoying"
> >
> > Frankly I find this juxtaposition alarming
>
> Sorry about that.
>
> I didn't mean to equate the two. I meant to point out that the fact
> that A and A look alike can be one, or both, of those things. Perhaps I
> should have used "or" instead of "and."

Chill! No offence.

Just that when you have the above ingredients (carelessness, stupidity, hurry....) multiplied by a GHz clock, it makes for spicy security incidents(!). I just meant to say that "Just a lil security incident" is not a helpful attitude to foster
| From | Dan Sommers <dan@tombstonezero.net> |
|---|---|
| Date | 2016-04-03 18:32 +0000 |
| Message-ID | <ndrnjp$9va$5@dont-email.me> |
| In reply to | #106379 |
On Sun, 03 Apr 2016 09:49:03 -0700, Rustom Mody wrote:

> On Sunday, April 3, 2016 at 9:41:11 PM UTC+5:30, Dan Sommers wrote:
>> On Sun, 03 Apr 2016 08:46:59 -0700, Rustom Mody wrote:
>>
>> > On Sunday, April 3, 2016 at 8:58:59 PM UTC+5:30, Dan Sommers wrote:
>> >> Yes, it's marginally annoying, and a security hole waiting to happen,
>> >> that A and A often look very much alike.
>> >
>> > "A security hole waiting to happen" = "Marginally annoying"
>> >
>> > Frankly I find this juxtaposition alarming
>>
>> Sorry about that.
>>
>> I didn't mean to equate the two. I meant to point out that the fact
>> that A and A look alike can be one, or both, of those things. Perhaps I
>> should have used "or" instead of "and."
>
> Chill! No offence.

I'm chilled. :-) No offense taken. I am arguably overly sensitive to putting forth an argument that isn't clear and concise (because I've also been known to derail the proceedings until I can get my head around someone else's argument).

> Just that when you have the above ingredients (carelessness,
> stupidity, hurry....) multiplied by a GHz clock, it makes for spicy
> security incidents(!). I just meant to say that "Just a lil security
> incident" is not a helpful attitude to foster

On this we agree. :-)
| From | Dan Sommers <dan@tombstonezero.net> |
|---|---|
| Date | 2016-04-03 16:07 +0000 |
| Message-ID | <ndrf4e$9va$2@dont-email.me> |
| In reply to | #106368 |
On Sun, 03 Apr 2016 08:46:59 -0700, Rustom Mody wrote:
> On Sunday, April 3, 2016 at 8:58:59 PM UTC+5:30, Dan Sommers wrote:
>> Yes, it's marginally annoying, and a security hole waiting to happen,
>> that A and A often look very much alike.
>
> "A security hole waiting to happen" = "Marginally annoying"
>
> Frankly I find this juxtaposition alarming
Sorry about that.
I didn't mean to equate the two. I meant to point out that the fact that A and A look alike can be one, or both, of those things. Perhaps I should have used "or" instead of "and."
| From | Thomas 'PointedEars' Lahn <PointedEars@web.de> |
|---|---|
| Date | 2016-04-06 21:56 +0200 |
| Message-ID | <1584744.4h7ToaqLat@PointedEars.de> |
| In reply to | #106360 |
Rustom Mody wrote:
> On Sunday, April 3, 2016 at 5:17:36 PM UTC+5:30, Thomas 'PointedEars' Lahn
> wrote:
>> Rustom Mody wrote:
>> > When python went to full unicode identifers it should have also added
>> > pragmas for which blocks the programmer intended to use -- something
>> > like a charset declaration of html.
>> >
>> > This way if the programmer says "I want latin and greek"
>> > and then A and Α get mixed up well he asked for it.
>> > If he didn't ask then springing it on him seems unnecessary and
>> > uncalled for
>>
>> Nonsense.
>
> Some misunderstanding of what I said it looks
> [Guessing also from Marko's "...silly..."]
First of all, while bandwidth might not be precious anymore to some, free time still is. So please trim your quotations to the relevant minimum, to the parts you are actually referring to, and summarize properly if necessary. For if you continue this mindbogglingly stupid full-quoting, this is going to be my last reply to you for a long time. You have been warned.
<https://www.netmeister.org/news/learn2quote.html>
> So here are some examples to illustrate what I am saying:
>
> Example 1 -- Ligatures:
>
> Python3 gets it right
>>>> ﬂag = 1
>>>> flag
> 1
Fascinating; confirmed with
| $ python3
| Python 3.4.4 (default, Jan 5 2016, 15:35:18)
| [GCC 5.3.1 20160101] on linux
| […]
I do not think this is correct, though. Different Unicode code sequences, after normalization, should result in different symbols.
> Whereas haskell gets it wrong:
> Prelude> let ﬂag = 1
> Prelude> flag
>
> <interactive>:3:1: Not in scope: ‘flag’
> Prelude> ﬂag
> 1
> Prelude>
I think Haskell gets it right here, while Py3k does not. The “ﬂ” is not to be decomposed to “fl”.
> Example 2 Case Sensitivity
> Scheme¹ gets it right
>
>> (define a 1)
>> A
> 1
>> a
> 1
So Scheme is case-insensitive there. So is (Visual) Basic. That does not make it (any) better.
> Python gets it wrong
>>>> a=1
>>>> A
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> NameError: name 'A' is not defined
This is not wrong; it is just different. And given that identifiers starting with uppercase ought to be class names in Python (and other OOPLs that are case-sensitive there), and that a class name serves in constructor calls (in Python, instantiating a class is otherwise indistinguishable from a function call), it makes sense that the (maybe local) variable “a” should be different from the (probably global) class “A”.
> [Likewise filenames windows gets right; Unix wrong]
Utter nonsense. Apparently you are blissfully unaware of how much grief it has caused WinDOS lusers and users alike over the years that Micro$~1 decided in their infinite wisdom that letter case was not important.
Example: In contrast to previous versions, FAT32 supports long filenames (VFAT). Go try changing a long filename from uppercase (“Really Long Filename.txt”) to partial lowercase (“Really long filename.txt”). It does not work; you get an error, because the underlying “short filename” is the same, as it has to be case-insensitive for backwards compatibility (“REALLY~1.TXT”). First you have to rename the file so that its name results in a different “short filename” (“REALLY~2.TXT”). Then you have to rename it again to get the proper letter case (by which the “short filename” might either become “REALLY~1.TXT” again or “REALLY~3.TXT”).
> Unicode Identifiers in the spirit of IDN homograph attack.
> Every language that 'supports' unicode gets it wrong
NAK, see above.
> Python3
>>>> A=1
>>>> Α
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> NameError: name 'Α' is not defined
>>>> A
> 1
>
> Can you make out why A both is and is not defined?
Fallacy. “A” is _not_ both defined and not defined. There is only one “A”. However, given the proper font, I might see at a glance what is wrong there.
In fact, in my Konsole[tm] where the default font is “Courier 10 Pitch” I clearly see what is wrong there. “A” (U+0041 LATIN CAPITAL LETTER A) is displayed using that serif font where the letter has a serif to the left at cap height and serifs left and right on the baseline, while “Α” (U+0391 GREEK CAPITAL LETTER ALPHA) is displayed using a sans-serif font, where also the cap height is considerably higher.
> When the language does not support it eg python2 the behavior is better
NAK. Being able to use Unicode strings verbatim in a program without having to declare them is infinitely useful. Unicode identifiers appear to be merely a (happy?) side effect of that.
> The notion of 'variable' in programming language is inherently based on
> that of 'identifier'.
ACK.
> With ASCII the problems are minor: Case-distinct identifiers are distinct
> -- they dont IDENTIFY.
I do not think this is a problem.
> This contradicts standard English usage and practice
No, it does not.
English distinguishes between proper *nouns* and proper *names* (the latter can be the former). For example, “Wednesday”, regardless of where it occurs in a sentence, is an English word, a proper *name*; by contrast, “wednesday” is not only neither a proper noun nor a proper name; it is not a proper English *word* in the first place. “i” might be the imaginary unit or a marketing abbreviation for “internet” [1]; “I” is (AFAIK) *only* the English pronoun for referring to oneself.
[1] <https://en.wikipedia.org/wiki/IMac#History>
-- 
PointedEars
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2016-04-07 11:37 +1000 |
| Subject | Unicode normalisation [was Re: [beginner] What's wrong?] |
| Message-ID | <5705b9ef$0$1611$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #106599 |
On Thu, 7 Apr 2016 05:56 am, Thomas 'PointedEars' Lahn wrote:
> Rustom Mody wrote:
>> So here are some examples to illustrate what I am saying:
>>
>> Example 1 -- Ligatures:
>>
>> Python3 gets it right
>>>>> ﬂag = 1
>>>>> flag
>> 1
Python identifiers are intentionally normalised to reduce security issues,
or at least confusion and annoyance, due to visually-identical identifiers
being treated as different.
Unicode has technical standards dealing with identifiers:
http://www.unicode.org/reports/tr31/
and visual spoofing and confusables:
http://www.unicode.org/reports/tr39/
I don't believe that CPython goes to the full extreme of checking for mixed
script confusables, but it does partially mitigate the problem by
normalising identifiers.
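To make that normalisation step concrete, here is a minimal sketch (not from the original post) that runs an assignment to a name spelled with U+FB02 LATIN SMALL LIGATURE FL and then inspects the resulting namespace. It assumes CPython 3, where PEP 3131 specifies NFKC normalisation of identifiers at parse time; `exec` and a plain dict are used only as a convenient way to observe the stored name.

```python
import unicodedata

# Python source that binds a name spelled with U+FB02 LATIN SMALL LIGATURE FL.
source = "\ufb02ag = 1"

namespace = {}
exec(source, namespace)

# The parser stored the NFKC-normalised spelling, so the key is plain "flag".
print("flag" in namespace)        # True
print("\ufb02ag" in namespace)    # False: the ligature spelling is not a key
print(unicodedata.normalize("NFKC", "\ufb02ag"))  # flag
```

This is why the ligature and plain spellings refer to the same variable in Python 3.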
Unfortunately PEP 3131 leaves a number of questions open. Presumably they
were answered in the implementation, but they aren't documented in the PEP.
https://www.python.org/dev/peps/pep-3131/
> Fascinating; confirmed with
>
> | $ python3
> | Python 3.4.4 (default, Jan 5 2016, 15:35:18)
> | [GCC 5.3.1 20160101] on linux
> | […]
>
> I do not think this is correct, though. Different Unicode code sequences,
> after normalization, should result in different symbols.
I think you are confused about normalisation. By definition, normalising
different Unicode code sequences may result in the same symbols, since that
is what normalisation means.
Consider two distinct strings which nevertheless look identical:
py> a = "\N{LATIN SMALL LETTER U}\N{COMBINING DIAERESIS}"
py> b = "\N{LATIN SMALL LETTER U WITH DIAERESIS}"
py> a == b
False
py> print(a, b)
ü ü
The purpose of normalisation is to turn one into the other:
py> import unicodedata
py> unicodedata.normalize('NFKC', a) == b # compose 2 code points --> 1
True
py> unicodedata.normalize('NFKD', b) == a # decompose 1 code point --> 2
True
In the case of the fl ligature, normalisation splits the ligature into
individual 'f' and 'l' code points regardless of whether you compose or
decompose:
py> unicodedata.normalize('NFKC', "ﬂag") == "flag"
True
py> unicodedata.normalize('NFKD', "ﬂag") == "flag"
True
Those are the compatibility forms, NFKC and NFKD. The canonical forms,
NFC and NFD, leave the ligature unchanged.
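A small sketch, added for illustration, that runs both example strings through all four normalisation forms. It suggests that canonical NFC alone already composes u + COMBINING DIAERESIS into ü, while only the compatibility forms fold the ligature into plain "flag". The helper name `compare_forms` is hypothetical.

```python
import unicodedata

DECOMPOSED = "u\u0308"    # LATIN SMALL LETTER U + COMBINING DIAERESIS
PRECOMPOSED = "\u00fc"    # LATIN SMALL LETTER U WITH DIAERESIS
LIGATURE = "\ufb02ag"     # LATIN SMALL LIGATURE FL followed by "ag"

def compare_forms():
    """Map each form to (composes u+diaeresis into ü?, folds ligature to 'flag'?)."""
    results = {}
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        composes = unicodedata.normalize(form, DECOMPOSED) == PRECOMPOSED
        folds = unicodedata.normalize(form, LIGATURE) == "flag"
        results[form] = (composes, folds)
    return results

for form, (composes, folds) in compare_forms().items():
    print(f"{form}: composes={composes} folds_ligature={folds}")
# NFC: composes=True folds_ligature=False
# NFD: composes=False folds_ligature=False
# NFKC: composes=True folds_ligature=True
# NFKD: composes=False folds_ligature=True
```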
Note that UTS #39 (security mechanisms) suggests that identifiers should be
normalised using NFKC.
[...]
> I think Haskell gets it right here, while Py3k does not. The “ﬂ” is not
> to be decomposed to “fl”.
The Unicode consortium seems to disagree with you. Table 1 of UTS #39 (see
link above) includes "Characters that cannot occur in strings normalized to
NFKC" in the Restricted category, that is, characters which should not be
used in identifiers. ﬂ cannot occur in such normalised strings, and so it
is classified as Restricted and should not be used in identifiers.
I'm not entirely sure just how closely Python's identifiers follow the
standard, but I think that the intention is to follow something close to
"UAX31-R4. Equivalent Normalized Identifiers":
http://www.unicode.org/reports/tr31/#R4
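As a rough illustration of the UTS #39 criterion above, this sketch flags the characters of an identifier that NFKC would change when each is normalised on its own. The per-character test is an approximation chosen for brevity, not the exact UTS #39 procedure and not something CPython performs; the helper name `nfkc_unstable_chars` is hypothetical.

```python
import unicodedata

def nfkc_unstable_chars(identifier):
    """Return characters of `identifier` that NFKC changes when normalised alone.

    A per-character check that only approximates the UTS #39 category
    "Characters that cannot occur in strings normalized to NFKC".
    """
    return [ch for ch in identifier if unicodedata.normalize("NFKC", ch) != ch]

for name in ("\ufb02ag", "flag", "\uff21bc"):
    flagged = nfkc_unstable_chars(name)
    print(name, [f"U+{ord(ch):04X}" for ch in flagged])
# ﬂag ['U+FB02']
# flag []
# Ａbc ['U+FF21']
```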
[Rustom]
>> Python gets it wrong
>>>>> a=1
>>>>> A
>> Traceback (most recent call last):
>> File "<stdin>", line 1, in <module>
>> NameError: name 'A' is not defined
>
> This is not wrong; it is just different.
I agree with Thomas here. Case-insensitivity is a choice, and I don't think
it is a good choice for programming identifiers. Being able to make case
distinctions between (let's say):
SPAM # a constant, or at least constant-by-convention
Spam # a class or type
spam # an instance
is useful.
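A short sketch of that three-way distinction, assuming the usual PEP 8 naming conventions. The `str.casefold()` line is only a simulation of what a case-insensitive language would do: all three spellings would collapse into a single identifier.

```python
SPAM = 100          # a constant, by convention

class Spam:         # a class
    def __init__(self, size):
        self.size = size

spam = Spam(SPAM)   # an instance

names = ["SPAM", "Spam", "spam"]
print(len(set(names)))                            # 3: Python keeps them apart
print(len({name.casefold() for name in names}))   # 1: case folding merges them
print(spam.size)                                  # 100
```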
[Rustom]
>> With ASCII the problems are minor: Case-distinct identifiers are distinct
>> -- they dont IDENTIFY.
>
> I do not think this is a problem.
>
>> This contradicts standard English usage and practice
>
> No, it does not.
I agree with Thomas here too. Although it is rare for case to make a
distinction in English, it does happen. As the old joke goes:
Capitalisation is the difference between helping my Uncle Jack off a horse,
and helping my uncle jack off a horse.
So even in English, capitalisation can make a semantic difference.
--
Steven
| From | Marko Rauhamaa <marko@pacujo.net> |
|---|---|
| Date | 2016-04-07 09:36 +0300 |
| Subject | Re: Unicode normalisation [was Re: [beginner] What's wrong?] |
| Message-ID | <87mvp6hqnd.fsf@elektro.pacujo.net> |
| In reply to | #106608 |
Steven D'Aprano <steve@pearwood.info>:
> So even in English, capitalisation can make a semantic difference.
It can even make a pronunciation difference: polish vs Polish.
Marko
| From | Peter Pearson <pkpearson@nowhere.invalid> |
|---|---|
| Date | 2016-04-07 16:51 +0000 |
| Subject | Re: Unicode normalisation [was Re: [beginner] What's wrong?] |
| Message-ID | <dmnhhaFq3t3U1@mid.individual.net> |
| In reply to | #106608 |
On Thu, 07 Apr 2016 11:37:50 +1000, Steven D'Aprano wrote:
> On Thu, 7 Apr 2016 05:56 am, Thomas 'PointedEars' Lahn wrote:
>> Rustom Mody wrote:
>
>>> So here are some examples to illustrate what I am saying:
>>>
>>> Example 1 -- Ligatures:
>>>
>>> Python3 gets it right
>>>>>> ﬂag = 1
>>>>>> flag
>>> 1
[snip]
>>
>> I do not think this is correct, though. Different Unicode code sequences,
>> after normalization, should result in different symbols.
>
> I think you are confused about normalisation. By definition, normalising
> different Unicode code sequences may result in the same symbols, since that
> is what normalisation means.
>
> Consider two distinct strings which nevertheless look identical:
>
> py> a = "\N{LATIN SMALL LETTER U}\N{COMBINING DIAERESIS}"
> py> b = "\N{LATIN SMALL LETTER U WITH DIAERESIS}"
> py> a == b
> False
> py> print(a, b)
> ü ü
>
>
> The purpose of normalisation is to turn one into the other:
>
> py> unicodedata.normalize('NFKC', a) == b # compose 2 code points --> 1
> True
> py> unicodedata.normalize('NFKD', b) == a # decompose 1 code point --> 2
> True
It's all great fun until someone loses an eye.
Seriously, it's cute how neatly normalisation works when you're
watching closely and using it in the circumstances for which it was
intended, but that hardly proves that these practices won't cause much
trouble when they're used more casually and nobody's watching closely.
Considering how much energy good software engineers spend eschewing
unnecessary complexity, do we really want to embrace the prospect of
having different things look identical? (A relevant reference point:
mixtures of spaces and tabs in Python indentation.)
[snip]
> The Unicode consortium seems to disagree with you.
<cranky_geezer_font>
The Unicode consortium was certifiably insane when it went into the
typesetting business. The pile-of-poo character was just frosting on
the cake.
</cranky_geezer_font>
(Sorry to leave you with that image.)
--
To email me, substitute nowhere->runbox, invalid->com.
| From | Rustom Mody <rustompmody@gmail.com> |
|---|---|
| Date | 2016-04-07 21:43 -0700 |
| Subject | Re: Unicode normalisation [was Re: [beginner] What's wrong?] |
| Message-ID | <e990973b-8777-4441-9401-b1b162b000fc@googlegroups.com> |
| In reply to | #106632 |
On Thursday, April 7, 2016 at 10:22:18 PM UTC+5:30, Peter Pearson wrote:
> On Thu, 07 Apr 2016 11:37:50 +1000, Steven D'Aprano wrote:
> > On Thu, 7 Apr 2016 05:56 am, Thomas 'PointedEars' Lahn wrote:
> >> Rustom Mody wrote:
> >
> >>> So here are some examples to illustrate what I am saying:
> >>>
> >>> Example 1 -- Ligatures:
> >>>
> >>> Python3 gets it right
> >>>>>> ﬂag = 1
> >>>>>> flag
> >>> 1
> [snip]
> >>
> >> I do not think this is correct, though. Different Unicode code sequences,
> >> after normalization, should result in different symbols.
> >
> > I think you are confused about normalisation. By definition, normalising
> > different Unicode code sequences may result in the same symbols, since that
> > is what normalisation means.
> >
> > Consider two distinct strings which nevertheless look identical:
> >
> > py> a = "\N{LATIN SMALL LETTER U}\N{COMBINING DIAERESIS}"
> > py> b = "\N{LATIN SMALL LETTER U WITH DIAERESIS}"
> > py> a == b
> > False
> > py> print(a, b)
> > ü ü
> >
> >
> > The purpose of normalisation is to turn one into the other:
> >
> > py> unicodedata.normalize('NFKC', a) == b # compose 2 code points --> 1
> > True
> > py> unicodedata.normalize('NFKD', b) == a # decompose 1 code point --> 2
> > True
>
> It's all great fun until someone loses an eye.
>
> Seriously, it's cute how neatly normalisation works when you're
> watching closely and using it in the circumstances for which it was
> intended, but that hardly proves that these practices won't cause much
> trouble when they're used more casually and nobody's watching closely.
> Considering how much energy good software engineers spend eschewing
> unnecessary complexity, do we really want to embrace the prospect of
> having different things look identical? (A relevant reference point:
> mixtures of spaces and tabs in Python indentation.)
That kind of sums up my position.
To be a casual user of unicode is one thing
To support it is another -- unicode strings in python3 -- ok so far
To mix up these two is a third without enough thought or consideration --
unicode identifiers is likely a security hole waiting to happen...
No I am not clever/criminal enough to know how to write a text that is visually
close to
print "Hello World"
but is internally closer to
rm -rf /
For me this:
>>> Α = 1
>>> A = 2
>>> Α + 1 == A
True
>>>
is cure enough that I am not amused
[The only reason I brought up case distinction is that this is in the same
direction and way worse than that]
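The two names in the session above only look alike. Printing their code points makes the difference visible. This is a minimal sketch, assuming the first `Α` in the session is U+0391 GREEK CAPITAL LETTER ALPHA. NFKC leaves that character unchanged, so it remains a separate identifier from Latin `A`. The helper name `describe` is hypothetical.

```python
import unicodedata

def describe(text):
    """Return 'U+XXXX NAME' for each character of `text`."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

GREEK_ALPHA = "\u0391"
LATIN_A = "A"

namespace = {}
exec(f"{GREEK_ALPHA} = 1\n{LATIN_A} = 2", namespace)

# Two distinct keys: NFKC does not map Greek Alpha to Latin A.
print(namespace[GREEK_ALPHA] + 1 == namespace[LATIN_A])   # True
print(describe(GREEK_ALPHA + LATIN_A))
# ['U+0391 GREEK CAPITAL LETTER ALPHA', 'U+0041 LATIN CAPITAL LETTER A']
```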
If python had been more serious about embracing the brave new world of
unicode it should have looked in this direction:
http://blog.languager.org/2014/04/unicoded-python.html
Also here I suggest a classification of unicode, that, while not
official or even formalizable is (I believe) helpful
http://blog.languager.org/2015/03/whimsical-unicode.html
Specifically as far as I am concerned if python were to throw back say
a ligature in an identifier as a syntax error -- exactly what python2 does --
I think it would be perfectly fine and a more sane choice
| From | Rustom Mody <rustompmody@gmail.com> |
|---|---|
| Date | 2016-04-07 21:47 -0700 |
| Subject | Re: Unicode normalisation [was Re: [beginner] What's wrong?] |
| Message-ID | <7e55a6df-d272-4217-9c45-1f9dea9b7afd@googlegroups.com> |
| In reply to | #106641 |
On Friday, April 8, 2016 at 10:13:16 AM UTC+5:30, Rustom Mody wrote:
> No I am not clever/criminal enough to know how to write a text that is visually
> close to
> print "Hello World"
> but is internally closer to
> rm -rf /
>
> For me this:
> >>> Α = 1
> >>> A = 2
> >>> Α + 1 == A
> True
> >>>
>
>
> is cure enough that I am not amused
Um... "cute" was the intention
[Or is it cuʇe ?]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2016-04-08 14:54 +1000 |
| Subject | Re: Unicode normalisation [was Re: [beginner] What's wrong?] |
| Message-ID | <mailman.63.1460091243.2253.python-list@python.org> |
| In reply to | #106641 |
On Fri, Apr 8, 2016 at 2:43 PM, Rustom Mody <rustompmody@gmail.com> wrote:
> No I am not clever/criminal enough to know how to write a text that is visually
> close to
> print "Hello World"
> but is internally closer to
> rm -rf /
>
> For me this:
> >>> Α = 1
>>>> A = 2
>>>> Α + 1 == A
> True
>>>>
>
>
> is cure enough that I am not amused
To me, the above is a contrived example. And you can contrive examples that are just as confusing while still being ASCII-only, like swimmer/swirnmer in many fonts, or I and l, or any number of other visually-confusing glyphs. I propose that we ban the letters 'r' and 'l' from identifiers, to ensure that people can't mess with themselves.
> Specifically as far as I am concerned if python were to throw back say
> a ligature in an identifier as a syntax error -- exactly what python2 does --
> I think it would be perfectly fine and a more sane choice
The ligature is handled straight-forwardly: it gets decomposed into its component letters. I'm not seeing a problem here.
ChrisA
| From | Rustom Mody <rustompmody@gmail.com> |
|---|---|
| Date | 2016-04-08 10:51 -0700 |
| Subject | Re: Unicode normalisation [was Re: [beginner] What's wrong?] |
| Message-ID | <df998f95-929f-4d7b-9eed-cde6bde040fa@googlegroups.com> |
| In reply to | #106643 |
On Friday, April 8, 2016 at 10:24:17 AM UTC+5:30, Chris Angelico wrote:
> On Fri, Apr 8, 2016 at 2:43 PM, Rustom Mody wrote:
> > No I am not clever/criminal enough to know how to write a text that is visually
> > close to
> > print "Hello World"
> > but is internally closer to
> > rm -rf /
> >
> > For me this:
> > >>> Α = 1
> >>>> A = 2
> >>>> Α + 1 == A
> > True
> >>>>
> >
> >
> > is cure enough that I am not amused
>
> To me, the above is a contrived example. And you can contrive examples
> that are just as confusing while still being ASCII-only, like
> swimmer/swirnmer in many fonts, or I and l, or any number of other
> visually-confusing glyphs. I propose that we ban the letters 'r' and
> 'l' from identifiers, to ensure that people can't mess with
> themselves.
swirnmer and swimmer are distinguished by squinting a bit
А and A only by digging down into the hex.
If you categorize them as similar/same... well I am not arguing... will come to you when I am short of straw...
>
> > Specifically as far as I am concerned if python were to throw back say
> > a ligature in an identifier as a syntax error -- exactly what python2 does --
> > I think it would be perfectly fine and a more sane choice
>
> The ligature is handled straight-forwardly: it gets decomposed into
> its component letters. I'm not seeing a problem here.
Yes... there is no problem... HERE
[I did say python gets this right that haskell for example gets wrong]
What's wrong is the whole approach of swallowing gobs of characters that need not be legal at all and then getting indigestion:
Note the "non-normative" in
https://docs.python.org/3/reference/lexical_analysis.html#identifiers
If a language reference is not normative what is?
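UTS #39, linked earlier in the thread, defines a "skeleton" transform for this kind of comparison: normalise to NFD, replace each character with its prototype from the consortium's confusables data, then normalise to NFD again. Two strings count as confusable when their skeletons match. The minimal sketch below uses a tiny hand-picked mapping table in place of the real data. The table and helper names are hypothetical. It suggests that one mechanism can catch both swimmer/swirnmer and the Cyrillic/Latin pair.

```python
import unicodedata

# Hypothetical toy table for illustration only; UTS #39 publishes the real
# mappings in its confusables data file.
TOY_PROTOTYPES = {
    "\u0391": "A",   # GREEK CAPITAL LETTER ALPHA
    "\u0410": "A",   # CYRILLIC CAPITAL LETTER A
    "m": "rn",       # m is easily misread as r + n
    "I": "l",        # capital I vs small L
    "1": "l",        # digit one vs small L
}

def toy_skeleton(text):
    """NFD, map each character to its prototype, then NFD again."""
    text = unicodedata.normalize("NFD", text)
    text = "".join(TOY_PROTOTYPES.get(ch, ch) for ch in text)
    return unicodedata.normalize("NFD", text)

def confusable(a, b):
    """True when two different strings share a skeleton."""
    return a != b and toy_skeleton(a) == toy_skeleton(b)

print(confusable("swimmer", "swirnmer"))   # True
print(confusable("\u0410", "A"))           # True
print(confusable("spam", "eggs"))          # False
```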
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2016-04-08 16:00 +1000 |
| Subject | Re: Unicode normalisation [was Re: [beginner] What's wrong?] |
| Message-ID | <570748ec$0$1620$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #106632 |
On Fri, 8 Apr 2016 02:51 am, Peter Pearson wrote:
> Seriously, it's cute how neatly normalisation works when you're
> watching closely and using it in the circumstances for which it was
> intended, but that hardly proves that these practices won't cause much
> trouble when they're used more casually and nobody's watching closely.
> Considering how much energy good software engineers spend eschewing
> unnecessary complexity,
Maybe so, but it's not good software engineers we have to worry about, but the other 99.9% :-)
> do we really want to embrace the prospect of
> having different things look identical?
You mean like ASCII identifiers?
I'm afraid it's about fifty years too late to ban identifiers using O and 0, or l, I and 1, or rn and m.
Or for that matter:
a = akjhvciwfdwkejfc2qweoduycwldvqspjcwuhoqwe9fhlcjbqvcbhsiauy37wkg() + 100
b = 100 + akjhvciwfdwkejfc2qweoduycwldvqspjcwuhoqew9fhlcjbqvcbhsiauy37wkg()
How easily can you tell them apart at a glance?
The reality is that we trust our coders not to deliberately mess us about. As the Obfuscated C and the Underhanded C contests prove, you don't need Unicode to hide hostile code. In fact, the use of Unicode confusables in an otherwise all-ASCII file is a dead giveaway that something fishy is going on.
I think that, beyond normalisation, the compiler need not be too concerned by confusables. I wouldn't *object* to the compiler raising a warning if it detected confusable identifiers, or mixed script identifiers, but I think that's more the job for a linter or human code review.
> (A relevant reference point:
> mixtures of spaces and tabs in Python indentation.)
Most editors have an option to display whitespace, and tabs and spaces look different. Typically the tab is shown with an arrow, and the space by a dot. If people *still* confuse them, the issue is easily managed by a combination of "well don't do that" and TabError.
> [snip]
>> The Unicode consortium seems to disagree with you.
>
> <cranky_geezer_font>
>
> The Unicode consortium was certifiably insane when it went into the
> typesetting business.
They are not, and never have been, in the typesetting business. Perhaps characters are not the only things easily confused *wink*
(Although some members of the consortium may be. But the consortium itself isn't.)
> The pile-of-poo character was just frosting on
> the cake.
Blame the Japanese mobile phone companies for that. When you pay your membership fee, you get to object to the addition of characters too.
(Anyone, I think, can propose a new character, but only members get to choose which proposals are accepted.)
But really, why should we object? Is "pile-of-poo" any more silly than any of the other dingbats, graphics characters, and other non-alphabetical characters? Unicode is not just for "letters of the alphabet".
--
Steven
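Steven suggests that a linter is the right place to catch confusables. The following is a minimal sketch of such a check. It uses the standard `tokenize` module to report identifiers that are non-ASCII or that appear to mix scripts. The script test is a crude assumption made for illustration: it takes the first word of each letter's Unicode character name (LATIN, GREEK, CYRILLIC, ...), because the standard library does not expose the Unicode Script property. The sketch needs Python 3.7+ for `str.isascii()`, and the function names are hypothetical.

```python
import io
import tokenize
import unicodedata

def rough_scripts(name):
    """Crude script guess: first word of each letter's Unicode name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in name if ch.isalpha()}

def suspicious_identifiers(source):
    """Yield (line, identifier, reason) for non-ASCII identifiers in `source`."""
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type != tokenize.NAME or tok.string.isascii():
            continue
        reason = "mixed scripts" if len(rough_scripts(tok.string)) > 1 else "non-ASCII"
        yield tok.start[0], tok.string, reason

sample = "A = 1\n\u0391 = 2\n\u0391pple = A + 1\n"
for line, name, reason in suspicious_identifiers(sample):
    print(line, name, reason)
# 2 Α non-ASCII
# 3 Αpple mixed scripts
```

Because `tokenize` reports identifiers as written, before NFKC normalisation, a ligature such as U+FB02 would also be reported by this check.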