comp.lang.python: thread #70527 (unrolled)

Unicode in Python

Started by	Rustom Mody <rustompmody@gmail.com>
First post	2014-04-22 22:31 -0700
Last post	2014-04-23 16:41 +1000
Articles	20 on this page of 21 — 10 participants

  Unicode in Python Rustom Mody <rustompmody@gmail.com> - 2014-04-22 22:31 -0700
    Re: Unicode in Python Chris Angelico <rosuav@gmail.com> - 2014-04-23 15:50 +1000
      Re: Unicode in Python Rustom Mody <rustompmody@gmail.com> - 2014-04-22 23:57 -0700
        Re: Unicode in Python Chris Angelico <rosuav@gmail.com> - 2014-04-23 17:06 +1000
        Re: Unicode in Python Steven D'Aprano <steve@pearwood.info> - 2014-04-23 07:29 +0000
        Re: Unicode in Python Steven D'Aprano <steve@pearwood.info> - 2014-04-23 07:53 +0000
          Re: Unicode in Python Rustom Mody <rustompmody@gmail.com> - 2014-04-23 10:59 -0700
            Re: Unicode in Python wxjmfauth@gmail.com - 2014-04-26 00:15 -0700
              Re: Unicode in Python "Frank Millman" <frank@chagford.com> - 2014-04-26 09:45 +0200
              Re: Unicode in Python Ben Finney <ben@benfinney.id.au> - 2014-04-26 17:50 +1000
              Re: Unicode in Python Ian Kelly <ian.g.kelly@gmail.com> - 2014-04-26 09:38 -0400
                Re: Unicode in Python wxjmfauth@gmail.com - 2014-04-27 07:29 -0700
                Re: Unicode in Python wxjmfauth@gmail.com - 2014-04-28 01:57 -0700
                  Re: Unicode in Python random832@fastmail.us - 2014-05-01 13:21 -0400
                    Re: Unicode in Python wxjmfauth@gmail.com - 2014-05-07 23:04 -0700
                  Re: Unicode in Python Michael Torrie <torriem@gmail.com> - 2014-05-01 21:50 -0600
                    Re: Unicode in Python wxjmfauth@gmail.com - 2014-05-03 00:46 -0700
            Re: Unicode in Python Rustom Mody <rustompmody@gmail.com> - 2014-04-27 10:39 -0700
    Re: Unicode in Python Steven D'Aprano <steve@pearwood.info> - 2014-04-23 05:52 +0000
      Re: Unicode in Python Devin Jeanpierre <jeanpierreda@gmail.com> - 2014-04-22 23:19 -0700
      Re: Unicode in Python Ben Finney <ben@benfinney.id.au> - 2014-04-23 16:41 +1000

#70527 — Unicode in Python

From	Rustom Mody <rustompmody@gmail.com>
Date	2014-04-22 22:31 -0700
Subject	Unicode in Python
Message-ID	<0f253434-5e7d-4eea-88e1-7997fec2bd2d@googlegroups.com>

Chris Angelico wrote:
> it's impossible for most people to type (and programming with a palette
> of arbitrary syntactic tokens isn't my idea of fun)...

Where's the suggestion to use a "palette of arbitrary tokens" ?

I just tried a greek keyboard; ie do
$ setxkbmap -option "grp:switch,grp:alt_shift_toggle,grp_led:scroll" -layout "us,gr"

Thereafter typing
abcdefghijklmnopqrstuvwxyz
after a Shift-Alt
gives
αβψδεφγηιξκλμνοπ;ρστθωςχυζ

One more Shift-Alt and back to roman

IOW the extra typing cost for greek letters is negligible
over the corresponding roman ones

Of course
- One would need to define such a keyboard (setxkb)
- One would have to find similar technologies for other OSes (I'm on
Debian; even Ubuntu/Unity grabs too many keys)

#70529

From	Chris Angelico <rosuav@gmail.com>
Date	2014-04-23 15:50 +1000
Message-ID	<mailman.9452.1398232238.18130.python-list@python.org>
In reply to	#70527

On Wed, Apr 23, 2014 at 3:31 PM, Rustom Mody <rustompmody@gmail.com> wrote:
> Chris Angelico wrote:
>> it's impossible for most people to type (and programming with a palette
>> of arbitrary syntactic tokens isn't my idea of fun)...
>
> Where's the suggestion to use a "palette of arbitrary tokens" ?
>
> I just tried a greek keyboard; ie do
> $ setxkbmap -option "grp:switch,grp:alt_shift_toggle,grp_led:scroll" -layout "us,gr"
>
> Thereafter typing
> abcdefghijklmnopqrstuvwxyz
> after a Shift-Alt
> gives
> αβψδεφγηιξκλμνοπ;ρστθωςχυζ
>
> One more Shift-Alt and back to roman

Okay. Now what about your other symbols? Your alternative assignment
operator, for instance. How do you type that?

ChrisA

#70535

From	Rustom Mody <rustompmody@gmail.com>
Date	2014-04-22 23:57 -0700
Message-ID	<773afa7d-4b6d-4d67-8d40-ea90b335a1a2@googlegroups.com>
In reply to	#70529

On Wednesday, April 23, 2014 11:22:33 AM UTC+5:30, Steven D'Aprano wrote:

> 25 Unicode characters down, 1114000+ to go :-)

That question would arise only if someone suggested adding
1114000(+) characters to the syntactic/lexical definition of Python.

IOW, while it's true that Unicode is a character set, it's better to think
of it as a repertoire: the universal set from which a choice can be made.

On Wednesday, April 23, 2014 11:20:35 AM UTC+5:30, Chris Angelico wrote:
> On Wed, Apr 23, 2014 at 3:31 PM, Rustom Mody wrote:
> > Chris Angelico wrote:
> >> it's impossible for most people to type (and programming with a palette
> >> of arbitrary syntactic tokens isn't my idea of fun)...
> > Where's the suggestion to use a "palette of arbitrary tokens" ?
> > I just tried a greek keyboard; ie do
> > $ setxkbmap -option "grp:switch,grp:alt_shift_toggle,grp_led:scroll" -layout "us,gr"
> > Thereafter typing
> > abcdefghijklmnopqrstuvwxyz
> > after a Shift-Alt
> > gives
> > αβψδεφγηιξκλμνοπ;ρστθωςχυζ
> > One more Shift-Alt and back to roman

> Okay. Now what about your other symbols? Your alternative assignment
> operator, for instance. How do you type that?

In case you missed it, I said:

> Of course
> - One would need to define such a keyboard (setxkb)
> - One would have to find similar technologies for other OSes

In more detail:
In our normal use of a US-104 keyboard, every letter 'costs' something,
eg 'a' costs 1 keystroke
   'A' costs 2 (Shift+a)
Most people do not count that as a significant cost.
And when kids come on this list and talk SMS-ese ("i wanna do so-n-so"),
we chide them for saving keystrokes at the cost of readability.

In such a (default) setup typing a ∧ or ∨ is not possible at all without
something like a char-picker and at best has an ergonomic cost that is an
order of magnitude higher than the 'naturally available' characters.

On the other hand, when/if a keyboard mapping is defined in which
the characters that are commonly needed are available, it is
reasonable to expect ∨ and ∧ to cost no more than 2 strokes each
(ie about as much as an 'A', slightly more than an 'a'). This means
that ∨ is expected to cost about the same as 'or', and ∧ less than 'and'.
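The keystroke arithmetic above can be made concrete with a toy cost model. This is a minimal sketch under stated assumptions, not a measurement: a plain key costs 1, a key plus one modifier costs 2, the symbol layout that puts ∧ ∨ ≠ behind a modifier is hypothetical, and the char-picker penalty of 10 is an arbitrary stand-in for "an order of magnitude higher".

```python
# Toy keystroke-cost model for the argument above (all costs assumed).
PICKER_COST = 10  # hypothetical "order of magnitude" cost of a char-picker

# Hypothetical layout that puts these symbols behind one modifier key.
SYMBOL_LAYOUT = frozenset("∧∨≠")


def char_cost(ch, chords=frozenset()):
    """Modelled keystrokes needed to type one character."""
    if ch.isascii():
        # Lowercase letters, digits and space: one key; anything else
        # (capitals, most punctuation) approximated as Shift + key.
        return 1 if (ch.islower() or ch.isdigit() or ch == " ") else 2
    if ch in chords:
        return 2  # modifier + key, like Shift + a
    return PICKER_COST


def text_cost(text, chords=frozenset()):
    """Total modelled keystrokes for a string."""
    return sum(char_cost(ch, chords) for ch in text)


for token in ["a", "A", "or", "and", "∨", "∧"]:
    print(f"{token!r:6} default: {text_cost(token):2}   "
          f"with symbol layout: {text_cost(token, SYMBOL_LAYOUT)}")
```

Under these assumptions ∨ and 'or' both cost 2 and ∧ costs 2 against 3 for 'and' once the layout exists, while on the default layout each symbol pays the full picker penalty.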

Readability is another question altogether.
A random example from my machine, calendar.py line 99.
If one finds this:

    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

more readable than

    return year%4=0 ∧ (year%100≠0 ∨ year%400 = 0)

then perhaps the following is the most preferred?

COMPUTE YEAR MODULO 4 EQUALS 0 AND YEAR MODULO 100 NOT
EQUAL TO ZERO OR YEAR MODULO 400 EQUAL TO 0

IOW COBOL is desirable?
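The calendar.py expression is the Gregorian leap-year rule: divisible by 4, except centuries, unless divisible by 400. As a quick sanity check, a minimal sketch comparing that expression with the standard library's `calendar.isleap` over a range of years:

```python
import calendar


def is_leap(year: int) -> bool:
    # Same expression as the calendar.py line quoted in the post.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Agrees with the library function, including century years such as
# 1900 (not a leap year) and 2000 (a leap year).
assert all(is_leap(y) == calendar.isleap(y) for y in range(1, 3001))
print(is_leap(1900), is_leap(2000), is_leap(2014))  # False True False
```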

#70536

From	Chris Angelico <rosuav@gmail.com>
Date	2014-04-23 17:06 +1000
Message-ID	<mailman.9457.1398236806.18130.python-list@python.org>
In reply to	#70535

On Wed, Apr 23, 2014 at 4:57 PM, Rustom Mody <rustompmody@gmail.com> wrote:
> In such a (default) setup typing a ∧ or ∨ is not possible at all without
> something like a char-picker and at best has an ergonomic cost that is an
> order of magnitude higher than the 'naturally available' characters.
>
> On the other hand when/if a keyboard mapping is defined in which
> the characters that are commonly needed are available, it is
> reasonable to expect the ∨,∧ to cost no more than 2 strokes each
> (ie about as much as an 'A'; slightly more than an 'a'. Which means
> that '∨' is expected to cost about the same as 'or' and ∧ to cost less than an 'and'

So how much effort are you going to go to for, effectively, the same
end result? You can type "or" with the same keystrokes, and it takes
zero setup work and zero memorization (you may forget which keystroke
you set up for ∨, but I doubt you'll forget how to spell "or", even if
you think it means gold/yellow). Where's the benefit? I'm seriously
not seeing it.

ChrisA

#70537

From	Steven D'Aprano <steve@pearwood.info>
Date	2014-04-23 07:29 +0000
Message-ID	<53576bdd$0$11109$c3e8da3@news.astraweb.com>
In reply to	#70535

On Tue, 22 Apr 2014 23:57:46 -0700, Rustom Mody wrote:

> perhaps the following is the most preferred?
> 
> COMPUTE YEAR MODULO 4 EQUALS 0 AND YEAR MODULO 100 NOT EQUAL TO ZERO OR
> YEAR MODULO 400 EQUAL TO 0
> 
> IOW COBOL is desirable?

If the only choices are COBOL on one hand and the mutant offspring of 
Perl and APL on the other, I'd vote for COBOL.

But surely they aren't the only options, and it is possible to find a 
happy medium which is neither excessively verbose nor painfully, 
cryptically terse.

Remember that we're talking about general purpose programming here. There 
are domains which favour terseness and a vast number of symbols, e.g. 
mathematics, but most programming is not in that domain, even when it 
uses tools from that domain.

-- 
Steve

#70538

From	Steven D'Aprano <steve@pearwood.info>
Date	2014-04-23 07:53 +0000
Message-ID	<5357715c$0$11109$c3e8da3@news.astraweb.com>
In reply to	#70535

On Tue, 22 Apr 2014 23:57:46 -0700, Rustom Mody wrote:

> On the other hand when/if a keyboard mapping is defined in which the
> characters that are commonly needed are available, it is reasonable to
> expect the ∨,∧ to cost no more than 2 strokes each (ie about as much as
> an 'A'; slightly more than an 'a'. Which means that '∨' is expected to
> cost about the same as 'or' and ∧ to cost less than an 'and'

Oh, a further thought...

Consider your example:

    return year%4=0 ∧ (year%100≠0 ∨ year%400 = 0)

vs 

    return year%4==0 and (year%100!=0 or year%400==0)

[aside: personally I like ≠ and if there was a platform independent way 
to type it in any editor, I'd much prefer it over != or <> ]

Apart from the memorization problem, which I've already touched on, there 
is the mode problem. Keyboard layouts are modes, and you're swapping 
modes. Every time you swap modes, there is a small mental cost. Think of 
it as an interrupt which has to be caught, pausing the current thought 
and starting a new one. So rather than:

    char char char char char char char ...

you have:

    char char char INTERRUPT
    char INTERRUPT
    char char char ...

which is a heavier cost than it appears from just counting keystrokes. Of
course, the more experienced you become, the smaller that cost will be,
but it will never be quite as low as just a "regular" keystroke.

Normally, when people use multiple keyboards, it's because that interrupt
cost is amortized over a significant amount of typing:

    INTERRUPT (English layout)
    paragraph paragraph paragraph paragraph
    INTERRUPT (Greek layout)
    paragraph paragraph paragraph
    INTERRUPT (English again)
    paragraph ...

and possibly even lost in the noise of a far greater interrupt, namely 
task-switching from one application to another. So it's manageable. But 
switching layouts for a single character is likely to be far more 
painful, especially for casual users of that layout. 

Based on an extremely generous estimate that I use "lambda" four times in 
100 lines of code, I might use λ perhaps once in a thousand non-Greek 
characters. Similarly, I might use ∧ or ∨ maybe once per hundred 
characters. That means I'm unlikely to ever get familiar enough with 
those that the cost of two interrupts per use will be negligible.
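The interrupt model can be made concrete by counting how often the active layout has to change while typing a line. A minimal sketch, assuming (hypothetically) that every ASCII character lives on the default US layout and every non-ASCII character needs a switch to some other layout; the two lines are the leap-year example from the thread, written here with the `year % 400` test that calendar.py actually uses.

```python
def layout_switches(line: str) -> int:
    """Count layout changes needed to type `line`, starting and ending on US."""
    switches = 0
    on_us = True  # assume we start on the default (US) layout
    for ch in line:
        needs_us = ch.isascii()
        if needs_us != on_us:
            switches += 1
            on_us = needs_us
    if not on_us:
        switches += 1  # switch back to the default layout at the end
    return switches


ascii_line = "return year%4==0 and (year%100!=0 or year%400==0)"
symbol_line = "return year%4=0 ∧ (year%100≠0 ∨ year%400=0)"
print(layout_switches(ascii_line))   # 0
print(layout_switches(symbol_line))  # 6: two interrupts per isolated symbol
```

Adjacent non-ASCII characters share a single pair of switches, which is why switching layouts per paragraph amortizes well while switching per symbol does not.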

-- 
Steven

#70546

From	Rustom Mody <rustompmody@gmail.com>
Date	2014-04-23 10:59 -0700
Message-ID	<aa55b40a-9032-401c-a24d-1b7518ebe1e1@googlegroups.com>
In reply to	#70538

On Wednesday, April 23, 2014 1:23:00 PM UTC+5:30, Steven D'Aprano wrote:
> On Tue, 22 Apr 2014 23:57:46 -0700, Rustom Mody wrote:

> > On the other hand when/if a keyboard mapping is defined in which the
> > characters that are commonly needed are available, it is reasonable to
> > expect the ∨,∧ to cost no more than 2 strokes each (ie about as much as
> > an 'A'; slightly more than an 'a'. Which means that '∨' is expected to
> > cost about the same as 'or' and ∧ to cost less than an 'and'

> Oh, a further thought...

> Consider your example:

>     return year%4=0 ∧ (year%100≠0 ∨ year%400 = 0)

> vs

>     return year%4==0 and (year%100!=0 or year%400==0)

> [aside: personally I like ≠ and if there was a platform independent way
> to type it in any editor, I'd much prefer it over != or <> ]

> Apart from the memorization problem, which I've already touched on, there
> is the mode problem. Keyboard layouts are modes, and you're swapping
> modes. Every time you swap modes, there is a small mental cost. Think of
> it as an interrupt which has to be caught, pausing the current thought
> and starting a new one. So rather than:

>     char char char char char char char ...

> you have:

>     char char char INTERRUPT
>     char INTERRUPT
>     char char char ...

> which is a heavier cost than it appears from just counting keystrokes. Of
> course, the more experienced you become, the smaller that cost will be,
> but it will never be quite as low as just a "regular" keystroke.

> Normally, when people use multiple keyboards, it's because that interrupt
> cost is amortized over a significant amount of typing:

>     INTERRUPT (English layout)
>     paragraph paragraph paragraph paragraph
>     INTERRUPT (Greek layout)
>     paragraph paragraph paragraph
>     INTERRUPT (English again)
>     paragraph ...

> and possibly even lost in the noise of a far greater interrupt, namely
> task-switching from one application to another. So it's manageable. But
> switching layouts for a single character is likely to be far more
> painful, especially for casual users of that layout.

> Based on an extremely generous estimate that I use "lambda" four times in
> 100 lines of code, I might use λ perhaps once in a thousand non-Greek
> characters. Similarly, I might use ∧ or ∨ maybe once per hundred
> characters. That means I'm unlikely to ever get familiar enough with
> those that the cost of two interrupts per use will be negligible.

It's gratifying to see an argument framed in cognitive terms!

More on that later.

For now: mode/modeless

Yes, most of us prefer the Shift key to Caps Lock, even for stretches of capitals. So, analogously, here is a modeless solution.

Earlier I found this mode-switching version:
$ setxkbmap -option "grp:switch,grp:alt_shift_toggle,grp_led:scroll" -layout "us,gr"
This makes Shift-Alt the mode switcher.

This one, on the other hand,
$ setxkbmap -layout "us,gr" -option "grp:switch"
makes Right-Alt behave like a 'Greek Shift',

ie typing
abcdefghijklmnopqrstuvwxyz
with RAlt held down throughout produces
αβψδεφγηιξκλμνοπ;ρστθωςχυζ

This makes the ergonomic cost of a Greek letter identical to that of a capital
English letter: for Greek, use RAlt the way one uses Shift for English.

Notes:
1. Tried on Debian and Ubuntu. Recent Ubuntus are rather ill-mannered in
the way they appropriate keys; still, it works as far as I can see.

2. ';' ?? ie semicolon is produced from 'q'? What's that semicolon doing there?? But then Greek is -- well -- Greek to me! (As is xkb!)
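The observed key-to-letter correspondence can be written down as a translation table. A minimal sketch: the 26-character output string is copied from the post and reflects only what was observed with RAlt held, not a dump of the xkb 'gr' layout. It also bears on note 2: Greek uses ';' as its question mark, and Unicode's GREEK QUESTION MARK (U+037E) is canonically equivalent to the ordinary semicolon, so normalization turns it into ';'.

```python
import unicodedata

LATIN = "abcdefghijklmnopqrstuvwxyz"
GREEK = "αβψδεφγηιξκλμνοπ;ρστθωςχυζ"  # as reported in the post, RAlt held

# Key-to-character table for the observed layout.
greek_shift = str.maketrans(LATIN, GREEK)

print("lambda".translate(greek_shift))  # λαμβδα
print(LATIN[GREEK.index(";")])          # q: the key that yields ';'

# GREEK QUESTION MARK has a canonical decomposition to U+003B ';'.
print(unicodedata.normalize("NFC", "\u037e") == ";")  # True
```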

#70627

From	wxjmfauth@gmail.com
Date	2014-04-26 00:15 -0700
Message-ID	<03bb12d8-93be-4ef6-94ae-4a02789aea2d@googlegroups.com>
In reply to	#70546

==========

I once wrote that 90% of Python 2 apps (a generic term) meant to
process text and strings do not work.

In Python 3, that's 100%. It is only by chance that such apps may
give the illusion of working properly.

jmf

#70629

From	"Frank Millman" <frank@chagford.com>
Date	2014-04-26 09:45 +0200
Message-ID	<mailman.9515.1398498323.18130.python-list@python.org>
In reply to	#70627

<wxjmfauth@gmail.com> wrote in message 
news:03bb12d8-93be-4ef6-94ae-4a02789aea2d@googlegroups.com...
> ==========
>
> I once wrote that 90% of Python 2 apps (a generic term) meant to
> process text and strings do not work.
>
> In Python 3, that's 100%. It is only by chance that such apps may
> give the illusion of working properly.
>

It is quite frustrating when you make these statements without explaining 
what you mean by 'not working'.

It would be really useful if you could spell out -

1. what you did
2. what you expected to happen
3. what actually happened

Frank Millman

#70630

From	Ben Finney <ben@benfinney.id.au>
Date	2014-04-26 17:50 +1000
Message-ID	<mailman.9516.1398498617.18130.python-list@python.org>
In reply to	#70627

"Frank Millman" <frank@chagford.com> writes:

> <wxjmfauth@gmail.com> wrote […]

> It is quite frustrating when you make these statements without
> explaining what you mean by 'not working'.

Please do not engage “wxjmfauth” on this topic; he is an
amply-demonstrated troll with nothing tangible to back up his incessant
complaints about Unicode in Python. He is best ignored, IMO.

-- 
 \          “As the evening sky faded from a salmon color to a sort of |
  `\   flint gray, I thought back to the salmon I caught that morning, |
_o__)    and how gray he was, and how I named him Flint.” —Jack Handey |
Ben Finney

#70632

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2014-04-26 09:38 -0400
Message-ID	<mailman.9518.1398519519.18130.python-list@python.org>
In reply to	#70627

On Apr 26, 2014 3:46 AM, "Frank Millman" <frank@chagford.com> wrote:
>
>
> <wxjmfauth@gmail.com> wrote in message
> news:03bb12d8-93be-4ef6-94ae-4a02789aea2d@googlegroups.com...
> > ==========
> >
> > I once wrote that 90% of Python 2 apps (a generic term) meant to
> > process text and strings do not work.
> >
> > In Python 3, that's 100%. It is only by chance that such apps may
> > give the illusion of working properly.
> >
>
> It is quite frustrating when you make these statements without explaining
> what you mean by 'not working'.

As far as anybody has been able to determine, what jmf means by "not
working" is  that strings containing the € character are handled less
efficiently than strings that do not contain it in certain contrived test
cases.
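What Ian describes follows from PEP 393's flexible string representation in CPython 3.3+: a `str` is stored with 1, 2 or 4 bytes per code point, chosen by the widest code point it contains. '€' (U+20AC) lies outside Latin-1, so a single € widens the whole string to 2 bytes per code point. A rough sketch using `sys.getsizeof`; absolute byte counts are CPython- and version-specific (and differ on other implementations such as PyPy), so only per-code-point widths and relative sizes are shown.

```python
import sys

n = 1000
samples = {
    "ASCII": "a" * n,
    "Latin-1": "\N{LATIN SMALL LETTER E WITH ACUTE}" * n,
    "BMP": "\N{EURO SIGN}" * n,
    "astral": "\U0001F600" * n,
}

widths = {}
for label, s in samples.items():
    # s + s has the same widest code point as s, so (in CPython) the size
    # difference is n times the storage width of one code point.
    widths[label] = (sys.getsizeof(s + s) - sys.getsizeof(s)) // n
    print(f"{label:8} {widths[label]} byte(s) per code point")

# A single euro sign widens an otherwise ASCII string.
mixed = "a" * (n - 1) + "\N{EURO SIGN}"
print(sys.getsizeof(mixed) > sys.getsizeof("a" * n))  # True
```

The widening is a storage and speed trade-off; the string's contents and behaviour are unchanged.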

#70653

From	wxjmfauth@gmail.com
Date	2014-04-27 07:29 -0700
Message-ID	<d9b3a423-259a-471f-a763-49a657416ae3@googlegroups.com>
In reply to	#70632

On Saturday, April 26, 2014 15:38:29 UTC+2, Ian wrote:
> On Apr 26, 2014 3:46 AM, "Frank Millman" <fr...@chagford.com> wrote:
> >
> > <wxjm...@gmail.com> wrote in message
> > news:03bb12d8-93be-4ef6-94ae-4a02789aea2d@googlegroups.com...
> > > ==========
> > >
> > > I once wrote that 90% of Python 2 apps (a generic term) meant to
> > > process text and strings do not work.
> > >
> > > In Python 3, that's 100%. It is only by chance that such apps may
> > > give the illusion of working properly.
> > >
> >
> > It is quite frustrating when you make these statements without explaining
> > what you mean by 'not working'.
>
> As far as anybody has been able to determine, what jmf means by "not working" is that strings containing the EURO character are handled less efficiently than strings that do not contain it in certain contrived test cases.

-----

'EURO SIGN'? No, it's just a character!

#70671

From	wxjmfauth@gmail.com
Date	2014-04-28 01:57 -0700
Message-ID	<bcd76ee0-4703-45ed-95c3-ad0cac35a889@googlegroups.com>
In reply to	#70632

On Saturday, April 26, 2014 15:38:29 UTC+2, Ian wrote:
> On Apr 26, 2014 3:46 AM, "Frank Millman" <fr...@chagford.com> wrote:
> >
> > <wxjm...@gmail.com> wrote in message
> > news:03bb12d8-93be-4ef6-94ae-4a02789aea2d@googlegroups.com...
> > > ==========
> > >
> > > I once wrote that 90% of Python 2 apps (a generic term) meant to
> > > process text and strings do not work.
> > >
> > > In Python 3, that's 100%. It is only by chance that such apps may
> > > give the illusion of working properly.
> > >
> >
> > It is quite frustrating when you make these statements without explaining
> > what you mean by 'not working'.
>
> As far as anybody has been able to determine, what jmf means by "not working" is that strings containing the EURO character are handled less efficiently than strings that do not contain it in certain contrived test cases.

----

Python 2.7 + cp1252:
- Solid and coherent system (nothing to do with the Euro).

Python 3:
- It missed the unicode shift.
- Covering the whole unicode range will not make
Python a unicode compliant product.
- Flexible String Representation (a problem per se),
a mathematical absurdity which does the opposite of
the coding schemes endorsed by unicode.org (sheet of
paper and pencil!)
- Very deeply buggy (a squaring-the-circle problem).

Positive side:
- A very nice tool to teach the coding of characters
and unicode.

jmf

#70817

From	random832@fastmail.us
Date	2014-05-01 13:21 -0400
Message-ID	<mailman.9631.1398964881.18130.python-list@python.org>
In reply to	#70671

On Mon, Apr 28, 2014, at 4:57, wxjmfauth@gmail.com wrote:
> Python 3:
> - It missed the unicode shift.
> - Covering the whole unicode range will not make
> Python a unicode compliant product.

Please cite exactly what portion of the unicode standard requires
operations with all characters to be handled in the same amount of time
and space, and forbids optimizations that make some characters handled
faster or in less space than others.
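random832's question can also be seen from the language side: whichever storage width CPython picks, the observable string operations are the same. A minimal sketch (an illustration, not a proof of standards conformance) using a string that has been widened by a single €:

```python
narrow = "price: 10"
wide = "price: 10\N{EURO SIGN}"  # one non-Latin-1 character

assert len(wide) == len(narrow) + 1
assert wide[-1] == "\N{EURO SIGN}"
assert wide[:-1] == narrow                # equal strings compare equal...
assert hash(wide[:-1]) == hash(narrow)    # ...hash equal...
assert {narrow: "ok"}[wide[:-1]] == "ok"  # ...and act as the same dict key

# Encoding and decoding round-trip losslessly for both strings.
for s in (narrow, wide):
    assert s.encode("utf-8").decode("utf-8") == s
print("same results for both strings")
```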

#71078

From	wxjmfauth@gmail.com
Date	2014-05-07 23:04 -0700
Message-ID	<90b5fb36-e99d-4dcb-8df0-77044ff53be9@googlegroups.com>
In reply to	#70817

On Thursday, May 1, 2014 19:21:14 UTC+2, rand...@fastmail.us wrote:
> On Mon, Apr 28, 2014, at 4:57, wxjmfauth@gmail.com wrote:
> > Python 3:
> > - It missed the unicode shift.
> > - Covering the whole unicode range will not make
> > Python a unicode compliant product.
>
> Please cite exactly what portion of the unicode standard requires
> operations with all characters to be handled in the same amount of time
> and space, and forbids optimizations that make some characters handled
> faster or in less space than others.

==========

I missed your comment. The regression is only a side effect.

I can make Python fail with any piece of text or valid
sequence of characters I wish [*].

I no longer write code (apps); I only maintain
my interactive interpreters.

[*] I am not counting issues like cp65001 as failures;
I mean only "basic" text/string manipulations.

jmf

#70844

From	Michael Torrie <torriem@gmail.com>
Date	2014-05-01 21:50 -0600
Message-ID	<mailman.9645.1399003680.18130.python-list@python.org>
In reply to	#70671

Can't help but feed the troll... forgive me.

On 04/28/2014 02:57 AM, wxjmfauth@gmail.com wrote:
> Python 2.7 + cp1252:
> - Solid and coherent system (nothing to do with the Euro).

Except that cp1252 is not unicode.  Perhaps some subset of unicode can
be encoded into bytes using cp1252.  But if it works for you keep using
it, and stop spreading nonsense about FSR.
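Michael's point that cp1252 covers only a small subset of Unicode is easy to check. Python's cp1252 codec leaves five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so strictly decoding all 256 possible bytes yields 251 characters, against more than a million Unicode code points. A minimal sketch:

```python
def cp1252_repertoire():
    """Return every character that a single cp1252 byte decodes to."""
    chars = set()
    for b in range(256):
        try:
            chars.add(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            pass  # byte value left undefined by Python's cp1252 codec
    return chars


repertoire = cp1252_repertoire()
print(len(repertoire))       # 251
print("€".encode("cp1252"))  # b'\x80'
try:
    "α".encode("cp1252")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.object[exc.start])  # α
```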

> Python 3:
> - Flexible String Representation (a problem per se),
> a mathematical absurdity which does the opposite of
> the coding schemes endorsed by unicode.org (sheet of
> paper and pencil!)
> - Very deeply buggy (a squaring-the-circle problem).

Maybe it's the language barrier, but I certainly can't make out
whatever it is you are talking about.

You've been ranting about FSR for years without being able to clearly
say what's wrong with it.  Please quote unicode specifications that you
feel Python does not implement.  What unicode characters cannot be
represented?  Does Python choke on certain unicode strings or expose
entities it should not (like Javascript does)?

Why would you think that the unicode consortium's list of byte encodings
is the only possible valid way of encoding unicode to a byte stream?
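On what a string type "exposes": Python 3 strings are sequences of code points, whereas a UTF-16-based string type such as JavaScript's counts 16-bit code units, so a character outside the Basic Multilingual Plane shows up there as two surrogate halves. A minimal sketch of the difference, computing the UTF-16 view in Python itself; it also shows several different byte encodings decoding back to the same abstract string.

```python
s = "\U0001F600"  # GRINNING FACE, outside the Basic Multilingual Plane

# Python 3 strings are sequences of code points.
print(len(s))  # 1

# A UTF-16 view of the same text needs two 16-bit code units
# (a surrogate pair), which is what UTF-16-based string types expose.
utf16 = s.encode("utf-16-le")
units = [int.from_bytes(utf16[i:i + 2], "little") for i in range(0, len(utf16), 2)]
print([hex(u) for u in units])  # ['0xd83d', '0xde00']

# Different byte encodings, same abstract string after decoding.
for codec in ("utf-8", "utf-16", "utf-32"):
    assert s.encode(codec).decode(codec) == s
```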

If you're going to continue to write this sort of stuff, please have the
decency to answer these questions at least.

> Positive side:
> - A very nice tool to teach the coding of characters
> and unicode.

Indeed.

#70890

From	wxjmfauth@gmail.com
Date	2014-05-03 00:46 -0700
Message-ID	<aa6da561-0284-4119-9217-57a8ef09ab96@googlegroups.com>
In reply to	#70844

On Friday, May 2, 2014 05:50:40 UTC+2, Michael Torrie wrote:
> Can't help but feed the troll... forgive me.
>
> On 04/28/2014 02:57 AM, wxjmfauth@gmail.com wrote:
> > Python 2.7 + cp1252:
> > - Solid and coherent system (nothing to do with the Euro).
>
> Except that cp1252 is not unicode.  Perhaps some subset of unicode can
> be encoded into bytes using cp1252.  But if it works for you keep using
> it, and stop spreading nonsense about FSR.
>
> > Python 3:
> > - Flexible String Representation (a problem per se),
> > a mathematical absurdity which does the opposite of
> > the coding schemes endorsed by unicode.org (sheet of
> > paper and pencil!)
> > - Very deeply buggy (a squaring-the-circle problem).
>
> Maybe it's the language barrier, but I certainly can't make out
> whatever it is you are talking about.
>
> You've been ranting about FSR for years without being able to clearly
> say what's wrong with it.  Please quote unicode specifications that you
> feel Python does not implement.  What unicode characters cannot be
> represented?  Does Python choke on certain unicode strings or expose
> entities it should not (like Javascript does)?
>
> Why would you think that the unicode consortium's list of byte encodings
> is the only possible valid way of encoding unicode to a byte stream?
>
> If you're going to continue to write this sort of stuff, please have the
> decency to answer these questions at least.
>
> > Positive side:
> > - A very nice tool to teach the coding of characters
> > and unicode.
>
> Indeed.

========

-

#70657

From	Rustom Mody <rustompmody@gmail.com>
Date	2014-04-27 10:39 -0700
Message-ID	<ae5ba198-cf01-41a1-981f-307de7a460b7@googlegroups.com>
In reply to	#70546

On Wednesday, April 23, 2014 11:29:13 PM UTC+5:30, Rustom Mody wrote:
> On Wednesday, April 23, 2014 1:23:00 PM UTC+5:30, Steven D'Aprano wrote:
> > On Tue, 22 Apr 2014 23:57:46 -0700, Rustom Mody wrote:

> > > On the other hand when/if a keyboard mapping is defined in which the
> > > characters that are commonly needed are available, it is reasonable to
> > > expect the ∨,∧ to cost no more than 2 strokes each (ie about as much as
> > > an 'A'; slightly more than an 'a'. Which means that '∨' is expected to
> > > cost about the same as 'or' and ∧ to cost less than an 'and'

> > Oh, a further thought...

> > Consider your example:

> >     return year%4=0 ∧ (year%100≠0 ∨ year%400 = 0)

> > vs

> >     return year%4==0 and (year%100!=0 or year%400==0)

> > [aside: personally I like ≠ and if there was a platform independent way
> > to type it in any editor, I'd much prefer it over != or <> ]

I checked Haskell and found its Unicode support is better.

For variables (ie identifiers) Python and Haskell are much the same:

Python3:

>>> α = 1
>>> α
1

Haskell:

Prelude> let α = 1
Prelude> α
1


However, in Haskell one can also do this, unlike in Python:
*Main> 2 ≠ 3
True

All that's needed to make this work is this set of new-in-terms-of-old definitions:

[Definitions commented out with -- are those that don't work as one might wish]
--------------
import qualified Data.Set as Set
-- Experimenting with Unicode in Haskell source

-- Fixities: without these, every new operator defaults to infixl 9,
-- so an expression like a ≤ b ∧ c ≤ d would mis-parse. They mirror
-- the fixities of the ASCII operators being aliased.
infix  4 ≠, ≤, ≥, ∈, ⊆, ⊂, ⊈
infixl 7 ×
infixr 8 ⇑
infixr 5 ⤚
infixr 3 ∧
infixr 2 ∨

-- Numbers
x ≠ y   = x /= y
x ≤ y   = x <= y
x ≥ y   = x >= y
x ÷ y   = divMod x y
x ⇑ y   = x ^ y

x × y   = x * y -- readability hmmm !!!
π = pi

-- ⌊ x = floor x
-- ⌈ x = ceiling x

-- Lists
xs ⤚ ys = xs ++ ys
n ↑ xs = take n xs
n ↓ xs = drop n xs

-- Bools
x ∧ y   = x && y
x ∨ y   = x || y
-- ¬x = not x


-- Sets

x ∈ s   = x `Set.member` s
s ∪ t   = s `Set.union` t
s ∩ t   = s `Set.intersection` t
s ⊆ t   = s `Set.isSubsetOf` t
s ⊂ t   = s `Set.isProperSubsetOf` t
s ⊈ t   = not (s `Set.isSubsetOf` t)
-- ∅ = Set.empty
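For comparison with the Haskell definitions above: Python 3 accepts many non-ASCII letters in identifiers (PEP 3131), as the α session shows, but its operator tokens are fixed by the grammar, so ≠ cannot be defined as an operator at all. A minimal sketch of what is and isn't accepted; the alias name `ne` is hypothetical and is simply `operator.ne` under another name, the nearest Python analogue of binding a symbol to an existing operation.

```python
import operator

# PEP 3131: identifiers may use non-ASCII letters such as Greek.
print("α".isidentifier())  # True
# Mathematical symbols are not identifier characters...
print("≠".isidentifier())  # False

# ...and Python's operator tokens are fixed by its grammar, so a
# symbol like ≠ is rejected at compile time rather than defined.
try:
    compile("2 ≠ 3", "<example>", "eval")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)

# The nearest equivalent to the Haskell aliases is an ordinary named
# function; "ne" is just operator.ne under a hypothetical alias.
ne = operator.ne
print(ne(2, 3))  # True
```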

#70530

From	Steven D'Aprano <steve@pearwood.info>
Date	2014-04-23 05:52 +0000
Message-ID	<53575521$0$11109$c3e8da3@news.astraweb.com>
In reply to	#70527

On Tue, 22 Apr 2014 22:31:41 -0700, Rustom Mody wrote:

> Chris Angelico wrote:
>> it's impossible for most people to type (and programming with a palette
>> of arbitrary syntactic tokens isn't my idea of fun)...
> 
> Where's the suggestion to use a "palette of arbitrary tokens" ?
> 
> I just tried a greek keyboard; ie do
> $ setxkbmap -option "grp:switch,grp:alt_shift_toggle,grp_led:scroll"
> -layout "us,gr"
> 
> Thereafter typing
> abcdefghijklmnopqrstuvwxyz
> after a Shift-Alt
> gives
> αβψδεφγηιξκλμνοπ;ρστθωςχυζ
> 
> One more Shift-Alt and back to roman
> 
> IOW the extra typing cost for greek letters is negligible over the
> corresponding roman ones

25 Unicode characters down, 1114000+ to go :-)

There's not just the keyboard mapping. There's the mental cost of knowing 
which keyboard mapping you need ("is it Greek, Hebrew, or maths 
symbols?"), the cost of remembering the mapping from the keys you see on 
the keyboard to the keys they are mapped to ("is Ω mapped to O or W?") 
and so forth. If you know lambda-calculus, you might associate λ with 
functions, but if you don't, it's as obfuscated as associating Ч with 
raising exceptions.

if not isinstance(obj, int):
    ЧTypeError("expected an int, got %r" % type(obj))

-- 
Steven

#70533

From	Devin Jeanpierre <jeanpierreda@gmail.com>
Date	2014-04-22 23:19 -0700
Message-ID	<mailman.9455.1398234027.18130.python-list@python.org>
In reply to	#70530

On Tue, Apr 22, 2014 at 10:52 PM, Steven D'Aprano <steve@pearwood.info> wrote:
> There's not just the keyboard mapping. There's the mental cost of knowing
> which keyboard mapping you need ("is it Greek, Hebrew, or maths
> symbols?"), the cost of remembering the mapping from the keys you see on
> the keyboard to the keys they are mapped to ("is Ω mapped to O or W?")
> and so forth. If you know lambda-calculus, you might associate λ with
> functions, [...]

Or if you know Python and the name of the letter ("lambda").

But yes, typing out the special characters is annoying. I just use
words. The only downside to using words is, how do you specify capital
versus lowercase letters? "Gamma = ..." violates the style guide! :(
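Devin's "just use words" approach has a partial counterpart inside Python string literals: the `\N{...}` escape and `unicodedata.lookup` select characters by their Unicode names, and those names state case explicitly (CAPITAL vs SMALL). This only works in strings, not identifiers. Note that Unicode spells the letter "LAMDA". A minimal sketch:

```python
import unicodedata

# \N{...} escapes name characters, and the names encode case explicitly.
upper_gamma = "\N{GREEK CAPITAL LETTER GAMMA}"
lower_gamma = "\N{GREEK SMALL LETTER GAMMA}"
print(upper_gamma, lower_gamma)  # Γ γ

# Unicode spells the letter "LAMDA", not "LAMBDA".
lam = unicodedata.lookup("GREEK SMALL LETTER LAMDA")
print(lam, unicodedata.name(lam))  # λ GREEK SMALL LETTER LAMDA
print(upper_gamma.lower() == lower_gamma)  # True
```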

-- Devin
