comp.lang.python, thread #60781 (unrolled)

Python Unicode handling wins again -- mostly

Started by	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
First post	2013-11-30 00:44 +0000
Last post	2013-12-04 14:38 +0000
Articles	20 on this page of 76 — 22 participants


  Python Unicode handling wins again -- mostly Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-30 00:44 +0000
    Re: Python Unicode handling wins again -- mostly Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-30 01:07 +0000
    Re: Python Unicode handling wins again -- mostly Roy Smith <roy@panix.com> - 2013-11-29 21:08 -0500
      Re: Python Unicode handling wins again -- mostly Chris Angelico <rosuav@gmail.com> - 2013-11-30 13:12 +1100
        Re: Python Unicode handling wins again -- mostly Roy Smith <roy@panix.com> - 2013-11-29 21:28 -0500
          Re: Python Unicode handling wins again -- mostly Dave Angel <davea@davea.name> - 2013-11-29 22:06 -0500
      Re: Python Unicode handling wins again -- mostly Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-30 04:21 +0000
        Re: Python Unicode handling wins again -- mostly Roy Smith <roy@panix.com> - 2013-11-29 23:30 -0500
        Re: Python Unicode handling wins again -- mostly Zero Piraeus <z@etiol.net> - 2013-11-30 02:05 -0300
          Re: Python Unicode handling wins again -- mostly Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-30 06:25 +0000
        Re: Python Unicode handling wins again -- mostly Gene Heskett <gheskett@wdtv.com> - 2013-11-30 00:25 -0500
        Re: Python Unicode handling wins again -- mostly Roy Smith <roy@panix.com> - 2013-11-30 00:37 -0500
          Re: Python Unicode handling wins again -- mostly Ian Kelly <ian.g.kelly@gmail.com> - 2013-11-29 23:00 -0700
            Re: Python Unicode handling wins again -- mostly Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-30 07:11 +0000
          Re: Python Unicode handling wins again -- mostly Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-30 07:41 +0000
            Re: Python Unicode handling wins again -- mostly Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2013-12-01 11:41 +1300
      Re: Python Unicode handling wins again -- mostly Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-30 08:07 +0000
      Re: Python Unicode handling wins again -- mostly wxjmfauth@gmail.com - 2013-11-30 11:11 -0800
        Re: Python Unicode handling wins again -- mostly Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2013-12-01 11:37 +1300
          Re: Python Unicode handling wins again -- mostly Ned Batchelder <ned@nedbatchelder.com> - 2013-11-30 18:07 -0500
            Re: Python Unicode handling wins again -- mostly wxjmfauth@gmail.com - 2013-12-01 08:57 -0800
          Re: Python Unicode handling wins again -- mostly Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-01 00:22 +0000
            Re: Python Unicode handling wins again -- mostly Tim Chase <python.list@tim.thechases.com> - 2013-11-30 18:52 -0600
              Re: Python Unicode handling wins again -- mostly Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-01 00:54 +0000
                Re: Python Unicode handling wins again -- mostly Tim Chase <python.list@tim.thechases.com> - 2013-11-30 19:05 -0600
                Re: Python Unicode handling wins again -- mostly Chris Angelico <rosuav@gmail.com> - 2013-12-01 12:13 +1100
                  Re: Python Unicode handling wins again -- mostly Roy Smith <roy@panix.com> - 2013-11-30 20:27 -0500
                    Re: Python Unicode handling wins again -- mostly Chris Angelico <rosuav@gmail.com> - 2013-12-01 12:31 +1100
    Re: Python Unicode handling wins again -- mostly Serhiy Storchaka <storchaka@gmail.com> - 2013-12-01 20:00 +0200
      Re: Python Unicode handling wins again -- mostly wxjmfauth@gmail.com - 2013-12-01 12:15 -0800
        Re: Python Unicode handling wins again -- mostly Tim Delaney <timothy.c.delaney@gmail.com> - 2013-12-02 07:54 +1100
          Re: Python Unicode handling wins again -- mostly wxjmfauth@gmail.com - 2013-12-02 04:39 -0800
            Re: Python Unicode handling wins again -- mostly Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-02 14:46 +0000
            Re: Python Unicode handling wins again -- mostly Ned Batchelder <ned@nedbatchelder.com> - 2013-12-02 10:22 -0500
            Re: Python Unicode handling wins again -- mostly Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-02 15:45 +0000
            Re: Python Unicode handling wins again -- mostly Chris Angelico <rosuav@gmail.com> - 2013-12-03 02:49 +1100
            Re: Python Unicode handling wins again -- mostly Ned Batchelder <ned@nedbatchelder.com> - 2013-12-02 10:58 -0500
            Re: Python Unicode handling wins again -- mostly Terry Reedy <tjreedy@udel.edu> - 2013-12-02 15:26 -0500
            Re: Python Unicode handling wins again -- mostly Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-02 20:45 +0000
            Re: Python Unicode handling wins again -- mostly Ned Batchelder <ned@nedbatchelder.com> - 2013-12-02 16:44 -0500
            Code of Conduct, Trolls, and Thankless Jobs  [was Re: Python Unicode handling wins again -- mostly] Ethan Furman <ethan@stoneleaf.us> - 2013-12-02 13:25 -0800
            Re: Code of Conduct, Trolls, and Thankless Jobs  [was Re: Python Unicode handling wins again -- mostly] Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-02 22:04 +0000
              Re: Code of Conduct, Trolls, and Thankless Jobs  [was Re: Python Unicode handling wins again -- mostly] Roy Smith <roy@panix.com> - 2013-12-02 20:38 -0500
                Pythonista Goals  [was Re: Code of Conduct, Trolls, and Thankless Jobs] Ethan Furman <ethan@stoneleaf.us> - 2013-12-02 17:56 -0800
                Re: Code of Conduct, Trolls, and Thankless Jobs  [was Re: Python Unicode handling wins again -- mostly] Grant Edwards <invalid@invalid.invalid> - 2013-12-03 04:32 +0000
                  Re: Code of Conduct, Trolls, and Thankless Jobs  [was Re: Python Unicode handling wins again -- mostly] Steven D'Aprano <steve@pearwood.info> - 2013-12-03 05:41 +0000
                  Re: Code of Conduct, Trolls, and Thankless Jobs  [was Re: Python Unicode handling wins again -- mostly] Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-03 12:14 +0000
                Re: Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly] Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-03 12:11 +0000
            Re: Code of Conduct, Trolls, and Thankless Jobs  [was Re: Python Unicode handling wins again -- mostly] Ned Batchelder <ned@nedbatchelder.com> - 2013-12-02 17:23 -0500
            Re: Python Unicode handling wins again -- mostly Ned Batchelder <ned@nedbatchelder.com> - 2013-12-02 17:24 -0500
            Re: Python Unicode handling wins again -- mostly Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-02 22:32 +0000
            Re: Python Unicode handling wins again -- mostly Ned Batchelder <ned@nedbatchelder.com> - 2013-12-02 17:53 -0500
            Re: Code of Conduct, Trolls, and Thankless Jobs Ben Finney <ben+python@benfinney.id.au> - 2013-12-03 10:11 +1100
            Re: Python Unicode handling wins again -- mostly Ethan Furman <ethan@stoneleaf.us> - 2013-12-02 14:41 -0800
            Re: Code of Conduct, Trolls, and Thankless Jobs  [was Re: Python Unicode handling wins again -- mostly] Terry Reedy <tjreedy@udel.edu> - 2013-12-02 22:22 -0500
            Re: Code of Conduct, Trolls, and Thankless Jobs Terry Reedy <tjreedy@udel.edu> - 2013-12-02 22:39 -0500
            Re: Code of Conduct, Trolls, and Thankless Jobs  [was Re: Python Unicode handling wins again -- mostly] Ethan Furman <ethan@stoneleaf.us> - 2013-12-02 20:11 -0800
        Re: Python Unicode handling wins again -- mostly Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-01 22:06 +0000
        Re: Python Unicode handling wins again -- mostly Tim Delaney <timothy.c.delaney@gmail.com> - 2013-12-02 09:29 +1100
        Re: Python Unicode handling wins again -- mostly Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-01 23:10 +0000
        Re: Python Unicode handling wins again -- mostly Ethan Furman <ethan@stoneleaf.us> - 2013-12-01 14:50 -0800
        Re: Python Unicode handling wins again -- mostly Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-02 00:43 +0000
    Re: Python Unicode handling wins again -- mostly Ethan Furman <ethan@stoneleaf.us> - 2013-12-02 12:38 -0800
    Re: Python Unicode handling wins again -- mostly Ned Batchelder <ned@nedbatchelder.com> - 2013-12-02 16:14 -0500
      Re: Python Unicode handling wins again -- mostly Steven D'Aprano <steve@pearwood.info> - 2013-12-03 05:06 +0000
        Re: Python Unicode handling wins again -- mostly joe <joeedh@gmail.com> - 2013-12-02 23:35 -0800
        Re: Python Unicode handling wins again -- mostly wxjmfauth@gmail.com - 2013-12-03 10:34 -0800
    Re: Python Unicode handling wins again -- mostly Chris Angelico <rosuav@gmail.com> - 2013-12-03 08:23 +1100
    Re: Python Unicode handling wins again -- mostly MRAB <python@mrabarnett.plus.com> - 2013-12-02 21:27 +0000
    Re: Python Unicode handling wins again -- mostly Ethan Furman <ethan@stoneleaf.us> - 2013-12-02 13:27 -0800
    Re: Python Unicode handling wins again -- mostly Ben Finney <ben+python@benfinney.id.au> - 2013-12-03 09:56 +1100
    Re: Python Unicode handling wins again -- mostly Neil Cerutti <neilc@norwich.edu> - 2013-12-03 13:47 +0000
    Re: Python Unicode handling wins again -- mostly Ethan Furman <ethan@stoneleaf.us> - 2013-12-03 06:26 -0800
      Re: Python Unicode handling wins again -- mostly wxjmfauth@gmail.com - 2013-12-04 05:52 -0800
        Re: Python Unicode handling wins again -- mostly Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-04 14:07 +0000
        Re: Python Unicode handling wins again -- mostly Neil Cerutti <neilc@norwich.edu> - 2013-12-04 14:38 +0000

Page 2 of 4 — ← Prev page 1 [2] 3 4 Next page →

#60830

From	wxjmfauth@gmail.com
Date	2013-12-01 08:57 -0800
Message-ID	<15147203-f1e3-44dc-ab11-3847c638b37c@googlegroups.com>
In reply to	#60812

On Sunday, 1 December 2013 00:07:36 UTC+1, Ned Batchelder wrote:
> On 11/30/13 5:37 PM, Gregory Ewing wrote:
> > wxjmfauth@gmail.com wrote:
> >> And do you know the origin of this typographical feature?
> >> Because, mechanically, the dot of the "i" broke too often.
> >>
> >> In my opinion, a very plausible explanation.
> >
> > It doesn't sound very plausible to me, because there
> > are a lot more stand-alone 'i's in English text than
> > there are ones following an f. What is there to stop
> > them from breaking?
> >
> > It's more likely to be simply a kerning issue. You
> > want to get the stems of the f and the i close together,
> > and the only practical way to do that with mechanical
> > type is to merge them into one piece of metal.
> >
> > Which makes it even sillier to have an 'ffi' character
> > in this day and age, when you can simply space the
> > characters so that they overlap.
> >
>
> The fi ligature was created because visually, an f and i wouldn't work
> well together: the crossbar of the f was near to, but not connected to,
> the serif of the i, and the terminal bulb of the f was close to, but not
> coincident with, the dot of the i.
>
> This article goes into great detail, and has a good illustration of how
> an f and i can clash, and how an fi ligature can fix the problem:
> http://opentype.info/blog/2012/11/20/whats-a-ligature/ . Note the second
> fi illustration, which demonstrates using a ligature to make the letters
> appear *less* connected than they would individually!
>
> This is also why "simply spacing the characters" isn't a solution: a
> specially designed ligature looks better than a separate f and i, no
> matter how minutely kerned.
>
> It's unfortunate that Unicode includes presentation alternatives like
> the fi (and ff, fl, ffi, and ffl) ligatures.  It was done to be a
> superset of existing encodings.
>
> Many typefaces have other non-encoded ligatures as well, especially
> display faces, which also have alternate glyphs.  Unicode is a funny mix
> in that it includes some forms of alternates, but can't include all of
> them, so we have to put up with both an ad-hoc Unicode that includes
> presentational variants, and also some other way to specify variants
> because Unicode can't include all of them.
>

I'm speaking about the times when some "characters" were not even
made of metal, but of wood (see Garamond, Bodoni).

---------

Unicode is "only" collecting "characters" in the sense of "abstract
entities". What is supposed to be a "character" is one problem.
How a tool is supposed to handle these "characters" is a problem
too, but a different one.

"Unicode" is not a coding scheme; it is a "repertoire".

Some illustrative examples instead of explanations:

The ffl ligature is a "character" because it has always
existed.

Today, & and œ are considered distinct "characters".
Historically, they were ligatures.

The Fahrenheit, Kelvin and Celsius signs are considered
"characters", even though F and K are "letters".

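A quick way to see the distinction drawn above is to ask Python's standard `unicodedata` module for each character's decomposition. This is only an illustrative sketch, and the helper name `describe()` is made up for it. The ffl ligature carries a compatibility decomposition, so NFKC splits it back into plain letters, whereas œ (despite its official name, LATIN SMALL LIGATURE OE) and & have no decomposition at all and pass through unchanged.

```python
import unicodedata

# U+FB04 LATIN SMALL LIGATURE FFL, U+0153 LATIN SMALL LIGATURE OE, and the ampersand
samples = ["\ufb04", "\u0153", "&"]


def describe(ch):
    """Return the character's name, its raw decomposition mapping and its NFKC form."""
    return (
        unicodedata.name(ch),
        unicodedata.decomposition(ch),  # empty string when there is no decomposition
        unicodedata.normalize("NFKC", ch),
    )


for ch in samples:
    print(repr(ch), describe(ch))
```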
Text justification: calculating the space between "words"
in "rendering units" makes sense. Using a specific "character",
such as a thin space, to force a predefined space makes sense too.

The various zeros one may see, such as a zero shaped like an
uppercase O, a zero with a dot in the center, or a slashed zero,
are all the same zero with stylistic variations, and therefore a
single "character" in the Unicode table.

... but this medieval "character" existing in two forms (I do not
remember which one) was finally registered as two "characters",
and not as a stylistic variant of a single "character".

There are no "characters" for the symbols of the chemical elements;
the Latin script is good enough.

The QPlainTextEdit widget from Qt does not know about '\n'. It uses
only the paragraph separator and the line separator. To render
a paragraph separator, it uses yet another "character", the
pilcrow.
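For readers who haven't met these characters: U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR are ordinary code points, and Python's `str.splitlines()` already treats them as line boundaries, while `split('\n')` does not. The sketch below does not use any Qt API; `to_newlines()` is a hypothetical helper that simply maps both separators to a newline, e.g. when text is copied out of such a widget.

```python
import unicodedata

LS, PS, PILCROW = "\u2028", "\u2029", "\u00b6"

for ch in (LS, PS, PILCROW):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")


def to_newlines(text):
    """Hypothetical helper: map U+2028 and U+2029 to a plain newline."""
    return text.replace(LS, "\n").replace(PS, "\n")


sample = "first line" + LS + "second line" + PS + "new paragraph"
print(sample.splitlines())              # splitlines() recognises both separators
print(sample.split("\n"))               # split('\n') does not: one element
print(to_newlines(sample).split("\n"))  # after mapping, '\n' splitting works
```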

The µ "character" in the ISO-8859-1 coding scheme is a Greek
letter, but it must be used, or perceived, as an SI unit prefix.
Unicode category: Ll; Unicode name: MICRO SIGN.
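The µ case is easy to inspect with the standard `unicodedata` module. In this small sketch, MICRO SIGN (U+00B5, byte 0xB5 in ISO-8859-1) is a different code point from GREEK SMALL LETTER MU (U+03BC), yet compatibility normalization (NFKC) maps the former to the latter, and its uppercase form is the Greek capital mu.

```python
import unicodedata

micro = "\u00b5"  # the µ inherited from ISO-8859-1
mu = "\u03bc"     # GREEK SMALL LETTER MU

print(micro.encode("latin-1"))                                # b'\xb5'
print(unicodedata.name(micro), unicodedata.category(micro))  # MICRO SIGN Ll
print(micro == mu)                                            # False: two distinct code points
print(unicodedata.normalize("NFKC", micro) == mu)             # True: NFKC folds µ into Greek mu
print(unicodedata.name(micro.upper()))                        # GREEK CAPITAL LETTER MU
```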

How can one place an arrow (vector) on top of an ê if one cannot
decompose it?
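In Python the decomposition is one call away. As a sketch: NFD splits ê into e plus U+0302 COMBINING CIRCUMFLEX ACCENT, and a further combining mark such as U+20D7 COMBINING RIGHT ARROW ABOVE can then be appended. Whether the stack renders nicely depends on the font and the renderer, which is outside Python's control.

```python
import unicodedata

e_circ = "\u00ea"  # ê as one precomposed code point
arrow = "\u20d7"   # COMBINING RIGHT ARROW ABOVE

# NFD splits ê into a base letter plus a combining accent.
decomposed = unicodedata.normalize("NFD", e_circ)
print([unicodedata.name(c) for c in decomposed])
# ['LATIN SMALL LETTER E', 'COMBINING CIRCUMFLEX ACCENT']

# Append the arrow as one more combining mark on the same base letter.
vector_e = decomposed + arrow
print(len(vector_e))  # 3 code points, drawn (font permitting) as one stacked glyph
```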

Relatedly, there are dotless variants of i and j.
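The dotless letters are U+0131 and U+0237. A short sketch of why they matter beyond typography: Python's `str.upper()` and `str.lower()` apply Unicode's locale-independent default case mappings, so a Turkish dotless ı does not survive a round trip through uppercase.

```python
import unicodedata

for ch in ("i", "\u0131", "j", "\u0237"):
    print(repr(ch), unicodedata.name(ch))

# Default (locale-free) case mapping: dotless ı uppercases to plain ASCII I,
# so lowercasing again gives the dotted i, not the original character.
print("\u0131".upper())          # I
print("\u0131".upper().lower())  # i
```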

The STIX fonts have a huge number of math symbols; those not
yet in the Unicode repertoire are present in the PUA (Private Use Area).
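Private Use Area code points are valid Unicode but carry no agreed meaning, which is why a font can park extra glyphs there. A minimal check with `unicodedata`: such code points report category `Co` and have no character name. There are also supplementary private-use planes (15 and 16); the helper `in_bmp_pua()` below is a hypothetical illustration that covers only the BMP range.

```python
import unicodedata

pua_char = "\ue000"  # first code point of the BMP Private Use Area

print(unicodedata.category(pua_char))           # Co (Other, private use)
print(unicodedata.name(pua_char, "<no name>"))  # <no name>: PUA code points are unnamed


def in_bmp_pua(ch):
    """Hypothetical helper: True if ch is in the BMP Private Use Area, U+E000..U+F8FF."""
    return 0xE000 <= ord(ch) <= 0xF8FF


print(in_bmp_pua("\uf8ff"), in_bmp_pua("A"))  # True False
```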

etc.

Unicode is quite open, and it's a good idea to leave that
openness to the developer. In short, if a coder decomposes
a "character" like "â" into an "a" plus a "^", it's up to
the developer to know what to do when reversing such a
string, and to count this sequence as two real "characters".

jmf


#60815

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-12-01 00:22 +0000
Message-ID	<529a8144$0$29993$c3e8da3$5496439d@news.astraweb.com>
In reply to	#60809

On Sun, 01 Dec 2013 11:37:30 +1300, Gregory Ewing wrote:

> Which makes it even sillier to have an 'ffi' character in this day and
> age, when you can simply space the characters so that they overlap.

It's in Unicode to support legacy character sets that included it[1]. 
There are a bunch of similar cases:

* LATIN CAPITAL LETTER A WITH RING ABOVE versus ANGSTROM SIGN
* KELVIN SIGN versus LATIN CAPITAL LETTER A
* DEGREE CELSIUS and DEGREE FAHRENHEIT
* the whole set of full-width and half-width forms
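The legacy pairs listed above can be checked with Unicode normalization. In this sketch (the helper name `fold()` is made up, and LATIN CAPITAL LETTER K is used as KELVIN SIGN's counterpart), ANGSTROM SIGN and KELVIN SIGN are canonically equivalent to Å and K, so even NFC folds them away; the degree signs and the full-width forms have only compatibility decompositions, so only NFKC folds those.

```python
import unicodedata


def fold(ch):
    """Return (NFC, NFKC) forms of ch, to show which legacy duplicates fold away."""
    return unicodedata.normalize("NFC", ch), unicodedata.normalize("NFKC", ch)


for ch in ("\u212b", "\u212a", "\u2103", "\u2109", "\uff21"):
    nfc, nfkc = fold(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):34} NFC={nfc!r} NFKC={nfkc!r}")
```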

On the other hand, there are cases which to a naive reader might look 
like needless duplication but actually aren't. For example, there are a 
bunch of visually indistinguishable characters[2] in European languages, 
like AΑА and BΒВ. The reason for this becomes more obvious[3] when you 
lowercase them:

py> 'AΑА BΒВ'.lower()
'aαа bβв'

Sorting and case-conversion rules would become insanely complicated, and 
context-sensitive, if Unicode only included a single code point per thing-
that-looks-the-same.
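This point is easy to verify: the three capitals above are distinct code points with distinct names, and each one lowercases within its own script. A minimal sketch using `unicodedata.name()`:

```python
import unicodedata

lookalikes = "A\u0391\u0410"  # Latin A, Greek Alpha, Cyrillic A

for ch in lookalikes:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):28} lower={ch.lower()!r}")

# Identical-looking, but not equal: each belongs to its own script.
print(len(set(lookalikes)))  # 3
```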

The rules for deciding what is and what isn't a distinct character can be 
quite complex, and often politically charged. There's a lot of opposition 
to Unicode in East Asian countries because it unifies Han ideograms that 
look and behave the same in Chinese, Japanese and Korean. Unicode does 
this for the same reason that it doesn't distinguish between (say) 
English A, German A and French A. One reason some East Asians want it to 
distinguish them is the same reason you or I might wish to flag a section 
of text as English and another section of text as German, and have them 
displayed in slightly different typefaces and spell-checked with a 
different dictionary. The Unicode Consortium's answer to that is that this 
is beyond the remit of the character set, and is best handled by markup 
or higher-level formatting.

(Another reason for opposing Han unification is, let's be frank, pure 
nationalism.)

[1] As far as I can tell, the only character supported by legacy 
character sets which is not included in Unicode is the Apple logo from 
Mac charsets.

[2] The actual glyphs depend on the typeface used.

[3] Again, modulo the typeface you're using to view them.
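Footnote [1] can be checked against Python's own codecs. Assuming CPython's `mac_roman` codec (which appears to follow Apple's published Mac OS Roman mapping), the Apple-logo byte 0xF0 decodes to U+F8FF, a Private Use Area code point rather than a standardized character:

```python
import unicodedata

# Byte 0xF0 in the classic Mac OS Roman character set was the Apple logo.
ch = b"\xf0".decode("mac_roman")

print(hex(ord(ch)))              # 0xf8ff
print(unicodedata.category(ch))  # Co: private use, not a standardized character
```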

-- 
Steven


#60817

From	Tim Chase <python.list@tim.thechases.com>
Date	2013-11-30 18:52 -0600
Message-ID	<mailman.3429.1385859089.18130.python-list@python.org>
In reply to	#60815

On 2013-12-01 00:22, Steven D'Aprano wrote:
> * KELVIN SIGN versus LATIN CAPITAL LETTER A

I should hope so ;-)

-tkc


#60818

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-12-01 00:54 +0000
Message-ID	<529a88ba$0$29993$c3e8da3$5496439d@news.astraweb.com>
In reply to	#60817

On Sat, 30 Nov 2013 18:52:48 -0600, Tim Chase wrote:

> On 2013-12-01 00:22, Steven D'Aprano wrote:
>> * KELVIN SIGN versus LATIN CAPITAL LETTER A
> 
> I should hope so ;-)

I blame my keyboard, where letters A and K are practically right next to 
each other, only seven letters apart. An easy typo to make.

-- 
Stpvpn


#60819

From	Tim Chase <python.list@tim.thechases.com>
Date	2013-11-30 19:05 -0600
Message-ID	<mailman.3430.1385859830.18130.python-list@python.org>
In reply to	#60818

On 2013-12-01 00:54, Steven D'Aprano wrote:
> On Sat, 30 Nov 2013 18:52:48 -0600, Tim Chase wrote:
> 
> > On 2013-12-01 00:22, Steven D'Aprano wrote:  
> >> * KELVIN SIGN versus LATIN CAPITAL LETTER A  
> > 
> > I should hope so ;-)  
> 
> 
> I blame my keyboard, where letters A and K are practically right
> next to each other, only seven letters apart. An easy typo to make.
> 
> 
> 
> -- 
> Stpvpn

I suppose I should have modified my attribution-quote to read "Steven
D'Kprano wrote" then :-)

-tkc


#60820

From	Chris Angelico <rosuav@gmail.com>
Date	2013-12-01 12:13 +1100
Message-ID	<mailman.3431.1385860444.18130.python-list@python.org>
In reply to	#60818

On Sun, Dec 1, 2013 at 11:54 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> On Sat, 30 Nov 2013 18:52:48 -0600, Tim Chase wrote:
>
>> On 2013-12-01 00:22, Steven D'Aprano wrote:
>>> * KELVIN SIGN versus LATIN CAPITAL LETTER A
>>
>> I should hope so ;-)
>
>
> I blame my keyboard, where letters A and K are practically right next to
> each other, only seven letters apart. An easy typo to make.

“It’s an easy mistake to make” the PFY concurs “Many’s the time I’ve
picked up a cattle prod thinking it was a lint remover as I’ve helped
groom one of your predecessors before an important board meeting about
slashing the IT budget.”

http://www.theregister.co.uk/2010/11/26/bofh_2010_episode_18/

ChrisA


#60821

From	Roy Smith <roy@panix.com>
Date	2013-11-30 20:27 -0500
Message-ID	<roy-4684EB.20275730112013@news.panix.com>
In reply to	#60820

In article <mailman.3431.1385860444.18130.python-list@python.org>,
 Chris Angelico <rosuav@gmail.com> wrote:

> On Sun, Dec 1, 2013 at 11:54 AM, Steven D'Aprano
> <steve+comp.lang.python@pearwood.info> wrote:
> > On Sat, 30 Nov 2013 18:52:48 -0600, Tim Chase wrote:
> >
> >> On 2013-12-01 00:22, Steven D'Aprano wrote:
> >>> * KELVIN SIGN versus LATIN CAPITAL LETTER A
> >>
> >> I should hope so ;-)
> >
> >
> > I blame my keyboard, where letters A and K are practically right next to
> > each other, only seven letters apart. An easy typo to make.
> 
> “It’s an easy mistake to make” the PFY concurs “Many’s the time I’ve
> picked up a cattle prod thinking it was a lint remover as I’ve helped
> groom one of your predecessors before an important board meeting about
> slashing the IT budget.”
> 
> http://www.theregister.co.uk/2010/11/26/bofh_2010_episode_18/
> 
> ChrisA

What does "PFY" mean?  The only thing I can think of is "Poor F---ing 
Yankee" :-)


#60822

From	Chris Angelico <rosuav@gmail.com>
Date	2013-12-01 12:31 +1100
Message-ID	<mailman.3432.1385861973.18130.python-list@python.org>
In reply to	#60821

On Sun, Dec 1, 2013 at 12:27 PM, Roy Smith <roy@panix.com> wrote:
>> http://www.theregister.co.uk/2010/11/26/bofh_2010_episode_18/
>>
>> ChrisA
>
> What does "PFY" mean?  The only thing I can think of is "Poor F---ing
> Yankee" :-)

In the context of the BOFH, it stands for Pimply-Faced Youth and refers
to the BOFH's assistant.

ChrisA


#60831

From	Serhiy Storchaka <storchaka@gmail.com>
Date	2013-12-01 20:00 +0200
Message-ID	<mailman.3438.1385920822.18130.python-list@python.org>
In reply to	#60781

30.11.13 02:44, Steven D'Aprano wrote:
> (2) If you reverse that string, does it give "lëon"? The implication of
> this question is that strings should operate on grapheme clusters rather
> than code points. Python fails this test:
>
> py> print("noe\u0308l"[::-1])
> l̈eon

>>> import unicodedata
>>> print(unicodedata.normalize('NFC', "noe\u0308l")[::-1])
lëon

> (3) What are the first three characters? The author suggests that the
> answer should be "noë", in which case Python fails again:
>
> py> print("noe\u0308l"[:3])
> noe

>>> print(unicodedata.normalize('NFC', "noe\u0308l")[:3])
noë

> (4) Likewise, what is the length of the decomposed string? The author
> expects 4, but Python gives 5:
>
> py> len("noe\u0308l")
> 5

>>> print(len(unicodedata.normalize('NFC', "noe\u0308l")))
4
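The NFC approach works here because Unicode happens to have a precomposed ë (U+00EB); where no precomposed form exists, normalization cannot merge the sequence into one code point. A rough alternative, sketched below under that caveat, is to group each base character with the combining marks that follow it, using only the standard `unicodedata.combining()`. The function name `clusters()` is hypothetical, and this is not the full extended grapheme cluster algorithm of Unicode Standard Annex #29 (the third-party `regex` module's `\X` pattern is one option for that).

```python
import unicodedata


def clusters(text):
    """Group each base character with the combining marks that follow it.

    A rough approximation of grapheme clusters: it only checks the canonical
    combining class, so it ignores the fuller UAX #29 rules (Hangul jamo,
    emoji ZWJ sequences, regional-indicator flags, and marks whose
    combining class is 0).
    """
    groups = []
    for ch in text:
        if groups and unicodedata.combining(ch):
            groups[-1] += ch   # attach the mark to the previous base character
        else:
            groups.append(ch)  # start a new cluster
    return groups


word = "noe\u0308l"                # decomposed: e + COMBINING DIAERESIS
parts = clusters(word)
print(len(parts))                  # 4
print("".join(reversed(parts)))    # the diaeresis stays on the e when reversed
print("".join(parts[:3]))          # the first three clusters: n, o, e + diaeresis
```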


#60838

From	wxjmfauth@gmail.com
Date	2013-12-01 12:15 -0800
Message-ID	<ce8504d2-7e4e-4a2b-9cee-82bc6445492b@googlegroups.com>
In reply to	#60831

30.11.13 02:44, Steven D'Aprano wrote:
> (2) If you reverse that string, does it give "lëon"? The implication of
> this question is that strings should operate on grapheme clusters rather
> than code points. ...
> 

BTW, a grapheme cluster *is* a code points cluster.

jmf


#60840

From	Tim Delaney <timothy.c.delaney@gmail.com>
Date	2013-12-02 07:54 +1100
Message-ID	<mailman.3443.1385931297.18130.python-list@python.org>
In reply to	#60838


On 2 December 2013 07:15, <wxjmfauth@gmail.com> wrote:

> 30.11.13 02:44, Steven D'Aprano wrote:
> > (2) If you reverse that string, does it give "lëon"? The implication of
> > this question is that strings should operate on grapheme clusters rather
> > than code points. ...
> >
>
> BTW, a grapheme cluster *is* a code points cluster.
>

Anyone with a decent level of reading comprehension would have understood
that Steven knows that. The implied word is "individual" i.e. "... rather
than [individual] code points".

Why am I responding to a troll? Probably because out of all his baseless
complaints about the FSR, he *did* have one valid point about performance
that has now been fixed.

Tim Delaney


#60864

From	wxjmfauth@gmail.com
Date	2013-12-02 04:39 -0800
Message-ID	<23ee5279-bdfd-4fb0-b535-042f7c3bab23@googlegroups.com>
In reply to	#60840

On Sunday, 1 December 2013 21:54:48 UTC+1, Tim Delaney wrote:
> On 2 December 2013 07:15,  <wxjm...@gmail.com> wrote:
>
> > 30.11.13 02:44, Steven D'Aprano wrote:
> > > (2) If you reverse that string, does it give "lëon"? The implication of
> > > this question is that strings should operate on grapheme clusters rather
> > > than code points. ...
> > >
> >
> > BTW, a grapheme cluster *is* a code points cluster.
> >
>
> Anyone with a decent level of reading comprehension would have understood
> that Steven knows that. The implied word is "individual" i.e. "... rather
> than [individual] code points".
>
> Why am I responding to a troll? Probably because out of all his baseless
> complaints about the FSR, he *did* have one valid point about performance
> that has now been fixed.
>
> Tim Delaney


My English is far from perfect, but I think I understood
it correctly.

The point is not in the words "grapheme" or "code point",
nor in "individual" ;-); the point is in "rather".

If one wishes to work on a set of graphemes, one can
only work with the set of the corresponding code points.

To complete Serhiy Storchaka's example:

>>> import unicodedata
>>> len(unicodedata.normalize('NFKD', '\ufdfa')) == 18
True

is correct.

jmf

PS I did not even speak about the FSR.


#60870

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2013-12-02 14:46 +0000
Message-ID	<mailman.3464.1385995594.18130.python-list@python.org>
In reply to	#60864

On 02/12/2013 12:39, wxjmfauth@gmail.com wrote:
>
> My English is far too be perfect, I think I understood
> it correctly.
>
> PS I did not even speak about the FSR.
>

1) Your English is far from perfect as you clearly do not understand the 
repeated requests *NOT* to send us double spaced crap via google groups.

2) You can't speak about the FSR as you know precisely nothing about it, 
but as they say, ignorance is bliss.

-- 
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence


#60871

From	Ned Batchelder <ned@nedbatchelder.com>
Date	2013-12-02 10:22 -0500
Message-ID	<mailman.3465.1385997784.18130.python-list@python.org>
In reply to	#60864

On 12/2/13 9:46 AM, Mark Lawrence wrote:
> On 02/12/2013 12:39, wxjmfauth@gmail.com wrote:
>>
>> My English is far too be perfect, I think I understood
>> it correctly.
>>
>> PS I did not even speak about the FSR.
>>
>
> 1) Your English is far from perfect as you clearly do not understand the
> repeated requests *NOT* to send us double spaced crap via google groups.
>
> 2) You can't speak about the FSR as you know precisely nothing about it,
> but as they say, ignorance is bliss.
>

As annoying as baseless claims against the FSR were, wxjmfauth is
right: he didn't even mention the FSR in this thread.  There's really no 
point dragging this thread into that territory.

--Ned.


#60872

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2013-12-02 15:45 +0000
Message-ID	<mailman.3466.1385999127.18130.python-list@python.org>
In reply to	#60864

On 02/12/2013 15:22, Ned Batchelder wrote:
> On 12/2/13 9:46 AM, Mark Lawrence wrote:
>> On 02/12/2013 12:39, wxjmfauth@gmail.com wrote:
>>>
>>> My English is far too be perfect, I think I understood
>>> it correctly.
>>>
>>> PS I did not even speak about the FSR.
>>>
>>
>> 1) Your English is far from perfect as you clearly do not understand the
>> repeated requests *NOT* to send us double spaced crap via google groups.
>>
>> 2) You can't speak about the FSR as you know precisely nothing about it,
>> but as they say, ignorance is bliss.
>>
>
> As annoying as baseless claims against the FSR were, wxjmfauth is
> right: he didn't even mention the FSR in this thread.  There's really no
> point dragging this thread into that territory.
>
> --Ned.
>

He's quite deliberately dragged it up by using p.s.  Without doubt he's 
the worst loser in the world and I'm *NOT* stopping getting at him.  I 
find his behaviour, continuously and groundlessly insulting the Python 
core developers, quite disgusting.

-- 
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence


#60873

From	Chris Angelico <rosuav@gmail.com>
Date	2013-12-03 02:49 +1100
Message-ID	<mailman.3467.1385999378.18130.python-list@python.org>
In reply to	#60864

On Tue, Dec 3, 2013 at 2:45 AM, Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:
> He's quite deliberately dragged it up by using p.s.  Without doubt he's the
> worst loser in the world and I'm *NOT* stopping getting at him.  I find his
> behaviour, continuously and groundlessly insulting the Python core
> developers, quite disgusting.

What he does is make very sure that the awesomeness of Python 3.3+ is
constantly being brought up on python-list. New users of Python who
come here will, within a fairly short time, learn that Python actually
gets Unicode right, unlike most languages out there, and that it's
efficient and high performance.

ChrisA


#60874

From	Ned Batchelder <ned@nedbatchelder.com>
Date	2013-12-02 10:58 -0500
Message-ID	<mailman.3468.1385999919.18130.python-list@python.org>
In reply to	#60864

On 12/2/13 10:45 AM, Mark Lawrence wrote:
> On 02/12/2013 15:22, Ned Batchelder wrote:
>> On 12/2/13 9:46 AM, Mark Lawrence wrote:
>>> On 02/12/2013 12:39, wxjmfauth@gmail.com wrote:
>>>>
>>>> My English is far too be perfect, I think I understood
>>>> it correctly.
>>>>
>>>> PS I did not even speak about the FSR.
>>>>
>>>
>>> 1) Your English is far from perfect as you clearly do not understand the
>>> repeated requests *NOT* to send us double spaced crap via google groups.
>>>
>>> 2) You can't speak about the FSR as you know precisely nothing about it,
>>> but as they say, ignorance is bliss.
>>>
>>
>> As annoying as baseless claims against the FSR were, wxjmfauth is
>> right: he didn't even mention the FSR in this thread.  There's really no
>> point dragging this thread into that territory.
>>
>> --Ned.
>>
>
> He's quite deliberately dragged it up by using p.s.  Without doubt he's
> the worst loser in the world and I'm *NOT* stopping getting at him.  I
> find his behaviour, continuously and groundlessly insulting the Python
> core developers, quite disgusting.
>

His PS is in reference to you, Ethan, and Tim reminiscing about his past 
complaints against the FSR.  He made three posts to this thread before 
you started in on him, and none of them mentioned the FSR.  Tim first 
mentioned it.

There's no need to call him "the worst loser in the world."  Nothing 
good will come from that kind of attack.  It doesn't make this community 
better, and it will not change his behavior.

He said nothing in this thread that insulted the Python core developers. 
His posts in this thread are not about the FSR, and yet you dragged the 
old fights into it.  You are being the troll here.

--Ned.


#60881

From	Terry Reedy <tjreedy@udel.edu>
Date	2013-12-02 15:26 -0500
Message-ID	<mailman.3474.1386015982.18130.python-list@python.org>
In reply to	#60864

On 12/2/2013 10:45 AM, Mark Lawrence wrote:

> the worst loser in the world

Mark, I consider your continual direct personal attacks on other posters 
to be a violation of the PSF Code of Conduct, which *does* apply to 
python-list. Please stop.

-- 
Terry Jan Reedy, one of multiple list moderators


#60882

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2013-12-02 20:45 +0000
Message-ID	<mailman.3475.1386017125.18130.python-list@python.org>
In reply to	#60864

On 02/12/2013 20:26, Terry Reedy wrote:
> On 12/2/2013 10:45 AM, Mark Lawrence wrote:
>
>> the worst loser in the world
>
> Mark, I consider your continual direct personal attacks on other posters
> to be a violation of the PSF Code of Conduct, which *does* apply to
> python-list. Please stop.
>

The attacks that "Joseph McCarthy" has been launching on the core 
developers for the last 15 months are in my view now perfectly 
acceptable.  This is excellent news.  Everybody can now say what they 
like about the core developers and there's no comeback.

You can also stuff the code of conduct, it's quite clearly only brought 
into play when it suits.  Never, ever aim it at somebody who goes out of 
their way to stir things up, always target it at the people who fight 
back *IS THE RULE HERE*.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence


#60889

From	Ned Batchelder <ned@nedbatchelder.com>
Date	2013-12-02 16:44 -0500
Message-ID	<mailman.3482.1386020658.18130.python-list@python.org>
In reply to	#60864

On 12/2/13 3:45 PM, Mark Lawrence wrote:
> On 02/12/2013 20:26, Terry Reedy wrote:
>> On 12/2/2013 10:45 AM, Mark Lawrence wrote:
>>
>>> the worst loser in the world
>>
>> Mark, I consider your continual direct personal attacks on other posters
>> to be a violation of the PSF Code of Conduct, which *does* apply to
>> python-list. Please stop.
>>
>
> The attacks that "Joseph McCarthy" has been launching on the core
> developers for the last 15 months are in my view now perfectly
> acceptable.  This is excellent news.  Everybody can now say what they
> like about the core developers and there's no comeback.
>
> You can also stuff the code of conduct, it's quite clearly only brought
> into play when it suits.  Never, ever aim it at somebody who goes out of
> their way to stir things up, always target it at the people who fight
> back *IS THE RULE HERE*.
>

The point is that in this thread, no one was making attacks on core 
developers.  You were bringing up old animosity here for no reason at 
all, and making them personal attacks to boot.

I don't see how you think wxjmfauth was "going out of his way to stir 
things up" in *this* thread.  He made three comments, none of which 
mentioned the FSR or any other controversial topic.  Can't we respond to 
the content of posts, and not to past offenses by the poster?

Additionally, wxjmfauth's past complaints about the flexible string 
representation were not personal.  He didn't say, "Joe Smith is the 
worst loser in the world for writing the FSR".  He complained about a 
feature of CPython, baselessly, but he never attacked the people doing 
the work.  His continued complaints were aggravating, I agree. I don't 
know that they rose to the level of "disrespectful".

I know that your behavior here is disrespectful.

As to when the code of conduct is brought up, it's only fairly recently 
that it has been mentioned in this forum.  There have clearly been posts 
in recent memory (the last year) which could have been examined in light 
of the code of conduct, and were not. I think we are using it more 
uniformly now. You helped me realize better how to apply it to this 
forum, and I thank you for that.  I welcome your help in applying it 
better still.  But it applies to you as well and I don't think it's too 
much to ask that you abide by it.

The way to improve this list is to respectfully point to and demonstrate 
community norms and ask people to conform to them.  Spewing vitriol 
isn't going to fix anything.

--Ned.

