Re: Not x.islower() has different output than x.isupper() in list output...

Started by	Christopher Reimer <christopher_reimer@icloud.com>
First post	2016-04-29 18:55 -0700
Last post	2016-05-04 19:06 +1000
Articles	20 on this page of 24 — 10 participants

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Re: Not x.islower() has different output than x.isupper() in list output... Christopher Reimer <christopher_reimer@icloud.com> - 2016-04-29 18:55 -0700
    Re: Not x.islower() has different output than x.isupper() in list output... Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2016-04-30 17:47 +1200
    Re: Not x.islower() has different output than x.isupper() in list output... pavlovevidence@gmail.com - 2016-05-03 03:00 -0700
      Re: Not x.islower() has different output than x.isupper() in list output... Chris Angelico <rosuav@gmail.com> - 2016-05-03 20:25 +1000
        Re: Not x.islower() has different output than x.isupper() in list output... Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-05-03 14:25 +0300
          Re: Not x.islower() has different output than x.isupper() in list output... Chris Angelico <rosuav@gmail.com> - 2016-05-03 22:00 +1000
            Re: Not x.islower() has different output than x.isupper() in list output... DFS <nospam@dfs.com> - 2016-05-03 09:01 -0400
              Re: Not x.islower() has different output than x.isupper() in list output... Chris Angelico <rosuav@gmail.com> - 2016-05-03 23:13 +1000
                Re: Not x.islower() has different output than x.isupper() in list output... DFS <nospam@dfs.com> - 2016-05-03 09:19 -0400
                  Re: Not x.islower() has different output than x.isupper() in list output... Chris Angelico <rosuav@gmail.com> - 2016-05-03 23:23 +1000
                  Re: Not x.islower() has different output than x.isupper() in list output... Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-05-03 17:49 +0300
                    Re: Not x.islower() has different output than x.isupper() in list output... DFS <nospam@dfs.com> - 2016-05-03 11:12 -0400
                      Re: Not x.islower() has different output than x.isupper() in list output... Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-05-03 18:27 +0300
                        Re: Not x.islower() has different output than x.isupper() in list output... Grant Edwards <grant.b.edwards@gmail.com> - 2016-05-03 15:42 +0000
                        Re: Not x.islower() has different output than x.isupper() in list output... Terry Reedy <tjreedy@udel.edu> - 2016-05-03 12:37 -0400
                    Re: Not x.islower() has different output than x.isupper() in list output... Steven D'Aprano <steve@pearwood.info> - 2016-05-04 13:28 +1000
                      Re: Not x.islower() has different output than x.isupper() in list output... DFS <nospam@dfs.com> - 2016-05-04 10:09 -0400
                        Re: Not x.islower() has different output than x.isupper() in list output... Chris Angelico <rosuav@gmail.com> - 2016-05-05 00:37 +1000
                        Re: Not x.islower() has different output than x.isupper() in list output... Steven D'Aprano <steve@pearwood.info> - 2016-05-05 01:37 +1000
                          Re: Not x.islower() has different output than x.isupper() in list output... DFS <nospam@dfs.com> - 2016-05-04 17:05 -0400
            Re: Not x.islower() has different output than x.isupper() in list output... Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-05-03 17:42 +0300
              Re: Not x.islower() has different output than x.isupper() in list output... Steven D'Aprano <steve@pearwood.info> - 2016-05-04 11:30 +1000
              Re: Not x.islower() has different output than x.isupper() in list output... Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2016-05-04 20:34 +1200
                Re: Not x.islower() has different output than x.isupper() in list output... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-05-04 19:06 +1000

#107885 — Re: Not x.islower() has different output than x.isupper() in list output...

From	Christopher Reimer <christopher_reimer@icloud.com>
Date	2016-04-29 18:55 -0700
Subject	Re: Not x.islower() has different output than x.isupper() in list output...
Message-ID	<mailman.242.1461981344.32212.python-list@python.org>

On 4/29/2016 6:29 PM, Stephen Hansen wrote:
> If isupper/islower were perfect opposites of each-other, there'd be no 
> need for both. But since characters can be upper, lower, or *neither*, 
> you run into this situation.

Based upon the official documentation, I was expecting perfect opposites.

str.islower(): "Return true if all cased characters [4] in the string 
are lowercase and there is at least one cased character, false otherwise."

https://docs.python.org/3/library/stdtypes.html?highlight=islower#str.islower

str.isupper(): "Return true if all cased characters [4] in the string 
are uppercase and there is at least one cased character, false otherwise."

https://docs.python.org/3/library/stdtypes.html?highlight=isupper#str.isupper

Here's the footnote that may or may not be relevant to this discussion: "[4] 
Cased characters are those with general category property being one of 
“Lu” (Letter, uppercase), “Ll” (Letter, lowercase), or “Lt” (Letter, 
titlecase)."

A bug in the docs?

Thank you,

Chris R.

#107907 — Re: Not x.islower() has different output than x.isupper() in list output...

From	Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date	2016-04-30 17:47 +1200
Subject	Re: Not x.islower() has different output than x.isupper() in list output...
Message-ID	<doiv7uFegcvU1@mid.individual.net>
In reply to	#107885

Christopher Reimer wrote:

> str.islower(): "Return true if all cased characters [4] in the string 
> are lowercase and there is at least one cased character, false otherwise."
> 
> str.isupper(): "Return true if all cased characters [4] in the string 
> are uppercase and there is at least one cased character, false otherwise."

A string consisting of a single space doesn't contain any
cased characters, so both islower(" ") and isupper(" ")
return false according to these rules. The docs are correct.
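
The quoted rule can be sketched directly in Python. This only illustrates the documented wording, assuming the footnote's definition of "cased" (general categories Lu, Ll, Lt); the names cased_chars, islower_per_docs and isupper_per_docs are made up for this sketch, and the real str methods may consult other Unicode properties.

```python
import unicodedata

CASED = {"Lu", "Ll", "Lt"}  # footnote [4]: the "cased" general categories


def cased_chars(s):
    """Return the characters of s that the footnote counts as cased."""
    return [c for c in s if unicodedata.category(c) in CASED]


def islower_per_docs(s):
    """Hypothetical helper: all cased chars are lowercase, and there is at least one."""
    cased = cased_chars(s)
    return bool(cased) and all(unicodedata.category(c) == "Ll" for c in cased)


def isupper_per_docs(s):
    """Hypothetical helper: all cased chars are uppercase, and there is at least one."""
    cased = cased_chars(s)
    return bool(cased) and all(unicodedata.category(c) == "Lu" for c in cased)


for s in ["abc", "ABC", "aBc", " ", "42"]:
    # A string with no cased characters fails the "at least one" clause
    # of both rules, so both results are False for " " and "42".
    print(repr(s), islower_per_docs(s), isupper_per_docs(s))
```

For these plain ASCII samples the sketch agrees with the built-in methods; the "at least one cased character" clause is what makes both answers false for a space.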

-- 
Greg

#108052

From	pavlovevidence@gmail.com
Date	2016-05-03 03:00 -0700
Message-ID	<e1e5bfe4-7998-4cf7-a4f8-53cf5426c7c5@googlegroups.com>
In reply to	#107885

On Friday, April 29, 2016 at 6:55:56 PM UTC-7, Christopher Reimer wrote:
> On 4/29/2016 6:29 PM, Stephen Hansen wrote:
> > If isupper/islower were perfect opposites of each-other, there'd be no 
> > need for both. But since characters can be upper, lower, or *neither*, 
> > you run into this situation.
> 
> Based upon the official documentation, I was expecting perfect opposites.
> 
> str.islower(): "Return true if all cased characters [4] in the string 
> are lowercase and there is at least one cased character, false otherwise."
> 
> https://docs.python.org/3/library/stdtypes.html?highlight=islower#str.islower
> 
> str.isupper(): "Return true if all cased characters [4] in the string 
> are uppercase and there is at least one cased character, false otherwise."
> 
> https://docs.python.org/3/library/stdtypes.html?highlight=isupper#str.isupper

Just to take this discussion in a more pure logic direction.  What you call perfect opposites (that is, the functions being negations of each other) is not what the similar wording in the documentation actually implies: you shouldn't have been expecting that.

What you should have been expecting is a symmetry.  Say you have a string G.  islower(G) will return a certain result.  Now take every letter in G and swap the case, and call that string g.  isupper(g) will always return the same result as islower(G).

More succinctly, for any string x, the following is always true:

islower(x) == isupper(swapcase(x))

But that is not the same thing as, and does not imply, the following identity (which it turns out is not always true, as we've seen):

islower(x) == not isupper(x)

Another pair of functions that behaves like this is ispositive and isnegative.  The identity "ispositive(x) == isnegative(-x)" is always true.  However, "ispositive(x) == not isnegative(x)" is false if x == 0.

However, I can understand your confusion, because there are some pairs of functions where both identities are true, and if you've seen a few of them it's fairly easy for your intuition to overgeneralize a bit.  An example I can think of offhand is iseven(x) and isodd(x), for any integer x.  The identities "iseven(x) == isodd(x^1)" and "iseven(x) == not isodd(x)" are both always true.
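
The two kinds of identity can be checked side by side with a rough sketch. The numeric helpers (ispositive, isnegative, iseven, isodd) are hypothetical one-liners written for this illustration, and the string check deliberately uses only a handful of ASCII samples, so it says nothing about case mapping outside ASCII.

```python
def ispositive(x):
    return x > 0


def isnegative(x):
    return x < 0


def iseven(x):
    return x % 2 == 0


def isodd(x):
    return x % 2 == 1


samples = ["abc", "ABC", "aBc", " ", "123", ""]

# Symmetry: lowercase-ness of x matches uppercase-ness of the swapped string.
symmetry_holds = all(s.islower() == s.swapcase().isupper() for s in samples)

# Negation: fails for every sample that is neither all-lower nor all-upper.
negation_fails = [s for s in samples if s.islower() != (not s.isupper())]

numbers = range(-5, 6)
sign_symmetry = all(ispositive(n) == isnegative(-n) for n in numbers)
sign_negation_fails = [n for n in numbers if ispositive(n) != (not isnegative(n))]
parity_symmetry = all(iseven(n) == isodd(n ^ 1) for n in numbers)
parity_negation = all(iseven(n) == (not isodd(n)) for n in numbers)

print(symmetry_holds, negation_fails)
print(sign_symmetry, sign_negation_fails)
print(parity_symmetry, parity_negation)
```

Note that the negation has to be written as `a == (not b)` in real Python; `a == not b` is a syntax error.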

Carl Banks

#108054

From	Chris Angelico <rosuav@gmail.com>
Date	2016-05-03 20:25 +1000
Message-ID	<mailman.339.1462271120.32212.python-list@python.org>
In reply to	#108052

On Tue, May 3, 2016 at 8:00 PM,  <pavlovevidence@gmail.com> wrote:
>
> What you should have been expecting is a symmetry.  Say you have a string G.  islower(G) will return a certain result.  Now take every letter in G and swap the case, and call that string g.  isupper(g) will always return the same result as islower(G).
>
> More succinctly, for any string x, the following is always true:
>
> islower(x) == isupper(swapcase(x))
>
> But that is not the same thing as, and does not imply, the following identity (which it turns out is not always true, as we've seen):
>
> islower(x) == not isupper(x)
>
>
> Another example of functions that behave like this are ispositive and isnegative.  The identity "ispositive(x) == isnegative(-x)" is always true.  However, "ispositive(x) == not isnegative(x)" is false if x == 0.
>

This assumes, of course, that there is a function swapcase which can
return a string with case inverted. I'm not sure such a function
exists.

ChrisA

#108057

From	Jussi Piitulainen <jussi.piitulainen@helsinki.fi>
Date	2016-05-03 14:25 +0300
Message-ID	<lf5r3dje63l.fsf@ling.helsinki.fi>
In reply to	#108054

Chris Angelico writes:

> This assumes, of course, that there is a function swapcase which can
> return a string with case inverted. I'm not sure such a function
> exists.

   str.swapcase("foO")
   'FOo'

#108062

From	Chris Angelico <rosuav@gmail.com>
Date	2016-05-03 22:00 +1000
Message-ID	<mailman.341.1462276849.32212.python-list@python.org>
In reply to	#108057

On Tue, May 3, 2016 at 9:25 PM, Jussi Piitulainen
<jussi.piitulainen@helsinki.fi> wrote:
> Chris Angelico writes:
>
>> This assumes, of course, that there is a function swapcase which can
>> return a string with case inverted. I'm not sure such a function
>> exists.
>
>    str.swapcase("foO")
>    'FOo'

I suppose for this discussion it doesn't matter if it's imperfect.

>>> "\N{ANGSTROM SIGN}".swapcase().swapcase() == "\N{ANGSTROM SIGN}"
False
>>> "\N{LATIN SMALL LETTER SHARP S}".swapcase().swapcase()
'ss'

But drawing the analogy with the negation of real numbers implies
something that doesn't exist.

ChrisA

#108065

From	DFS <nospam@dfs.com>
Date	2016-05-03 09:01 -0400
Message-ID	<nga79f$gau$1@dont-email.me>
In reply to	#108062

On 5/3/2016 8:00 AM, Chris Angelico wrote:
> On Tue, May 3, 2016 at 9:25 PM, Jussi Piitulainen
> <jussi.piitulainen@helsinki.fi> wrote:
>> Chris Angelico writes:
>>
>>> This assumes, of course, that there is a function swapcase which can
>>> return a string with case inverted. I'm not sure such a function
>>> exists.
>>
>>    str.swapcase("foO")
>>    'FOo'
>
> I suppose for this discussion it doesn't matter if it's imperfect.


What was imperfect?

#108066

From	Chris Angelico <rosuav@gmail.com>
Date	2016-05-03 23:13 +1000
Message-ID	<mailman.344.1462281210.32212.python-list@python.org>
In reply to	#108065

On Tue, May 3, 2016 at 11:01 PM, DFS <nospam@dfs.com> wrote:
> On 5/3/2016 8:00 AM, Chris Angelico wrote:
>>
>> On Tue, May 3, 2016 at 9:25 PM, Jussi Piitulainen
>> <jussi.piitulainen@helsinki.fi> wrote:
>>>
>>> Chris Angelico writes:
>>>
>>>> This assumes, of course, that there is a function swapcase which can
>>>> return a string with case inverted. I'm not sure such a function
>>>> exists.
>>>
>>>
>>>    str.swapcase("foO")
>>>    'FOo'
>>
>>
>> I suppose for this discussion it doesn't matter if it's imperfect.
>
>
>
> What was imperfect?

It doesn't invert, the way numeric negation does. And if you try to
define exactly what it does, you'll come right back to
isupper()/islower(), so it's not much help in defining those.
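
To make the circularity concrete, here is a rough per-character approximation of what swapcase does. The name swapcase_sketch is hypothetical, and this is not claimed to be how str.swapcase() is actually implemented; in particular it ignores any context-sensitive rules, so it should only be expected to agree on simple inputs.

```python
def swapcase_sketch(s):
    """Hypothetical per-character approximation of str.swapcase()."""
    out = []
    for ch in s:
        if ch.isupper():
            out.append(ch.lower())
        elif ch.islower():
            out.append(ch.upper())
        else:
            out.append(ch)  # neither upper nor lower: left alone
    return "".join(out)


for s in ["foO", "Hello, World 42", " "]:
    print(repr(s), repr(swapcase_sketch(s)), swapcase_sketch(s) == s.swapcase())
```

The definition leans on isupper() and islower() for every character, which is why it cannot be used to define them.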

ChrisA

#108068

From	DFS <nospam@dfs.com>
Date	2016-05-03 09:19 -0400
Message-ID	<nga89p$ku2$1@dont-email.me>
In reply to	#108066

On 5/3/2016 9:13 AM, Chris Angelico wrote:
> On Tue, May 3, 2016 at 11:01 PM, DFS <nospam@dfs.com> wrote:
>> On 5/3/2016 8:00 AM, Chris Angelico wrote:
>>>
>>> On Tue, May 3, 2016 at 9:25 PM, Jussi Piitulainen
>>> <jussi.piitulainen@helsinki.fi> wrote:
>>>>
>>>> Chris Angelico writes:
>>>>
>>>>> This assumes, of course, that there is a function swapcase which can
>>>>> return a string with case inverted. I'm not sure such a function
>>>>> exists.
>>>>
>>>>
>>>>    str.swapcase("foO")
>>>>    'FOo'
>>>
>>>
>>> I suppose for this discussion it doesn't matter if it's imperfect.
>>
>>
>>
>> What was imperfect?
>
> It doesn't invert, the way numeric negation does.


What do you mean by 'case inverted'?

It looks like it swaps the case correctly between upper and lower.




> And if you try to
> define exactly what it does, you'll come right back to
> isupper()/islower(), so it's not much help in defining those.
>
> ChrisA

#108070

From	Chris Angelico <rosuav@gmail.com>
Date	2016-05-03 23:23 +1000
Message-ID	<mailman.345.1462281819.32212.python-list@python.org>
In reply to	#108068

On Tue, May 3, 2016 at 11:19 PM, DFS <nospam@dfs.com> wrote:
> What do you mean by 'case inverted'?
>
> It looks like it swaps the case correctly between upper and lower.

I gave two examples in my previous post. Did you read them? You
trimmed them from the quote.

ChrisA

#108074

From	Jussi Piitulainen <jussi.piitulainen@helsinki.fi>
Date	2016-05-03 17:49 +0300
Message-ID	<lf5eg9jdwmr.fsf@ling.helsinki.fi>
In reply to	#108068

DFS writes:

> On 5/3/2016 9:13 AM, Chris Angelico wrote:

>> It doesn't invert, the way numeric negation does.
>
> What do you mean by 'case inverted'?
>
> It looks like it swaps the case correctly between upper and lower.

There's letters that do not come in exact pairs of upper and lower case,
so _some_ swaps are not invertible: you swap twice and end up somewhere
else than your starting point.

The "\N{ANGSTROM SIGN}" looks like the Swedish upper-case
a-with-ring-above but isn't the same character, yet Python swaps its
case to the actual lower-case a-with-ring above. It can't go back to
_both_ the Angstrom sign and the actual upper case letter.

(Not sure why the sign is considered a cased letter at all.)
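
A small sketch that follows the sign through two swaps and prints the code point, general category and name at each step. It uses only the standard unicodedata module; the variable names are arbitrary.

```python
import unicodedata

sign = "\N{ANGSTROM SIGN}"  # U+212B
once = sign.swapcase()
twice = once.swapcase()

for label, ch in [("start", sign), ("swapped once", once), ("swapped twice", twice)]:
    # Each step is a single character here, so ord() is safe.
    print(f"{label:14} U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")

print(twice == sign)
```

The category column shows that the Unicode database classifies the sign as an uppercase letter (Lu), which is why the case methods treat it as cased.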

#108075

From	DFS <nospam@dfs.com>
Date	2016-05-03 11:12 -0400
Message-ID	<ngaev1$fv0$1@dont-email.me>
In reply to	#108074

On 5/3/2016 10:49 AM, Jussi Piitulainen wrote:
> DFS writes:
>
>> On 5/3/2016 9:13 AM, Chris Angelico wrote:
>
>>> It doesn't invert, the way numeric negation does.
>>
>> What do you mean by 'case inverted'?
>>
>> It looks like it swaps the case correctly between upper and lower.
>
> There's letters that do not come in exact pairs of upper and lower case,
> so _some_ swaps are not invertible: you swap twice and end up somewhere
> else than your starting point.
>
> The "\N{ANGSTROM SIGN}" looks like the Swedish upper-case
> a-with-ring-above but isn't the same character, yet Python swaps its
> case to the actual lower-case a-with-ring above. It can't go back to
> _both_ the Angstrom sign and the actual upper case letter.
>
> (Not sure why the sign is considered a cased letter at all.)


Thanks for the explanation.

Does that mean:

lower(Å) != å ?

and

upper(å) != Å ?

#108076

From	Jussi Piitulainen <jussi.piitulainen@helsinki.fi>
Date	2016-05-03 18:27 +0300
Message-ID	<lf560uvduwe.fsf@ling.helsinki.fi>
In reply to	#108075

DFS writes:

> On 5/3/2016 10:49 AM, Jussi Piitulainen wrote:
>> DFS writes:
>>
>>> On 5/3/2016 9:13 AM, Chris Angelico wrote:
>>
>>>> It doesn't invert, the way numeric negation does.
>>>
>>> What do you mean by 'case inverted'?
>>>
>>> It looks like it swaps the case correctly between upper and lower.
>>
>> There's letters that do not come in exact pairs of upper and lower case,
>> so _some_ swaps are not invertible: you swap twice and end up somewhere
>> else than your starting point.
>>
>> The "\N{ANSGTROM SIGN}" looks like the Swedish upper-case
>> a-with-ring-above but isn't the same character, yet Python swaps its
>> case to the actual lower-case a-with-ring above. It can't go back to
>> _both_ the Angstrom sign and the actual upper case letter.
>>
>> (Not sure why the sign is considered a cased letter at all.)
>
>
> Thanks for the explanation.
>
> Does that mean:
>
> lower(Å) != å ?
>
> and
>
> upper(å) != Å ?

It means "\N{ANGSTROM SIGN}" != "Å", yet both lower to "å", which then
uppers back to "Å" (U+00c5).

The Ångström sign (U+212b) looks like this: Å. Indistinguishable from Å
in the font that I'm seeing - for all I know, it's the same glyph.
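
The comparison can be spelled out with explicit code points, which avoids depending on how a font draws the two characters. The last line is an addition that does not come from this thread: it assumes one wants to treat the two as equivalent, and uses unicodedata.normalize with the NFC form to fold the sign into the letter.

```python
import unicodedata

sign = "\u212b"    # ANGSTROM SIGN
letter = "\u00c5"  # LATIN CAPITAL LETTER A WITH RING ABOVE

print(sign == letter)                  # distinct code points
print(sign.lower() == letter.lower())  # both lower to U+00E5
print(sign.lower().upper() == letter)  # and upper back to U+00C5
print(unicodedata.normalize("NFC", sign) == letter)
```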

#108077

From	Grant Edwards <grant.b.edwards@gmail.com>
Date	2016-05-03 15:42 +0000
Message-ID	<mailman.347.1462290169.32212.python-list@python.org>
In reply to	#108076

On 2016-05-03, Jussi Piitulainen <jussi.piitulainen@helsinki.fi> wrote:

>> Does that mean:
>>
>> lower(Å) != å ?
>>
>> and
>>
>> upper(å) != Å ?
>
> It means "\N{ANGSTROM SIGN}" != "Å", yet both lower to "å", which then
> uppers back to "Å" (U+00c5).
>
> The Ångström sign (U+212b) looks like this: Å. Indistinguishable from Å
> in the font that I'm seeing - for all I know, it's the same glyph.

Interesting. FWIW, Å and Å definitely look different with the terminal
and font I'm using (urxvt with -misc-fixed-medium-r-normal-*-18-120-*-*-*-90-iso10646-*)

Expecting upper/lower operations to be 100% invertible is probably an
ASCII-centric mindset that will fall over as soon as you start
dealing with non-ASCII encodings.
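
One way to see how far the ASCII intuition carries is to test the round trip character by character. This is a sketch only: roundtrips is a hypothetical helper, the scan is limited to the Basic Multilingual Plane to keep it quick, and the number of failures depends on the Unicode version your Python was built with, so no count is quoted here.

```python
import string


def roundtrips(ch):
    """True if swapping the case twice returns the original character."""
    return ch.swapcase().swapcase() == ch


ascii_ok = all(roundtrips(c) for c in string.ascii_letters)

# Scan the BMP (U+0000..U+FFFF) for characters that do not come back.
bmp_failures = [chr(cp) for cp in range(0x10000) if not roundtrips(chr(cp))]

print("ASCII letters all round-trip:", ascii_ok)
print("BMP characters that do not:", len(bmp_failures))
print("Angstrom sign among them:", "\u212b" in bmp_failures)
```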

-- 
Grant Edwards               grant.b.edwards        Yow! Xerox your lunch
                                  at               and file it under "sex
                              gmail.com            offenders"!

#108081

From	Terry Reedy <tjreedy@udel.edu>
Date	2016-05-03 12:37 -0400
Message-ID	<mailman.350.1462293512.32212.python-list@python.org>
In reply to	#108076

On 5/3/2016 11:42 AM, Grant Edwards wrote:

> Interesting. FWIW, Å and Å definitely look different with the terminal
> and font I'm using (urxvt with -misc-fixed-medium-r-normal-*-18-120-*-*-*-90-iso10646-*)

In the fixed pitch font used by Thunderbird (Courier?), Angstrom Å has 
the circle touching the A while letter Å has the circle spaced above.

-- 
Terry Jan Reedy

#108113

From	Steven D'Aprano <steve@pearwood.info>
Date	2016-05-04 13:28 +1000
Message-ID	<57296c7a$0$1589$c3e8da3$5496439d@news.astraweb.com>
In reply to	#108074

On Wed, 4 May 2016 12:49 am, Jussi Piitulainen wrote:

> DFS writes:
> 
>> On 5/3/2016 9:13 AM, Chris Angelico wrote:
> 
>>> It doesn't invert, the way numeric negation does.
>>
>> What do you mean by 'case inverted'?
>>
>> It looks like it swaps the case correctly between upper and lower.
> 
> There's letters that do not come in exact pairs of upper and lower case,

Languages with two distinct lettercases, like English, are called bicameral.
The two cases are technically called majuscule and minuscule, but
colloquially known as uppercase and lowercase since movable type printers
traditionally used to keep the majuscule letters in a drawer above the
minuscule letters.

Many alphabets are unicameral, that is, they only have a single lettercase.
Examples include Hebrew, Arabic, Hangul, and many others. Georgian is an
interesting example, as it is the only known written alphabet that started
as a bicameral script and then became unicameral.

Consequently, many letters are neither upper nor lower case, and have
Unicode category "Letter other":

py> import unicodedata
py> c = u'\N{ARABIC LETTER FEH}'
py> unicodedata.category(c)
'Lo'
py> c.isalpha()
True
py> c.isupper()
False
py> c.islower()
False
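
The same check can be repeated for one letter from each of the unicameral scripts named above. The sample letters are picked arbitrarily for this sketch (Hebrew alef, Arabic feh, the Hangul syllable "ga") and are written as escapes so the code does not depend on the source encoding.

```python
import unicodedata

samples = {
    "Hebrew": "\u05d0",  # HEBREW LETTER ALEF
    "Arabic": "\u0641",  # ARABIC LETTER FEH
    "Hangul": "\uac00",  # HANGUL SYLLABLE GA
}

for script, ch in samples.items():
    # Letters without case are alphabetic but neither upper nor lower.
    print(script, unicodedata.category(ch), ch.isalpha(), ch.isupper(), ch.islower())
```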

Even among bicameral alphabets, there are a few anomalies. The three most
obvious ones are Greek sigma, German Eszett (or "sharp S") and Turkish I.

(1) The Greek sigma is usually written as Σ or σ in uppercase and lowercase
respectively, but at the end of a word, lowercase sigma is written as ς.

(This final sigma is sometimes called "stigma", but should not be confused
with the archaic Greek letter stigma, which has two cases Ϛ ϛ, at least
when it is not being written as digamma Ϝϝ -- and if you're confused, so
are the Greeks :-)

Python 3.3 correctly handles the sigma/final sigma when upper- and
lowercasing:

py> 'ΘΠΣΤΣ'.lower()
'θπστς'

py> 'ΘΠΣΤΣ'.lower().upper()
'ΘΠΣΤΣ'
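
A slightly longer sketch of the same behaviour, assuming Python 3.3 or later. It shows that the choice between σ and ς depends on position within the string, so a final sigma taken on its own does not survive an upper/lower round trip.

```python
word = "ΘΠΣΤΣ"
lowered = word.lower()

print(lowered)                    # medial sigma in the middle, final sigma at the end
print(lowered.upper() == word)    # the whole word round-trips

lone_capital = "Σ".lower()        # no preceding letter, so not treated as final
final_roundtrip = "ς".upper().lower()

print(lone_capital, final_roundtrip, final_roundtrip == "ς")
```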

(2) The German Eszett ß traditionally existed only in lowercase form. An
uppercase form has existed since at least the 19th century, but when the
Germans moved away from blackletter to Roman-style letters, the
uppercase form was left out. In recent years, printers in Germany have
started to reintroduce an uppercase version, and the German government have
standardized on its use for placenames, but not other words.

(Aside: in Germany, ß is not considered a distinct letter of the alphabet,
but a ligature of ss; historically it derived from a ligature of ſs, ſz or
ſʒ. The funny characters you may or may not be able to see are the long-S
and round-Z.)

Python follows common, but not universal, German practice for eszett:

py> 'ẞ'.lower()
'ß'
py> 'ß'.upper()
'SS'

Note that this is lossy: given a name like "STRASSER", it is impossible to
tell whether it should be title-cased to "Strasser" or "Straßer". It also
means that uppercasing a string can make it longer.
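
The loss of information and the change in length are easy to demonstrate. The final comparison uses str.casefold(), which is not mentioned in this post; it is shown here only as one possible way to compare such names when the ß/ss distinction should be ignored.

```python
names = ["Strasser", "Straßer"]

for name in names:
    upper = name.upper()
    # Uppercasing 'ß' yields two characters, so the string grows.
    print(name, len(name), upper, len(upper))

same_upper = names[0].upper() == names[1].upper()
same_folded = names[0].casefold() == names[1].casefold()
print(same_upper, same_folded)

# Going back from the uppercase form can only ever produce one spelling.
print("STRASSER".title())
```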

For more on the uppercase eszett, see:

https://typography.guru/journal/germanys-new-character/
https://typography.guru/journal/how-to-draw-a-capital-sharp-s-r18/

(3) In most Latin alphabets, the lowercase i and j have a "tittle" diacritic
on them, but not the uppercase forms I and J. Turkish and a few other
languages have both I-with-tittle and I-without-tittle.

(As far as I know, there is no language with a dotless J.)

So in Turkish, the correct uppercase to lowercase and back again should go:

Dotless I: I -> ı -> I

Dotted I: İ -> i -> İ

Python does not quite manage to handle this correctly for Turkish
applications, since it loses the dotted/dotless distinction:

py> 'ı'.upper()
'I'
py> 'İ'.lower()
'i'

and further case conversions follow the non-Turkish rules.
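
For code that knows it is handling Turkish text, one workaround is to special-case the two letter pairs before falling back to the default mappings. This is a minimal sketch: turkish_lower and turkish_upper are hypothetical helpers, not part of the standard library, and they make no attempt to detect the language of the text.

```python
# Hypothetical helpers: translate the Turkish pairs first, then apply the
# ordinary locale-independent case mapping to everything else.
_TR_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})  # I -> ı, İ -> i
_TR_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})  # i -> İ, ı -> I


def turkish_lower(s):
    return s.translate(_TR_LOWER).lower()


def turkish_upper(s):
    return s.translate(_TR_UPPER).upper()


city = "D\u0130YARBAKIR"  # DİYARBAKIR
lowered = turkish_lower(city)
print(lowered)
print(turkish_upper(lowered) == city)
```

The translate step runs first so that the built-in lower() and upper() never see the four characters whose default mapping would discard the dotted/dotless distinction.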

Note that sometimes getting this wrong can have serious consequences:

http://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail

-- 
Steven

#108131

From	DFS <nospam@dfs.com>
Date	2016-05-04 10:09 -0400
Message-ID	<ngcvjm$3ou$1@dont-email.me>
In reply to	#108113

On 5/3/2016 11:28 PM, Steven D'Aprano wrote:
> On Wed, 4 May 2016 12:49 am, Jussi Piitulainen wrote:
>
>> DFS writes:
>>
>>> On 5/3/2016 9:13 AM, Chris Angelico wrote:
>>
>>>> It doesn't invert, the way numeric negation does.
>>>
>>> What do you mean by 'case inverted'?
>>>
>>> It looks like it swaps the case correctly between upper and lower.
>>
>> There's letters that do not come in exact pairs of upper and lower case,
>
> Languages with two distinct lettercases, like English, are called bicameral.
> The two cases are technically called majuscule and minuscule, but
> colloquially known as uppercase and lowercase since movable type printers
> traditionally used to keep the majuscule letters in a drawer above the
> minuscule letters.
>
> Many alphabets are unicameral, that is, they only have a single lettercase.
> Examples include Hebrew, Arabic, Hangul, and many others. Georgian is an
> interesting example, as it is the only known written alphabet that started
> as a bicameral script and then became unicameral.
>
> Consequently, many letters are neither upper nor lower case, and have
> Unicode category "Letter other":
>
> py> import unicodedata
> py> c = u'\N{ARABIC LETTER FEH}'
> py> unicodedata.category(c)
> 'Lo'
> py> c.isalpha()
> True
> py> c.isupper()
> False
> py> c.islower()
> False
>
>
> Even among bicameral alphabets, there are a few anomalies. The three most
> obvious ones are Greek sigma, German Eszett (or "sharp S") and Turkish I.
>
> (1) The Greek sigma is usually written as Σ or σ in uppercase and lowercase
> respectively, but at the end of a word, lowercase sigma is written as ς.
>
> (This final sigma is sometimes called "stigma", but should not be confused
> with the archaic Greek letter stigma, which has two cases Ϛ ϛ, at least
> when it is not being written as digamma Ϝϝ -- and if you're confused, so
> are the Greeks :-)
>
> Python 3.3 correctly handles the sigma/final sigma when upper- and
> lowercasing:
>
> py> 'ΘΠΣΤΣ'.lower()
> 'θπστς'
>
> py> 'ΘΠΣΤΣ'.lower().upper()
> 'ΘΠΣΤΣ'
>
>
>
> (2) The German Eszett ß traditionally existed only in lowercase form. An
> uppercase form has existed since at least the 19th century, but when the
> Germans moved away from blackletter to Roman-style letters, the
> uppercase form was left out. In recent years, printers in Germany have
> started to reintroduce an uppercase version, and the German government have
> standardized on its use for placenames, but not other words.
>
> (Aside: in Germany, ß is not considered a distinct letter of the alphabet,
> but a ligature of ss; historically it derived from a ligature of ſs, ſz or
> ſʒ. The funny characters you may or may not be able to see are the long-S
> and round-Z.)
>
> Python follows common, but not universal, German practice for eszett:
>
> py> 'ẞ'.lower()
> 'ß'
> py> 'ß'.upper()
> 'SS'
>
> Note that this is lossy: given a name like "STRASSER", it is impossible to
> tell whether it should be title-cased to "Strasser" or "Straßer". It also
> means that uppercasing a string can make it longer.
>
>
> For more on the uppercase eszett, see:
>
> https://typography.guru/journal/germanys-new-character/
> https://typography.guru/journal/how-to-draw-a-capital-sharp-s-r18/
>
>
> (3) In most Latin alphabets, the lowercase i and j have a "tittle" diacritic
> on them, but not the uppercase forms I and J. Turkish and a few other
> languages have both I-with-tittle and I-without-tittle.
>
> (As far as I know, there is no language with a dotless J.)
>
> So in Turkish, the correct uppercase to lowercase and back again should go:
>
> Dotless I: I -> ı -> I
>
> Dotted I: İ -> i -> İ
>
> Python does not quite manage to handle this correctly for Turkish
> applications, since it loses the dotted/dotless distinction:
>
> py> 'ı'.upper()
> 'I'
> py> 'İ'.lower()
> 'i'
>
> and further case conversions follow the non-Turkish rules.
>
> Note that sometimes getting this wrong can have serious consequences:
>
> http://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail


Linguist much?

#108132

From	Chris Angelico <rosuav@gmail.com>
Date	2016-05-05 00:37 +1000
Message-ID	<mailman.384.1462372639.32212.python-list@python.org>
In reply to	#108131

On Thu, May 5, 2016 at 12:09 AM, DFS <nospam@dfs.com> wrote:
> On 5/3/2016 11:28 PM, Steven D'Aprano wrote:
>> [ lengthy piece about text, Unicode, and letter case ]
>
> Linguist much?

As an English-only speaker who writes code that needs to be used
around the world, you end up accruing tidbits of language and text
trivia in the form of edge cases that you need to remember to test.
Among them:

* Turkish dotless and dotted i
* Greek medial and final sigma
* German eszett
* Hebrew and Arabic right-to-left text
* Chinese non-BMP characters
* Combining characters (eg diacriticals starting U+0300)
* Non-characters eg U+FFFE

And then a post like Steven's basically comes from pulling up all
those from your memory, and maybe doing a spot of quick testing and/or
research to get some explanatory details. You don't have to be a
linguist, necessarily - just a competent debugger.
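
A list like that can be kept around as test fixtures. The sketch below is only one way to do it: EDGE_CASES and failing_cases are hypothetical names, the sample strings are arbitrary picks for each item, and the check merely asks whether a text-handling function gets through each sample without raising.

```python
# Hypothetical fixtures: one short sample per item in the list above,
# written as escapes so the file encoding does not matter.
EDGE_CASES = {
    "turkish_i": "\u0131\u0130iI",
    "greek_sigma": "\u03a3\u03c3\u03c2",
    "german_eszett": "\u00df",
    "hebrew_rtl": "\u05e9\u05dc\u05d5\u05dd",
    "non_bmp_cjk": "\U00020000",
    "combining": "e\u0301",
    "noncharacter": "\ufffe",
}


def failing_cases(func):
    """Return the labels of the samples on which func raises an exception."""
    failed = []
    for label, sample in EDGE_CASES.items():
        try:
            func(sample)
        except Exception:
            failed.append(label)
    return failed


def ascii_only(s):
    """A deliberately naive function that assumes ASCII input."""
    return s.encode("ascii").decode("ascii")


print(failing_cases(str.upper))
print(failing_cases(ascii_only))
```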

ChrisA

#108141

From	Steven D'Aprano <steve@pearwood.info>
Date	2016-05-05 01:37 +1000
Message-ID	<572a1752$0$1614$c3e8da3$5496439d@news.astraweb.com>
In reply to	#108131

On Thu, 5 May 2016 12:09 am, DFS wrote:

> On 5/3/2016 11:28 PM, Steven D'Aprano wrote:

>> Languages with two distinct lettercases, like English, are called
>> bicameral.
[...] 

> Linguist much?


Possibly even a cunning one.




Somebody-had-to-say-it-ly y'rs,


-- 
Steven

#108151

From	DFS <nospam@dfs.com>
Date	2016-05-04 17:05 -0400
Message-ID	<ngdnvh$416$2@dont-email.me>
In reply to	#108141

On 5/4/2016 11:37 AM, Steven D'Aprano wrote:
> On Thu, 5 May 2016 12:09 am, DFS wrote:
>
>> On 5/3/2016 11:28 PM, Steven D'Aprano wrote:
>
>>> Languages with two distinct lettercases, like English, are called
>>> bicameral.
> [...]
>
>> Linguist much?
>
>
> Possibly even a cunning one.


I see you as more of a Colonel Angus.
