

Newsgroup: comp.lang.python (unrolled thread from article #107885)

Re: Not x.islower() has different output than x.isupper() in list output...

Started by: Christopher Reimer <christopher_reimer@icloud.com>
First post: 2016-04-29 18:55 -0700
Last post: 2016-05-04 19:06 +1000
Articles: 20 on this page of 24 (10 participants)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: Not x.islower() has different output than x.isupper() in list output... Christopher Reimer <christopher_reimer@icloud.com> - 2016-04-29 18:55 -0700
    Re: Not x.islower() has different output than x.isupper() in list output... Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2016-04-30 17:47 +1200
    Re: Not x.islower() has different output than x.isupper() in list output... pavlovevidence@gmail.com - 2016-05-03 03:00 -0700
      Re: Not x.islower() has different output than x.isupper() in list output... Chris Angelico <rosuav@gmail.com> - 2016-05-03 20:25 +1000
        Re: Not x.islower() has different output than x.isupper() in list output... Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-05-03 14:25 +0300
          Re: Not x.islower() has different output than x.isupper() in list output... Chris Angelico <rosuav@gmail.com> - 2016-05-03 22:00 +1000
            Re: Not x.islower() has different output than x.isupper() in list output... DFS <nospam@dfs.com> - 2016-05-03 09:01 -0400
              Re: Not x.islower() has different output than x.isupper() in list output... Chris Angelico <rosuav@gmail.com> - 2016-05-03 23:13 +1000
                Re: Not x.islower() has different output than x.isupper() in list output... DFS <nospam@dfs.com> - 2016-05-03 09:19 -0400
                  Re: Not x.islower() has different output than x.isupper() in list output... Chris Angelico <rosuav@gmail.com> - 2016-05-03 23:23 +1000
                  Re: Not x.islower() has different output than x.isupper() in list output... Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-05-03 17:49 +0300
                    Re: Not x.islower() has different output than x.isupper() in list output... DFS <nospam@dfs.com> - 2016-05-03 11:12 -0400
                      Re: Not x.islower() has different output than x.isupper() in list output... Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-05-03 18:27 +0300
                        Re: Not x.islower() has different output than x.isupper() in list output... Grant Edwards <grant.b.edwards@gmail.com> - 2016-05-03 15:42 +0000
                        Re: Not x.islower() has different output than x.isupper() in list output... Terry Reedy <tjreedy@udel.edu> - 2016-05-03 12:37 -0400
                    Re: Not x.islower() has different output than x.isupper() in list output... Steven D'Aprano <steve@pearwood.info> - 2016-05-04 13:28 +1000
                      Re: Not x.islower() has different output than x.isupper() in list output... DFS <nospam@dfs.com> - 2016-05-04 10:09 -0400
                        Re: Not x.islower() has different output than x.isupper() in list output... Chris Angelico <rosuav@gmail.com> - 2016-05-05 00:37 +1000
                        Re: Not x.islower() has different output than x.isupper() in list output... Steven D'Aprano <steve@pearwood.info> - 2016-05-05 01:37 +1000
                          Re: Not x.islower() has different output than x.isupper() in list output... DFS <nospam@dfs.com> - 2016-05-04 17:05 -0400
            Re: Not x.islower() has different output than x.isupper() in list output... Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-05-03 17:42 +0300
              Re: Not x.islower() has different output than x.isupper() in list output... Steven D'Aprano <steve@pearwood.info> - 2016-05-04 11:30 +1000
              Re: Not x.islower() has different output than x.isupper() in list output... Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2016-05-04 20:34 +1200
                Re: Not x.islower() has different output than x.isupper() in list output... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-05-04 19:06 +1000



#107885 — Re: Not x.islower() has different output than x.isupper() in list output...

From: Christopher Reimer <christopher_reimer@icloud.com>
Date: 2016-04-29 18:55 -0700
Subject: Re: Not x.islower() has different output than x.isupper() in list output...
Message-ID: <mailman.242.1461981344.32212.python-list@python.org>

On 4/29/2016 6:29 PM, Stephen Hansen wrote:
> If isupper/islower were perfect opposites of each-other, there'd be no 
> need for both. But since characters can be upper, lower, or *neither*, 
> you run into this situation.

Based upon the official documentation, I was expecting perfect opposites.

str.islower(): "Return true if all cased characters [4] in the string 
are lowercase and there is at least one cased character, false otherwise."

https://docs.python.org/3/library/stdtypes.html?highlight=islower#str.islower

str.isupper(): "Return true if all cased characters [4] in the string 
are uppercase and there is at least one cased character, false otherwise."

https://docs.python.org/3/library/stdtypes.html?highlight=isupper#str.isupper

Here's the footnote that may or may not be relevant to this discussion: "[4] 
Cased characters are those with general category property being one of 
“Lu” (Letter, uppercase), “Ll” (Letter, lowercase), or “Lt” (Letter, 
titlecase)."

A bug in the docs?
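The documented rule can be transcribed almost word for word. The sketch below is a hypothetical re-implementation (the names `doc_islower` and `doc_isupper` are illustrative, not part of Python) that applies footnote [4] via `unicodedata.category`. It makes the key point visible: a string with no cased characters fails *both* tests, so the docs never promise the two methods are opposites. CPython's real implementation may consult Unicode's derived Lowercase/Uppercase properties rather than the bare categories, so this sketch might disagree on some rare characters.

```python
import unicodedata

# Footnote [4]: cased characters have general category Lu, Ll or Lt.
CASED_CATEGORIES = {"Lu", "Ll", "Lt"}


def doc_islower(s: str) -> bool:
    """Hypothetical islower() written straight from the documented rule."""
    cased = [c for c in s if unicodedata.category(c) in CASED_CATEGORIES]
    # At least one cased character, and every cased character is lowercase (Ll).
    return bool(cased) and all(unicodedata.category(c) == "Ll" for c in cased)


def doc_isupper(s: str) -> bool:
    """Hypothetical isupper() written straight from the documented rule."""
    cased = [c for c in s if unicodedata.category(c) in CASED_CATEGORIES]
    # At least one cased character, and every cased character is uppercase (Lu).
    return bool(cased) and all(unicodedata.category(c) == "Lu" for c in cased)


for s in ["abc", "ABC", "aBc", "a1!", " ", "123", ""]:
    print(repr(s), doc_islower(s), doc_isupper(s), s.islower(), s.isupper())
```

Note that " ", "123" and "" come out False for both methods: no cased characters, so neither condition can hold.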

Thank you,

Chris R.



#107907 — Re: Not x.islower() has different output than x.isupper() in list output...

From: Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date: 2016-04-30 17:47 +1200
Subject: Re: Not x.islower() has different output than x.isupper() in list output...
Message-ID: <doiv7uFegcvU1@mid.individual.net>
In reply to: #107885

Christopher Reimer wrote:

> str.islower(): "Return true if all cased characters [4] in the string 
> are lowercase and there is at least one cased character, false otherwise."
> 
> str.isupper(): "Return true if all cased characters [4] in the string 
> are uppercase and there is at least one cased character, false otherwise."

A string consisting of a single space doesn't contain any
cased characters, so both islower(" ") and isupper(" ")
return false according to these rules. The docs are correct.
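This is exactly the effect the thread's subject line describes. A minimal sketch, assuming a short sample list of characters, comparing the two list comprehensions:

```python
chars = ["a", "B", " ", "1"]

not_lower = [not c.islower() for c in chars]
upper = [c.isupper() for c in chars]

print(not_lower)  # [False, True, True, True]
print(upper)      # [False, True, False, False]

# The two lists disagree exactly where a character is neither upper nor lower case.
neither = [c for c in chars if not c.islower() and not c.isupper()]
print(neither)    # [' ', '1']
```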

-- 
Greg



#108052

From: pavlovevidence@gmail.com
Date: 2016-05-03 03:00 -0700
Message-ID: <e1e5bfe4-7998-4cf7-a4f8-53cf5426c7c5@googlegroups.com>
In reply to: #107885

On Friday, April 29, 2016 at 6:55:56 PM UTC-7, Christopher Reimer wrote:
> On 4/29/2016 6:29 PM, Stephen Hansen wrote:
> > If isupper/islower were perfect opposites of each-other, there'd be no 
> > need for both. But since characters can be upper, lower, or *neither*, 
> > you run into this situation.
> 
> Based upon the official documentation, I was expecting perfect opposites.
> 
> str.islower(): "Return true if all cased characters [4] in the string 
> are lowercase and there is at least one cased character, false otherwise."
> 
> https://docs.python.org/3/library/stdtypes.html?highlight=islower#str.islower
> 
> str.isupper(): "Return true if all cased characters [4] in the string 
> are uppercase and there is at least one cased character, false otherwise."
> 
> https://docs.python.org/3/library/stdtypes.html?highlight=isupper#str.isupper


Just to take this discussion in a more purely logical direction: what you call perfect opposites (that is, the functions being negations of each other) is not what the parallel wording in the documentation actually implies, so you shouldn't have been expecting that.

What you should have been expecting is a symmetry.  Say you have a string G.  islower(G) will return a certain result.  Now take every letter in G and swap the case, and call that string g.  isupper(g) will always return the same result as islower(G).

More succinctly, for any string x, the following is always true:

islower(x) == isupper(swapcase(x))

But that is not the same as, and does not imply, the following identity (which, as we've seen, is not always true):

islower(x) == not isupper(x)


Another example of a pair of functions that behaves like this is ispositive and isnegative.  The identity "ispositive(x) == isnegative(-x)" is always true.  However, "ispositive(x) == not isnegative(x)" is false if x == 0.


However, I can understand your confusion, because there are some pairs of functions where both identities are true, and if you've seen a few of them it's fairly easy for your intuition to overgeneralize a bit.  An example I can think of offhand is iseven(x) and isodd(x), for any integer x.  The identities "iseven(x) == isodd(x^1)" and "iseven(x) == not isodd(x)" are both always true.
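A quick way to see the difference between the two identities is to test both on a handful of strings. This is a sketch with arbitrarily chosen samples (not a proof that the symmetry holds for every Unicode string); `ispositive`/`isnegative` are defined here only to mirror the numeric analogy:

```python
def swap_identity(x: str) -> bool:
    # The symmetry: islower(x) == isupper(swapcase(x))
    return x.islower() == x.swapcase().isupper()


def negation_identity(x: str) -> bool:
    # What the original poster expected: islower(x) == not isupper(x)
    return x.islower() == (not x.isupper())


samples = ["hello", "HELLO", "Hello", "hello world", " ", "123", ""]
for x in samples:
    print(repr(x), swap_identity(x), negation_identity(x))


def ispositive(n: int) -> bool:
    return n > 0


def isnegative(n: int) -> bool:
    return n < 0


# Symmetry holds for every n; negation fails at 0, which is neither sign.
print(all(ispositive(n) == isnegative(-n) for n in range(-5, 6)))  # True
print(ispositive(0) == (not isnegative(0)))                          # False
```

The strings that break the negation identity ("Hello", " ", "123", "") play the same role as 0 does for the numeric pair.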


Carl Banks



#108054

From: Chris Angelico <rosuav@gmail.com>
Date: 2016-05-03 20:25 +1000
Message-ID: <mailman.339.1462271120.32212.python-list@python.org>
In reply to: #108052

On Tue, May 3, 2016 at 8:00 PM,  <pavlovevidence@gmail.com> wrote:
>
> What you should have been expecting is a symmetry.  Say you have a string G.  islower(G) will return a certain result.  Now take every letter in G and swap the case, and call that string g.  isupper(g) will always return the same result as islower(G).
>
> More succinctly, for any string x, the following is always true:
>
> islower(x) == isupper(swapcase(x))
>
> But that is not the same as, and does not imply, the following identity (which, as we've seen, is not always true):
>
> islower(x) == not isupper(x)
>
>
> Another example of a pair of functions that behaves like this is ispositive and isnegative.  The identity "ispositive(x) == isnegative(-x)" is always true.  However, "ispositive(x) == not isnegative(x)" is false if x == 0.
>

This assumes, of course, that there is a function swapcase which can
return a string with case inverted. I'm not sure such a function
exists.

ChrisA



#108057

From: Jussi Piitulainen <jussi.piitulainen@helsinki.fi>
Date: 2016-05-03 14:25 +0300
Message-ID: <lf5r3dje63l.fsf@ling.helsinki.fi>
In reply to: #108054

Chris Angelico writes:

> This assumes, of course, that there is a function swapcase which can
> return a string with case inverted. I'm not sure such a function
> exists.

   str.swapcase("foO")
   'FOo'



#108062

From: Chris Angelico <rosuav@gmail.com>
Date: 2016-05-03 22:00 +1000
Message-ID: <mailman.341.1462276849.32212.python-list@python.org>
In reply to: #108057

On Tue, May 3, 2016 at 9:25 PM, Jussi Piitulainen
<jussi.piitulainen@helsinki.fi> wrote:
> Chris Angelico writes:
>
>> This assumes, of course, that there is a function swapcase which can
>> return a string with case inverted. I'm not sure such a function
>> exists.
>
>    str.swapcase("foO")
>    'FOo'

I suppose for this discussion it doesn't matter if it's imperfect.

>>> "\N{ANGSTROM SIGN}".swapcase().swapcase() == "\N{ANGSTROM SIGN}"
False
>>> "\N{LATIN SMALL LETTER SHARP S}".swapcase().swapcase()
'ss'

But drawing the analogy with the negation of real numbers implies
something that doesn't exist.
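Both counterexamples can be checked directly; the helper name `swapcase_twice` is just for illustration:

```python
import unicodedata


def swapcase_twice(s: str) -> str:
    # Numeric negation is its own inverse; swapcase is not always.
    return s.swapcase().swapcase()


for s in ["foO", "\N{ANGSTROM SIGN}", "\N{LATIN SMALL LETTER SHARP S}"]:
    back = swapcase_twice(s)
    print(repr(s), "round-trips:", back == s, [unicodedata.name(c) for c in back])
```

Only "foO" comes back unchanged: the Angstrom sign returns as LATIN CAPITAL LETTER A WITH RING ABOVE, and the sharp s returns as two plain 's' characters.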

ChrisA



#108065

From: DFS <nospam@dfs.com>
Date: 2016-05-03 09:01 -0400
Message-ID: <nga79f$gau$1@dont-email.me>
In reply to: #108062

On 5/3/2016 8:00 AM, Chris Angelico wrote:
> On Tue, May 3, 2016 at 9:25 PM, Jussi Piitulainen
> <jussi.piitulainen@helsinki.fi> wrote:
>> Chris Angelico writes:
>>
>>> This assumes, of course, that there is a function swapcase which can
>>> return a string with case inverted. I'm not sure such a function
>>> exists.
>>
>>    str.swapcase("foO")
>>    'FOo'
>
> I suppose for this discussion it doesn't matter if it's imperfect.


What was imperfect?




#108066

From: Chris Angelico <rosuav@gmail.com>
Date: 2016-05-03 23:13 +1000
Message-ID: <mailman.344.1462281210.32212.python-list@python.org>
In reply to: #108065

On Tue, May 3, 2016 at 11:01 PM, DFS <nospam@dfs.com> wrote:
> On 5/3/2016 8:00 AM, Chris Angelico wrote:
>>
>> On Tue, May 3, 2016 at 9:25 PM, Jussi Piitulainen
>> <jussi.piitulainen@helsinki.fi> wrote:
>>>
>>> Chris Angelico writes:
>>>
>>>> This assumes, of course, that there is a function swapcase which can
>>>> return a string with case inverted. I'm not sure such a function
>>>> exists.
>>>
>>>
>>>    str.swapcase("foO")
>>>    'FOo'
>>
>>
>> I suppose for this discussion it doesn't matter if it's imperfect.
>
>
>
> What was imperfect?

It doesn't invert, the way numeric negation does. And if you try to
define exactly what it does, you'll come right back to
isupper()/islower(), so it's not much help in defining those.
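The circularity shows up as soon as you try a per-character definition. `my_swapcase` is a hypothetical sketch, not CPython's code: it treats each character in isolation, so context-dependent rules such as the Greek final sigma are not reproduced, but it agrees with `str.swapcase()` on the samples below.

```python
def my_swapcase(s: str) -> str:
    """Hypothetical per-character version of str.swapcase()."""
    out = []
    for c in s:
        if c.isupper():
            out.append(c.lower())
        elif c.islower():
            out.append(c.upper())  # may yield several characters, e.g. 'ß' -> 'SS'
        else:
            out.append(c)          # neither upper nor lower: left alone
    return "".join(out)


for s in ["foO", "Hello, World!", "\N{LATIN SMALL LETTER SHARP S}", "\N{ANGSTROM SIGN}"]:
    print(repr(s), repr(my_swapcase(s)), my_swapcase(s) == s.swapcase())
```

The definition has nothing to stand on except `isupper()` and `islower()`, which is the point being made.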

ChrisA



#108068

From: DFS <nospam@dfs.com>
Date: 2016-05-03 09:19 -0400
Message-ID: <nga89p$ku2$1@dont-email.me>
In reply to: #108066

On 5/3/2016 9:13 AM, Chris Angelico wrote:
> On Tue, May 3, 2016 at 11:01 PM, DFS <nospam@dfs.com> wrote:
>> On 5/3/2016 8:00 AM, Chris Angelico wrote:
>>>
>>> On Tue, May 3, 2016 at 9:25 PM, Jussi Piitulainen
>>> <jussi.piitulainen@helsinki.fi> wrote:
>>>>
>>>> Chris Angelico writes:
>>>>
>>>>> This assumes, of course, that there is a function swapcase which can
>>>>> return a string with case inverted. I'm not sure such a function
>>>>> exists.
>>>>
>>>>
>>>>    str.swapcase("foO")
>>>>    'FOo'
>>>
>>>
>>> I suppose for this discussion it doesn't matter if it's imperfect.
>>
>>
>>
>> What was imperfect?
>
> It doesn't invert, the way numeric negation does.


What do you mean by 'case inverted'?

It looks like it swaps the case correctly between upper and lower.




> And if you try to
> define exactly what it does, you'll come right back to
> isupper()/islower(), so it's not much help in defining those.
>
> ChrisA






#108070

From: Chris Angelico <rosuav@gmail.com>
Date: 2016-05-03 23:23 +1000
Message-ID: <mailman.345.1462281819.32212.python-list@python.org>
In reply to: #108068

On Tue, May 3, 2016 at 11:19 PM, DFS <nospam@dfs.com> wrote:
> What do you mean by 'case inverted'?
>
> It looks like it swaps the case correctly between upper and lower.

I gave two examples in my previous post. Did you read them? You
trimmed them from the quote.

ChrisA



#108074

From: Jussi Piitulainen <jussi.piitulainen@helsinki.fi>
Date: 2016-05-03 17:49 +0300
Message-ID: <lf5eg9jdwmr.fsf@ling.helsinki.fi>
In reply to: #108068

DFS writes:

> On 5/3/2016 9:13 AM, Chris Angelico wrote:

>> It doesn't invert, the way numeric negation does.
>
> What do you mean by 'case inverted'?
>
> It looks like it swaps the case correctly between upper and lower.

There's letters that do not come in exact pairs of upper and lower case,
so _some_ swaps are not invertible: you swap twice and end up somewhere
else than your starting point.

The "\N{ANGSTROM SIGN}" looks like the Swedish upper-case
a-with-ring-above but isn't the same character, yet Python swaps its
case to the actual lower-case a-with-ring above. It can't go back to
_both_ the Angstrom sign and the actual upper case letter.

(Not sure why the sign is considered a cased letter at all.)
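The underlying reason no swap can undo this: lowercasing sends two distinct characters to the same result, so it cannot have an inverse. A minimal check:

```python
angstrom_sign = "\N{ANGSTROM SIGN}"                      # U+212B
a_ring = "\N{LATIN CAPITAL LETTER A WITH RING ABOVE}"    # U+00C5

print(angstrom_sign == a_ring)                  # False: different code points
print(angstrom_sign.lower() == a_ring.lower())  # True: both lower to U+00E5

# Uppercasing the shared result can only pick one of the two originals.
print(f"U+{ord(angstrom_sign.lower().upper()):04X}")  # U+00C5
```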



#108075

From: DFS <nospam@dfs.com>
Date: 2016-05-03 11:12 -0400
Message-ID: <ngaev1$fv0$1@dont-email.me>
In reply to: #108074

On 5/3/2016 10:49 AM, Jussi Piitulainen wrote:
> DFS writes:
>
>> On 5/3/2016 9:13 AM, Chris Angelico wrote:
>
>>> It doesn't invert, the way numeric negation does.
>>
>> What do you mean by 'case inverted'?
>>
>> It looks like it swaps the case correctly between upper and lower.
>
> There's letters that do not come in exact pairs of upper and lower case,
> so _some_ swaps are not invertible: you swap twice and end up somewhere
> else than your starting point.
>
> The "\N{ANGSTROM SIGN}" looks like the Swedish upper-case
> a-with-ring-above but isn't the same character, yet Python swaps its
> case to the actual lower-case a-with-ring above. It can't go back to
> _both_ the Angstrom sign and the actual upper case letter.
>
> (Not sure why the sign is considered a cased letter at all.)


Thanks for the explanation.

Does that mean:

lower(Å) != å ?

and

upper(å) != Å ?



#108076

From: Jussi Piitulainen <jussi.piitulainen@helsinki.fi>
Date: 2016-05-03 18:27 +0300
Message-ID: <lf560uvduwe.fsf@ling.helsinki.fi>
In reply to: #108075

DFS writes:

> On 5/3/2016 10:49 AM, Jussi Piitulainen wrote:
>> DFS writes:
>>
>>> On 5/3/2016 9:13 AM, Chris Angelico wrote:
>>
>>>> It doesn't invert, the way numeric negation does.
>>>
>>> What do you mean by 'case inverted'?
>>>
>>> It looks like it swaps the case correctly between upper and lower.
>>
>> There's letters that do not come in exact pairs of upper and lower case,
>> so _some_ swaps are not invertible: you swap twice and end up somewhere
>> else than your starting point.
>>
>> The "\N{ANGSTROM SIGN}" looks like the Swedish upper-case
>> a-with-ring-above but isn't the same character, yet Python swaps its
>> case to the actual lower-case a-with-ring above. It can't go back to
>> _both_ the Angstrom sign and the actual upper case letter.
>>
>> (Not sure why the sign is considered a cased letter at all.)
>
>
> Thanks for the explanation.
>
> Does that mean:
>
> lower(Å) != å ?
>
> and
>
> upper(å) != Å ?

It means "\N{ANGSTROM SIGN}" != "Å", yet both lower to "å", which then
uppers back to "Å" (U+00c5).

The Ångström sign (U+212b) looks like this: Å. Indistinguishable from Å
in the font that I'm seeing - for all I know, it's the same glyph.
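When two characters render identically, code points and Unicode names are the reliable way to tell them apart. A small sketch (the `describe` helper is illustrative); it also shows that Unicode defines U+212B as canonically equivalent to U+00C5, which may explain why the sign is treated as a cased letter at all:

```python
import unicodedata


def describe(s: str) -> list:
    # Code point and official Unicode name of every character in s.
    return [f"U+{ord(c):04X} {unicodedata.name(c, '?')}" for c in s]


sign = "\u212b"
letter = "\u00c5"
print(describe(sign))    # ['U+212B ANGSTROM SIGN']
print(describe(letter))  # ['U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE']

# Normalizing (NFC here) before comparing folds the sign into the letter.
print(unicodedata.normalize("NFC", sign) == letter)  # True
```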



#108077

From: Grant Edwards <grant.b.edwards@gmail.com>
Date: 2016-05-03 15:42 +0000
Message-ID: <mailman.347.1462290169.32212.python-list@python.org>
In reply to: #108076

On 2016-05-03, Jussi Piitulainen <jussi.piitulainen@helsinki.fi> wrote:

>> Does that mean:
>>
>> lower(Å) != å ?
>>
>> and
>>
>> upper(å) != Å ?
>
> It means "\N{ANGSTROM SIGN}" != "Å", yet both lower to "å", which then
> uppers back to "Å" (U+00c5).
>
> The Ångström sign (U+212b) looks like this: Å. Indistinguishable from Å
> in the font that I'm seeing - for all I know, it's the same glyph.

Interesting. FWIW, Å and Å definitely look different with the terminal
and font I'm using (urxvt with -misc-fixed-medium-r-normal-*-18-120-*-*-*-90-iso10646-*)

Expecting upper/lower operations to be 100% invertible is probably an
ASCII-centric mindset that will fall over as soon as you start
dealing with non-ASCII encodings.
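This can be checked mechanically: every one of the 128 ASCII characters survives a double swapcase unchanged, while even a tiny non-ASCII sample does not. A sketch, with `survives_double_swap` as a hypothetical helper name:

```python
def survives_double_swap(c: str) -> bool:
    return c.swapcase().swapcase() == c


ascii_ok = all(survives_double_swap(chr(i)) for i in range(128))
print(ascii_ok)  # True

# sharp s, Angstrom sign, 'fi' ligature: none of them comes back unchanged.
samples = ["\u00df", "\u212b", "\ufb01"]
print([c for c in samples if not survives_double_swap(c)])  # all three samples
```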

-- 
Grant Edwards               grant.b.edwards        Yow! Xerox your lunch
                                  at               and file it under "sex
                              gmail.com            offenders"!



#108081

From: Terry Reedy <tjreedy@udel.edu>
Date: 2016-05-03 12:37 -0400
Message-ID: <mailman.350.1462293512.32212.python-list@python.org>
In reply to: #108076

On 5/3/2016 11:42 AM, Grant Edwards wrote:

> Interesting. FWIW, Å and Å definitely look different with the terminal
> and font I'm using (urxvt with -misc-fixed-medium-r-normal-*-18-120-*-*-*-90-iso10646-*)

In the fixed pitch font used by Thunderbird (Courier?), Angstrom Å has 
the circle touching the A while letter Å has the circle spaced above.

-- 
Terry Jan Reedy



#108113

From: Steven D'Aprano <steve@pearwood.info>
Date: 2016-05-04 13:28 +1000
Message-ID: <57296c7a$0$1589$c3e8da3$5496439d@news.astraweb.com>
In reply to: #108074

On Wed, 4 May 2016 12:49 am, Jussi Piitulainen wrote:

> DFS writes:
> 
>> On 5/3/2016 9:13 AM, Chris Angelico wrote:
> 
>>> It doesn't invert, the way numeric negation does.
>>
>> What do you mean by 'case inverted'?
>>
>> It looks like it swaps the case correctly between upper and lower.
> 
> There's letters that do not come in exact pairs of upper and lower case,

Languages with two distinct lettercases, like English, are called bicameral.
The two cases are technically called majuscule and minuscule, but
colloquially known as uppercase and lowercase since movable type printers
traditionally used to keep the majuscule letters in a drawer above the
minuscule letters.

Many alphabets are unicameral, that is, they only have a single lettercase.
Examples include Hebrew, Arabic, Hangul, and many others. Georgian is an
interesting example, as it is the only known written alphabet that started
as a bicameral script and then became unicameral.

Consequently, many letters are neither upper nor lower case, and have
Unicode category "Letter other":

py> import unicodedata
py> c = u'\N{ARABIC LETTER FEH}'
py> unicodedata.category(c)
'Lo'
py> c.isalpha()
True
py> c.isupper()
False
py> c.islower()
False
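Besides upper, lower and uncased, Unicode also has titlecase letters (category Lt) such as U+01C5, for which both `isupper()` and `islower()` are False as well. A small sketch classifying a few characters; `case_of` is an illustrative helper, not a standard function:

```python
import unicodedata


def case_of(c: str) -> str:
    if c.isupper():
        return "upper"
    if c.islower():
        return "lower"
    if c.istitle():
        return "title"    # e.g. U+01C5, a digraph's titlecase form
    return "uncased"      # e.g. Hebrew, Hangul, digits


for c in ["A", "a", "\u01c5", "\u05d0", "\uac00", "7"]:
    print(f"U+{ord(c):04X} {unicodedata.category(c)} {case_of(c)}")
```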


Even among bicameral alphabets, there are a few anomalies. The three most
obvious ones are Greek sigma, German Eszett (or "sharp S") and Turkish I.

(1) The Greek sigma is usually written as Σ or σ in uppercase and lowercase
respectively, but at the end of a word, lowercase sigma is written as ς.

(This final sigma is sometimes called "stigma", but should not be confused
with the archaic Greek letter stigma, which has two cases Ϛ ϛ, at least
when it is not being written as digamma Ϝϝ -- and if you're confused, so
are the Greeks :-)

Python 3.3 correctly handles the sigma/final sigma when upper- and
lowercasing:

py> 'ΘΠΣΤΣ'.lower()
'θπστς'

py> 'ΘΠΣΤΣ'.lower().upper()
'ΘΠΣΤΣ'
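The final-sigma rule depends on context, which a few short strings make visible. This sketch uses escapes to avoid confusing Greek and Latin look-alikes; `casefold()` is included because it is the method intended for case-insensitive comparison:

```python
sigma = "\u03a3"              # GREEK CAPITAL LETTER SIGMA
word = "\u03a3\u0391\u03a3"   # capital sigma, alpha, sigma

print(sigma.lower())  # σ: a lone sigma is not treated as word-final
print(word.lower())   # σας: medial form at the start, final form at the end

# Both lowercase forms uppercase to the same capital letter...
print("\u03c3".upper() == "\u03c2".upper() == sigma)  # True
# ...and casefold() merges them, which suits case-insensitive comparison.
print("\u03c2".casefold() == "\u03c3".casefold())     # True
```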



(2) The German Eszett ß traditionally existed in only lowercase forms, but
despite the existence of an uppercase form since at least the 19th century,
when the Germans moved away from blackletter to Roman-style letters, the
uppercase form was left out. In recent years, printers in Germany have
started to reintroduce an uppercase version, and the German government have
standardized on its use for placenames, but not other words.

(Aside: in Germany, ß is not considered a distinct letter of the alphabet,
but a ligature of ss; historically it derived from a ligature of ſs, ſz or
ſʒ. The funny characters you may or may not be able to see are the long-S
and round-Z.)

Python follows common, but not universal, German practice for eszett:

py> 'ẞ'.lower()
'ß'
py> 'ß'.upper()
'SS'

Note that this is lossy: given a name like "STRASSER", it is impossible to
tell whether it should be title-cased to "Strasser" or "Straßer". It also
means that uppercasing a string can make it longer.
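Both the length change and the lost information show up in a few lines, along with `casefold()`, which maps ß to "ss" and is the usual tool for case-insensitive matching:

```python
word = "Stra\u00dfe"          # 'Straße'
upper = word.upper()

print(upper)                  # STRASSE
print(len(word), len(upper))  # 6 7: uppercasing made the string longer
print(upper.title())          # Strasse: the original ß cannot be recovered
print("\u1e9e".lower() == "\u00df")  # True: capital sharp s lowercases to ß

# casefold() also maps ß to "ss", so case-insensitive matching still works.
print(word.casefold() == "STRASSE".casefold())  # True
```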


For more on the uppercase eszett, see:

https://typography.guru/journal/germanys-new-character/
https://typography.guru/journal/how-to-draw-a-capital-sharp-s-r18/


(3) In most Latin alphabets, the lowercase i and j have a "tittle" diacritic
on them, but not the uppercase forms I and J. Turkish and a few other
languages have both I-with-tittle and I-without-tittle.

(As far as I know, there is no language with a dotless J.)

So in Turkish, the correct uppercase to lowercase and back again should go:

Dotless I: I -> ı -> I

Dotted I: İ -> i -> İ

Python does not quite manage to handle this correctly for Turkish
applications, since it loses the dotted/dotless distinction:

py> 'ı'.upper()
'I'
py> 'İ'.lower()
'i'

and further case conversions follow the non-Turkish rules.
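One further detail: in current Python 3 releases, 'İ'.lower() actually returns two code points, 'i' followed by U+0307 COMBINING DOT ABOVE, which displays much like a plain 'i'. If an application needs Turkish rules, one approach is to special-case the four letters before delegating to the built-in methods. `tr_upper` and `tr_lower` are hypothetical helpers, a sketch rather than a complete locale implementation:

```python
# Python's str methods are locale-independent: dotted capital I lowercases
# to 'i' plus U+0307 COMBINING DOT ABOVE.
print(len("\u0130".lower()))  # 2

# Hypothetical Turkish-aware helpers: map the i/ı/I/İ pairs first,
# then let str.upper()/str.lower() handle every other character.
TR_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})
TR_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})


def tr_upper(s: str) -> str:
    return s.translate(TR_UPPER).upper()


def tr_lower(s: str) -> str:
    return s.translate(TR_LOWER).lower()


print(tr_upper("diyarbak\u0131r"))  # DİYARBAKIR
print(tr_lower("D\u0130YARBAKIR"))  # diyarbakır
```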

Note that sometimes getting this wrong can have serious consequences:

http://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail



-- 
Steven



#108131

From: DFS <nospam@dfs.com>
Date: 2016-05-04 10:09 -0400
Message-ID: <ngcvjm$3ou$1@dont-email.me>
In reply to: #108113

On 5/3/2016 11:28 PM, Steven D'Aprano wrote:
> On Wed, 4 May 2016 12:49 am, Jussi Piitulainen wrote:
>
>> DFS writes:
>>
>>> On 5/3/2016 9:13 AM, Chris Angelico wrote:
>>
>>>> It doesn't invert, the way numeric negation does.
>>>
>>> What do you mean by 'case inverted'?
>>>
>>> It looks like it swaps the case correctly between upper and lower.
>>
>> There's letters that do not come in exact pairs of upper and lower case,
>
> Languages with two distinct lettercases, like English, are called bicameral.
> The two cases are technically called majuscule and minuscule, but
> colloquially known as uppercase and lowercase since movable type printers
> traditionally used to keep the majuscule letters in a drawer above the
> minuscule letters.
>
> Many alphabets are unicameral, that is, they only have a single lettercase.
> Examples include Hebrew, Arabic, Hangul, and many others. Georgian is an
> interesting example, as it is the only known written alphabet that started
> as a bicameral script and then became unicameral.
>
> Consequently, many letters are neither upper nor lower case, and have
> Unicode category "Letter other":
>
> py> c = u'\N{ARABIC LETTER FEH}'
> py> unicodedata.category(c)
> 'Lo'
> py> c.isalpha()
> True
> py> c.isupper()
> False
> py> c.islower()
> False
>
>
> Even among bicameral alphabets, there are a few anomalies. The three most
> obvious ones are Greek sigma, German Eszett (or "sharp S") and Turkish I.
>
> (1) The Greek sigma is usually written as Σ or σ in uppercase and lowercase
> respectively, but at the end of a word, lowercase sigma is written as ς.
>
> (This final sigma is sometimes called "stigma", but should not be confused
> with the archaic Greek letter stigma, which has two cases Ϛ ϛ, at least
> when it is not being written as digamma Ϝϝ -- and if you're confused, so
> are the Greeks :-)
>
> Python 3.3 correctly handles the sigma/final sigma when upper- and
> lowercasing:
>
> py> 'ΘΠΣΤΣ'.lower()
> 'θπστς'
>
> py> 'ΘΠΣΤΣ'.lower().upper()
> 'ΘΠΣΤΣ'
>
>
>
> (2) The German Eszett ß traditionally existed in only lowercase forms, but
> despite the existence of an uppercase form since at least the 19th century,
> when the Germans moved away from blackletter to Roman-style letters, the
> uppercase form was left out. In recent years, printers in Germany have
> started to reintroduce an uppercase version, and the German government have
> standardized on its use for placenames, but not other words.
>
> (Aside: in Germany, ß is not considered a distinct letter of the alphabet,
> but a ligature of ss; historically it derived from a ligature of ſs, ſz or
> ſʒ. The funny characters you may or may not be able to see are the long-S
> and round-Z.)
>
> Python follows common, but not universal, German practice for eszett:
>
> py> 'ẞ'.lower()
> 'ß'
> py> 'ß'.upper()
> 'SS'
>
> Note that this is lossy: given a name like "STRASSER", it is impossible to
> tell whether it should be title-cased to "Strasser" or "Straßer". It also
> means that uppercasing a string can make it longer.
>
>
> For more on the uppercase eszett, see:
>
> https://typography.guru/journal/germanys-new-character/
> https://typography.guru/journal/how-to-draw-a-capital-sharp-s-r18/
>
>
> (3) In most Latin alphabets, the lowercase i and j have a "tittle" diacritic
> on them, but not the uppercase forms I and J. Turkish and a few other
> languages have both I-with-tittle and I-without-tittle.
>
> (As far as I know, there is no language with a dotless J.)
>
> So in Turkish, the correct uppercase to lowercase and back again should go:
>
> Dotless I: I -> ı -> I
>
> Dotted I: İ -> i -> İ
>
> Python does not quite manage to handle this correctly for Turkish
> applications, since it loses the dotted/dotless distinction:
>
> py> 'ı'.upper()
> 'I'
> py> 'İ'.lower()
> 'i'
>
> and further case conversions follow the non-Turkish rules.
>
> Note that sometimes getting this wrong can have serious consequences:
>
> http://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail


Linguist much?



#108132

From: Chris Angelico <rosuav@gmail.com>
Date: 2016-05-05 00:37 +1000
Message-ID: <mailman.384.1462372639.32212.python-list@python.org>
In reply to: #108131

On Thu, May 5, 2016 at 12:09 AM, DFS <nospam@dfs.com> wrote:
> On 5/3/2016 11:28 PM, Steven D'Aprano wrote:
>> [ lengthy piece about text, Unicode, and letter case ]
>
> Linguist much?

If you're an English-only speaker who writes code that needs to be used
around the world, you end up accruing tidbits of language and text
trivia in the form of edge cases that you need to remember to test.
Among them:

* Turkish dotless and dotted i
* Greek medial and final sigma
* German eszett
* Hebrew and Arabic right-to-left text
* Chinese non-BMP characters
* Combining characters (eg diacriticals starting U+0300)
* Non-characters eg U+FFFE
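Several of these edge cases translate directly into quick assertions for a test suite. A sketch, assuming Python 3 and the standard `unicodedata` module; the specific characters are just examples:

```python
import unicodedata

# Right-to-left text: bidirectional class 'R' (Hebrew) and 'AL' (Arabic letter).
print(unicodedata.bidirectional("\u05d0"), unicodedata.bidirectional("\u0641"))  # R AL

# Non-BMP characters: one code point in Python 3, but two UTF-16 code units.
cjk = "\U00020000"
print(len(cjk), len(cjk.encode("utf-16-le")) // 2)  # 1 2

# Combining characters: 'e' + U+0301 is two code points until NFC composes it.
e_acute = "e\u0301"
print(len(e_acute), len(unicodedata.normalize("NFC", e_acute)))  # 2 1

# Non-characters such as U+FFFE are never assigned, so their category is 'Cn'.
print(unicodedata.category("\ufffe"))  # Cn
```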

And then a post like Steven's basically comes from pulling up all
those from your memory, and maybe doing a spot of quick testing and/or
research to get some explanatory details. You don't have to be a
linguist, necessarily - just a competent debugger.

ChrisA



#108141

From: Steven D'Aprano <steve@pearwood.info>
Date: 2016-05-05 01:37 +1000
Message-ID: <572a1752$0$1614$c3e8da3$5496439d@news.astraweb.com>
In reply to: #108131

On Thu, 5 May 2016 12:09 am, DFS wrote:

> On 5/3/2016 11:28 PM, Steven D'Aprano wrote:

>> Languages with two distinct lettercases, like English, are called
>> bicameral.
[...] 

> Linguist much?


Possibly even a cunning one.




Somebody-had-to-say-it-ly y'rs,


-- 
Steven



#108151

From: DFS <nospam@dfs.com>
Date: 2016-05-04 17:05 -0400
Message-ID: <ngdnvh$416$2@dont-email.me>
In reply to: #108141

On 5/4/2016 11:37 AM, Steven D'Aprano wrote:
> On Thu, 5 May 2016 12:09 am, DFS wrote:
>
>> On 5/3/2016 11:28 PM, Steven D'Aprano wrote:
>
>>> Languages with two distinct lettercases, like English, are called
>>> bicameral.
> [...]
>
>> Linguist much?
>
>
> Possibly even a cunning one.


I see you as more of a Colonel Angus.




