| Started by | Christopher Reimer <christopher_reimer@icloud.com> |
|---|---|
| First post | 2016-04-29 18:55 -0700 |
| Last post | 2016-05-04 19:06 +1000 |
| Articles | 20 on this page of 24 — 10 participants |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Not x.islower() has different output than x.isupper() in list output... Christopher Reimer <christopher_reimer@icloud.com> - 2016-04-29 18:55 -0700
Re: Not x.islower() has different output than x.isupper() in list output... Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2016-04-30 17:47 +1200
Re: Not x.islower() has different output than x.isupper() in list output... pavlovevidence@gmail.com - 2016-05-03 03:00 -0700
Re: Not x.islower() has different output than x.isupper() in list output... Chris Angelico <rosuav@gmail.com> - 2016-05-03 20:25 +1000
Re: Not x.islower() has different output than x.isupper() in list output... Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-05-03 14:25 +0300
Re: Not x.islower() has different output than x.isupper() in list output... Chris Angelico <rosuav@gmail.com> - 2016-05-03 22:00 +1000
Re: Not x.islower() has different output than x.isupper() in list output... DFS <nospam@dfs.com> - 2016-05-03 09:01 -0400
Re: Not x.islower() has different output than x.isupper() in list output... Chris Angelico <rosuav@gmail.com> - 2016-05-03 23:13 +1000
Re: Not x.islower() has different output than x.isupper() in list output... DFS <nospam@dfs.com> - 2016-05-03 09:19 -0400
Re: Not x.islower() has different output than x.isupper() in list output... Chris Angelico <rosuav@gmail.com> - 2016-05-03 23:23 +1000
Re: Not x.islower() has different output than x.isupper() in list output... Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-05-03 17:49 +0300
Re: Not x.islower() has different output than x.isupper() in list output... DFS <nospam@dfs.com> - 2016-05-03 11:12 -0400
Re: Not x.islower() has different output than x.isupper() in list output... Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-05-03 18:27 +0300
Re: Not x.islower() has different output than x.isupper() in list output... Grant Edwards <grant.b.edwards@gmail.com> - 2016-05-03 15:42 +0000
Re: Not x.islower() has different output than x.isupper() in list output... Terry Reedy <tjreedy@udel.edu> - 2016-05-03 12:37 -0400
Re: Not x.islower() has different output than x.isupper() in list output... Steven D'Aprano <steve@pearwood.info> - 2016-05-04 13:28 +1000
Re: Not x.islower() has different output than x.isupper() in list output... DFS <nospam@dfs.com> - 2016-05-04 10:09 -0400
Re: Not x.islower() has different output than x.isupper() in list output... Chris Angelico <rosuav@gmail.com> - 2016-05-05 00:37 +1000
Re: Not x.islower() has different output than x.isupper() in list output... Steven D'Aprano <steve@pearwood.info> - 2016-05-05 01:37 +1000
Re: Not x.islower() has different output than x.isupper() in list output... DFS <nospam@dfs.com> - 2016-05-04 17:05 -0400
Re: Not x.islower() has different output than x.isupper() in list output... Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-05-03 17:42 +0300
Re: Not x.islower() has different output than x.isupper() in list output... Steven D'Aprano <steve@pearwood.info> - 2016-05-04 11:30 +1000
Re: Not x.islower() has different output than x.isupper() in list output... Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2016-05-04 20:34 +1200
Re: Not x.islower() has different output than x.isupper() in list output... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-05-04 19:06 +1000
| From | Christopher Reimer <christopher_reimer@icloud.com> |
|---|---|
| Date | 2016-04-29 18:55 -0700 |
| Subject | Re: Not x.islower() has different output than x.isupper() in list output... |
| Message-ID | <mailman.242.1461981344.32212.python-list@python.org> |
On 4/29/2016 6:29 PM, Stephen Hansen wrote:
> If isupper/islower were perfect opposites of each-other, there'd be no
> need for both. But since characters can be upper, lower, or *neither*,
> you run into this situation.
Based upon the official documentation, I was expecting perfect opposites.
str.islower(): "Return true if all cased characters [4] in the string are lowercase and there is at least one cased character, false otherwise."
https://docs.python.org/3/library/stdtypes.html?highlight=islower#str.islower
str.isupper(): "Return true if all cased characters [4] in the string are uppercase and there is at least one cased character, false otherwise."
https://docs.python.org/3/library/stdtypes.html?highlight=isupper#str.isupper
Here's the footnote that may or may not be relevant to this discussion:
"[4] Cased characters are those with general category property being one of “Lu” (Letter, uppercase), “Ll” (Letter, lowercase), or “Lt” (Letter, titlecase)."
A bug in the docs?
Thank you,
Chris R.
| From | Gregory Ewing <greg.ewing@canterbury.ac.nz> |
|---|---|
| Date | 2016-04-30 17:47 +1200 |
| Subject | Re: Not x.islower() has different output than x.isupper() in list output... |
| Message-ID | <doiv7uFegcvU1@mid.individual.net> |
| In reply to | #107885 |
Christopher Reimer wrote:
> str.islower(): "Return true if all cased characters [4] in the string
> are lowercase and there is at least one cased character, false otherwise."
>
> str.isupper(): "Return true if all cased characters [4] in the string
> are uppercase and there is at least one cased character, false otherwise."
A string consisting of a single space doesn't contain any
cased characters, so both islower(" ") and isupper(" ")
return false according to these rules. The docs are correct.
--
Greg
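The rule Greg describes can be checked directly in the interpreter. The following is only an illustrative sketch: `classify` is a hypothetical helper name that does not appear in the thread or the standard library, and it relies on nothing beyond the built-in `str.islower()` and `str.isupper()`.

```python
def classify(s):
    """Label a string by what islower()/isupper() report (hypothetical helper)."""
    if s.islower():
        return "lower"
    if s.isupper():
        return "upper"
    # Either there are no cased characters at all, or both cases are mixed.
    return "neither"


for sample in ["abc", "ABC", " ", "123", "aBc"]:
    # Columns: the string, its label, `not islower()`, and `isupper()`.
    print(repr(sample), classify(sample), not sample.islower(), sample.isupper())
```

For the single space, `not islower()` and `isupper()` disagree, which is the situation the original question ran into.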
| From | pavlovevidence@gmail.com |
|---|---|
| Date | 2016-05-03 03:00 -0700 |
| Message-ID | <e1e5bfe4-7998-4cf7-a4f8-53cf5426c7c5@googlegroups.com> |
| In reply to | #107885 |
On Friday, April 29, 2016 at 6:55:56 PM UTC-7, Christopher Reimer wrote:
> On 4/29/2016 6:29 PM, Stephen Hansen wrote:
> > If isupper/islower were perfect opposites of each-other, there'd be no
> > need for both. But since characters can be upper, lower, or *neither*,
> > you run into this situation.
>
> Based upon the official documentation, I was expecting perfect opposites.
>
> str.islower(): "Return true if all cased characters [4] in the string
> are lowercase and there is at least one cased character, false otherwise."
>
> https://docs.python.org/3/library/stdtypes.html?highlight=islower#str.islower
>
> str.isupper(): "Return true if all cased characters [4] in the string
> are uppercase and there is at least one cased character, false otherwise."
>
> https://docs.python.org/3/library/stdtypes.html?highlight=isupper#str.isupper
Just to take this discussion in a more pure logic direction.
What you call perfect opposites (that is, the functions being negations of each other) is not what the similar wording in the documentation actually implies: you shouldn't have been expecting that.
What you should have been expecting is a symmetry. Say you have a string G. islower(G) will return a certain result. Now take every letter in G and swap the case, and call that string g. isupper(g) will always return the same result as islower(G).
More succinctly, for any string x, the following is always true:
islower(x) == isupper(swapcase(x))
But that is not the same thing as, and does not imply, the following identity (which it turns out is not always true, as we've seen):
islower(x) == not isupper(x)
Another example of functions that behave like this is the pair ispositive and isnegative. The identity "ispositive(x) == isnegative(-x)" is always true. However, "ispositive(x) == not isnegative(x)" is false if x == 0.
However, I can understand your confusion, because there are some pairs of functions where both identities are true, and if you've seen a few of them it's fairly easy for your intuition to overgeneralize a bit. An example I can think of offhand is iseven(x) and isodd(x), for any integer x. The identities "iseven(x) == isodd(x^1)" and "iseven(x) == not isodd(x)" are both always true.
Carl Banks
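The two identities can be compared on a handful of strings. This is a minimal sketch, assuming the free functions in the post map onto the `str` methods `islower()`, `isupper()` and `swapcase()`; it only checks a few ASCII samples chosen for illustration, so it says nothing about every possible string.

```python
samples = ["abc", "ABC", "aBc", " ", "123", "hello world"]

# Symmetry: swapping the case turns an islower() question into an isupper() one.
symmetry_holds = all(x.islower() == x.swapcase().isupper() for x in samples)

# Negation: breaks for any string that is neither all-lower nor all-upper.
negation_fails_for = [x for x in samples if x.islower() != (not x.isupper())]

print(symmetry_holds)
print(negation_fails_for)
```

The strings collected in `negation_fails_for` are the mixed-case and caseless ones, which matches the "upper, lower, or neither" point made earlier in the thread.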
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2016-05-03 20:25 +1000 |
| Message-ID | <mailman.339.1462271120.32212.python-list@python.org> |
| In reply to | #108052 |
On Tue, May 3, 2016 at 8:00 PM, <pavlovevidence@gmail.com> wrote:
> What you should have been expecting is a symmetry. Say you have a string G. islower(G) will return a certain result. Now take every letter in G and swap the case, and call that string g. isupper(g) will always return the same result as islower(G).
>
> More succinctly, for any string x, the following is always true:
>
> islower(x) == isupper(swapcase(x))
>
> But that is not the same thing as, and does not imply, the following identity (which it turns out is not always true, as we've seen):
>
> islower(x) == not isupper(x)
>
>
> Another example of functions that behave like this is the pair ispositive and isnegative. The identity "ispositive(x) == isnegative(-x)" is always true. However, "ispositive(x) == not isnegative(x)" is false if x == 0.
This assumes, of course, that there is a function swapcase which can return a string with case inverted. I'm not sure such a function exists.
ChrisA
| From | Jussi Piitulainen <jussi.piitulainen@helsinki.fi> |
|---|---|
| Date | 2016-05-03 14:25 +0300 |
| Message-ID | <lf5r3dje63l.fsf@ling.helsinki.fi> |
| In reply to | #108054 |
Chris Angelico writes:
> This assumes, of course, that there is a function swapcase which can
> return a string with case inverted. I'm not sure such a function
> exists.
str.swapcase("foO")
'FOo'
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2016-05-03 22:00 +1000 |
| Message-ID | <mailman.341.1462276849.32212.python-list@python.org> |
| In reply to | #108057 |
On Tue, May 3, 2016 at 9:25 PM, Jussi Piitulainen
<jussi.piitulainen@helsinki.fi> wrote:
> Chris Angelico writes:
>
>> This assumes, of course, that there is a function swapcase which can
>> return a string with case inverted. I'm not sure such a function
>> exists.
>
> str.swapcase("foO")
> 'FOo'
I suppose for this discussion it doesn't matter if it's imperfect.
>>> "\N{ANGSTROM SIGN}".swapcase().swapcase() == "\N{ANGSTROM SIGN}"
False
>>> "\N{LATIN SMALL LETTER SHARP S}".swapcase().swapcase()
'ss'
But drawing the analogy with the negation of real numbers implies
something that doesn't exist.
ChrisA
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2016-05-03 09:01 -0400 |
| Message-ID | <nga79f$gau$1@dont-email.me> |
| In reply to | #108062 |
On 5/3/2016 8:00 AM, Chris Angelico wrote:
> On Tue, May 3, 2016 at 9:25 PM, Jussi Piitulainen
> <jussi.piitulainen@helsinki.fi> wrote:
>> Chris Angelico writes:
>>
>>> This assumes, of course, that there is a function swapcase which can
>>> return a string with case inverted. I'm not sure such a function
>>> exists.
>>
>> str.swapcase("foO")
>> 'FOo'
>
> I suppose for this discussion it doesn't matter if it's imperfect.
What was imperfect?
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2016-05-03 23:13 +1000 |
| Message-ID | <mailman.344.1462281210.32212.python-list@python.org> |
| In reply to | #108065 |
On Tue, May 3, 2016 at 11:01 PM, DFS <nospam@dfs.com> wrote:
> On 5/3/2016 8:00 AM, Chris Angelico wrote:
>>
>> On Tue, May 3, 2016 at 9:25 PM, Jussi Piitulainen
>> <jussi.piitulainen@helsinki.fi> wrote:
>>>
>>> Chris Angelico writes:
>>>
>>>> This assumes, of course, that there is a function swapcase which can
>>>> return a string with case inverted. I'm not sure such a function
>>>> exists.
>>>
>>>
>>> str.swapcase("foO")
>>> 'FOo'
>>
>>
>> I suppose for this discussion it doesn't matter if it's imperfect.
>
>
>
> What was imperfect?
It doesn't invert, the way numeric negation does. And if you try to
define exactly what it does, you'll come right back to
isupper()/islower(), so it's not much help in defining those.
ChrisA
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2016-05-03 09:19 -0400 |
| Message-ID | <nga89p$ku2$1@dont-email.me> |
| In reply to | #108066 |
On 5/3/2016 9:13 AM, Chris Angelico wrote:
> On Tue, May 3, 2016 at 11:01 PM, DFS <nospam@dfs.com> wrote:
>> On 5/3/2016 8:00 AM, Chris Angelico wrote:
>>>
>>> On Tue, May 3, 2016 at 9:25 PM, Jussi Piitulainen
>>> <jussi.piitulainen@helsinki.fi> wrote:
>>>>
>>>> Chris Angelico writes:
>>>>
>>>>> This assumes, of course, that there is a function swapcase which can
>>>>> return a string with case inverted. I'm not sure such a function
>>>>> exists.
>>>>
>>>>
>>>> str.swapcase("foO")
>>>> 'FOo'
>>>
>>>
>>> I suppose for this discussion it doesn't matter if it's imperfect.
>>
>>
>>
>> What was imperfect?
>
> It doesn't invert, the way numeric negation does.
What do you mean by 'case inverted'?
It looks like it swaps the case correctly between upper and lower.
> And if you try to
> define exactly what it does, you'll come right back to
> isupper()/islower(), so it's not much help in defining those.
>
> ChrisA
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2016-05-03 23:23 +1000 |
| Message-ID | <mailman.345.1462281819.32212.python-list@python.org> |
| In reply to | #108068 |
On Tue, May 3, 2016 at 11:19 PM, DFS <nospam@dfs.com> wrote:
> What do you mean by 'case inverted'?
>
> It looks like it swaps the case correctly between upper and lower.
I gave two examples in my previous post. Did you read them? You trimmed them from the quote.
ChrisA
| From | Jussi Piitulainen <jussi.piitulainen@helsinki.fi> |
|---|---|
| Date | 2016-05-03 17:49 +0300 |
| Message-ID | <lf5eg9jdwmr.fsf@ling.helsinki.fi> |
| In reply to | #108068 |
DFS writes:
> On 5/3/2016 9:13 AM, Chris Angelico wrote:
>> It doesn't invert, the way numeric negation does.
>
> What do you mean by 'case inverted'?
>
> It looks like it swaps the case correctly between upper and lower.
There's letters that do not come in exact pairs of upper and lower case,
so _some_ swaps are not invertible: you swap twice and end up somewhere
else than your starting point.
The "\N{ANGSTROM SIGN}" looks like the Swedish upper-case
a-with-ring-above but isn't the same character, yet Python swaps its
case to the actual lower-case a-with-ring above. It can't go back to
_both_ the Angstrom sign and the actual upper case letter.
(Not sure why the sign is considered a cased letter at all.)
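The round trip described here can be reproduced with the standard `unicodedata` module. This is a small sketch using `\N{...}` escapes so that the two look-alike characters cannot be confused in the source; the variable names are illustrative only. On the question of why the sign counts as cased at all: the Unicode character database gives it general category "Lu", the same as an ordinary uppercase letter.

```python
import unicodedata

angstrom = "\N{ANGSTROM SIGN}"                               # U+212B
a_ring_upper = "\N{LATIN CAPITAL LETTER A WITH RING ABOVE}"  # U+00C5
a_ring_lower = "\N{LATIN SMALL LETTER A WITH RING ABOVE}"    # U+00E5

# The sign is classified as an uppercase letter.
print(unicodedata.category(angstrom))

# One swap lands on the ordinary lowercase letter...
print(angstrom.swapcase() == a_ring_lower)

# ...and the second swap comes back to the ordinary uppercase letter,
# not to the sign we started from.
print(angstrom.swapcase().swapcase() == a_ring_upper)
print(angstrom.swapcase().swapcase() == angstrom)
```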
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2016-05-03 11:12 -0400 |
| Message-ID | <ngaev1$fv0$1@dont-email.me> |
| In reply to | #108074 |
On 5/3/2016 10:49 AM, Jussi Piitulainen wrote:
> DFS writes:
>
>> On 5/3/2016 9:13 AM, Chris Angelico wrote:
>
>>> It doesn't invert, the way numeric negation does.
>>
>> What do you mean by 'case inverted'?
>>
>> It looks like it swaps the case correctly between upper and lower.
>
> There's letters that do not come in exact pairs of upper and lower case,
> so _some_ swaps are not invertible: you swap twice and end up somewhere
> else than your starting point.
>
> The "\N{ANGSTROM SIGN}" looks like the Swedish upper-case
> a-with-ring-above but isn't the same character, yet Python swaps its
> case to the actual lower-case a-with-ring above. It can't go back to
> _both_ the Angstrom sign and the actual upper case letter.
>
> (Not sure why the sign is considered a cased letter at all.)
Thanks for the explanation.
Does that mean:
lower(Å) != å ?
and
upper(å) != Å ?
| From | Jussi Piitulainen <jussi.piitulainen@helsinki.fi> |
|---|---|
| Date | 2016-05-03 18:27 +0300 |
| Message-ID | <lf560uvduwe.fsf@ling.helsinki.fi> |
| In reply to | #108075 |
DFS writes:
> On 5/3/2016 10:49 AM, Jussi Piitulainen wrote:
>> DFS writes:
>>
>>> On 5/3/2016 9:13 AM, Chris Angelico wrote:
>>
>>>> It doesn't invert, the way numeric negation does.
>>>
>>> What do you mean by 'case inverted'?
>>>
>>> It looks like it swaps the case correctly between upper and lower.
>>
>> There's letters that do not come in exact pairs of upper and lower case,
>> so _some_ swaps are not invertible: you swap twice and end up somewhere
>> else than your starting point.
>>
>> The "\N{ANGSTROM SIGN}" looks like the Swedish upper-case
>> a-with-ring-above but isn't the same character, yet Python swaps its
>> case to the actual lower-case a-with-ring above. It can't go back to
>> _both_ the Angstrom sign and the actual upper case letter.
>>
>> (Not sure why the sign is considered a cased letter at all.)
>
>
> Thanks for the explanation.
>
> Does that mean:
>
> lower(Å) != å ?
>
> and
>
> upper(å) != Å ?
It means "\N{ANGSTROM SIGN}" != "Å", yet both lower to "å", which then
uppers back to "Å" (U+00c5).
The Ångström sign (U+212b) looks like this: Å. Indistinguishable from Å
in the font that I'm seeing - for all I know, it's the same glyph.
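When two characters are indistinguishable on screen, their code points and Unicode names still tell them apart. The sketch below uses only `ord()` and the standard `unicodedata` module; `describe` is a hypothetical helper name introduced for illustration.

```python
import unicodedata


def describe(ch):
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


sign = "\N{ANGSTROM SIGN}"
letter = "\N{LATIN CAPITAL LETTER A WITH RING ABOVE}"

print(describe(sign))
print(describe(letter))

# Different characters, same lowercase form.
print(sign == letter)
print(sign.lower() == letter.lower())

# NFC normalization folds the sign into the ordinary letter.
print(unicodedata.normalize("NFC", sign) == letter)
```

Normalizing input before comparing is one way to treat the two as the same letter, at the cost of no longer being able to tell which one was typed.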
| From | Grant Edwards <grant.b.edwards@gmail.com> |
|---|---|
| Date | 2016-05-03 15:42 +0000 |
| Message-ID | <mailman.347.1462290169.32212.python-list@python.org> |
| In reply to | #108076 |
On 2016-05-03, Jussi Piitulainen <jussi.piitulainen@helsinki.fi> wrote:
>> Does that mean:
>>
>> lower(Å) != å ?
>>
>> and
>>
>> upper(å) != Å ?
>
> It means "\N{ANGSTROM SIGN}" != "Å", yet both lower to "å", which then
> uppers back to "Å" (U+00c5).
>
> The Ångström sign (U+212b) looks like this: Å. Indistinguishable from Å
> in the font that I'm seeing - for all I know, it's the same glyph.
Interesting. FWIW, Å and Å definitely look different with the terminal
and font I'm using (urxvt with -misc-fixed-medium-r-normal-*-18-120-*-*-*-90-iso10646-*)
Expecting upper/lower operations to be 100% invertible is probably an
ASCII-centric mindset that will fall over as soon as you start
dealing with non-ASCII encodings.
--
Grant Edwards               grant.b.edwards        Yow! Xerox your lunch
                                  at               and file it under "sex
                              gmail.com            offenders"!
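How far the invertibility assumption breaks down can be explored by scanning every code point. This is a rough sketch, not a definitive survey: `uppercase_without_round_trip` is a hypothetical function name, and the number of characters it finds depends on the Unicode version bundled with the Python interpreter in use, so no count is stated here.

```python
import sys


def uppercase_without_round_trip(limit=sys.maxunicode + 1):
    """Uppercase characters c for which c.lower().upper() is not c again."""
    return [
        ch
        for ch in map(chr, range(limit))
        if ch.isupper() and ch.lower().upper() != ch
    ]


odd_ones = uppercase_without_round_trip()
print(len(odd_ones))
print(
    "\N{ANGSTROM SIGN}" in odd_ones,
    "\N{KELVIN SIGN}" in odd_ones,
    "A" in odd_ones,
)
```

Restricting `limit` to 128 should give an empty list, which is consistent with the point that the round trip only looks safe from an ASCII point of view.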
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2016-05-03 12:37 -0400 |
| Message-ID | <mailman.350.1462293512.32212.python-list@python.org> |
| In reply to | #108076 |
On 5/3/2016 11:42 AM, Grant Edwards wrote:
> Interesting. FWIW, Å and Å definitely look different with the terminal
> and font I'm using (urxvt with -misc-fixed-medium-r-normal-*-18-120-*-*-*-90-iso10646-*)
In the fixed pitch font used by Thunderbird (Courier?), Angstrom Å has the circle touching the A while letter Å has the circle spaced above.
--
Terry Jan Reedy
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2016-05-04 13:28 +1000 |
| Message-ID | <57296c7a$0$1589$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #108074 |
On Wed, 4 May 2016 12:49 am, Jussi Piitulainen wrote:
> DFS writes:
>
>> On 5/3/2016 9:13 AM, Chris Angelico wrote:
>
>>> It doesn't invert, the way numeric negation does.
>>
>> What do you mean by 'case inverted'?
>>
>> It looks like it swaps the case correctly between upper and lower.
>
> There's letters that do not come in exact pairs of upper and lower case,
Languages with two distinct lettercases, like English, are called bicameral.
The two cases are technically called majuscule and minuscule, but
colloquially known as uppercase and lowercase since movable type printers
traditionally used to keep the majuscule letters in a drawer above the
minuscule letters.
Many alphabets are unicameral, that is, they only have a single lettercase.
Examples include Hebrew, Arabic, Hangul, and many others. Georgian is an
interesting example, as it is the only known written alphabet that started
as a bicameral script and then became unicameral.
Consequently, many letters are neither upper nor lower case, and have
Unicode category "Letter other":
py> import unicodedata
py> c = u'\N{ARABIC LETTER FEH}'
py> unicodedata.category(c)
'Lo'
py> c.isalpha()
True
py> c.isupper()
False
py> c.islower()
False
Even among bicameral alphabets, there are a few anomalies. The three most
obvious ones are Greek sigma, German Eszett (or "sharp S") and Turkish I.
(1) The Greek sigma is usually written as Σ or σ in uppercase and lowercase
respectively, but at the end of a word, lowercase sigma is written as ς.
(This final sigma is sometimes called "stigma", but should not be confused
with the archaic Greek letter stigma, which has two cases Ϛ ϛ, at least
when it is not being written as digamma Ϝϝ -- and if you're confused, so
are the Greeks :-)
Python 3.3 correctly handles the sigma/final sigma when upper- and
lowercasing:
py> 'ΘΠΣΤΣ'.lower()
'θπστς'
py> 'ΘΠΣΤΣ'.lower().upper()
'ΘΠΣΤΣ'
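The round trip above starts from uppercase. Going the other way is where information gets lost, because both lowercase sigmas share one uppercase form. A minimal sketch, assuming a Python 3 version with the final-sigma handling described here; the variable names are illustrative, and `\N{...}` escapes are used so the two sigmas cannot be mixed up in the source.

```python
SIGMA = "\N{GREEK SMALL LETTER SIGMA}"
FINAL_SIGMA = "\N{GREEK SMALL LETTER FINAL SIGMA}"
ALPHA = "\N{GREEK SMALL LETTER ALPHA}"

word_medial = SIGMA + ALPHA + SIGMA        # ends in medial sigma (unusual spelling)
word_final = SIGMA + ALPHA + FINAL_SIGMA   # ends in final sigma

# Both spellings collapse to the same uppercase string...
print(word_medial.upper() == word_final.upper())

# ...so lowercasing again always yields the final-sigma spelling.
print(word_medial.upper().lower() == word_final)

# A lone capital sigma has no preceding letter, so it lowers to the medial form.
print("\N{GREEK CAPITAL LETTER SIGMA}".lower() == SIGMA)
```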
(2) The German Eszett ß traditionally existed only in lowercase. An
uppercase form has existed since at least the 19th century, but when the
Germans moved away from blackletter to Roman-style letters, the uppercase
form was left out. In recent years, printers in Germany have started to
reintroduce an uppercase version, and the German government have
standardized on its use for placenames, but not other words.
(Aside: in Germany, ß is not considered a distinct letter of the alphabet,
but a ligature of ss; historically it derived from a ligature of ſs, ſz or
ſʒ. The funny characters you may or may not be able to see are the long-S
and round-Z.)
Python follows common, but not universal, German practice for eszett:
py> 'ẞ'.lower()
'ß'
py> 'ß'.upper()
'SS'
Note that this is lossy: given a name like "STRASSER", it is impossible to
tell whether it should be title-cased to "Strasser" or "Straßer". It also
means that uppercasing a string can make it longer.
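Both effects, the change in length and the loss of information, show up on a single word. This is a short sketch using only built-in `str` methods; the sample word is chosen for illustration. For caseless comparison, `str.casefold()` is the method intended for this purpose, and it folds ß to "ss".

```python
street = "Stra\N{LATIN SMALL LETTER SHARP S}e"

# Uppercasing replaces one character with two.
print(street.upper())
print(len(street), len(street.upper()))

# Lowercasing the result does not bring the eszett back.
print(street.upper().lower() == street.lower())

# casefold() makes the two spellings compare equal.
print(street.casefold() == "STRASSE".casefold())
```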
For more on the uppercase eszett, see:
https://typography.guru/journal/germanys-new-character/
https://typography.guru/journal/how-to-draw-a-capital-sharp-s-r18/
(3) In most Latin alphabets, the lowercase i and j have a "tittle" diacritic
on them, but not the uppercase forms I and J. Turkish and a few other
languages have both I-with-tittle and I-without-tittle.
(As far as I know, there is no language with a dotless J.)
So in Turkish, the correct uppercase to lowercase and back again should go:
Dotless I: I -> ı -> I
Dotted I: İ -> i -> İ
Python does not quite manage to handle this correctly for Turkish
applications, since it loses the dotted/dotless distinction:
py> 'ı'.upper()
'I'
py> 'İ'.lower()
'i'
and further case conversions follow the non-Turkish rules.
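One way to get Turkish-style behaviour is to map the four I characters explicitly before handing the rest to the built-in methods. The helpers below are a hedged sketch: `turkish_upper` and `turkish_lower` are hypothetical names, not part of Python, and they cover only the dotted/dotless distinction discussed here. A real application would more likely rely on a locale-aware library. Note also that in recent Python 3 versions the lowercase of the dotted capital appears to be `i` followed by U+0307 COMBINING DOT ABOVE rather than a bare `i`, which is one more reason to handle that character explicitly.

```python
DOTTED_UPPER = "\N{LATIN CAPITAL LETTER I WITH DOT ABOVE}"   # U+0130
DOTLESS_LOWER = "\N{LATIN SMALL LETTER DOTLESS I}"           # U+0131


def turkish_upper(s):
    """Uppercase with Turkish rules for i (hypothetical helper)."""
    # Handle the two lowercase i's first, then let str.upper() do the rest.
    return s.replace("i", DOTTED_UPPER).replace(DOTLESS_LOWER, "I").upper()


def turkish_lower(s):
    """Lowercase with Turkish rules for I (hypothetical helper)."""
    # Handle the two uppercase I's first, then let str.lower() do the rest.
    return s.replace(DOTTED_UPPER, "i").replace("I", DOTLESS_LOWER).lower()


city = "istanbul"
print(turkish_upper(city))
print(turkish_lower(turkish_upper(city)) == city)
```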
Note that sometimes getting this wrong can have serious consequences:
http://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail
--
Steven
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2016-05-04 10:09 -0400 |
| Message-ID | <ngcvjm$3ou$1@dont-email.me> |
| In reply to | #108113 |
On 5/3/2016 11:28 PM, Steven D'Aprano wrote:
> On Wed, 4 May 2016 12:49 am, Jussi Piitulainen wrote:
>
>> DFS writes:
>>
>>> On 5/3/2016 9:13 AM, Chris Angelico wrote:
>>
>>>> It doesn't invert, the way numeric negation does.
>>>
>>> What do you mean by 'case inverted'?
>>>
>>> It looks like it swaps the case correctly between upper and lower.
>>
>> There's letters that do not come in exact pairs of upper and lower case,
>
> Languages with two distinct lettercases, like English, are called bicameral.
> The two cases are technically called majuscule and minuscule, but
> colloquially known as uppercase and lowercase since movable type printers
> traditionally used to keep the majuscule letters in a drawer above the
> minuscule letters.
>
> Many alphabets are unicameral, that is, they only have a single lettercase.
> Examples include Hebrew, Arabic, Hangul, and many others. Georgian is an
> interesting example, as it is the only known written alphabet that started
> as a bicameral script and then became unicameral.
>
> Consequently, many letters are neither upper nor lower case, and have
> Unicode category "Letter other":
>
> py> c = u'\N{ARABIC LETTER FEH}'
> py> unicodedata.category(c)
> 'Lo'
> py> c.isalpha()
> True
> py> c.isupper()
> False
> py> c.islower()
> False
>
>
> Even among bicameral alphabets, there are a few anomalies. The three most
> obvious ones are Greek sigma, German Eszett (or "sharp S") and Turkish I.
>
> (1) The Greek sigma is usually written as Σ or σ in uppercase and lowercase
> respectively, but at the end of a word, lowercase sigma is written as ς.
>
> (This final sigma is sometimes called "stigma", but should not be confused
> with the archaic Greek letter stigma, which has two cases Ϛ ϛ, at least
> when it is not being written as digamma Ϝϝ -- and if you're confused, so
> are the Greeks :-)
>
> Python 3.3 correctly handles the sigma/final sigma when upper- and
> lowercasing:
>
> py> 'ΘΠΣΤΣ'.lower()
> 'θπστς'
>
> py> 'ΘΠΣΤΣ'.lower().upper()
> 'ΘΠΣΤΣ'
>
>
>
> (2) The German Eszett ß traditionally existed in only lowercase forms, but
> despite the existence of an uppercase form since at least the 19th century,
> when the Germans moved away from blackletter to Roman-style letters, the
> uppercase form was left out. In recent years, printers in Germany have
> started to reintroduce an uppercase version, and the German government have
> standardized on its use for placenames, but not other words.
>
> (Aside: in Germany, ß is not considered a distinct letter of the alphabet,
> but a ligature of ss; historically it derived from a ligature of ſs, ſz or
> ſʒ. The funny characters you may or may not be able to see are the long-S
> and round-Z.)
>
> Python follows common, but not universal, German practice for eszett:
>
> py> 'ẞ'.lower()
> 'ß'
> py> 'ß'.upper()
> 'SS'
>
> Note that this is lossy: given a name like "STRASSER", it is impossible to
> tell whether it should be title-cased to "Strasser" or "Straßer". It also
> means that uppercasing a string can make it longer.
>
>
> For more on the uppercase eszett, see:
>
> https://typography.guru/journal/germanys-new-character/
> https://typography.guru/journal/how-to-draw-a-capital-sharp-s-r18/
>
>
> (3) In most Latin alphabets, the lowercase i and j have a "tittle" diacritic
> on them, but not the uppercase forms I and J. Turkish and a few other
> languages have both I-with-tittle and I-without-tittle.
>
> (As far as I know, there is no language with a dotless J.)
>
> So in Turkish, the correct uppercase to lowercase and back again should go:
>
> Dotless I: I -> ı -> I
>
> Dotted I: İ -> i -> İ
>
> Python does not quite manage to handle this correctly for Turkish
> applications, since it loses the dotted/dotless distinction:
>
> py> 'ı'.upper()
> 'I'
> py> 'İ'.lower()
> 'i'
>
> and further case conversions follow the non-Turkish rules.
>
> Note that sometimes getting this wrong can have serious consequences:
>
> http://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail
Linguist much?
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2016-05-05 00:37 +1000 |
| Message-ID | <mailman.384.1462372639.32212.python-list@python.org> |
| In reply to | #108131 |
On Thu, May 5, 2016 at 12:09 AM, DFS <nospam@dfs.com> wrote:
> On 5/3/2016 11:28 PM, Steven D'Aprano wrote:
>> [ lengthy piece about text, Unicode, and letter case ]
>
> Linguist much?
As an English-only speaker who writes code that needs to be used around the world, you end up accruing tidbits of language and text trivia in the form of edge cases that you need to remember to test. Among them:
* Turkish dotless and dotted i
* Greek medial and final sigma
* German eszett
* Hebrew and Arabic right-to-left text
* Chinese non-BMP characters
* Combining characters (eg diacriticals starting U+0300)
* Non-characters eg U+FFFE
And then a post like Steven's basically comes from pulling up all those from your memory, and maybe doing a spot of quick testing and/or research to get some explanatory details. You don't have to be a linguist, necessarily - just a competent debugger.
ChrisA
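A list like this can be kept as a small table of sample strings to feed into tests. The following is only a sketch of that idea: the dictionary name, the labels and the particular sample characters are all assumptions made for illustration, not anything given in the thread.

```python
# Hypothetical sample data: one short string per edge case in the list above.
EDGE_CASES = {
    "turkish_i": "\N{LATIN SMALL LETTER DOTLESS I}\N{LATIN CAPITAL LETTER I WITH DOT ABOVE}",
    "greek_sigma": "\N{GREEK SMALL LETTER SIGMA}\N{GREEK SMALL LETTER FINAL SIGMA}",
    "german_eszett": "\N{LATIN SMALL LETTER SHARP S}",
    "hebrew_rtl": "\N{HEBREW LETTER ALEF}\N{HEBREW LETTER BET}",
    "non_bmp_cjk": "\U00020000",  # a CJK Extension B ideograph, above U+FFFF
    "combining": "e\N{COMBINING ACUTE ACCENT}",
    "noncharacter": "\uFFFE",
}

for label, text in EDGE_CASES.items():
    # Show code points rather than glyphs, since several of these render oddly.
    code_points = [f"U+{ord(ch):04X}" for ch in text]
    print(label, code_points, text.islower(), text.isupper())
```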
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2016-05-05 01:37 +1000 |
| Message-ID | <572a1752$0$1614$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #108131 |
On Thu, 5 May 2016 12:09 am, DFS wrote:
> On 5/3/2016 11:28 PM, Steven D'Aprano wrote:
>> Languages with two distinct lettercases, like English, are called
>> bicameral.
[...]
> Linguist much?
Possibly even a cunning one.
Somebody-had-to-say-it-ly y'rs,
--
Steven
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2016-05-04 17:05 -0400 |
| Message-ID | <ngdnvh$416$2@dont-email.me> |
| In reply to | #108141 |
On 5/4/2016 11:37 AM, Steven D'Aprano wrote:
> On Thu, 5 May 2016 12:09 am, DFS wrote:
>
>> On 5/3/2016 11:28 PM, Steven D'Aprano wrote:
>
>>> Languages with two distinct lettercases, like English, are called
>>> bicameral.
> [...]
>
>> Linguist much?
>
>
> Possibly even a cunning one.
I see you as more of a Colonel Angus.