

Groups > comp.lang.python > #50904 > unrolled thread

Share Code Tips

Started by: Devyn Collier Johnson <devyncjohnson@gmail.com>
First post: 2013-07-19 09:51 -0400
Last post: 2013-07-20 08:20 -0400
Articles 20 — 5 participants



Contents

  Share Code Tips Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-19 09:51 -0400
    Re: Share Code Tips Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-19 17:59 +0000
      Re: Share Code Tips Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-19 18:08 -0400
        Re: Share Code Tips Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-20 03:18 +0000
          Re: Share Code Tips Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-20 06:07 -0400
      Re: Share Code Tips Chris Angelico <rosuav@gmail.com> - 2013-07-20 09:08 +1000
      Re: Share Code Tips Dave Angel <davea@davea.name> - 2013-07-19 19:09 -0400
      Re: Share Code Tips Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-19 21:04 -0400
        Re: Share Code Tips Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-20 03:44 +0000
          Re: Share Code Tips David Hutto <dwightdhutto@gmail.com> - 2013-07-20 00:15 -0400
          Re: Share Code Tips David Hutto <dwightdhutto@gmail.com> - 2013-07-20 00:22 -0400
          Re: Share Code Tips David Hutto <dwightdhutto@gmail.com> - 2013-07-20 00:26 -0400
          Re: Share Code Tips David Hutto <dwightdhutto@gmail.com> - 2013-07-20 00:27 -0400
          Re: Share Code Tips Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-20 06:10 -0400
          Re: Share Code Tips Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-20 08:36 -0400
      Re: Share Code Tips Chris Angelico <rosuav@gmail.com> - 2013-07-20 11:13 +1000
      Re: Share Code Tips Dave Angel <davea@davea.name> - 2013-07-19 21:51 -0400
      Re: Share Code Tips David Hutto <dwightdhutto@gmail.com> - 2013-07-19 23:42 -0400
      Re: Share Code Tips Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-20 06:06 -0400
      Re: Share Code Tips Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-20 08:20 -0400

#50904 — Share Code Tips

From: Devyn Collier Johnson <devyncjohnson@gmail.com>
Date: 2013-07-19 09:51 -0400
Subject: Share Code Tips
Message-ID: <mailman.4868.1374241904.3114.python-list@python.org>

Aloha Python Users!

I have some coding tips and interesting functions that I want to share 
with all of you. I want to give other programmers ideas and inspiration. 
It is all Python3; most of it should work in Python2. I am a Unix/Linux 
person, so some of these will only work on Unix systems. Sorry Microsuck 
users :-D ;-)

All of the below Python3 code came from Neobot v0.8dev. I host an 
artificial intelligence program on Launchpad (LP Username: 
devyncjohnson-d). I have not released my Python version yet. The current 
version of Neobot (v0.7a) is written in BASH and Python3.

To emulate the Linux shell's date command, use this Python function:

import time

def DATE():
    print(time.strftime("%a %b %d %H:%M:%S %Z %Y"))

Want an easy way to clear the terminal screen? Then try this:

import os

def clr():
    os.system('cls' if os.name == 'nt' else 'clear')

Here are two Linux-only functions:

import linecache
import subprocess

def GETRAM():
    print(linecache.getline('/proc/meminfo', 1).replace('MemTotal:', '').strip())  #Get Total RAM in kilobytes#

def KDE_VERSION():
    print(subprocess.getoutput("kded4 --version | awk -F: 'NR == 2 {print $2}'").strip())  ##Get KDE version##
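GETRAM() above relies on MemTotal being the first line of /proc/meminfo and returns a string such as "16318424 kB". A minimal sketch of a more general approach, assuming the usual "Key:   value kB" layout of that file, parses every field into a dict of integers. The function names parse_meminfo and total_ram_kb are hypothetical, not part of Neobot.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Assumes lines like 'MemTotal:       16318424 kB'. Fields with a 'kB'
    suffix are returned as integers in kilobytes; unitless fields such
    as 'HugePages_Total' are returned as plain integers.
    """
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(':')
        if not sep:
            continue  # skip blank or malformed lines
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def total_ram_kb(path: str = '/proc/meminfo') -> int:
    """Return MemTotal in kB (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())['MemTotal']
```

Looking the value up by name rather than by line number keeps working even if a kernel ever reorders the fields.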

Need a case-insensitive if-statement? Check this out:

if 'YOUR_STRING'.lower() in SOMEVAR.lower():

Have a Python XML browser and want to add awesome tags? This code checks 
whether the text to be parsed contains chess tags and, if so, replaces 
them with chess symbols. I know, many people hate trolls, but trolls 
are my best friends. Try this:

import re

if '<chess_' in PTRNPRS.lower():
    DATA = re.sub('<chess_white_king/>', '♔', PTRNPRS, flags=re.I)
    DATA = re.sub('<chess_white_queen/>', '♕', DATA, flags=re.I)
    DATA = re.sub('<chess_white_castle/>', '♖', DATA, flags=re.I)
    DATA = re.sub('<chess_white_bishop/>', '♗', DATA, flags=re.I)
    DATA = re.sub('<chess_white_knight/>', '♘', DATA, flags=re.I)
    DATA = re.sub('<chess_white_pawn/>', '♙', DATA, flags=re.I)
    DATA = re.sub('<chess_black_king/>', '♚', DATA, flags=re.I)
    DATA = re.sub('<chess_black_queen/>', '♛', DATA, flags=re.I)
    DATA = re.sub('<chess_black_castle/>', '♜', DATA, flags=re.I)
    DATA = re.sub('<chess_black_bishop/>', '♝', DATA, flags=re.I)
    DATA = re.sub('<chess_black_knight/>', '♞', DATA, flags=re.I)
    PTRNPRS = re.sub('<chess_black_pawn/>', '♟', DATA, flags=re.I)
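The chess snippet runs twelve separate re.sub() passes. One possible alternative, sketched here under the assumption that the tag names are exactly the twelve used above, keeps the tag-to-symbol mapping in a dict and replaces every tag in a single regex pass. CHESS_SYMBOLS, CHESS_TAG and replace_chess_tags are illustrative names, not existing APIs.

```python
import re

# Tag name -> Unicode chess symbol (tag names taken from the snippet above).
CHESS_SYMBOLS = {
    'white_king': '♔', 'white_queen': '♕', 'white_castle': '♖',
    'white_bishop': '♗', 'white_knight': '♘', 'white_pawn': '♙',
    'black_king': '♚', 'black_queen': '♛', 'black_castle': '♜',
    'black_bishop': '♝', 'black_knight': '♞', 'black_pawn': '♟',
}

# One pattern matching any known <chess_NAME/> tag, case-insensitively.
CHESS_TAG = re.compile(
    r'<chess_(' + '|'.join(map(re.escape, CHESS_SYMBOLS)) + r')/>',
    flags=re.IGNORECASE,
)


def replace_chess_tags(text: str) -> str:
    """Replace each known <chess_.../> tag in text with its symbol."""
    # The tag names are ASCII, so lower() is enough to recover the dict key.
    return CHESS_TAG.sub(lambda m: CHESS_SYMBOLS[m.group(1).lower()], text)
```

For example, replace_chess_tags('<chess_white_king/> vs <CHESS_BLACK_PAWN/>') gives '♔ vs ♟'. Unknown tags are left untouched, and no separate "does it contain <chess_" check is needed because a pass with no matches simply returns the input unchanged.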

For those of you making scripts to be run in a terminal, try this for a 
fancy terminal prompt:

INPUTTEMP = input('User ≻≻≻')


I may share more code later. Tell me what you think of my coding style 
and tips.


Mahalo,

Devyn Collier Johnson
DevynCJohnson@Gmail.com



#50918

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-07-19 17:59 +0000
Message-ID: <51e97e6e$0$29971$c3e8da3$5496439d@news.astraweb.com>
In reply to: #50904

On Fri, 19 Jul 2013 09:51:23 -0400, Devyn Collier Johnson wrote:

> def KDE_VERSION():
>     print(subprocess.getoutput('kded4 --version | awk -F: \'NR == 2 {print $2}\'').strip()) ##Get KDE version##

I run KDE 3, and the above does not work for me.

*half a wink*

By the way, a comment that doesn't tell you anything that you don't 
already know is worse than useless. The function is called "KDE_VERSION", 
so what else would it do other than return the KDE version? 


x += 1  # add 1 to x

Worse than just being useless, redundant comments are dangerous, because 
as a general rule comments that don't say anything useful eventually 
become out of date: they become *inaccurate* rather than merely 
*redundant*, and that's worse than being useless.


> Need a case-insensitive if-statement? Check this out:
> 
> if 'YOUR_STRING'.lower() in SOMEVAR.lower():

Case-insensitivity is very hard. Take German for example:

STRASSE <-> straße

Or Turkish:

İ <-> i
I <-> ı


In Python 3.3 and later, you should use casefold() rather than lower() or upper():

if some_string.casefold() in another_string.casefold(): ...


but even that can't always take into account localised rules, e.g. in 
German, you should not convert SS to ß for placenames or person names, so 
for example Herr Meißner and Herr Meissner are two different people. This 
is one of the motivating reasons for introducing the uppercase ß.

http://opentype.info/blog/2011/01/24/capital-sharp-s/
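To make the German and Turkish examples concrete, here is a short sketch (Python 3.3+) comparing lower() with casefold(). It shows that casefold() fixes the ß case but still applies the default, non-Turkish folding, which is exactly the kind of localisation gap described above. contains_ignore_case is a hypothetical helper name for the substring test from the original tip.

```python
def contains_ignore_case(needle: str, haystack: str) -> bool:
    """Caseless substring test using Unicode full case folding (Python 3.3+)."""
    return needle.casefold() in haystack.casefold()


# German sharp s: lower() leaves 'ß' alone, casefold() maps it to 'ss'.
print('STRASSE'.lower() == 'straße'.lower())        # False
print('STRASSE'.casefold() == 'straße'.casefold())  # True

# Turkish: casefold() uses the default (non-Turkic) folding, so the dotted
# capital 'İ' folds to 'i' + U+0307 COMBINING DOT ABOVE, and the plain
# capital 'I' folds to 'i', never to the dotless 'ı'.
print('İ'.casefold() == 'i')  # False
print('I'.casefold() == 'ı')  # False

print(contains_ignore_case('STRASSE', 'Hauptstraße 5'))  # True
```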



-- 
Steven



#50923

From: Devyn Collier Johnson <devyncjohnson@gmail.com>
Date: 2013-07-19 18:08 -0400
Message-ID: <mailman.4885.1374271729.3114.python-list@python.org>
In reply to: #50918

On 07/19/2013 01:59 PM, Steven D'Aprano wrote:
> On Fri, 19 Jul 2013 09:51:23 -0400, Devyn Collier Johnson wrote:
>
>> def KDE_VERSION():
>>      print(subprocess.getoutput('kded4 --version | awk -F: \'NR == 2 {print $2}\'').strip()) ##Get KDE version##
> I run KDE 3, and the above does not work for me.
>
> *half a wink*
>
> By the way, a comment that doesn't tell you anything that you don't
> already know is worse than useless. The function is called "KDE_VERSION",
> so what else would it do other than return the KDE version?
>
>
> x += 1  # add 1 to x
>
> Worse than just being useless, redundant comments are dangerous, because
> as a general rule comments that don't say anything useful eventually
> become out-of-date, they become *inaccurate* rather than *redundant*, and
> that's worse than being useless.
>
>
>> Need a case-insensitive if-statement? Check this out:
>>
>> if 'YOUR_STRING'.lower() in SOMEVAR.lower():
> Case-insensitivity is very hard. Take German for example:
>
> STRASSE <-> straße
>
> Or Turkish:
>
> İ <-> i
> I <-> ı
>
>
> In Python 3.3, you should use casefold rather than lowercase or uppercase:
>
> if some_string.casefold() in another_string.casefold(): ...
>
>
> but even that can't always take into account localised rules, e.g. in
> German, you should not convert SS to ß for placenames or person names, so
> for example Herr Meißner and Herr Meissner are two different people. This
> is one of the motivating reasons for introducing the uppercase ß.
>
> http://opentype.info/blog/2011/01/24/capital-sharp-s/
>
>
>
Steven, thanks for your interesting comments. Your emails are very 
insightful.

As for the KDE function, I should fix that. Thank you for catching that. 
Notice that the shell command in the function is "kded4". That would 
only check the version for the KDE4 series. The function will only work 
for KDE4 users. As for the comment, you would be amazed by how many people 
ask me "what does this do?". These people are redundant (^u^).

As for the case-insensitive if-statements, most code uses Latin letters. 
Making a case-insensitive-international if-statement would be 
interesting. I can tackle that later. For now, I only wanted to take 
care of Latin letters. I hope to figure something out for all characters.

Thank you for your reply. I found it to be very helpful.

Mahalo,
DCJ



#50940

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-07-20 03:18 +0000
Message-ID: <51ea016e$0$29971$c3e8da3$5496439d@news.astraweb.com>
In reply to: #50923

On Fri, 19 Jul 2013 18:08:43 -0400, Devyn Collier Johnson wrote:

> As for the case-insensitive if-statements, most code uses Latin letters.
> Making a case-insensitive-international if-statement would be
> interesting. I can tackle that later. For now, I only wanted to take
> care of Latin letters. I hope to figure something out for all
> characters.

As I showed, even for Latin letters, the trick of "if astring.lower() == 
bstring.lower()" doesn't *quite* work, although it can be "close enough" 
for some purposes. For example, some languages treat accents as mere 
guides to pronunciation, so ö == o, while other languages treat them as 
completely different letters. Same with ligatures: in modern English, æ 
should be treated as equal to ae, but in Old English, Danish, Norwegian 
and Icelandic it is a distinct letter.
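For languages where accents really are just pronunciation guides, one common technique (sketched here, not something proposed in the thread) is to decompose text with unicodedata.normalize('NFD') and drop the combining marks. It is only appropriate for those languages: it would wrongly merge letters that are distinct elsewhere. Note that æ, ø and ß have no canonical decomposition, so this approach leaves them alone. strip_accents and loose_equal are hypothetical names.

```python
import unicodedata


def strip_accents(s: str) -> str:
    """Drop combining marks after canonical decomposition (NFD).

    'ö' decomposes to 'o' + U+0308 COMBINING DIAERESIS; removing the
    combining character leaves 'o'. Characters with no decomposition,
    such as 'æ', 'ø' or 'ß', pass through unchanged.
    """
    decomposed = unicodedata.normalize('NFD', s)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


def loose_equal(a: str, b: str) -> bool:
    """Accent- and case-insensitive comparison (suitable for some languages only)."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()
```

With this, loose_equal('Köln', 'KOLN') is True, while loose_equal('æ', 'ae') stays False; treating æ as "ae" would need an explicit, language-specific mapping.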

Case-insensitive testing may be easier in many non-European languages, 
because they don't have cases.

A full solution to the problem of localized string matching requires 
expert knowledge for each language, but a 90% solution is pretty simple:

astring.casefold() == bstring.casefold()

or before version 3.3, just use lowercase. It's not a perfect solution, 
but it works reasonably well if you don't care about full localization.
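The "casefold() on 3.3+, lowercase before that" advice can be wrapped so the same code runs on interpreters with and without casefold(). This is a minimal sketch; fold and equal_ignore_case are hypothetical names. It also quietly handles bytes, which have lower() but no casefold().

```python
def fold(s):
    """Case-fold s for caseless comparison.

    Uses s.casefold() where the type provides it (str on Python 3.3+);
    otherwise falls back to s.lower(), the "close enough" option.
    """
    casefold = getattr(s, 'casefold', None)
    return casefold() if casefold is not None else s.lower()


def equal_ignore_case(a, b):
    """Compare two strings after folding both the same way."""
    return fold(a) == fold(b)
```

On a modern Python 3, equal_ignore_case('Straße', 'STRASSE') is True; on an older interpreter the same call falls back to lower() and returns False, which is exactly the imperfection described above.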



-- 
Steven



#50956

From: Devyn Collier Johnson <devyncjohnson@gmail.com>
Date: 2013-07-20 06:07 -0400
Message-ID: <mailman.4912.1374314863.3114.python-list@python.org>
In reply to: #50940

On 07/19/2013 11:18 PM, Steven D'Aprano wrote:
> On Fri, 19 Jul 2013 18:08:43 -0400, Devyn Collier Johnson wrote:
>
>> As for the case-insensitive if-statements, most code uses Latin letters.
>> Making a case-insensitive-international if-statement would be
>> interesting. I can tackle that later. For now, I only wanted to take
>> care of Latin letters. I hope to figure something out for all
>> characters.
> As I showed, even for Latin letters, the trick of "if astring.lower() ==
> bstring.lower()" doesn't *quite* work, although it can be "close enough"
> for some purposes. For example, some languages treat accents as mere
> guides to pronunciation, so ö == o, while other languages treat them as
> completely different letters. Same with ligatures: in modern English, æ
> should be treated as equal to ae, but in Old English, Danish, Norwegian
> and Icelandic it is a distinct letter.
>
> Case-insensitive testing may be easier in many non-European languages,
> because they don't have cases.
>
> A full solution to the problem of localized string matching requires
> expert knowledge for each language, but a 90% solution is pretty simple:
>
> astring.casefold() == bstring.casefold()
>
> or before version 3.3, just use lowercase. It's not a perfect solution,
> but it works reasonably well if you don't care about full localization.
>
>
>
Thanks for the tips. I am learning a lot from this mailing list. I hope 
my code helped some people though.

Mahalo,
DCJ



#50929

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-07-20 09:08 +1000
Message-ID: <mailman.4890.1374275325.3114.python-list@python.org>
In reply to: #50918

On Sat, Jul 20, 2013 at 8:08 AM, Devyn Collier Johnson
<devyncjohnson@gmail.com> wrote:
> As for the case-insensitive if-statements, most code uses Latin letters.
> Making a case-insensitive-international if-statement would be interesting. I
> can tackle that later. For now, I only wanted to take care of Latin letters.
> I hope to figure something out for all characters.

Case insensitivity is a *hard* problem. Don't fool yourself that you
can do it with a simple line of code and have it 'just work'. All
you'll have is something that works "most of the time", and then
breaks on certain input. As Steven said, using casefold() rather than
lower() will help, but that's still not perfect. The simplest and
safest way to solve Unicode capitalization issues is to declare that
your protocol is case sensitive. I have a brother who couldn't
understand why Unix file systems have to be case sensitive (why would
anyone ever want to have "readme" and "README" in the same directory,
after all?), until I explained how majorly hard it is with i18n, and
how it suddenly becomes extremely unsafe for your *file system* to get
this wrong.

ChrisA



#50930

From: Dave Angel <davea@davea.name>
Date: 2013-07-19 19:09 -0400
Message-ID: <mailman.4891.1374275387.3114.python-list@python.org>
In reply to: #50918

On 07/19/2013 06:08 PM, Devyn Collier Johnson wrote:
>
> On 07/19/2013 01:59 PM, Steven D'Aprano wrote:

      <snip>
>
> As for the case-insensitive if-statements, most code uses Latin letters.
> Making a case-insensitive-international if-statement would be
> interesting. I can tackle that later. For now, I only wanted to take
> care of Latin letters. I hope to figure something out for all characters.
>

Once Steven gave you the answer, what's to figure out?  You simply use 
casefold() instead of lower().  The only constraint is it's 3.3 and 
later, so you can't use it for anything earlier.

http://docs.python.org/3.3/library/stdtypes.html#str.casefold

"""
str.casefold()
Return a casefolded copy of the string. Casefolded strings may be used 
for caseless matching.

Casefolding is similar to lowercasing but more aggressive because it is 
intended to remove all case distinctions in a string. For example, the 
German lowercase letter 'ß' is equivalent to "ss". Since it is already 
lowercase, lower() would do nothing to 'ß'; casefold() converts it to "ss".

The casefolding algorithm is described in section 3.13 of the Unicode 
Standard.

New in version 3.3.
"""

-- 
DaveA



#50936

From: Devyn Collier Johnson <devyncjohnson@gmail.com>
Date: 2013-07-19 21:04 -0400
Message-ID: <mailman.4894.1374282301.3114.python-list@python.org>
In reply to: #50918

On 07/19/2013 07:09 PM, Dave Angel wrote:
> On 07/19/2013 06:08 PM, Devyn Collier Johnson wrote:
>>
>> On 07/19/2013 01:59 PM, Steven D'Aprano wrote:
>
>      <snip>
>>
>> As for the case-insensitive if-statements, most code uses Latin letters.
>> Making a case-insensitive-international if-statement would be
>> interesting. I can tackle that later. For now, I only wanted to take
>> care of Latin letters. I hope to figure something out for all 
>> characters.
>>
>
> Once Steven gave you the answer, what's to figure out?  You simply use 
> casefold() instead of lower().  The only constraint is it's 3.3 and 
> later, so you can't use it for anything earlier.
>
> http://docs.python.org/3.3/library/stdtypes.html#str.casefold
>
> """
> str.casefold()
> Return a casefolded copy of the string. Casefolded strings may be used 
> for caseless matching.
>
> Casefolding is similar to lowercasing but more aggressive because it 
> is intended to remove all case distinctions in a string. For example, 
> the German lowercase letter 'ß' is equivalent to "ss". Since it is 
> already lowercase, lower() would do nothing to 'ß'; casefold() 
> converts it to "ss".
>
> The casefolding algorithm is described in section 3.13 of the Unicode 
> Standard.
>
> New in version 3.3.
> """
>
Chris Angelico said that casefold is not perfect. In the future, I want 
to make the perfect international-case-insensitive if-statement. For 
now, my code only supports a limited range of characters. Even with 
casefold, I will have some issues as Chris Angelico mentioned. Also, "ß" 
is not really the same as "ss".

Mahalo,
DCJ



#50943

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-07-20 03:44 +0000
Message-ID: <51ea07a4$0$29971$c3e8da3$5496439d@news.astraweb.com>
In reply to: #50936

On Fri, 19 Jul 2013 21:04:55 -0400, Devyn Collier Johnson wrote:

> In the future, I want to
> make the perfect international-case-insensitive if-statement. For now,
> my code only supports a limited range of characters. Even with casefold,
> I will have some issues as Chris Angelico mentioned.

There are hundreds of written languages in the world, with thousands of 
characters, and most of them have rules about case-sensitivity and 
character normalization. For example, in Greek, lowercase Σ is σ except 
at the end of a word, when it is ς.

>>> 'Σσς'.upper()
'ΣΣΣ'
>>> 'Σσς'.lower()
'σσς'
>>> 'Σσς'.casefold()
'σσσ'


So in this case, casefold() correctly solves the problem, provided you 
are comparing modern Greek text. But if you're comparing text in some 
other language which merely happens to use Greek letters, but doesn't 
have the same rules about letter sigma, then it will be inappropriate. So 
you cannot write a single "perfect" case-insensitive comparison, the best 
you can hope for is to write dozens or hundreds of separate case-
insensitive comparisons, one for each language or family of languages.

For an introduction to the problem:

http://www.w3.org/International/wiki/Case_folding

http://www.unicode.org/faq/casemap_charprop.html




> Also, "ß" is not really the same as "ss".

Sometimes it is. Sometimes it isn't.



-- 
Steven



#50945

From: David Hutto <dwightdhutto@gmail.com>
Date: 2013-07-20 00:15 -0400
Message-ID: <mailman.4902.1374293729.3114.python-list@python.org>
In reply to: #50943


It seems, without utilizing this, or googling, that a case sensitive
library is either developed, or could be implemented by utilizing case
sensitive translation through a google translation page using an urlopener,
and placing in the data to be processed back to the boolean value. Never
attempted, but the algorithm seems simpler than the dozens of solutions
method.



#50947

From: David Hutto <dwightdhutto@gmail.com>
Date: 2013-07-20 00:22 -0400
Message-ID: <mailman.4904.1374294126.3114.python-list@python.org>
In reply to: #50943


It seems that you could use import re, in my mind's pseudo code, to compile
a translational usage of usernames/passwords that could remain case
sensitive by using just the translational dictionaries, and refining with
data input tests/unit tests.


On Sat, Jul 20, 2013 at 12:15 AM, David Hutto <dwightdhutto@gmail.com>wrote:

> It seems, without utilizing this, or googling, that a case sensitive
> library is either developed, or could be implemented by utilizing case
> sensitive translation through a google translation page using an urlopener,
> and placing in the data to be processed back to the boolean value. Never
> attempted, but the algorithm seems simpler than the dozens of solutions
> method.
>



-- 
Best Regards,
David Hutto
*CEO:* *http://www.hitwebdevelopment.com*



#50948

From: David Hutto <dwightdhutto@gmail.com>
Date: 2013-07-20 00:26 -0400
Message-ID: <mailman.4905.1374294378.3114.python-list@python.org>
In reply to: #50943


I didn't see that this was for a chess game. That seems more point and
click. Everyone can recognize a bishop from a queen, or a rook from a pawn.
So why would case sensitivity matter other than the 16 pieces on the board?
Or am I misunderstanding the question?



On Sat, Jul 20, 2013 at 12:22 AM, David Hutto <dwightdhutto@gmail.com>wrote:

> It seems that you could use import re, in my mind's pseudo code, to
> compile a translational usage of usernames/passwords that could remain case
> sensitive by using just the translational dictionaries, and refining with
> data input tests/unit tests.
>
>
> On Sat, Jul 20, 2013 at 12:15 AM, David Hutto <dwightdhutto@gmail.com>wrote:
>
>> It seems, without utilizing this, or googling, that a case sensitive
>> library is either developed, or could be implemented by utilizing case
>> sensitive translation through a google translation page using an urlopener,
>> and placing in the data to be processed back to the boolean value. Never
>> attempted, but the algorithm seems simpler than the dozens of solutions
>> method.
>>
>
>
>
> --
> Best Regards,
> David Hutto
> *CEO:* *http://www.hitwebdevelopment.com*
>



-- 
Best Regards,
David Hutto
*CEO:* *http://www.hitwebdevelopment.com*



#50949

From: David Hutto <dwightdhutto@gmail.com>
Date: 2013-07-20 00:27 -0400
Message-ID: <mailman.4906.1374294454.3114.python-list@python.org>
In reply to: #50943


32 if you count black, and white.


On Sat, Jul 20, 2013 at 12:26 AM, David Hutto <dwightdhutto@gmail.com>wrote:

> I didn't see that this was for a chess game. That seems more point and
> click. Everyone can recognize a bishop from a queen, or a rook from a pawn.
> So why would case sensitivity matter other than the 16 pieces on the board?
> Or am I misunderstanding the question?
>
>
>
> On Sat, Jul 20, 2013 at 12:22 AM, David Hutto <dwightdhutto@gmail.com>wrote:
>
>> It seems that you could use import re, in my mind's pseudo code, to
>> compile a translational usage of usernames/passwords that could remain case
>> sensitive by using just the translational dictionaries, and refining with
>> data input tests/unit tests.
>>
>>
>> On Sat, Jul 20, 2013 at 12:15 AM, David Hutto <dwightdhutto@gmail.com>wrote:
>>
>>> It seems, without utilizing this, or googling, that a case sensitive
>>> library is either developed, or could be implemented by utilizing case
>>> sensitive translation through a google translation page using an urlopener,
>>> and placing in the data to be processed back to the boolean value. Never
>>> attempted, but the algorithm seems simpler than the dozens of solutions
>>> method.
>>>
>>
>>
>>
>> --
>> Best Regards,
>> David Hutto
>> *CEO:* *http://www.hitwebdevelopment.com*
>>
>
>
>
> --
> Best Regards,
> David Hutto
> *CEO:* *http://www.hitwebdevelopment.com*
>



-- 
Best Regards,
David Hutto
*CEO:* *http://www.hitwebdevelopment.com*



#50957

From: Devyn Collier Johnson <devyncjohnson@gmail.com>
Date: 2013-07-20 06:10 -0400
Message-ID: <mailman.4913.1374315056.3114.python-list@python.org>
In reply to: #50943

On 07/19/2013 11:44 PM, Steven D'Aprano wrote:
> On Fri, 19 Jul 2013 21:04:55 -0400, Devyn Collier Johnson wrote:
>
>> In the future, I want to
>> make the perfect international-case-insensitive if-statement. For now,
>> my code only supports a limited range of characters. Even with casefold,
>> I will have some issues as Chris Angelico mentioned.
> There are hundreds of written languages in the world, with thousands of
> characters, and most of them have rules about case-sensitivity and
> character normalization. For example, in Greek, lowercase Σ is σ except
> at the end of a word, when it is ς.
>
> >>> 'Σσς'.upper()
> 'ΣΣΣ'
> >>> 'Σσς'.lower()
> 'σσς'
> >>> 'Σσς'.casefold()
> 'σσσ'
>
>
> So in this case, casefold() correctly solves the problem, provided you
> are comparing modern Greek text. But if you're comparing text in some
> other language which merely happens to use Greek letters, but doesn't
> have the same rules about letter sigma, then it will be inappropriate. So
> you cannot write a single "perfect" case-insensitive comparison, the best
> you can hope for is to write dozens or hundreds of separate case-
> insensitive comparisons, one for each language or family of languages.
>
> For an introduction to the problem:
>
> http://www.w3.org/International/wiki/Case_folding
>
> http://www.unicode.org/faq/casemap_charprop.html
>
>
>
>
>> Also, "ß" is not really the same as "ss".
> Sometimes it is. Sometimes it isn't.
>
>
>
Wow, my if-statement is so imperfect! Thankfully, only English people 
will talk to an English chatbot (I hope), so for my use of the code, it 
will work.
Do the main Python3 developers plan to do something about this?

Mahalo,
DCJ



#50964

From: Devyn Collier Johnson <devyncjohnson@gmail.com>
Date: 2013-07-20 08:36 -0400
Message-ID: <mailman.4920.1374323827.3114.python-list@python.org>
In reply to: #50943


On 07/20/2013 12:26 AM, David Hutto wrote:
> I didn't see that this was for a chess game. That seems more point and 
> click. Everyone can recognize a bishop from a queen, or a rook from a 
> pawn. So why would case sensitivity matter other than the 16 pieces on 
> the board? Or am I misunderstanding the question?
>
>
>
> On Sat, Jul 20, 2013 at 12:22 AM, David Hutto <dwightdhutto@gmail.com 
> <mailto:dwightdhutto@gmail.com>> wrote:
>
>     It seems that you could use import re, in my mind's pseudo code,
>     to compile a translational usage of usernames/passwords that could
>     remain case sensitive by using just the translational
>     dictionaries, and refining with data input tests/unit tests.
>
>
>     On Sat, Jul 20, 2013 at 12:15 AM, David Hutto
>     <dwightdhutto@gmail.com <mailto:dwightdhutto@gmail.com>> wrote:
>
>         It seems, without utilizing this, or googling, that a case
>         sensitive library is either developed, or could be implemented
>         by utilizing case sensitive translation through a google
>         translation page using an urlopener, and placing in the data
>         to be processed back to the boolean value. Never attempted,
>         but the algorithm seems simpler than the dozens of solutions
>         method.
>
>
>
>
>     -- 
>     Best Regards,
>     David Hutto
>     /*CEO:*/ _http://www.hitwebdevelopment.com_
>
>
>
>
> -- 
> Best Regards,
> David Hutto
> /*CEO:*/ _http://www.hitwebdevelopment.com_
>
>
In the email, I am sharing various code snippets to give others ideas 
and inspiration for coding. In that particular snippet, I am giving 
Python3 programmers the idea of adding chess tags to an HTML or XML 
interpreter. It would be neat to type a tag that generates chess 
pieces instead of remembering the HTML character code (e.g. &#9812; for ♔).

From my understanding, that email is not being displayed correctly. Are 
all of the lines run together?

Thank you for asking. I want everyone to understand the purpose of the 
email and that particular snippet. Remember, assumption is the lowest 
form of knowledge.

Mahalo,
DCJ



#50937

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-07-20 11:13 +1000
Message-ID: <mailman.4896.1374282806.3114.python-list@python.org>
In reply to: #50918

On Sat, Jul 20, 2013 at 11:04 AM, Devyn Collier Johnson
<devyncjohnson@gmail.com> wrote:
>
> On 07/19/2013 07:09 PM, Dave Angel wrote:
>>
>> On 07/19/2013 06:08 PM, Devyn Collier Johnson wrote:
>>>
>>>
>>> On 07/19/2013 01:59 PM, Steven D'Aprano wrote:
>>
>>
>>      <snip>
>>>
>>>
>>> As for the case-insensitive if-statements, most code uses Latin letters.
>>> Making a case-insensitive-international if-statement would be
>>> interesting. I can tackle that later. For now, I only wanted to take
>>> care of Latin letters. I hope to figure something out for all characters.
>>>
>>
>> Once Steven gave you the answer, what's to figure out?  You simply use
>> casefold() instead of lower().  The only constraint is it's 3.3 and later,
>> so you can't use it for anything earlier.
>>
>> http://docs.python.org/3.3/library/stdtypes.html#str.casefold
>>
>> """
>> str.casefold()
>> Return a casefolded copy of the string. Casefolded strings may be used for
>> caseless matching.
>>
>> Casefolding is similar to lowercasing but more aggressive because it is
>> intended to remove all case distinctions in a string. For example, the
>> German lowercase letter 'ß' is equivalent to "ss". Since it is already
>> lowercase, lower() would do nothing to 'ß'; casefold() converts it to "ss".
>>
>> The casefolding algorithm is described in section 3.13 of the Unicode
>> Standard.
>>
>> New in version 3.3.
>> """
>>
> Chris Angelico said that casefold is not perfect. In the future, I want to
> make the perfect international-case-insensitive if-statement. For now, my
> code only supports a limited range of characters. Even with casefold, I will
> have some issues as Chris Angelico mentioned. Also, "ß" is not really the
> same as "ss".

Well, casefold is about as good as it's ever going to be, but that's
because "the perfect international-case-insensitive comparison" is a
fundamentally impossible goal. Your last sentence hints at why;
there is no simple way to compare strings containing those characters,
because the correct treatment varies according to context.

Your two best options are: Be case sensitive (and then you need only
worry about composition and combining characters and all those
nightmares - the ones you have to worry about either way), or use
casefold(). Of those, I prefer the first, because it's safer; the
second is also a good option.
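Even the case-sensitive option needs the composition issue handled: 'é' can be one code point (U+00E9) or 'e' followed by U+0301 COMBINING ACUTE ACCENT, and a naive == treats them as different. A minimal sketch, assuming NFC is an acceptable canonical form for your data, normalises before comparing. For the casefold() option, the second function follows the "canonical caseless match" recipe NFD(casefold(NFD(x))) from the Unicode Standard's default case algorithms (section 3.13, the section the casefold() docs cite). Both function names are hypothetical.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Case-sensitive comparison that ignores differences in composition."""
    return unicodedata.normalize('NFC', a) == unicodedata.normalize('NFC', b)


def canonical_caseless_equal(a: str, b: str) -> bool:
    """Caseless comparison that also ignores differences in composition."""
    def key(s: str) -> str:
        # Decompose, case-fold, then decompose again, because folding can
        # produce characters that are not in NFD form.
        return unicodedata.normalize('NFD', unicodedata.normalize('NFD', s).casefold())
    return key(a) == key(b)
```

With these, 'caf\u00e9' == 'cafe\u0301' is False, nfc_equal() on the same pair is True, and canonical_caseless_equal('CAFE\u0301', 'caf\u00e9') is also True, while nfc_equal('Café', 'café') correctly stays False.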

ChrisA



#50938

From: Dave Angel <davea@davea.name>
Date: 2013-07-19 21:51 -0400
Message-ID: <mailman.4897.1374285114.3114.python-list@python.org>
In reply to: #50918

On 07/19/2013 09:04 PM, Devyn Collier Johnson wrote:
>

      <snip>
>>
> Chris Angelico said that casefold is not perfect. In the future, I want
> to make the perfect international-case-insensitive if-statement. For
> now, my code only supports a limited range of characters. Even with
> casefold, I will have some issues as Chris Angelico mentioned. Also, "ß"
> is not really the same as "ss".
>

Sure, the casefold() method has its problems.  But you're going to avoid 
using it till you can do a "perfect" one?

Perfect in what context?  For "case sensitively" comparing people's 
names in a single language in a single country?  Perhaps that can be 
made perfect.  For certain combinations of language and country.

But if you want to compare words in an unspecified language with an 
unspecified country, it cannot be done.

If you've got a particular goal in mind, great.  But as a library 
function, you're better off using the best standard method available, 
and document what its limitations are.  One way of documenting such is 
to quote the appropriate standards, with their caveats.


By the way, you mentioned earlier that you're restricting yourself to 
Latin characters.  The lower() method is inadequate for many of those as 
well.  Perhaps you meant ASCII instead.
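If the real requirement is "ASCII only", one option is to make that assumption explicit in code, so non-ASCII input fails loudly instead of being compared wrongly. A minimal sketch; ascii_equal_ignore_case is a hypothetical name, and str.isascii() needs Python 3.7 or later (newer than this thread).

```python
def ascii_equal_ignore_case(a: str, b: str) -> bool:
    """Case-insensitive comparison that only accepts ASCII input.

    lower() is reliable for ASCII letters; anything else raises
    instead of silently giving a wrong answer (e.g. for 'straße').
    """
    if not (a.isascii() and b.isascii()):
        raise ValueError('non-ASCII input; use casefold() instead')
    return a.lower() == b.lower()
```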

-- 
DaveA



#50941

From: David Hutto <dwightdhutto@gmail.com>
Date: 2013-07-19 23:42 -0400
Message-ID: <mailman.4899.1374291729.3114.python-list@python.org>
In reply to: #50918


Just use an explanatory user tip that states it should be case sensitive,
just like most sites or apps do.


On Fri, Jul 19, 2013 at 9:13 PM, Chris Angelico <rosuav@gmail.com> wrote:

> On Sat, Jul 20, 2013 at 11:04 AM, Devyn Collier Johnson
> <devyncjohnson@gmail.com> wrote:
> >
> > On 07/19/2013 07:09 PM, Dave Angel wrote:
> >>
> >> On 07/19/2013 06:08 PM, Devyn Collier Johnson wrote:
> >>>
> >>>
> >>> On 07/19/2013 01:59 PM, Steven D'Aprano wrote:
> >>
> >>
> >>      <snip>
> >>>
> >>>
> >>> As for the case-insensitive if-statements, most code uses Latin letters.
> >>> Making a case-insensitive-international if-statement would be
> >>> interesting. I can tackle that later. For now, I only wanted to take
> >>> care of Latin letters. I hope to figure something out for all characters.
> >>>
> >>
> >> Once Steven gave you the answer, what's to figure out?  You simply use
> >> casefold() instead of lower().  The only constraint is it's 3.3 and later,
> >> so you can't use it for anything earlier.
> >>
> >> http://docs.python.org/3.3/library/stdtypes.html#str.casefold
> >>
> >> """
> >> str.casefold()
> >> Return a casefolded copy of the string. Casefolded strings may be used for
> >> caseless matching.
> >>
> >> Casefolding is similar to lowercasing but more aggressive because it is
> >> intended to remove all case distinctions in a string. For example, the
> >> German lowercase letter 'ß' is equivalent to "ss". Since it is already
> >> lowercase, lower() would do nothing to 'ß'; casefold() converts it to "ss".
> >>
> >> The casefolding algorithm is described in section 3.13 of the Unicode
> >> Standard.
> >>
> >> New in version 3.3.
> >> """
> >>
> > Chris Angelico said that casefold is not perfect. In the future, I want to
> > make the perfect international-case-insensitive if-statement. For now, my
> > code only supports a limited range of characters. Even with casefold, I will
> > have some issues as Chris Angelico mentioned. Also, "ß" is not really the
> > same as "ss".
>
> Well, casefold is about as good as it's ever going to be, but that's
> because "the perfect international-case-insensitive comparison" is a
> fundamentally impossible goal. Your last sentence hints as to why;
> there is no simple way to compare strings containing those characters,
> because the correct treatment varies according to context.
>
> Your two best options are: Be case sensitive (and then you need only
> worry about composition and combining characters and all those
> nightmares - the ones you have to worry about either way), or use
> casefold(). Of those, I prefer the first, because it's safer; the
> second is also a good option.
>
> ChrisA
> --
> http://mail.python.org/mailman/listinfo/python-list
>



-- 
Best Regards,
David Hutto
*CEO:* *http://www.hitwebdevelopment.com*



#50955

From: Devyn Collier Johnson <devyncjohnson@gmail.com>
Date: 2013-07-20 06:06 -0400
Message-ID: <mailman.4911.1374314778.3114.python-list@python.org>
In reply to: #50918
On 07/19/2013 09:51 PM, Dave Angel wrote:
> On 07/19/2013 09:04 PM, Devyn Collier Johnson wrote:
>>
>
>      <snip>
>>>
>> Chris Angelico said that casefold is not perfect. In the future, I want
>> to make the perfect international-case-insensitive if-statement. For
>> now, my code only supports a limited range of characters. Even with
>> casefold, I will have some issues as Chris Angelico mentioned. Also, "ß"
>> is not really the same as "ss".
>>
>
> Sure, the casefold() method has its problems.  But you're going to 
> avoid using it till you can do a "perfect" one?
>
> Perfect in what context?  For "case sensitively" comparing people's 
> names in a single language in a single country?  Perhaps that can be 
> made perfect.  For certain combinations of language and country.
>
> But if you want to compare words in an unspecified language with an 
> unspecified country, it cannot be done.
>
> If you've got a particular goal in mind, great.  But as a library 
> function, you're better off using the best standard method available, 
> and document what its limitations are.  One way of documenting such is 
> to quote the appropriate standards, with their caveats.
>
>
> By the way, you mentioned earlier that you're restricting yourself to 
> Latin characters.  The lower() method is inadequate for many of those 
> as well.  Perhaps you meant ASCII instead.
>
Of course not, Dave; I will implement casefold. I just plan to not stop 
there. My program should not come across unspecified languages. Yeah, I 
meant ASCII, but I was unaware that lower() had some limitation on Latin 
letters.
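
If the program really only needs ASCII, one option is to fold only A-Z and leave every other character untouched. Non-ASCII input is then compared exactly instead of being partially case-folded. This is a sketch under that ASCII-only assumption, and the helper names are hypothetical.

```python
import string

# Translation table that maps only the 26 ASCII capitals to lowercase.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def ascii_fold(s):
    """Lowercase A-Z only; every other character is left as-is."""
    return s.translate(_ASCII_LOWER)

def ascii_caseless_equal(a, b):
    """Compare a and b ignoring ASCII case only."""
    return ascii_fold(a) == ascii_fold(b)

print(ascii_caseless_equal("Hello", "HELLO"))     # True
print(ascii_caseless_equal("Straße", "STRASSE"))  # False: 'ß' is not ASCII
```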

Mahalo,
DCJ



#50962

From: Devyn Collier Johnson <devyncjohnson@gmail.com>
Date: 2013-07-20 08:20 -0400
Message-ID: <mailman.4918.1374322837.3114.python-list@python.org>
In reply to: #50918
On 07/19/2013 09:13 PM, Chris Angelico wrote:
> On Sat, Jul 20, 2013 at 11:04 AM, Devyn Collier Johnson
> <devyncjohnson@gmail.com> wrote:
>> On 07/19/2013 07:09 PM, Dave Angel wrote:
>>> On 07/19/2013 06:08 PM, Devyn Collier Johnson wrote:
>>>>
>>>> On 07/19/2013 01:59 PM, Steven D'Aprano wrote:
>>>
>>>       <snip>
>>>>
>>>> As for the case-insensitive if-statements, most code uses Latin letters.
>>>> Making a case-insensitive-international if-statement would be
>>>> interesting. I can tackle that later. For now, I only wanted to take
>>>> care of Latin letters. I hope to figure something out for all characters.
>>>>
>>> Once Steven gave you the answer, what's to figure out?  You simply use
>>> casefold() instead of lower().  The only constraint is it's 3.3 and later,
>>> so you can't use it for anything earlier.
>>>
>>> http://docs.python.org/3.3/library/stdtypes.html#str.casefold
>>>
>>> """
>>> str.casefold()
>>> Return a casefolded copy of the string. Casefolded strings may be used for
>>> caseless matching.
>>>
>>> Casefolding is similar to lowercasing but more aggressive because it is
>>> intended to remove all case distinctions in a string. For example, the
>>> German lowercase letter 'ß' is equivalent to "ss". Since it is already
>>> lowercase, lower() would do nothing to 'ß'; casefold() converts it to "ss".
>>>
>>> The casefolding algorithm is described in section 3.13 of the Unicode
>>> Standard.
>>>
>>> New in version 3.3.
>>> """
>>>
>> Chris Angelico said that casefold is not perfect. In the future, I want to
>> make the perfect international-case-insensitive if-statement. For now, my
>> code only supports a limited range of characters. Even with casefold, I will
>> have some issues as Chris Angelico mentioned. Also, "ß" is not really the
>> same as "ss".
> Well, casefold is about as good as it's ever going to be, but that's
> because "the perfect international-case-insensitive comparison" is a
> fundamentally impossible goal. Your last sentence hints as to why;
> there is no simple way to compare strings containing those characters,
> because the correct treatment varies according to context.
>
> Your two best options are: Be case sensitive (and then you need only
> worry about composition and combining characters and all those
> nightmares - the ones you have to worry about either way), or use
> casefold(). Of those, I prefer the first, because it's safer; the
> second is also a good option.
>
> ChrisA
Thanks everyone (especially Chris Angelico and Steven D'Aprano) for all 
of your helpful suggestions and ideas. I plan to implement casefold() in
some of my programs.
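
As noted in the thread, casefold() is new in Python 3.3. If a program also has to run on older interpreters, one hedged approach is to fall back to lower() and accept its weaker matching there. The helper name `fold` is hypothetical.

```python
def fold(s):
    """Case-fold s for caseless comparison.

    Uses s.casefold() where the method exists (Python 3.3+); otherwise
    falls back to s.lower(), which does not handle characters like 'ß'.
    """
    casefold = getattr(s, "casefold", None)
    if casefold is not None:
        return casefold()
    return s.lower()

print(fold("Straße") == fold("STRASSE"))  # True on 3.3+; False via lower()
```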

Mahalo,
DCJ




