comp.lang.python, thread #50904 (unrolled)
| Started by | Devyn Collier Johnson <devyncjohnson@gmail.com> |
|---|---|
| First post | 2013-07-19 09:51 -0400 |
| Last post | 2013-07-20 08:20 -0400 |
| Articles | 20 — 5 participants |
Share Code Tips Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-19 09:51 -0400
Re: Share Code Tips Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-19 17:59 +0000
Re: Share Code Tips Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-19 18:08 -0400
Re: Share Code Tips Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-20 03:18 +0000
Re: Share Code Tips Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-20 06:07 -0400
Re: Share Code Tips Chris Angelico <rosuav@gmail.com> - 2013-07-20 09:08 +1000
Re: Share Code Tips Dave Angel <davea@davea.name> - 2013-07-19 19:09 -0400
Re: Share Code Tips Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-19 21:04 -0400
Re: Share Code Tips Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-20 03:44 +0000
Re: Share Code Tips David Hutto <dwightdhutto@gmail.com> - 2013-07-20 00:15 -0400
Re: Share Code Tips David Hutto <dwightdhutto@gmail.com> - 2013-07-20 00:22 -0400
Re: Share Code Tips David Hutto <dwightdhutto@gmail.com> - 2013-07-20 00:26 -0400
Re: Share Code Tips David Hutto <dwightdhutto@gmail.com> - 2013-07-20 00:27 -0400
Re: Share Code Tips Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-20 06:10 -0400
Re: Share Code Tips Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-20 08:36 -0400
Re: Share Code Tips Chris Angelico <rosuav@gmail.com> - 2013-07-20 11:13 +1000
Re: Share Code Tips Dave Angel <davea@davea.name> - 2013-07-19 21:51 -0400
Re: Share Code Tips David Hutto <dwightdhutto@gmail.com> - 2013-07-19 23:42 -0400
Re: Share Code Tips Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-20 06:06 -0400
Re: Share Code Tips Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-20 08:20 -0400
| From | Devyn Collier Johnson <devyncjohnson@gmail.com> |
|---|---|
| Date | 2013-07-19 09:51 -0400 |
| Subject | Share Code Tips |
| Message-ID | <mailman.4868.1374241904.3114.python-list@python.org> |
Aloha Python Users!
I have some coding tips and interesting functions that I want to share
with all of you. I want to give other programmers ideas and inspiration.
It is all Python3; most of it should work in Python2. I am a Unix/Linux
person, so some of these will only work on Unix systems. Sorry Microsuck
users :-D ;-)
All of the below Python3 code came from Neobot v0.8dev. I host an
artificial intelligence program on Launchpad (LP Username:
devyncjohnson-d). I have not released my Python version yet. The current
version of Neobot (v0.7a) is written in BASH and Python3.
To emulate the Linux shell's date command, use this Python function:
    import time
    def DATE():
        print(time.strftime("%a %B %d %H:%M:%S %Z %Y"))
Want an easy way to clear the terminal screen? Then try this:
    import os
    def clr():
        os.system(['clear','cls'][os.name == 'nt'])
Here are two Linux-only functions:
    import linecache
    import subprocess
    def GETRAM():
        print(linecache.getline('/proc/meminfo', 1).replace('MemTotal:', '').strip())  #Get Total RAM in kilobytes#
    def KDE_VERSION():
        print(subprocess.getoutput('kded4 --version | awk -F: \'NR == 2 {print $2}\'').strip())  ##Get KDE version##
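GETRAM() assumes that MemTotal is always the first line of /proc/meminfo. A slightly more defensive sketch looks the field up by name instead. It assumes the usual "Key:   value kB" layout of that file; parse_meminfo and mem_total_kb are hypothetical helper names, not part of Neobot.

```python
def parse_meminfo(lines):
    """Parse lines shaped like 'MemTotal:  16318412 kB' into {'MemTotal': 16318412}.

    Values keep the unit the kernel reports (kB for the memory fields);
    lines without a colon or without a leading integer value are skipped.
    """
    info = {}
    for line in lines:
        key, sep, rest = line.partition(':')
        fields = rest.split()
        # Skip malformed lines and lines whose first value is not an integer.
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def mem_total_kb(path='/proc/meminfo'):
    """Return total RAM in kilobytes (Linux only; hypothetical helper)."""
    with open(path) as f:
        return parse_meminfo(f)['MemTotal']
```

On a Linux box, print(mem_total_kb()) prints a bare integer, so the caller can do arithmetic on it instead of stripping a "kB" suffix from a string.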
Need a case-insensitive if-statement? Check this out:
if 'YOUR_STRING'.lower() in SOMEVAR.lower():
Have a Python XML browser and want to add awesome tags? This code checks
whether the code to be parsed contains chess tags. If so, they are
replaced with chess symbols. I know, many people hate trolls, but trolls
are my best friends. Try this:
    import re
    if '<chess_'.lower() in PTRNPRS.lower():
        DATA = re.sub('<chess_white_king/>', '♔', PTRNPRS, flags=re.I)
        DATA = re.sub('<chess_white_queen/>', '♕', DATA, flags=re.I)
        DATA = re.sub('<chess_white_castle/>', '♖', DATA, flags=re.I)
        DATA = re.sub('<chess_white_bishop/>', '♗', DATA, flags=re.I)
        DATA = re.sub('<chess_white_knight/>', '♘', DATA, flags=re.I)
        DATA = re.sub('<chess_white_pawn/>', '♙', DATA, flags=re.I)
        DATA = re.sub('<chess_black_king/>', '♚', DATA, flags=re.I)
        DATA = re.sub('<chess_black_queen/>', '♛', DATA, flags=re.I)
        DATA = re.sub('<chess_black_castle/>', '♜', DATA, flags=re.I)
        DATA = re.sub('<chess_black_bishop/>', '♝', DATA, flags=re.I)
        DATA = re.sub('<chess_black_knight/>', '♞', DATA, flags=re.I)
        PTRNPRS = re.sub('<chess_black_pawn/>', '♟', DATA, flags=re.I)
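The twelve chained re.sub() calls can be folded into a single table-driven pass over the text. This is a sketch, assuming the same <chess_colour_piece/> tag names used above; CHESS_SYMBOLS and replace_chess_tags are hypothetical names, and any <chess_.../> tag not in the table is left as it is.

```python
import re

# Hypothetical mapping from the tag names used above to Unicode chess symbols.
CHESS_SYMBOLS = {
    'white_king': '♔', 'white_queen': '♕', 'white_castle': '♖',
    'white_bishop': '♗', 'white_knight': '♘', 'white_pawn': '♙',
    'black_king': '♚', 'black_queen': '♛', 'black_castle': '♜',
    'black_bishop': '♝', 'black_knight': '♞', 'black_pawn': '♟',
}

# One case-insensitive pattern that matches any <chess_..../> tag.
CHESS_TAG = re.compile(r'<chess_(\w+)/>', re.IGNORECASE)


def replace_chess_tags(text):
    """Replace each known chess tag with its symbol; leave unknown tags untouched."""
    def _sub(match):
        return CHESS_SYMBOLS.get(match.group(1).lower(), match.group(0))
    return CHESS_TAG.sub(_sub, text)


print(replace_chess_tags('<CHESS_WHITE_KING/> vs <chess_black_pawn/>'))  # ♔ vs ♟
```

Keeping the symbols in a dictionary means a new tag is one more table entry rather than one more re.sub() line, and the text is scanned once instead of twelve times.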
For those of you making scripts to be run in a terminal, try this for a
fancy terminal prompt:
INPUTTEMP = input('User ≻≻≻')
I may share more code later. Tell me what you think of my coding style
and tips.
Mahalo,
Devyn Collier Johnson
DevynCJohnson@Gmail.com
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-07-19 17:59 +0000 |
| Message-ID | <51e97e6e$0$29971$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #50904 |
On Fri, 19 Jul 2013 09:51:23 -0400, Devyn Collier Johnson wrote:
> def KDE_VERSION():
> print(subprocess.getoutput('kded4 --version | awk -F:
> \'NR == 2 {print $2}\'').strip()) ##Get KDE version##
I run KDE 3, and the above does not work for me.
*half a wink*
By the way, a comment that doesn't tell you anything that you don't
already know is worse than useless. The function is called "KDE_VERSION",
what else would it do other than return the KDE version?
x += 1 # add 1 to x
Worse than just being useless, redundant comments are dangerous, because
as a general rule comments that don't say anything useful eventually
become out-of-date, they become *inaccurate* rather than *redundant*, and
that's worse than being useless.
> Need a case-insensitive if-statement? Check this out:
>
> if 'YOUR_STRING'.lower() in SOMEVAR.lower():
Case-insensitivity is very hard. Take German for example:
STRASSE <-> straße
Or Turkish:
İ <-> i
I <-> ı
In Python 3.3, you should use casefold rather than lowercase or uppercase:
if some_string.casefold() in another_string.casefold(): ...
but even that can't always take into account localised rules, e.g. in
German, you should not convert SS to ß for placenames or person names, so
for example Herr Meißner and Herr Meissner are two different people. This
is one of the motivating reasons for introducing the uppercase ß.
http://opentype.info/blog/2011/01/24/capital-sharp-s/
--
Steven
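Steven's casefold() suggestion can be checked directly. The sketch below (contains_ci is a hypothetical name; Python 3.3+ assumed) shows lower() missing the German ß case that casefold() handles, and also that casefold() still cannot apply a locale rule such as Turkish dotless ı.

```python
def contains_ci(needle, haystack):
    """Case-insensitive substring test using Unicode case folding (Python 3.3+)."""
    return needle.casefold() in haystack.casefold()

# lower() leaves 'ß' alone, so a lower()-based test misses this match...
print('strasse' in 'Die Straße'.lower())      # False
# ...while casefold() maps 'ß' to 'ss'.
print(contains_ci('STRASSE', 'Die Straße'))   # True
# Locale-specific rules are still out of reach: Turkish dotless 'ı'
# does not fold to the same string as 'I'.
print('ı'.casefold() == 'I'.casefold())       # False
```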
| From | Devyn Collier Johnson <devyncjohnson@gmail.com> |
|---|---|
| Date | 2013-07-19 18:08 -0400 |
| Message-ID | <mailman.4885.1374271729.3114.python-list@python.org> |
| In reply to | #50918 |
On 07/19/2013 01:59 PM, Steven D'Aprano wrote:
> On Fri, 19 Jul 2013 09:51:23 -0400, Devyn Collier Johnson wrote:
>
>> def KDE_VERSION():
>> print(subprocess.getoutput('kded4 --version | awk -F:
>> \'NR == 2 {print $2}\'').strip()) ##Get KDE version##
> I run KDE 3, and the above does not work for me.
>
> *half a wink*
>
> By the way, a comment that doesn't tell you anything that you don't
> already know is worse than useless.
<snip>
>> Need a case-insensitive if-statement? Check this out:
>>
>> if 'YOUR_STRING'.lower() in SOMEVAR.lower():
> Case-insensitivity is very hard. Take German for example:
<snip>
Steven, thanks for your interesting comments. Your emails are very
insightful.
As for the KDE function, I should fix that. Thank you for catching that.
Notice that the shell command in the function is "kded4". That would
only check the version for the KDE4 series. The function will only work
for KDE4 users. As for the comment, you would be amazed at how many people
ask me "what does this do?". These people are redundant (^u^).
As for the case-insensitive if-statements, most code uses Latin letters.
Making a case-insensitive-international if-statement would be
interesting. I can tackle that later. For now, I only wanted to take
care of Latin letters. I hope to figure something out for all characters.
Thank you for your reply. I found it to be very helpful.
Mahalo,
DCJ
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-07-20 03:18 +0000 |
| Message-ID | <51ea016e$0$29971$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #50923 |
On Fri, 19 Jul 2013 18:08:43 -0400, Devyn Collier Johnson wrote:
> As for the case-insensitive if-statements, most code uses Latin letters.
> Making a case-insensitive-international if-statement would be
> interesting. I can tackle that later. For now, I only wanted to take
> care of Latin letters. I hope to figure something out for all
> characters.
As I showed, even for Latin letters, the trick of "if astring.lower() ==
bstring.lower()" doesn't *quite* work, although it can be "close enough"
for some purposes. For example, some languages treat accents as mere
guides to pronunciation, so ö == o, while other languages treat them as
completely different letters. Same with ligatures: in modern English, æ
should be treated as equal to ae, but in Old English, Danish, Norwegian
and Icelandic it is a distinct letter.
Case-insensitive testing may be easier in many non-European languages,
because they don't have cases.
A full solution to the problem of localized string matching requires
expert knowledge for each language, but a 90% solution is pretty simple:
astring.casefold() == bstring.casefold()
or before version 3.3, just use lowercase. It's not a perfect solution,
but it works reasonably well if you don't care about full localization.
--
Steven
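Steven's point about accents can be made concrete with the standard unicodedata module. This sketch (strip_accents is a hypothetical name) decomposes text to NFD and drops the combining marks, which gives the "ö == o" behaviour that some languages want and is simply wrong for others. Note that a ligature such as Æ, or a letter such as ø, has no canonical decomposition, so it passes through untouched.

```python
import unicodedata


def strip_accents(text):
    """Drop combining marks after canonical decomposition (NFD).

    Appropriate only where accents are pronunciation guides; for languages
    that treat accented letters as distinct, this changes the meaning.
    """
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents('Köln'))   # Koln
print(strip_accents('Ærø'))    # Ærø (neither Æ nor ø decomposes)
```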
| From | Devyn Collier Johnson <devyncjohnson@gmail.com> |
|---|---|
| Date | 2013-07-20 06:07 -0400 |
| Message-ID | <mailman.4912.1374314863.3114.python-list@python.org> |
| In reply to | #50940 |
On 07/19/2013 11:18 PM, Steven D'Aprano wrote:
> On Fri, 19 Jul 2013 18:08:43 -0400, Devyn Collier Johnson wrote:
<snip>
> A full solution to the problem of localized string matching requires
> expert knowledge for each language, but a 90% solution is pretty simple:
>
> astring.casefold() == bstring.casefold()
>
> or before version 3.3, just use lowercase. It's not a perfect solution,
> but it works reasonably well if you don't care about full localization.
Thanks for the tips. I am learning a lot from this mailing list. I hope
my code helped some people though.
Mahalo,
DCJ
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-07-20 09:08 +1000 |
| Message-ID | <mailman.4890.1374275325.3114.python-list@python.org> |
| In reply to | #50918 |
On Sat, Jul 20, 2013 at 8:08 AM, Devyn Collier Johnson
<devyncjohnson@gmail.com> wrote:
> As for the case-insensitive if-statements, most code uses Latin letters.
> Making a case-insensitive-international if-statement would be interesting. I
> can tackle that later. For now, I only wanted to take care of Latin letters.
> I hope to figure something out for all characters.
Case insensitivity is a *hard* problem. Don't fool yourself that you
can do it with a simple line of code and have it 'just work'. All
you'll have is something that works "most of the time", and then
breaks on certain input. As Steven said, using casefold() rather than
lower() will help, but that's still not perfect.
The simplest and safest way to solve Unicode capitalization issues is
to declare that your protocol is case sensitive. I have a brother who
couldn't understand why Unix file systems have to be case sensitive
(why would anyone ever want to have "readme" and "README" in the same
directory, after all?), until I explained how majorly hard it is with
i18n, and how it suddenly becomes extremely unsafe for your *file
system* to get this wrong.
ChrisA
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-07-19 19:09 -0400 |
| Message-ID | <mailman.4891.1374275387.3114.python-list@python.org> |
| In reply to | #50918 |
On 07/19/2013 06:08 PM, Devyn Collier Johnson wrote:
>
> On 07/19/2013 01:59 PM, Steven D'Aprano wrote:
<snip>
>
> As for the case-insensitive if-statements, most code uses Latin letters.
> Making a case-insensitive-international if-statement would be
> interesting. I can tackle that later. For now, I only wanted to take
> care of Latin letters. I hope to figure something out for all characters.
>
Once Steven gave you the answer, what's to figure out? You simply use
casefold() instead of lower(). The only constraint is it's 3.3 and
later, so you can't use it for anything earlier.
http://docs.python.org/3.3/library/stdtypes.html#str.casefold
"""
str.casefold()
Return a casefolded copy of the string. Casefolded strings may be used
for caseless matching.
Casefolding is similar to lowercasing but more aggressive because it is
intended to remove all case distinctions in a string. For example, the
German lowercase letter 'ß' is equivalent to "ss". Since it is already
lowercase, lower() would do nothing to 'ß'; casefold() converts it to "ss".
The casefolding algorithm is described in section 3.13 of the Unicode
Standard.
New in version 3.3.
"""
--
DaveA
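Dave's caveat (casefold() exists only from 3.3 on) matches Steven's advice to fall back to lowercase on older versions. One hedged way to write code that runs on both is to try casefold() and fall back to lower(); fold_case and equal_ci are hypothetical names.

```python
def fold_case(s):
    """Caseless-matching key: casefold() where available, lower() otherwise.

    str.casefold() only exists from Python 3.3 on; on older interpreters this
    falls back to lower(), which misses cases such as 'ß' versus 'ss'.
    """
    try:
        return s.casefold()
    except AttributeError:
        return s.lower()


def equal_ci(a, b):
    """Case-insensitive equality using fold_case() on both sides."""
    return fold_case(a) == fold_case(b)


print(equal_ci('Straße', 'STRASSE'))   # True on Python 3.3+
```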
| From | Devyn Collier Johnson <devyncjohnson@gmail.com> |
|---|---|
| Date | 2013-07-19 21:04 -0400 |
| Message-ID | <mailman.4894.1374282301.3114.python-list@python.org> |
| In reply to | #50918 |
On 07/19/2013 07:09 PM, Dave Angel wrote:
> On 07/19/2013 06:08 PM, Devyn Collier Johnson wrote:
<snip>
> Once Steven gave you the answer, what's to figure out? You simply use
> casefold() instead of lower(). The only constraint is it's 3.3 and
> later, so you can't use it for anything earlier.
>
> http://docs.python.org/3.3/library/stdtypes.html#str.casefold
<snip>
Chris Angelico said that casefold is not perfect. In the future, I want
to make the perfect international-case-insensitive if-statement. For
now, my code only supports a limited range of characters. Even with
casefold, I will have some issues as Chris Angelico mentioned. Also,
"ß" is not really the same as "ss".
Mahalo,
DCJ
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-07-20 03:44 +0000 |
| Message-ID | <51ea07a4$0$29971$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #50936 |
On Fri, 19 Jul 2013 21:04:55 -0400, Devyn Collier Johnson wrote:
> In the future, I want to
> make the perfect international-case-insensitive if-statement. For now,
> my code only supports a limited range of characters. Even with casefold,
> I will have some issues as Chris Angelico mentioned.
There are hundreds of written languages in the world, with thousands of
characters, and most of them have rules about case-sensitivity and
character normalization. For example, in Greek, lowercase Σ is σ except
at the end of a word, when it is ς.
    >>> 'Σσς'.upper()
    'ΣΣΣ'
    >>> 'Σσς'.lower()
    'σσς'
    >>> 'Σσς'.casefold()
    'σσσ'
So in this case, casefold() correctly solves the problem, provided you
are comparing modern Greek text. But if you're comparing text in some
other language which merely happens to use Greek letters, but doesn't
have the same rules about letter sigma, then it will be inappropriate. So
you cannot write a single "perfect" case-insensitive comparison, the best
you can hope for is to write dozens or hundreds of separate
case-insensitive comparisons, one for each language or family of languages.
For an introduction to the problem:
http://www.w3.org/International/wiki/Case_folding
http://www.unicode.org/faq/casemap_charprop.html
> Also, "ß" is not really the same as "ss".
Sometimes it is. Sometimes it isn't.
--
Steven
| From | David Hutto <dwightdhutto@gmail.com> |
|---|---|
| Date | 2013-07-20 00:15 -0400 |
| Message-ID | <mailman.4902.1374293729.3114.python-list@python.org> |
| In reply to | #50943 |
It seems, without utilizing this, or googling, that a case sensitive library is either developed, or could be implemented by utilizing case sensitive translation through a google translation page using an urlopener, and placing in the data to be processed back to the boolean value. Never attempted, but the algorithm seems simpler than the dozens of solutions method.
| From | David Hutto <dwightdhutto@gmail.com> |
|---|---|
| Date | 2013-07-20 00:22 -0400 |
| Message-ID | <mailman.4904.1374294126.3114.python-list@python.org> |
| In reply to | #50943 |
It seems that you could use import re, in my mind's pseudo code, to
compile a translational usage of usernames/passwords that could remain
case sensitive by using just the translational dictionaries, and
refining with data input tests/unit tests.
On Sat, Jul 20, 2013 at 12:15 AM, David Hutto <dwightdhutto@gmail.com> wrote:
<snip>
--
Best Regards,
David Hutto
*CEO:* *http://www.hitwebdevelopment.com*
| From | David Hutto <dwightdhutto@gmail.com> |
|---|---|
| Date | 2013-07-20 00:26 -0400 |
| Message-ID | <mailman.4905.1374294378.3114.python-list@python.org> |
| In reply to | #50943 |
I didn't see that this was for a chess game. That seems more point and
click. Everyone can recognize a bishop from a queen, or a rook from a
pawn. So why would case sensitivity matter other than the 16 pieces on
the board? Or am I misunderstanding the question?
On Sat, Jul 20, 2013 at 12:22 AM, David Hutto <dwightdhutto@gmail.com> wrote:
<snip>
| From | David Hutto <dwightdhutto@gmail.com> |
|---|---|
| Date | 2013-07-20 00:27 -0400 |
| Message-ID | <mailman.4906.1374294454.3114.python-list@python.org> |
| In reply to | #50943 |
32 if you count black, and white.
On Sat, Jul 20, 2013 at 12:26 AM, David Hutto <dwightdhutto@gmail.com> wrote:
> I didn't see that this was for a chess game. That seems more point and
> click. Everyone can recognize a bishop from a queen, or a rook from a pawn.
> So why would case sensitivity matter other than the 16 pieces on the board?
> Or am I misunderstanding the question?
<snip>
| From | Devyn Collier Johnson <devyncjohnson@gmail.com> |
|---|---|
| Date | 2013-07-20 06:10 -0400 |
| Message-ID | <mailman.4913.1374315056.3114.python-list@python.org> |
| In reply to | #50943 |
On 07/19/2013 11:44 PM, Steven D'Aprano wrote:
> On Fri, 19 Jul 2013 21:04:55 -0400, Devyn Collier Johnson wrote:
<snip>
> So you cannot write a single "perfect" case-insensitive comparison, the
> best you can hope for is to write dozens or hundreds of separate
> case-insensitive comparisons, one for each language or family of languages.
<snip>
>> Also, "ß" is not really the same as "ss".
> Sometimes it is. Sometimes it isn't.
Wow, my if-statement is so imperfect! Thankfully, only English people
will talk to an English chatbot (I hope), so for my use of the code, it
will work. Do the main Python3 developers plan to do something about
this?
Mahalo,
DCJ
| From | Devyn Collier Johnson <devyncjohnson@gmail.com> |
|---|---|
| Date | 2013-07-20 08:36 -0400 |
| Message-ID | <mailman.4920.1374323827.3114.python-list@python.org> |
| In reply to | #50943 |
On 07/20/2013 12:26 AM, David Hutto wrote:
> I didn't see that this was for a chess game. That seems more point and
> click. Everyone can recognize a bishop from a queen, or a rook from a
> pawn. So why would case sensitivity matter other than the 16 pieces on
> the board? Or am I misunderstanding the question?
<snip>
In the email, I am sharing various code snippets to give others ideas
and inspiration for coding. In that particular snippet, I am giving
Python3 programmers the idea of making chess tags on an HTML or XML
interpreter. It would be neat to type a tag that would generate chess
pieces instead of remembering the HTML character code. From my
understanding, that email is not being displayed correctly. Are all of
the lines run together? Thank you for asking. I want everyone to
understand the purpose of the email and that particular snippet.
Remember, assumption is the lowest form of knowledge.
Mahalo,
DCJ
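For comparison with the chess tags, HTML already has numeric character references for these symbols: the twelve pieces occupy the consecutive code points U+2654 (white king) through U+265F (black pawn). A small sketch using the standard html module (html.unescape needs Python 3.4+) shows the mapping in both directions; char_ref is a hypothetical helper.

```python
import html


def char_ref(symbol):
    """Return the decimal HTML numeric character reference for one character."""
    return '&#{};'.format(ord(symbol))


print(char_ref('♔'))               # &#9812;
print(html.unescape('&#9812;'))    # ♔

# The twelve pieces, white king through black pawn, are consecutive code points.
pieces = ''.join(chr(cp) for cp in range(0x2654, 0x2660))
print(pieces)                      # ♔♕♖♗♘♙♚♛♜♝♞♟
```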
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-07-20 11:13 +1000 |
| Message-ID | <mailman.4896.1374282806.3114.python-list@python.org> |
| In reply to | #50918 |
On Sat, Jul 20, 2013 at 11:04 AM, Devyn Collier Johnson
<devyncjohnson@gmail.com> wrote:
>
> On 07/19/2013 07:09 PM, Dave Angel wrote:
<snip>
> Chris Angelico said that casefold is not perfect. In the future, I want to
> make the perfect international-case-insensitive if-statement. For now, my
> code only supports a limited range of characters. Even with casefold, I will
> have some issues as Chris Angelico mentioned. Also, "ß" is not really the
> same as "ss".
Well, casefold is about as good as it's ever going to be, but that's
because "the perfect international-case-insensitive comparison" is a
fundamentally impossible goal. Your last sentence hints as to why;
there is no simple way to compare strings containing those characters,
because the correct treatment varies according to context.
Your two best options are: Be case sensitive (and then you need only
worry about composition and combining characters and all those
nightmares - the ones you have to worry about either way), or use
casefold(). Of those, I prefer the first, because it's safer; the
second is also a good option.
ChrisA
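Both of Chris's options still have to deal with composition: "é" can be one code point or "e" plus a combining accent. A sketch using the standard unicodedata module, under the assumption that NFC is an acceptable normal form for the application: same_text() is case sensitive but treats composed and decomposed forms as equal, and canonical_caseless() follows the Unicode Standard's "canonical caseless match" recipe of comparing NFD(casefold(NFD(s))). Both function names are hypothetical.

```python
import unicodedata


def same_text(a, b):
    """Case-sensitive comparison that treats composed/decomposed forms as equal."""
    return unicodedata.normalize('NFC', a) == unicodedata.normalize('NFC', b)


def canonical_caseless(s):
    """Key for Unicode canonical caseless matching: NFD(casefold(NFD(s)))."""
    return unicodedata.normalize('NFD', unicodedata.normalize('NFD', s).casefold())


composed = 'Caf\u00e9'       # 'é' as a single code point
decomposed = 'Cafe\u0301'    # 'e' followed by a combining acute accent

print(composed == decomposed)                 # False
print(same_text(composed, decomposed))        # True
print(same_text('Caf\u00e9', 'CAF\u00c9'))    # False: still case sensitive
print(canonical_caseless(composed) == canonical_caseless('CAFE\u0301'))  # True
```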
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-07-19 21:51 -0400 |
| Message-ID | <mailman.4897.1374285114.3114.python-list@python.org> |
| In reply to | #50918 |
On 07/19/2013 09:04 PM, Devyn Collier Johnson wrote:
>
<snip>
>>
> Chris Angelico said that casefold is not perfect. In the future, I want
> to make the perfect international-case-insensitive if-statement. For
> now, my code only supports a limited range of characters. Even with
> casefold, I will have some issues as Chris Angelico mentioned. Also, "ß"
> is not really the same as "ss".
>
Sure, the casefold() method has its problems. But you're going to avoid
using it till you can do a "perfect" one?
Perfect in what context? For "case sensitively" comparing people's
names in a single language in a single country? Perhaps that can be
made perfect. For certain combinations of language and country.
But if you want to compare words in an unspecified language with an
unspecified country, it cannot be done.
If you've got a particular goal in mind, great. But as a library
function, you're better off using the best standard method available,
and document what its limitations are. One way of documenting such is
to quote the appropriate standards, with their caveats.
By the way, you mentioned earlier that you're restricting yourself to
Latin characters. The lower() method is inadequate for many of those as
well. Perhaps you meant ASCII instead.
--
DaveA
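If the intent really is ASCII-only matching, as Dave suggests, making that explicit in code documents the limitation instead of hiding it behind lower(). This is a sketch built on str.maketrans() and str.translate(); ascii_lower and ascii_equal_ci are hypothetical names.

```python
import string

# Translation table that lowercases A-Z only; every other character is untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(s):
    """Lowercase ASCII letters only, leaving all non-ASCII characters as they are."""
    return s.translate(_ASCII_LOWER)


def ascii_equal_ci(a, b):
    """Case-insensitive comparison that ignores case for ASCII letters only."""
    return ascii_lower(a) == ascii_lower(b)


print(ascii_lower('README.TXT'))          # readme.txt
print(ascii_lower('ÄRGER'))               # Ärger
print(ascii_equal_ci('Hello', 'hELLO'))   # True
```

The point of the design is that its behaviour outside ASCII is obvious and predictable: non-ASCII text is compared exactly, rather than "mostly" folded.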
| From | David Hutto <dwightdhutto@gmail.com> |
|---|---|
| Date | 2013-07-19 23:42 -0400 |
| Message-ID | <mailman.4899.1374291729.3114.python-list@python.org> |
| In reply to | #50918 |
Just use an explanatory user tip that states it should be case sensitive, just like with most sites, or apps.

On Fri, Jul 19, 2013 at 9:13 PM, Chris Angelico <rosuav@gmail.com> wrote:
> On Sat, Jul 20, 2013 at 11:04 AM, Devyn Collier Johnson
> <devyncjohnson@gmail.com> wrote:
> >
> > On 07/19/2013 07:09 PM, Dave Angel wrote:
> >>
> >> On 07/19/2013 06:08 PM, Devyn Collier Johnson wrote:
> >>>
> >>> On 07/19/2013 01:59 PM, Steven D'Aprano wrote:
> >>
> >> <snip>
> >>>
> >>> As for the case-insensitive if-statements, most code uses Latin letters.
> >>> Making a case-insensitive-international if-statement would be
> >>> interesting. I can tackle that later. For now, I only wanted to take
> >>> care of Latin letters. I hope to figure something out for all characters.
> >>>
> >>
> >> Once Steven gave you the answer, what's to figure out? You simply use
> >> casefold() instead of lower(). The only constraint is it's 3.3 and later,
> >> so you can't use it for anything earlier.
> >>
> >> http://docs.python.org/3.3/library/stdtypes.html#str.casefold
> >>
> >> """
> >> str.casefold()
> >> Return a casefolded copy of the string. Casefolded strings may be used for
> >> caseless matching.
> >>
> >> Casefolding is similar to lowercasing but more aggressive because it is
> >> intended to remove all case distinctions in a string. For example, the
> >> German lowercase letter 'ß' is equivalent to "ss". Since it is already
> >> lowercase, lower() would do nothing to 'ß'; casefold() converts it to "ss".
> >>
> >> The casefolding algorithm is described in section 3.13 of the Unicode
> >> Standard.
> >>
> >> New in version 3.3.
> >> """
> >>
> > Chris Angelico said that casefold is not perfect. In the future, I want to
> > make the perfect international-case-insensitive if-statement. For now, my
> > code only supports a limited range of characters. Even with casefold, I will
> > have some issues as Chris Angelico mentioned. Also, "ß" is not really the
> > same as "ss".
>
> Well, casefold is about as good as it's ever going to be, but that's
> because "the perfect international-case-insensitive comparison" is a
> fundamentally impossible goal. Your last sentence hints as to why;
> there is no simple way to compare strings containing those characters,
> because the correct treatment varies according to context.
>
> Your two best options are: Be case sensitive (and then you need only
> worry about composition and combining characters and all those
> nightmares - the ones you have to worry about either way), or use
> casefold(). Of those, I prefer the first, because it's safer; the
> second is also a good option.
>
> ChrisA

--
Best Regards,
David Hutto
*CEO:* *http://www.hitwebdevelopment.com*
[toc] | [prev] | [next] | [standalone]
| From | Devyn Collier Johnson <devyncjohnson@gmail.com> |
|---|---|
| Date | 2013-07-20 06:06 -0400 |
| Message-ID | <mailman.4911.1374314778.3114.python-list@python.org> |
| In reply to | #50918 |
On 07/19/2013 09:51 PM, Dave Angel wrote:
> On 07/19/2013 09:04 PM, Devyn Collier Johnson wrote:
>>
>
> <snip>
>>>
>> Chris Angelico said that casefold is not perfect. In the future, I want
>> to make the perfect international-case-insensitive if-statement. For
>> now, my code only supports a limited range of characters. Even with
>> casefold, I will have some issues as Chris Angelico mentioned. Also, "ß"
>> is not really the same as "ss".
>>
>
> Sure, the casefold() method has its problems. But you're going to
> avoid using it till you can do a "perfect" one?
>
> Perfect in what context? For "case sensitively" comparing people's
> names in a single language in a single country? Perhaps that can be
> made perfect. For certain combinations of language and country.
>
> But if you want to compare words in an unspecified language with an
> unspecified country, it cannot be done.
>
> If you've got a particular goal in mind, great. But as a library
> function, you're better off using the best standard method available,
> and document what its limitations are. One way of documenting such is
> to quote the appropriate standards, with their caveats.
>
> By the way, you mentioned earlier that you're restricting yourself to
> Latin characters. The lower() method is inadequate for many of those
> as well. Perhaps you meant ASCII instead.
>
Of course not, Dave; I will implement casefold. I just plan to not stop there. My program should not come across unspecified languages. Yeah, I meant ASCII, but I was unaware that lower() had some limitation on Latin letters.

Mahalo,

DCJ
[toc] | [prev] | [next] | [standalone]
| From | Devyn Collier Johnson <devyncjohnson@gmail.com> |
|---|---|
| Date | 2013-07-20 08:20 -0400 |
| Message-ID | <mailman.4918.1374322837.3114.python-list@python.org> |
| In reply to | #50918 |
On 07/19/2013 09:13 PM, Chris Angelico wrote:
> On Sat, Jul 20, 2013 at 11:04 AM, Devyn Collier Johnson
> <devyncjohnson@gmail.com> wrote:
>> On 07/19/2013 07:09 PM, Dave Angel wrote:
>>> On 07/19/2013 06:08 PM, Devyn Collier Johnson wrote:
>>>>
>>>> On 07/19/2013 01:59 PM, Steven D'Aprano wrote:
>>>
>>> <snip>
>>>>
>>>> As for the case-insensitive if-statements, most code uses Latin letters.
>>>> Making a case-insensitive-international if-statement would be
>>>> interesting. I can tackle that later. For now, I only wanted to take
>>>> care of Latin letters. I hope to figure something out for all characters.
>>>>
>>> Once Steven gave you the answer, what's to figure out? You simply use
>>> casefold() instead of lower(). The only constraint is it's 3.3 and later,
>>> so you can't use it for anything earlier.
>>>
>>> http://docs.python.org/3.3/library/stdtypes.html#str.casefold
>>>
>>> """
>>> str.casefold()
>>> Return a casefolded copy of the string. Casefolded strings may be used for
>>> caseless matching.
>>>
>>> Casefolding is similar to lowercasing but more aggressive because it is
>>> intended to remove all case distinctions in a string. For example, the
>>> German lowercase letter 'ß' is equivalent to "ss". Since it is already
>>> lowercase, lower() would do nothing to 'ß'; casefold() converts it to "ss".
>>>
>>> The casefolding algorithm is described in section 3.13 of the Unicode
>>> Standard.
>>>
>>> New in version 3.3.
>>> """
>>>
>> Chris Angelico said that casefold is not perfect. In the future, I want to
>> make the perfect international-case-insensitive if-statement. For now, my
>> code only supports a limited range of characters. Even with casefold, I will
>> have some issues as Chris Angelico mentioned. Also, "ß" is not really the
>> same as "ss".
> Well, casefold is about as good as it's ever going to be, but that's
> because "the perfect international-case-insensitive comparison" is a
> fundamentally impossible goal. Your last sentence hints as to why;
> there is no simple way to compare strings containing those characters,
> because the correct treatment varies according to context.
>
> Your two best options are: Be case sensitive (and then you need only
> worry about composition and combining characters and all those
> nightmares - the ones you have to worry about either way), or use
> casefold(). Of those, I prefer the first, because it's safer; the
> second is also a good option.
>
> ChrisA
Thanks everyone (especially Chris Angelico and Steven D'Aprano) for all of your helpful suggestions and ideas. I plan to implement casefold() in some of my programs.

Mahalo,

DCJ
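One way to apply casefold() to the "case-insensitive if-statement" idea from this thread is to fold the user's input once. You then compare it against constants that were folded ahead of time. The following is a sketch in which `QUIT_WORDS` and `wants_to_quit` are hypothetical names.

```python
# Hypothetical command words, stored already casefolded so that each
# check only has to fold the incoming text.
QUIT_WORDS = frozenset(word.casefold() for word in ("quit", "exit", "bye"))


def wants_to_quit(user_input: str) -> bool:
    # strip() drops surrounding whitespace before the caseless membership test.
    return user_input.strip().casefold() in QUIT_WORDS


if wants_to_quit("  EXIT \n"):
    print("bye")
```

A set lookup keeps the if-statement short no matter how many accepted spellings there are. Folding the constants once avoids repeating that work on every comparison.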
[toc] | [prev] | [standalone]
Back to top | Article view | comp.lang.python
csiph-web