From: Devyn Collier Johnson
Date: Sat, 20 Jul 2013 06:07:38 -0400
To: python-list@python.org
Newsgroups: comp.lang.python
Subject: Re: Share Code Tips

On 07/19/2013 11:18 PM, Steven D'Aprano wrote:
> On Fri, 19 Jul 2013 18:08:43 -0400, Devyn Collier Johnson wrote:
>
>> As for the case-insensitive if-statements, most code uses Latin letters.
>> Making a case-insensitive-international if-statement would be
>> interesting. I can tackle that later. For now, I only wanted to take
>> care of Latin letters. I hope to figure something out for all
>> characters.
>
> As I showed, even for Latin letters, the trick of "if astring.lower() ==
> bstring.lower()" doesn't *quite* work, although it can be "close enough"
> for some purposes. For example, some languages treat accents as mere
> guides to pronunciation, so ö == o, while other languages treat them as
> completely different letters.
> Same with ligatures: in modern English, æ
> should be treated as equal to ae, but in Old English, Danish, Norwegian
> and Icelandic it is a distinct letter.
>
> Case-insensitive testing may be easier in many non-European languages,
> because they don't have cases.
>
> A full solution to the problem of localized string matching requires
> expert knowledge for each language, but a 90% solution is pretty simple:
>
>     astring.casefold() == bstring.casefold()
>
> or before version 3.3, just use lowercase. It's not a perfect solution,
> but it works reasonably well if you don't care about full localization.

Thanks for the tips. I am learning a lot from this mailing list. I hope my
code helped some people, though.

Mahalo,

DCJ
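For reference, here is a minimal Python 3.3+ sketch of the comparisons discussed in the quoted text. The helper names are hypothetical. Stripping accents through NFD decomposition is only one convention (the "ö == o" view), and it is wrong for languages that treat "ö" as a separate letter. As the quote says, neither casefold() nor accent stripping equates "æ" with "ae":

```python
import unicodedata


def naive_equal(a: str, b: str) -> bool:
    # Pre-3.3 approach: compare lowercased strings.
    return a.lower() == b.lower()


def casefold_equal(a: str, b: str) -> bool:
    # str.casefold() (Python 3.3+) applies Unicode full case folding,
    # e.g. German "ß" folds to "ss", which lower() leaves unchanged.
    return a.casefold() == b.casefold()


def strip_accents(s: str) -> str:
    # Decompose to NFD and drop combining marks, so "ö" becomes "o".
    # Assumption: accents are "mere guides to pronunciation"; this is
    # not correct for every language.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_equal(a: str, b: str) -> bool:
    # Strip accents first, then casefold both sides.
    return strip_accents(a).casefold() == strip_accents(b).casefold()


if __name__ == "__main__":
    print(naive_equal("Straße", "STRASSE"))               # False
    print(casefold_equal("Straße", "STRASSE"))            # True
    print(casefold_equal("Köln", "KOLN"))                 # False
    print(accent_insensitive_equal("Köln", "KOLN"))       # True
    # "æ" has no Unicode decomposition, so it still differs from "ae".
    print(accent_insensitive_equal("æther", "AETHER"))    # False
```

The last line shows why the quoted "90% solution" stops short of full localization. Whether "æ" should match "ae" depends on the language, so that rule has to come from per-language knowledge rather than from Unicode case folding.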