From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: Not x.islower() has different output than x.isupper() in list output...
Date: Thu, 5 May 2016 00:37:17 +1000

On Thu, May 5, 2016 at 12:09 AM, DFS wrote:
> On 5/3/2016 11:28 PM, Steven D'Aprano wrote:
>> [ lengthy piece about text, Unicode, and letter case ]
>
> Linguist much?

As an English-only speaker who writes code that needs to be used
around the world, you end up accruing tidbits of language and text
trivia in the form of edge cases that you need to remember to test.
Among them:

* Turkish dotless and dotted i
* Greek medial and final sigma
* German eszett
* Hebrew and Arabic right-to-left text
* Chinese non-BMP characters
* Combining characters (eg diacriticals starting U+0300)
* Non-characters eg U+FFFE

And then a post like Steven's basically comes from pulling up all
those from your memory, and maybe doing a spot of quick testing and/or
research to get some explanatory details. You don't have to be a
linguist, necessarily - just a competent debugger.

ChrisA
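
PS. For anyone who wants to poke at the edge cases listed above, here is a rough sketch of the kind of spot-testing I mean. It assumes Python 3.3 or later; the function name edge_case_checks and the dictionary labels are made up for illustration, and the sample characters are just one pick per category, not an exhaustive test suite.

```python
import unicodedata


def edge_case_checks():
    """Return what Python reports for one sample of each edge case."""
    results = {}

    # Turkish: str.lower()/upper() are locale-independent, so U+0130
    # (dotted capital I) becomes 'i' + U+0307 COMBINING DOT ABOVE,
    # and U+0131 (dotless i) uppercases to plain ASCII 'I'.
    results["turkish_dotted_I_lower"] = "\u0130".lower()
    results["turkish_dotless_i_upper"] = "\u0131".upper()

    # Greek: a capital sigma at the end of a word lowercases to the
    # final form U+03C2 rather than the medial form U+03C3.
    results["greek_word_lower"] = "\u039f\u0394\u039f\u03a3".lower()

    # German: eszett is one lowercase letter whose uppercase is two.
    results["eszett_upper"] = "\u00df".upper()
    results["eszett_casefold"] = "\u00df".casefold()

    # Right-to-left text: bidirectional classes of Hebrew and Arabic alef.
    results["hebrew_bidi"] = unicodedata.bidirectional("\u05d0")
    results["arabic_bidi"] = unicodedata.bidirectional("\u0627")

    # Non-BMP: U+20000 is one code point but two UTF-16 code units.
    han = "\U00020000"
    results["non_bmp_len"] = len(han)
    results["non_bmp_utf16_bytes"] = len(han.encode("utf-16-le"))

    # Combining characters: 'e' + U+0301 only equals U+00E9 after
    # normalizing to NFC.
    decomposed = "e\u0301"
    results["combining_equal_raw"] = decomposed == "\u00e9"
    results["combining_equal_nfc"] = (
        unicodedata.normalize("NFC", decomposed) == "\u00e9"
    )

    # Non-characters: U+FFFE is reported as category Cn (unassigned).
    results["nonchar_category"] = unicodedata.category("\ufffe")

    # The thread's subject: a string with no cased characters is
    # neither lower nor upper, so `not s.islower()` != `s.isupper()`.
    results["digits_islower"] = "123".islower()
    results["digits_isupper"] = "123".isupper()

    return results


if __name__ == "__main__":
    for label, value in edge_case_checks().items():
        print(f"{label}: {value!r}")
```

The escapes are spelled out rather than typed as literal characters so the script survives whatever encoding your editor or newsreader decides to use.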