From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: Late-binding of function defaults (was Re: What is a function parameter =[] for?)
Date: Fri, 27 Nov 2015 10:07:34 +1100

On Fri, Nov 27, 2015 at 9:27 AM, BartC wrote:
> On 26/11/2015 13:15, Chris Angelico wrote:
>>
>> On Thu, Nov 26, 2015 at 11:53 PM, BartC wrote:
>
>>> http://pastebin.com/JrVTher6
>
>> #14 and #15: Are you assuming that a character is a byte and that
>> diacritical-free English is the only language in the world?
>
> I don't think that need be the assumption. Any UTF8 string that fits within
> 8 bytes could also be represented by an integer value.

Okay, so you're making UTF-8 your visible string representation.
That's better than assuming character==byte, but it still has the case
insensitivity problem.

>> Case
>> insensitivity is a *pain* when you try to be language-agnostic; for
>> instance, the case-folding rules of English state that U+0069 LATIN
>> SMALL LETTER I and U+0049 LATIN CAPITAL LETTER I are identical, but
>> Turkish would upper-case the first to U+0130 LATIN CAPITAL LETTER I
>> WITH DOT ABOVE and lower-case the second to U+0131 LATIN SMALL LETTER
>> DOTLESS I. German has U+00DF LATIN SMALL LETTER SHARP S (also called
>> eszett), which traditionally upper-cases to "SS", which lower-cases to
>> "ss".
>
> I use Windows which is also case insensitive with regard to filenames and
> such. How does it solve those problems? How about web-site names, email
> addresses and Google searches?

Windows: I'm not sure, and frankly, I don't trust it. A quick test
showed a couple of failures:

C:\Users\Rosuav\Desktop>dir /b TE*
teßting

C:\Users\Rosuav\Desktop>dir /b TESST*
File Not Found

C:\Users\Rosuav\Desktop>dir /b ParıldıYOR*
Parıldıyor Parts & Pieces

C:\Users\Rosuav\Desktop>dir /b PARILDIYOR*
File Not Found

It might be case insensitive only for ASCII.
(Note: This test was done on Windows 7, because that's the VM I had
handy. Things might be different on newer Windowses, but I can't
check.)

Web site names: Presumably you mean DNS. It started out as an
ASCII-only protocol, and grew a number of gross hacks to support
"internationalized domain names". I'm not sure where the case
insensitivity is applied; but it doesn't matter too much, because
conflicts can be resolved at registration. Also, you'll generally see
IDNs in country-specific TLDs, so there'll be only one language (or a
small family of languages) used, reducing the likelihood of
collisions.

Google searches are (deliberately) a LOT more sloppy than just case
insensitivity. You can search for something without diacriticals and
get back results with diacriticals; you can transpose letters, omit
letters, have extra letters, and it'll generally figure out what you
want. This is absolutely awesome for a search engine, but equally
horrifying for name lookups in a program.

None of these is something I'd recommend following.

> Within a program source code (where you have mainly technical users), you
> can just impose some restrictions on keywords and identifiers otherwise
> there are plenty of problems even without case switching, if you want to
> allow Unicode here.

I would strongly support ASCII-only *language keywords*. You don't
have many of them (compared to the number of identifiers in a
program), and everyone has to type them. But for identifiers, Python 3
defines character validity based on Unicode categories, and performs
NFKC normalization on all names. That's pretty straightforward. No
case insensitivity hassles, no messy non-transitive equalities, it's
easy.

ChrisA
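
For anyone who wants to poke at the two points above (the awkward case
mappings, and NFKC normalization of identifiers), here is a minimal
sketch in Python 3. It is only an illustration, not part of the Windows
test: the variable names and the fullwidth spelling of "value" are made
up for the example, and Python's str methods are locale-independent, so
they show the default Unicode mappings rather than Turkish rules.

```python
import unicodedata

# German eszett: upper-casing expands to two letters, and lower-casing
# the result does not get back to the original character.
eszett = "\u00df"                          # LATIN SMALL LETTER SHARP S
eszett_upper = eszett.upper()              # "SS"
eszett_round_trip = eszett_upper.lower()   # "ss", not the eszett

# Dotless i: under the default (non-Turkish) mappings it upper-cases to
# the same letter as plain "i", although the two are different characters.
# Treating "same upper-case form" as equality is what gets messy.
dotless_i = "\u0131"                       # LATIN SMALL LETTER DOTLESS I
same_upper = dotless_i.upper() == "i".upper()
same_char = dotless_i == "i"

# Identifiers: the compiler applies NFKC, so a name typed with fullwidth
# letters (hypothetical example) is stored under its ASCII spelling.
# Case is left alone; only compatibility forms are folded together.
fullwidth_name = "\uff56\uff41\uff4c\uff55\uff45"   # fullwidth "value"
normalized = unicodedata.normalize("NFKC", fullwidth_name)
namespace = {}
exec(fullwidth_name + " = 42", namespace)

print(eszett_upper, eszett_round_trip)
print(same_upper, same_char)
print(normalized, namespace[normalized])
```

The identifier part is the relevant bit for the language-design
argument: normalization is done once, at compile time, and no case
mapping is involved at all.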