From: Benjamin Kaplan
To: "python-list@python.org"
Newsgroups: comp.lang.python
Subject: Re: Is Unicode support so hard...
Date: Sat, 20 Apr 2013 11:02:54 -0700
In-Reply-To: <5172CEE4.9070403@nedbatchelder.com>

On Sat, Apr 20, 2013 at 10:22 AM, Ned Batchelder wrote:
> On 4/20/2013 1:12 PM, jmfauth wrote:
>>
>> In a previous post,
>>
>>
>> http://groups.google.com/group/comp.lang.python/browse_thread/thread/6aec70817705c226#
>> ,
>>
>> Chris “Kwpolska” Warrick wrote:
>>
>> “Is Unicode support so hard, especially in the 21st century?”
>>
>> --
>>
>> Unicode is not really complicate and it works very well (more
>> than two decades of development if you take into account
>> iso-14****).
>>
>> But, - I can say, "as usual" - people prefer to spend their
>> time to make a "better Unicode than Unicode" and it usually
>> fails. Python does not escape to this rule.
>>
>> -----
>>
>> I'm "busy" with TeX (unicode engine variant), fonts and typography.
>> This gives me plenty of ideas to test the "flexible string
>> representation" (FSR). I should recognize this FSR is failing
>> particulary very well...
>>
>> I can almost say, a delight.
>>
>> jmf
>> Unicode lover
>
> I'm totally confused about what you are saying. What does "make a better
> Unicode than Unicode" mean? Are you saying that Python is guilty of this?
> In what way? Can you provide specifics? Or are you saying that you like
> how Python has implemented it? "FSR is failing ... a delight"? I don't
> know what you mean.
>
> --Ned.

Don't bother trying to figure this out. jmfauth has been hijacking
every thread that mentions Unicode to complain about the flexible
string representation introduced in Python 3.3. Apparently, having
proper Unicode semantics (indexing is based on code points, not UTF-16
code units) at the expense of performance when calling .replace on the
only non-ASCII or non-BMP character in the string is a horrible bug.
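
For anyone who wants to see the trade-off concretely, here is a minimal sketch, assuming CPython 3.3 or later. The variable names are made up for illustration, and sys.getsizeof reporting the storage width is a CPython implementation detail, not a language guarantee. It replaces the only non-ASCII character in a string and compares the storage before and after, then checks that each index addresses one whole code point.

```python
import sys

# A string whose only non-ASCII character is the euro sign (inside the BMP).
text = "a" * 1000 + "\u20ac"

# Replacing that single character yields a pure-ASCII string of equal length.
# Under the flexible string representation the result can use a narrower
# per-character width, so building it is more than a plain memory copy.
ascii_text = text.replace("\u20ac", "E")

# Assumption: CPython-specific. getsizeof reflects the internal width chosen
# for each string; other implementations may report something else.
wide_size = sys.getsizeof(text)
narrow_size = sys.getsizeof(ascii_text)

# A code point outside the BMP still occupies exactly one index position.
astral = "a\U0001F600b"
astral_len = len(astral)
middle = astral[1]

print(len(text), len(ascii_text))
print(wide_size > narrow_size)
print(astral_len, middle == "\U0001F600")
```

The extra work in .replace comes from the result being stored at a different width than the input, which is the cost being complained about in this thread.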