From: Ian Kelly
Date: Sun, 2 Sep 2012 03:06:16 -0600
Subject: Re: Flexible string representation, unicode, typography, ...
Newsgroups: comp.lang.python

On Sun, Sep 2, 2012 at 1:36 AM, wrote:
> I still remember my thoughts when I read the PEP 393
> discussion: "this is not logical", "they do no understand
> typography", "atomic character ???", ...

That would indicate one of two possibilities.
Either:

1) Everybody in the PEP 393 discussion except for you is clueless
about how to implement a Unicode type; or

2) You are clueless about how to implement a Unicode type.

Taking into account Occam's razor, and also that you seem to be unable
or unwilling to offer a solid rationale for those thoughts, I have to
say that I'm currently leaning toward the second possibility.

> Real world exemples.
>
>>>> import libfrancais
>>>> li = ['noël', 'noir', 'nœud', 'noduleux', \
> ... 'noétique', 'noèse', 'noirâtre']
>>>> r = libfrancais.sortfr(li)
>>>> r
> ['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir',
> 'noirâtre']

libfrancais does not appear to be publicly available. It's not listed
on PyPI, and googling for "python libfrancais" turns up nothing
relevant.

Rewriting the example to use locale.strcoll instead:

>>> li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'French_France')
'French_France.1252'
>>> import functools
>>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir', 'noirâtre']

# Python 3.2
>>> import timeit
>>> timeit.repeat("sorted(li, key=functools.cmp_to_key(locale.strcoll))", "import functools; import locale; li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']", number=10000)
[0.5544277025009592, 0.5370117249557325, 0.5551836677925053]

# Python 3.3
>>> import timeit
>>> timeit.repeat("sorted(li, key=functools.cmp_to_key(locale.strcoll))", "import functools; import locale; li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']", number=10000)
[0.1421166788364303, 0.12389078130001963, 0.13184190553613462]

As you can see, Python 3.3 takes about 77% less time than Python 3.2
on this example (roughly four times as fast).
If this was intended to show that the Python 3.3 Unicode representation is a regression over the Python 3.2 implementation, then it's a complete failure as an example.
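
For anyone who wants to repeat the comparison on another platform, here is a minimal sketch of the same benchmark, not a definitive harness. The helper names (set_french_collation, sort_words, reduction_and_speedup) and the list of candidate locale names are assumptions made for illustration. Only 'French_France' comes from the session above, and which French locale exists depends on the operating system. Comparing the fastest run of each series is also a choice made here.

```python
import functools
import locale
import timeit

WORDS = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']

# Assumption: 'French_France' is the Windows name used in the session above;
# the other two spellings are common on Linux and macOS but may be missing.
CANDIDATE_LOCALES = ('French_France', 'fr_FR.UTF-8', 'fr_FR.utf8')


def set_french_collation():
    """Try each candidate locale; return the name that worked, or None."""
    for name in CANDIDATE_LOCALES:
        try:
            locale.setlocale(locale.LC_COLLATE, name)
            return name
        except locale.Error:
            continue
    return None  # collation stays whatever it was before (often the C locale)


def sort_words(words):
    """Sort with the current locale's collation rules, as in the post."""
    return sorted(words, key=functools.cmp_to_key(locale.strcoll))


def reduction_and_speedup(old_times, new_times):
    """Compare the fastest run of each series.

    Returns (fraction of time saved, speedup factor).
    """
    old, new = min(old_times), min(new_times)
    return 1 - new / old, old / new


if __name__ == '__main__':
    print('collation locale:', set_french_collation())
    print(sort_words(WORDS))
    # Timings on the current interpreter; run under two Python versions
    # and feed both lists to reduction_and_speedup() to compare them.
    print(timeit.repeat(lambda: sort_words(WORDS), number=1000, repeat=3))
```

Feeding the two lists of timings quoted above into reduction_and_speedup() reproduces the figure in the post: the fastest 3.3 run takes about 77% less time than the fastest 3.2 run.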