From: wxjmfauth@gmail.com
Newsgroups: comp.lang.python
Date: Sun, 2 Sep 2012 11:58:08 -0700 (PDT)
Subject: Re: Flexible string representation, unicode, typography, ...

On Sunday, 2 September 2012 11:07:35 UTC+2, Ian wrote:
> On Sun, Sep 2, 2012 at 1:36 AM, wrote:
>
> > I still remember my thoughts when I read the PEP 393
> > discussion: "this is not logical", "they do not understand
> > typography", "atomic character ???", ...
>
> That would indicate one of two possibilities.
> Either:
>
> 1) Everybody in the PEP 393 discussion except for you is clueless
> about how to implement a Unicode type; or
>
> 2) You are clueless about how to implement a Unicode type.
>
> Taking into account Occam's razor, and also that you seem to be unable
> or unwilling to offer a solid rationale for those thoughts, I have to
> say that I'm currently leaning toward the second possibility.
>
> > Real world examples.
> >
> >>>> import libfrancais
> >>>> li = ['noël', 'noir', 'nœud', 'noduleux', \
> > ... 'noétique', 'noèse', 'noirâtre']
> >>>> r = libfrancais.sortfr(li)
> >>>> r
> > ['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir',
> > 'noirâtre']
>
> libfrancais does not appear to be publicly available. It's not listed
> in PyPI, and googling for "python libfrancais" turns up nothing
> relevant.
>
> Rewriting the example to use locale.strcoll instead:
>
> >>> li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']
> >>> import locale
> >>> locale.setlocale(locale.LC_ALL, 'French_France')
> 'French_France.1252'
> >>> import functools
> >>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
> ['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir', 'noirâtre']
>
> # Python 3.2
> >>> import timeit
> >>> timeit.repeat("sorted(li, key=functools.cmp_to_key(locale.strcoll))", "import functools; import locale; li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']", number=10000)
> [0.5544277025009592, 0.5370117249557325, 0.5551836677925053]
>
> # Python 3.3
> >>> import timeit
> >>> timeit.repeat("sorted(li, key=functools.cmp_to_key(locale.strcoll))", "import functools; import locale; li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']", number=10000)
> [0.1421166788364303, 0.12389078130001963, 0.13184190553613462]
>
> As you can see, Python 3.3 is about 77% faster than Python 3.2 on this
> example. If this was intended to show that the Python 3.3 Unicode
> representation is a regression over the Python 3.2 implementation,
> then it's a complete failure as an example.

Unfortunately, I got the opposite, and even much worse, results on my Windows box. Keep in mind that libfrancais is one of my own modules, and it does a little more than the standard sorting tools.

My rationale is very simple:

1) I have never heard of anything better than sticking with one of the Unicode encoding schemes (general theory).
2) I am not at all convinced by the "new" Py 3.3 algorithm. I'm not the only one who has noticed problems. Arguing "it is fast enough" is not a correct answer.

jmf
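Ian's rewrite relies on the Windows locale name 'French_France'. A minimal sketch of the same locale-aware sort that may also run on Unix-like systems, assuming some French locale is installed there (the candidate names below are guesses about the host, not guaranteed), uses locale.strxfrm as the sort key: comparing strxfrm results is equivalent to comparing with strcoll, so functools.cmp_to_key is not needed. set_french_collation and sort_fr are hypothetical helpers for illustration; they are not libfrancais, which is not public.

```python
import locale

# Word list from the thread.
WORDS = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']

# Assumed locale names: 'French_France' is the Windows name used in the
# thread; the 'fr_FR' variants are common on Linux/macOS. Which of them
# exist depends entirely on the machine.
CANDIDATES = ['French_France', 'fr_FR.UTF-8', 'fr_FR.utf8']


def set_french_collation():
    """Hypothetical helper: switch LC_COLLATE to the first candidate that
    the C library accepts and return its name, or None if none exists."""
    for name in CANDIDATES:
        try:
            return locale.setlocale(locale.LC_COLLATE, name)
        except locale.Error:
            continue  # not installed here, try the next name
    return None


def sort_fr(words):
    """Hypothetical helper: sort by the current LC_COLLATE rules.

    strxfrm(a) < strxfrm(b) is equivalent to strcoll(a, b) < 0, so the
    transformed string can serve directly as a sort key.
    """
    return sorted(words, key=locale.strxfrm)


if __name__ == '__main__':
    print('collation locale:', set_french_collation())
    print(sort_fr(WORDS))
```

If no candidate locale exists, the sort falls back to whatever collation is active (code point order under the default C locale), so check the printed locale name before comparing results or timings with those in the thread.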
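Ian's "about 77% faster" reads most naturally as "takes about 77% less time". A short check using only the figures quoted above (no new measurements), comparing the minimum of each repeat() run, which the timeit documentation describes as the most meaningful number:

```python
# Seconds for 10000 sorts, copied from the quoted Python 3.2 and 3.3 sessions.
py32 = [0.5544277025009592, 0.5370117249557325, 0.5551836677925053]
py33 = [0.1421166788364303, 0.12389078130001963, 0.13184190553613462]

# Compare the best (minimum) run of each interpreter.
best32, best33 = min(py32), min(py33)
speedup = best32 / best33         # how many times faster 3.3 ran
time_saved = 1 - best33 / best32  # fraction of the 3.2 time saved

print(f'speedup: {speedup:.2f}x, time saved: {time_saved:.0%}')
# speedup: 4.33x, time saved: 77%
```

So on Ian's machine 3.3 ran this sort roughly 4.3 times as fast; the figures say nothing about jmf's Windows results, which were not posted.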
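jmf's first point contrasts "sticking with one of the Unicode encoding schemes" (one fixed width for every string) with PEP 393. The sketch below shows what PEP 393 does in CPython 3.3 and later: each str stores every character in 1, 2 or 4 bytes, chosen by its widest code point. storage_width is a hypothetical helper that infers this width indirectly from sys.getsizeof, so it only holds for CPython.

```python
import sys

# Words from the thread plus one character outside the BMP for comparison.
SAMPLES = ['noir', 'noël', 'nœud', '\U0001F600']


def storage_width(s):
    """Hypothetical helper: bytes per character CPython uses to store s.

    Under PEP 393 every character of a str occupies 1, 2 or 4 bytes,
    chosen by the largest code point in the string. s * 3 has len(s) more
    characters of the same kind than s * 2, so the difference in
    sys.getsizeof divided by len(s) is the per-character width. Both
    operands are freshly built, so neither carries a cached UTF-8 copy
    and the fixed object header cancels out.
    """
    return (sys.getsizeof(s * 3) - sys.getsizeof(s * 2)) // len(s)


for word in SAMPLES:
    widest = max(word, key=ord)
    # ascii() keeps the output printable on any terminal encoding.
    print(f'{ascii(word)}: widest U+{ord(widest):04X}, '
          f'{storage_width(word)} byte(s) per char')
```

In the thread's own word list, 'noël' (ë is U+00EB, inside Latin-1) is stored at 1 byte per character, while 'nœud' (œ is U+0153) needs 2, so a single French ligature changes the representation of the whole string.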