Newsgroups: comp.lang.python
Date: Sun, 2 Sep 2012 11:58:08 -0700 (PDT)
Subject: Re: Flexible string representation, unicode, typography, ...
From: wxjmfauth@gmail.com

On Sunday, 2 September 2012 11:07:35 UTC+2, Ian wrote:
> On Sun, Sep 2, 2012 at 1:36 AM, wrote:
> > I still remember my thoughts when I read the PEP 393
> > discussion: "this is not logical", "they do not understand
> > typography", "atomic character ???", ...
>
> That would indicate one of two possibilities. Either:
>
> 1) Everybody in the PEP 393 discussion except for you is clueless
> about how to implement a Unicode type; or
>
> 2) You are clueless about how to implement a Unicode type.
>
> Taking into account Occam's razor, and also that you seem to be unable
> or unwilling to offer a solid rationale for those thoughts, I have to
> say that I'm currently leaning toward the second possibility.
>
> > Real world examples.
> >
> > >>> import libfrancais
> > >>> li = ['noël', 'noir', 'nœud', 'noduleux', \
> > ... 'noétique', 'noèse', 'noirâtre']
> > >>> r = libfrancais.sortfr(li)
> > >>> r
> > ['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir',
> > 'noirâtre']
>
> libfrancais does not appear to be publicly available. It's not listed
> in PyPI, and googling for "python libfrancais" turns up nothing
> relevant.
>
> Rewriting the example to use locale.strcoll instead:
>
> >>> li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']
> >>> import locale
> >>> locale.setlocale(locale.LC_ALL, 'French_France')
> 'French_France.1252'
> >>> import functools
> >>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
> ['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir', 'noirâtre']
>
> # Python 3.2
> >>> import timeit
> >>> timeit.repeat("sorted(li, key=functools.cmp_to_key(locale.strcoll))", "import functools; import locale; li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']", number=10000)
> [0.5544277025009592, 0.5370117249557325, 0.5551836677925053]
>
> # Python 3.3
> >>> import timeit
> >>> timeit.repeat("sorted(li, key=functools.cmp_to_key(locale.strcoll))", "import functools; import locale; li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']", number=10000)
> [0.1421166788364303, 0.12389078130001963, 0.13184190553613462]
>
> As you can see, Python 3.3 takes about 77% less time than Python 3.2
> on this example. If this was intended to show that the Python 3.3
> Unicode representation is a regression over the Python 3.2
> implementation, then it's a complete failure as an example.

- Unfortunately, I got opposite, and even much worse, results on my
  Windows box, considering
- libfrancais is one of my modules, and it does a little more than
  the standard sorting tools.

My rationale is very simple.

1) I have never heard of anything better than sticking with one of the
   Unicode coding schemes (general theory).

2) I am not at all convinced by the "new" Py 3.3 algorithm. I am not
   the only one who has noticed problems.

Arguing that "it is fast enough" is not a correct answer.

jmf
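
For anyone who wants to check the "77%" figure in the quoted reply, here is
a minimal sketch that compares the two timeit.repeat() series. The numbers
are copied from the quoted post; the helper name `summarize` is made up for
illustration, and taking the minimum of each series is an assumption about
how the figure was derived, not something the quoted post states.

```python
# Timings copied from the quoted timeit.repeat() runs
# (seconds for 10000 sorts each).
py32 = [0.5544277025009592, 0.5370117249557325, 0.5551836677925053]
py33 = [0.1421166788364303, 0.12389078130001963, 0.13184190553613462]


def summarize(old, new):
    """Compare the best (minimum) run of each series.

    Hypothetical helper: not part of timeit or of either post.
    """
    best_old, best_new = min(old), min(new)
    time_saved = 1.0 - best_new / best_old  # fraction of run time removed
    speedup = best_old / best_new           # how many times as fast
    return time_saved, speedup


time_saved, speedup = summarize(py32, py33)
print("time saved: {:.0%}, speedup: {:.1f}x".format(time_saved, speedup))
```

On these numbers, "about 77%" describes the reduction in run time; expressed
as a ratio, the 3.3 run is a bit more than four times as fast. Neither
figure says anything about results on a different machine or with a
different sort routine.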