Path: csiph.com!usenet.pasdenom.info!aioe.org!news.stack.nl!newsfeed.xs4all.nl!newsfeed4.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
To: python-list@python.org
From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Subject: Re: trying to strip out non ascii.. or rather convert non ascii
Date: Tue, 29 Oct 2013 19:54:08 +0000
References: <mailman.1604.1382818293.18130.python-list@python.org> <526c412a$0$29972$c3e8da3$5496439d@news.astraweb.com> <mailman.1628.1382838024.18130.python-list@python.org> <pan.2013.10.27.03.21.57.202000@nowhere.com> <d205042e-29cd-49df-9f6e-600e123f8483@googlegroups.com> <526f4612$0$6512$c3e8da3$5496439d@news.astraweb.com> <63fa9fcd-6445-41ee-8873-e1ee046e2031@googlegroups.com> <mailman.1761.1383061878.18130.python-list@python.org> <9319e982-4628-4f32-b5cc-60eadca121fc@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Thunderbird/24.0.1
In-Reply-To: <9319e982-4628-4f32-b5cc-60eadca121fc@googlegroups.com>
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.1773.1383076460.18130.python-list@python.org>
Lines: 74
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:57964

On 29/10/2013 19:16, wxjmfauth@gmail.com wrote:
> On Tuesday, 29 October 2013 16:52:49 UTC+1, Tim Chase wrote:
>> On 2013-10-29 08:38, wxjmfauth@gmail.com wrote:
>>
>>> >>> import timeit
>>> >>> timeit.timeit("a = 'hundred'; 'x' in a")
>>> 0.12621293837694095
>>> >>> timeit.timeit("a = 'hundreĳ'; 'x' in a")
>>> 0.26411553466961735
>>
>> That reads to me as "If things were purely UCS4 internally, Python
>> would normally take 0.264... seconds to execute this test, but core
>> devs managed to optimize a particular (lower 127 ASCII characters
>> only) case so that it runs in less than half the time."
>>
>> Is this not what you intended to demonstrate?  'cuz that sounds
>> like a pretty awesome optimization to me.
>>
>> -tkc
>
> --------
>
> That's very naive. In fact, what happens is just the opposite.
> The "best case" with the FSR is worse than the "worst case"
> without the FSR.
>
> And that is without counting the time this poor Python spends
> switching from one internal representation to another, or the
> fact that the representation has to be tested every time.
> The more Unicode manipulations one applies, the more time
> it demands.
>
> Two tasks that come to mind: re and normalization.
> It's very interesting to observe what happens when one
> normalizes Latin text and polytonic Greek text, both with
> plenty of diacritics.
>
> ----
>
> Something different, based on my previous example.
>
> What is a European user supposed to think when she/he
> sees that she/he can be "penalized" by such an amount,
> simply for using non-ASCII characters, in a product
> which is supposed to be "Unicode compliant"?
>
> jmf
>

Please provide hard evidence to support your claims or stop posting this 
ridiculous nonsense.  Give us real world problems that can be reported 
on the bug tracker, investigated and resolved.
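
As a starting point for that kind of evidence, here is a minimal sketch
of a more careful version of the quoted measurement. It assumes the
operation of interest is the substring test alone, so the assignment is
moved into timeit's setup and is not timed. The helper name
time_membership and its default arguments are made up for illustration;
they do not come from the thread. No timings are shown because they
depend on the machine and the Python build.

```python
import timeit


def time_membership(haystack, needle="x", number=1000000, repeat=5):
    """Hypothetical helper: time only `needle in haystack`.

    The string is bound to `a` in the setup code, so creating it is
    excluded from the measurement.
    """
    setup = "a = {!r}".format(haystack)
    stmt = "{!r} in a".format(needle)
    # Take the minimum of several runs; higher values usually reflect
    # interference from other processes rather than the code under test.
    return min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))


if __name__ == "__main__":
    # The two strings from the quoted example (the second ends in U+0133).
    for text in ("hundred", "hundre\u0133"):
        print(ascii(text), time_membership(text))
```

Running the same script under each interpreter version being compared,
and reporting those versions alongside the numbers, is what would turn
this into something that could go on the bug tracker.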

-- 
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence