Re: trying to strip out non ascii.. or rather convert non ascii

Date Wed, 30 Oct 2013 13:17:21 +1100
Subject Re: trying to strip out non ascii.. or rather convert non ascii
From Chris Angelico <rosuav@gmail.com>
To python-list@python.org
Newsgroups comp.lang.python
Message-ID <mailman.1787.1383099445.18130.python-list@python.org>

On Wed, Oct 30, 2013 at 2:56 AM, Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:
> You've stated above that logically unicode is badly handled by the fsr.  You
> then provide a trivial timing example.  WTF???

His idea of bad handling is "oh how terrible, ASCII and BMP have
optimizations". He hates the idea that it could be faster in some
areas rather than having uniform timings across the board. But the FSR
(flexible string representation) actually has some distinct benefits
even in the areas he's citing - watch this:

>>> import timeit
>>> timeit.timeit("a = 'hundred'; 'x' in a")
0.3625614428649451
>>> timeit.timeit("a = 'hundreij'; 'x' in a")
0.6753936603674484
>>> timeit.timeit("a = 'hundred'; 'ģ' in a")
0.25663261671525106
>>> timeit.timeit("a = 'hundreij'; 'ģ' in a")
0.3582399439035271

The first two are his examples, run on my computer so that all four
figures can be compared. Note how testing for the presence of a
non-Latin1 character in an 8-bit string is very fast. The same goes
for testing for a non-BMP character in a 16-bit string. The difference
gets even larger if the string is longer:

>>> timeit.timeit("a = 'hundred'*1000; 'x' in a")
10.083378194714726
>>> timeit.timeit("a = 'hundreij'*1000; 'x' in a")
18.656413035735
>>> timeit.timeit("a = 'hundreij'*1000; 'ģ' in a")
18.436268855399135
>>> timeit.timeit("a = 'hundred'*1000; 'ģ' in a")
2.8308718007456264
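
For anyone wondering why the last figure is so small, here is a rough
sketch of the reasoning, not the interpreter's actual code. Under the
FSR a string is stored with 1, 2 or 4 bytes per character, depending
on its widest code point, so a character too wide for the string's
storage cannot be in it and the search can give up without scanning.
The names fsr_width and can_contain are made up for illustration, the
test string 'hundre\u0133' is my own choice rather than one from the
timings above, and whether CPython takes exactly this shortcut is an
assumption on my part.

```python
def fsr_width(s):
    """Bytes per character the FSR would pick for s: 1, 2 or 4."""
    top = max(map(ord, s), default=0)  # widest code point in the string
    if top < 0x100:       # everything fits in Latin-1
        return 1
    if top < 0x10000:     # everything fits in the BMP
        return 2
    return 4              # at least one astral character


def can_contain(haystack, needle_char):
    """False when needle_char is too wide to be stored in haystack at all.

    Illustrative only: a real search still has to scan when this
    returns True.
    """
    limit = 1 << (8 * fsr_width(haystack))  # first code point that won't fit
    return ord(needle_char) < limit


for text in ("hundred", "hundre\u0133", "hundred\U0001F600"):
    print(repr(text), fsr_width(text), can_contain(text, "\u0123"))
```

Only the all-ASCII string is stored 8-bit, so that is the only one
where '\u0123' (the 'ģ' from the timings) can be ruled out without
looking at a single character.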

Wow! The FSR speeds up searches immensely! It's obviously the best
thing since sliced bread!

ChrisA
