Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #28134

Re: Flexible string representation, unicode, typography, ...

Path csiph.com!usenet.pasdenom.info!aioe.org!news.stack.nl!newsfeed.xs4all.nl!newsfeed6.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
Return-Path <ian.g.kelly@gmail.com>
X-Original-To python-list@python.org
Delivered-To python-list@mail.python.org
X-Spam-Status OK 0.002
X-Spam-Evidence '*H*': 1.00; '*S*': 0.00; 'defaults': 0.05; 'extent': 0.07; 'matches': 0.07; 'performs': 0.07; 'python': 0.09; 'byte,': 0.09; 'comfortably': 0.09; 'encode': 0.09; 'received:mail- lpp01m010-f46.google.com': 0.09; 'subject:string': 0.09; 'unicode,': 0.09; 'worse': 0.09; 'programs.': 0.11; 'aug': 0.13; 'encoding': 0.15; 'properly': 0.15; '3.2.': 0.16; 'demonstrably': 0.16; 'describing': 0.16; 'merely': 0.16; 'subject:unicode': 0.16; 'truncating': 0.16; 'useless.': 0.16; 'utterly': 0.16; 'string': 0.17; 'wrote:': 0.17; 'char': 0.17; 'thu,': 0.17; 'unicode': 0.17; '(in': 0.18; 'flexibility': 0.23; 'patch': 0.24; 'random': 0.24; 'non': 0.24; 'header:In-Reply-To:1': 0.25; 'am,': 0.27; 'instead.': 0.27; 'strongly': 0.27; 'message-id:@mail.gmail.com': 0.27; 'actual': 0.28; 'probability': 0.29; 'served': 0.29; 'case,': 0.29; 'character': 0.29; 'points': 0.29; "i'm": 0.29; 'received:209.85.215.46': 0.30; 'code': 0.31; 'could': 0.32; 'doubt': 0.33; 'to:addr:python-list': 0.33; 'received:google.com': 0.34; 'text': 0.34; 'moved': 0.35; 'doing': 0.35; 'received:209.85': 0.35; 'next': 0.35; 'tool': 0.36; 'but': 0.36; 'wanted': 0.36; 'characters': 0.36; 'option': 0.37; 'uses': 0.37; 'why': 0.37; 'rather': 0.37; 'received:209': 0.37; 'far': 0.37; 'subject:: ': 0.38; 'mean': 0.38; 'sure': 0.38; 'to:addr:python.org': 0.39; 'header:Received:5': 0.40; 'most': 0.61; "you've": 0.61; 'subject:, ': 0.61; 'first': 0.61; 'dedicated': 0.61; '30,': 0.62; 'subject:...': 0.63; 'show': 0.63; 'become': 0.65; 'soon': 0.70; 'wish': 0.70; 'funny': 0.78; 'introduce': 0.80; 'benchmark': 0.84; 'irrelevant': 0.84; 'subject:, ...': 0.84; 'to:name:python': 0.84; 'better!': 0.91; 'seriously,': 0.91
DKIM-Signature v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type; bh=bWkPpA2K6itN2zuGhenShx5gKWbkVsxJSWt5PihDx7Y=; b=tBPjb6H1WjLASJ/9Zz9Po0zr84b0+iNLFdP+EOHQ3aRqMwgZBwfw+LXC0x0ncupPlE YvzDMkSN/sgovNgfwv7n1ezfaTC2gkzD1bs6Lm1k0J8B2gdmdo0HHwlYiGNOutL+wPwI bGtGmzayaSi1dSClqIFsxL0i6mSitm3sRBi74n2xu1jnDgAEHoMQnoCD5YWw3/lkHEx2 3v+tBit7Dq8M6L3/mI3xARlPyozFM9FiV3rC7iJajXYKKaIsudiZdjyhLUGX7S/QO75O 6+F3HtEGgsT7jeljE8zsUnCKA/DlpVbCXjugtzi2U9boiPeWw1VVF9Cs78chin29cn4w KT/A==
MIME-Version 1.0
In-Reply-To <b7ec5b30-5284-4bec-ba3e-999d1b5ad92d@googlegroups.com>
References <mailman.3784.1345854291.4697.python-list@python.org> <1cb3f062-eb45-4b0c-977b-76afb099923c@googlegroups.com> <k1a40u$r47$2@ger.gmane.org> <mailman.3793.1345888006.4697.python-list@python.org> <f6266544-d67c-4589-a3ed-c14428ead237@googlegroups.com> <mailman.3816.1345933655.4697.python-list@python.org> <mailman.3831.1345964382.4697.python-list@python.org> <503a0d51$0$6574$c3e8da3$5496439d@news.astraweb.com> <mailman.3841.1345995646.4697.python-list@python.org> <503a8361$0$6574$c3e8da3$5496439d@news.astraweb.com> <mailman.3853.1346014938.4697.python-list@python.org> <2e92da71-fbd2-467f-9088-1c79fa7bcf69@googlegroups.com> <UIOdnTQtcNTRlKHNnZ2dnUVZ_vednZ2d@westnet.com.au> <a15ab72d-996e-4aff-a70b-440b7baa6d68@j9g2000pbg.googlegroups.com> <mailman.3920.1346213765.4697.python-list@python.org> <62566024-df1d-4948-a27a-45c7820ddc6c@googlegroups.com> <mailman.3929.1346241717.4697.python-list@python.org> <mailman.3938.1346254994.4697.python-list@python.org> <503f0e45$0$9416$c3e8da3$76491128@news.astraweb.com> <b7ec5b30-5284-4bec-ba3e-999d1b5ad92d@googlegroups.com>
From Ian Kelly <ian.g.kelly@gmail.com>
Date Thu, 30 Aug 2012 10:27:04 -0600
Subject Re: Flexible string representation, unicode, typography, ...
To Python <python-list@python.org>
Content-Type text/plain; charset=ISO-8859-1
X-BeenThere python-list@python.org
X-Mailman-Version 2.1.12
Precedence list
List-Id General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe <http://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive <http://mail.python.org/pipermail/python-list>
List-Post <mailto:python-list@python.org>
List-Help <mailto:python-list-request@python.org?subject=help>
List-Subscribe <http://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Newsgroups comp.lang.python
Message-ID <mailman.3976.1346344057.4697.python-list@python.org> (permalink)
Lines 47
NNTP-Posting-Host 2001:888:2000:d::a6
X-Trace 1346344057 news.xs4all.nl 6938 [2001:888:2000:d::a6]:50224
X-Complaints-To abuse@xs4all.nl
Xref csiph.com comp.lang.python:28134

Show key headers only | View raw


On Thu, Aug 30, 2012 at 2:51 AM,  <wxjmfauth@gmail.com> wrote:
> But as soon as you introduce artificially a "latin-1"
> bottleneck, all this machinery just become useless.

How is this a bottleneck?  If you removed the Latin-1 encoding
altogether and limited the flexible representation to just UCS-2 /
UCS-4, I doubt very much that you would see any significant speed
gains. The flexibility is the part that makes string creation slower,
not the Latin-1 option in particular.

> This flexible representation is working absurdly.
> It optimizes the characters you are not using (in one
> sense), it defaults to a non optimized form for the
> characters you wish to use.

I'm sure that if you wanted to you could patch Python to use Latin-9
instead.  Just be prepared for it to be slower than UCS-2, since it
would mean having to encode the code points rather than merely
truncating them.

> Pick up a random text and see the probability this
> text match the most optimized case 1 char / 1 byte,
> practically never.

Pick up a random text and see that this text matches the next most
optimized case, 1 char / 2 bytes: practically always.

> If a user will use exclusively latin-1, she/he is  better
> served by using a dedicated tool for "latin-1"

Speaker as a user who almost exclusively uses Latin-1, I strongly
disagree.  What you're describing is Python 2.x.  The user is always
almost better served by not having to worry about the full extent of
the character set their program might use.  That's why we moved to
Unicode strings in Python 3 in the first place.

> If a user will comfortably work with Unicode, she/he is
> better served by using one of this tools which is using
> properly one of the available Unicode schemes.
>
> In a funny way, this is what Python was doing and it
> performs better!

Seriously, please show us just one *real world* benchmark in which
Python 3.3 performs demonstrably worse than Python 3.2.  All you've
shown so far is this one microbenchmark of string creation that is
utterly irrelevant to actual programs.

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-23 05:47 -0700
  Re: Flexible string representation, unicode, typography, ... Neil Hodgson <nhodgson@iinet.net.au> - 2012-08-23 23:57 +1000
    Re: Flexible string representation, unicode, typography, ... MRAB <python@mrabarnett.plus.com> - 2012-08-23 16:11 +0100
    Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-23 09:19 -0600
    Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-23 11:33 -0700
      Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-23 13:22 -0600
        Re: Flexible string representation, unicode, typography, ... rusi <rustompmody@gmail.com> - 2012-08-24 09:06 -0700
          Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-24 17:47 +0100
          Re: Flexible string representation, unicode, typography, ... Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-08-24 14:34 -0400
      Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-23 20:34 +0100
  Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-23 15:18 +0100
  Re: Flexible string representation, unicode, typography, ... Ramchandra Apte <maniandram01@gmail.com> - 2012-08-24 07:38 -0700
    Re: Flexible string representation, unicode, typography, ... Antoine Pitrou <solipsis@pitrou.net> - 2012-08-25 00:24 +0000
      Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-25 00:27 -0700
        Re: Flexible string representation, unicode, typography, ... Ben Finney <ben+python@benfinney.id.au> - 2012-08-25 17:54 +1000
      Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-25 00:27 -0700
        Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-25 09:58 +0100
        Re: Flexible string representation, unicode, typography, ... Frank Millman <frank@chagford.com> - 2012-08-25 11:46 +0200
          Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-25 08:47 -0700
          Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-25 08:47 -0700
            Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-25 16:26 -0600
              Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-25 23:59 -0700
                Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-26 09:50 -0600
              Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-25 23:59 -0700
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-26 11:49 +0000
                Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-26 09:40 -0600
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-26 20:13 +0000
                Re: Flexible string representation, unicode, typography, ... Dan Sommers <dan@tombstonezero.net> - 2012-08-26 13:45 -0700
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-27 12:16 -0700
                Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-27 14:14 -0600
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-27 13:37 -0700
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-29 04:38 -0700
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-29 04:38 -0700
                Re: Flexible string representation, unicode, typography, ... Neil Hodgson <nhodgson@iinet.net.au> - 2012-08-28 09:54 +1000
                Re: Flexible string representation, unicode, typography, ... Chris Angelico <rosuav@gmail.com> - 2012-08-29 13:59 +1000
                Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-28 22:15 -0600
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-29 08:05 +0000
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-29 04:40 -0700
                Re: Flexible string representation, unicode, typography, ... Dave Angel <d@davea.name> - 2012-08-29 08:01 -0400
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-29 08:43 -0700
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-30 06:55 +0000
                Re: Flexible string representation, unicode, typography, ... Chris Angelico <rosuav@gmail.com> - 2012-08-30 18:59 +1000
                Re: Flexible string representation, unicode, typography, ... Roy Smith <roy@panix.com> - 2012-08-30 07:02 -0400
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-30 16:00 +0000
                Re: Flexible string representation, unicode, typography, ... Terry Reedy <tjreedy@udel.edu> - 2012-08-30 16:44 -0400
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-31 12:32 +0000
                Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-31 09:13 -0600
                Re: Flexible string representation, unicode, typography, ... Roy Smith <roy@panix.com> - 2012-08-31 08:43 -0400
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-31 14:54 +0000
                Re: Flexible string representation, unicode, typography, ... Antoine Pitrou <solipsis@pitrou.net> - 2012-08-30 15:01 +0000
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-09-02 00:36 -0700
                Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-02 09:58 +0100
                Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-09-02 03:06 -0600
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-09-02 11:58 -0700
                Re: Flexible string representation, unicode, typography, ... Michael Torrie <torriem@gmail.com> - 2012-09-02 13:45 -0600
                Re: Flexible string representation, unicode, typography, ... Dave Angel <d@davea.name> - 2012-09-02 16:07 -0400
                Re: Flexible string representation, unicode, typography, ... Terry Reedy <tjreedy@udel.edu> - 2012-09-02 16:38 -0400
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-03 01:42 +0000
                Re: Flexible string representation, unicode, typography, ... Serhiy Storchaka <storchaka@gmail.com> - 2012-09-03 18:26 +0300
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-04 00:53 +0000
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-09-02 11:58 -0700
                Re: Flexible string representation, unicode, typography, ... Peter Otten <__peter__@web.de> - 2012-09-02 11:52 +0200
                Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-02 11:36 +0100
                Re: Flexible string representation, unicode, typography, ... Serhiy Storchaka <storchaka@gmail.com> - 2012-09-02 15:00 +0300
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-09-02 22:39 -0700
                Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-03 07:11 +0100
                Re: Flexible string representation, unicode, typography, ... Peter Otten <__peter__@web.de> - 2012-09-03 08:15 +0200
                Re: Flexible string representation, unicode, typography, ... Terry Reedy <tjreedy@udel.edu> - 2012-09-03 04:38 -0400
                Re: Flexible string representation, unicode, typography, ... Serhiy Storchaka <storchaka@gmail.com> - 2012-09-03 18:56 +0300
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-09-02 22:39 -0700
                Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-02 13:23 +0100
                Re: Flexible string representation, unicode, typography, ... Roy Smith <roy@panix.com> - 2012-09-02 08:35 -0400
                Re: Flexible string representation, unicode, typography, ... Ramchandra Apte <maniandram01@gmail.com> - 2012-09-02 06:48 -0700
                Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-02 15:46 +0100
                Re: Flexible string representation, unicode, typography, ... Ramchandra Apte <maniandram01@gmail.com> - 2012-09-02 06:48 -0700
                Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-09-03 12:33 -0600
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-09-02 00:36 -0700
                Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-30 10:27 -0600
                Re: Flexible string representation, unicode, typography, ... Serhiy Storchaka <storchaka@gmail.com> - 2012-09-02 23:38 +0300
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-03 01:54 +0000
                Re: Flexible string representation, unicode, typography, ... Terry Reedy <tjreedy@udel.edu> - 2012-09-02 22:33 -0400
                Re: Flexible string representation, unicode, typography, ... Roy Smith <roy@panix.com> - 2012-09-03 11:24 -0400
                Re: Flexible string representation, unicode, typography, ... Serhiy Storchaka <storchaka@gmail.com> - 2012-09-03 18:41 +0300
                Re: Flexible string representation, unicode, typography, ... Serhiy Storchaka <storchaka@gmail.com> - 2012-09-03 00:45 +0300
                Re: Flexible string representation, unicode, typography, ... Chris Angelico <rosuav@gmail.com> - 2012-08-30 01:54 +1000
                Re: Flexible string representation, unicode, typography, ... Chris Angelico <rosuav@gmail.com> - 2012-08-29 22:34 +1000
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-29 04:40 -0700
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-27 12:16 -0700
                Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-26 15:42 -0600
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-26 23:31 +0000
                Re: Flexible string representation, unicode, typography, ... Paul Rubin <no.email@nospam.invalid> - 2012-08-26 17:47 -0700
        Re: Flexible string representation, unicode, typography, ... Chris Angelico <rosuav@gmail.com> - 2012-08-25 21:04 +1000
        Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-25 12:05 +0100
        Re: Flexible string representation, unicode, typography, ... Chris Angelico <rosuav@gmail.com> - 2012-08-25 21:19 +1000
        Re: Flexible string representation, unicode, typography, ... Terry Reedy <tjreedy@udel.edu> - 2012-08-25 07:23 -0400

csiph-web