Re: Flexible string representation, unicode, typography, ...

From Ian Kelly <ian.g.kelly@gmail.com>
Date Thu, 23 Aug 2012 13:22:16 -0600
Subject Re: Flexible string representation, unicode, typography, ...
To Python <python-list@python.org>
Newsgroups comp.lang.python
Message-ID <mailman.3730.1345749768.4697.python-list@python.org>

On Thu, Aug 23, 2012 at 12:33 PM,  <wxjmfauth@gmail.com> wrote:
>> > >>> sys.getsizeof('a' * 80 * 50)
>> > 4025
>> > >>> sys.getsizeof('a' * 80 * 50 + '•')
>> > 8040
>>
>> This example is still benefiting from shrinking the number of bytes
>> in half over using 32 bits per character as was the case with Python 3.2:
>>
>>  >>> sys.getsizeof('a' * 80 * 50)
>> 16032
>>  >>> sys.getsizeof('a' * 80 * 50 + '•')
>> 16036
>>
> Correct, but how many times does it happen?
> Practically never.

What are you talking about?  Surely it happens the same number of
times that your example happens, since it's the same example.  By
dismissing this example as being too infrequent to be of any
importance, you dismiss the validity of your own example as well.
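
For anyone who wants to reproduce the comparison quoted above, here is a
minimal sketch. It assumes CPython 3.3 or later (the flexible string
representation); the variable names are mine, and the exact byte counts
depend on the build, so the script only prints them and compares them.

```python
import sys

# 4000 characters that all fit in one byte each.
ascii_text = 'a' * 80 * 50

# Appending one BULLET (U+2022) pushes the whole string to 2 bytes per character.
bmp_text = ascii_text + '\u2022'

# Appending one code point above U+FFFF pushes it to 4 bytes per character.
astral_text = ascii_text + '\U0001F600'

sizes = {
    'ascii': sys.getsizeof(ascii_text),
    'bmp': sys.getsizeof(bmp_text),
    'astral': sys.getsizeof(astral_text),
}

for label, size in sizes.items():
    print(label, size)
```

Even the widest of the three is no worse than what a fixed 4-byte
representation would use for every string.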

> In this unicode stuff, I'm fascinated by the obsession
> to solve a problem which is, due to the nature of
> Unicode, unsolvable.
>
> For every optimization algorithm, for every code
> point range you can optimize, it is always possible
> to find a case breaking that optimization.

So what?  Similarly, for any generalized data compression algorithm,
it is possible to engineer inputs for which the "compressed" output is
as large as or larger than the original input (this is easy to prove).
 Does this mean that compression algorithms are useless?  I hardly
think so, as evidenced by the widespread popularity of tools like gzip
and WinZip.
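
The compression analogy is easy to check for yourself. The sketch below
uses the standard zlib module as a stand-in for gzip, and seeded
pseudo-random bytes as a stand-in for deliberately engineered input; both
choices are my assumptions for illustration, not anything from the thread.

```python
import random
import zlib

# Highly repetitive input, same shape as the 'a' * 80 * 50 string in the thread.
repetitive = b'a' * 80 * 50

# Pseudo-random bytes from a fixed seed: there is no structure for the
# compressor to exploit, so the framing overhead makes the output larger.
rng = random.Random(2012)
noise = bytes(rng.getrandbits(8) for _ in range(len(repetitive)))

packed_repetitive = zlib.compress(repetitive)
packed_noise = zlib.compress(noise)

print(len(repetitive), '->', len(packed_repetitive))
print(len(noise), '->', len(packed_noise))
```

One input shrinks dramatically and the other does not shrink at all, yet
nobody concludes from the second case that the first is worthless.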

You seem to be saying that because we cannot pack all Unicode strings
into 1-byte or 2-byte per character representations, we should just
give up and force everybody to use maximum-width representations for
all strings.  That is absurd.

> Sure, it is possible to optimize the unicode usage
> by not using French characters, punctuation, mathematical
> symbols, currency symbols, CJK characters...
> (select undesired characters here: http://www.unicode.org/charts/).
>
> In that case, why using unicode?
> (A problematic not specific to Python)

Obviously, it is because I want to have the *ability* to represent all
those characters in my strings, even if I am not necessarily going to
take advantage of that ability in every single string that I produce.
Not all of the strings I use are going to fit into the 1-byte or
2-byte per character representation.  Fine, whatever -- that's part of
the cost of internationalization.  However, *most* of the strings that
I work with (this entire email message, for instance) -- and, I think,
most of the strings that any developer works with (identifiers in the
standard library, for instance) -- will fit into at least the 2-byte
per character representation.  Why shackle every string everywhere to
4 bytes per character when for a majority of them we can do much
better than that?
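
As a rough check of the claim about identifiers, the sketch below
classifies strings by their highest code point, which as far as I
understand it is the rule CPython 3.3+ uses to pick 1, 2 or 4 bytes per
character. The helper name bytes_per_char is hypothetical, and surveying
dir(builtins) is just one convenient sample of identifiers.

```python
import builtins
from collections import Counter


def bytes_per_char(text):
    """Estimate the per-character width needed to store text."""
    highest = max(map(ord, text), default=0)
    if highest < 0x100:       # fits in Latin-1
        return 1
    if highest < 0x10000:     # fits in the Basic Multilingual Plane
        return 2
    return 4                  # needs a full 32-bit code unit


# Tally the widths needed by the names exposed by the builtins module.
identifier_widths = Counter(bytes_per_char(name) for name in dir(builtins))
print(identifier_widths)
```

Pointing the same tally at your own source files or message corpus would
show how often the 4-byte case actually comes up for you.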
