Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #28140

Re: Flexible string representation, unicode, typography, ...

From Terry Reedy <tjreedy@udel.edu>
Subject Re: Flexible string representation, unicode, typography, ...
Date 2012-08-30 16:44 -0400
References (13 earlier) <mailman.3929.1346241717.4697.python-list@python.org> <mailman.3938.1346254994.4697.python-list@python.org> <503f0e45$0$9416$c3e8da3$76491128@news.astraweb.com> <roy-947BF0.07022430082012@news.panix.com> <503f8e33$0$30001$c3e8da3$5496439d@news.astraweb.com>
Newsgroups comp.lang.python
Message-ID <mailman.3985.1346359524.4697.python-list@python.org> (permalink)

Show all headers | View raw


On 8/30/2012 12:00 PM, Steven D'Aprano wrote:
> On Thu, 30 Aug 2012 07:02:24 -0400, Roy Smith wrote:
>
>> In article <503f0e45$0$9416$c3e8da3$76491128@news.astraweb.com>,
>>   Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
>>
>>> The only thing which is innovative here is that instead of the Python
>>> compiler declaring that "all strings will be stored in UCS-2", the
>>> compiler chooses an implementation for each string as needed. So some
>>> strings will be stored internally as UCS-4, some as UCS-2, and some as
>>> ASCII (which is a standard, but not the Unicode consortium's standard).
>>
>> Is the implementation smart enough to know that x == y is always False
>> if x and y are using different internal representations?

Yes, after checking lengths, and in same circumstances, x != y is True. From
http://hg.python.org/cpython/file/ab6ab44921b2/Objects/unicodeobject.c

PyObject *
PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)
{
     int result;

     if (PyUnicode_Check(left) && PyUnicode_Check(right)) {
         PyObject *v;
         if (PyUnicode_READY(left) == -1 ||
             PyUnicode_READY(right) == -1)
             return NULL;
         if (PyUnicode_GET_LENGTH(left) != PyUnicode_GET_LENGTH(right) ||
             PyUnicode_KIND(left) != PyUnicode_KIND(right)) {
             if (op == Py_EQ) {
                 Py_INCREF(Py_False);
                 return Py_False;
             }
             if (op == Py_NE) {
                 Py_INCREF(Py_True);
                 return Py_True;
             }
         }
...
KIND is 1,2,4 bytes/char

'a in s' is also False if a chars are wider than s chars.

If s is all ascii, s.encode('ascii') or s.encode('utf-8') is a fast, 
constant time operation, as I showed earlier in this discussion. This is 
one thing that is much faster in 3.3.

Such things can be tested by timing with different lengths of strings, 
where the initial string creation is done in setup code rather than in 
the repeated operation code.

> But x and y are not necessarily always False just because they have
> different representations. There may be circumstances where two strings
> have different internal representations even though their content is the
> same, so it's an unsafe optimization to automatically treat them as
> unequal.

I am sure that str objects are always in canonical form once visible to 
Python code. Note that unready (non-canonical) objects are rejected by 
the rich comparison function.

> My expectation is that the initial implementation of PEP 393 will be
> relatively unoptimized,

The initial implementation was a year ago. At least three people have 
expended considerable effort improving it since, so that the slowdown 
mentioned in the PEP has mostly disappeared. The things that are still 
slower are somewhat balanced by things that are faster.

-- 
Terry Jan Reedy

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-23 05:47 -0700
  Re: Flexible string representation, unicode, typography, ... Neil Hodgson <nhodgson@iinet.net.au> - 2012-08-23 23:57 +1000
    Re: Flexible string representation, unicode, typography, ... MRAB <python@mrabarnett.plus.com> - 2012-08-23 16:11 +0100
    Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-23 09:19 -0600
    Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-23 11:33 -0700
      Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-23 13:22 -0600
        Re: Flexible string representation, unicode, typography, ... rusi <rustompmody@gmail.com> - 2012-08-24 09:06 -0700
          Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-24 17:47 +0100
          Re: Flexible string representation, unicode, typography, ... Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-08-24 14:34 -0400
      Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-23 20:34 +0100
  Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-23 15:18 +0100
  Re: Flexible string representation, unicode, typography, ... Ramchandra Apte <maniandram01@gmail.com> - 2012-08-24 07:38 -0700
    Re: Flexible string representation, unicode, typography, ... Antoine Pitrou <solipsis@pitrou.net> - 2012-08-25 00:24 +0000
      Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-25 00:27 -0700
        Re: Flexible string representation, unicode, typography, ... Ben Finney <ben+python@benfinney.id.au> - 2012-08-25 17:54 +1000
      Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-25 00:27 -0700
        Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-25 09:58 +0100
        Re: Flexible string representation, unicode, typography, ... Frank Millman <frank@chagford.com> - 2012-08-25 11:46 +0200
          Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-25 08:47 -0700
          Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-25 08:47 -0700
            Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-25 16:26 -0600
              Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-25 23:59 -0700
                Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-26 09:50 -0600
              Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-25 23:59 -0700
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-26 11:49 +0000
                Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-26 09:40 -0600
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-26 20:13 +0000
                Re: Flexible string representation, unicode, typography, ... Dan Sommers <dan@tombstonezero.net> - 2012-08-26 13:45 -0700
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-27 12:16 -0700
                Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-27 14:14 -0600
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-27 13:37 -0700
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-29 04:38 -0700
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-29 04:38 -0700
                Re: Flexible string representation, unicode, typography, ... Neil Hodgson <nhodgson@iinet.net.au> - 2012-08-28 09:54 +1000
                Re: Flexible string representation, unicode, typography, ... Chris Angelico <rosuav@gmail.com> - 2012-08-29 13:59 +1000
                Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-28 22:15 -0600
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-29 08:05 +0000
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-29 04:40 -0700
                Re: Flexible string representation, unicode, typography, ... Dave Angel <d@davea.name> - 2012-08-29 08:01 -0400
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-29 08:43 -0700
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-30 06:55 +0000
                Re: Flexible string representation, unicode, typography, ... Chris Angelico <rosuav@gmail.com> - 2012-08-30 18:59 +1000
                Re: Flexible string representation, unicode, typography, ... Roy Smith <roy@panix.com> - 2012-08-30 07:02 -0400
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-30 16:00 +0000
                Re: Flexible string representation, unicode, typography, ... Terry Reedy <tjreedy@udel.edu> - 2012-08-30 16:44 -0400
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-31 12:32 +0000
                Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-31 09:13 -0600
                Re: Flexible string representation, unicode, typography, ... Roy Smith <roy@panix.com> - 2012-08-31 08:43 -0400
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-31 14:54 +0000
                Re: Flexible string representation, unicode, typography, ... Antoine Pitrou <solipsis@pitrou.net> - 2012-08-30 15:01 +0000
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-09-02 00:36 -0700
                Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-02 09:58 +0100
                Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-09-02 03:06 -0600
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-09-02 11:58 -0700
                Re: Flexible string representation, unicode, typography, ... Michael Torrie <torriem@gmail.com> - 2012-09-02 13:45 -0600
                Re: Flexible string representation, unicode, typography, ... Dave Angel <d@davea.name> - 2012-09-02 16:07 -0400
                Re: Flexible string representation, unicode, typography, ... Terry Reedy <tjreedy@udel.edu> - 2012-09-02 16:38 -0400
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-03 01:42 +0000
                Re: Flexible string representation, unicode, typography, ... Serhiy Storchaka <storchaka@gmail.com> - 2012-09-03 18:26 +0300
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-04 00:53 +0000
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-09-02 11:58 -0700
                Re: Flexible string representation, unicode, typography, ... Peter Otten <__peter__@web.de> - 2012-09-02 11:52 +0200
                Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-02 11:36 +0100
                Re: Flexible string representation, unicode, typography, ... Serhiy Storchaka <storchaka@gmail.com> - 2012-09-02 15:00 +0300
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-09-02 22:39 -0700
                Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-03 07:11 +0100
                Re: Flexible string representation, unicode, typography, ... Peter Otten <__peter__@web.de> - 2012-09-03 08:15 +0200
                Re: Flexible string representation, unicode, typography, ... Terry Reedy <tjreedy@udel.edu> - 2012-09-03 04:38 -0400
                Re: Flexible string representation, unicode, typography, ... Serhiy Storchaka <storchaka@gmail.com> - 2012-09-03 18:56 +0300
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-09-02 22:39 -0700
                Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-02 13:23 +0100
                Re: Flexible string representation, unicode, typography, ... Roy Smith <roy@panix.com> - 2012-09-02 08:35 -0400
                Re: Flexible string representation, unicode, typography, ... Ramchandra Apte <maniandram01@gmail.com> - 2012-09-02 06:48 -0700
                Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-02 15:46 +0100
                Re: Flexible string representation, unicode, typography, ... Ramchandra Apte <maniandram01@gmail.com> - 2012-09-02 06:48 -0700
                Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-09-03 12:33 -0600
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-09-02 00:36 -0700
                Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-30 10:27 -0600
                Re: Flexible string representation, unicode, typography, ... Serhiy Storchaka <storchaka@gmail.com> - 2012-09-02 23:38 +0300
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-03 01:54 +0000
                Re: Flexible string representation, unicode, typography, ... Terry Reedy <tjreedy@udel.edu> - 2012-09-02 22:33 -0400
                Re: Flexible string representation, unicode, typography, ... Roy Smith <roy@panix.com> - 2012-09-03 11:24 -0400
                Re: Flexible string representation, unicode, typography, ... Serhiy Storchaka <storchaka@gmail.com> - 2012-09-03 18:41 +0300
                Re: Flexible string representation, unicode, typography, ... Serhiy Storchaka <storchaka@gmail.com> - 2012-09-03 00:45 +0300
                Re: Flexible string representation, unicode, typography, ... Chris Angelico <rosuav@gmail.com> - 2012-08-30 01:54 +1000
                Re: Flexible string representation, unicode, typography, ... Chris Angelico <rosuav@gmail.com> - 2012-08-29 22:34 +1000
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-29 04:40 -0700
                Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-27 12:16 -0700
                Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-26 15:42 -0600
                Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-26 23:31 +0000
                Re: Flexible string representation, unicode, typography, ... Paul Rubin <no.email@nospam.invalid> - 2012-08-26 17:47 -0700
        Re: Flexible string representation, unicode, typography, ... Chris Angelico <rosuav@gmail.com> - 2012-08-25 21:04 +1000
        Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-25 12:05 +0100
        Re: Flexible string representation, unicode, typography, ... Chris Angelico <rosuav@gmail.com> - 2012-08-25 21:19 +1000
        Re: Flexible string representation, unicode, typography, ... Terry Reedy <tjreedy@udel.edu> - 2012-08-25 07:23 -0400

csiph-web