
Flexible string representation, unicode, typography, ...

Started by	wxjmfauth@gmail.com
First post	2012-08-23 05:47 -0700
Last post	2012-08-25 07:23 -0400
Articles	20 on this page of 95 — 21 participants




#28293

From	wxjmfauth@gmail.com
Date	2012-09-02 11:58 -0700
Message-ID	<mailman.102.1346612296.27098.python-list@python.org>
In reply to	#28251

On Sunday, 2 September 2012 11:07:35 UTC+2, Ian wrote:
> On Sun, Sep 2, 2012 at 1:36 AM,  <wxjmfauth@gmail.com> wrote:
> > I still remember my thoughts when I read the PEP 393
> > discussion: "this is not logical", "they do not understand
> > typography", "atomic character ???", ...
>
> That would indicate one of two possibilities.  Either:
>
> 1) Everybody in the PEP 393 discussion except for you is clueless
> about how to implement a Unicode type; or
>
> 2) You are clueless about how to implement a Unicode type.
>
> Taking into account Occam's razor, and also that you seem to be unable
> or unwilling to offer a solid rationale for those thoughts, I have to
> say that I'm currently leaning toward the second possibility.
>
> > Real-world examples.
> >
> > >>> import libfrancais
> > >>> li = ['noël', 'noir', 'nœud', 'noduleux', \
> > ...     'noétique', 'noèse', 'noirâtre']
> > >>> r = libfrancais.sortfr(li)
> > >>> r
> > ['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir',
> > 'noirâtre']
>
> libfrancais does not appear to be publicly available.  It's not listed
> in PyPI, and googling for "python libfrancais" turns up nothing
> relevant.
>
> Rewriting the example to use locale.strcoll instead:
>
> >>> li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']
> >>> import locale
> >>> locale.setlocale(locale.LC_ALL, 'French_France')
> 'French_France.1252'
> >>> import functools
> >>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
> ['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir', 'noirâtre']
>
> # Python 3.2
> >>> import timeit
> >>> timeit.repeat("sorted(li, key=functools.cmp_to_key(locale.strcoll))", "import functools; import locale; li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']", number=10000)
> [0.5544277025009592, 0.5370117249557325, 0.5551836677925053]
>
> # Python 3.3
> >>> import timeit
> >>> timeit.repeat("sorted(li, key=functools.cmp_to_key(locale.strcoll))", "import functools; import locale; li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']", number=10000)
> [0.1421166788364303, 0.12389078130001963, 0.13184190553613462]
>
> As you can see, Python 3.3 is about 77% faster than Python 3.2 on this
> example.  If this was intended to show that the Python 3.3 Unicode
> representation is a regression over the Python 3.2 implementation,
> then it's a complete failure as an example.


- Unfortunately, I got the opposite, and even much worse, results on my
Windows box.
- libfrancais is one of my own modules, and it does a little bit more than
the standard sorting tools.

My rationale is very simple.

1) I have never heard of anything better than sticking with one
of the Unicode coding schemes (general theory).
2) I am not at all convinced by the "new" Py 3.3 algorithm. I'm not the
only one who has noticed problems. Arguing that "it is fast enough" is not
a correct answer.

jmf


#28257

From	Peter Otten <__peter__@web.de>
Date	2012-09-02 11:52 +0200
Message-ID	<mailman.74.1346579541.27098.python-list@python.org>
In reply to	#28245

Ian Kelly wrote:

> Rewriting the example to use locale.strcoll instead:
 
>>>> sorted(li, key=functools.cmp_to_key(locale.strcoll))

There is also locale.strxfrm() which you can use directly:

sorted(li, key=locale.strxfrm)
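
The two approaches mentioned here can be put side by side. This is only a minimal sketch, not code from the thread: the locale name `fr_FR.UTF-8` is an assumption and may not be installed on a given machine, in which case the sketch keeps whatever locale is already active. Both calls should return the same words; whether the order is the French dictionary order depends on the locale actually in effect.

```python
import functools
import locale

words = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']

# Assumption: a French UTF-8 locale is installed under this name.
# If it is not, keep whatever locale is already active.
try:
    locale.setlocale(locale.LC_ALL, 'fr_FR.UTF-8')
except locale.Error:
    pass

# Pairwise comparison: strcoll is called once per comparison made by the sort.
by_strcoll = sorted(words, key=functools.cmp_to_key(locale.strcoll))

# Key transformation: strxfrm is called once per item, then keys are compared.
by_strxfrm = sorted(words, key=locale.strxfrm)

print(by_strcoll)
print(by_strxfrm)
```

The `strxfrm` form does one transformation per item instead of one locale-aware comparison per pair, which is why it is usually the more natural fit for `sorted(..., key=...)`.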


#28260

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2012-09-02 11:36 +0100
Message-ID	<mailman.76.1346582082.27098.python-list@python.org>
In reply to	#28245

I've found the white paper which gives the technical basis for the 
claims made by jmf so thought I'd better share in order to explain his 
rationale.

http://www.montypython.net/scripts/right-think.php

-- 
Cheers.

Mark Lawrence.


#28267

From	Serhiy Storchaka <storchaka@gmail.com>
Date	2012-09-02 15:00 +0300
Message-ID	<mailman.83.1346587277.27098.python-list@python.org>
In reply to	#28245

On 02.09.12 12:52, Peter Otten wrote:
> Ian Kelly wrote:
>
>> Rewriting the example to use locale.strcoll instead:
>
>>>>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
>
> There is also locale.strxfrm() which you can use directly:
>
> sorted(li, key=locale.strxfrm)

Hmm, and with locale.strxfrm, Python 3.3 is 20% slower than 3.2.


#28337

From	wxjmfauth@gmail.com
Date	2012-09-02 22:39 -0700
Message-ID	<b7514131-3162-4c6f-909c-52df5d666992@googlegroups.com>
In reply to	#28267

On Sunday, 2 September 2012 14:01:18 UTC+2, Serhiy Storchaka wrote:
> On 02.09.12 12:52, Peter Otten wrote:
> > Ian Kelly wrote:
> >
> >> Rewriting the example to use locale.strcoll instead:
> >
> >>>>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
> >
> > There is also locale.strxfrm() which you can use directly:
> >
> > sorted(li, key=locale.strxfrm)
>
> Hmm, and with locale.strxfrm, Python 3.3 is 20% slower than 3.2.

And with a memory gain of 0, since my text contains non-Latin-1 characters!

jmf


#28339

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2012-09-03 07:11 +0100
Message-ID	<mailman.127.1346652593.27098.python-list@python.org>
In reply to	#28337

On 03/09/2012 06:39, wxjmfauth@gmail.com wrote:
> On Sunday, 2 September 2012 14:01:18 UTC+2, Serhiy Storchaka wrote:
>> On 02.09.12 12:52, Peter Otten wrote:
>>> Ian Kelly wrote:
>>>
>>>> Rewriting the example to use locale.strcoll instead:
>>>
>>>>>>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
>>>
>>> There is also locale.strxfrm() which you can use directly:
>>>
>>> sorted(li, key=locale.strxfrm)
>>
>> Hmm, and with locale.strxfrm, Python 3.3 is 20% slower than 3.2.
>
> With a memory gain = 0 since my text contains non-latin-1 characters!
>
> jmf
>

This is getting really funny.  Do you make a living writing comedy for 
big film or TV studios?  Your response to Steven D'Aprano's "That's six 
wins versus one loss." should be hilarious.  Or do you not respond to 
fact based posts?

-- 
Cheers.

Mark Lawrence.


#28340

From	Peter Otten <__peter__@web.de>
Date	2012-09-03 08:15 +0200
Message-ID	<mailman.128.1346652940.27098.python-list@python.org>
In reply to	#28337

wxjmfauth@gmail.com wrote:

> On Sunday, 2 September 2012 14:01:18 UTC+2, Serhiy Storchaka wrote:

>> Hmm, and with locale.strxfrm Python 3.3 20% slower than 3.2.
> 
> With a memory gain = 0 since my text contains non-latin-1 characters!

I can't confirm this. At least users of wide builds will see a decrease in 
memory use:

$ cat strxfrm_getsize.py 
import locale
import sys

print("maxunicode:", sys.maxunicode)
locale.setlocale(locale.LC_ALL, "fr_FR.UTF-8")
words = [
    'noël', 'noir', 'nœud', 'noduleux',
    'noétique', 'noèse', 'noirâtre']
print("total size of original strings:",
      sum(sys.getsizeof(s) for s in words))
print(
    "total size of transformed strings:",
    sum(sys.getsizeof(locale.strxfrm(s)) for s in words))

$ python3.2 strxfrm_getsize.py
maxunicode: 1114111
total size of original strings: 584
total size of transformed strings: 980

$ python3.3 strxfrm_getsize.py
maxunicode: 1114111
total size of original strings: 509
total size of transformed strings: 483

The situation is more complex than you suppose -- you need less dogma and 
more experiments ;)


#28344

From	Terry Reedy <tjreedy@udel.edu>
Date	2012-09-03 04:38 -0400
Message-ID	<mailman.134.1346661523.27098.python-list@python.org>
In reply to	#28337

On 9/3/2012 2:15 AM, Peter Otten wrote:
> At least users of wide builds will see a decrease in memory use:

Everyone saves because everyone uses large parts of the stdlib. When 3.3
starts up in a Windows console, there are 56 modules in sys.modules. With
Idle, there are over 130. All the identifiers, all the global, local,
and attribute names are present as ASCII-only strings. Now multiply that
by some reasonable average, keeping in mind that __builtins__ alone has
148 names.

Former narrow build users gain less space but also gain the elimination 
of buggy behavior.
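
The counts above can be checked on any interpreter with a short sketch like the following. It is illustrative only: the numbers 56 and 148 are the ones reported for Python 3.3 on Windows, and other versions and platforms will print different values. `sys.getsizeof` reports CPython's own accounting, so the byte total is an implementation detail.

```python
import builtins
import sys

names = dir(builtins)

# Identifier strings are overwhelmingly ASCII; verify that for builtins.
ascii_names = [n for n in names if all(ord(c) < 128 for c in n)]

print("modules loaded at this point:", len(sys.modules))
print("builtin names:", len(names))
print("ASCII-only builtin names:", len(ascii_names))
print("bytes used by the builtin name strings:",
      sum(sys.getsizeof(n) for n in names))
```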

-- 
Terry Jan Reedy


#28361

From	Serhiy Storchaka <storchaka@gmail.com>
Date	2012-09-03 18:56 +0300
Message-ID	<mailman.150.1346687830.27098.python-list@python.org>
In reply to	#28337

On 03.09.12 09:15, Peter Otten wrote:
> wxjmfauth@gmail.com wrote:
>> On Sunday, 2 September 2012 14:01:18 UTC+2, Serhiy Storchaka wrote:
>
>>> Hmm, and with locale.strxfrm Python 3.3 20% slower than 3.2.
>>
>> With a memory gain = 0 since my text contains non-latin-1 characters!
>
> I can't confirm this. At least users of wide builds will see a decrease in
> memory use:

And only users of wide builds will see a 20% decrease in speed for this
data (with longer strings, Python 3.3 will outstrip Python 3.2). This
happens because of the unavoidable conversions UCS2 -> wchar_t and
wchar_t -> UCS2 on platforms with a 4-byte wchar_t. On Windows there
should be no slowdown.



#28268

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2012-09-02 13:23 +0100
Message-ID	<mailman.84.1346588596.27098.python-list@python.org>
In reply to	#28245

On 02/09/2012 13:00, Serhiy Storchaka wrote:
> On 02.09.12 12:52, Peter Otten wrote:
>> Ian Kelly wrote:
>>
>>> Rewriting the example to use locale.strcoll instead:
>>
>>>>>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
>>
>> There is also locale.strxfrm() which you can use directly:
>>
>> sorted(li, key=locale.strxfrm)
>
> Hmm, and with locale.strxfrm Python 3.3 20% slower than 3.2.
>
>

That's it, then: I'm giving up on Python.  In future I'll be writing
everything in machine code to ensure that I get the fastest possible run
times.

-- 
Cheers.

Mark Lawrence.


#28269

From	Roy Smith <roy@panix.com>
Date	2012-09-02 08:35 -0400
Message-ID	<roy-FC61B4.08351302092012@news.panix.com>
In reply to	#28268

In article <mailman.84.1346588596.27098.python-list@python.org>,
 Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:

> On 02/09/2012 13:00, Serhiy Storchaka wrote:
> > On 02.09.12 12:52, Peter Otten wrote:
> >> Ian Kelly wrote:
> >>
> >>> Rewriting the example to use locale.strcoll instead:
> >>
> >>>>>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
> >>
> >> There is also locale.strxfrm() which you can use directly:
> >>
> >> sorted(li, key=locale.strxfrm)
> >
> > Hmm, and with locale.strxfrm Python 3.3 20% slower than 3.2.
> >
> >
> 
> That's it then I'm giving up with Python.  In future I'll be writing 
> everything in machine code to ensure that I get the fastest possible run 
> times.

Feh.  You software guys are always too willing to sacrifice performance 
for convenience.  If you really want speed, grab yourself a handful of 
chips and a soldering iron.


#28272

From	Ramchandra Apte <maniandram01@gmail.com>
Date	2012-09-02 06:48 -0700
Message-ID	<5c453ede-33dd-4b7f-aa53-9424224ec6c7@googlegroups.com>
In reply to	#28268

On Sunday, 2 September 2012 17:53:16 UTC+5:30, Mark Lawrence  wrote:
> On 02/09/2012 13:00, Serhiy Storchaka wrote:
> 
> > On 02.09.12 12:52, Peter Otten wrote:
> 
> >> Ian Kelly wrote:
> 
> >>
> 
> >>> Rewriting the example to use locale.strcoll instead:
> 
> >>
> 
> >>>>>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
> 
> >>
> 
> >> There is also locale.strxfrm() which you can use directly:
> 
> >>
> 
> >> sorted(li, key=locale.strxfrm)
> 
> >
> 
> > Hmm, and with locale.strxfrm Python 3.3 20% slower than 3.2.
> 
> >
> 
> >
> 
> 
> 
> That's it then I'm giving up with Python.  In future I'll be writing 
> 
> everything in machine code to ensure that I get the fastest possible run 
> 
> times.
> 
> 
> 
> -- 
> 
> Cheers.
> 
> 
> 
> Mark Lawrence.

please make it  *heavily optimized* machine code


#28276

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2012-09-02 15:46 +0100
Message-ID	<mailman.90.1346597079.27098.python-list@python.org>
In reply to	#28272

On 02/09/2012 14:48, Ramchandra Apte wrote:
>
> please make it  *heavily optimized* machine code
>

Goes without saying.  First thing I'll concentrate on is removing 
superfluous newlines sent by crappy mail clients or similar.

-- 
Cheers.

Mark Lawrence.



#28364

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2012-09-03 12:33 -0600
Message-ID	<mailman.153.1346697242.27098.python-list@python.org>
In reply to	#28245

On Sun, Sep 2, 2012 at 6:00 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
> On 02.09.12 12:52, Peter Otten wrote:
>>
>> Ian Kelly wrote:
>>
>>> Rewriting the example to use locale.strcoll instead:
>>
>>
>>>>>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
>>
>>
>> There is also locale.strxfrm() which you can use directly:
>>
>> sorted(li, key=locale.strxfrm)
>
>
> Hmm, and with locale.strxfrm Python 3.3 20% slower than 3.2.

Doh!  In Python 3.3, strcoll and strxfrm are the same speed, so I
guess that the actual optimization I'm seeing here is that in Python
3.3, cmp_to_key(strcoll) has been optimized to return strxfrm.
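
One way to test a guess like this on a given interpreter is to time both key functions side by side. The sketch below is an assumption-laden illustration, not a benchmark from the thread: it uses whatever locale is currently active, the function names are made up for this example, and no particular result is implied.

```python
import functools
import locale
import timeit

words = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']

def sort_with_strcoll():
    # Hypothetical helper: pairwise comparisons through cmp_to_key.
    return sorted(words, key=functools.cmp_to_key(locale.strcoll))

def sort_with_strxfrm():
    # Hypothetical helper: one key transformation per word.
    return sorted(words, key=locale.strxfrm)

# timeit.repeat accepts a callable; the minimum of the runs is the
# least noisy figure to compare.
strcoll_time = min(timeit.repeat(sort_with_strcoll, number=1000, repeat=3))
strxfrm_time = min(timeit.repeat(sort_with_strxfrm, number=1000, repeat=3))

print("cmp_to_key(strcoll):", strcoll_time)
print("strxfrm:            ", strxfrm_time)
```

Running the same script under two Python versions, with the same locale set, gives the comparison discussed in this subthread.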


#28246

From	wxjmfauth@gmail.com
Date	2012-09-02 00:36 -0700
Message-ID	<mailman.63.1346571419.27098.python-list@python.org>
In reply to	#28126

On Thursday, 30 August 2012 17:01:50 UTC+2, Antoine Pitrou wrote:
> 
> 
> I honestly suggest you shut up until you have a clue.
> 
Sorry, Antoine,

I don't have the knowledge to dive into the Python code,
but I know what a character is.

The coding of characters is a domain in itself,
independent of the OS and of computer languages.

Before spending time implementing a new algorithm,
maybe it is better to ask whether there is something
better than the current schemes.

I still remember my thoughts when I read the PEP 393
discussion: "this is not logical", "they do not understand
typography", "atomic character ???", ...

Real-world examples.

>>> import libfrancais
>>> li = ['noël', 'noir', 'nœud', 'noduleux', \
...     'noétique', 'noèse', 'noirâtre']
>>> r = libfrancais.sortfr(li)
>>> r
['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir',
'noirâtre']

(cf "Le Petit Robert")

or

The *letters* satisfying the requirements of the
"Imprimerie nationale".

jmf


#28134

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2012-08-30 10:27 -0600
Message-ID	<mailman.3976.1346344057.4697.python-list@python.org>
In reply to	#28092

On Thu, Aug 30, 2012 at 2:51 AM,  <wxjmfauth@gmail.com> wrote:
> But as soon as you artificially introduce a "latin-1"
> bottleneck, all this machinery just becomes useless.

How is this a bottleneck?  If you removed the Latin-1 encoding
altogether and limited the flexible representation to just UCS-2 /
UCS-4, I doubt very much that you would see any significant speed
gains. The flexibility is the part that makes string creation slower,
not the Latin-1 option in particular.

> This flexible representation is working absurdly.
> It optimizes the characters you are not using (in one
> sense), it defaults to a non optimized form for the
> characters you wish to use.

I'm sure that if you wanted to you could patch Python to use Latin-9
instead.  Just be prepared for it to be slower than UCS-2, since it
would mean having to encode the code points rather than merely
truncating them.
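
The difference between truncating and encoding can be seen with a small sketch. It is only an illustration of the point, not CPython's internal code: for Latin-1 every byte equals the character's code point, whereas Latin-9 (ISO 8859-15) needs a mapping step for characters such as the euro sign.

```python
text = "noël"

# Latin-1: each byte is simply the character's code point.
latin1_bytes = text.encode("latin-1")
latin1_is_truncation = list(latin1_bytes) == [ord(ch) for ch in text]

# Latin-9 (ISO 8859-15): the euro sign U+20AC is stored as byte 0xA4,
# so the byte value no longer equals the code point.
euro = "\u20ac"
latin9_byte = euro.encode("iso-8859-15")[0]

print(latin1_is_truncation)
print(hex(ord(euro)), hex(latin9_byte))
```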

> Pick up a random text and see the probability this
> text match the most optimized case 1 char / 1 byte,
> practically never.

Pick up a random text and see that this text matches the next most
optimized case, 1 char / 2 bytes: practically always.

> If a user will use exclusively latin-1, she/he is  better
> served by using a dedicated tool for "latin-1"

Speaking as a user who almost exclusively uses Latin-1, I strongly
disagree.  What you're describing is Python 2.x.  The user is almost
always better served by not having to worry about the full extent of
the character set their program might use.  That's why we moved to
Unicode strings in Python 3 in the first place.

> If a user will comfortably work with Unicode, she/he is
> better served by using one of these tools that properly uses
> one of the available Unicode schemes.
>
> In a funny way, this is what Python was doing and it
> performs better!

Seriously, please show us just one *real world* benchmark in which
Python 3.3 performs demonstrably worse than Python 3.2.  All you've
shown so far is this one microbenchmark of string creation that is
utterly irrelevant to actual programs.


#28317

From	Serhiy Storchaka <storchaka@gmail.com>
Date	2012-09-02 23:38 +0300
Message-ID	<mailman.115.1346618346.27098.python-list@python.org>
In reply to	#28092

On 30.08.12 09:55, Steven D'Aprano wrote:
> And Python's solution uses those: UCS-2, UCS-4, and UTF-8.

I see that this misconception is widespread. In fact, Python 3.3 uses
four kinds of ready strings.

* ASCII. All codes <= U+007F.
* UCS1. All codes <= U+00FF, at least one code > U+007F.
* UCS2. All codes <= U+FFFF, at least one code > U+00FF.
* UCS4. All codes <= U+0010FFFF, at least one code > U+FFFF.

Indexing is O(0) for any string.

Also the string can optionally cache UTF-8 and wchar_t* representation.
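
The four kinds listed above can be observed from Python code by measuring how much each extra character adds to a string's size. This is a minimal sketch that assumes CPython 3.3 or later; `sys.getsizeof` is implementation-specific, and the helper name `bytes_per_char` is made up for this example.

```python
import sys

samples = {
    "ASCII": "a",
    "UCS1": "\u00e9",       # é, above U+007F
    "UCS2": "\u20ac",       # €, above U+00FF
    "UCS4": "\U0001F600",   # a code point above U+FFFF
}

def bytes_per_char(ch, n=1000):
    # Growth in size between 1 and n + 1 repetitions, divided by n,
    # which cancels out the fixed object header.
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) // n

widths = {kind: bytes_per_char(ch) for kind, ch in samples.items()}
print(widths)
```

On CPython 3.3 and later this reports 1, 1, 2, and 4 bytes per character respectively; ASCII and UCS1 strings differ in their header, not in their per-character width.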


#28333

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2012-09-03 01:54 +0000
Message-ID	<50440de2$0$29967$c3e8da3$5496439d@news.astraweb.com>
In reply to	#28317

On Sun, 02 Sep 2012 23:38:49 +0300, Serhiy Storchaka wrote:

> On 30.08.12 09:55, Steven D'Aprano wrote:
>> And Python's solution uses those: UCS-2, UCS-4, and UTF-8.
> 
> I see that this misconception widely spread.

I am not familiar enough with the C implementation to tell what Python 
3.3 actually does, and the PEP assumes a fair amount of familiarity with 
the CPython source. So I welcome corrections.

> In fact Python 3.3 uses four kinds of ready strings.
> 
> * ASCII. All codes <= U+007F.
> * UCS1. All codes <= U+00FF, at least one code > U+007F. 
> * UCS2. All codes <= U+FFFF, at least one code > U+00FF. 
> * UCS4. All codes <= U+0010FFFF, at least one code > U+FFFF.

Where UCS1 is equivalent to Latin-1, correct?

UCS2 is what Python 3.2 narrow builds use for all strings, including
codes > U+FFFF using surrogate pairs.

UCS4 is what Python 3.2 wide builds use for all strings.

This means that Python 3.3 will no longer have surrogate pairs.

Am I right?

> Indexing is O(0) for any string.

I think you mean O(1) for constant-time lookups.

> Also the string can optionally cache UTF-8 and wchar_t* representation.

Right, that's the bit that wasn't clear -- the UTF-8 data is a cache, not 
the canonical representation.
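
On the surrogate-pair question, a short sketch shows what a Python 3.3 or later interpreter does with a code point above U+FFFF. It is an illustration under that version assumption: the string holds one element, and the surrogate pair appears only when the text is encoded to UTF-16.

```python
import struct

s = "\U0001F600"   # one code point above U+FFFF

# On Python 3.3+ the string has a single element, regardless of build.
length = len(s)

# Encoding to UTF-16 is where the surrogate pair shows up:
# two 16-bit code units for this one code point.
utf16 = s.encode("utf-16-le")
code_units = struct.unpack("<2H", utf16)

print(length)
print([hex(unit) for unit in code_units])
```

A 3.2 narrow build would have reported a length of 2 for the same literal, which is the behavior the flexible representation removes.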

-- 
Steven

