Groups > comp.lang.python > #27730 > unrolled thread
| Started by | wxjmfauth@gmail.com |
|---|---|
| First post | 2012-08-23 05:47 -0700 |
| Last post | 2012-08-25 07:23 -0400 |
| Articles | 20 on this page of 95 — 21 participants |
Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-23 05:47 -0700
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-09-02 11:58 -0700 |
| Message-ID | <mailman.102.1346612296.27098.python-list@python.org> |
| In reply to | #28251 |
On Sunday, 2 September 2012 11:07:35 UTC+2, Ian wrote:
> On Sun, Sep 2, 2012 at 1:36 AM, <wxjmfauth@gmail.com> wrote:
> > I still remember my thoughts when I read the PEP 393
> > discussion: "this is not logical", "they do not understand
> > typography", "atomic character ???", ...
>
> That would indicate one of two possibilities. Either:
>
> 1) Everybody in the PEP 393 discussion except for you is clueless
> about how to implement a Unicode type; or
>
> 2) You are clueless about how to implement a Unicode type.
>
> Taking into account Occam's razor, and also that you seem to be unable
> or unwilling to offer a solid rationale for those thoughts, I have to
> say that I'm currently leaning toward the second possibility.
>
> > Real world examples.
> >
> > >>> import libfrancais
> > >>> li = ['noël', 'noir', 'nœud', 'noduleux', \
> > ... 'noétique', 'noèse', 'noirâtre']
> > >>> r = libfrancais.sortfr(li)
> > >>> r
> > ['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir',
> > 'noirâtre']
>
> libfrancais does not appear to be publicly available. It's not listed
> in PyPI, and googling for "python libfrancais" turns up nothing
> relevant.
>
> Rewriting the example to use locale.strcoll instead:
>
> >>> li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']
> >>> import locale
> >>> locale.setlocale(locale.LC_ALL, 'French_France')
> 'French_France.1252'
> >>> import functools
> >>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
> ['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir', 'noirâtre']
>
> # Python 3.2
> >>> import timeit
> >>> timeit.repeat("sorted(li, key=functools.cmp_to_key(locale.strcoll))", "import functools; import locale; li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']", number=10000)
> [0.5544277025009592, 0.5370117249557325, 0.5551836677925053]
>
> # Python 3.3
> >>> import timeit
> >>> timeit.repeat("sorted(li, key=functools.cmp_to_key(locale.strcoll))", "import functools; import locale; li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']", number=10000)
> [0.1421166788364303, 0.12389078130001963, 0.13184190553613462]
>
> As you can see, Python 3.3 is about 77% faster than Python 3.2 on this
> example. If this was intended to show that the Python 3.3 Unicode
> representation is a regression over the Python 3.2 implementation,
> then it's a complete failure as an example.

- Unfortunately, I got the opposite, and even much worse, results on my
Windows box, considering that
- libfrancais is one of my modules and it does a little bit more than
the standard sorting tools.

My rationale is very simple.

1) I have never heard of anything better than sticking with one
of the Unicode coding schemes (general theory).
2) I am not at all convinced by the "new" Py 3.3 algorithm. I'm not the
only one who has noticed problems. Arguing that "it is fast enough" is not
a correct answer.

jmf
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2012-09-02 11:52 +0200 |
| Message-ID | <mailman.74.1346579541.27098.python-list@python.org> |
| In reply to | #28245 |
Ian Kelly wrote:

> Rewriting the example to use locale.strcoll instead:
>
> >>> sorted(li, key=functools.cmp_to_key(locale.strcoll))

There is also locale.strxfrm() which you can use directly:

sorted(li, key=locale.strxfrm)
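
The two spellings discussed here can be put side by side. What follows is a minimal sketch rather than code from the thread: the helper names `sort_with_strcoll` and `sort_with_strxfrm` are made up for illustration, and the sketch deliberately leaves the process locale alone, because locale names such as 'French_France' or 'fr_FR.UTF-8' differ between platforms.

```python
import functools
import locale


def sort_with_strcoll(words):
    """Sort with locale.strcoll, a pairwise comparison function.

    sorted() only accepts key functions in Python 3, so the comparison
    function is wrapped with functools.cmp_to_key.
    """
    return sorted(words, key=functools.cmp_to_key(locale.strcoll))


def sort_with_strxfrm(words):
    """Sort with locale.strxfrm, which computes one sort key per string."""
    return sorted(words, key=locale.strxfrm)
```

To get French collation, call `locale.setlocale(locale.LC_COLLATE, ...)` first with a French locale that is actually installed on your system; both helpers should then agree on the resulting order.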
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2012-09-02 11:36 +0100 |
| Message-ID | <mailman.76.1346582082.27098.python-list@python.org> |
| In reply to | #28245 |
I've found the white paper which gives the technical basis for the claims
made by jmf so thought I'd better share in order to explain his rationale.

http://www.montypython.net/scripts/right-think.php

--
Cheers.

Mark Lawrence.
| From | Serhiy Storchaka <storchaka@gmail.com> |
|---|---|
| Date | 2012-09-02 15:00 +0300 |
| Message-ID | <mailman.83.1346587277.27098.python-list@python.org> |
| In reply to | #28245 |
On 02.09.12 12:52, Peter Otten wrote:
> Ian Kelly wrote:
>
>> Rewriting the example to use locale.strcoll instead:
>
>> >>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
>
> There is also locale.strxfrm() which you can use directly:
>
> sorted(li, key=locale.strxfrm)

Hmm, and with locale.strxfrm Python 3.3 is 20% slower than 3.2.
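
A claim like this is easy to check on your own build. The helper below is a hedged sketch, not the benchmark used in the thread: `time_sort` and `WORDS` are names chosen here for illustration, and the numbers it returns depend on the Python version, the platform and the active locale.

```python
import functools
import locale
import timeit

# Word list taken from the thread.
WORDS = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']


def time_sort(key_func, words=WORDS, number=10000, repeat=3):
    """Return the best-of-`repeat` time, in seconds, for `number` sorts."""
    timer = timeit.Timer(lambda: sorted(words, key=key_func))
    return min(timer.repeat(repeat=repeat, number=number))


# Example use, after selecting a locale with locale.setlocale():
#   time_sort(locale.strxfrm)
#   time_sort(functools.cmp_to_key(locale.strcoll))
```

Running the same two calls under each interpreter you want to compare gives figures that can be set against the ones quoted in this thread.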
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-09-02 22:39 -0700 |
| Message-ID | <b7514131-3162-4c6f-909c-52df5d666992@googlegroups.com> |
| In reply to | #28267 |
On Sunday, 2 September 2012 14:01:18 UTC+2, Serhiy Storchaka wrote:
> On 02.09.12 12:52, Peter Otten wrote:
> > Ian Kelly wrote:
> >
> >> Rewriting the example to use locale.strcoll instead:
> >
> >> >>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
> >
> > There is also locale.strxfrm() which you can use directly:
> >
> > sorted(li, key=locale.strxfrm)
>
> Hmm, and with locale.strxfrm Python 3.3 is 20% slower than 3.2.

With a memory gain = 0 since my text contains non-latin-1 characters!

jmf
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2012-09-03 07:11 +0100 |
| Message-ID | <mailman.127.1346652593.27098.python-list@python.org> |
| In reply to | #28337 |
On 03/09/2012 06:39, wxjmfauth@gmail.com wrote:
> On Sunday, 2 September 2012 14:01:18 UTC+2, Serhiy Storchaka wrote:
>> On 02.09.12 12:52, Peter Otten wrote:
>>> Ian Kelly wrote:
>>>
>>>> Rewriting the example to use locale.strcoll instead:
>>>
>>>> >>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
>>>
>>> There is also locale.strxfrm() which you can use directly:
>>>
>>> sorted(li, key=locale.strxfrm)
>>
>> Hmm, and with locale.strxfrm Python 3.3 is 20% slower than 3.2.
>
> With a memory gain = 0 since my text contains non-latin-1 characters!
>
> jmf

This is getting really funny. Do you make a living writing comedy for big
film or TV studios? Your response to Steven D'Aprano's "That's six wins
versus one loss." should be hilarious. Or do you not respond to fact-based
posts?

--
Cheers.

Mark Lawrence.
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2012-09-03 08:15 +0200 |
| Message-ID | <mailman.128.1346652940.27098.python-list@python.org> |
| In reply to | #28337 |
wxjmfauth@gmail.com wrote:

> On Sunday, 2 September 2012 14:01:18 UTC+2, Serhiy Storchaka wrote:
>> Hmm, and with locale.strxfrm Python 3.3 is 20% slower than 3.2.
>
> With a memory gain = 0 since my text contains non-latin-1 characters!

I can't confirm this. At least users of wide builds will see a decrease in
memory use:

$ cat strxfrm_getsize.py
import locale
import sys

print("maxunicode:", sys.maxunicode)

locale.setlocale(locale.LC_ALL, "fr_FR.UTF-8")

words = [
    'noël', 'noir', 'nœud', 'noduleux',
    'noétique', 'noèse', 'noirâtre']

print("total size of original strings:",
      sum(sys.getsizeof(s) for s in words))
print(
    "total size of transformed strings:",
    sum(sys.getsizeof(locale.strxfrm(s)) for s in words))

$ python3.2 strxfrm_getsize.py
maxunicode: 1114111
total size of original strings: 584
total size of transformed strings: 980

$ python3.3 strxfrm_getsize.py
maxunicode: 1114111
total size of original strings: 509
total size of transformed strings: 483

The situation is more complex than you suppose -- you need less dogma and
more experiments ;)
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2012-09-03 04:38 -0400 |
| Message-ID | <mailman.134.1346661523.27098.python-list@python.org> |
| In reply to | #28337 |
On 9/3/2012 2:15 AM, Peter Otten wrote:

> At least users of wide builds will see a decrease in memory use:

Everyone saves because everyone uses large parts of the stdlib. When 3.3
starts up in a Windows console, there are 56 modules in sys.modules. With
Idle, there are over 130. All the identifiers, all the global, local, and
attribute names are present as ASCII-only strings. Now multiply that by some
reasonable average, keeping in mind that __builtins__ alone has 148 names.

Former narrow build users gain less space but also gain the elimination of
buggy behavior.

--
Terry Jan Reedy
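
The counts mentioned above can be reproduced in a few lines. This is an illustrative sketch, not something posted in the thread; `is_ascii`, `builtin_names` and `ascii_builtins` are names chosen here, and the totals it prints will differ from the 56 modules and 148 names quoted for 3.3, since they depend on the Python version and on how the interpreter was started.

```python
import builtins
import sys


def is_ascii(text):
    """Return True if every code point in `text` is below U+0080."""
    return all(ord(ch) < 128 for ch in text)


# Names defined by the builtins module, and the subset that is pure ASCII.
builtin_names = dir(builtins)
ascii_builtins = [name for name in builtin_names if is_ascii(name)]

print(len(sys.modules), "modules loaded at this point")
print(len(ascii_builtins), "of", len(builtin_names),
      "builtin names are ASCII-only")
```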
| From | Serhiy Storchaka <storchaka@gmail.com> |
|---|---|
| Date | 2012-09-03 18:56 +0300 |
| Message-ID | <mailman.150.1346687830.27098.python-list@python.org> |
| In reply to | #28337 |
On 03.09.12 09:15, Peter Otten wrote:
> wxjmfauth@gmail.com wrote:
>> On Sunday, 2 September 2012 14:01:18 UTC+2, Serhiy Storchaka wrote:
>
>>> Hmm, and with locale.strxfrm Python 3.3 is 20% slower than 3.2.
>>
>> With a memory gain = 0 since my text contains non-latin-1 characters!
>
> I can't confirm this. At least users of wide builds will see a decrease in
> memory use:

And only users of wide builds will see a 20% decrease in speed for this
data (with longer strings Python 3.3 will outstrip Python 3.2). This happens
because of the inevitable transformations UCS2 -> wchar_t and wchar_t -> UCS2
on platforms with a 4-byte wchar_t. On Windows there should be no slowdown.
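
Whether this conversion cost applies depends on the width of wchar_t on the platform in question. One way to check it from Python, offered here as a small sketch rather than anything from the thread, is to ask ctypes for the size of its `c_wchar` type; the variable name `wchar_size` is only illustrative.

```python
import ctypes

# Size in bytes of the C wchar_t type on this platform.
# It is typically 2 on Windows and 4 on Linux and macOS.
wchar_size = ctypes.sizeof(ctypes.c_wchar)
print("sizeof(wchar_t):", wchar_size)
```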
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2012-09-02 13:23 +0100 |
| Message-ID | <mailman.84.1346588596.27098.python-list@python.org> |
| In reply to | #28245 |
On 02/09/2012 13:00, Serhiy Storchaka wrote:
> On 02.09.12 12:52, Peter Otten wrote:
>> Ian Kelly wrote:
>>
>>> Rewriting the example to use locale.strcoll instead:
>>
>>> >>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
>>
>> There is also locale.strxfrm() which you can use directly:
>>
>> sorted(li, key=locale.strxfrm)
>
> Hmm, and with locale.strxfrm Python 3.3 is 20% slower than 3.2.

That's it then I'm giving up with Python. In future I'll be writing
everything in machine code to ensure that I get the fastest possible run
times.

--
Cheers.

Mark Lawrence.
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2012-09-02 08:35 -0400 |
| Message-ID | <roy-FC61B4.08351302092012@news.panix.com> |
| In reply to | #28268 |
In article <mailman.84.1346588596.27098.python-list@python.org>,
Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:

> On 02/09/2012 13:00, Serhiy Storchaka wrote:
> > On 02.09.12 12:52, Peter Otten wrote:
> >> Ian Kelly wrote:
> >>
> >>> Rewriting the example to use locale.strcoll instead:
> >>
> >>> >>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
> >>
> >> There is also locale.strxfrm() which you can use directly:
> >>
> >> sorted(li, key=locale.strxfrm)
> >
> > Hmm, and with locale.strxfrm Python 3.3 is 20% slower than 3.2.
>
> That's it then I'm giving up with Python. In future I'll be writing
> everything in machine code to ensure that I get the fastest possible run
> times.

Feh. You software guys are always too willing to sacrifice performance
for convenience. If you really want speed, grab yourself a handful of
chips and a soldering iron.
| From | Ramchandra Apte <maniandram01@gmail.com> |
|---|---|
| Date | 2012-09-02 06:48 -0700 |
| Message-ID | <5c453ede-33dd-4b7f-aa53-9424224ec6c7@googlegroups.com> |
| In reply to | #28268 |
On Sunday, 2 September 2012 17:53:16 UTC+5:30, Mark Lawrence wrote:
> On 02/09/2012 13:00, Serhiy Storchaka wrote:
> > On 02.09.12 12:52, Peter Otten wrote:
> >> Ian Kelly wrote:
> >>
> >>> Rewriting the example to use locale.strcoll instead:
> >>
> >>> >>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
> >>
> >> There is also locale.strxfrm() which you can use directly:
> >>
> >> sorted(li, key=locale.strxfrm)
> >
> > Hmm, and with locale.strxfrm Python 3.3 is 20% slower than 3.2.
>
> That's it then I'm giving up with Python. In future I'll be writing
> everything in machine code to ensure that I get the fastest possible run
> times.
>
> --
> Cheers.
>
> Mark Lawrence.

please make it *heavily optimized* machine code
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2012-09-02 15:46 +0100 |
| Message-ID | <mailman.90.1346597079.27098.python-list@python.org> |
| In reply to | #28272 |
On 02/09/2012 14:48, Ramchandra Apte wrote:
>
> please make it *heavily optimized* machine code
>

Goes without saying. First thing I'll concentrate on is removing
superfluous newlines sent by crappy mail clients or similar.

--
Cheers.

Mark Lawrence.
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2012-09-03 12:33 -0600 |
| Message-ID | <mailman.153.1346697242.27098.python-list@python.org> |
| In reply to | #28245 |
On Sun, Sep 2, 2012 at 6:00 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
> On 02.09.12 12:52, Peter Otten wrote:
>>
>> Ian Kelly wrote:
>>
>>> Rewriting the example to use locale.strcoll instead:
>>
>>> >>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
>>
>> There is also locale.strxfrm() which you can use directly:
>>
>> sorted(li, key=locale.strxfrm)
>
> Hmm, and with locale.strxfrm Python 3.3 is 20% slower than 3.2.

Doh! In Python 3.3, strcoll and strxfrm are the same speed, so I guess
that the actual optimization I'm seeing here is that in Python 3.3,
cmp_to_key(strcoll) has been optimized to return strxfrm.
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-09-02 00:36 -0700 |
| Message-ID | <mailman.63.1346571419.27098.python-list@python.org> |
| In reply to | #28126 |
On Thursday, 30 August 2012 17:01:50 UTC+2, Antoine Pitrou wrote:
>
> I honestly suggest you shut up until you have a clue.
>

Sorry Antoine, I do not have the knowledge to dive into the Python code, but
I know what a character is. The coding of characters is a domain
per se, independent of the OS and of computer languages.

Before spending time implementing a new algorithm, maybe it is
better to ask whether there is something better than the current
schemes.

I still remember my thoughts when I read the PEP 393
discussion: "this is not logical", "they do not understand
typography", "atomic character ???", ...

Real world examples.

>>> import libfrancais
>>> li = ['noël', 'noir', 'nœud', 'noduleux', \
... 'noétique', 'noèse', 'noirâtre']
>>> r = libfrancais.sortfr(li)
>>> r
['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir',
'noirâtre']

(cf. "Le Petit Robert")

or

The *letters* satisfying the requirements of the
"Imprimerie nationale".

jmf
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2012-08-30 10:27 -0600 |
| Message-ID | <mailman.3976.1346344057.4697.python-list@python.org> |
| In reply to | #28092 |
On Thu, Aug 30, 2012 at 2:51 AM, <wxjmfauth@gmail.com> wrote:
> But as soon as you introduce artificially a "latin-1"
> bottleneck, all this machinery just become useless.

How is this a bottleneck? If you removed the Latin-1 encoding
altogether and limited the flexible representation to just UCS-2 /
UCS-4, I doubt very much that you would see any significant speed
gains. The flexibility is the part that makes string creation slower,
not the Latin-1 option in particular.

> This flexible representation is working absurdly.
> It optimizes the characters you are not using (in one
> sense), it defaults to a non optimized form for the
> characters you wish to use.

I'm sure that if you wanted to you could patch Python to use Latin-9
instead. Just be prepared for it to be slower than UCS-2, since it
would mean having to encode the code points rather than merely
truncating them.

> Pick up a random text and see the probability this
> text match the most optimized case 1 char / 1 byte,
> practically never.

Pick up a random text and see that this text matches the next most
optimized case, 1 char / 2 bytes: practically always.

> If a user will use exclusively latin-1, she/he is better
> served by using a dedicated tool for "latin-1"

Speaking as a user who almost exclusively uses Latin-1, I strongly
disagree. What you're describing is Python 2.x. The user is almost
always better served by not having to worry about the full extent of
the character set their program might use. That's why we moved to
Unicode strings in Python 3 in the first place.

> If a user will comfortably work with Unicode, she/he is
> better served by using one of this tools which is using
> properly one of the available Unicode schemes.
>
> In a funny way, this is what Python was doing and it
> performs better!

Seriously, please show us just one *real world* benchmark in which
Python 3.3 performs demonstrably worse than Python 3.2. All you've
shown so far is this one microbenchmark of string creation that is
utterly irrelevant to actual programs.
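
The difference between truncating and encoding can be made concrete. The sketch below is only an illustration of that remark, not CPython's actual storage code: `latin1_bytes_by_truncation` is a hypothetical helper, while the `latin-1` and `iso8859_15` (Latin-9) codecs are the standard ones shipped with Python.

```python
def latin1_bytes_by_truncation(text):
    """Store each code point as a single byte.

    This works only because Latin-1 byte values equal the first 256
    Unicode code points; bytes() raises ValueError for anything larger.
    """
    return bytes(ord(ch) for ch in text)


# Latin-1: the byte value is the code point, so no lookup table is needed.
latin1_is_identity = all(
    chr(cp).encode('latin-1') == bytes([cp]) for cp in range(256))

# Latin-9: the euro sign is U+20AC but is stored as byte 0xA4, and the
# ligature U+0153 as byte 0xBD, so each character has to be mapped.
euro_latin9 = '\u20ac'.encode('iso8859_15')
oe_latin9 = '\u0153'.encode('iso8859_15')

print(latin1_is_identity, euro_latin9, oe_latin9)
```

For text that fits in Latin-1, `latin1_bytes_by_truncation(text)` gives the same bytes as `text.encode('latin-1')`; no such shortcut exists for Latin-9, which is the extra cost referred to above.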
| From | Serhiy Storchaka <storchaka@gmail.com> |
|---|---|
| Date | 2012-09-02 23:38 +0300 |
| Message-ID | <mailman.115.1346618346.27098.python-list@python.org> |
| In reply to | #28092 |
On 30.08.12 09:55, Steven D'Aprano wrote:
> And Python's solution uses those: UCS-2, UCS-4, and UTF-8.

I see that this misconception is widely spread. In fact Python 3.3 uses
four kinds of ready strings.

* ASCII. All codes <= U+007F.
* UCS1. All codes <= U+00FF, at least one code > U+007F.
* UCS2. All codes <= U+FFFF, at least one code > U+00FF.
* UCS4. All codes <= U+0010FFFF, at least one code > U+FFFF.

Indexing is O(0) for any string.

Also the string can optionally cache UTF-8 and wchar_t* representations.
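
The four kinds listed above can be observed from Python by measuring how much a string grows per added character. This is a rough sketch that assumes CPython 3.3 or later (other implementations may store strings differently); `bytes_per_char` and `samples` are names made up for the illustration.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per character by growing a string of `ch` by n."""
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) // n


# One sample character for each kind of string.
samples = {
    'ASCII (U+0061)': 'a',
    'UCS1 (U+00E9)': '\u00e9',
    'UCS2 (U+20AC)': '\u20ac',
    'UCS4 (U+1F600)': '\U0001f600',
}

for label, ch in samples.items():
    print(label, bytes_per_char(ch), "byte(s) per character")
```

On CPython 3.3 and later the four samples come out at 1, 1, 2 and 4 bytes per character. ASCII and UCS1 strings differ in their fixed header rather than in their per-character cost.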
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-09-03 01:54 +0000 |
| Message-ID | <50440de2$0$29967$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #28317 |
On Sun, 02 Sep 2012 23:38:49 +0300, Serhiy Storchaka wrote:

> On 30.08.12 09:55, Steven D'Aprano wrote:
>> And Python's solution uses those: UCS-2, UCS-4, and UTF-8.
>
> I see that this misconception is widely spread.

I am not familiar enough with the C implementation to tell what Python 3.3
actually does, and the PEP assumes a fair amount of familiarity with the
CPython source. So I welcome corrections.

> In fact Python 3.3 uses four kinds of ready strings.
>
> * ASCII. All codes <= U+007F.
> * UCS1. All codes <= U+00FF, at least one code > U+007F.
> * UCS2. All codes <= U+FFFF, at least one code > U+00FF.
> * UCS4. All codes <= U+0010FFFF, at least one code > U+FFFF.

Where UCS1 is equivalent to Latin-1, correct?

UCS2 is what Python 3.2 narrow builds use for all strings, including
codes > U+FFFF using surrogate pairs.

UCS4 is what Python 3.2 wide builds use for all strings.

This means that Python 3.3 will no longer have surrogate pairs. Am I right?

> Indexing is O(0) for any string.

I think you mean O(1) for constant-time lookups.

> Also the string can optionally cache UTF-8 and wchar_t* representations.

Right, that's the bit that wasn't clear -- the UTF-8 data is a cache, not
the canonical representation.

--
Steven
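
The surrogate-pair question can be checked directly. The following is a small sketch assuming Python 3.3 or later; the variable names are illustrative only. It shows that a code point above U+FFFF counts as one character in a str, while its UTF-16 encoding still needs two 16-bit units.

```python
# U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane.
astral = '\U0001d11e'

# One code point, so one character on Python 3.3+ regardless of platform.
length = len(astral)

# Encoded as UTF-16 it still occupies a surrogate pair (two 16-bit units).
utf16_units = len(astral.encode('utf-16-le')) // 2

# Indexing returns the whole character, never half of a pair.
middle = ('a' + astral + 'b')[1]

print(length, utf16_units, middle == astral)
```

On a Python 3.2 narrow build the same `len()` call would have returned 2, which is the buggy behaviour that PEP 393 removes.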