| Started by | Antoine Pitrou <solipsis@pitrou.net> |
|---|---|
| First post | 2012-08-25 00:24 +0000 |
| Last post | 2012-08-25 07:23 -0400 |
| Articles | 20 on this page of 83 — 18 participants |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Flexible string representation, unicode, typography, ... Antoine Pitrou <solipsis@pitrou.net> - 2012-08-25 00:24 +0000
Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-25 00:27 -0700
Re: Flexible string representation, unicode, typography, ... Ben Finney <ben+python@benfinney.id.au> - 2012-08-25 17:54 +1000
Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-25 00:27 -0700
Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-25 09:58 +0100
Re: Flexible string representation, unicode, typography, ... Frank Millman <frank@chagford.com> - 2012-08-25 11:46 +0200
Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-25 08:47 -0700
Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-25 08:47 -0700
Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-25 16:26 -0600
Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-25 23:59 -0700
Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-26 09:50 -0600
Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-25 23:59 -0700
Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-26 11:49 +0000
Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-26 09:40 -0600
Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-26 20:13 +0000
Re: Flexible string representation, unicode, typography, ... Dan Sommers <dan@tombstonezero.net> - 2012-08-26 13:45 -0700
Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-27 12:16 -0700
Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-27 14:14 -0600
Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-27 13:37 -0700
Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-29 04:38 -0700
Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-29 04:38 -0700
Re: Flexible string representation, unicode, typography, ... Neil Hodgson <nhodgson@iinet.net.au> - 2012-08-28 09:54 +1000
Re: Flexible string representation, unicode, typography, ... Chris Angelico <rosuav@gmail.com> - 2012-08-29 13:59 +1000
Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-28 22:15 -0600
Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-29 08:05 +0000
Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-29 04:40 -0700
Re: Flexible string representation, unicode, typography, ... Dave Angel <d@davea.name> - 2012-08-29 08:01 -0400
Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-29 08:43 -0700
Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-30 06:55 +0000
Re: Flexible string representation, unicode, typography, ... Chris Angelico <rosuav@gmail.com> - 2012-08-30 18:59 +1000
Re: Flexible string representation, unicode, typography, ... Roy Smith <roy@panix.com> - 2012-08-30 07:02 -0400
Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-30 16:00 +0000
Re: Flexible string representation, unicode, typography, ... Terry Reedy <tjreedy@udel.edu> - 2012-08-30 16:44 -0400
Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-31 12:32 +0000
Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-31 09:13 -0600
Re: Flexible string representation, unicode, typography, ... Roy Smith <roy@panix.com> - 2012-08-31 08:43 -0400
Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-31 14:54 +0000
Re: Flexible string representation, unicode, typography, ... Antoine Pitrou <solipsis@pitrou.net> - 2012-08-30 15:01 +0000
Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-09-02 00:36 -0700
Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-02 09:58 +0100
Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-09-02 03:06 -0600
Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-09-02 11:58 -0700
Re: Flexible string representation, unicode, typography, ... Michael Torrie <torriem@gmail.com> - 2012-09-02 13:45 -0600
Re: Flexible string representation, unicode, typography, ... Dave Angel <d@davea.name> - 2012-09-02 16:07 -0400
Re: Flexible string representation, unicode, typography, ... Terry Reedy <tjreedy@udel.edu> - 2012-09-02 16:38 -0400
Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-03 01:42 +0000
Re: Flexible string representation, unicode, typography, ... Serhiy Storchaka <storchaka@gmail.com> - 2012-09-03 18:26 +0300
Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-04 00:53 +0000
Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-09-02 11:58 -0700
Re: Flexible string representation, unicode, typography, ... Peter Otten <__peter__@web.de> - 2012-09-02 11:52 +0200
Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-02 11:36 +0100
Re: Flexible string representation, unicode, typography, ... Serhiy Storchaka <storchaka@gmail.com> - 2012-09-02 15:00 +0300
Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-09-02 22:39 -0700
Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-03 07:11 +0100
Re: Flexible string representation, unicode, typography, ... Peter Otten <__peter__@web.de> - 2012-09-03 08:15 +0200
Re: Flexible string representation, unicode, typography, ... Terry Reedy <tjreedy@udel.edu> - 2012-09-03 04:38 -0400
Re: Flexible string representation, unicode, typography, ... Serhiy Storchaka <storchaka@gmail.com> - 2012-09-03 18:56 +0300
Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-09-02 22:39 -0700
Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-02 13:23 +0100
Re: Flexible string representation, unicode, typography, ... Roy Smith <roy@panix.com> - 2012-09-02 08:35 -0400
Re: Flexible string representation, unicode, typography, ... Ramchandra Apte <maniandram01@gmail.com> - 2012-09-02 06:48 -0700
Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-02 15:46 +0100
Re: Flexible string representation, unicode, typography, ... Ramchandra Apte <maniandram01@gmail.com> - 2012-09-02 06:48 -0700
Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-09-03 12:33 -0600
Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-09-02 00:36 -0700
Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-30 10:27 -0600
Re: Flexible string representation, unicode, typography, ... Serhiy Storchaka <storchaka@gmail.com> - 2012-09-02 23:38 +0300
Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-03 01:54 +0000
Re: Flexible string representation, unicode, typography, ... Terry Reedy <tjreedy@udel.edu> - 2012-09-02 22:33 -0400
Re: Flexible string representation, unicode, typography, ... Roy Smith <roy@panix.com> - 2012-09-03 11:24 -0400
Re: Flexible string representation, unicode, typography, ... Serhiy Storchaka <storchaka@gmail.com> - 2012-09-03 18:41 +0300
Re: Flexible string representation, unicode, typography, ... Serhiy Storchaka <storchaka@gmail.com> - 2012-09-03 00:45 +0300
Re: Flexible string representation, unicode, typography, ... Chris Angelico <rosuav@gmail.com> - 2012-08-30 01:54 +1000
Re: Flexible string representation, unicode, typography, ... Chris Angelico <rosuav@gmail.com> - 2012-08-29 22:34 +1000
Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-29 04:40 -0700
Re: Flexible string representation, unicode, typography, ... wxjmfauth@gmail.com - 2012-08-27 12:16 -0700
Re: Flexible string representation, unicode, typography, ... Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-26 15:42 -0600
Re: Flexible string representation, unicode, typography, ... Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-26 23:31 +0000
Re: Flexible string representation, unicode, typography, ... Paul Rubin <no.email@nospam.invalid> - 2012-08-26 17:47 -0700
Re: Flexible string representation, unicode, typography, ... Chris Angelico <rosuav@gmail.com> - 2012-08-25 21:04 +1000
Re: Flexible string representation, unicode, typography, ... Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-25 12:05 +0100
Re: Flexible string representation, unicode, typography, ... Chris Angelico <rosuav@gmail.com> - 2012-08-25 21:19 +1000
Re: Flexible string representation, unicode, typography, ... Terry Reedy <tjreedy@udel.edu> - 2012-08-25 07:23 -0400
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2012-09-02 03:06 -0600 |
| Message-ID | <mailman.68.1346576808.27098.python-list@python.org> |
| In reply to | #28245 |
On Sun, Sep 2, 2012 at 1:36 AM, <wxjmfauth@gmail.com> wrote:
> I still remember my thoughts when I read the PEP 393
> discussion: "this is not logical", "they do no understand
> typography", "atomic character ???", ...
That would indicate one of two possibilities. Either:
1) Everybody in the PEP 393 discussion except for you is clueless
about how to implement a Unicode type; or
2) You are clueless about how to implement a Unicode type.
Taking into account Occam's razor, and also that you seem to be unable
or unwilling to offer a solid rationale for those thoughts, I have to
say that I'm currently leaning toward the second possibility.
> Real world exemples.
>
> >>> import libfrancais
> >>> li = ['noël', 'noir', 'nœud', 'noduleux', \
> ... 'noétique', 'noèse', 'noirâtre']
> >>> r = libfrancais.sortfr(li)
> >>> r
> ['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir',
> 'noirâtre']
libfrancais does not appear to be publicly available. It's not listed
in PyPI, and googling for "python libfrancais" turns up nothing
relevant.
Rewriting the example to use locale.strcoll instead:
>>> li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'French_France')
'French_France.1252'
>>> import functools
>>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir', 'noirâtre']
# Python 3.2
>>> import timeit
>>> timeit.repeat("sorted(li, key=functools.cmp_to_key(locale.strcoll))", "import functools; import locale; li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']", number=10000)
[0.5544277025009592, 0.5370117249557325, 0.5551836677925053]
# Python 3.3
>>> import timeit
>>> timeit.repeat("sorted(li, key=functools.cmp_to_key(locale.strcoll))", "import functools; import locale; li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']", number=10000)
[0.1421166788364303, 0.12389078130001963, 0.13184190553613462]
As you can see, Python 3.3 takes about 77% less time than Python 3.2 on
this example (roughly four times as fast). If this was intended to show
that the Python 3.3 Unicode representation is a regression over the
Python 3.2 implementation, then it's a complete failure as an example.
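For anyone who wants to repeat this kind of measurement on their own machine, here is a minimal sketch. The helper names (`set_french_locale`, `time_sort`, `speedup`) and the list of candidate locale names are assumptions made for illustration; they are not part of the original session, and the sketch sets only `LC_COLLATE` rather than `LC_ALL`.

```python
import functools
import locale
import timeit

WORDS = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']

# Candidate locale names are assumptions; which ones exist depends on the OS.
CANDIDATES = ('French_France', 'fr_FR.UTF-8', 'fr_FR')


def set_french_locale():
    """Try a few French locale names; return the one that worked, or None."""
    for name in CANDIDATES:
        try:
            locale.setlocale(locale.LC_COLLATE, name)
            return name
        except locale.Error:
            continue
    return None  # collation stays in whatever locale was already active


def time_sort(words, number=10000, repeat=3):
    """Return the best-of-`repeat` time for sorting `words` with strcoll."""
    key = functools.cmp_to_key(locale.strcoll)
    timer = timeit.Timer(lambda: sorted(words, key=key))
    return min(timer.repeat(repeat=repeat, number=number))


def speedup(old_seconds, new_seconds):
    """How many times faster the new run is than the old one."""
    return old_seconds / new_seconds
```

Calling `set_french_locale()` and then `time_sort(WORDS)` under each interpreter gives one number per version to feed into `speedup`. Applied to the best timings posted above, `speedup(0.5370117249557325, 0.12389078130001963)` comes out at a little over 4.3.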
[toc] | [prev] | [next] | [standalone]
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-09-02 11:58 -0700 |
| Message-ID | <f8dfb1ca-e48d-4a2f-baed-3c28a2f89777@googlegroups.com> |
| In reply to | #28251 |
On Sunday, 2 September 2012 11:07:35 UTC+2, Ian wrote:
> On Sun, Sep 2, 2012 at 1:36 AM, <wxjmfauth@gmail.com> wrote:
> > I still remember my thoughts when I read the PEP 393
> > discussion: "this is not logical", "they do no understand
> > typography", "atomic character ???", ...
>
> That would indicate one of two possibilities. Either:
>
> 1) Everybody in the PEP 393 discussion except for you is clueless
> about how to implement a Unicode type; or
>
> 2) You are clueless about how to implement a Unicode type.
>
> Taking into account Occam's razor, and also that you seem to be unable
> or unwilling to offer a solid rationale for those thoughts, I have to
> say that I'm currently leaning toward the second possibility.
>
> > Real world exemples.
> >
> > >>> import libfrancais
> > >>> li = ['noël', 'noir', 'nœud', 'noduleux', \
> > ... 'noétique', 'noèse', 'noirâtre']
> > >>> r = libfrancais.sortfr(li)
> > >>> r
> > ['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir',
> > 'noirâtre']
>
> libfrancais does not appear to be publicly available. It's not listed
> in PyPI, and googling for "python libfrancais" turns up nothing
> relevant.
>
> Rewriting the example to use locale.strcoll instead:
>
> >>> li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']
> >>> import locale
> >>> locale.setlocale(locale.LC_ALL, 'French_France')
> 'French_France.1252'
> >>> import functools
> >>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
> ['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir', 'noirâtre']
>
> # Python 3.2
> >>> import timeit
> >>> timeit.repeat("sorted(li, key=functools.cmp_to_key(locale.strcoll))", "import functools; import locale; li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']", number=10000)
> [0.5544277025009592, 0.5370117249557325, 0.5551836677925053]
>
> # Python 3.3
> >>> import timeit
> >>> timeit.repeat("sorted(li, key=functools.cmp_to_key(locale.strcoll))", "import functools; import locale; li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']", number=10000)
> [0.1421166788364303, 0.12389078130001963, 0.13184190553613462]
>
> As you can see, Python 3.3 is about 77% faster than Python 3.2 on this
> example. If this was intended to show that the Python 3.3 Unicode
> representation is a regression over the Python 3.2 implementation,
> then it's a complete failure as an example.
- Unfortunately, I got the opposite, and even much worse, results on my
Windows box, considering that
- libfrancais is one of my modules and it does a little bit more than
the std sorting tools.
My rationale is very simple.
1) I have never heard of anything better than sticking with one
of the Unicode coding schemes. (general theory)
2) I am not at all convinced by the "new" Py 3.3 algorithm. I'm not the
only one who has noticed problems. Arguing "it is fast enough" is not
a correct answer.
jmf
[toc] | [prev] | [next] | [standalone]
| From | Michael Torrie <torriem@gmail.com> |
|---|---|
| Date | 2012-09-02 13:45 -0600 |
| Message-ID | <mailman.106.1346615114.27098.python-list@python.org> |
| In reply to | #28292 |
On 09/02/2012 12:58 PM, wxjmfauth@gmail.com wrote:
> My rationale: very simple.
>
> 1) I never heard about something better than sticking with one
> of the Unicode coding scheme. (genreral theory)
> 2) I am not at all convinced by the "new" Py 3.3 algorithm. I'm not the
> only one guy, who noticed problems. Arguing, "it is fast enough", is not
> a correct answer.

If this is true, why were you holding up Google Go as an example of doing
it right? Certainly Google Go doesn't line up with your rationale. Go has
both Strings and Runes. But strings are UTF-8-encoded byte strings and
Runes are 32-bit integers. They are not interchangeable without a costly
encoding and decoding process. Even worse, indexing a Go string to get a
"Rune" involves some very costly decoding that has to be done starting at
the beginning of the string each time.

In the worst case, Python's strings are as slow as Go because Python
does the exact same thing as Go, but chooses between three encodings
instead of just one. Best case scenario, Python's strings could be
much faster than Go's because indexing through 2 of the 3 encodings is
O(1) because they are constant-width encodings. If as you say, the
latin-1 subset of UTF-8 is used, then UTF-8 indexing is O(1) too,
otherwise it's probably O(n).
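To make the complexity argument concrete, here is a small sketch of the two indexing strategies being contrasted: scanning variable-width UTF-8 from the start versus jumping straight into a fixed-width buffer. Both functions (`utf8_char_at`, `fixed_width_char_at`) are hypothetical illustrations written for this note; they do not claim to show how Go or CPython actually implement strings.

```python
def utf8_char_at(data: bytes, index: int) -> str:
    """Return the code point at `index` by scanning UTF-8 bytes from the start: O(n)."""
    if index < 0:
        raise IndexError(index)
    seen = -1      # index of the most recent code point whose lead byte we passed
    start = None   # byte offset where the wanted code point begins
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # not a continuation byte: a new code point starts
            if start is not None:
                return data[start:pos].decode('utf-8')
            seen += 1
            if seen == index:
                start = pos
    if start is not None:
        return data[start:].decode('utf-8')
    raise IndexError(index)


def fixed_width_char_at(data: bytes, index: int, width: int) -> str:
    """Return the code point at `index` from a fixed-width little-endian buffer: O(1)."""
    chunk = data[index * width:(index + 1) * width]
    if len(chunk) != width:
        raise IndexError(index)
    return chr(int.from_bytes(chunk, 'little'))
```

The first function has to look at every byte before the requested position, while the second computes the offset with one multiplication, which is the whole difference between O(n) and O(1) indexing.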
[toc] | [prev] | [next] | [standalone]
| From | Dave Angel <d@davea.name> |
|---|---|
| Date | 2012-09-02 16:07 -0400 |
| Message-ID | <mailman.108.1346616485.27098.python-list@python.org> |
| In reply to | #28292 |
On 09/02/2012 03:45 PM, Michael Torrie wrote:
> <jmfauth snipped>:
> In the worst case, Python's strings are as slow as Go because Python
> does the exact same thing as Go, but chooses between three encodings
> instead of just one. Best case scenario, Python's strings could be
> much faster than Go's because indexing through 2 of the 3 encodings is
> O(1) because they are constant-width encodings. If as you say, the
> latin-1 subset of UTF-8 is used, then UTF-8 indexing is O(1) too,
> otherwise it's probably O(n).

I'm afraid you have it backwards. The UTF-8 version of the
latin-1-compatible characters would be variable length. But my
understanding of the PEP is that the internal one-byte format is simply
the lowest order byte of each code point, after assuring that all code
points in the particular string are less than 256. That's going to
coincidentally resemble latin-1's encoding, but since it's an internal
form, the resemblance is irrelevant.

Anyway, those one-byte values are going to be O(1), naturally. No
encoding involved, and no searching nor expanding.

--
DaveA
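As a rough model of the rule described here (pick the narrowest fixed width that fits the largest code point in the string), consider the following sketch. `storage_width` and `one_byte_form` are hypothetical names for illustration; the real selection happens inside CPython's C code and also tracks a separate ASCII flag that this model ignores.

```python
def storage_width(text: str) -> int:
    """Bytes per code point a PEP 393-style string would need (1, 2 or 4)."""
    highest = max(map(ord, text), default=0)
    if highest < 0x100:
        return 1
    if highest < 0x10000:
        return 2
    return 4


def one_byte_form(text: str) -> bytes:
    """Low-order byte of each code point; only valid when all are below 256."""
    if storage_width(text) != 1:
        raise ValueError("string contains code points >= 256")
    return bytes(ord(ch) for ch in text)
```

For a string such as 'noël' the one-byte form happens to equal its latin-1 encoding, which is the coincidence mentioned above, while 'nœud' needs two bytes per character because œ is U+0153.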
[toc] | [prev] | [next] | [standalone]
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2012-09-02 16:38 -0400 |
| Message-ID | <mailman.114.1346618335.27098.python-list@python.org> |
| In reply to | #28292 |
On 9/2/2012 3:45 PM, Michael Torrie wrote:
> In the worst case, Python's strings are as slow as Go because Python
> does the exact same thing as Go, but chooses between three encodings
> instead of just one. Best case scenario, Python's strings could be much
> faster than Go's because indexing through 2 of the 3 encodings is O(1)

In CPython 3.3, indexing of str text string objects is always O(1), and
it always indexes and counts code points rather than code units. It was
the latter for narrow builds in 3.2 and before. As a result, single
character (code point) strings had a length of 2 rather than 1 for
extended plane characters. 3.3 corrects this.

--
Terry Jan Reedy
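The code point versus code unit distinction can be checked directly. This is a small sketch assuming CPython 3.3 or later; `utf16_code_units` is a hypothetical helper that computes what a narrow build would have counted.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to store `text` as UTF-16."""
    return len(text.encode('utf-16-le')) // 2


astral = '\U0001F40D'  # SNAKE, a supplementary-plane character

# On CPython 3.3+, len() and indexing work on code points.
code_points = len(astral)

# A narrow build (3.2 and earlier) reported the code-unit count instead,
# because the character is stored as a surrogate pair in UTF-16.
code_units = utf16_code_units(astral)
```

Here `code_points` is 1 and `code_units` is 2, which is exactly the length discrepancy that narrow builds used to expose.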
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-09-03 01:42 +0000 |
| Message-ID | <50440af0$0$29967$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #28292 |
On Sun, 02 Sep 2012 11:58:08 -0700, wxjmfauth wrote:
> - Unfortunately, I got opposite and even much worst results on my win
> box, considering
> - libfrancais is one of my module and it does a little bit more than the
> std sorting tools.

How do we know that the problem isn't in your module?

> My rationale: very simple.
>
> 1) I never heard about something better than sticking with one of the
> Unicode coding scheme. (genreral theory)

Your ignorance is not a good reason for abandoning a powerful software
technique.

> 2) I am not at all convinced by the "new" Py 3.3 algorithm. I'm not the
> only one guy, who noticed problems.

That's nice. Nobody has yet displayed genuine performance problems, only
artificial and platform-dependent slowdowns that are insignificant in
practice. If you can demonstrate genuine problems, people will be
interested in fixing them.

Let me be frank: nobody gives a damn if, for some rare circumstances,
some_string.replace(another_string) takes 0.3μs instead of 0.1μs.
Overall, considering multiple platforms and dozens of different string
operations, PEP 393 is a big win:

- many operations are faster
- a few operations are a LOT faster
- but a very few operations are sometimes slower
- many strings will use less memory
- sometimes a LOT less memory
- no more distinction between wide and narrow builds
- characters in the supplementary planes are now, for the first time in
  Python, treated correctly by default

That's six wins versus one loss.

> Arguing, "it is fast enough", is not a correct answer.

It is *exactly* the correct answer. Nobody is going to revert this just
because your script now runs in 5.7ms instead of 5.2ms. Who cares?

If you are *seriously* interested in debugging why string code is slower
for you, you can start by running the full suite of Python string
benchmarks: see the stringbench benchmark in the Tools directory of
source installations, or see here:

http://hg.python.org/cpython/file/8ff2f4634ed8/Tools/stringbench

--
Steven
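The memory claims in the list above can be explored with `sys.getsizeof`. The sketch below assumes CPython 3.3 or later (other implementations may report different numbers); the sample strings and the `per_char_cost` helper are illustrative choices, not taken from the post.

```python
import sys

# Equal-length strings whose widest character falls in a different range.
SAMPLES = {
    'ascii': 'a' * 1000,
    'latin-1': '\xe9' * 1000,        # é
    'bmp': '\u0153' * 1000,          # œ
    'astral': '\U0001F40D' * 1000,   # supplementary plane
}


def per_char_cost(text: str) -> float:
    """Approximate bytes per character, cancelling out the fixed object header."""
    return (sys.getsizeof(text * 2) - sys.getsizeof(text)) / len(text)


sizes = {name: sys.getsizeof(text) for name, text in SAMPLES.items()}
costs = {name: per_char_cost(text) for name, text in SAMPLES.items()}
```

Doubling the string and subtracting removes the header overhead, so `costs` isolates the per-character storage, which is where the flexible representation saves memory for text that stays within the lower ranges.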
[toc] | [prev] | [next] | [standalone]
| From | Serhiy Storchaka <storchaka@gmail.com> |
|---|---|
| Date | 2012-09-03 18:26 +0300 |
| Message-ID | <mailman.147.1346686000.27098.python-list@python.org> |
| In reply to | #28332 |
On 03.09.12 04:42, Steven D'Aprano wrote:
> If you are *seriously* interested in debugging why string code is slower
> for you, you can start by running the full suite of Python string
> benchmarks: see the stringbench benchmark in the Tools directory of
> source installations, or see here:
>
> http://hg.python.org/cpython/file/8ff2f4634ed8/Tools/stringbench

http://hg.python.org/cpython/file/default/Tools/stringbench

However, stringbench is not a good tool to measure the effectiveness of
the new string representation, because it focuses mainly on ASCII strings
and comparing strings with bytes.
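One way to address that gap would be a benchmark that runs the same operation over one input per storage class. The following is only a sketch of the idea, not part of stringbench; `INPUTS`, `bench` and `replace_last` are hypothetical names, and the sample characters are arbitrary.

```python
import timeit

# One sample per storage class; the characters chosen are arbitrary examples.
INPUTS = {
    'ascii': 'abc' * 1000,
    'latin-1': 'ab\xe9' * 1000,
    'bmp': 'ab\u0153' * 1000,
    'astral': 'ab\U0001F40D' * 1000,
}


def bench(operation, number=200, repeat=3):
    """Best-of-`repeat` timing of `operation(text)` for each input class."""
    results = {}
    for name, text in INPUTS.items():
        timer = timeit.Timer(lambda text=text: operation(text))
        results[name] = min(timer.repeat(repeat=repeat, number=number))
    return results


def replace_last(text):
    """Example operation: replace the third character of every triple with 'x'."""
    return text.replace(text[2], 'x')
```

Running `bench(replace_last)` under two interpreter versions gives four timings each, so a slowdown confined to one storage class shows up instead of being hidden by ASCII-only inputs.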
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-09-04 00:53 +0000 |
| Message-ID | <504550ff$0$29978$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #28359 |
On Mon, 03 Sep 2012 18:26:02 +0300, Serhiy Storchaka wrote:
> On 03.09.12 04:42, Steven D'Aprano wrote:
>> If you are *seriously* interested in debugging why string code is
>> slower for you, you can start by running the full suite of Python
>> string benchmarks: see the stringbench benchmark in the Tools directory
>> of source installations, or see here:
>>
>> http://hg.python.org/cpython/file/8ff2f4634ed8/Tools/stringbench
>
> http://hg.python.org/cpython/file/default/Tools/stringbench
>
> However, stringbench is not good tool to measure the effectiveness of
> new string representation, because it focuses mainly on ASCII strings
> and comparing strings with bytes.

But it is a good place to start, so you can develop unicode benchmarks.

--
Steven
[toc] | [prev] | [next] | [standalone]
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2012-09-02 11:52 +0200 |
| Message-ID | <mailman.74.1346579541.27098.python-list@python.org> |
| In reply to | #28245 |
Ian Kelly wrote:
> Rewriting the example to use locale.strcoll instead:
> >>> sorted(li, key=functools.cmp_to_key(locale.strcoll))

There is also locale.strxfrm() which you can use directly:

sorted(li, key=locale.strxfrm)
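A short sketch of why the two spellings are interchangeable for sorting, and how they differ in the work done: `strxfrm` is called once per item, while the `strcoll` comparator is called once per comparison. The `count_calls` wrapper is a hypothetical helper added for illustration, and the word list is ASCII-only so the result does not depend on which locale is active.

```python
import functools
import locale


def count_calls(func):
    """Wrap `func` so that wrapper.calls records how often it ran."""
    def wrapper(*args):
        wrapper.calls += 1
        return func(*args)
    wrapper.calls = 0
    return wrapper


words = ['noir', 'noduleux', 'nodule', 'noise', 'node']

coll = count_calls(locale.strcoll)
xfrm = count_calls(locale.strxfrm)

by_strcoll = sorted(words, key=functools.cmp_to_key(coll))
by_strxfrm = sorted(words, key=xfrm)
```

After this runs, `by_strcoll` and `by_strxfrm` hold the same order, `xfrm.calls` equals the number of words, and `coll.calls` grows with the number of comparisons the sort needed.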
[toc] | [prev] | [next] | [standalone]
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2012-09-02 11:36 +0100 |
| Message-ID | <mailman.76.1346582082.27098.python-list@python.org> |
| In reply to | #28245 |
I've found the white paper which gives the technical basis for the claims
made by jmf so thought I'd better share in order to explain his rationale.

http://www.montypython.net/scripts/right-think.php

--
Cheers.

Mark Lawrence.
[toc] | [prev] | [next] | [standalone]
| From | Serhiy Storchaka <storchaka@gmail.com> |
|---|---|
| Date | 2012-09-02 15:00 +0300 |
| Message-ID | <mailman.83.1346587277.27098.python-list@python.org> |
| In reply to | #28245 |
On 02.09.12 12:52, Peter Otten wrote:
> Ian Kelly wrote:
>
>> Rewriting the example to use locale.strcoll instead:
>
>> >>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
>
> There is also locale.strxfrm() which you can use directly:
>
> sorted(li, key=locale.strxfrm)

Hmm, and with locale.strxfrm Python 3.3 is 20% slower than 3.2.
[toc] | [prev] | [next] | [standalone]
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-09-02 22:39 -0700 |
| Message-ID | <b7514131-3162-4c6f-909c-52df5d666992@googlegroups.com> |
| In reply to | #28267 |
On Sunday, 2 September 2012 14:01:18 UTC+2, Serhiy Storchaka wrote:
> On 02.09.12 12:52, Peter Otten wrote:
> > Ian Kelly wrote:
> >
> >> Rewriting the example to use locale.strcoll instead:
> >
> >> >>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
> >
> > There is also locale.strxfrm() which you can use directly:
> >
> > sorted(li, key=locale.strxfrm)
>
> Hmm, and with locale.strxfrm Python 3.3 20% slower than 3.2.

With a memory gain = 0 since my text contains non-latin-1 characters!

jmf
[toc] | [prev] | [next] | [standalone]
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2012-09-03 07:11 +0100 |
| Message-ID | <mailman.127.1346652593.27098.python-list@python.org> |
| In reply to | #28337 |
On 03/09/2012 06:39, wxjmfauth@gmail.com wrote:
> On Sunday, 2 September 2012 14:01:18 UTC+2, Serhiy Storchaka wrote:
>> On 02.09.12 12:52, Peter Otten wrote:
>>> Ian Kelly wrote:
>>>
>>>> Rewriting the example to use locale.strcoll instead:
>>>
>>>> >>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
>>>
>>> There is also locale.strxfrm() which you can use directly:
>>>
>>> sorted(li, key=locale.strxfrm)
>>
>> Hmm, and with locale.strxfrm Python 3.3 20% slower than 3.2.
>
> With a memory gain = 0 since my text contains non-latin-1 characters!
>
> jmf

This is getting really funny. Do you make a living writing comedy for big
film or TV studios? Your response to Steven D'Aprano's "That's six wins
versus one loss." should be hilarious. Or do you not respond to fact
based posts?

--
Cheers.

Mark Lawrence.
[toc] | [prev] | [next] | [standalone]
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2012-09-03 08:15 +0200 |
| Message-ID | <mailman.128.1346652940.27098.python-list@python.org> |
| In reply to | #28337 |
wxjmfauth@gmail.com wrote:
> On Sunday, 2 September 2012 14:01:18 UTC+2, Serhiy Storchaka wrote:
>> Hmm, and with locale.strxfrm Python 3.3 20% slower than 3.2.
>
> With a memory gain = 0 since my text contains non-latin-1 characters!
I can't confirm this. At least users of wide builds will see a decrease in
memory use:
$ cat strxfrm_getsize.py
import locale
import sys

print("maxunicode:", sys.maxunicode)
locale.setlocale(locale.LC_ALL, "fr_FR.UTF-8")
words = [
    'noël', 'noir', 'nœud', 'noduleux',
    'noétique', 'noèse', 'noirâtre']
print("total size of original strings:",
      sum(sys.getsizeof(s) for s in words))
print(
    "total size of transformed strings:",
    sum(sys.getsizeof(locale.strxfrm(s)) for s in words))
$ python3.2 strxfrm_getsize.py
maxunicode: 1114111
total size of original strings: 584
total size of transformed strings: 980
$ python3.3 strxfrm_getsize.py
maxunicode: 1114111
total size of original strings: 509
total size of transformed strings: 483
The situation is more complex than you suppose -- you need less dogma and
more experiments ;)
[toc] | [prev] | [next] | [standalone]
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2012-09-03 04:38 -0400 |
| Message-ID | <mailman.134.1346661523.27098.python-list@python.org> |
| In reply to | #28337 |
On 9/3/2012 2:15 AM, Peter Otten wrote:
> At least users of wide builds will see a decrease in memory use:

Everyone saves because everyone uses large parts of the stdlib. When 3.3
starts up in a Windows console, there are 56 modules in sys.modules.
With Idle, there are over 130. All the identifiers, all the global,
local, and attribute names are present as ascii-only strings. Now
multiply that by some reasonable average, keeping in mind that
__builtins__ alone has 148 names.

Former narrow build users gain less space but also gain the elimination
of buggy behavior.

--
Terry Jan Reedy
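The counts mentioned here are easy to inspect on any interpreter. This is a minimal sketch assuming Python 3.7 or later (for `str.isascii`); `ascii_name_stats` is a hypothetical helper, and the exact totals will differ from the 2012 figures depending on version and platform.

```python
import builtins
import sys


def ascii_name_stats(namespace_names):
    """Return (total, ascii_only) counts for an iterable of identifier strings."""
    names = list(namespace_names)
    ascii_only = sum(1 for name in names if name.isascii())
    return len(names), ascii_only


# Names defined by the builtins module.
builtin_total, builtin_ascii = ascii_name_stats(dir(builtins))

# Names of the modules already imported at this point.
module_total, module_ascii = ascii_name_stats(sys.modules)
```

Every ASCII-only name found this way is a string that the flexible representation stores at one byte per character.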
[toc] | [prev] | [next] | [standalone]
| From | Serhiy Storchaka <storchaka@gmail.com> |
|---|---|
| Date | 2012-09-03 18:56 +0300 |
| Message-ID | <mailman.150.1346687830.27098.python-list@python.org> |
| In reply to | #28337 |
On 03.09.12 09:15, Peter Otten wrote:
> wxjmfauth@gmail.com wrote:
>> On Sunday, 2 September 2012 14:01:18 UTC+2, Serhiy Storchaka wrote:
>>> Hmm, and with locale.strxfrm Python 3.3 20% slower than 3.2.
>>
>> With a memory gain = 0 since my text contains non-latin-1 characters!
>
> I can't confirm this. At least users of wide builds will see a decrease in
> memory use:
And only users of wide builds will see a 20% decrease in speed for this data (with longer strings Python 3.3 will outstrip Python 3.2). This happens because of the inevitable transformation UCS2 -> wchar_t and wchar_t -> UCS2 on platforms with a 4-byte wchar_t. On Windows there should be no slowdown.
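
Whether the conversion described above applies depends on the width of the C `wchar_t` type on the platform. A small sketch for checking it, assuming the `ctypes` module is available in the interpreter; `ctypes.c_wchar` corresponds to C `wchar_t`, which is typically 4 bytes on Linux and 2 bytes on Windows.

```python
import ctypes
import sys

# Width in bytes of the platform's C wchar_t.
wchar_size = ctypes.sizeof(ctypes.c_wchar)

print("sizeof(wchar_t):", wchar_size)
print("sys.maxunicode:", hex(sys.maxunicode))
```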
[toc] | [prev] | [next] | [standalone]
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-09-02 22:39 -0700 |
| Message-ID | <mailman.126.1346650787.27098.python-list@python.org> |
| In reply to | #28267 |
On Sunday, 2 September 2012 14:01:18 UTC+2, Serhiy Storchaka wrote:
> On 02.09.12 12:52, Peter Otten wrote:
> > Ian Kelly wrote:
> >
> >> Rewriting the example to use locale.strcoll instead:
> >
> >>>>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
> >
> > There is also locale.strxfrm() which you can use directly:
> >
> > sorted(li, key=locale.strxfrm)
>
> Hmm, and with locale.strxfrm Python 3.3 20% slower than 3.2.
With a memory gain = 0 since my text contains non-latin-1 characters!
jmf
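
The two sorting calls quoted above should give the same order, since `locale.strxfrm` produces keys that compare the way `locale.strcoll` compares the original strings. A minimal sketch, assuming the `fr_FR.UTF-8` locale may or may not be installed; the list `li` is an illustrative sample, not the data from the thread, and is kept ASCII-only so the result does not depend on the platform's collation tables.

```python
import functools
import locale

try:
    locale.setlocale(locale.LC_COLLATE, "fr_FR.UTF-8")
except locale.Error:
    pass  # locale not installed; keep whatever locale is current

li = ['noir', 'noduleux', 'nord', 'nocturne']  # illustrative sample

# Pairwise comparison: strcoll is called for each comparison the sort makes.
by_strcoll = sorted(li, key=functools.cmp_to_key(locale.strcoll))

# Key transformation: strxfrm is called once per item.
by_strxfrm = sorted(li, key=locale.strxfrm)

print(by_strcoll == by_strxfrm)
```

The `strxfrm` form does one transformation per element instead of one locale-aware comparison per pair, which is why it is usually preferred as a sort key.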
[toc] | [prev] | [next] | [standalone]
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2012-09-02 13:23 +0100 |
| Message-ID | <mailman.84.1346588596.27098.python-list@python.org> |
| In reply to | #28245 |
On 02/09/2012 13:00, Serhiy Storchaka wrote:
> On 02.09.12 12:52, Peter Otten wrote:
>> Ian Kelly wrote:
>>
>>> Rewriting the example to use locale.strcoll instead:
>>
>>>>>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
>>
>> There is also locale.strxfrm() which you can use directly:
>>
>> sorted(li, key=locale.strxfrm)
>
> Hmm, and with locale.strxfrm Python 3.3 20% slower than 3.2.
>
That's it then I'm giving up with Python. In future I'll be writing everything in machine code to ensure that I get the fastest possible run times.
--
Cheers.
Mark Lawrence.
[toc] | [prev] | [next] | [standalone]
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2012-09-02 08:35 -0400 |
| Message-ID | <roy-FC61B4.08351302092012@news.panix.com> |
| In reply to | #28268 |
In article <mailman.84.1346588596.27098.python-list@python.org>, Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:
> On 02/09/2012 13:00, Serhiy Storchaka wrote:
> > On 02.09.12 12:52, Peter Otten wrote:
> >> Ian Kelly wrote:
> >>
> >>> Rewriting the example to use locale.strcoll instead:
> >>
> >>>>>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
> >>
> >> There is also locale.strxfrm() which you can use directly:
> >>
> >> sorted(li, key=locale.strxfrm)
> >
> > Hmm, and with locale.strxfrm Python 3.3 20% slower than 3.2.
> >
>
> That's it then I'm giving up with Python. In future I'll be writing
> everything in machine code to ensure that I get the fastest possible run
> times.
Feh. You software guys are always too willing to sacrifice performance for convenience. If you really want speed, grab yourself a handful of chips and a soldering iron.
[toc] | [prev] | [next] | [standalone]