Groups > comp.lang.python > #50503 > unrolled thread
| Started by | Devyn Collier Johnson <devyncjohnson@gmail.com> |
|---|---|
| First post | 2013-07-11 19:44 -0400 |
| Last post | 2013-07-18 13:17 -0700 |
| Articles | 20 on this page of 136 — 25 participants |
RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-11 19:44 -0400
Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-12 02:23 -0700
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-12 19:27 +1000
Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-12 10:39 +0100
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-12 19:40 +1000
Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-12 06:45 -0400
Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-12 16:59 +0100
Re: RE Module Performance Peter Otten <__peter__@web.de> - 2013-07-12 18:15 +0200
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-13 02:21 +1000
Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-12 13:58 -0400
Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-13 05:37 +0000
Re: RE Module Performance 88888 Dihedral <dihedral88888@gmail.com> - 2013-07-14 11:17 -0700
Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-15 06:06 -0400
Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-15 12:36 +0000
Dihedral Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-15 08:52 -0400
Re: Dihedral Joel Goldstick <joel.goldstick@gmail.com> - 2013-07-15 09:03 -0400
Re: Dihedral Wayne Werner <wayne@waynewerner.com> - 2013-07-15 17:43 -0500
Re: Dihedral Fábio Santos <fabiosantosart@gmail.com> - 2013-07-15 23:54 +0100
Re: Dihedral Chris Angelico <rosuav@gmail.com> - 2013-07-16 08:59 +1000
Re: Dihedral Tim Delaney <timothy.c.delaney@gmail.com> - 2013-07-16 16:06 +1000
Re: Dihedral Stefan Behnel <stefan_ml@behnel.de> - 2013-07-24 20:08 +0200
Re: Dihedral Chris Angelico <rosuav@gmail.com> - 2013-07-25 04:23 +1000
Re: Dihedral Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-07-24 20:15 -0400
Re: RE Module Performance Tim Delaney <timothy.c.delaney@gmail.com> - 2013-07-13 08:16 +1000
Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-12 17:13 -0600
Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-24 06:40 -0700
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-24 23:48 +1000
Re: RE Module Performance David Hutto <dwightdhutto@gmail.com> - 2013-07-24 10:17 -0400
Re: RE Module Performance David Hutto <dwightdhutto@gmail.com> - 2013-07-24 10:19 -0400
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 00:34 +1000
Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 07:02 +0000
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 17:39 +1000
Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-24 08:47 -0600
Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-25 02:27 -0700
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 20:14 +1000
Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-25 12:07 -0700
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-26 05:18 +1000
RE: RE Module Performance "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2013-07-25 19:30 +0000
Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-25 21:06 -0600
Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-24 09:00 -0600
Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 05:56 +0000
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 00:56 +1000
Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-24 13:52 -0400
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 04:15 +1000
Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 07:15 +0000
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 17:58 +1000
Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 09:22 +0000
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 20:07 +1000
Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-24 18:09 -0400
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 08:19 +1000
Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-24 16:59 -0600
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 09:24 +1000
Re: RE Module Performance Serhiy Storchaka <storchaka@gmail.com> - 2013-07-25 08:49 +0300
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 15:58 +1000
Re: RE Module Performance Jeremy Sanders <jeremy@jeremysanders.net> - 2013-07-25 14:36 +0100
Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 15:26 +0000
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-26 01:36 +1000
Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 17:18 +0000
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-26 03:27 +1000
Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-25 15:45 -0500
Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-26 02:48 +0000
Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-25 21:20 -0600
Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-26 06:36 -0700
Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-26 08:46 -0700
Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-27 06:28 +0000
Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-27 03:37 +0000
Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-26 22:12 -0600
Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-27 05:04 +0000
Re: RE Module Performance Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-07-27 12:13 -0400
Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-26 06:19 -0700
Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-25 21:09 -0600
Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-26 06:21 -0700
Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-26 20:05 -0600
Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-27 11:21 -0700
Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-27 21:53 -0600
Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-28 11:13 -0700
Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-28 20:04 +0100
Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-28 12:30 -0700
Re: RE Module Performance Lele Gaifax <lele@metapensiero.it> - 2013-07-28 22:45 +0200
Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-28 22:01 +0200
Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-30 07:01 -0700
Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-30 16:38 +0200
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-30 15:45 +0100
Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-30 17:13 +0100
Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-30 18:39 +0200
Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-30 18:14 +0100
Re: RE Module Performance Neil Hodgson <nhodgson@iinet.net.au> - 2013-07-31 13:09 +1000
Re: RE Module Performance Tim Delaney <timothy.c.delaney@gmail.com> - 2013-07-31 03:27 +1000
Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-30 18:40 +0100
Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-30 20:19 +0200
Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-30 12:09 -0700
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-30 21:04 +0100
Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-30 21:54 -0600
Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-31 05:45 +0000
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-31 08:17 +0100
Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-31 13:15 -0700
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-31 21:41 +0100
Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-31 10:11 +0200
Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-31 01:32 -0700
Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-31 10:59 +0200
Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-31 08:44 -0600
Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-30 17:05 -0400
Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-30 21:30 -0600
Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-31 09:23 +0200
Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-31 08:27 -0600
Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-28 10:45 +0200
FSR and unicode compliance - was Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-28 09:52 -0600
Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-28 12:23 -0700
Re: FSR and unicode compliance - was Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-28 20:44 +0100
Re: FSR and unicode compliance - was Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-28 21:55 +0200
Re: FSR and unicode compliance - was Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-28 20:52 +0000
Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 04:43 -0700
Re: FSR and unicode compliance - was Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-29 12:57 +0100
Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 05:56 -0700
Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 07:20 -0700
Re: FSR and unicode compliance - was Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-29 15:49 +0100
Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 09:31 -0700
Re: FSR and unicode compliance - was Re: RE Module Performance Heiko Wundram <modelnine@modelnine.org> - 2013-07-29 14:06 +0200
Re: FSR and unicode compliance - was Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-29 08:43 -0400
Re: FSR and unicode compliance - was Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-28 18:03 +0100
Re: FSR and unicode compliance - was Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-28 13:36 -0400
Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 06:36 -0700
Re: FSR and unicode compliance - was Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-28 19:03 +0100
Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-28 19:19 +0100
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-28 19:29 +0100
Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-28 15:06 -0400
Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-28 23:14 +0100
Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-28 20:51 +0200
Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-29 00:07 +0100
Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-26 22:38 +0200
Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-25 09:44 -0400
Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-25 15:53 -0500
Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-13 00:16 +0100
Re: RE Module Performance Tim Delaney <timothy.c.delaney@gmail.com> - 2013-07-14 05:34 +1000
Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-16 06:30 -0400
Re: RE Module Performance 88888 Dihedral <dihedral88888@gmail.com> - 2013-07-18 13:17 -0700
| From | Michael Torrie <torriem@gmail.com> |
|---|---|
| Date | 2013-07-31 08:44 -0600 |
| Message-ID | <mailman.31.1375281891.1251.python-list@python.org> |
| In reply to | #51632 |
On 07/31/2013 02:32 AM, wxjmfauth@gmail.com wrote:
> Unicode/utf*

Why do you keep using the terms "utf" and "Unicode" interchangeably?
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2013-07-30 17:05 -0400 |
| Message-ID | <mailman.5341.1375218369.3114.python-list@python.org> |
| In reply to | #51558 |
On 7/30/2013 1:40 PM, Joshua Landau wrote:
> Additionally, who says a language couldn't use, say, B-Trees for all of
> its list-like types, including strings?

Tk apparently uses a B-tree in its text widget.

--
Terry Jan Reedy
| From | Michael Torrie <torriem@gmail.com> |
|---|---|
| Date | 2013-07-30 21:30 -0600 |
| Message-ID | <mailman.5348.1375241427.3114.python-list@python.org> |
| In reply to | #51558 |
On 07/30/2013 12:19 PM, Antoon Pardon wrote:
> So? Why are you making this a point of discussion? I was not aware that
> the pros and cons of various editor buffer implementations were relevant
> to the point I was trying to make.

I for one found it very interesting. In fact, this thread caused me to
wonder how one actually does create an efficient editor. Off the
original topic, true, but still very interesting.
| From | Antoon Pardon <antoon.pardon@rece.vub.ac.be> |
|---|---|
| Date | 2013-07-31 09:23 +0200 |
| Message-ID | <mailman.5356.1375255416.3114.python-list@python.org> |
| In reply to | #51558 |
On 31-07-13 05:30, Michael Torrie wrote:
> On 07/30/2013 12:19 PM, Antoon Pardon wrote:
>> So? Why are you making this a point of discussion? I was not aware that
>> the pros and cons of various editor buffer implementations were relevant
>> to the point I was trying to make.
>
> I for one found it very interesting. In fact, this thread caused me to
> wonder how one actually does create an efficient editor. Off the
> original topic, true, but still very interesting.
>

Yes, it can be interesting. But I really think that if that is what you
want to discuss, it deserves its own subject thread.

--
Antoon Pardon
| From | Michael Torrie <torriem@gmail.com> |
|---|---|
| Date | 2013-07-31 08:27 -0600 |
| Message-ID | <mailman.29.1375280875.1251.python-list@python.org> |
| In reply to | #51558 |
On 07/31/2013 01:23 AM, Antoon Pardon wrote:
> On 31-07-13 05:30, Michael Torrie wrote:
>> On 07/30/2013 12:19 PM, Antoon Pardon wrote:
>>> So? Why are you making this a point of discussion? I was not aware that
>>> the pros and cons of various editor buffer implementations were relevant
>>> to the point I was trying to make.
>>
>> I for one found it very interesting. In fact, this thread caused me to
>> wonder how one actually does create an efficient editor. Off the
>> original topic, true, but still very interesting.
>>
>
> Yes, it can be interesting. But I really think that if that is what you
> want to discuss, it deserves its own subject thread.

Subject lines can and should be changed to reflect the ebbs and flows of
the discussion. In fact, this thread's subject should have been changed a
long time ago, since the original topic was RE module performance!
| From | Antoon Pardon <antoon.pardon@rece.vub.ac.be> |
|---|---|
| Date | 2013-07-28 10:45 +0200 |
| Message-ID | <mailman.5190.1375001194.3114.python-list@python.org> |
| In reply to | #51340 |
On 27-07-13 20:21, wxjmfauth@gmail.com wrote:
> Quickly. sys.getsizeof() in the light of what I explained.
>
> 1) As this FSR works with multiple encodings, it has to keep
> track of the encoding. It puts it in the overhead of the str
> class (overhead = real overhead + encoding). In such
> an absurd way that a
>
>>>> sys.getsizeof('€')
> 40
>
> needs 14 bytes more than a
>
>>>> sys.getsizeof('z')
> 26
>
> You may vary the length of the str. The problem is
> still here. Not bad for a coding scheme.
>
> 2) Take a look at this. Get rid of the overhead.
>
>>>> sys.getsizeof('b'*1000000 + 'c')
> 1000026
>>>> sys.getsizeof('b'*1000000 + '€')
> 2000040
>
> What does it mean? It means that Python has to
> reencode a str every time it is necessary because
> it works with multiple codings.
So? The same effect can be seen with other datatypes.
>>> nr = 32767
>>> sys.getsizeof(nr)
14
>>> nr += 1
>>> sys.getsizeof(nr)
16
>
> This FSR is not even a copy of the utf-8.
>>>> len(('b'*1000000 + '€').encode('utf-8'))
> 1000003
Why should it be? Why should a unicode string be a copy
of its utf-8 encoding? That makes as much sense as expecting
that a number would be a copy of its string representation.
>
> utf-8 or any (utf) never need and never spend their time
> in reencoding.
So? That Python sometimes needs to do some kind of background
processing is not a problem. Whether it is garbage collection,
allocating more memory, shuffling data blocks around or reencoding a
string doesn't matter. If you've got a real-world example where
one of those things noticeably slows your program down or makes the
program behave incorrectly, then you have something that is worthy of
attention.
Until then you are merely harboring a pet peeve.
--
Antoon Pardon
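
The size figures quoted in this post can be reproduced with a short sketch. It assumes CPython 3.3 or later (the flexible string representation of PEP 393); `bytes_per_char` is a hypothetical helper name, not part of any library. Because the fixed per-object overhead differs between builds, the sketch compares two lengths of the same string instead of relying on absolute sizes.

```python
import sys


def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Estimate the storage used per character for a string made only of `ch`.

    Subtracting the sizes of two strings of different length cancels the
    fixed per-object overhead, leaving only the per-character cost.
    This relies on CPython implementation details (assumption).
    """
    small = sys.getsizeof(ch * n)
    large = sys.getsizeof(ch * (2 * n))
    return (large - small) // n


for label, ch in [("ASCII 'b'", "b"),
                  ("BMP '€'", "€"),
                  ("astral U+10010", "\U00010010")]:
    print(label, "->", bytes_per_char(ch), "byte(s) per character")
```

The width is chosen once, when the string object is created, from the widest code point it contains; that is why appending a single '€' to a long ASCII string yields a new object that is roughly twice as large.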
| From | Michael Torrie <torriem@gmail.com> |
|---|---|
| Date | 2013-07-28 09:52 -0600 |
| Subject | FSR and unicode compliance - was Re: RE Module Performance |
| Message-ID | <mailman.5191.1375026785.3114.python-list@python.org> |
| In reply to | #51340 |
On 07/27/2013 12:21 PM, wxjmfauth@gmail.com wrote:
> Good point. FSR, nice tool for those who wish to teach
> Unicode. It is not every day, one has such an opportunity.

I had a long e-mail composed, but decided to chop it down, and it is
still too long. So I ditched a lot of the context, which jmf also seems
to do. Apologies.

1. FSR *is* UTF-32 so it is as unicode compliant as UTF-32, since UTF-32
is an official encoding. FSR only differs from UTF-32 in that the
padding zeros are stripped off such that it is stored in the most
compact form that can handle all the characters in string, which is
always known at string creation time. Now you can argue many things,
but to say FSR is not unicode compliant is quite a stretch! What
unicode entities or characters cannot be stored in strings using FSR?
What sequences of bytes in FSR result in invalid Unicode entities?

2. strings in Python *never change*. They are immutable. The +
operator always copies strings character by character into a new string
object, even if Python had used UTF-8 internally. If you're doing a lot
of string concatenations, perhaps you're using the wrong data type. A
byte buffer might be better for you, where you can stuff utf-8 sequences
into it to your heart's content.

3. UTF-8 and UTF-16 encodings, being variable width encodings, mean that
slicing a string would be very very slow, and that's unacceptable for
the use cases of python strings. I'm assuming you understand big O
notation, as you talk of experience in many languages over the years.
FSR and UTF-32 both are O(1) for slicing and lookups. UTF-8, 16 and any
variable-width encoding are always O(n). A lot slower!

4. Unicode is, well, unicode. You seem to hop all over the place from
talking about code points to bytes to bits, using them all
interchangeably. And now you seem to be claiming that a particular byte
encoding standard is by definition unicode (UTF-8). Or at least that's
how it sounds. You also claim that FSR is not compliant with unicode
standards, which appears to me to be completely false.

Is my understanding of these things wrong?
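
Point 3 can be illustrated with a small sketch. It is an illustration under stated assumptions, not a description of any interpreter's internals: `utf8_index` is a hypothetical helper that finds the n-th code point in UTF-8 bytes, and it has to walk the data from the start because the characters have different byte lengths. With a fixed-width layout the same lookup is a single offset computation.

```python
def utf8_index(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-8 encoded `data` by linear scan."""
    if n < 0:
        raise IndexError("negative indexes are not supported in this sketch")
    count = -1      # index of the code point whose lead byte was seen last
    start = None    # byte offset where the wanted code point begins
    for pos, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; anything else starts
        # a new code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == n:
                start = pos
            elif count == n + 1:
                return data[start:pos].decode("utf-8")
    if start is not None:
        # The wanted code point was the last one in the data.
        return data[start:].decode("utf-8")
    raise IndexError("code point index out of range")


text = "b€\U00010010z"
encoded = text.encode("utf-8")
for i in range(len(text)):
    # str indexing goes straight to position i; the scan above
    # must first skip every earlier character.
    print(i, repr(text[i]), repr(utf8_index(encoded, i)))
```

The cost of the scan grows with the index, which is the O(n) behaviour the post refers to for lookups in variable-width encodings.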
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2013-07-28 12:23 -0700 |
| Subject | Re: FSR and unicode compliance - was Re: RE Module Performance |
| Message-ID | <c5eed93b-bfa1-44fe-9a8f-67a7d9380b20@googlegroups.com> |
| In reply to | #51386 |
On Sunday, 28 July 2013 17:52:47 UTC+2, Michael Torrie wrote:
> On 07/27/2013 12:21 PM, wxjmfauth@gmail.com wrote:
> > Good point. FSR, nice tool for those who wish to teach
> > Unicode. It is not every day, one has such an opportunity.
>
> I had a long e-mail composed, but decided to chop it down, but still too
> long. So I ditched a lot of the context, which jmf also seems to do.
> Apologies.
>
> 1. FSR *is* UTF-32 so it is as unicode compliant as UTF-32, since UTF-32
> is an official encoding. FSR only differs from UTF-32 in that the
> padding zeros are stripped off such that it is stored in the most
> compact form that can handle all the characters in string, which is
> always known at string creation time. Now you can argue many things,
> but to say FSR is not unicode compliant is quite a stretch! What
> unicode entities or characters cannot be stored in strings using FSR?
> What sequences of bytes in FSR result in invalid Unicode entities?
>
> 2. strings in Python *never change*. They are immutable. The +
> operator always copies strings character by character into a new string
> object, even if Python had used UTF-8 internally. If you're doing a lot
> of string concatenations, perhaps you're using the wrong data type. A
> byte buffer might be better for you, where you can stuff utf-8 sequences
> into it to your heart's content.
>
> 3. UTF-8 and UTF-16 encodings, being variable width encodings, mean that
> slicing a string would be very very slow, and that's unacceptable for
> the use cases of python strings. I'm assuming you understand big O
> notation, as you talk of experience in many languages over the years.
> FSR and UTF-32 both are O(1) for slicing and lookups. UTF-8, 16 and any
> variable-width encoding are always O(n). A lot slower!
>
> 4. Unicode is, well, unicode. You seem to hop all over the place from
> talking about code points to bytes to bits, using them all
> interchangeably. And now you seem to be claiming that a particular byte
> encoding standard is by definition unicode (UTF-8). Or at least that's
> how it sounds. And also claim FSR is not compliant with unicode
> standards, which appears to me to be completely false.
>
> Is my understanding of these things wrong?
------
Compare these (a BDFL example, where I'm using a non-ASCII char)
Py 3.2 (narrow build)
>>> timeit.timeit("a = 'hundred'; 'x' in a")
0.09897159682121348
>>> timeit.timeit("a = 'hundre€'; 'x' in a")
0.09079501961732461
>>> sys.getsizeof('d')
32
>>> sys.getsizeof('€')
32
>>> sys.getsizeof('dd')
34
>>> sys.getsizeof('d€')
34
Py3.3
>>> timeit.timeit("a = 'hundred'; 'x' in a")
0.12183182740848858
>>> timeit.timeit("a = 'hundre€'; 'x' in a")
0.2365732969632326
>>> sys.getsizeof('d')
26
>>> sys.getsizeof('€')
40
>>> sys.getsizeof('dd')
27
>>> sys.getsizeof('d€')
42
Tell me which one seems to be more "unicode compliant"?
The goal of Unicode is to handle every char "equally".
Now, the problem: memory. Do not forget that an à la "FSR"
mechanism for a non-ascii user is *irrelevant*. As
soon as one uses one single non-ascii char, your ascii feature
is lost. (That is why we have all these dedicated coding
schemes, utfs included).
>>> sys.getsizeof('abc' * 1000 + 'z')
3026
>>> sys.getsizeof('abc' * 1000 + '\U00010010')
12044
A little secret: the larger a repertoire of characters
is, the more bits you need.
Secret #2: you cannot escape from this.
jmf
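
Micro-benchmarks such as the ones in this post are sensitive to what the timed statement includes; here the string assignment is timed together with the `in` test. The following is a hedged sketch of one way to isolate the membership test, using the standard `timeit.repeat` function with a `setup` string and taking the best of several runs. `best_time` is a hypothetical helper name, and no timings are shown because they depend on the machine and the Python version.

```python
import timeit


def best_time(stmt: str, setup: str, number: int = 100000, repeat: int = 5) -> float:
    """Return the fastest of `repeat` runs, each executing `stmt` `number` times."""
    return min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))


# The assignment lives in `setup`, so only the membership test is timed.
ascii_time = best_time("'x' in a", "a = 'hundred'")
euro_time = best_time("'x' in a", "a = 'hundre€'")

print("ASCII string:     %.4f s" % ascii_time)
print("string with '€':  %.4f s" % euro_time)
```

Taking the minimum rather than a single measurement reduces the influence of other activity on the machine, which matters for statements that run in well under a microsecond.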
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2013-07-28 20:44 +0100 |
| Subject | Re: FSR and unicode compliance - was Re: RE Module Performance |
| Message-ID | <mailman.5203.1375040663.3114.python-list@python.org> |
| In reply to | #51401 |
On 28/07/2013 20:23, wxjmfauth@gmail.com wrote:
[snip]
>
> Compare these (a BDFL example, where I'm using a non-ASCII char)
>
> Py 3.2 (narrow build)
>
Why are you using a narrow build of Python 3.2? It doesn't treat all
codepoints equally (those outside the BMP can't be stored in one code
unit) and, therefore, it isn't "Unicode compliant"!
>>>> timeit.timeit("a = 'hundred'; 'x' in a")
> 0.09897159682121348
>>>> timeit.timeit("a = 'hundre€'; 'x' in a")
> 0.09079501961732461
>>>> sys.getsizeof('d')
> 32
>>>> sys.getsizeof('€')
> 32
>>>> sys.getsizeof('dd')
> 34
>>>> sys.getsizeof('d€')
> 34
>
>
> Py3.3
>
>>>> timeit.timeit("a = 'hundred'; 'x' in a")
> 0.12183182740848858
>>>> timeit.timeit("a = 'hundre€'; 'x' in a")
> 0.2365732969632326
>>>> sys.getsizeof('d')
> 26
>>>> sys.getsizeof('€')
> 40
>>>> sys.getsizeof('dd')
> 27
>>>> sys.getsizeof('d€')
> 42
>
> Tell me which one seems to be more "unicode compliant"?
> The goal of Unicode is to handle every char "equally".
>
> Now, the problem: memory. Do not forget that à la "FSR"
> mechanism for a non-ascii user is *irrelevant*. As
> soon as one uses one single non-ascii, your ascii feature
> is lost. (That is why we have all these dedicated coding
> schemes, utfs included).
>
>>>> sys.getsizeof('abc' * 1000 + 'z')
> 3026
>>>> sys.getsizeof('abc' * 1000 + '\U00010010')
> 12044
>
> A little secret: the larger a repertoire of characters
> is, the more bits you need.
> Secret #2: you cannot escape from this.
>
>
> jmf
>
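
The remark about narrow builds can be made concrete with a short sketch. It assumes Python 3.3 or later, where `len()` counts code points; `utf16_code_units` is a hypothetical helper that counts the 16-bit units a narrow build would have used to store the same text.

```python
def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units needed to store `text` as UTF-16."""
    # "utf-16-le" is used so that no byte order mark is added to the output.
    return len(text.encode("utf-16-le")) // 2


for ch in ("d", "€", "\U00010010"):
    print("U+%04X: len=%d, UTF-16 code units=%d"
          % (ord(ch), len(ch), utf16_code_units(ch)))
```

Characters inside the Basic Multilingual Plane fit in one unit, while U+10010 needs a surrogate pair. On a narrow build that pair was visible to the program as two separate items, which is the unequal treatment the post points out.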
| From | Antoon Pardon <antoon.pardon@rece.vub.ac.be> |
|---|---|
| Date | 2013-07-28 21:55 +0200 |
| Subject | Re: FSR and unicode compliance - was Re: RE Module Performance |
| Message-ID | <mailman.5204.1375041352.3114.python-list@python.org> |
| In reply to | #51401 |
On 28-07-13 21:23, wxjmfauth@gmail.com wrote:
> On Sunday, 28 July 2013 17:52:47 UTC+2, Michael Torrie wrote:
>> On 07/27/2013 12:21 PM, wxjmfauth@gmail.com wrote:
>>> Good point. FSR, nice tool for those who wish to teach
>>> Unicode. It is not every day, one has such an opportunity.
>>
>> I had a long e-mail composed, but decided to chop it down, but still too
>> long. So I ditched a lot of the context, which jmf also seems to do.
>> Apologies.
>>
>> 1. FSR *is* UTF-32 so it is as unicode compliant as UTF-32, since UTF-32
>> is an official encoding. FSR only differs from UTF-32 in that the
>> padding zeros are stripped off such that it is stored in the most
>> compact form that can handle all the characters in string, which is
>> always known at string creation time. Now you can argue many things,
>> but to say FSR is not unicode compliant is quite a stretch! What
>> unicode entities or characters cannot be stored in strings using FSR?
>> What sequences of bytes in FSR result in invalid Unicode entities?
>>
>> 2. strings in Python *never change*. They are immutable. The +
>> operator always copies strings character by character into a new string
>> object, even if Python had used UTF-8 internally. If you're doing a lot
>> of string concatenations, perhaps you're using the wrong data type. A
>> byte buffer might be better for you, where you can stuff utf-8 sequences
>> into it to your heart's content.
>>
>> 3. UTF-8 and UTF-16 encodings, being variable width encodings, mean that
>> slicing a string would be very very slow, and that's unacceptable for
>> the use cases of python strings. I'm assuming you understand big O
>> notation, as you talk of experience in many languages over the years.
>> FSR and UTF-32 both are O(1) for slicing and lookups. UTF-8, 16 and any
>> variable-width encoding are always O(n). A lot slower!
>>
>> 4. Unicode is, well, unicode. You seem to hop all over the place from
>> talking about code points to bytes to bits, using them all
>> interchangeably. And now you seem to be claiming that a particular byte
>> encoding standard is by definition unicode (UTF-8). Or at least that's
>> how it sounds. And also claim FSR is not compliant with unicode
>> standards, which appears to me to be completely false.
>>
>> Is my understanding of these things wrong?
>
> ------
>
> Compare these (a BDFL example, where I'm using a non-ASCII char)
>
> Py 3.2 (narrow build)
>
>>>> timeit.timeit("a = 'hundred'; 'x' in a")
> 0.09897159682121348
>>>> timeit.timeit("a = 'hundre€'; 'x' in a")
> 0.09079501961732461
>>>> sys.getsizeof('d')
> 32
>>>> sys.getsizeof('€')
> 32
>>>> sys.getsizeof('dd')
> 34
>>>> sys.getsizeof('d€')
> 34
>
>
> Py3.3
>
>>>> timeit.timeit("a = 'hundred'; 'x' in a")
> 0.12183182740848858
>>>> timeit.timeit("a = 'hundre€'; 'x' in a")
> 0.2365732969632326
>>>> sys.getsizeof('d')
> 26
>>>> sys.getsizeof('€')
> 40
>>>> sys.getsizeof('dd')
> 27
>>>> sys.getsizeof('d€')
> 42
>
> Tell me which one seems to be more "unicode compliant"?
Can't tell; you give no relevant information on which one could decide
this question.
> The goal of Unicode is to handle every char "equally".
Not to this kind of detail, which is looking at irrelevant
implementation details.
> Now, the problem: memory. Do not forget that à la "FSR"
> mechanism for a non-ascii user is *irrelevant*. As
> soon as one uses one single non-ascii, your ascii feature
> is lost. (That is why we have all these dedicated coding
> schemes, utfs included).
So? Why should that trouble me? As far as I understand,
whether I have an ascii string or not is totally irrelevant
to the application programmer. Within the application I
just process strings and let the programming environment
keep track of these details in a transparent way. That only
changes when you start looking at things like getsizeof, which
gives you implementation details that are mostly irrelevant
in deciding whether the behaviour is compliant or not.
>>>> sys.getsizeof('abc' * 1000 + 'z')
> 3026
>>>> sys.getsizeof('abc' * 1000 + '\U00010010')
> 12044
>
> A little secret: the larger a repertoire of characters
> is, the more bits you need.
> Secret #2: you cannot escape from this.
And totally unimportant for deciding compliance.
--
Antoon Pardon
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-07-28 20:52 +0000 |
| Subject | Re: FSR and unicode compliance - was Re: RE Module Performance |
| Message-ID | <51f5847f$0$29971$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #51401 |
On Sun, 28 Jul 2013 12:23:04 -0700, wxjmfauth wrote:

> Do not forget that à la "FSR" mechanism for a non-ascii user is
> *irrelevant*.

You have been told repeatedly, Python's internals are *full* of ASCII-
only strings.

py> dir(list)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__',
'__dir__', '__doc__', '__eq__', '__format__', '__ge__',
'__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__',
'__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__',
'__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__',
'__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy',
'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

There's 45 ASCII-only strings right there, in only one built-in type, out
of dozens. There are dozens, hundreds of ASCII-only strings in Python:
builtin functions and classes, attributes, exceptions, internal
attributes, variable names, and so on.

You already know this, and yet you persist in repeating nonsense.

--
Steven
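
The observation that the interpreter's own names are ASCII-only can be checked with a short sketch. `is_ascii` is a hypothetical helper, written with `ord` so that it also runs on Python versions without `str.isascii`; the counts are not shown because they vary between Python versions.

```python
import builtins


def is_ascii(name: str) -> bool:
    """Return True when every character of `name` is in the ASCII range."""
    return all(ord(c) < 128 for c in name)


list_names = dir(list)
builtin_names = dir(builtins)

print("dir(list):     %d names, %d ASCII-only"
      % (len(list_names), sum(map(is_ascii, list_names))))
print("dir(builtins): %d names, %d ASCII-only"
      % (len(builtin_names), sum(map(is_ascii, builtin_names))))
```

Every one of these names is a `str` object, so a program that handles only non-ASCII user data still keeps many ASCII-only strings in memory.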
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2013-07-29 04:43 -0700 |
| Subject | Re: FSR and unicode compliance - was Re: RE Module Performance |
| Message-ID | <dac88e10-1c4a-411d-8c38-bcbc00c64196@googlegroups.com> |
| In reply to | #51408 |
On Sunday, 28 July 2013 22:52:16 UTC+2, Steven D'Aprano wrote:
> On Sun, 28 Jul 2013 12:23:04 -0700, wxjmfauth wrote:
>
> > Do not forget that à la "FSR" mechanism for a non-ascii user is
> > *irrelevant*.
>
> You have been told repeatedly, Python's internals are *full* of ASCII-only
> strings.
>
> py> dir(list)
> ['__add__', '__class__', '__contains__', '__delattr__', '__delitem__',
> '__dir__', '__doc__', '__eq__', '__format__', '__ge__',
> '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__',
> '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__',
> '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
> '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__',
> '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy',
> 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
>
> There's 45 ASCII-only strings right there, in only one built-in type, out
> of dozens. There are dozens, hundreds of ASCII-only strings in Python:
> builtin functions and classes, attributes, exceptions, internal
> attributes, variable names, and so on.
>
> You already know this, and yet you persist in repeating nonsense.
>
> --
> Steven
3.2
>>> timeit.timeit("r = dir(list)")
22.300465007102908
3.3
>>> timeit.timeit("r = dir(list)")
27.13981129541519
For the record, I am not using your example to contradict
you. I was expecting such a result even before testing.
Now, if you do not understand why, you do not understand.
There is nothing wrong.
jmf
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-07-29 12:57 +0100 |
| Subject | Re: FSR and unicode compliance - was Re: RE Module Performance |
| Message-ID | <mailman.5227.1375099076.3114.python-list@python.org> |
| In reply to | #51434 |
On Mon, Jul 29, 2013 at 12:43 PM, <wxjmfauth@gmail.com> wrote:
> On Sunday, 28 July 2013 22:52:16 UTC+2, Steven D'Aprano wrote:
> 3.2
> >>> timeit.timeit("r = dir(list)")
> 22.300465007102908
>
> 3.3
> >>> timeit.timeit("r = dir(list)")
> 27.13981129541519
3.2:
>>> len(dir(list))
42
3.3:
>>> len(dir(list))
45
Wonder if that might maybe have an impact on the timings.
ChrisA
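One way to keep such comparisons honest is to report the amount of work next to the timing. The sketch below is an illustration only, not something posted in the thread: the constant REPEATS and the per-name normalisation are assumptions, and dividing by the number of names is a crude adjustment rather than a proper model of what dir() costs.

```python
import sys
import timeit

# Fewer repetitions than timeit's default of one million, to keep the run short.
REPEATS = 20000

name_count = len(dir(list))
elapsed = timeit.timeit("r = dir(list)", number=REPEATS)

# Report the interpreter version and the amount of work next to the timing,
# so that runs on different versions can be compared more fairly.
print("Python %d.%d" % sys.version_info[:2])
print("names returned by dir(list):", name_count)
print("total seconds:", elapsed)
print("seconds per call per name:", elapsed / REPEATS / name_count)
```

Running the same script under each interpreter gives figures that at least account for the differing list lengths.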
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2013-07-29 05:56 -0700 |
| Subject | Re: FSR and unicode compliance - was Re: RE Module Performance |
| Message-ID | <b17cef2f-7282-4f34-9e6d-868592e3783a@googlegroups.com> |
| In reply to | #51437 |
On Monday, 29 July 2013 13:57:47 UTC+2, Chris Angelico wrote:
> On Mon, Jul 29, 2013 at 12:43 PM, <wxjmfauth@gmail.com> wrote:
> > On Sunday, 28 July 2013 22:52:16 UTC+2, Steven D'Aprano wrote:
> > 3.2
> > >>> timeit.timeit("r = dir(list)")
> > 22.300465007102908
> >
> > 3.3
> > >>> timeit.timeit("r = dir(list)")
> > 27.13981129541519
>
> 3.2:
> >>> len(dir(list))
> 42
>
> 3.3:
> >>> len(dir(list))
> 45
>
> Wonder if that might maybe have an impact on the timings.
>
> ChrisA
Good point. I stupidly forgot this.
jmf
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2013-07-29 07:20 -0700 |
| Subject | Re: FSR and unicode compliance - was Re: RE Module Performance |
| Message-ID | <8babe5b2-779d-49c3-b7e8-addccbadd660@googlegroups.com> |
| In reply to | #51437 |
On Monday, 29 July 2013 13:57:47 UTC+2, Chris Angelico wrote:
> On Mon, Jul 29, 2013 at 12:43 PM, <wxjmfauth@gmail.com> wrote:
> > On Sunday, 28 July 2013 22:52:16 UTC+2, Steven D'Aprano wrote:
> > 3.2
> > >>> timeit.timeit("r = dir(list)")
> > 22.300465007102908
> >
> > 3.3
> > >>> timeit.timeit("r = dir(list)")
> > 27.13981129541519
>
> 3.2:
> >>> len(dir(list))
> 42
>
> 3.3:
> >>> len(dir(list))
> 45
>
> Wonder if that might maybe have an impact on the timings.
>
> ChrisA
--------
class C:
    a = 'abc'
    b = 'def'
    def aaa(self):
        pass
    def bbb(self):
        pass
    def ccc(self):
        pass
if __name__ == '__main__':
    import timeit
    print(timeit.timeit("r = dir(C)", setup="from __main__ import C"))
>c:\python32\pythonw -u "timitmod.py"
15.258061416225663
>Exit code: 0
>c:\Python33\pythonw -u "timitmod.py"
17.052203122286194
>Exit code: 0
jmf
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-07-29 15:49 +0100 |
| Subject | Re: FSR and unicode compliance - was Re: RE Module Performance |
| Message-ID | <mailman.5236.1375109382.3114.python-list@python.org> |
| In reply to | #51446 |
On Mon, Jul 29, 2013 at 3:20 PM, <wxjmfauth@gmail.com> wrote:
> >c:\python32\pythonw -u "timitmod.py"
> 15.258061416225663
> >Exit code: 0
> >c:\Python33\pythonw -u "timitmod.py"
> 17.052203122286194
> >Exit code: 0
>>> len(dir(C))
Did you even think to check that before you posted timings?
ChrisA
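The check being asked for can be sketched as follows. This is an illustration rather than anything posted in the thread: the class C is copied from the quoted test script, while names and extra_names are made-up variable names. Running it under each interpreter and comparing the printed lists shows exactly which names differ.

```python
class C:
    a = 'abc'
    b = 'def'
    def aaa(self):
        pass
    def bbb(self):
        pass
    def ccc(self):
        pass

names = dir(C)
# Names that dir(C) reports beyond those found on a plain object.
extra_names = sorted(set(names) - set(dir(object)))

print(len(names))
print(extra_names)
# Printing the full list on each interpreter makes it possible to diff them.
print(names)
```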
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2013-07-29 09:31 -0700 |
| Subject | Re: FSR and unicode compliance - was Re: RE Module Performance |
| Message-ID | <6ebcd6c1-ea5e-492b-a3ea-0541fc1d63cb@googlegroups.com> |
| In reply to | #51448 |
On Monday, 29 July 2013 16:49:34 UTC+2, Chris Angelico wrote:
> On Mon, Jul 29, 2013 at 3:20 PM, <wxjmfauth@gmail.com> wrote:
> > >c:\python32\pythonw -u "timitmod.py"
> > 15.258061416225663
> > >Exit code: 0
> > >c:\Python33\pythonw -u "timitmod.py"
> > 17.052203122286194
> > >Exit code: 0
>
> >>> len(dir(C))
>
> Did you even think to check that before you posted timings?
>
> ChrisA
Boum, no! The difference is one. I have noticed, however, that when I
increase the number of (ASCII) attributes, the timing difference
becomes very marked. I do not draw conclusions. Such a factor for one unit....
jmf
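The experiment described here, timing dir() while varying the number of ASCII attributes, could be sketched roughly as below. This is a guess at the setup, not jmf's actual script: make_class, time_dir, the attribute naming scheme and the repetition count are all assumptions made for illustration.

```python
import timeit

def make_class(n_attrs):
    """Build a class with n_attrs extra ASCII-named class attributes."""
    namespace = {"attr_%d" % i: i for i in range(n_attrs)}
    return type("Generated", (object,), namespace)

def time_dir(cls, number=2000):
    """Return the total seconds taken by `number` calls to dir(cls)."""
    return timeit.timeit(lambda: dir(cls), number=number)

for n_attrs in (0, 10, 100):
    cls = make_class(n_attrs)
    # Print the real length of dir(cls) alongside the timing, so results
    # from different interpreters can be lined up against the same workload.
    print(n_attrs, len(dir(cls)), time_dir(cls))
```

Comparing the slope of the timings across interpreters, rather than a single absolute figure, separates the per-attribute cost from any fixed overhead.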
| From | Heiko Wundram <modelnine@modelnine.org> |
|---|---|
| Date | 2013-07-29 14:06 +0200 |
| Subject | Re: FSR and unicode compliance - was Re: RE Module Performance |
| Message-ID | <mailman.5229.1375100008.3114.python-list@python.org> |
| In reply to | #51434 |
On 29.07.2013 13:43, wxjmfauth@gmail.com wrote:
> 3.2
> >>> timeit.timeit("r = dir(list)")
> 22.300465007102908
>
> 3.3
> >>> timeit.timeit("r = dir(list)")
> 27.13981129541519
>
> For the record, I am not using your example to contradict
> you. I was expecting such a result even before testing.
>
> Now, if you do not understand why, you do not understand.
> There is nothing wrong.
Please give a single *proof* (not your gut feeling) that this is related
to the FSR, and not rather due to other side-effects such as changes in
how dir() works or (as Chris pointed out) due to more members on the
list type in 3.3. If you can't or won't give that proof, there's no
sense in continuing the discussion.
--
--- Heiko.
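A benchmark that removes at least one of the confounding factors named above would hold the workload fixed instead of calling dir(). The sketch below is only an illustration of that idea: the NAMES list and string_workload are made up, and a difference measured this way would still not by itself prove anything about the FSR.

```python
import timeit

# A fixed, hard-coded list of ASCII names, so the workload is identical
# on every interpreter regardless of what dir(list) returns there.
NAMES = ['__add__', '__class__', '__contains__', 'append', 'count',
         'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

def string_workload():
    """Join the names, split them again, and return them sorted."""
    joined = ",".join(NAMES)
    return sorted(joined.split(","))

elapsed = timeit.timeit(string_workload, number=20000)
print(len(NAMES), elapsed)
```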
| From | Devyn Collier Johnson <devyncjohnson@gmail.com> |
|---|---|
| Date | 2013-07-29 08:43 -0400 |
| Subject | Re: FSR and unicode compliance - was Re: RE Module Performance |
| Message-ID | <mailman.5235.1375102214.3114.python-list@python.org> |
| In reply to | #51434 |
On 07/29/2013 08:06 AM, Heiko Wundram wrote:
> On 29.07.2013 13:43, wxjmfauth@gmail.com wrote:
>> 3.2
>> >>> timeit.timeit("r = dir(list)")
>> 22.300465007102908
>>
>> 3.3
>> >>> timeit.timeit("r = dir(list)")
>> 27.13981129541519
>>
>> For the record, I am not using your example to contradict
>> you. I was expecting such a result even before testing.
>>
>> Now, if you do not understand why, you do not understand.
>> There is nothing wrong.
>
> Please give a single *proof* (not your gut feeling) that this is
> related to the FSR, and not rather due to other side-effects such as
> changes in how dir() works or (as Chris pointed out) due to more
> members on the list type in 3.3. If you can't or won't give that
> proof, there's no sense in continuing the discussion.
>
Wow! The RE Module thread I created is evolving into Unicode topics.
That thread grew up so fast!
DCJ
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-07-28 18:03 +0100 |
| Subject | Re: FSR and unicode compliance - was Re: RE Module Performance |
| Message-ID | <mailman.5192.1375030999.3114.python-list@python.org> |
| In reply to | #51340 |
On Sun, Jul 28, 2013 at 4:52 PM, Michael Torrie <torriem@gmail.com> wrote:
> Is my understanding of these things wrong?
No, your understanding of those matters is fine. There's just one area
you seem to be misunderstanding; you appear to think that jmf actually
cares about logical argument.
I gave up on that theory a long time ago, and now I respond for the
benefit of those reading, rather than jmf himself. I've also given up
on trying to figure out what he actually wants; the nearest I can come
up with is that he's King Gama-esque - that he just wants to complain.
ChrisA