| Started by | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| First post | 2013-12-30 19:41 +0000 |
| Last post | 2013-12-30 20:25 -0800 |
| Articles | 20 on this page of 82 — 19 participants |
Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-30 19:41 +0000
Re: Blog "about python 3" Steven D'Aprano <steve@pearwood.info> - 2013-12-30 20:49 +0000
Re: Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-30 21:29 +0000
Re: Blog "about python 3" Ethan Furman <ethan@stoneleaf.us> - 2013-12-30 14:38 -0800
Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2013-12-31 12:09 +1100
Re: Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-31 04:38 +0000
Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2013-12-31 15:44 +1100
Re: Blog "about python 3" Ethan Furman <ethan@stoneleaf.us> - 2013-12-30 20:33 -0800
Re: Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-31 04:59 +0000
Re: Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-31 08:22 +0000
Re: Blog "about python 3" Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-31 20:53 +1100
Re: Blog "about python 3" Antoine Pitrou <solipsis@pitrou.net> - 2013-12-31 14:13 +0000
Re: Blog "about python 3" Roy Smith <roy@panix.com> - 2013-12-31 10:41 -0500
Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-01 02:54 +1100
Re: Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-31 15:55 +0000
Re: Blog "about python 3" Robin Becker <robin@reportlab.com> - 2014-01-02 17:36 +0000
Re: Blog "about python 3" Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-03 15:49 +1100
Re: Blog "about python 3" Terry Reedy <tjreedy@udel.edu> - 2014-01-03 04:01 -0500
Re: Blog "about python 3" wxjmfauth@gmail.com - 2014-01-03 02:10 -0800
Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-03 21:24 +1100
Re: Blog "about python 3" Ethan Furman <ethan@stoneleaf.us> - 2014-01-03 08:56 -0800
Re: Blog "about python 3" Robin Becker <robin@reportlab.com> - 2014-01-03 12:28 +0000
Re: Blog "about python 3" Roy Smith <roy@panix.com> - 2014-01-03 09:57 -0500
Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-04 02:32 +1100
Re: Blog "about python 3" Terry Reedy <tjreedy@udel.edu> - 2014-01-03 17:00 -0500
Re: Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-01-04 04:04 +0000
Re: Blog "about python 3" Roy Smith <roy@panix.com> - 2014-01-04 08:55 -0500
Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-05 01:17 +1100
Re: Blog "about python 3" wxjmfauth@gmail.com - 2014-01-04 11:10 -0800
Re: Blog "about python 3" Terry Reedy <tjreedy@udel.edu> - 2014-01-04 17:46 -0500
Re: Blog "about python 3" wxjmfauth@gmail.com - 2014-01-05 06:23 -0800
Re: Blog "about python 3" Ned Batchelder <ned@nedbatchelder.com> - 2014-01-05 10:20 -0500
Re: Blog "about python 3" Terry Reedy <tjreedy@udel.edu> - 2014-01-05 17:14 -0500
Re: Blog "about python 3" wxjmfauth@gmail.com - 2014-01-07 05:34 -0800
Re: Blog "about python 3" Terry Reedy <tjreedy@udel.edu> - 2014-01-07 09:54 -0500
Re: Blog "about python 3" Tim Delaney <timothy.c.delaney@gmail.com> - 2014-01-08 09:38 +1100
Re: Blog "about python 3" Terry Reedy <tjreedy@udel.edu> - 2014-01-07 19:02 -0500
Re: Blog "about python 3" wxjmfauth@gmail.com - 2014-01-08 01:59 -0800
Re: Blog "about python 3" Terry Reedy <tjreedy@udel.edu> - 2014-01-08 14:26 -0500
Re: Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-01-08 20:04 +0000
Re: Blog "about python 3" Terry Reedy <tjreedy@udel.edu> - 2014-01-05 17:48 -0500
Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-05 10:28 +1100
Re: Blog "about python 3" Ned Batchelder <ned@nedbatchelder.com> - 2014-01-04 12:51 -0500
Re: Blog "about python 3" Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-05 13:27 +1100
Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-05 13:32 +1100
Re: Blog "about python 3" MRAB <python@mrabarnett.plus.com> - 2014-01-05 02:41 +0000
Re: Blog "about python 3" Roy Smith <roy@panix.com> - 2014-01-04 22:20 -0500
Re: Blog "about python 3" Rustom Mody <rustompmody@gmail.com> - 2014-01-05 10:12 +0530
Re: Blog "about python 3" Roy Smith <roy@panix.com> - 2014-01-05 00:11 -0500
Re: Blog "about python 3" Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-05 17:28 +1100
Re: Blog "about python 3" Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2014-01-05 14:05 -0500
Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-05 15:01 +1100
Re: Blog "about python 3" Roy Smith <roy@panix.com> - 2014-01-05 11:34 -0500
Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-06 03:51 +1100
Re: Blog "about python 3" Roy Smith <roy@panix.com> - 2014-01-05 12:09 -0500
Re: Blog "about python 3" Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-06 11:42 +1100
Re: Blog "about python 3" Terry Reedy <tjreedy@udel.edu> - 2014-01-05 17:56 -0500
Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-06 10:59 +1100
Re: Blog "about python 3" Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-06 12:23 +1100
Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-06 12:54 +1100
Re: Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-01-06 05:53 +0000
Re: Blog "about python 3" Devin Jeanpierre <jeanpierreda@gmail.com> - 2014-01-05 00:00 -0800
Re: Blog "about python 3" Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-05 23:28 +1100
Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-05 23:48 +1100
Re: Blog "about python 3" Roy Smith <roy@panix.com> - 2014-01-05 11:10 -0500
Re: Blog "about python 3" Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2014-01-05 13:51 -0500
Re: Blog "about python 3" David Hutto <dwightdhutto@gmail.com> - 2014-01-02 13:25 -0500
Re: Blog "about python 3" Terry Reedy <tjreedy@udel.edu> - 2014-01-02 13:37 -0500
Re: Blog "about python 3" Antoine Pitrou <solipsis@pitrou.net> - 2014-01-02 23:57 +0000
Re: Blog "about python 3" Robin Becker <robin@reportlab.com> - 2014-01-03 10:32 +0000
Re: Blog "about python 3" Robin Becker <robin@reportlab.com> - 2014-01-03 11:14 +0000
Re: Blog "about python 3" wxjmfauth@gmail.com - 2014-01-04 05:52 -0800
Re: Blog "about python 3" Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-05 13:41 +1100
Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-05 13:54 +1100
Re: Blog "about python 3" wxjmfauth@gmail.com - 2014-01-05 02:39 -0800
Re: Blog "about python 3" Robin Becker <robin@reportlab.com> - 2014-01-03 11:37 +0000
Re: Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-01-04 07:30 +0000
Re: Blog "about python 3" Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-01-05 13:14 +0100
Re: Blog "about python 3" Stefan Behnel <stefan_ml@behnel.de> - 2014-01-05 14:55 +0100
Re: Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-01-05 13:10 +0000
Re: Blog "about python 3" Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-31 20:04 +1100
Re: Blog "about python 3" Devin Jeanpierre <jeanpierreda@gmail.com> - 2013-12-30 20:25 -0800
| From | Ethan Furman <ethan@stoneleaf.us> |
|---|---|
| Date | 2014-01-03 08:56 -0800 |
| Message-ID | <mailman.4863.1388769702.18130.python-list@python.org> |
| In reply to | #63047 |
On 01/03/2014 02:24 AM, Chris Angelico wrote:
>
> I worked that out with a sheet of paper and a pencil. The pencil was a
> little help, but the paper was three sheets in the wind.

Beautiful!

--
~Ethan~
| From | Robin Becker <robin@reportlab.com> |
|---|---|
| Date | 2014-01-03 12:28 +0000 |
| Message-ID | <mailman.4850.1388752146.18130.python-list@python.org> |
| In reply to | #63036 |
On 03/01/2014 09:01, Terry Reedy wrote:
> There was more speedup in 3.3.2 and possibly even more in 3.3.3, so OP
> should run the latter.

Python 3.3.3 is what I use on Windows. As for astral / non-BMP etc etc that's almost irrelevant for the sort of tests we're doing which are mostly simple English text.

--
Robin Becker
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2014-01-03 09:57 -0500 |
| Message-ID | <roy-5D9224.09574803012014@news.panix.com> |
| In reply to | #63055 |
In article <mailman.4850.1388752146.18130.python-list@python.org>, Robin Becker <robin@reportlab.com> wrote:
> On 03/01/2014 09:01, Terry Reedy wrote:
> > There was more speedup in 3.3.2 and possibly even more in 3.3.3, so OP
> > should run the latter.
>
> Python 3.3.3 is what I use on Windows. As for astral / non-BMP etc etc that's
> almost irrelevant for the sort of tests we're doing which are mostly simple
> English text.

The sad part is, if you're accepting any text from external sources, you need to be able to deal with astral.

I was doing a project a while ago importing 20-something million records into a MySQL database. Little did I know that FOUR of those records contained astral characters (which MySQL, at least the version I was using, couldn't handle).

My way of dealing with those records was to nuke them. Longer term we ended up switching to Postgres.
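
To make the point about external text concrete, here is a minimal sketch, assuming Python 3.3 or later (where `ord()` of an astral character returns its full code point). It flags records containing characters outside the Basic Multilingual Plane so they can be set aside rather than deleted. The helper names `has_astral` and `split_records` and the sample records are hypothetical, not from the thread.

```python
def has_astral(text):
    """Return True if any code point lies outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in text)


def split_records(records):
    """Partition records into (bmp_only, with_astral) so nothing is silently dropped."""
    bmp_only, with_astral = [], []
    for record in records:
        if has_astral(record):
            with_astral.append(record)
        else:
            bmp_only.append(record)
    return bmp_only, with_astral


# Made-up sample records; only the last one contains an astral character (U+1F600).
records = ["plain english", "caf\u00e9", "snowman \u2603", "grin \U0001F600"]
bmp_only, with_astral = split_records(records)
print(len(bmp_only), len(with_astral))
```

Keeping the flagged records in their own list leaves room to handle them later instead of losing them.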
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-01-04 02:32 +1100 |
| Message-ID | <mailman.4854.1388763458.18130.python-list@python.org> |
| In reply to | #63057 |
On Sat, Jan 4, 2014 at 1:57 AM, Roy Smith <roy@panix.com> wrote:
> I was doing a project a while ago importing 20-something million records
> into a MySQL database. Little did I know that FOUR of those records
> contained astral characters (which MySQL, at least the version I was
> using, couldn't handle).
>
> My way of dealing with those records was to nuke them. Longer term we
> ended up switching to Postgres.

Look! Postgres means you don't lose data!!

Seriously though, that's a much better long-term solution than destroying data. But MySQL does support the full Unicode range - just not in its "UTF8" type. You have to specify "UTF8MB4" - that is, "maximum bytes 4" rather than the default of 3. According to [1], the UTF8MB4 encoding is stored as UTF-16, and UTF8 is stored as UCS-2. And according to [2], it's even possible to explicitly choose the mindblowing behaviour of UCS-2 for a data type that calls itself "UTF8", so that a vague theoretical subsequent version of MySQL might be able to make "UTF8" mean UTF-8, and people can choose to use the other alias.

To my mind, this is a bug with backward-compatibility concerns. That means it can't be fixed in a point release. Fine. But the behaviour change is "this used to throw an error, now it works". Surely that can be fixed in the next release. Or surely a version or two of deprecating "UTF8" in favour of the two "MB?" types (and never ever returning "UTF8" from any query), followed by a reintroduction of "UTF8" as an alias for MB4, and the deprecation of MB3.

Or am I spoiled by the quality of Python (and other) version numbering, where I can (largely) depend on functionality not changing in point releases?

ChrisA

[1] http://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8mb4.html
[2] http://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8mb3.html
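
The "maximum bytes 3" versus "maximum bytes 4" distinction can be illustrated from the Python side. This is only a sketch of the byte-count arithmetic of UTF-8 itself; it makes no claim about how any MySQL version stores or rejects data. The function names `utf8_width` and `fits_in_three_bytes` are hypothetical.

```python
def utf8_width(ch):
    """Number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


def fits_in_three_bytes(text):
    """True if every character of text encodes to at most 3 UTF-8 bytes (BMP only)."""
    return all(utf8_width(ch) <= 3 for ch in text)


# ASCII, Latin-1, BMP and astral samples, in that order.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
widths = [utf8_width(ch) for ch in samples]
print(widths)
print(fits_in_three_bytes("na\u00efve \u20ac"), fits_in_three_bytes("grin \U0001F600"))
```

Only characters beyond U+FFFF need the fourth byte, which is why a three-byte limit covers the BMP and nothing more.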
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2014-01-03 17:00 -0500 |
| Message-ID | <mailman.4870.1388786452.18130.python-list@python.org> |
| In reply to | #63036 |
On 1/3/2014 7:28 AM, Robin Becker wrote:
> On 03/01/2014 09:01, Terry Reedy wrote:
>> There was more speedup in 3.3.2 and possibly even more in 3.3.3, so OP
>> should run the latter.
>
> Python 3.3.3 is what I use on Windows. As for astral / non-BMP etc etc
> that's almost irrelevant for the sort of tests we're doing which are
> mostly simple English text.

If you do not test the cases where 2.7 is buggy and requires nasty workarounds, then I can understand why you do not so much appreciate 3.3 ;-).

--
Terry Jan Reedy
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2014-01-04 04:04 +0000 |
| Message-ID | <mailman.4882.1388808283.18130.python-list@python.org> |
| In reply to | #63036 |
On 03/01/2014 22:00, Terry Reedy wrote:
> On 1/3/2014 7:28 AM, Robin Becker wrote:
>> On 03/01/2014 09:01, Terry Reedy wrote:
>>> There was more speedup in 3.3.2 and possibly even more in 3.3.3, so OP
>>> should run the latter.
>>
>> Python 3.3.3 is what I use on Windows. As for astral / non-BMP etc etc
>> that's almost irrelevant for the sort of tests we're doing which are
>> mostly simple English text.
>
> If you do not test the cases where 2.7 is buggy and requires nasty
> workarounds, then I can understand why you do not so much appreciate 3.3
> ;-).
>

Are you crazy? Surely everybody prefers fast but incorrect code in preference to something that is correct but slow? Except that Python 3.3.3 is often faster. And always (to my knowledge) correct. Upper Class Twit of the Year anybody? :)

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2014-01-04 08:55 -0500 |
| Message-ID | <roy-1820F1.08551004012014@news.panix.com> |
| In reply to | #63104 |
In article <mailman.4882.1388808283.18130.python-list@python.org>, Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:
> Surely everybody prefers fast but incorrect code in
> preference to something that is correct but slow?

I realize I'm taking this statement out of context, but yes, sometimes fast is more important than correct. Sometimes the other way around.
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-01-05 01:17 +1100 |
| Message-ID | <mailman.4905.1388845063.18130.python-list@python.org> |
| In reply to | #63134 |
On Sun, Jan 5, 2014 at 12:55 AM, Roy Smith <roy@panix.com> wrote:
> In article <mailman.4882.1388808283.18130.python-list@python.org>,
> Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:
>
>> Surely everybody prefers fast but incorrect code in
>> preference to something that is correct but slow?
>
> I realize I'm taking this statement out of context, but yes, sometimes
> fast is more important than correct. Sometimes the other way around.

More usually, it's sometimes better to be really fast and mostly correct than really really slow and entirely correct. That's why we use IEEE floating point instead of Decimal most of the time. Though I'm glad that Python 3 now deems the default int type to be capable of representing arbitrary integers (instead of dropping out to a separate long type as Py2 did), I think it's possibly worth optimizing small integers to machine words - but mainly, the int type focuses on correctness above performance, because the cost is low compared to the benefit. With float, the cost of arbitrary precision is extremely high, and the benefit much lower.

With Unicode, the cost of perfect support is normally seen to be a doubling of internal memory usage (UTF-16 vs UCS-4). Pike and Python decided that the cost could, instead, be a tiny measure of complexity and actually *less* memory usage (compared to UTF-16, when lots of identifiers are ASCII). It's a system that works only when strings are immutable, but works beautifully there. Fortunately Pike doesn't have any, and Python has only one, idiot like jmf who completely misunderstands what's going on and uses microbenchmarks to prove obscure points... and then uses nonsense to try to prove... uhh... actually I'm not even sure what, sometimes. I wouldn't dare try to read his posts except that my mind's already in a rather broken state, as a combination of programming and Alice in Wonderland.

ChrisA
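
The "fast and mostly correct" versus "slow and entirely correct" trade-off for numbers can be seen in a few lines. This is a small sketch, assuming Python 3 and ordinary IEEE 754 double-precision floats; the variable names are made up for illustration.

```python
from decimal import Decimal

# Binary floating point: fast, but 0.1 and 0.2 are not exactly representable.
float_sum = 0.1 + 0.2
float_is_exact = float_sum == 0.3

# Decimal: exact for these values, at a higher cost per operation.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_is_exact = decimal_sum == Decimal("0.3")

# Python 3 int: arbitrary precision, so adding 1 to a huge value is still exact.
big = 2 ** 100
int_keeps_one = (big + 1) - big == 1

# The same value as a float cannot represent the +1 at all.
float_loses_one = float(big) + 1 == float(big)

print(float_is_exact, decimal_is_exact, int_keeps_one, float_loses_one)
```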
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2014-01-04 11:10 -0800 |
| Message-ID | <3519f85e-0909-4f5a-9a6e-09b6fd4c312d@googlegroups.com> |
| In reply to | #63135 |
On Saturday, 4 January 2014 15:17:40 UTC+1, Chris Angelico wrote:
> On Sun, Jan 5, 2014 at 12:55 AM, Roy Smith <roy@panix.com> wrote:
>> In article <mailman.4882.1388808283.18130.python-list@python.org>,
>> Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:
>>
>>> Surely everybody prefers fast but incorrect code in
>>> preference to something that is correct but slow?
>>
>> I realize I'm taking this statement out of context, but yes, sometimes
>> fast is more important than correct. Sometimes the other way around.
>
> More usually, it's sometimes better to be really fast and mostly
> correct than really really slow and entirely correct. That's why we
> use IEEE floating point instead of Decimal most of the time. Though
> I'm glad that Python 3 now deems the default int type to be capable of
> representing arbitrary integers (instead of dropping out to a separate
> long type as Py2 did), I think it's possibly worth optimizing small
> integers to machine words - but mainly, the int type focuses on
> correctness above performance, because the cost is low compared to the
> benefit. With float, the cost of arbitrary precision is extremely
> high, and the benefit much lower.
>
> With Unicode, the cost of perfect support is normally seen to be a
> doubling of internal memory usage (UTF-16 vs UCS-4). Pike and Python
> decided that the cost could, instead, be a tiny measure of complexity
> and actually *less* memory usage (compared to UTF-16, when lots of
> identifiers are ASCII). It's a system that works only when strings are
> immutable, but works beautifully there. Fortunately Pike doesn't have
> any, and Python has only one, idiot like jmf who completely
> misunderstands what's going on and uses microbenchmarks to prove
> obscure points... and then uses nonsense to try to prove... uhh...
> actually I'm not even sure what, sometimes. I wouldn't dare try to
> read his posts except that my mind's already in a rather broken state,
> as a combination of programming and Alice in Wonderland.

I do not mind to be considered as an idiot, but I'm definitively not blind.

And I could add, I *never* saw once one soul, who is explaining what I'm doing wrong in the gazillion of examples I gave on this list.

---

Back to ReportLab. Technically I would be really interested to see what could happen at the light of my previous post.

jmf
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2014-01-04 17:46 -0500 |
| Message-ID | <mailman.4915.1388875627.18130.python-list@python.org> |
| In reply to | #63140 |
On 1/4/2014 2:10 PM, wxjmfauth@gmail.com wrote:
> On Saturday, 4 January 2014 15:17:40 UTC+1, Chris Angelico wrote:
>> any, and Python has only one, idiot like jmf who completely

Chris, I appreciate the many contributions you make to this list, but that does not exempt you from our standards of conduct.

>> misunderstands what's going on and uses microbenchmarks to prove
>> obscure points... and then uses nonsense to try to prove... uhh...

Troll baiting is a form of trolling. I think you are intelligent enough to know this. Please stop.

> I do not mind to be considered as an idiot, but
> I'm definitively not blind.
>
> And I could add, I *never* saw once one soul, who is
> explaining what I'm doing wrong in the gazillion
> of examples I gave on this list.

If this is true, it is because you have ignored and not read my numerous, relatively polite posts. To repeat very briefly:

1. Cherry picking (presenting the most extreme case as representative).
2. Calling space saving a problem (repeatedly).
3. Ignoring bug fixes.
4. Repetition (of the 'gazillion example' without new content).

Have you ever acknowledged, let alone thanked people for, the fix for the one bad regression you did find? The FSR is still a work in progress. Just today, Serhiy pushed a patch speeding up the UTF-32 encoder, after previously speeding up the UTF-32 decoder.

--
Terry Jan Reedy
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2014-01-05 06:23 -0800 |
| Message-ID | <d8438ee4-1429-4855-9d78-b833f4f2748f@googlegroups.com> |
| In reply to | #63151 |
On Saturday, 4 January 2014 23:46:49 UTC+1, Terry Reedy wrote:
> On 1/4/2014 2:10 PM, wxjmfauth@gmail.com wrote:
>> On Saturday, 4 January 2014 15:17:40 UTC+1, Chris Angelico wrote:
>>> any, and Python has only one, idiot like jmf who completely
>
> Chris, I appreciate the many contributions you make to this list, but
> that does not exempt you from our standards of conduct.
>
>>> misunderstands what's going on and uses microbenchmarks to prove
>>> obscure points... and then uses nonsense to try to prove... uhh...
>
> Troll baiting is a form of trolling. I think you are intelligent enough
> to know this. Please stop.
>
>> I do not mind to be considered as an idiot, but
>> I'm definitively not blind.
>>
>> And I could add, I *never* saw once one soul, who is
>> explaining what I'm doing wrong in the gazillion
>> of examples I gave on this list.
>
> If this is true, it is because you have ignored and not read my
> numerous, relatively polite posts. To repeat very briefly:
>
> 1. Cherry picking (presenting the most extreme case as representative).
>
> 2. Calling space saving a problem (repeatedly).
>
> 3. Ignoring bug fixes.
>
> 4. Repetition (of the 'gazillion example' without new content).
>
> Have you ever acknowledged, let alone thanked people for, the fix for the
> one bad regression you did find? The FSR is still a work in progress.
> Just today, Serhiy pushed a patch speeding up the UTF-32 encoder, after
> previously speeding up the UTF-32 decoder.
>
> --

My examples are ONLY ILLUSTRATING, this FSR is wrong by design, can be on the side of memory, performance, linguistic or even typography.

I will not refrain you to waste your time in adjusting bytes, if the problem is not on that side.

jmf
| From | Ned Batchelder <ned@nedbatchelder.com> |
|---|---|
| Date | 2014-01-05 10:20 -0500 |
| Message-ID | <mailman.4948.1388935226.18130.python-list@python.org> |
| In reply to | #63194 |
On 1/5/14 9:23 AM, wxjmfauth@gmail.com wrote:
> On Saturday, 4 January 2014 23:46:49 UTC+1, Terry Reedy wrote:
>> On 1/4/2014 2:10 PM, wxjmfauth@gmail.com wrote:
>>> I do not mind to be considered as an idiot, but
>>> I'm definitively not blind.
>>>
>>> And I could add, I *never* saw once one soul, who is
>>> explaining what I'm doing wrong in the gazillion
>>> of examples I gave on this list.
>>
>> If this is true, it is because you have ignored and not read my
>> numerous, relatively polite posts. To repeat very briefly:
>>
>> 1. Cherry picking (presenting the most extreme case as representative).
>>
>> 2. Calling space saving a problem (repeatedly).
>>
>> 3. Ignoring bug fixes.
>>
>> 4. Repetition (of the 'gazillion example' without new content).
>>
>> Have you ever acknowledged, let alone thanked people for, the fix for the
>> one bad regression you did find? The FSR is still a work in progress.
>> Just today, Serhiy pushed a patch speeding up the UTF-32 encoder, after
>> previously speeding up the UTF-32 decoder.
>>
>> --
>
> My examples are ONLY ILLUSTRATING, this FSR
> is wrong by design, can be on the side of
> memory, performance, linguistic or even
> typography.

JMF: this has been pointed out to you time and again: the flexible string representation is not wrong. To show that it is wrong, you would have to demonstrate some semantic of Unicode that is violated. You have never done this. You've picked pathological cases and shown micro-timing output, and memory usage. The Unicode standard doesn't promise anything about timing or memory use.

The FSR makes a trade-off of time and space. Everyone but you considers it a good trade-off. I don't think you are showing real use cases, but if they are, I'm sorry that your use-case suffers. That doesn't make the FSR wrong. The most accurate statement is that you don't like the FSR. That's fine, you're entitled to your opinion.

You say the FSR is wrong linguistically. This can't be true, since an FSR Unicode string is indistinguishable from an internally-UTF-32 Unicode string, and no, memory use or timings are irrelevant when discussing the linguistic performance of a Unicode string.

You've also said that the internal representation of the FSR is incorrect because of encodings somehow. Encodings have nothing to do with the internal representation of a Unicode string, they are for interchanging data. You seem to know a lot about Unicode, but when you make this fundamental mistake, you call all of your expertise into question.

To re-iterate what you are doing wrong:

1) You continue to claim things that are not true, and that you have never substantiated.
2) You paste code samples without accompanying text that explains what you are trying to demonstrate.
3) You ignore refutations that disprove your points.

These are all the behaviors of a troll. Please stop.

If you want to discuss the details of Unicode implementations, I'd welcome an offlist discussion, but only if you will approach it honestly enough to leave open the possibility that you are wrong. I know I would be glad to learn details of Unicode that I have missed, but so far you haven't provided any.

--Ned.

>
> I will not refrain you to waste your time
> in adjusting bytes, if the problem is not
> on that side.
>
> jmf
>

--
Ned Batchelder, http://nedbatchelder.com
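
The distinction between a string's observable behaviour and its encodings can be sketched as follows, assuming Python 3.3 or later. Length, indexing and slicing work on code points whatever the internal width happens to be; encodings only appear when the text is turned into bytes for interchange. The variable names are illustrative only.

```python
plain = "abc"
mixed = "abc\U0001F600"  # three ASCII characters plus one astral character

# Observable string behaviour is defined in terms of code points.
length = len(mixed)
last_char = mixed[3]
prefix_matches = mixed[:3] == plain

# Encodings are a separate step, used when exchanging data as bytes.
as_utf8 = mixed.encode("utf-8")
as_utf16 = mixed.encode("utf-16-le")
as_utf32 = mixed.encode("utf-32-le")

# Every encoding round-trips to an equal string.
round_trips = all(
    data.decode(name) == mixed
    for name, data in [("utf-8", as_utf8), ("utf-16-le", as_utf16), ("utf-32-le", as_utf32)]
)

print(length, len(as_utf8), len(as_utf16), len(as_utf32), round_trips)
```

The three byte strings differ in size, yet all of them decode to the same four-character string.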
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2014-01-05 17:14 -0500 |
| Message-ID | <mailman.4976.1388960067.18130.python-list@python.org> |
| In reply to | #63194 |
On 1/5/2014 9:23 AM, wxjmfauth@gmail.com wrote:
> On Saturday, 4 January 2014 23:46:49 UTC+1, Terry Reedy wrote:
>> On 1/4/2014 2:10 PM, wxjmfauth@gmail.com wrote:
>>> And I could add, I *never* saw once one soul, who is
>>> explaining what I'm doing wrong in the gazillion
>>> of examples I gave on this list.
>> If this is true, it is because you have ignored and not read my
>> numerous, relatively polite posts. To repeat very briefly:
>> 1. Cherry picking (presenting the most extreme case as representative).
>> 2. Calling space saving a problem (repeatedly).
>> 3. Ignoring bug fixes.
...
> My examples are ONLY ILLUSTRATING, this FSR
> is wrong by design, can be on the side of
> memory, performance, linguistic or even
> typography.

Let me expand on 3 of my points. First, performance == time:

Point 3. You correctly identified a time regression in finding a character in a string. I saw that the slowdown was *not* inherent in the FSR but had to be a glitch in the code, and reported it on pydev with the hope that someone would fix it even if it were not too important in real use cases. Someone did.

Point 1. You incorrectly generalized that extreme case. I reported (a year ago last September) that the overall stringbench results were about the same. I also pointed out that there is an equally non-representative extreme case in the opposite direction, and that it would equally be wrong of me to use that to claim that FSR is faster. (It turns out that this FSR speed advantage *is* inherent in the design.)

Memory: Point 2. A *design goal* of FSR was to save memory relative to UTF-32, which is what you apparently prefer. Your examples show that FSR successfully met its design goal. But you call that success, saving memory, 'wrong'. On what basis?

You *claim* the FSR is 'wrong by design', but your examples only show that it was temporarily wrong in implementation as far as speed and correct by design as far as memory goes.

--
Terry Jan Reedy
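
The memory point can be checked empirically. The following is a rough sketch, assuming CPython 3.3 or later (other implementations may store strings differently). It estimates per-character storage by doubling a string's length and measuring how much `sys.getsizeof` grows, which cancels out the fixed object header. The helper name `bytes_per_char` is made up for this example.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per character by measuring growth between two string lengths."""
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


widths = {
    "ascii": bytes_per_char("a"),
    "latin1": bytes_per_char("\u00e9"),
    "bmp": bytes_per_char("\u20ac"),
    "astral": bytes_per_char("\U0001F600"),
}

# A fixed-width UTF-32 layout would need 4 bytes for every one of these.
utf32_width = len("a".encode("utf-32-le"))

print(widths, utf32_width)
```

Under these assumptions, only strings that actually contain astral characters pay the 4-bytes-per-character cost that a fixed UTF-32 layout would impose on every string.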
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2014-01-07 05:34 -0800 |
| Message-ID | <2fbf4f89-caaa-4fab-8d7e-ff7ef84029a2@googlegroups.com> |
| In reply to | #63232 |
On Sunday, 5 January 2014 23:14:07 UTC+1, Terry Reedy wrote:
> On 1/5/2014 9:23 AM, wxjmfauth@gmail.com wrote:
>> On Saturday, 4 January 2014 23:46:49 UTC+1, Terry Reedy wrote:
>>> On 1/4/2014 2:10 PM, wxjmfauth@gmail.com wrote:
>>>> And I could add, I *never* saw once one soul, who is
>>>> explaining what I'm doing wrong in the gazillion
>>>> of examples I gave on this list.
>>> If this is true, it is because you have ignored and not read my
>>> numerous, relatively polite posts. To repeat very briefly:
>>> 1. Cherry picking (presenting the most extreme case as representative).
>>> 2. Calling space saving a problem (repeatedly).
>>> 3. Ignoring bug fixes.
> ...
>> My examples are ONLY ILLUSTRATING, this FSR
>> is wrong by design, can be on the side of
>> memory, performance, linguistic or even
>> typography.
>
> Let me expand on 3 of my points. First, performance == time:
>
> Point 3. You correctly identified a time regression in finding a
> character in a string. I saw that the slowdown was *not* inherent in the
> FSR but had to be a glitch in the code, and reported it on pydev with
> the hope that someone would fix it even if it were not too important in
> real use cases. Someone did.
>
> Point 1. You incorrectly generalized that extreme case. I reported (a
> year ago last September) that the overall stringbench results were about
> the same. I also pointed out that there is an equally non-representative
> extreme case in the opposite direction, and that it would equally be
> wrong of me to use that to claim that FSR is faster. (It turns out that
> this FSR speed advantage *is* inherent in the design.)
>
> Memory: Point 2. A *design goal* of FSR was to save memory relative to
> UTF-32, which is what you apparently prefer. Your examples show that FSR
> successfully met its design goal. But you call that success, saving
> memory, 'wrong'. On what basis?
>
> You *claim* the FSR is 'wrong by design', but your examples only show
> that it was temporarily wrong in implementation as far as speed and
> correct by design as far as memory goes.

Point 3: You are right. I'm very happy to agree.

Point 2: This Flexible String Representation does not "effectuate" any memory optimization. It only succeeds to do the opposite of what a correct usage of utf* do.

Ned: this has already been explained and illustrated.

jmf
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2014-01-07 09:54 -0500 |
| Message-ID | <mailman.5134.1389106482.18130.python-list@python.org> |
| In reply to | #63427 |
On 1/7/2014 8:34 AM, wxjmfauth@gmail.com wrote:
> On Sunday, 5 January 2014 23:14:07 UTC+1, Terry Reedy wrote:
>> Memory: Point 2. A *design goal* of FSR was to save memory relative to
>> UTF-32, which is what you apparently prefer. Your examples show that FSR
>> successfully met its design goal. But you call that success, saving
>> memory, 'wrong'. On what basis?
> Point 2: This Flexible String Representation does not
> "effectuate" any memory optimization. It only succeeds
> to do the opposite of what a correct usage of utf*
> do.

Since the FSR *was* successful in saving memory, and indeed shrank the Python binary by about a megabyte, I have no idea what you mean.

--
Terry Jan Reedy
| From | Tim Delaney <timothy.c.delaney@gmail.com> |
|---|---|
| Date | 2014-01-08 09:38 +1100 |
| Message-ID | <mailman.5152.1389134341.18130.python-list@python.org> |
| In reply to | #63427 |
On 8 January 2014 00:34, <wxjmfauth@gmail.com> wrote:
>
> Point 2: This Flexible String Representation does not
> "effectuate" any memory optimization. It only succeeds
> to do the opposite of what a correct usage of utf*
> does.
>
UTF-8 is a variable-width encoding that uses less memory to encode code
points with lower numerical values, on a per-character basis e.g. if a code
point <= U+007F it will use a single byte to encode; if <= U+07FF two bytes
will be used; ... up to a maximum of 4 bytes for code points >= U+10000
(the original UTF-8 design allowed up to 6 bytes, but RFC 3629 restricts it
to the Unicode range, which ends at U+10FFFF).
FSR is a variable-width memory structure that uses the width of the code
point with the highest numerical value in the string e.g. if all code
points in the string are <= U+00FF a single byte will be used per
character; if all code points are <= U+FFFF two bytes will be used per
character; and in all other cases 4 bytes will be used per character.
In terms of memory usage the difference is that UTF-8 varies its width
per-character, whereas the FSR varies its width per-string. For any
particular string, UTF-8 may well result in using less memory than the FSR,
but in other (quite common) cases the FSR will use less memory than UTF-8
e.g. if the string contains only code points <= U+00FF, but some
are between U+0080 and U+00FF (inclusive).
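The per-character versus per-string distinction above can be sketched in a few lines. This is only an illustration of the payload arithmetic, not CPython's actual implementation; the helper names `fsr_bytes_per_char` and `payload_sizes` are made up for this example, and object overhead is ignored.

```python
def fsr_bytes_per_char(s):
    """Width the FSR would pick for s: 1, 2 or 4 bytes per code point.

    Hypothetical helper: it only mirrors the rule described in the post
    (the widest code point decides the width for the whole string).
    """
    highest = max(map(ord, s), default=0)
    if highest <= 0xFF:
        return 1
    if highest <= 0xFFFF:
        return 2
    return 4


def payload_sizes(s):
    """Return (fsr_payload, utf8_payload) in bytes, without object overhead."""
    return len(s) * fsr_bytes_per_char(s), len(s.encode("utf-8"))


# Same size, FSR smaller, and UTF-8 smaller, in that order.
for sample in ("\u0001" * 1000, "\u0081" * 1000, "a" * 1000 + "\U00010000"):
    print(payload_sizes(sample))
```

The last sample shows the case that favours UTF-8: a single code point above U+FFFF forces the FSR to use 4 bytes for every character in the string.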
In most cases the FSR uses the same or less memory than earlier versions of
Python 3 and correctly handles all code points (just like UTF-8). In the
cases where the FSR uses more memory than previously, the previous
behaviour was incorrect.
No matter which representation is used, there will be a certain amount of
overhead (which is the majority of what most of your examples have shown).
Here are examples which demonstrate cases where UTF-8 uses less memory,
cases where the FSR uses less memory, and cases where they use the same
amount of memory (accounting for the minimum amount of overhead required
for each).
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64
bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>>
>>> fsr = u""
>>> utf8 = fsr.encode("utf-8")
>>> min_fsr_overhead = sys.getsizeof(fsr)
>>> min_utf8_overhead = sys.getsizeof(utf8)
>>> min_fsr_overhead
49
>>> min_utf8_overhead
33
>>>
>>> fsr = u"\u0001" * 1000
>>> utf8 = fsr.encode("utf-8")
>>> sys.getsizeof(fsr) - min_fsr_overhead
1000
>>> sys.getsizeof(utf8) - min_utf8_overhead
1000
>>>
>>> fsr = u"\u0081" * 1000
>>> utf8 = fsr.encode("utf-8")
>>> sys.getsizeof(fsr) - min_fsr_overhead
1024
>>> sys.getsizeof(utf8) - min_utf8_overhead
2000
>>>
>>> fsr = u"\u0001\u0081" * 1000
>>> utf8 = fsr.encode("utf-8")
>>> sys.getsizeof(fsr) - min_fsr_overhead
2024
>>> sys.getsizeof(utf8) - min_utf8_overhead
3000
>>>
>>> fsr = u"\u0101" * 1000
>>> utf8 = fsr.encode("utf-8")
>>> sys.getsizeof(fsr) - min_fsr_overhead
2025
>>> sys.getsizeof(utf8) - min_utf8_overhead
2000
>>>
>>> fsr = u"\u0101\u0081" * 1000
>>> utf8 = fsr.encode("utf-8")
>>> sys.getsizeof(fsr) - min_fsr_overhead
4025
>>> sys.getsizeof(utf8) - min_utf8_overhead
4000
Indexing a character in UTF-8 is O(N) - you have to traverse the string
up to the character being indexed. Indexing a character in the FSR is O(1).
In all cases the FSR has better performance characteristics for indexing
and slicing than UTF-8.
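To make the O(N) claim concrete, here is a rough sketch of what indexing by code point looks like on raw UTF-8 bytes. It assumes the input is valid UTF-8, and the names `utf8_seq_len` and `utf8_index` are invented for the example; a real implementation would also validate continuation bytes.

```python
def utf8_seq_len(lead):
    """Length in bytes of the UTF-8 sequence starting with lead byte `lead`."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def utf8_index(data, index):
    """Return the character at code point position `index`.

    Every earlier character has to be stepped over, so the cost grows
    with `index`; a fixed-width layout would jump there directly.
    """
    pos = 0
    for _ in range(index):
        if pos >= len(data):
            raise IndexError("index out of range")
        pos += utf8_seq_len(data[pos])
    if pos >= len(data):
        raise IndexError("index out of range")
    return data[pos:pos + utf8_seq_len(data[pos])].decode("utf-8")


text = "a\u0081\u20ac\U00010000z"
encoded = text.encode("utf-8")
# The scan agrees with ordinary str indexing at every position.
assert all(utf8_index(encoded, i) == text[i] for i in range(len(text)))
```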
There are tradeoffs with both UTF-8 and the FSR. The Python developers
decided the priorities for Unicode handling in Python were:
1. Correctness
   a. all code points must be handled correctly;
   b. it must not be possible to obtain part of a code point (e.g. the
      first byte only of a multi-byte code point);
2. No change in the Big O characteristics of string operations e.g.
   indexing must remain O(1);
3. Reduced memory use in most cases.
It is impossible for UTF-8 to meet both criteria 1b and 2 without
additional auxiliary data (which uses more memory and increases complexity
of the implementation). The FSR meets all 3 criteria.
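Criterion 1b is easy to see with a small experiment. The snippet below is just an illustration using plain `bytes` as a stand-in for a byte-indexed UTF-8 string type; the variable names are arbitrary.

```python
text = "\u20ac10"                # EURO SIGN followed by "10"
encoded = text.encode("utf-8")   # the euro sign occupies 3 bytes here

# Byte-level slicing can cut a code point in half.
partial = encoded[:1]
try:
    partial.decode("utf-8")
    partial_is_valid = True
except UnicodeDecodeError:
    partial_is_valid = False

# Slicing the str object always works on whole code points.
first_char = text[:1]

print(partial_is_valid, first_char == "\u20ac")
```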
Tim Delaney
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2014-01-07 19:02 -0500 |
| Message-ID | <mailman.5153.1389139359.18130.python-list@python.org> |
| In reply to | #63427 |
On 1/7/2014 9:54 AM, Terry Reedy wrote:
> On 1/7/2014 8:34 AM, wxjmfauth@gmail.com wrote:
>> On Sunday, 5 January 2014 23:14:07 UTC+1, Terry Reedy wrote:
>>> Memory: Point 2. A *design goal* of FSR was to save memory relative to
>>> UTF-32, which is what you apparently prefer. Your examples show that FSR
>>> successfully met its design goal. But you call that success, saving
>>> memory, 'wrong'. On what basis?
>> Point 2: This Flexible String Representation does not
>> "effectuate" any memory optimization. It only succeeds
>> to do the opposite of what a correct usage of utf*
>> does.
> Since the FSR *was* successful in saving memory, and indeed shrank the
> Python binary by about a megabyte, I have no idea what you mean.
Tim Delaney apparently did, and answered on the basis of his
understanding. Note that I said that the design goal was 'save memory
RELATIVE TO UTF-32', not 'optimize memory'. UTF-8 was not considered an
option. Nor was any form of arithmetic coding
https://en.wikipedia.org/wiki/Arithmetic_coding
to truly 'optimize memory'.
--
Terry Jan Reedy
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2014-01-08 01:59 -0800 |
| Message-ID | <cd28325a-7c02-43be-b94c-fa29a20acf52@googlegroups.com> |
| In reply to | #63453 |
On Wednesday, 8 January 2014 01:02:22 UTC+1, Terry Reedy wrote:
> On 1/7/2014 9:54 AM, Terry Reedy wrote:
> > On 1/7/2014 8:34 AM, wxjmfauth@gmail.com wrote:
> >> On Sunday, 5 January 2014 23:14:07 UTC+1, Terry Reedy wrote:
> >>> Memory: Point 2. A *design goal* of FSR was to save memory relative to
> >>> UTF-32, which is what you apparently prefer. Your examples show that FSR
> >>> successfully met its design goal. But you call that success, saving
> >>> memory, 'wrong'. On what basis?
> >> Point 2: This Flexible String Representation does not
> >> "effectuate" any memory optimization. It only succeeds
> >> to do the opposite of what a correct usage of utf*
> >> does.
> > Since the FSR *was* successful in saving memory, and indeed shrank the
> > Python binary by about a megabyte, I have no idea what you mean.
> Tim Delaney apparently did, and answered on the basis of his
> understanding. Note that I said that the design goal was 'save memory
> RELATIVE TO UTF-32', not 'optimize memory'. UTF-8 was not considered an
> option. Nor was any form of arithmetic coding
> https://en.wikipedia.org/wiki/Arithmetic_coding
> to truly 'optimize memory'.
The FSR acts more as a coding scheme selector than
as a code point optimizer.
Claiming that it saves memory is some kind of illusion;
a little bit like saying "Py2.7 uses "relatively" less memory than
Py3.2 (UCS-2)".
>>> import sys
>>> sys.getsizeof('a' * 10000 + 'z')
10026
>>> sys.getsizeof('a' * 10000 + '€')
20040
>>> sys.getsizeof('a' * 10000 + '\U00010000')
40044
>>> sys.getsizeof('€' * 10000 + '€')
20040
>>> sys.getsizeof('€' * 10000 + '\U00010000')
40044
>>> sys.getsizeof('\U00010000' * 10000 + '\U00010000')
40044
jmf
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2014-01-08 14:26 -0500 |
| Message-ID | <mailman.5192.1389209238.18130.python-list@python.org> |
| In reply to | #63466 |
On 1/8/2014 4:59 AM, wxjmfauth@gmail.com wrote:
[responding to me]
> The FSR acts more as a coding scheme selector
That is what PEP 393 describes and what I and many others have said. The
FSR saves memory by selecting from three choices the most compact coding
scheme for each string.
I ask again, have you read PEP 393? If you are going to critique the
FSR, you should read its basic document.
> than as a code point optimizer.
I do not know what you mean by 'code point optimizer'.
> Claiming that it saves memory is some kind of illusion;
Do you really think that the mathematical fact "10026 < 20040 < 40044"
(from your example below) is some kind of illusion? If so, please take
your claim to a metaphysics list. If not, please stop trolling.
> a little bit like saying "Py2.7 uses "relatively" less memory than
> Py3.2 (UCS-2)".
This is inane as 2.7 and 3.2 both use the same two coding schemes.
Saying '1 < 2' is different from saying '2 < 2'.
On 3.3+
> >>> sys.getsizeof('a' * 10000 + 'z')
> 10026
> >>> sys.getsizeof('a' * 10000 + '€')
> 20040
> >>> sys.getsizeof('a' * 10000 + '\U00010000')
> 40044
3.2- wide (UCS-4) builds use about 40050 bytes for all three unicode
strings. Once again, you have posted examples that show how FSR saves
memory, thus negating your denial of the saving.
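The comparison with a wide build can be approximated without an old interpreter. The sketch below uses the `utf-32-le` codec as a stand-in for a fixed 4-bytes-per-character layout, which is an assumption made for illustration; `fsr_width` is a made-up helper that only mirrors the three-way choice described in PEP 393, and object headers are left out.

```python
def fsr_width(s):
    """Bytes per code point under the FSR rule (hypothetical helper)."""
    highest = max(map(ord, s), default=0)
    return 1 if highest <= 0xFF else 2 if highest <= 0xFFFF else 4


samples = {
    "ascii": "a" * 10000 + "z",
    "bmp": "a" * 10000 + "\u20ac",
    "astral": "a" * 10000 + "\U00010000",
}

# (FSR payload, fixed 4-byte payload) for each of the three strings.
payloads = {
    name: (len(s) * fsr_width(s), len(s.encode("utf-32-le")))
    for name, s in samples.items()
}

for name, (fsr, wide) in payloads.items():
    print(name, fsr, wide)
```

Only the third string needs the full 4 bytes per character; the fixed-width layout pays that price for all three.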
--
Terry Jan Reedy
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2014-01-08 20:04 +0000 |
| Message-ID | <mailman.5195.1389211486.18130.python-list@python.org> |
| In reply to | #63427 |
On 07/01/2014 13:34, wxjmfauth@gmail.com wrote:
> On Sunday, 5 January 2014 23:14:07 UTC+1, Terry Reedy wrote:
>
> Ned : this has already been explained and illustrated.
>
> jmf
>
This has never been explained and illustrated. Roughly 30 minutes ago
Terry Reedy once again completely shot your argument about memory usage
to pieces. You did not bother to respond to the comments from Tim
Delaney made almost one day ago. Please give up.
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.
Mark Lawrence