

Groups > comp.lang.python > #62898 > unrolled thread

Blog "about python 3"

Started by: Mark Lawrence <breamoreboy@yahoo.co.uk>
First post: 2013-12-30 19:41 +0000
Last post: 2013-12-30 20:25 -0800
Articles: 20 on this page of 82 (19 participants)



Contents

  Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-30 19:41 +0000
    Re: Blog "about python 3" Steven D'Aprano <steve@pearwood.info> - 2013-12-30 20:49 +0000
      Re: Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-30 21:29 +0000
      Re: Blog "about python 3" Ethan Furman <ethan@stoneleaf.us> - 2013-12-30 14:38 -0800
      Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2013-12-31 12:09 +1100
      Re: Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-31 04:38 +0000
      Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2013-12-31 15:44 +1100
      Re: Blog "about python 3" Ethan Furman <ethan@stoneleaf.us> - 2013-12-30 20:33 -0800
      Re: Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-31 04:59 +0000
      Re: Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-31 08:22 +0000
        Re: Blog "about python 3" Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-31 20:53 +1100
          Re: Blog "about python 3" Antoine Pitrou <solipsis@pitrou.net> - 2013-12-31 14:13 +0000
            Re: Blog "about python 3" Roy Smith <roy@panix.com> - 2013-12-31 10:41 -0500
              Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-01 02:54 +1100
              Re: Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-31 15:55 +0000
              Re: Blog "about python 3" Robin Becker <robin@reportlab.com> - 2014-01-02 17:36 +0000
                Re: Blog "about python 3" Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-03 15:49 +1100
                  Re: Blog "about python 3" Terry Reedy <tjreedy@udel.edu> - 2014-01-03 04:01 -0500
                    Re: Blog "about python 3" wxjmfauth@gmail.com - 2014-01-03 02:10 -0800
                      Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-03 21:24 +1100
                      Re: Blog "about python 3" Ethan Furman <ethan@stoneleaf.us> - 2014-01-03 08:56 -0800
                  Re: Blog "about python 3" Robin Becker <robin@reportlab.com> - 2014-01-03 12:28 +0000
                    Re: Blog "about python 3" Roy Smith <roy@panix.com> - 2014-01-03 09:57 -0500
                      Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-04 02:32 +1100
                  Re: Blog "about python 3" Terry Reedy <tjreedy@udel.edu> - 2014-01-03 17:00 -0500
                  Re: Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-01-04 04:04 +0000
                    Re: Blog "about python 3" Roy Smith <roy@panix.com> - 2014-01-04 08:55 -0500
                      Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-05 01:17 +1100
                        Re: Blog "about python 3" wxjmfauth@gmail.com - 2014-01-04 11:10 -0800
                          Re: Blog "about python 3" Terry Reedy <tjreedy@udel.edu> - 2014-01-04 17:46 -0500
                            Re: Blog "about python 3" wxjmfauth@gmail.com - 2014-01-05 06:23 -0800
                              Re: Blog "about python 3" Ned Batchelder <ned@nedbatchelder.com> - 2014-01-05 10:20 -0500
                              Re: Blog "about python 3" Terry Reedy <tjreedy@udel.edu> - 2014-01-05 17:14 -0500
                                Re: Blog "about python 3" wxjmfauth@gmail.com - 2014-01-07 05:34 -0800
                                  Re: Blog "about python 3" Terry Reedy <tjreedy@udel.edu> - 2014-01-07 09:54 -0500
                                  Re: Blog "about python 3" Tim Delaney <timothy.c.delaney@gmail.com> - 2014-01-08 09:38 +1100
                                  Re: Blog "about python 3" Terry Reedy <tjreedy@udel.edu> - 2014-01-07 19:02 -0500
                                    Re: Blog "about python 3" wxjmfauth@gmail.com - 2014-01-08 01:59 -0800
                                      Re: Blog "about python 3" Terry Reedy <tjreedy@udel.edu> - 2014-01-08 14:26 -0500
                                  Re: Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-01-08 20:04 +0000
                              Re: Blog "about python 3" Terry Reedy <tjreedy@udel.edu> - 2014-01-05 17:48 -0500
                          Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-05 10:28 +1100
                      Re: Blog "about python 3" Ned Batchelder <ned@nedbatchelder.com> - 2014-01-04 12:51 -0500
                      Re: Blog "about python 3" Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-05 13:27 +1100
                        Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-05 13:32 +1100
                        Re: Blog "about python 3" MRAB <python@mrabarnett.plus.com> - 2014-01-05 02:41 +0000
                        Re: Blog "about python 3" Roy Smith <roy@panix.com> - 2014-01-04 22:20 -0500
                          Re: Blog "about python 3" Rustom Mody <rustompmody@gmail.com> - 2014-01-05 10:12 +0530
                            Re: Blog "about python 3" Roy Smith <roy@panix.com> - 2014-01-05 00:11 -0500
                          Re: Blog "about python 3" Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-05 17:28 +1100
                            Re: Blog "about python 3" Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2014-01-05 14:05 -0500
                          Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-05 15:01 +1100
                            Re: Blog "about python 3" Roy Smith <roy@panix.com> - 2014-01-05 11:34 -0500
                              Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-06 03:51 +1100
                                Re: Blog "about python 3" Roy Smith <roy@panix.com> - 2014-01-05 12:09 -0500
                                Re: Blog "about python 3" Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-06 11:42 +1100
                              Re: Blog "about python 3" Terry Reedy <tjreedy@udel.edu> - 2014-01-05 17:56 -0500
                              Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-06 10:59 +1100
                              Re: Blog "about python 3" Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-06 12:23 +1100
                                Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-06 12:54 +1100
                                Re: Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-01-06 05:53 +0000
                        Re: Blog "about python 3" Devin Jeanpierre <jeanpierreda@gmail.com> - 2014-01-05 00:00 -0800
                          Re: Blog "about python 3" Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-05 23:28 +1100
                            Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-05 23:48 +1100
                            Re: Blog "about python 3" Roy Smith <roy@panix.com> - 2014-01-05 11:10 -0500
                        Re: Blog "about python 3" Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2014-01-05 13:51 -0500
              Re: Blog "about python 3" David Hutto <dwightdhutto@gmail.com> - 2014-01-02 13:25 -0500
              Re: Blog "about python 3" Terry Reedy <tjreedy@udel.edu> - 2014-01-02 13:37 -0500
              Re: Blog "about python 3" Antoine Pitrou <solipsis@pitrou.net> - 2014-01-02 23:57 +0000
              Re: Blog "about python 3" Robin Becker <robin@reportlab.com> - 2014-01-03 10:32 +0000
              Re: Blog "about python 3" Robin Becker <robin@reportlab.com> - 2014-01-03 11:14 +0000
                Re: Blog "about python 3" wxjmfauth@gmail.com - 2014-01-04 05:52 -0800
                  Re: Blog "about python 3" Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-05 13:41 +1100
                    Re: Blog "about python 3" Chris Angelico <rosuav@gmail.com> - 2014-01-05 13:54 +1100
                      Re: Blog "about python 3" wxjmfauth@gmail.com - 2014-01-05 02:39 -0800
              Re: Blog "about python 3" Robin Becker <robin@reportlab.com> - 2014-01-03 11:37 +0000
              Re: Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-01-04 07:30 +0000
          Re: Blog "about python 3" Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-01-05 13:14 +0100
            Re: Blog "about python 3" Stefan Behnel <stefan_ml@behnel.de> - 2014-01-05 14:55 +0100
          Re: Blog "about python 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-01-05 13:10 +0000
      Re: Blog "about python 3" Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-31 20:04 +1100
      Re: Blog "about python 3" Devin Jeanpierre <jeanpierreda@gmail.com> - 2013-12-30 20:25 -0800



#63074

From: Ethan Furman <ethan@stoneleaf.us>
Date: 2014-01-03 08:56 -0800
Message-ID: <mailman.4863.1388769702.18130.python-list@python.org>
In reply to: #63047
On 01/03/2014 02:24 AM, Chris Angelico wrote:
>
> I worked that out with a sheet of paper and a pencil. The pencil was a
> little help, but the paper was three sheets in the wind.

Beautiful!

--
~Ethan~



#63055

From: Robin Becker <robin@reportlab.com>
Date: 2014-01-03 12:28 +0000
Message-ID: <mailman.4850.1388752146.18130.python-list@python.org>
In reply to: #63036
On 03/01/2014 09:01, Terry Reedy wrote:
> There was more speedup in 3.3.2 and possibly even more in 3.3.3, so OP
> should run the latter.

python 3.3.3 is what I use on windows. As for astral / non-bmp etc etc that's 
almost irrelevant for the sort of tests we're doing which are mostly simple 
english text.
-- 
Robin Becker



#63057

From: Roy Smith <roy@panix.com>
Date: 2014-01-03 09:57 -0500
Message-ID: <roy-5D9224.09574803012014@news.panix.com>
In reply to: #63055
In article <mailman.4850.1388752146.18130.python-list@python.org>,
 Robin Becker <robin@reportlab.com> wrote:

> On 03/01/2014 09:01, Terry Reedy wrote:
> > There was more speedup in 3.3.2 and possibly even more in 3.3.3, so OP
> > should run the latter.
> 
> python 3.3.3 is what I use on windows. As for astral / non-bmp etc etc that's 
> almost irrelevant for the sort of tests we're doing which are mostly simple 
> english text.

The sad part is, if you're accepting any text from external sources, you 
need to be able to deal with astral.

I was doing a project a while ago importing 20-something million records 
into a MySQL database.  Little did I know that FOUR of those records 
contained astral characters (which MySQL, at least the version I was 
using, couldn't handle).

My way of dealing with those records was to nuke them.  Longer term we 
ended up switching to Postgres.
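
A minimal sketch of how such records could be flagged before an import rather than discovered afterwards, assuming Python 3.3 or later, where iterating over a str yields whole code points. The helper names has_astral and partition_records are illustrative and do not come from the thread.

```python
def has_astral(text):
    """Return True if any code point lies outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in text)


def partition_records(records):
    """Split records into (bmp_only, with_astral) lists, preserving order."""
    bmp_only, with_astral = [], []
    for record in records:
        (with_astral if has_astral(record) else bmp_only).append(record)
    return bmp_only, with_astral


# Hypothetical sample data: two BMP-only records and one with an astral character.
records = ["plain english", "price: \u20ac5", "smile \U0001F600"]
bmp_only, with_astral = partition_records(records)
```

Setting the flagged records aside for inspection keeps the data, which is the alternative to nuking them that the replies argue for.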



#63061

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-01-04 02:32 +1100
Message-ID: <mailman.4854.1388763458.18130.python-list@python.org>
In reply to: #63057
On Sat, Jan 4, 2014 at 1:57 AM, Roy Smith <roy@panix.com> wrote:
> I was doing a project a while ago importing 20-something million records
> into a MySQL database.  Little did I know that FOUR of those records
> contained astral characters (which MySQL, at least the version I was
> using, couldn't handle).
>
> My way of dealing with those records was to nuke them.  Longer term we
> ended up switching to Postgres.

Look! Postgres means you don't lose data!!

Seriously though, that's a much better long-term solution than
destroying data. But MySQL does support the full Unicode range - just
not in its "UTF8" type. You have to specify "UTF8MB4" - that is,
"maximum bytes 4" rather than the default of 3. According to [1], the
UTF8MB4 encoding is stored as UTF-16, and UTF8 is stored as UCS-2. And
according to [2], it's even possible to explicitly choose the
mindblowing behaviour of UCS-2 for a data type that calls itself
"UTF8", so that a vague theoretical subsequent version of MySQL might
be able to make "UTF8" mean UTF-8, and people can choose to use the
other alias.

To my mind, this is a bug with backward-compatibility concerns. That
means it can't be fixed in a point release. Fine. But the behaviour
change is "this used to throw an error, now it works". Surely that can
be fixed in the next release. Or surely a version or two of
deprecating "UTF8" in favour of the two "MB?" types (and never ever
returning "UTF8" from any query), followed by a reintroduction of
"UTF8" as an alias for MB4, and the deprecation of MB3. Or am I
spoiled by the quality of Python (and other) version numbering, where
I can (largely) depend on functionality not changing in point
releases?

ChrisA

[1] http://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8mb4.html
[2] http://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8mb3.html
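
The three-byte versus four-byte distinction can be checked from Python. This is a sketch only: it does not talk to MySQL, and fits_in_three_byte_utf8 is a hypothetical helper that merely tests whether every character's UTF-8 encoding is at most three bytes long.

```python
def utf8_width(ch):
    """Number of bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_three_byte_utf8(text):
    """True if no character in text needs a four-byte UTF-8 sequence."""
    return all(utf8_width(ch) <= 3 for ch in text)


widths = {
    "ascii": utf8_width("A"),             # U+0041
    "latin-1": utf8_width("\xe9"),        # U+00E9
    "bmp": utf8_width("\u20ac"),          # U+20AC
    "astral": utf8_width("\U0001F600"),   # U+1F600
}
```

Every BMP character fits in three bytes or fewer, so a three-byte limit excludes exactly the astral characters.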



#63088

From: Terry Reedy <tjreedy@udel.edu>
Date: 2014-01-03 17:00 -0500
Message-ID: <mailman.4870.1388786452.18130.python-list@python.org>
In reply to: #63036
On 1/3/2014 7:28 AM, Robin Becker wrote:
> On 03/01/2014 09:01, Terry Reedy wrote:
>> There was more speedup in 3.3.2 and possibly even more in 3.3.3, so OP
>> should run the latter.
>
> python 3.3.3 is what I use on windows. As for astral / non-bmp etc etc
> that's almost irrelevant for the sort of tests we're doing which are
> mostly simple english text.

If you do not test the cases where 2.7 is buggy and requires nasty 
workarounds, then I can understand why you do not so much appreciate 3.3 
;-).

-- 
Terry Jan Reedy



#63104

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2014-01-04 04:04 +0000
Message-ID: <mailman.4882.1388808283.18130.python-list@python.org>
In reply to: #63036
On 03/01/2014 22:00, Terry Reedy wrote:
> On 1/3/2014 7:28 AM, Robin Becker wrote:
>> On 03/01/2014 09:01, Terry Reedy wrote:
>>> There was more speedup in 3.3.2 and possibly even more in 3.3.3, so OP
>>> should run the latter.
>>
>> python 3.3.3 is what I use on windows. As for astral / non-bmp etc etc
>> that's almost irrelevant for the sort of tests we're doing which are
>> mostly simple english text.
>
> If you do not test the cases where 2.7 is buggy and requires nasty
> workarounds, then I can understand why you do not so much appreciate 3.3
> ;-).
>

Are you crazy?  Surely everybody prefers fast but incorrect code in 
preference to something that is correct but slow? Except that Python 
3.3.3 is often faster.  And always (to my knowledge) correct.  Upper 
Class Twit of the Year anybody? :)

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence



#63134

From: Roy Smith <roy@panix.com>
Date: 2014-01-04 08:55 -0500
Message-ID: <roy-1820F1.08551004012014@news.panix.com>
In reply to: #63104
In article <mailman.4882.1388808283.18130.python-list@python.org>,
 Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:

> Surely everybody prefers fast but incorrect code in 
> preference to something that is correct but slow?

I realize I'm taking this statement out of context, but yes, sometimes 
fast is more important than correct.  Sometimes the other way around.



#63135

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-01-05 01:17 +1100
Message-ID: <mailman.4905.1388845063.18130.python-list@python.org>
In reply to: #63134
On Sun, Jan 5, 2014 at 12:55 AM, Roy Smith <roy@panix.com> wrote:
> In article <mailman.4882.1388808283.18130.python-list@python.org>,
>  Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:
>
>> Surely everybody prefers fast but incorrect code in
>> preference to something that is correct but slow?
>
> I realize I'm taking this statement out of context, but yes, sometimes
> fast is more important than correct.  Sometimes the other way around.

More usually, it's sometimes better to be really fast and mostly
correct than really really slow and entirely correct. That's why we
use IEEE floating point instead of Decimal most of the time. Though
I'm glad that Python 3 now deems the default int type to be capable of
representing arbitrary integers (instead of dropping out to a separate
long type as Py2 did), I think it's possibly worth optimizing small
integers to machine words - but mainly, the int type focuses on
correctness above performance, because the cost is low compared to the
benefit. With float, the cost of arbitrary precision is extremely
high, and the benefit much lower.
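
A short sketch of the trade-off described above, using only the standard library. The variable names are illustrative.

```python
from decimal import Decimal

# Binary floating point is fast but cannot represent 0.1 or 0.2 exactly.
float_sum = 0.1 + 0.2
float_is_exact = (float_sum == 0.3)

# Decimal represents these values exactly, at a higher cost per operation.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_is_exact = (decimal_sum == Decimal("0.3"))

# Python 3 ints grow as needed; there is no separate long type.
big = 2 ** 100
```

Here float_is_exact is False and decimal_is_exact is True, while big stays an ordinary int well beyond the range of a machine word.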

With Unicode, the cost of perfect support is normally seen to be a
doubling of internal memory usage (UTF-16 vs UCS-4). Pike and Python
decided that the cost could, instead, be a tiny measure of complexity
and actually *less* memory usage (compared to UTF-16, when lots of
identifiers are ASCII). It's a system that works only when strings are
immutable, but works beautifully there. Fortunately Pike doesn't have
any, and Python has only one, idiot like jmf who completely
misunderstands what's going on and uses microbenchmarks to prove
obscure points... and then uses nonsense to try to prove... uhh...
actually I'm not even sure what, sometimes. I wouldn't dare try to
read his posts except that my mind's already in a rather broken state,
as a combination of programming and Alice in Wonderland.

ChrisA
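
The memory claim about Python's representation can be observed directly. This is a sketch that assumes CPython 3.3 or later; sys.getsizeof reports implementation-specific sizes, and bytes_per_char is a hypothetical helper that subtracts two sizes so the fixed object header cancels out.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per character from two strings of one repeated character."""
    shorter, longer = ch * n, ch * (2 * n)
    return (sys.getsizeof(longer) - sys.getsizeof(shorter)) // n


widths = {
    "ascii": bytes_per_char("a"),
    "latin-1": bytes_per_char("\xe9"),
    "bmp": bytes_per_char("\u20ac"),
    "astral": bytes_per_char("\U0001F600"),
}
```

On CPython the widths come out as 1, 1, 2 and 4 bytes: each string is stored using the narrowest width that fits its widest character.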



#63140

From: wxjmfauth@gmail.com
Date: 2014-01-04 11:10 -0800
Message-ID: <3519f85e-0909-4f5a-9a6e-09b6fd4c312d@googlegroups.com>
In reply to: #63135
On Saturday, 4 January 2014 15:17:40 UTC+1, Chris Angelico wrote:
> On Sun, Jan 5, 2014 at 12:55 AM, Roy Smith <roy@panix.com> wrote:
> > In article <mailman.4882.1388808283.18130.python-list@python.org>,
> >  Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:
> >
> >> Surely everybody prefers fast but incorrect code in
> >> preference to something that is correct but slow?
> >
> > I realize I'm taking this statement out of context, but yes, sometimes
> > fast is more important than correct.  Sometimes the other way around.
>
> More usually, it's sometimes better to be really fast and mostly
> correct than really really slow and entirely correct. That's why we
> use IEEE floating point instead of Decimal most of the time. Though
> I'm glad that Python 3 now deems the default int type to be capable of
> representing arbitrary integers (instead of dropping out to a separate
> long type as Py2 did), I think it's possibly worth optimizing small
> integers to machine words - but mainly, the int type focuses on
> correctness above performance, because the cost is low compared to the
> benefit. With float, the cost of arbitrary precision is extremely
> high, and the benefit much lower.
>
> With Unicode, the cost of perfect support is normally seen to be a
> doubling of internal memory usage (UTF-16 vs UCS-4). Pike and Python
> decided that the cost could, instead, be a tiny measure of complexity
> and actually *less* memory usage (compared to UTF-16, when lots of
> identifiers are ASCII). It's a system that works only when strings are
> immutable, but works beautifully there. Fortunately Pike doesn't have
> any, and Python has only one, idiot like jmf who completely
> misunderstands what's going on and uses microbenchmarks to prove
> obscure points... and then uses nonsense to try to prove... uhh...
> actually I'm not even sure what, sometimes. I wouldn't dare try to
> read his posts except that my mind's already in a rather broken state,
> as a combination of programming and Alice in Wonderland.


I do not mind being considered an idiot, but
I'm definitely not blind.

And I could add that I have *never* once seen anyone
explain what I'm doing wrong in the gazillion
examples I gave on this list.

---

Back to ReportLab. Technically, I would be really
interested to see what would happen in the light
of my previous post.

jmf



#63151

From: Terry Reedy <tjreedy@udel.edu>
Date: 2014-01-04 17:46 -0500
Message-ID: <mailman.4915.1388875627.18130.python-list@python.org>
In reply to: #63140
On 1/4/2014 2:10 PM, wxjmfauth@gmail.com wrote:
> On Saturday, 4 January 2014 15:17:40 UTC+1, Chris Angelico wrote:

>> any, and Python has only one, idiot like jmf who completely

Chris, I appreciate the many contributions you make to this list, but 
that does not exempt you from our standards of conduct.

>> misunderstands what's going on and uses microbenchmarks to prove
>> obscure points... and then uses nonsense to try to prove... uhh...

Troll baiting is a form of trolling. I think you are intelligent enough 
to know this.  Please stop.

> I do not mind being considered an idiot, but
> I'm definitely not blind.
>
> And I could add that I have *never* once seen anyone
> explain what I'm doing wrong in the gazillion
> examples I gave on this list.

If this is true, it is because you have ignored and not read my 
numerous, relatively polite posts. To repeat very briefly:

1. Cherry picking (presenting the most extreme case as representative).

2. Calling space saving a problem (repeatedly).

3. Ignoring bug fixes.

4. Repetition (of the 'gazillion example' without new content).

Have you ever acknowledged, let alone thanked people for, the fix for the 
one bad regression you did find? The FSR is still a work in progress. 
Just today, Serhiy pushed a patch speeding up the UTF-32 encoder, after 
previously speeding up the UTF-32 decoder.

-- 
Terry Jan Reedy



#63194

From: wxjmfauth@gmail.com
Date: 2014-01-05 06:23 -0800
Message-ID: <d8438ee4-1429-4855-9d78-b833f4f2748f@googlegroups.com>
In reply to: #63151
On Saturday, 4 January 2014 23:46:49 UTC+1, Terry Reedy wrote:
> On 1/4/2014 2:10 PM, wxjmfauth@gmail.com wrote:
> > On Saturday, 4 January 2014 15:17:40 UTC+1, Chris Angelico wrote:
>
> >> any, and Python has only one, idiot like jmf who completely
>
> Chris, I appreciate the many contributions you make to this list, but
> that does not exempt you from our standards of conduct.
>
> >> misunderstands what's going on and uses microbenchmarks to prove
> >> obscure points... and then uses nonsense to try to prove... uhh...
>
> Troll baiting is a form of trolling. I think you are intelligent enough
> to know this.  Please stop.
>
> > I do not mind being considered an idiot, but
> > I'm definitely not blind.
> >
> > And I could add that I have *never* once seen anyone
> > explain what I'm doing wrong in the gazillion
> > examples I gave on this list.
>
> If this is true, it is because you have ignored and not read my
> numerous, relatively polite posts. To repeat very briefly:
>
> 1. Cherry picking (presenting the most extreme case as representative).
>
> 2. Calling space saving a problem (repeatedly).
>
> 3. Ignoring bug fixes.
>
> 4. Repetition (of the 'gazillion example' without new content).
>
> Have you ever acknowledged, let alone thanked people for, the fix for the
> one bad regression you did find? The FSR is still a work in progress.
> Just today, Serhiy pushed a patch speeding up the UTF-32 encoder, after
> previously speeding up the UTF-32 decoder.

My examples are ONLY ILLUSTRATIONS: this FSR
is wrong by design, whether on the side of
memory, performance, linguistics or even
typography.

I will not stop you from wasting your time
adjusting bytes if the problem is not
on that side.

jmf



#63196

From: Ned Batchelder <ned@nedbatchelder.com>
Date: 2014-01-05 10:20 -0500
Message-ID: <mailman.4948.1388935226.18130.python-list@python.org>
In reply to: #63194
On 1/5/14 9:23 AM, wxjmfauth@gmail.com wrote:
> On Saturday, 4 January 2014 23:46:49 UTC+1, Terry Reedy wrote:
>> On 1/4/2014 2:10 PM, wxjmfauth@gmail.com wrote:
>>> I do not mind being considered an idiot, but
>>> I'm definitely not blind.
>>>
>>> And I could add that I have *never* once seen anyone
>>> explain what I'm doing wrong in the gazillion
>>> examples I gave on this list.
>>
>> If this is true, it is because you have ignored and not read my
>> numerous, relatively polite posts. To repeat very briefly:
>>
>> 1. Cherry picking (presenting the most extreme case as representative).
>>
>> 2. Calling space saving a problem (repeatedly).
>>
>> 3. Ignoring bug fixes.
>>
>> 4. Repetition (of the 'gazillion example' without new content).
>>
>> Have you ever acknowledged, let alone thanked people for, the fix for the
>> one bad regression you did find? The FSR is still a work in progress.
>> Just today, Serhiy pushed a patch speeding up the UTF-32 encoder, after
>> previously speeding up the UTF-32 decoder.
>>
>> --
>
> My examples are ONLY ILLUSTRATIONS: this FSR
> is wrong by design, whether on the side of
> memory, performance, linguistics or even
> typography.

JMF: this has been pointed out to you time and again: the flexible 
string representation is not wrong.  To show that it is wrong, you would 
have to demonstrate some semantic of Unicode that is violated.  You have 
never done this.  You've picked pathological cases and shown 
micro-timing output, and memory usage.  The Unicode standard doesn't 
promise anything about timing or memory use.

The FSR makes a trade-off of time and space.  Everyone but you considers 
it a good trade-off.  I don't think you are showing real use cases, but 
if they are, I'm sorry that your use-case suffers.  That doesn't make 
the FSR wrong. The most accurate statement is that you don't like the 
FSR.  That's fine, you're entitled to your opinion.

You say the FSR is wrong linguistically.  This can't be true, since an 
FSR Unicode string is indistinguishable from an internally-UTF-32 
Unicode string, and no, memory use or timings are irrelevant when 
discussing the linguistic performance of a Unicode string.

You've also said that the internal representation of the FSR is 
incorrect because of encodings somehow.  Encodings have nothing to do 
with the internal representation of a Unicode string, they are for 
interchanging data.  You seem to know a lot about Unicode, but when you 
make this fundamental mistake, you call all of your expertise into question.
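
Both points can be illustrated with a sketch, assuming Python 3.3 or later: a string behaves as a sequence of code points whatever its storage width, and encodings come into play only when the string is converted to bytes for interchange. The sample string and variable names are illustrative.

```python
# One ASCII, one Latin-1, one BMP and one astral character.
s = "a\xe9\u20ac\U0001F600"

code_points = [ord(ch) for ch in s]
length = len(s)          # counts code points, not bytes or code units
last = s[-1]             # indexing returns a whole code point
reversed_s = s[::-1]

# Each encoding yields different bytes, yet every round trip restores s.
round_trips = {
    enc: s.encode(enc).decode(enc) == s
    for enc in ("utf-8", "utf-16", "utf-32")
}
byte_lengths = {
    enc: len(s.encode(enc))
    for enc in ("utf-8", "utf-16-le", "utf-32-le")
}
```

Nothing in code_points, length, last or reversed_s depends on how the interpreter chose to store the string, which is why timings and memory use cannot show a semantic error.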

To re-iterate what you are doing wrong:

1) You continue to claim things that are not true, and that you have 
never substantiated.

2) You paste code samples without accompanying text that explain what 
you are trying to demonstrate.

3) You ignore refutations that disprove your points.

These are all the behaviors of a troll.  Please stop.

If you want to discuss the details of Unicode implementations, I'd 
welcome an offlist discussion, but only if you will approach it honestly 
enough to leave open the possibility that you are wrong.  I know I would 
be glad to learn details of Unicode that I have missed, but so far you 
haven't provided any.

--Ned.

>
> I will not stop you from wasting your time
> adjusting bytes if the problem is not
> on that side.
>
> jmf
>


-- 
Ned Batchelder, http://nedbatchelder.com



#63232

From: Terry Reedy <tjreedy@udel.edu>
Date: 2014-01-05 17:14 -0500
Message-ID: <mailman.4976.1388960067.18130.python-list@python.org>
In reply to: #63194
On 1/5/2014 9:23 AM, wxjmfauth@gmail.com wrote:
> On Saturday, 4 January 2014 23:46:49 UTC+1, Terry Reedy wrote:
>> On 1/4/2014 2:10 PM, wxjmfauth@gmail.com wrote:
>>> And I could add that I have *never* once seen anyone
>>> explain what I'm doing wrong in the gazillion
>>> examples I gave on this list.

>> If this is true, it is because you have ignored and not read my
>> numerous, relatively polite posts. To repeat very briefly:
>> 1. Cherry picking (presenting the most extreme case as representative).
>> 2. Calling space saving a problem (repeatedly).
>> 3. Ignoring bug fixes.
...

> My examples are ONLY ILLUSTRATIONS: this FSR
> is wrong by design, whether on the side of
> memory, performance, linguistics or even
> typography.

Let me expand on 3 of my points. First, performance == time:

Point 3. You correctly identified a time regression in finding a 
character in a string. I saw that the slowdown was *not* inherent in the 
FSR but had to be a glitch in the code, and reported it on pydev with 
the hope that someone would fix it even if it were not too important in 
real use cases. Someone did.
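
A sketch of the kind of micro-benchmark under discussion, using the standard timeit module. The haystack sizes and the time_find helper are assumptions, not the code from the original report, and the timings vary by machine and Python version, so no figures are given here.

```python
import timeit


def time_find(haystack, needle, number=1000):
    """Total seconds taken by `number` calls of haystack.find(needle)."""
    return timeit.timeit(lambda: haystack.find(needle), number=number)


# Three haystacks that CPython 3.3+ stores with 1, 2 and 4 bytes per character.
haystacks = {
    "latin-1": "a" * 100000 + "\xe9",
    "bmp": "a" * 100000 + "\u20ac",
    "astral": "a" * 100000 + "\U0001F600",
}

# Search each haystack for its final character, the worst case for a scan.
timings = {kind: time_find(text, text[-1]) for kind, text in haystacks.items()}
```

A single measurement like this shows one operation on one input shape; it is not evidence about overall string performance.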

Point 1. You incorrectly generalized that extreme case. I reported (a 
year ago last September) that the overall stringbench results were about 
the same. I also pointed out that there is an equally non-representative 
extreme case in the opposite direction, and that it would equally be 
wrong of me to use that to claim that FSR is faster. (It turns out that 
this FSR speed advantage *is* inherent in the design.)

Memory: Point 2. A *design goal* of FSR was to save memory relative to 
UTF-32, which is what you apparently prefer. Your examples show that FSR 
successfully met its design goal. But you call that success, saving 
memory, 'wrong'. On what basis?

You *claim* the FSR is 'wrong by design', but your examples only show 
that it was temporarily wrong in implementation as far as speed and 
correct by design as far as memory goes.
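
The memory saving relative to a fixed four-byte layout can be sketched as follows, assuming CPython 3.3 or later. The UTF-32 encoded length stands in for the payload of a hypothetical fixed-width string and ignores any object header, so the comparison is favourable to the fixed-width side.

```python
import sys

# Hypothetical sample: ASCII-only text, 27 characters repeated 1000 times.
text = "mostly simple english text " * 1000

# Size of the str object as CPython reports it (header plus character data).
fsr_bytes = sys.getsizeof(text)

# Payload alone at four bytes per code point; "utf-32-le" adds no byte order mark.
utf32_payload_bytes = len(text.encode("utf-32-le"))
```

For ASCII-only text fsr_bytes is a little over one byte per character, against four bytes per character for the fixed-width payload.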

-- 
Terry Jan Reedy



#63427

From: wxjmfauth@gmail.com
Date: 2014-01-07 05:34 -0800
Message-ID: <2fbf4f89-caaa-4fab-8d7e-ff7ef84029a2@googlegroups.com>
In reply to: #63232
On Sunday, 5 January 2014 23:14:07 UTC+1, Terry Reedy wrote:
> On 1/5/2014 9:23 AM, wxjmfauth@gmail.com wrote:
> > On Saturday, 4 January 2014 23:46:49 UTC+1, Terry Reedy wrote:
> >> On 1/4/2014 2:10 PM, wxjmfauth@gmail.com wrote:
> >>> And I could add that I have *never* once seen anyone
> >>> explain what I'm doing wrong in the gazillion
> >>> examples I gave on this list.
>
> >> If this is true, it is because you have ignored and not read my
> >> numerous, relatively polite posts. To repeat very briefly:
> >> 1. Cherry picking (presenting the most extreme case as representative).
> >> 2. Calling space saving a problem (repeatedly).
> >> 3. Ignoring bug fixes.
> ...
>
> > My examples are ONLY ILLUSTRATIONS: this FSR
> > is wrong by design, whether on the side of
> > memory, performance, linguistics or even
> > typography.
>
> Let me expand on 3 of my points. First, performance == time:
>
> Point 3. You correctly identified a time regression in finding a
> character in a string. I saw that the slowdown was *not* inherent in the
> FSR but had to be a glitch in the code, and reported it on pydev with
> the hope that someone would fix it even if it were not too important in
> real use cases. Someone did.
>
> Point 1. You incorrectly generalized that extreme case. I reported (a
> year ago last September) that the overall stringbench results were about
> the same. I also pointed out that there is an equally non-representative
> extreme case in the opposite direction, and that it would equally be
> wrong of me to use that to claim that FSR is faster. (It turns out that
> this FSR speed advantage *is* inherent in the design.)
>
> Memory: Point 2. A *design goal* of FSR was to save memory relative to
> UTF-32, which is what you apparently prefer. Your examples show that FSR
> successfully met its design goal. But you call that success, saving
> memory, 'wrong'. On what basis?
>
> You *claim* the FSR is 'wrong by design', but your examples only show
> that it was temporarily wrong in implementation as far as speed and
> correct by design as far as memory goes.

Point 3: You are right. I'm very happy to agree.

Point 2: This Flexible String Representation does not
"effectuate" any memory optimization. It only succeeds
in doing the opposite of what a correct usage of utf*
does.

Ned: this has already been explained and illustrated.

jmf



#63429

From: Terry Reedy <tjreedy@udel.edu>
Date: 2014-01-07 09:54 -0500
Message-ID: <mailman.5134.1389106482.18130.python-list@python.org>
In reply to: #63427

On 1/7/2014 8:34 AM, wxjmfauth@gmail.com wrote:
> On Sunday, 5 January 2014 23:14:07 UTC+1, Terry Reedy wrote:

>> Memory: Point 2. A *design goal* of FSR was to save memory relative to
>> UTF-32, which is what you apparently prefer. Your examples show that FSR
>> successfully met its design goal. But you call that success, saving
>> memory, 'wrong'. On what basis?

> Point 2: This Flexible String Representation does not
> "effectuate" any memory optimization. It only succeeds
> in doing the opposite of what a correct usage of utf*
> does.

Since the FSR *was* successful in saving memory, and indeed shrank the
Python binary by about a megabyte, I have no idea what you mean.

-- 
Terry Jan Reedy



#63451

From: Tim Delaney <timothy.c.delaney@gmail.com>
Date: 2014-01-08 09:38 +1100
Message-ID: <mailman.5152.1389134341.18130.python-list@python.org>
In reply to: #63427

On 8 January 2014 00:34, <wxjmfauth@gmail.com> wrote:

>
> Point 2: This Flexible String Representation does not
> "effectuate" any memory optimization. It only succeeds
> in doing the opposite of what a correct usage of utf*
> does.
>

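UTF-8 is a variable-width encoding that uses less memory to encode code
points with lower numerical values, on a per-character basis, e.g. if a code
point is <= U+007F it will use a single byte to encode; if <= U+07FF two bytes
will be used; if <= U+FFFF three bytes; and four bytes for code points from
U+10000 up to U+10FFFF, the highest valid code point. (The original UTF-8
design allowed sequences of up to 6 bytes, but RFC 3629 restricts it to 4.)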
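
A minimal sketch of those per-code-point widths, assuming modern UTF-8 as
defined by RFC 3629 (at most 4 bytes, since Unicode stops at U+10FFFF).
`utf8_width` is a made-up helper name for illustration, not a standard
function:

```python
def utf8_width(code_point):
    """Bytes UTF-8 needs for one code point (RFC 3629: at most 4 bytes)."""
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    return 4  # U+10000 up to U+10FFFF, the last valid code point


# Cross-check the thresholds against Python's own encoder
# (surrogates U+D800..U+DFFF are deliberately left out: they cannot be encoded).
for cp in (0x41, 0x7F, 0x80, 0x7FF, 0x800, 0x20AC, 0xFFFF, 0x10000, 0x10FFFF):
    assert utf8_width(cp) == len(chr(cp).encode("utf-8"))
```

The width is decided separately for every code point, so one wide character
costs extra bytes only for itself.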

FSR is a variable-width memory structure that uses the width of the code
point with the highest numerical value in the string e.g. if all code
points in the string are <= U+00FF a single byte will be used per
character; if all code points are <= U+FFFF two bytes will be used per
character; and in all other cases 4 bytes will be used per character.

In terms of memory usage the difference is that UTF-8 varies its width
per-character, whereas the FSR varies its width per-string. For any
particular string, UTF-8 may well result in using less memory than the FSR,
but in other (quite common) cases the FSR will use less memory than UTF-8
e.g. if the string contains only code points <= U+00FF, but some
are between U+0080 and U+00FF (inclusive).
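
To make the per-character versus per-string difference concrete, here is a
rough sketch that counts payload bytes only and ignores object headers. The
helper names `fsr_width` and `payloads` are invented for illustration, and
the width rule is a reading of PEP 393, not code taken from CPython:

```python
def fsr_width(s):
    """Bytes per character under the FSR: set by the highest code point in s."""
    highest = max(map(ord, s), default=0)
    if highest <= 0xFF:
        return 1
    if highest <= 0xFFFF:
        return 2
    return 4


def payloads(s):
    """Return (FSR payload bytes, UTF-8 payload bytes), headers excluded."""
    return fsr_width(s) * len(s), len(s.encode("utf-8"))


latin1_text = "\u00e9" * 1000        # 'é': FSR 1 byte each, UTF-8 2 bytes each
mostly_ascii = "a" * 999 + "\u20ac"  # one '€' widens every FSR slot to 2 bytes

print(payloads(latin1_text))   # (1000, 2000): the FSR is smaller
print(payloads(mostly_ascii))  # (2000, 1002): UTF-8 is smaller
```

Neither representation wins for every string; which one is smaller depends
on the mix of code points.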

In most cases the FSR uses the same or less memory than earlier versions of
Python 3 and correctly handles all code points (just like UTF-8). In the
cases where the FSR uses more memory than previously, the previous
behaviour was incorrect.

No matter which representation is used, there will be a certain amount of
overhead (which is the majority of what most of your examples have shown).
Here are examples which demonstrate cases where UTF-8 uses less memory,
cases where the FSR uses less memory, and cases where they use the same
amount of memory (accounting for the minimum amount of overhead required
for each).

Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>>
>>> fsr = u""
>>> utf8 = fsr.encode("utf-8")
>>> min_fsr_overhead = sys.getsizeof(fsr)
>>> min_utf8_overhead = sys.getsizeof(utf8)
>>> min_fsr_overhead
49
>>> min_utf8_overhead
33
>>>
>>> fsr = u"\u0001" * 1000
>>> utf8 = fsr.encode("utf-8")
>>> sys.getsizeof(fsr) - min_fsr_overhead
1000
>>> sys.getsizeof(utf8) - min_utf8_overhead
1000
>>>
>>> fsr = u"\u0081" * 1000
>>> utf8 = fsr.encode("utf-8")
>>> sys.getsizeof(fsr) - min_fsr_overhead
1024
>>> sys.getsizeof(utf8) - min_utf8_overhead
2000
>>>
>>> fsr = u"\u0001\u0081" * 1000
>>> utf8 = fsr.encode("utf-8")
>>> sys.getsizeof(fsr) - min_fsr_overhead
2024
>>> sys.getsizeof(utf8) - min_utf8_overhead
3000
>>>
>>> fsr = u"\u0101" * 1000
>>> utf8 = fsr.encode("utf-8")
>>> sys.getsizeof(fsr) - min_fsr_overhead
2025
>>> sys.getsizeof(utf8) - min_utf8_overhead
2000
>>>
>>> fsr = u"\u0101\u0081" * 1000
>>> utf8 = fsr.encode("utf-8")
>>> sys.getsizeof(fsr) - min_fsr_overhead
4025
>>> sys.getsizeof(utf8) - min_utf8_overhead
4000

Indexing a character in UTF-8 is O(N) - you have to traverse the string
up to the character being indexed. Indexing a character in the FSR is O(1).
In all cases the FSR has better performance characteristics for indexing
and slicing than UTF-8.
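
A hedged sketch of why the two differ. Both functions are hypothetical
helpers written for this illustration; the UTF-32 buffer merely stands in
for a fixed-width layout such as the FSR's 4-byte form and is not how
CPython is actually implemented:

```python
def utf8_index(data, i):
    """Return the i-th code point of UTF-8 bytes by scanning from the start: O(N)."""
    if i < 0:
        raise IndexError("negative indexes are not handled in this sketch")
    count = -1
    start = None
    for pos, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:  # not a continuation byte: a code point starts here
            count += 1
            if count == i:
                start = pos
            elif count == i + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("index out of range")
    return data[start:].decode("utf-8")


def fixed_width_index(data, width, i):
    """Return the i-th code point of a fixed-width buffer: one offset calculation, O(1)."""
    chunk = data[i * width:(i + 1) * width]
    if len(chunk) != width:
        raise IndexError("index out of range")
    return chr(int.from_bytes(chunk, "little"))


sample = "a\u20acb\U00010000c"
as_utf8 = sample.encode("utf-8")
as_utf32 = sample.encode("utf-32-le")  # 4 bytes per code point, no BOM

for i in range(len(sample)):
    assert utf8_index(as_utf8, i) == fixed_width_index(as_utf32, 4, i) == sample[i]
```

The fixed-width lookup never looks at the preceding characters, which is
what keeps indexing O(1).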

There are tradeoffs with both UTF-8 and the FSR. The Python developers
decided the priorities for Unicode handling in Python were:

1. Correctness
  a. all code points must be handled correctly;
  b. it must not be possible to obtain part of a code point (e.g. the
     first byte only of a multi-byte code point);

2. No change in the Big O characteristics of string operations e.g.
indexing must remain O(1);

3. Reduced memory use in most cases.

It is impossible for UTF-8 to meet both criteria 1b and 2 without
additional auxiliary data (which uses more memory and increases complexity
of the implementation). The FSR meets all 3 criteria.
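
Criterion 1b can be illustrated with a small sketch. It uses a plain
`bytes` object as a stand-in for a string type that exposes raw UTF-8 by
byte offset, which is an assumption made for the example and not a
description of any particular Python version:

```python
text = "a\u20ac"                 # '€' is U+20AC: three bytes in UTF-8
raw = text.encode("utf-8")       # b'a\xe2\x82\xac'

partial = raw[:2]                # a byte-level slice cuts '€' after its first byte
try:
    partial.decode("utf-8")
    cut_is_valid = True
except UnicodeDecodeError:
    cut_is_valid = False         # half a code point is not valid text

first_two = text[:2]             # str slicing always returns whole code points
```

Here `cut_is_valid` ends up False, while `first_two` is the complete
two-character string.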

Tim Delaney



#63453

From: Terry Reedy <tjreedy@udel.edu>
Date: 2014-01-07 19:02 -0500
Message-ID: <mailman.5153.1389139359.18130.python-list@python.org>
In reply to: #63427

On 1/7/2014 9:54 AM, Terry Reedy wrote:
> On 1/7/2014 8:34 AM, wxjmfauth@gmail.com wrote:
>> On Sunday, 5 January 2014 23:14:07 UTC+1, Terry Reedy wrote:
>
>>> Memory: Point 2. A *design goal* of FSR was to save memory relative to
>>> UTF-32, which is what you apparently prefer. Your examples show that FSR
>>> successfully met its design goal. But you call that success, saving
>>> memory, 'wrong'. On what basis?
>
>> Point 2: This Flexible String Representation does not
>> "effectuate" any memory optimization. It only succeeds
>> in doing the opposite of what a correct usage of utf*
>> does.
>
> Since the FSR *was* successful in saving memory, and indeed shrank the
> Python binary by about a megabyte, I have no idea what you mean.

Tim Delaney apparently did, and answered on the basis of his 
understanding. Note that I said that the design goal was 'save memory 
RELATIVE TO UTF-32', not 'optimize memory'. UTF-8 was not considered an 
option. Nor was any form of arithmetic coding
https://en.wikipedia.org/wiki/Arithmetic_coding
to truly 'optimize memory'.

-- 
Terry Jan Reedy



#63466

From: wxjmfauth@gmail.com
Date: 2014-01-08 01:59 -0800
Message-ID: <cd28325a-7c02-43be-b94c-fa29a20acf52@googlegroups.com>
In reply to: #63453

On Wednesday, 8 January 2014 01:02:22 UTC+1, Terry Reedy wrote:
> On 1/7/2014 9:54 AM, Terry Reedy wrote:
> > On 1/7/2014 8:34 AM, wxjmfauth@gmail.com wrote:
> >> On Sunday, 5 January 2014 23:14:07 UTC+1, Terry Reedy wrote:
> >
> >>> Memory: Point 2. A *design goal* of FSR was to save memory relative to
> >>> UTF-32, which is what you apparently prefer. Your examples show that FSR
> >>> successfully met its design goal. But you call that success, saving
> >>> memory, 'wrong'. On what basis?
> >
> >> Point 2: This Flexible String Representation does not
> >> "effectuate" any memory optimization. It only succeeds
> >> in doing the opposite of what a correct usage of utf*
> >> does.
> >
> > Since the FSR *was* successful in saving memory, and indeed shrank the
> > Python binary by about a megabyte, I have no idea what you mean.
>
> Tim Delaney apparently did, and answered on the basis of his
> understanding. Note that I said that the design goal was 'save memory
> RELATIVE TO UTF-32', not 'optimize memory'. UTF-8 was not considered an
> option. Nor was any form of arithmetic coding
> https://en.wikipedia.org/wiki/Arithmetic_coding
> to truly 'optimize memory'.


The FSR acts more as a coding scheme selector than
as a code point optimizer.

Claiming that it saves memory is some kind of illusion;
a little bit like saying "Py2.7 uses "relatively" less memory than
Py3.2 (UCS-2)".

>>> import sys
>>> sys.getsizeof('a' * 10000 + 'z')
10026
>>> sys.getsizeof('a' * 10000 + '€')
20040
>>> sys.getsizeof('a' * 10000 + '\U00010000')
40044
>>> sys.getsizeof('€' * 10000 + '€')
20040
>>> sys.getsizeof('€' * 10000 + '\U00010000')
40044
>>> sys.getsizeof('\U00010000' * 10000 + '\U00010000')
40044

jmf



#63510

From: Terry Reedy <tjreedy@udel.edu>
Date: 2014-01-08 14:26 -0500
Message-ID: <mailman.5192.1389209238.18130.python-list@python.org>
In reply to: #63466

On 1/8/2014 4:59 AM, wxjmfauth@gmail.com wrote:
[responding to me]
> The FSR acts more as a coding scheme selector

That is what PEP 393 describes and what I and many others have said. The 
FSR saves memory by selecting from three choices the most compact coding 
scheme for each string.
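
One way to see the three choices from Python itself is to difference the
sizes of two strings of the same character, so that the fixed object header
cancels out. This is only a sketch: `bytes_per_char` is a made-up helper,
and it assumes CPython 3.3 or later, since `sys.getsizeof` reports an
implementation detail that other interpreters need not share:

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate per-character storage; subtracting two sizes cancels the header."""
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) // n


# One sample character from each range described in PEP 393.
widths = {ch: bytes_per_char(ch) for ch in ("a", "\u00e9", "\u20ac", "\U00010000")}
print(widths)
```

On CPython 3.3+ I would expect 1 byte for 'a' and 'é', 2 for '€' and 4 for
U+10000, where a wide (UCS-4) build of 3.2 spent 4 bytes on every character.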

I ask again, have you read PEP 393? If you are going to critique the 
FSR, you should read its basic document.

> than as a code point optimizer.

I do not know what you mean by 'code point optimizer'.

> Claiming that it saves memory is some kind of illusion;

Do you really think that the mathematical fact "10026 < 20040 < 40044" 
(from your example below) is some kind of illusion? If so, please take 
your claim to a metaphysics list. If not, please stop trolling.

> a little bit like saying "Py2.7 uses "relatively" less memory than
> Py3.2 (UCS-2)".

This is inane as 2.7 and 3.2 both use the same two coding schemes. 
Saying '1 < 2' is different from saying '2 < 2'.

On 3.3+
>>>> sys.getsizeof('a' * 10000 + 'z')
> 10026
>>>> sys.getsizeof('a' * 10000 + '€')
> 20040
>>>> sys.getsizeof('a' * 10000 + '\U00010000')
> 40044

3.2- wide (UCS-4) builds use about 40050 bytes for all three unicode
strings. Once again, you have posted examples that show how FSR saves
memory, thus negating your denial of the saving.

-- 
Terry Jan Reedy



#63514

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2014-01-08 20:04 +0000
Message-ID: <mailman.5195.1389211486.18130.python-list@python.org>
In reply to: #63427

On 07/01/2014 13:34, wxjmfauth@gmail.com wrote:
> On Sunday, 5 January 2014 23:14:07 UTC+1, Terry Reedy wrote:
>
> Ned: this has already been explained and illustrated.
>
> jmf
>

This has never been explained and illustrated.  Roughly 30 minutes ago 
Terry Reedy once again completely shot your argument about memory usage 
to pieces.  You did not bother to respond to the comments from Tim 
Delaney made almost one day ago.  Please give up.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence
