Re: psss...I want to move from Perl to Python

From Chris Angelico <rosuav@gmail.com>
Newsgroups comp.lang.python
Subject Re: psss...I want to move from Perl to Python
Date 2016-02-01 10:48 +1100
Message-ID <mailman.176.1454284124.2338.python-list@python.org> (permalink)
References <n8ea4q$muu$1@gioia.aioe.org> <mailman.69.1454027286.2338.python-list@python.org> <2a8dc773-87a1-4ffd-8b8f-a77f2f6ff693@googlegroups.com> <n8m258$1rs2$1@gioia.aioe.org>


On Mon, Feb 1, 2016 at 9:34 AM, Fillmore <fillmore_remove@hotmail.com> wrote:
> On 01/30/2016 05:26 AM, wxjmfauth@gmail.com wrote:
>
>>> Python 2 vs python 3 is anything but "solved".
>>
>>
>>
>> Python 3.5.1 is still suffering from the same buggy
>> behaviour as in Python 3.0 .
>
>
>
> Can you elaborate?

This is jmf. His posts are suppressed from the mailing list, because
the only thing he ever says is that Python 3's "Unicode by default"
behaviour is fundamentally and mathematically wrong, on the basis of
microbenchmarks showing a performance regression compared to his
beloved - and buggy - narrow build of Python 2.7. (I'm not certain,
but I think the regression might even have been fixed now. Or maybe he
has other regressions to moan about.)

Here's a facts-only summary of Unicode handling in several different
CPython [0] builds.

* Python 2.7 comes in two flavours, selected at compile time. A "Wide"
build is what most Linux distributions ship, and it uses 32-bit
Unicode characters. In other words, the string b"abc" takes up three
bytes, but the string u"abc" takes up twelve. [1] These builds are
perfectly consistent; a Unicode character *always* takes exactly 4
bytes, and indexing and subscripting are perfectly correct.

* A "Narrow" build of Python 2.7 (the default on Windows) uses 16-bit
Unicode code units. The string b"abc" still takes up three bytes, and
u"abc" takes only six - however, a three-character string of astral
characters takes up twelve bytes, because each astral character is
stored as a surrogate pair. These builds are thus inconsistent (len()
and indexing count code units, not characters), but potentially more
efficient - a thousand BMP characters followed by a single SMP
character would take up only 2004 bytes, rather than 4004 as a wide
build would use.

* Starting with Python 3.0, a default quoted string is a Unicode
string. That doesn't change anything about these considerations, but
it does mean that "abc" suddenly takes up a lot more room than it used
to (because it's now equivalent to u"abc" rather than b"abc").

* Python 3.3 introduced a new "Flexible String Representation", which
you can read about in detail in PEP 393. Strings are now stored as
compactly as possible; u"Hello!" (all ASCII) takes up six bytes,
u"¡Hola!" (Latin-1) also takes up six bytes, u"Привет" (Basic
Multilingual Plane) takes up twelve, and u"Hi! 😀😁"
(or u"Hi! \U0001f600\U0001f601" if your mailer doesn't have those
characters) takes up twenty-four. Each string has a length of 6, as
given by len(x), but takes up differing amounts of space according to
actual needs.
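
The byte counts in the list above can be reproduced with a small model.
This is only a sketch, assuming Python 3 and counting character payload
only (no per-object overhead). The helper names are made up for
illustration; UTF-16 and UTF-32 encodings stand in for the narrow and
wide 2.7 layouts, and the width rule follows the description in PEP 393.

```python
# Model of the storage figures quoted above (character payload only,
# ignoring per-object overhead). All helper names are illustrative.

def narrow_build_bytes(s):
    # A narrow 2.7 build stores UTF-16 code units: 2 bytes per BMP
    # character, 4 bytes (a surrogate pair) per astral character.
    return len(s.encode("utf-16-le"))

def wide_build_bytes(s):
    # A wide 2.7 build stores every character in 4 bytes.
    return len(s.encode("utf-32-le"))

def fsr_bytes(s):
    # PEP 393: every character in a string is stored at the width
    # needed by the widest character in that string (1, 2 or 4 bytes).
    widest = max(map(ord, s)) if s else 0
    width = 1 if widest < 0x100 else 2 if widest < 0x10000 else 4
    return len(s) * width

# A thousand BMP characters followed by a single SMP character.
mixed = "\u0416" * 1000 + "\U0001f600"
print(narrow_build_bytes(mixed), wide_build_bytes(mixed))  # 2004 4004

examples = [
    "Hello!",                                # all ASCII
    "\xa1Hola!",                             # Latin-1
    "\u041f\u0440\u0438\u0432\u0435\u0442",  # BMP (Cyrillic)
    "Hi! \U0001f600\U0001f601",              # astral
]
for text in examples:
    # Every length is 6; the modelled sizes are 6, 6, 12 and 24 bytes.
    print(len(text), fsr_bytes(text))
```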


The issue jmf has is with the way the FSR has to "widen" a string. If
you take a megabyte of all-ASCII text (stored one byte per character)
and append one astral character to it, the resulting string has to be
stored four bytes per character, even for the ASCII ones. This is to
make sure that indexing and slicing work correctly and efficiently,
but it does come at a cost - it takes time to copy all those
characters into the new wider string. On microbenchmarks doing exactly
this, it's clear that Python 3 is paying a price. But has it truly
suffered?

rosuav@sikorsky:~$ python -m timeit -s "s=u'a'*1048576" "len(s+u'\U0001f600')"
10000 loops, best of 3: 197 usec per loop
rosuav@sikorsky:~$ python3 -m timeit -s "s=u'a'*1048576" "len(s+u'\U0001f600')"
10000 loops, best of 3: 148 usec per loop
rosuav@sikorsky:~$ python -m timeit -s "s=u'a'*1048576" "len(s+u'b')"
10000 loops, best of 3: 187 usec per loop
rosuav@sikorsky:~$ python3 -m timeit -s "s=u'a'*1048576" "len(s+u'b')"
10000 loops, best of 3: 31.6 usec per loop
rosuav@sikorsky:~$ python -c 'import sys; print(sys.version)'
2.7.11 (default, Jan 11 2016, 21:04:40)
[GCC 5.3.1 20160101]
rosuav@sikorsky:~$ python3 -c 'import sys; print(sys.version)'
3.6.0a0 (default:5452e4b5c007, Feb  1 2016, 07:28:50)
[GCC 5.3.1 20160121]
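
If you would rather run the same comparison from a script than from the
shell, something along these lines should do. It is a rough sketch
using the standard timeit module under Python 3; the function name and
loop counts are arbitrary choices, and the numbers will differ from
machine to machine and from version to version.

```python
import timeit

def best_usec(suffix, size=1048576, loops=100, repeat=3):
    # Time len(s + suffix) for an all-ASCII s of the given size and
    # return the best per-loop time in microseconds, roughly what
    # "python -m timeit" reports.
    setup = "s = 'a' * %d; suffix = %r" % (size, suffix)
    timings = timeit.repeat("len(s + suffix)", setup=setup,
                            number=loops, repeat=repeat)
    return min(timings) / loops * 1e6

# Appending an astral character forces the whole string to be widened;
# appending another ASCII character does not.
print("append astral: %.1f usec per loop" % best_usec("\U0001f600"))
print("append ASCII:  %.1f usec per loop" % best_usec("b"))
```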

The other consideration is that, *on Windows only*, this operation
takes more memory under 3.6 than under 2.7, because 2.7 will keep
storing each 'a' in 16 bits and then just slap a two-code-unit smiley
onto the end; but on the flip side, 3.6 has been storing that all-ASCII
string in *8* bits per character. Most of your programs will be full
of ASCII strings - remember, all your variable names are string keys
into some dictionary [2], and every time you call up a built-in
function or standard library module, you'll be using an ASCII-only
name to reference it. Halving their storage space makes a significant
difference; and doubling the size of a very few strings in a very few
programs is worth the correctness we gain by not having to worry about
string index bugs.
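
To see how much of a program consists of ASCII-only name strings, you
can poke at a module's namespace and a function's code object. This is
just an illustrative sketch for CPython 3; GREETING, shout and is_ascii
are made-up names, while co_names and co_varnames are the standard
code object attributes.

```python
import builtins

GREETING = "hello"

def shout(text):
    result = text.upper()
    return len(result) + len(GREETING)

def is_ascii(name):
    # True if every character fits in 7 bits.
    return all(ord(c) < 128 for c in name)

# Module-level names are string keys in the module's dictionary.
module_names = sorted(globals())
print(all(isinstance(n, str) for n in module_names))

# Every built-in is looked up by an ASCII-only name.
builtin_names = sorted(vars(builtins))
print(all(is_ascii(n) for n in builtin_names))

# Global, built-in and attribute names used by a function are kept as
# strings in its code object; local names are listed separately and
# accessed by slot index.
print(shout.__code__.co_names)
print(shout.__code__.co_varnames)
```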

So in summary: Take no notice of jmf; he's a crank.

ChrisA

[0] Other Python implementations may be very different, but it's
CPython that most people are looking at.
[1] If you use sys.getsizeof() on these strings, you'll find that they
actually take up a lot more space than I'm talking about. That's
because there are overheads on string objects, which dominate tiny
strings. But for large strings, where the performance difference
actually matters, the storage space of the characters themselves
dominates the overhead.
[2] Local names in functions might get compiled out and replaced with
numeric slot indices. But module-level names, names of built-ins,
attribute names, etc, are all stored in the code as actual strings.
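
The overhead mentioned in footnote [1] can be measured directly. The
following is a sketch assuming CPython 3.3 or later (sys.getsizeof is
implementation-specific, and the exact overhead varies between
versions); the two helper names are made up. Taking the difference of
two sizes cancels the fixed overhead and leaves only the per-character
cost.

```python
import sys

def payload_per_char(ch, n=1000):
    # Compare strings of 2n and n copies of ch; the fixed per-object
    # overhead cancels, leaving the bytes used per character.
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) / n

def overhead_fraction(s):
    # Share of the reported size that is not character payload,
    # assuming s is all ASCII and so stored one byte per character.
    total = sys.getsizeof(s)
    return (total - len(s)) / total

for label, ch in [("ASCII", "a"), ("Latin-1", "\xa1"),
                  ("BMP", "\u041f"), ("astral", "\U0001f600")]:
    print(label, payload_per_char(ch))

# Overhead dominates a tiny string and is negligible for a large one.
print(overhead_fraction("abc"))
print(overhead_fraction("a" * 1048576))
```

On a CPython build with the Flexible String Representation I would
expect the four per-character figures to come out as 1, 1, 2 and 4.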
