


Python 3.2 has some deadly infection

Started by Mark Lawrence <breamoreboy@yahoo.co.uk>
First post 2014-05-31 17:10 +0100
Last post 2014-06-03 14:22 -0400
Articles 12 on this page of 92 — 19 participants



Contents

  Python 3.2 has some deadly infection Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-31 17:10 +0100
    Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-05-31 22:55 +0300
    Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-01 02:26 +0000
      Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-01 12:43 +1000
      Re: Python 3.2 has some deadly infection Tim Delaney <timothy.c.delaney@gmail.com> - 2014-06-02 08:54 +1000
        Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-02 01:14 +0000
          Re: Python 3.2 has some deadly infection Tim Delaney <timothy.c.delaney@gmail.com> - 2014-06-02 12:23 +1000
            Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-01 19:46 -0700
          Re: Python 3.2 has some deadly infection Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> - 2014-06-02 07:45 +0000
          Re: Python 3.2 has some deadly infection Tim Delaney <timothy.c.delaney@gmail.com> - 2014-06-02 19:02 +1000
          Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-02 19:14 +1000
          Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-02 12:10 +0100
            Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-03 16:34 +0000
              Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-04 02:43 +1000
          Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-02 17:34 -0400
            Re: Python 3.2 has some deadly infection Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-06-03 17:16 +1200
              Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-03 02:21 -0400
              Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-03 15:18 +0100
                Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-04 13:08 +0000
                  Re: Python 3.2 has some deadly infection Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-06-05 14:01 +1200
                    Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 10:16 +0300
                      Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-05 17:30 +1000
                        Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 11:05 +0300
                          Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-05 18:36 +1000
                            Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 12:53 +0300
                              Re: Python 3.2 has some deadly infection wxjmfauth@gmail.com - 2014-06-05 05:43 -0700
                              Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-05 14:50 -0400
                                Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 23:21 +0300
                                  Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-05 18:09 -0400
                                  Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 23:13 +0000
                                    Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 02:30 +0300
                                      Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 09:39 +1000
                                      Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-05 22:08 -0400
                                      Re: Python 3.2 has some deadly infection Ethan Furman <ethan@stoneleaf.us> - 2014-06-05 20:47 -0700
                    Re: Python 3.2 has some deadly infection Steven D'Aprano <steve@pearwood.info> - 2014-06-05 08:34 +0000
                      Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 12:41 +0300
                        Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 06:37 -0700
                          Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 17:45 +0300
                            Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 15:33 +0000
                              Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 02:12 +1000
                                Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 09:54 -0700
                                  Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 03:36 +1000
                              Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 19:52 +0300
                                Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 03:28 +1000
                                  Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 15:35 -0700
                                    Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 08:52 +1000
                                      Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 20:11 -0700
                                        Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 13:20 +1000
                                          Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 20:32 -0700
                                Re: Python 3.2 has some deadly infection Akira Li <4kir4.1i@gmail.com> - 2014-06-06 12:03 +0400
                            Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-05 16:37 +0100
                              Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 16:16 +0000
                            Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 01:50 +1000
                            Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-05 17:17 +0100
                              Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 16:32 +0000
                                Re: Python 3.2 has some deadly infection Ethan Furman <ethan@stoneleaf.us> - 2014-06-06 07:40 -0700
                            Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 03:14 +1000
                            Re: Python 3.2 has some deadly infection Ian Kelly <ian.g.kelly@gmail.com> - 2014-06-05 11:16 -0600
                            Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-05 14:11 -0400
                              Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 21:30 +0300
                                Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 23:02 +0000
                                  Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 02:21 +0300
                                    Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-06 12:15 +0000
                                      Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 16:00 +0300
                                  Re: Python 3.2 has some deadly infection rurpy@yahoo.com - 2014-06-07 21:34 -0700
                                Re: Python 3.2 has some deadly infection Ethan Furman <ethan@stoneleaf.us> - 2014-06-06 06:24 -0700
                                  Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 17:10 +0300
                                    Re: Python 3.2 has some deadly infection Michael Torrie <torriem@gmail.com> - 2014-06-06 09:02 -0600
                                      Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 18:32 +0300
                                        Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 01:50 +1000
                                          Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 20:02 +0300
                                            Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-06 10:13 -0700
                                              Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 03:26 +1000
                                          Re: Python 3.2 has some deadly infection wxjmfauth@gmail.com - 2014-06-06 11:03 -0700
                                          Re: Python 3.2 has some deadly infection Denis McMahon <denismfmcmahon@gmail.com> - 2014-06-06 21:18 +0000
                                            Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 08:18 +1000
                                        Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-06 15:57 +0000
                                          Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-06 09:21 -0700
                                            Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 02:48 +1000
                                              Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-06 10:04 -0700
                                                Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 03:12 +1000
                                          Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 20:11 +0300
                                            Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 03:16 +1000
                                            Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 20:18 +0300
                                            Re: Python 3.2 has some deadly infection Ned Batchelder <ned@nedbatchelder.com> - 2014-06-06 13:33 -0400
                                Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 01:25 +1000
                                  Re: Python 3.2 has some deadly infection wxjmfauth@gmail.com - 2014-06-06 08:44 -0700
                                    Re: Python 3.2 has some deadly infection wxjmfauth@gmail.com - 2014-06-06 08:48 -0700
                            Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-06 12:56 +0100
                  Re: Python 3.2 has some deadly infection Akira Li <4kir4.1i@gmail.com> - 2014-06-05 06:49 +0400
              Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-04 00:25 +1000
              Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-03 14:22 -0400

Page 5 of 5 — ← Prev page 1 2 3 4 [5]


#72869

From Chris Angelico <rosuav@gmail.com>
Date 2014-06-07 03:12 +1000
Message-ID <mailman.10825.1402075210.18130.python-list@python.org>
In reply to #72863

On Sat, Jun 7, 2014 at 3:04 AM, Rustom Mody <rustompmody@gmail.com> wrote:
>> ASCII was once your one companion, it was all that mattered. ASCII was
>> once a friendly encoding, then your world was shattered. Wishing it
>> were somehow here again, wishing it were somehow near... sometimes it
>> seemed, if you just dreamed, somehow it would be here! Wishing you
>> could use just bytes again, knowing that you never would... dreaming
>> of it won't help you to do all that you dream you could!
>
>> It's time to stop chasing the phantom and start living in the Raoul
>> world... err, the real world. :)
>
> I thought that "If only bytes were 21+ bits wide" would sound sufficiently
> nonsensical, that I did not need to explicitly qualify it as a utopian dream!

Humour never dies!

ChrisA
(In case it's not obvious, by the way, everything I said above is a
reference to the Phantom of the Opera.)



#72865

From Marko Rauhamaa <marko@pacujo.net>
Date 2014-06-06 20:11 +0300
Message-ID <871tv255g9.fsf@elektro.pacujo.net>
In reply to #72858

Steven D'Aprano <steve+comp.lang.python@pearwood.info>:

> On Fri, 06 Jun 2014 18:32:39 +0300, Marko Rauhamaa wrote:
>> Unicode, like ASCII, is a code. Representing text in unicode is
>> encoding.
>
> A Unicode string as an abstract data type has no encoding.

Unicode itself is an encoding. See it in action here:

    72 101 108 108 111 44 32 119 111 114 108 100

> It is a Platonic ideal, a pure form like the real numbers.

Far from it. It is a mapping from symbols to integers. The symbols are
the Platonic ones.

The Unicode/ASCII encoding above represents the same "Platonic" string
as this EBCDIC one:

    200 133 147 147 150 107 64 166 150 153 147 132

> Unicode string like this:
>
> s = u"NOBODY expects the Spanish Inquisition!"
>
> should not be thought of as a bunch of bytes in some encoding,

Encoding is not tied to bytes or even computers. People can speak in
code, after all.


Marko
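
To make the two number sequences concrete, here is a minimal sketch, assuming Python 3 and using its bundled cp037 codec as a stand-in for EBCDIC (several EBCDIC code pages exist; picking cp037 is an assumption, not something stated in the post). The same abstract string maps to different integers under each code, and both decode back to the same text.

```python
# Sketch: one abstract string, two different codes.
# Assumption: cp037 serves as a representative EBCDIC code page.
text = "Hello, world"

# Unicode code points (identical to the ASCII values for these characters).
code_points = [ord(ch) for ch in text]

# Byte values under ASCII and under the EBCDIC code page.
ascii_values = list(text.encode("ascii"))
ebcdic_values = list(text.encode("cp037"))

print(code_points)
print(ascii_values)
print(ebcdic_values)

# Both byte sequences decode back to the same text.
assert bytes(ascii_values).decode("ascii") == text
assert bytes(ebcdic_values).decode("cp037") == text
```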



#72867

From Chris Angelico <rosuav@gmail.com>
Date 2014-06-07 03:16 +1000
Message-ID <mailman.10824.1402075026.18130.python-list@python.org>
In reply to #72865

On Sat, Jun 7, 2014 at 3:11 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Encoding is not tied to bytes or even computers. People can speak in
> code, after all.

Obligatory: http://xkcd.com/257/

ChrisA



#72868

From Marko Rauhamaa <marko@pacujo.net>
Date 2014-06-06 20:18 +0300
Message-ID <87tx7y3qiz.fsf@elektro.pacujo.net>
In reply to #72865

Marko Rauhamaa <marko@pacujo.net>:

> Far from it. It is a mapping from symbols to integers. The symbols are
> the Platonic ones.

Well, of course, even the symbols are a code. Letters code sounds and
digits code numbers.

And the sounds and numbers code ideas. Now we are getting close to being
truly Platonic.


Marko



#72873

From Ned Batchelder <ned@nedbatchelder.com>
Date 2014-06-06 13:33 -0400
Message-ID <mailman.10827.1402076054.18130.python-list@python.org>
In reply to #72865

On 6/6/14 1:11 PM, Marko Rauhamaa wrote:
> Steven D'Aprano <steve+comp.lang.python@pearwood.info>:
>
>> On Fri, 06 Jun 2014 18:32:39 +0300, Marko Rauhamaa wrote:
>>> Unicode, like ASCII, is a code. Representing text in unicode is
>>> encoding.
>>
>> A Unicode string as an abstract data type has no encoding.
>
> Unicode itself is an encoding. See it in action here:
>
>      72 101 108 108 111 44 32 119 111 114 108 100
>
>> It is a Platonic ideal, a pure form like the real numbers.
>
> Far from it. It is a mapping from symbols to integers. The symbols are
> the Platonic ones.
>
> The Unicode/ASCII encoding above represents the same "Platonic" string
> as this EBCDIC one:
>
>      200 133 147 147 150 107 64 166 150 153 147 132
>
>> Unicode string like this:
>>
>> s = u"NOBODY expects the Spanish Inquisition!"
>>
>> should not be thought of as a bunch of bytes in some encoding,
>
> Encoding is not tied to bytes or even computers. People can speak in
> code, after all.
>
>

Marko, you are right about the broader English meaning of the word 
"encoding".  The original point here was that "Unicode text" provides no 
information about what sequence of bytes is at work.

In the Unicode ecosystem, an encoding is a specification of how the text
will be represented in a byte stream.  Saying something is "Unicode"
doesn't provide that information.  You have to say "UTF-8" or "UTF-16" or
"UCS-2", etc., in order to know how bytes will be involved.

When Ethan said, "a Unicode string, as a data type, has no encoding," he 
meant (as he explained) that a Unicode string doesn't require or imply 
any particular mapping to bytes.

I'm sure you understand this, I'm just trying to clarify the different 
meanings of the word "encoding".


> Marko
>


-- 
Ned Batchelder, http://nedbatchelder.com
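
A small sketch of that distinction, assuming Python 3 (the sample word is borrowed from elsewhere in this thread): the same Unicode string yields different byte sequences depending on which encoding is named, and each one decodes back to the same text.

```python
# Sketch: "Unicode" alone does not say which bytes are involved.
text = "Gödel"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")   # little-endian, no byte order mark
utf32 = text.encode("utf-32-le")

for name, data in [("utf-8", utf8), ("utf-16-le", utf16), ("utf-32-le", utf32)]:
    print(name, len(data), data.hex())

# Decoding with the matching codec recovers the same string every time.
assert utf8.decode("utf-8") == utf16.decode("utf-16-le") == text
assert utf32.decode("utf-32-le") == text
```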



#72851

From Chris Angelico <rosuav@gmail.com>
Date 2014-06-07 01:25 +1000
Message-ID <mailman.10819.1402068349.18130.python-list@python.org>
In reply to #72744

On Fri, Jun 6, 2014 at 11:24 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
> On 06/05/2014 11:30 AM, Marko Rauhamaa wrote:
>>
>>
>> How text is represented is very different from whether text is a
>> fundamental data type. A fundamental text file is such that ordinary
>> operating system facilities can't see inside the black box (that is,
>> they are *not* encoded as far as the applications go).
>
> Of course they are.  It may be an ASCII-encoding of some flavor or other, or
> something really (to me) strange -- but an encoding is most assuredly in
> effect.

Allow me to explain what I think Marko's getting at here.

In most file systems, a file exists on the disk as a set of sectors of
data, plus some metadata including the file's actual size. When you
ask the OS to read you that file, it goes to the disk, reads those
sectors, truncates the data to the real size, and gives you those
bytes.

It's possible to mount a file as a directory, in which case the
physical representation is very different, but the file still appears
the same. In that case, the OS goes reading some part of the file,
maybe decompresses it, and gives it to you. Same difference. These
files still contain bytes.

A "fundamental text file" would be one where, instead of reading and
writing bytes, you read and write Unicode text. Since the hard disk
still works with sectors and bytes, it'll still be stored as such, but
that's an implementation detail; and you could format your disk UTF-8
or UTF-16 or FSR or anything you like, and the only difference you'd
see is performance.

This could certainly be done, in theory. I don't know how well it'd
fit with any of the popular OSes of today, but it could be done. And
these files would not have an encoding; their on-platter
representations would, but that's purely implementation - the text
that you wrote out and the text that you read in are the same text,
and there's been no encoding visible.

ChrisA
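
Python's text-mode files give a rough feel for this idea, with the difference that here the application still has to name the encoding itself. The sketch below assumes Python 3; the helper name round_trip and the sample text are made up for illustration. The text read back is identical in every case, and only the size on disk changes.

```python
import os
import tempfile

text = "Zürich, Αθήνα, Москва\n"

def round_trip(text, encoding):
    """Write text using the given encoding, read it back, return (text, bytes on disk)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # newline="" disables newline translation so the comparison is exact.
        with open(path, "w", encoding=encoding, newline="") as f:
            f.write(text)
        with open(path, "r", encoding=encoding, newline="") as f:
            read_back = f.read()
        return read_back, os.path.getsize(path)
    finally:
        os.remove(path)

results = {enc: round_trip(text, enc) for enc in ("utf-8", "utf-16", "utf-32")}
for enc, (read_back, size) in results.items():
    print(enc, size, read_back == text)
```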



#72855

From wxjmfauth@gmail.com
Date 2014-06-06 08:44 -0700
Message-ID <3564caae-9088-41a3-879a-451771ab173b@googlegroups.com>
In reply to #72851

On Friday, 6 June 2014 17:25:47 UTC+2, Chris Angelico wrote:
> On Fri, Jun 6, 2014 at 11:24 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
> > On 06/05/2014 11:30 AM, Marko Rauhamaa wrote:
> >>
> >>
> >> How text is represented is very different from whether text is a
> >> fundamental data type. A fundamental text file is such that ordinary
> >> operating system facilities can't see inside the black box (that is,
> >> they are *not* encoded as far as the applications go).
> >
> > Of course they are.  It may be an ASCII-encoding of some flavor or other, or
> > something really (to me) strange -- but an encoding is most assuredly in
> > effect.
>
> Allow me to explain what I think Marko's getting at here.
>
> In most file systems, a file exists on the disk as a set of sectors of
> data, plus some metadata including the file's actual size. When you
> ask the OS to read you that file, it goes to the disk, reads those
> sectors, truncates the data to the real size, and gives you those
> bytes.
>
> It's possible to mount a file as a directory, in which case the
> physical representation is very different, but the file still appears
> the same. In that case, the OS goes reading some part of the file,
> maybe decompresses it, and gives it to you. Same difference. These
> files still contain bytes.
>
> A "fundamental text file" would be one where, instead of reading and
> writing bytes, you read and write Unicode text. Since the hard disk
> still works with sectors and bytes, it'll still be stored as such, but
> that's an implementation detail; and you could format your disk UTF-8
> or UTF-16 or FSR or anything you like, and the only difference you'd
> see is performance.
>
> This could certainly be done, in theory. I don't know how well it'd
> fit with any of the popular OSes of today, but it could be done. And
> these files would not have an encoding; their on-platter
> representations would, but that's purely implementation - the text
> that you wrote out and the text that you read in are the same text,
> and there's been no encoding visible.
>
----------

Of the three, you can already eliminate one.
That is not good news.

sys.getsizeof('Gödel'.encode('utf-8'))
23
sys.getsizeof('Gödel'.encode('utf-16-le'))
27
sys.getsizeof('Gödel')
42
os.listdir(r'D:\jm\Москва\Zürich\Αθήνα\œdipe')
['a.txt', 'kk.bat', 'kk.cmd', 'kk.py', '__pycache__']
sys.getsizeof(r'D:\jm\Москва\Zürich\Αθήνα\œdipe'.encode('utf-8'))
61
sys.getsizeof(r'D:\jm\Москва\Zürich\Αθήνα\œdipe'.encode('utf-16-le'))
79
sys.getsizeof(r'D:\jm\Москва\Zürich\Αθήνα\œdipe')
100

jmf



#72856

From wxjmfauth@gmail.com
Date 2014-06-06 08:48 -0700
Message-ID <611b507c-1342-4764-9bbb-0583ee40695f@googlegroups.com>
In reply to #72855

On Friday, 6 June 2014 17:44:57 UTC+2, wxjm...@gmail.com wrote:
> On Friday, 6 June 2014 17:25:47 UTC+2, Chris Angelico wrote:
> > On Fri, Jun 6, 2014 at 11:24 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
> > > On 06/05/2014 11:30 AM, Marko Rauhamaa wrote:
> > >>
> > >>
> > >> How text is represented is very different from whether text is a
> > >> fundamental data type. A fundamental text file is such that ordinary
> > >> operating system facilities can't see inside the black box (that is,
> > >> they are *not* encoded as far as the applications go).
> > >
> > > Of course they are.  It may be an ASCII-encoding of some flavor or other, or
> > > something really (to me) strange -- but an encoding is most assuredly in
> > > effect.
> >
> > Allow me to explain what I think Marko's getting at here.
> >
> > In most file systems, a file exists on the disk as a set of sectors of
> > data, plus some metadata including the file's actual size. When you
> > ask the OS to read you that file, it goes to the disk, reads those
> > sectors, truncates the data to the real size, and gives you those
> > bytes.
> >
> > It's possible to mount a file as a directory, in which case the
> > physical representation is very different, but the file still appears
> > the same. In that case, the OS goes reading some part of the file,
> > maybe decompresses it, and gives it to you. Same difference. These
> > files still contain bytes.
> >
> > A "fundamental text file" would be one where, instead of reading and
> > writing bytes, you read and write Unicode text. Since the hard disk
> > still works with sectors and bytes, it'll still be stored as such, but
> > that's an implementation detail; and you could format your disk UTF-8
> > or UTF-16 or FSR or anything you like, and the only difference you'd
> > see is performance.
> >
> > This could certainly be done, in theory. I don't know how well it'd
> > fit with any of the popular OSes of today, but it could be done. And
> > these files would not have an encoding; their on-platter
> > representations would, but that's purely implementation - the text
> > that you wrote out and the text that you read in are the same text,
> > and there's been no encoding visible.
> >
> ----------
>
> Of the three, you can already eliminate one.
> That is not good news.
>
> sys.getsizeof('Gödel'.encode('utf-8'))
> 23
> sys.getsizeof('Gödel'.encode('utf-16-le'))
> 27
> sys.getsizeof('Gödel')
> 42
> os.listdir(r'D:\jm\Москва\Zürich\Αθήνα\œdipe')
> ['a.txt', 'kk.bat', 'kk.cmd', 'kk.py', '__pycache__']
> sys.getsizeof(r'D:\jm\Москва\Zürich\Αθήνα\œdipe'.encode('utf-8'))
> 61
> sys.getsizeof(r'D:\jm\Москва\Zürich\Αθήνα\œdipe'.encode('utf-16-le'))
> 79
> sys.getsizeof(r'D:\jm\Москва\Zürich\Αθήνα\œdipe')
> 100
>
> jmf

Sorry, wrong copy/paste

>>> sys.getsizeof('Gödel'.encode('utf-8'))
23
>>> sys.getsizeof('Gödel'.encode('utf-16-le'))
27
>>> sys.getsizeof('Gödel')
42
>>> os.listdir(r'D:\jm\Москва\Zürich\Αθήνα\œdipe')
['a.txt', 'kk.bat', 'kk.cmd', 'kk.py', '__pycache__']
>>> sys.getsizeof(r'D:\jm\Москва\Zürich\Αθήνα\œdipe'.encode('utf-8'))
61
>>> sys.getsizeof(r'D:\jm\Москва\Zürich\Αθήνα\œdipe'.encode('utf-16-le'))
79
>>> sys.getsizeof(r'D:\jm\Москва\Zürich\Αθήνα\œdipe')
100
>>>

jmf
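
When reading figures like these, it may help to separate payload from per-object overhead. The following is a hedged sketch, assuming CPython 3 (sys.getsizeof is implementation-specific and the absolute numbers vary by platform and version): len() counts the encoded bytes, while sys.getsizeof() also includes the fixed header of the bytes object. In the session above, 23 - 6 and 27 - 10 both leave the same 17 bytes of overhead.

```python
import sys

text = "Gödel"

for codec in ("utf-8", "utf-16-le"):
    data = text.encode(codec)
    # len() is the payload; getsizeof() adds the object's fixed header.
    print(codec, len(data), sys.getsizeof(data))

# Assumption (CPython): the overhead of a bytes object equals the size
# reported for an empty bytes object.
overhead = sys.getsizeof(b"")
payload_utf8 = sys.getsizeof(text.encode("utf-8")) - overhead
payload_utf16 = sys.getsizeof(text.encode("utf-16-le")) - overhead
print(overhead, payload_utf8, payload_utf16)
```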



#72838

From Robin Becker <robin@reportlab.com>
Date 2014-06-06 12:56 +0100
Message-ID <mailman.10811.1402055808.18130.python-list@python.org>
In reply to #72708

On 05/06/2014 18:16, Ian Kelly wrote:
.........
>
> How should e.g. bytes.upper() be implemented then?  The correct
> behavior is entirely dependent on the encoding.  Python 2 just assumes
> ASCII, which at best will correctly upper-case some subset of the
> string and leave the rest unchanged, and at worst could corrupt the
> string entirely.  There are some things that were dropped that should
> not have been, but my impression is that those are being worked on,
> for example % formatting in PEP 461.
>
bytes.upper should have done exactly what str.upper did in Python 2; that way we
could at least have continued to do the wrong thing :)
-- 
Robin Becker
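
For reference, a short sketch of the behaviour under discussion, assuming Python 3 (the sample word is made up for illustration): bytes.upper() changes only the ASCII letters, so the result for non-ASCII text depends on whether the bytes are decoded first.

```python
raw_latin1 = "café".encode("latin-1")   # 4 bytes, é is a single byte
raw_utf8 = "café".encode("utf-8")       # 5 bytes, é is two bytes

# bytes.upper() only touches the ASCII letters a-z.
upper_latin1 = raw_latin1.upper()
upper_utf8 = raw_utf8.upper()

# Decoding first lets str.upper() apply the Unicode case mapping.
upper_text = raw_latin1.decode("latin-1").upper()

print(upper_latin1, upper_utf8, upper_text)
```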



#72671

From Akira Li <4kir4.1i@gmail.com>
Date 2014-06-05 06:49 +0400
Message-ID <mailman.10721.1401936565.18130.python-list@python.org>
In reply to #72635

Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:

> On Tue, 03 Jun 2014 15:18:19 +0100, Robin Becker wrote:
>
>> Isn't it a bit old fashioned to think everything is connected to a
>> console?
>
> The whole concept of stdin and stdout is based on the idea of having a 
> console to read from and write to. Otherwise, what would be the point? 
> Classic Mac (pre OS X) had no command line interface, and nothing
> even remotely like stdin and stdout. But once you have a console, stdin, 
> stdout, and stderr become useful. And once you have them, then you can 
> extend the concept using redirection and pipes. But fundamentally, stdin 
> and stdout are about consoles.
>
We can consider the "pipes" abstraction to be fundamental. Decades of use
have proven the usefulness of a pipeline of processes, e.g.,

  tr -cs A-Za-z '\n' |
  tr A-Z a-z |
  sort |
  uniq -c |
  sort -rn |
  sed ${1}q

See http://www.leancrew.com/all-this/2011/12/more-shell-less-egg/

Whether or not a pipe is connected to a tty is a small
detail. stdin/stdout is about pipes, not consoles.


--
akira
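
For comparison, a rough Python counterpart of the pipeline above, as a sketch only (the function name top_words is made up for illustration, and the order of words with equal counts may differ from what sort -rn produces). It accepts any iterable of lines, so it does not care whether the lines come from a tty, a pipe, or a list.

```python
import re
from collections import Counter

def top_words(lines, n):
    """Return the n most frequent words as (word, count) pairs."""
    counts = Counter()
    for line in lines:
        # Same idea as the two tr stages: keep runs of letters, fold to lower case.
        counts.update(word.lower() for word in re.findall(r"[A-Za-z]+", line))
    # Counterpart of: sort | uniq -c | sort -rn | sed ${n}q
    return counts.most_common(n)

sample = ["The cat and the hat", "the end"]
print(top_words(sample, 2))
```

Because a file object iterates over its lines, passing sys.stdin as the first argument would make this usable at the end of a shell pipe.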



#72523

From Chris Angelico <rosuav@gmail.com>
Date 2014-06-04 00:25 +1000
Message-ID <mailman.10627.1401805540.18130.python-list@python.org>
In reply to #72468

On Wed, Jun 4, 2014 at 12:18 AM, Robin Becker <robin@reportlab.com> wrote:
> I think the idea that we only give meaning to binary data using encodings is
> a bit limiting. A zip or gif file has structure, but I don't think it's
> reasonable to regard such a file as having an encoding in the python unicode
> sense.

Of course it doesn't. Those are binary files. Ultimately, every file
is binary; but since the vast majority of them actually contain text,
in one of a handful of common encodings, it's nice to have an easy way
to open a text file. You could argue that "rb" should be the default,
rather than "rt", but that's a relatively minor point.

ChrisA
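
A sketch of the binary side of this, assuming Python 3: a file opened with "rb" yields bytes with no decoding, and its structure is then unpacked explicitly. The ten bytes written here are only the start of a GIF (signature plus width and height), with made-up dimensions, so this is not a complete image file.

```python
import os
import struct
import tempfile

# First 10 bytes of a GIF: 6-byte signature, then width and height
# as little-endian unsigned 16-bit integers (values invented for the sketch).
header = struct.pack("<6sHH", b"GIF89a", 320, 200)

fd, path = tempfile.mkstemp(suffix=".gif")
os.close(fd)
try:
    with open(path, "wb") as f:
        f.write(header)
    with open(path, "rb") as f:     # binary mode: bytes in, no character decoding
        data = f.read()
finally:
    os.remove(path)

signature, width, height = struct.unpack("<6sHH", data[:10])
print(signature, width, height)
```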



#72540

From Terry Reedy <tjreedy@udel.edu>
Date 2014-06-03 14:22 -0400
Message-ID <mailman.10638.1401819784.18130.python-list@python.org>
In reply to #72468

On 6/3/2014 10:18 AM, Robin Becker wrote:

> I think the idea that we only give meaning to binary data using
> encodings is a bit limiting.

On the contrary, it is liberating. The fact that bits have no meaning
other than 'a choice between two alternatives' means
1. any binary choice - 0/1, -/+, false/true, no/yes, closed/open,
male/female, sad/happy, evil/good, low/high, and so on ad infinitum, can
be encoded into a bit. Since any such pair could have been reversed, the
mapping between bit states and the pair is arbitrary, and constitutes an
encoding.
2. any discrete or digitized information that constitutes a choice
between multiple alternatives can be encoded into a sequence of bits.

This crucial discovery is the basis of Shannon's 1948 paper and of the
information age that started about then.
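
As a small illustration of point 2, here is a sketch in Python 3 (the helper names bits_needed and encode_choice are made up for this example): a choice among k alternatives fits in the smallest number of bits whose patterns can tell all k apart, and which pattern stands for which alternative is an arbitrary convention.

```python
def bits_needed(alternatives):
    """Smallest number of bits that can distinguish this many alternatives."""
    if alternatives < 1:
        raise ValueError("need at least one alternative")
    return (alternatives - 1).bit_length()

def encode_choice(index, alternatives):
    """Encode a 0-based choice as a fixed-width string of '0' and '1'."""
    if not 0 <= index < alternatives:
        raise ValueError("index out of range")
    width = bits_needed(alternatives)
    return format(index, "b").zfill(width) if width else ""

print(bits_needed(2), bits_needed(26), encode_choice(5, 26))
```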

> A zip or gif file has structure, but I don't think it's reasonable to
> regard such a file as having an encoding in the python unicode sense.

I am not quite sure what you are denying. Color encodings are encodings
as much as character encodings, even if they encode different
information. Both encode sensory experience and conceptual correlates
into a sequence of bits, usually organized for convenience into a
sequence of bytes or other chunks.

There is another similarity. Text files often have at least two levels 
of encoding. First is the character encoding; that is all unicode 
handles. Then there is the text structure encoding, which is sometimes 
called the 'file format'. Most text files are at least structured into 
'lines'. For this, they use encoded line endings, and there have been 
multiple choices for this and at least 2 still in common use (which is a 
nuisance).

Similarly, a pixel (bitmap!) image file must encode the color of each
pixel and a higher-level structuring of pixels into a 2D array of rows
or lines. Just as with text, there have been and still are multiple
encodings at both levels. Also, similarly, the receiver of an image must
know what encoding the sender used.

Vector graphics is a different way of encoding certain types of images,
and again there are multiple ways to encode the information into bits.
The encoding hassle here is similar to that for text. One of the
frustrations of Tk is that it natively uses just one old dialect of
PostScript (.ps) to output screen images. One has to find and install an
extension to get a modern Scalable Vector Graphics (.svg) encoding.

Because Python is programmed with lines of text, it must come with
minimal text decoding. If Python were programmed with drawings, it would
come with one or more drawing decoders and a drawing equivalent of a
lexer. It might even have a special 'rd' (read drawing) mode for open.

-- 
Terry Jan Reedy
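
The two levels of text encoding described above can be sketched as follows, assuming Python 3 (the helper name decode_lines and the sample data are made up for illustration): the character encoding turns bytes into characters, and the line-ending convention then turns characters into lines.

```python
# Two layers: bytes -> characters (character encoding),
# then characters -> lines (line-ending convention).
unix_bytes = "first\nsecond\n".encode("utf-8")
dos_bytes = "first\r\nsecond\r\n".encode("utf-8")

def decode_lines(data, encoding="utf-8"):
    """Decode bytes to text, then split on any common line ending."""
    return data.decode(encoding).splitlines()

print(decode_lines(unix_bytes))
print(decode_lines(dos_bytes))
```

The byte streams differ, yet both decode to the same list of lines once each layer is undone with the matching convention.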


