| Started by | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| First post | 2014-05-31 17:10 +0100 |
| Last post | 2014-06-03 14:22 -0400 |
| Articles | 12 on this page of 92 — 19 participants |
Python 3.2 has some deadly infection Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-31 17:10 +0100
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-05-31 22:55 +0300
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-01 02:26 +0000
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-01 12:43 +1000
Re: Python 3.2 has some deadly infection Tim Delaney <timothy.c.delaney@gmail.com> - 2014-06-02 08:54 +1000
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-02 01:14 +0000
Re: Python 3.2 has some deadly infection Tim Delaney <timothy.c.delaney@gmail.com> - 2014-06-02 12:23 +1000
Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-01 19:46 -0700
Re: Python 3.2 has some deadly infection Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> - 2014-06-02 07:45 +0000
Re: Python 3.2 has some deadly infection Tim Delaney <timothy.c.delaney@gmail.com> - 2014-06-02 19:02 +1000
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-02 19:14 +1000
Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-02 12:10 +0100
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-03 16:34 +0000
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-04 02:43 +1000
Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-02 17:34 -0400
Re: Python 3.2 has some deadly infection Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-06-03 17:16 +1200
Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-03 02:21 -0400
Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-03 15:18 +0100
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-04 13:08 +0000
Re: Python 3.2 has some deadly infection Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-06-05 14:01 +1200
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 10:16 +0300
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-05 17:30 +1000
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 11:05 +0300
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-05 18:36 +1000
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 12:53 +0300
Re: Python 3.2 has some deadly infection wxjmfauth@gmail.com - 2014-06-05 05:43 -0700
Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-05 14:50 -0400
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 23:21 +0300
Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-05 18:09 -0400
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 23:13 +0000
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 02:30 +0300
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 09:39 +1000
Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-05 22:08 -0400
Re: Python 3.2 has some deadly infection Ethan Furman <ethan@stoneleaf.us> - 2014-06-05 20:47 -0700
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve@pearwood.info> - 2014-06-05 08:34 +0000
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 12:41 +0300
Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 06:37 -0700
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 17:45 +0300
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 15:33 +0000
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 02:12 +1000
Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 09:54 -0700
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 03:36 +1000
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 19:52 +0300
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 03:28 +1000
Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 15:35 -0700
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 08:52 +1000
Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 20:11 -0700
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 13:20 +1000
Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-05 20:32 -0700
Re: Python 3.2 has some deadly infection Akira Li <4kir4.1i@gmail.com> - 2014-06-06 12:03 +0400
Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-05 16:37 +0100
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 16:16 +0000
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 01:50 +1000
Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-05 17:17 +0100
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 16:32 +0000
Re: Python 3.2 has some deadly infection Ethan Furman <ethan@stoneleaf.us> - 2014-06-06 07:40 -0700
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-06 03:14 +1000
Re: Python 3.2 has some deadly infection Ian Kelly <ian.g.kelly@gmail.com> - 2014-06-05 11:16 -0600
Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-05 14:11 -0400
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 21:30 +0300
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-05 23:02 +0000
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 02:21 +0300
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-06 12:15 +0000
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 16:00 +0300
Re: Python 3.2 has some deadly infection rurpy@yahoo.com - 2014-06-07 21:34 -0700
Re: Python 3.2 has some deadly infection Ethan Furman <ethan@stoneleaf.us> - 2014-06-06 06:24 -0700
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 17:10 +0300
Re: Python 3.2 has some deadly infection Michael Torrie <torriem@gmail.com> - 2014-06-06 09:02 -0600
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 18:32 +0300
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 01:50 +1000
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 20:02 +0300
Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-06 10:13 -0700
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 03:26 +1000
Re: Python 3.2 has some deadly infection wxjmfauth@gmail.com - 2014-06-06 11:03 -0700
Re: Python 3.2 has some deadly infection Denis McMahon <denismfmcmahon@gmail.com> - 2014-06-06 21:18 +0000
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 08:18 +1000
Re: Python 3.2 has some deadly infection Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-06 15:57 +0000
Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-06 09:21 -0700
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 02:48 +1000
Re: Python 3.2 has some deadly infection Rustom Mody <rustompmody@gmail.com> - 2014-06-06 10:04 -0700
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 03:12 +1000
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 20:11 +0300
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 03:16 +1000
Re: Python 3.2 has some deadly infection Marko Rauhamaa <marko@pacujo.net> - 2014-06-06 20:18 +0300
Re: Python 3.2 has some deadly infection Ned Batchelder <ned@nedbatchelder.com> - 2014-06-06 13:33 -0400
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-07 01:25 +1000
Re: Python 3.2 has some deadly infection wxjmfauth@gmail.com - 2014-06-06 08:44 -0700
Re: Python 3.2 has some deadly infection wxjmfauth@gmail.com - 2014-06-06 08:48 -0700
Re: Python 3.2 has some deadly infection Robin Becker <robin@reportlab.com> - 2014-06-06 12:56 +0100
Re: Python 3.2 has some deadly infection Akira Li <4kir4.1i@gmail.com> - 2014-06-05 06:49 +0400
Re: Python 3.2 has some deadly infection Chris Angelico <rosuav@gmail.com> - 2014-06-04 00:25 +1000
Re: Python 3.2 has some deadly infection Terry Reedy <tjreedy@udel.edu> - 2014-06-03 14:22 -0400
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-06-07 03:12 +1000 |
| Message-ID | <mailman.10825.1402075210.18130.python-list@python.org> |
| In reply to | #72863 |
On Sat, Jun 7, 2014 at 3:04 AM, Rustom Mody <rustompmody@gmail.com> wrote:
>> ASCII was once your one companion, it was all that mattered. ASCII was
>> once a friendly encoding, then your world was shattered. Wishing it
>> were somehow here again, wishing it were somehow near... sometimes it
>> seemed, if you just dreamed, somehow it would be here! Wishing you
>> could use just bytes again, knowing that you never would... dreaming
>> of it won't help you to do all that you dream you could!
>
>> It's time to stop chasing the phantom and start living in the Raoul
>> world... err, the real world. :)
>
> I thought that "If only bytes were 21+ bits wide" would sound sufficiently
> nonsensical, that I did not need to explicitly qualify it as a utopian dream!

Humour never dies!

ChrisA

(In case it's not obvious, by the way, everything I said above is a
reference to the Phantom of the Opera.)
| From | Marko Rauhamaa <marko@pacujo.net> |
|---|---|
| Date | 2014-06-06 20:11 +0300 |
| Message-ID | <871tv255g9.fsf@elektro.pacujo.net> |
| In reply to | #72858 |
Steven D'Aprano <steve+comp.lang.python@pearwood.info>:
> On Fri, 06 Jun 2014 18:32:39 +0300, Marko Rauhamaa wrote:
>> Unicode, like ASCII, is a code. Representing text in unicode is
>> encoding.
>
> A Unicode string as an abstract data type has no encoding.
Unicode itself is an encoding. See it in action here:
72 101 108 108 111 44 32 119 111 114 108 100
> It is a Platonic ideal, a pure form like the real numbers.
Far from it. It is a mapping from symbols to integers. The symbols are
the Platonic ones.
The Unicode/ASCII encoding above represents the same "Platonic" string
as this EBCDIC one:
200 133 147 147 150 107 64 166 150 153 147 132
> Unicode string like this:
>
> s = u"NOBODY expects the Spanish Inquisition!"
>
> should not be thought of as a bunch of bytes in some encoding,
Encoding is not tied to bytes or even computers. People can speak in
code, after all.
Marko
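For readers who want to reproduce the two number lists above, here is a minimal Python 3 sketch. It assumes the `cp037` codec as a stand-in for "EBCDIC"; there are many EBCDIC code pages, and `cp037` is simply one that ships with Python, not necessarily the one the post had in mind.

```python
text = "Hello, world"

# Code points: the integers Unicode assigns to each character.
code_points = [ord(ch) for ch in text]

# Two different byte encodings of the same abstract string.
ascii_bytes = list(text.encode("ascii"))
ebcdic_bytes = list(text.encode("cp037"))  # assumption: cp037 stands in for "EBCDIC"

print(code_points)
print(ascii_bytes)
print(ebcdic_bytes)

# Decoding each byte sequence with its own codec recovers the same string.
recovered_ascii = bytes(ascii_bytes).decode("ascii")
recovered_ebcdic = bytes(ebcdic_bytes).decode("cp037")
print(recovered_ascii == recovered_ebcdic == text)
```

For pure ASCII text the code points and the ASCII bytes happen to be the same integers, which is probably why the two notions are so easy to conflate.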
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-06-07 03:16 +1000 |
| Message-ID | <mailman.10824.1402075026.18130.python-list@python.org> |
| In reply to | #72865 |
On Sat, Jun 7, 2014 at 3:11 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Encoding is not tied to bytes or even computers. People can speak in
> code, after all.

Obligatory: http://xkcd.com/257/

ChrisA
| From | Marko Rauhamaa <marko@pacujo.net> |
|---|---|
| Date | 2014-06-06 20:18 +0300 |
| Message-ID | <87tx7y3qiz.fsf@elektro.pacujo.net> |
| In reply to | #72865 |
Marko Rauhamaa <marko@pacujo.net>:
> Far from it. It is a mapping from symbols to integers. The symbols are
> the Platonic ones.

Well, of course, even the symbols are a code. Letters code sounds and
digits code numbers. And the sounds and numbers code ideas. Now we are
getting close to being truly Platonic.

Marko
| From | Ned Batchelder <ned@nedbatchelder.com> |
|---|---|
| Date | 2014-06-06 13:33 -0400 |
| Message-ID | <mailman.10827.1402076054.18130.python-list@python.org> |
| In reply to | #72865 |
On 6/6/14 1:11 PM, Marko Rauhamaa wrote:
> Steven D'Aprano <steve+comp.lang.python@pearwood.info>:
>
>> On Fri, 06 Jun 2014 18:32:39 +0300, Marko Rauhamaa wrote:
>>> Unicode, like ASCII, is a code. Representing text in unicode is
>>> encoding.
>>
>> A Unicode string as an abstract data type has no encoding.
>
> Unicode itself is an encoding. See it in action here:
>
> 72 101 108 108 111 44 32 119 111 114 108 100
>
>> It is a Platonic ideal, a pure form like the real numbers.
>
> Far from it. It is a mapping from symbols to integers. The symbols are
> the Platonic ones.
>
> The Unicode/ASCII encoding above represents the same "Platonic" string
> as this EBCDIC one:
>
> 200 133 147 147 150 107 64 166 150 153 147 132
>
>> Unicode string like this:
>>
>> s = u"NOBODY expects the Spanish Inquisition!"
>>
>> should not be thought of as a bunch of bytes in some encoding,
>
> Encoding is not tied to bytes or even computers. People can speak in
> code, after all.
>

Marko, you are right about the broader English meaning of the word
"encoding". The original point here was that "Unicode text" provides no
information about what sequence of bytes is at work.

In the Unicode ecosystem, an encoding is a specification of how the text
will be represented in a byte stream. Saying something is "Unicode"
doesn't provide that information. You have to say, "UTF8" or "UTF16" or
"UCS2", etc, in order to know how bytes will be involved.

When Ethan said, "a Unicode string, as a data type, has no encoding," he
meant (as he explained) that a Unicode string doesn't require or imply
any particular mapping to bytes.

I'm sure you understand this; I'm just trying to clarify the different
meanings of the word "encoding".

> Marko
>

--
Ned Batchelder, http://nedbatchelder.com
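The point that "Unicode" alone does not tell you which bytes are involved can be checked directly. A small sketch, using an arbitrary sample word (`Gödel`, borrowed from elsewhere in this thread) and a handful of codecs chosen only for illustration:

```python
s = "Gödel"

# The same str object, serialized under several encodings.
encodings = ["utf-8", "utf-16-le", "utf-32-le", "latin-1"]
encoded = {name: s.encode(name) for name in encodings}

for name, data in encoded.items():
    print(f"{name:10} {len(data):2d} bytes  {data!r}")

# Decoding with the codec that produced the bytes always recovers the text.
round_trips = all(data.decode(name) == s for name, data in encoded.items())
print(round_trips)

# Decoding with a different codec silently yields different text.
mojibake = encoded["utf-8"].decode("latin-1")
print(mojibake)
```

The string `s` is identical in every case; only the receiver's knowledge of the encoding name turns the bytes back into it.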
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-06-07 01:25 +1000 |
| Message-ID | <mailman.10819.1402068349.18130.python-list@python.org> |
| In reply to | #72744 |
On Fri, Jun 6, 2014 at 11:24 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
> On 06/05/2014 11:30 AM, Marko Rauhamaa wrote:
>>
>> How text is represented is very different from whether text is a
>> fundamental data type. A fundamental text file is such that ordinary
>> operating system facilities can't see inside the black box (that is,
>> they are *not* encoded as far as the applications go).
>
> Of course they are. It may be an ASCII-encoding of some flavor or other, or
> something really (to me) strange -- but an encoding is most assuredly in
> effect.

Allow me to explain what I think Marko's getting at here.

In most file systems, a file exists on the disk as a set of sectors of
data, plus some metadata including the file's actual size. When you
ask the OS to read you that file, it goes to the disk, reads those
sectors, truncates the data to the real size, and gives you those
bytes.

It's possible to mount a file as a directory, in which case the
physical representation is very different, but the file still appears
the same. In that case, the OS goes reading some part of the file,
maybe decompresses it, and gives it to you. Same difference. These
files still contain bytes.

A "fundamental text file" would be one where, instead of reading and
writing bytes, you read and write Unicode text. Since the hard disk
still works with sectors and bytes, it'll still be stored as such, but
that's an implementation detail; and you could format your disk UTF-8
or UTF-16 or FSR or anything you like, and the only difference you'd
see is performance.

This could certainly be done, in theory. I don't know how well it'd
fit with any of the popular OSes of today, but it could be done. And
these files would not have an encoding; their on-platter
representations would, but that's purely implementation - the text
that you wrote out and the text that you read in are the same text,
and there's been no encoding visible.

ChrisA
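No mainstream OS offers such a file type, but the idea can be approximated in user space. The following is only a sketch under that assumption: `round_trip` is a hypothetical helper, and an `io.BytesIO` buffer stands in for the disk. The text layer (`io.TextIOWrapper`) is the only thing that knows the storage encoding.

```python
import io

def round_trip(text, encoding):
    """Write text through a text layer; return (stored bytes, text read back)."""
    raw = io.BytesIO()  # stands in for the sectors on disk
    writer = io.TextIOWrapper(raw, encoding=encoding, newline="")
    writer.write(text)
    raw = writer.detach()  # flushes pending text and leaves the buffer open
    stored = raw.getvalue()
    raw.seek(0)
    reader = io.TextIOWrapper(raw, encoding=encoding, newline="")
    return stored, reader.read()

text = "Zürich, Москва, Αθήνα"
results = {enc: round_trip(text, enc) for enc in ("utf-8", "utf-16", "utf-32")}
for enc, (stored, read_back) in results.items():
    print(f"{enc:7} {len(stored):3d} bytes stored, same text back: {read_back == text}")
```

Code that only ever calls `write` and `read` on the text layer sees the same text whichever storage encoding is picked; the stored bytes differ, which is the "implementation detail" described above.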
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2014-06-06 08:44 -0700 |
| Message-ID | <3564caae-9088-41a3-879a-451771ab173b@googlegroups.com> |
| In reply to | #72851 |
On Friday, 6 June 2014 at 17:25:47 UTC+2, Chris Angelico wrote:
> On Fri, Jun 6, 2014 at 11:24 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
> > On 06/05/2014 11:30 AM, Marko Rauhamaa wrote:
> >>
> >> How text is represented is very different from whether text is a
> >> fundamental data type. A fundamental text file is such that ordinary
> >> operating system facilities can't see inside the black box (that is,
> >> they are *not* encoded as far as the applications go).
> >
> > Of course they are. It may be an ASCII-encoding of some flavor or other, or
> > something really (to me) strange -- but an encoding is most assuredly in
> > effect.
>
> Allow me to explain what I think Marko's getting at here.
>
> In most file systems, a file exists on the disk as a set of sectors of
> data, plus some metadata including the file's actual size. When you
> ask the OS to read you that file, it goes to the disk, reads those
> sectors, truncates the data to the real size, and gives you those
> bytes.
>
> It's possible to mount a file as a directory, in which case the
> physical representation is very different, but the file still appears
> the same. In that case, the OS goes reading some part of the file,
> maybe decompresses it, and gives it to you. Same difference. These
> files still contain bytes.
>
> A "fundamental text file" would be one where, instead of reading and
> writing bytes, you read and write Unicode text. Since the hard disk
> still works with sectors and bytes, it'll still be stored as such, but
> that's an implementation detail; and you could format your disk UTF-8
> or UTF-16 or FSR or anything you like, and the only difference you'd
> see is performance.
>
> This could certainly be done, in theory. I don't know how well it'd
> fit with any of the popular OSes of today, but it could be done. And
> these files would not have an encoding; their on-platter
> representations would, but that's purely implementation - the text
> that you wrote out and the text that you read in are the same text,
> and there's been no encoding visible.
>
----------
Of the three, you can already eliminate one.
It's not good news.
sys.getsizeof('Gödel'.encode('utf-8'))
23
sys.getsizeof('Gödel'.encode('utf-16-le'))
27
sys.getsizeof('Gödel')
42
os.listdir(r'D:\jm\Москва\Zürich\Αθήνα\œdipe')
['a.txt', 'kk.bat', 'kk.cmd', 'kk.py', '__pycache__']
sys.getsizeof(r'D:\jm\Москва\Zürich\Αθήνα\œdipe'.encode('utf-8'))
61
sys.getsizeof(r'D:\jm\Москва\Zürich\Αθήνα\œdipe'.encode('utf-16-le'))
79
sys.getsizeof(r'D:\jm\Москва\Zürich\Αθήνα\œdipe')
100
jmf
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2014-06-06 08:48 -0700 |
| Message-ID | <611b507c-1342-4764-9bbb-0583ee40695f@googlegroups.com> |
| In reply to | #72855 |
Sorry, wrong copy/paste
>>> sys.getsizeof('Gödel'.encode('utf-8'))
23
>>> sys.getsizeof('Gödel'.encode('utf-16-le'))
27
>>> sys.getsizeof('Gödel')
42
>>> os.listdir(r'D:\jm\Москва\Zürich\Αθήνα\œdipe')
['a.txt', 'kk.bat', 'kk.cmd', 'kk.py', '__pycache__']
>>> sys.getsizeof(r'D:\jm\Москва\Zürich\Αθήνα\œdipe'.encode('utf-8'))
61
>>> sys.getsizeof(r'D:\jm\Москва\Zürich\Αθήνα\œdipe'.encode('utf-16-le'))
79
>>> sys.getsizeof(r'D:\jm\Москва\Zürich\Αθήνα\œdipe')
100
>>>
jmf
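A note for anyone rerunning the session above: `sys.getsizeof` reports the whole object, header included, so the absolute numbers depend on the interpreter build and version. The sketch below assumes CPython 3.3 or later, where PEP 393 (the flexible string representation, "FSR") stores each string with 1, 2 or 4 bytes per character depending on its widest code point. The sample strings are arbitrary.

```python
import sys

# 1000 copies of one character from each width class used by PEP 393.
samples = {
    "ASCII": "a" * 1000,
    "Latin-1": "ö" * 1000,
    "BMP": "Ж" * 1000,
    "astral": "\U0001F40D" * 1000,
}

sizes = {label: sys.getsizeof(s) for label, s in samples.items()}

for label, s in samples.items():
    utf8_len = len(s.encode("utf-8"))
    utf16_len = len(s.encode("utf-16-le"))
    print(f"{label:8} str object: {sizes[label]:5d}  "
          f"utf-8 payload: {utf8_len:5d}  utf-16 payload: {utf16_len:5d}")
```

With long strings the fixed header becomes negligible, which makes the per-character cost easier to compare than it is with a five-character word.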
| From | Robin Becker <robin@reportlab.com> |
|---|---|
| Date | 2014-06-06 12:56 +0100 |
| Message-ID | <mailman.10811.1402055808.18130.python-list@python.org> |
| In reply to | #72708 |
On 05/06/2014 18:16, Ian Kelly wrote:
.........
>
> How should e.g. bytes.upper() be implemented then? The correct
> behavior is entirely dependent on the encoding. Python 2 just assumes
> ASCII, which at best will correctly upper-case some subset of the
> string and leave the rest unchanged, and at worst could corrupt the
> string entirely. There are some things that were dropped that should
> not have been, but my impression is that those are being worked on,
> for example % formatting in PEP 461.
>

bytes.upper should have done exactly what str.upper in python 2 did; that
way we could have at least continued to do the wrong thing :)
--
Robin Becker
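For reference, this is how the question plays out in Python 3 as it exists today: `bytes.upper()` is defined and touches only ASCII letters, whatever the bytes were meant to encode. A short sketch with an arbitrary sample word; the last lines assume Python 3.5 or later, where PEP 461 restored %-formatting for bytes.

```python
word = "gödel"

as_text = word.upper()                       # str knows that ö is a letter
as_utf8 = word.encode("utf-8").upper()       # only the ASCII letters change
as_latin1 = word.encode("latin-1").upper()   # ö is the single byte 0xF6, left alone

print(as_text)
print(as_utf8, as_utf8.decode("utf-8"))
print(as_latin1, as_latin1.decode("latin-1"))

# PEP 461 (Python 3.5+): %-formatting works on bytes again.
header = b"%s: %d" % (b"Content-Length", 42)
print(header)
```

The bytes results stay decodable here because ASCII-only case mapping never alters the bytes of a multi-byte UTF-8 sequence; the price is that non-ASCII letters are simply not upper-cased.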
| From | Akira Li <4kir4.1i@gmail.com> |
|---|---|
| Date | 2014-06-05 06:49 +0400 |
| Message-ID | <mailman.10721.1401936565.18130.python-list@python.org> |
| In reply to | #72635 |
Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:
> On Tue, 03 Jun 2014 15:18:19 +0100, Robin Becker wrote:
>
>> Isn't it a bit old fashioned to think everything is connected to a
>> console?
>
> The whole concept of stdin and stdout is based on the idea of having a
> console to read from and write to. Otherwise, what would be the point?
> Classic Mac (pre OS X) had no command line interface, and nothing
> even remotely like stdin and stdout. But once you have a console, stdin,
> stdout, and stderr become useful. And once you have them, then you can
> extend the concept using redirection and pipes. But fundamentally, stdin
> and stdout are about consoles.
>
We can consider the "pipes" abstraction to be fundamental. Decades of usage
prove the usefulness of a pipeline of processes, e.g.,
tr -cs A-Za-z '\n' |
tr A-Z a-z |
sort |
uniq -c |
sort -rn |
sed ${1}q
See http://www.leancrew.com/all-this/2011/12/more-shell-less-egg/
Whether or not a pipe is connected to a tty is a small
detail. stdin/stdout is about pipes, not consoles.
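The same pipeline can be sketched in Python to show that the code never needs to know whether its input is a tty, a pipe or an in-memory buffer. `top_words` is a hypothetical helper written for this illustration, not part of any library; it accepts any iterable of text lines.

```python
import io
import re
from collections import Counter

def top_words(stream, n):
    """Return the n most frequent lowercase ASCII words read from a text stream."""
    counts = Counter()
    for line in stream:
        # tr -cs A-Za-z '\n' | tr A-Z a-z : split on non-letters, then lowercase
        counts.update(w.lower() for w in re.findall(r"[A-Za-z]+", line))
    # sort | uniq -c | sort -rn | sed ${1}q : count, order by frequency, keep n
    return counts.most_common(n)

# Any text stream works; here an in-memory one replaces stdin.
sample = io.StringIO("The cat and the hat.\nThe end.\n")
print(top_words(sample, 2))
```

Calling `top_words(sys.stdin, 10)` would behave the same whether stdin is a terminal or the output of another process; `sys.stdin.isatty()` is available for the rare case where the difference matters. Note that tie ordering among equally frequent words may differ from what `sort -rn` produces.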
--
akira
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-06-04 00:25 +1000 |
| Message-ID | <mailman.10627.1401805540.18130.python-list@python.org> |
| In reply to | #72468 |
On Wed, Jun 4, 2014 at 12:18 AM, Robin Becker <robin@reportlab.com> wrote:
> I think the idea that we only give meaning to binary data using encodings is
> a bit limiting. A zip or gif file has structure, but I don't think it's
> reasonable to regard such a file as having an encoding in the python unicode
> sense.

Of course it doesn't. Those are binary files. Ultimately, every file
is binary; but since the vast majority of them actually contain text,
in one of a handful of common encodings, it's nice to have an easy way
to open a text file. You could argue that "rb" should be the default,
rather than "rt", but that's a relatively minor point.

ChrisA
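The difference between the two modes is easy to see on a throwaway file. A minimal sketch; the file name `demo.txt` and the sample text are arbitrary, and the encoding is passed explicitly so the result does not depend on the platform default.

```python
import os
import tempfile

text = "Zürich\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")

    # Write as text; newline="\n" keeps the line ending identical on all platforms.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)

    # Binary mode: the bytes exactly as stored.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode ("r" means "rt"): bytes decoded to str on the way in.
    with open(path, "r", encoding="utf-8") as f:
        decoded = f.read()

print(type(raw).__name__, raw)
print(type(decoded).__name__, repr(decoded))
```

Binary mode hands back `bytes` and leaves interpretation to the caller, which is what a zip or gif reader wants; text mode is the convenience layer for the common case.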
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2014-06-03 14:22 -0400 |
| Message-ID | <mailman.10638.1401819784.18130.python-list@python.org> |
| In reply to | #72468 |
On 6/3/2014 10:18 AM, Robin Becker wrote:
> I think the idea that we only give meaning to binary data using
> encodings is a bit limiting.

On the contrary, it is liberating. The fact that bits have no meaning
other than 'a choice between two alternatives' means

1. any binary choice - 0/1, -/+, false/true, no/yes, closed/open,
male/female, sad/happy, evil/good, low/high, and so on ad infinitum, can
be encoded into a bit. Since any such pair could have been reversed, the
mapping between bit states and the pair is arbitrary, and constitutes an
encoding.

2. any discrete or digitized information that constitutes a choice between
multiple alternatives can be encoded into a sequence of bits.

This crucial discovery is the basis of Shannon's 1948 paper and of the
information age that started about then.

> A zip or gif file has structure, but I don't think it's reasonable to
> regard such a file as having an encoding in the python unicode sense.

I am not quite sure what you are denying. Color encodings are encodings
as much as character encodings, even if they encode different
information. Both encode sensory experience and conceptual correlates
into sequences of bits, usually organized for convenience into a
sequence of bytes or other chunks.

There is another similarity. Text files often have at least two levels
of encoding. First is the character encoding; that is all unicode
handles. Then there is the text structure encoding, which is sometimes
called the 'file format'. Most text files are at least structured into
'lines'. For this, they use encoded line endings, and there have been
multiple choices for this and at least 2 still in common use (which is a
nuisance).

Similarly, a pixel (bitmap!) image file must encode the color of each
pixel and a higher-level structuring of pixels into a 2D array of rows
of lines. Just as with text, there have been and still are multiple
encodings at both levels. Also, similarly, the receiver of an image must
know what encoding the sender used.

Vector graphics is a different way of encoding certain types of images,
and again there are multiple ways to encode the information into bits.
The encoding hassle here is similar to that for text. One of the
frustrations of tk is that it natively uses just one old dialect of
postscript (.ps) to output screen images. One has to find and install an
extension to get a modern Scalable Vector Graphics (.svg) encoding.

Because Python is programmed with lines of text, it must come with
minimal text decoding. If Python were programmed with drawings, it would
come with one or more drawing decoders and a drawing equivalent of a
lexer. It might even have a special 'rd' (read drawing) mode for open.

--
Terry Jan Reedy
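The "two levels of encoding" in a text file, and the need for the receiver to know which encoding was used, can be illustrated in a few lines. This is only a sketch with made-up sample data; reading the same bytes as 8-bit grey pixel values is an arbitrary alternative interpretation chosen for contrast.

```python
# A small "file": three lines with mixed line endings, stored as UTF-16-LE.
data = "first\r\nsecond\nthird".encode("utf-16-le")

# Level 1: character encoding (bytes -> characters).
text = data.decode("utf-16-le")

# Level 2: text structure (characters -> lines); splitlines accepts \n and \r\n.
lines = text.splitlines()
print(lines)

# The same bytes, handed to a receiver that expects 8-bit grey pixel values.
pixels = list(data[:8])
print(pixels)
```

Nothing in `data` itself says which reading is intended; that knowledge has to travel alongside the bytes, for text and images alike.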