From: Ned Batchelder
To: python-list@python.org
Subject: Re: Python 3.2 has some deadly infection
Date: Fri, 06 Jun 2014 13:33:55 -0400
Newsgroups: comp.lang.python

On 6/6/14 1:11 PM, Marko Rauhamaa wrote:
> Steven D'Aprano:
>
>> On Fri, 06 Jun 2014 18:32:39 +0300, Marko Rauhamaa wrote:
>>> Unicode, like ASCII, is a code. Representing text in unicode is
>>> encoding.
>>
>> A Unicode string as an abstract data type has no encoding.
>
> Unicode itself is an encoding. See it in action here:
>
>    72 101 108 108 111 44 32 119 111 114 108 100
>
>> It is a Platonic ideal, a pure form like the real numbers.
>
> Far from it. It is a mapping from symbols to integers. The symbols are
> the Platonic ones.
>
> The Unicode/ASCII encoding above represents the same "Platonic" string
> as this EBCDIC one:
>
>    212 133 147 147 150 107 64 166 150 153 137 132
>
>> Unicode string like this:
>>
>>     s = u"NOBODY expects the Spanish Inquisition!"
>>
>> should not be thought of as a bunch of bytes in some encoding,
>
> Encoding is not tied to bytes or even computers. People can speak in
> code, after all.

Marko, you are right about the broader English meaning of the word
"encoding". The original point here was that "Unicode text" provides no
information about what sequence of bytes is at work.

In the Unicode ecosystem, an encoding is a specification of how the text
will be represented in a byte stream. Saying something is "Unicode"
doesn't provide that information. You have to say "UTF-8" or "UTF-16" or
"UCS-2", etc., in order to know how bytes will be involved.

When Ethan said, "a Unicode string, as a data type, has no encoding," he
meant (as he explained) that a Unicode string doesn't require or imply
any particular mapping to bytes.

I'm sure you understand this; I'm just trying to clarify the different
meanings of the word "encoding".

> Marko

-- 
Ned Batchelder, http://nedbatchelder.com
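
A minimal Python 3 sketch of the distinction drawn above, using only the
built-in codecs. The variable names (code_points, encoded, hello) are
illustrative and not from the thread. It shows that one string maps to
different byte sequences depending on the encoding chosen, and that the
integer list quoted earlier is what ASCII bytes look like.

```python
# The text from the quoted example, held as an abstract sequence of
# code points (a Python 3 str).
s = "NOBODY expects the Spanish Inquisition!"

# Code points are integers, but they are not bytes yet.
code_points = [ord(ch) for ch in s]

# Only choosing an encoding fixes the byte sequence.
encoded = {name: s.encode(name) for name in ("utf-8", "utf-16-le", "utf-32-le")}

for name, data in encoded.items():
    # Same text, different length and different leading bytes.
    print(name, len(data), data[:8])

# Each byte sequence decodes back to the same string.
assert all(data.decode(name) == s for name, data in encoded.items())

# The first integer list quoted above, read as ASCII-encoded bytes.
hello = bytes([72, 101, 108, 108, 111, 44, 32, 119, 111, 114, 108, 100])
print(hello.decode("ascii"))
```

For this all-ASCII string the UTF-8 bytes happen to equal the code point
values, which is probably why "Unicode" and "an encoding" are so easy to
conflate; the UTF-16 and UTF-32 results make the difference visible.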