From: Chris Angelico
Date: Sat, 7 Jun 2014 01:25:47 +1000
Subject: Re: Python 3.2 has some deadly infection
Newsgroups: comp.lang.python

On Fri, Jun 6, 2014 at 11:24 PM, Ethan Furman wrote:
> On 06/05/2014 11:30 AM, Marko Rauhamaa wrote:
>>
>> How text is represented is very different from whether text is a
>> fundamental data type.
>> A fundamental text file is such that ordinary
>> operating system facilities can't see inside the black box (that is,
>> they are *not* encoded as far as the applications go).
>
> Of course they are. It may be an ASCII-encoding of some flavor or other, or
> something really (to me) strange -- but an encoding is most assuredly in
> effect.

Allow me to explain what I think Marko's getting at here.

In most file systems, a file exists on the disk as a set of sectors of
data, plus some metadata including the file's actual size. When you ask
the OS to read you that file, it goes to the disk, reads those sectors,
truncates the data to the real size, and gives you those bytes. It's
possible to mount a file as a directory, in which case the physical
representation of the files inside it is very different, but each of
them still appears the same. In that case, the OS reads the relevant
part of the containing file, maybe decompresses it, and gives it to you.
Same difference. These files still contain bytes.

A "fundamental text file" would be one where, instead of reading and
writing bytes, you read and write Unicode text. Since the hard disk
still works with sectors and bytes, the text will still be stored as
such, but that's an implementation detail. You could format your disk
as UTF-8, UTF-16, FSR (the Flexible String Representation of PEP 393,
which CPython has used for str since 3.3), or anything else you like,
and the only difference you'd see is performance.

This could certainly be done, in theory. I don't know how well it would
fit with any of today's popular OSes, but it could be done. And these
files would not have an encoding; their on-platter representations
would, but that's purely implementation: the text that you wrote out
and the text that you read back in are the same text, and no encoding
has ever been visible.

ChrisA
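The sector-and-size model described in the reply above can be sketched as a toy simulation. This is not how any real file system is implemented: the 512-byte `SECTOR_SIZE` and the helper names `store_file` and `read_file` are assumptions made purely for illustration.

```python
SECTOR_SIZE = 512  # assumed sector size; real disks commonly use 512 or 4096


def store_file(data: bytes) -> tuple[list[bytes], int]:
    """Split data into fixed-size sectors, zero-padding the last one.

    Returns the sectors plus the file's real size, which a real file
    system would record in the file's metadata.
    """
    sectors = [
        data[start:start + SECTOR_SIZE].ljust(SECTOR_SIZE, b"\x00")
        for start in range(0, len(data), SECTOR_SIZE)
    ]
    return sectors, len(data)


def read_file(sectors: list[bytes], size: int) -> bytes:
    """Read every sector, then truncate the data to the recorded size."""
    return b"".join(sectors)[:size]


payload = b"hello, world\n" * 100  # 1300 bytes: needs three 512-byte sectors
sectors, size = store_file(payload)
assert len(sectors) == 3 and all(len(s) == SECTOR_SIZE for s in sectors)
assert read_file(sectors, size) == payload  # padding is gone after truncation
```

The recorded size is what lets the reader discard the padding in the last sector; without it, the caller would get trailing zero bytes that were never written.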
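Python's standard `zipfile` module offers a rough, in-memory analogue of the "file mounted as a directory" case mentioned in the reply: the archive is a single file whose members are located inside it and decompressed on demand, yet reading a member still hands back plain bytes. This is only an analogy, not a real mount, and the member name `notes/readme.txt` is arbitrary.

```python
import io
import zipfile

# Build a compressed archive in memory; it stands in for the
# "file mounted as a directory".
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("notes/readme.txt", b"hello, world\n" * 100)

# Reopen it and read one member: zipfile finds the member's data inside
# the archive, decompresses it, and returns ordinary bytes.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    member = archive.read("notes/readme.txt")
    info = archive.getinfo("notes/readme.txt")

print(type(member).__name__, len(member))  # bytes 1300
print(info.compress_size < info.file_size)  # True: stored compressed
```

The physical layout (deflated data inside another file) differs completely from the sector model, but the caller sees the same thing: a sequence of bytes.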
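A hypothetical "fundamental text file" of the kind described in the reply can be modelled as an object whose interface deals only in `str`, with the storage encoding hidden behind it. The `TextFile` class, its `storage_encoding` parameter, and the `storage_size` helper are invented for illustration; no real operating system or Python API works this way. To keep the sketch short, `write` replaces the whole contents rather than appending.

```python
class TextFile:
    """Hypothetical 'fundamental text file': callers only see str.

    The storage encoding is a private detail, like a disk formatted as
    UTF-8 or UTF-16; it is never exposed through read() or write().
    """

    def __init__(self, storage_encoding: str = "utf-8") -> None:
        self._encoding = storage_encoding
        self._raw = b""  # stands in for the on-platter representation

    def write(self, text: str) -> None:
        self._raw = text.encode(self._encoding)

    def read(self) -> str:
        return self._raw.decode(self._encoding)

    def storage_size(self) -> int:
        """Bytes used on 'disk'; exposed here only for demonstration."""
        return len(self._raw)


text = "naïve café, 東京, 🐍"
files = {enc: TextFile(enc) for enc in ("utf-8", "utf-16", "utf-32")}
for f in files.values():
    f.write(text)

# Every file gives back exactly the text that was written...
assert all(f.read() == text for f in files.values())
# ...although each one stores a different number of bytes.
print({enc: f.storage_size() for enc, f in files.items()})
```

Apart from the demonstration-only `storage_size`, nothing outside the class can tell which encoding was chosen; the choice shows up only as space and speed, which is the "only difference you'd see is performance" point in miniature.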