From: Chris Angelico
Date: Sun, 12 Feb 2012 15:38:37 +1100
Subject: Re: Python usage numbers
Newsgroups: comp.lang.python

On Sun, Feb 12, 2012 at 1:36 PM, Rick Johnson wrote:
> On Feb 11, 8:23 pm, Steven D'Aprano wrote:
>> "I have a file containing text. I can open it in an editor and see it's
>> nearly all ASCII text, except for a few weird and bizarre characters like
>> £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
>> error. What should I do that requires no thought?"
>>
>> Obvious answers:
>
> the most obvious answer would be to read the file WITHOUT worrying
> about asinine encoding.

What this statement misunderstands, though, is that ASCII is itself an
encoding. Files contain bytes, and only something external to those bytes
gives them meaning. The famous "bush hid the facts" trick with Windows
Notepad shows the folly of trying to infer meaning from bytes using only
internal evidence.

Everything that displays text to a human needs to translate bytes into
glyphs, and conceptually the usual way to do this is to go via characters:
bytes are decoded into characters, and characters are drawn as glyphs.
Pretending that it's all the same thing really means pretending that one
byte represents one character and that each character is depicted by one
glyph. And that's doomed to failure, unless everyone speaks English with
no foreign symbols, so no mathematical notation either.

ChrisA
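A short Python 3 sketch of the points above. It assumes, purely for illustration, that the file from the quoted question was saved as Latin-1; the post never says which encoding it actually used. It shows that ASCII and UTF-8 both reject those bytes, that only the externally known encoding recovers the text, that byte count, character count and glyph count can all differ, and that plain ASCII bytes like "bush hid the facts" are also valid UTF-16-LE, which is why guessing from content alone is unreliable.

```python
import unicodedata

# Assumption for illustration: the "weird" characters were saved as Latin-1.
data = "£ © ± ö".encode("latin-1")
print(data)  # b'\xa3 \xa9 \xb1 \xf6'

# ASCII is itself an encoding, covering only bytes 0x00-0x7F, so it fails here.
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    print("ascii:", exc.reason)  # ordinal not in range(128)

# Guessing UTF-8 fails too: 0xA3 is a continuation byte, not a valid start byte.
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8:", exc.reason)  # invalid start byte

# Only the external knowledge "these bytes are Latin-1" recovers the text.
text = data.decode("latin-1")
print(text)  # £ © ± ö

# One character is not always one byte: "ö" is 1 byte in Latin-1, 2 in UTF-8.
print(len("ö".encode("latin-1")), len("ö".encode("utf-8")))  # 1 2

# One glyph is not always one character: "o" + COMBINING DIAERESIS displays
# as ö but is two code points until NFC normalization composes them.
decomposed = "o\u0308"
print(len(decomposed), len(unicodedata.normalize("NFC", decomposed)))  # 2 1

# "bush hid the facts": 18 plain ASCII bytes that also decode cleanly as
# UTF-16-LE, producing 9 CJK ideographs instead of English text.
ascii_bytes = b"bush hid the facts"
misread = ascii_bytes.decode("utf-16-le")
print(len(ascii_bytes), len(misread))  # 18 9
```

In real code the same idea means stating the encoding explicitly, e.g. `open(path, encoding="latin-1")`, rather than relying on whatever default the platform picks.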