From: Chris Angelico
Newsgroups: comp.lang.python
To: python-list@python.org
Subject: Re: UnicodeDecodeError issue
Date: Thu, 5 Sep 2013 13:59:17 +1000

On Thu, Sep 5, 2013 at 1:07 PM, Steven D'Aprano wrote:
> Technically, it's not ASCII, since ASCII only knows about bytes \x00
> through \x7F (decimal 0 through 127). That's why it isn't correct to
> describe Python bytes strings as "ASCII strings". They're byte strings
> that happen to be displayed as ASCII-plus-other-stuff.

The line of code is itself entirely ASCII. The sequence REVERSE
SOLIDUS, LATIN SMALL LETTER X, LATIN SMALL LETTER B, DIGIT SIX is four
Unicode characters that are in the ASCII set. That Python interprets
them as representing the byte value 182 doesn't change that; the line
of code *is* ASCII.

ChrisA
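
The distinction between the source text and the value it denotes can be checked with a short Python 3 sketch. The variable names are illustrative, and the literal b'\xb6' is assumed from the escape sequence named above rather than quoted from the thread.

```python
import ast
import unicodedata

# The source text of a bytes literal, held as an ordinary str.
# The raw-string prefix keeps the backslash as a literal character.
source = r"b'\xb6'"

# Every character of the source text is in the ASCII range (0-127).
source_is_ascii = all(ord(ch) < 128 for ch in source)

# The escape sequence itself is four separate characters.
escape = source[2:6]
names = [unicodedata.name(ch) for ch in escape]

# Evaluating the literal yields a single byte with value 182,
# which is outside the ASCII range.
value = ast.literal_eval(source)

print(source_is_ascii)       # True
print(names)
print(len(value), value[0])  # 1 182

# Decoding that byte as ASCII is what fails, not reading the source.
try:
    value.decode("ascii")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```

The source line encodes cleanly as ASCII, while the one-byte value it produces does not decode as ASCII.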