From: Chris Angelico
To: python-list@python.org
Newsgroups: comp.lang.python
Subject: Re: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 0: invalid start byte
Date: Thu, 4 Jul 2013 22:37:40 +1000
References: <51D561FD.9030907@mrabarnett.plus.com>

On Thu, Jul 4, 2013 at 9:52 PM, MRAB wrote:
> On 04/07/2013 12:29, Νίκος wrote:
>>
>> On 4/7/2013 1:54 PM, Chris Angelico wrote:
>>>
>>> On Thu, Jul 4, 2013 at 8:38 PM, Νίκος wrote:
>>>>
>>>> So you are also suggesting that what gethostbyaddr() returns is not
>>>> utf-8 encoded too?
>>>>
>>>> What character is 0xb6 anyways?
>>>
>>>
>>> It isn't. It's a byte. Bytes are not characters.
>>>
>>> http://www.joelonsoftware.com/articles/Unicode.html
>>
>>
>> Well in case of utf-8 encoding for the first 127 codepoints we can safely
>> say that a character equals a byte :)
>>
> Equals? No. Bytes are not characters. (Strictly speaking, they're
> codepoints, not characters.)
>
> And anyway, it's the first _128_ codepoints.

As MRAB says, even if there's a 1:1 correspondence between bytes,
codepoints, and characters, they're still not the same thing. Plus,
0xb6 is not in the first 128, so your statement is false and your
question has no answer. Do you understand why I gave you that link? If
not, go read the page linked to.

ChrisA
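
A minimal Python 3 sketch of the distinction discussed above. The variable
names are illustrative only, and the Latin-1 decode at the end is purely an
assumption for demonstration: nothing in the thread says which encoding the
original data actually used.

```python
# For code points below 128, UTF-8 uses exactly one byte, and that byte's
# numeric value matches the code point. The objects are still different types.
ascii_text = "A"
ascii_bytes = ascii_text.encode("utf-8")
same_number = ord(ascii_text) == ascii_bytes[0]
different_types = isinstance(ascii_text, str) and isinstance(ascii_bytes, bytes)

# 0xb6 is 0b10110110. In UTF-8 the bit pattern 10xxxxxx marks a continuation
# byte, so it cannot begin a character; decoding it alone fails.
try:
    b"\xb6".decode("utf-8")
    error_reason = None
except UnicodeDecodeError as exc:
    error_reason = exc.reason

# The code point U+00B6 is a different thing from the byte 0xb6:
# in UTF-8 it takes two bytes.
pilcrow_bytes = "\u00b6".encode("utf-8")

# Assumption for illustration: if the data were Latin-1, the byte 0xb6
# would decode to U+00B6. Another encoding would give another character.
latin1_char = b"\xb6".decode("latin-1")
```

What a byte such as 0xb6 "means" depends entirely on the encoding used to
decode it, which is why the question "what character is 0xb6?" has no answer
on its own.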