From: Terry Reedy
To: python-list@python.org
Subject: Re: Beazley 4E P.E.R, Page29: Unicode
Date: Sun, 14 Jul 2013 03:08:22 -0400
Newsgroups: comp.lang.python

On 7/13/2013 11:09 PM, vek.m1234@gmail.com wrote:
> http://stackoverflow.com/questions/17632246/beazley-4e-p-e-r-page29-unicode

Is this David Beazley? (You referred to 'DB' later.)

> "directly writing a raw UTF-8 encoded string such as
> 'Jalape\xc3\xb1o' simply produces a nine-character string U+004A,
> U+0061, U+006C, U+0061, U+0070, U+0065, U+00C3, U+00B1, U+006F, which
> is probably not what you intended. This is because in UTF-8, the
> multi-byte sequence \xc3\xb1 is supposed to represent the single
> character U+00F1, not the two characters U+00C3 and U+00B1."
>
> My original question was: Shouldn't this be 8 characters - not 9? He
> says: \xc3\xb1 is supposed to represent the single character. However
> after some interaction with fellow Pythonistas i'm even more
> confused.
>
> With reference to the above para: 1. What does he mean by "writing a
> raw UTF-8 encoded string"??

As much respect as I have for DB, I think this is a confused, impossible-to-parse statement, fueled by the Python 2 confusion between characters and bytes. I suggest forgetting it and the discussion that followed.

Bytes as bytes can carry any digital information, just as modulated sine waves can carry any analog information.
In both cases, one can regard them either purely as what they are or as encoding information in some other form.

-- 
Terry Jan Reedy
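For readers on Python 3, where str (code points) and bytes are separate types, the distinction in the quoted passage can be seen directly. The following is a minimal sketch, not taken from the book or this thread, assuming Python 3.6+ (for f-strings). It shows that the \x escapes in a str literal give nine code points, that the intended 'Jalapeño' has eight, and that the same escapes in a bytes literal can be regarded either purely as bytes or as UTF-8-encoded text. The latin-1 round trip at the end is one common repair for such a string, offered as an illustration rather than as something suggested in the post.

```python
# Assumes Python 3: str holds Unicode code points, bytes holds raw bytes.

# In a str literal, each \xNN escape is ONE code point U+00NN, not a byte.
raw = 'Jalape\xc3\xb1o'
print(len(raw))                                # 9
print(' '.join(f'U+{ord(c):04X}' for c in raw))

# The intended text uses a single code point, U+00F1, for the n with tilde.
intended = 'Jalape\u00f1o'
print(len(intended))                           # 8

# The same escapes in a bytes literal give the UTF-8 encoding of that text.
data = b'Jalape\xc3\xb1o'
print(len(data))                               # 9 (bytes, not characters)

# Regarded purely as bytes: just integers in the range 0-255.
print(list(data[6:8]))                         # [195, 177]

# Regarded as UTF-8-encoded text: decoding yields the 8-character string.
print(data.decode('utf-8') == intended)        # True

# Repairing the mistaken str: latin-1 maps code points 0-255 to the same
# byte values, and those bytes can then be decoded as UTF-8.
print(raw.encode('latin-1').decode('utf-8') == intended)  # True
```

The nine code points printed for raw match the list in the quoted passage; the eight-character result is what the original poster expected.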