


Re: Beazley 4E P.E.R, Page29: Unicode

From Terry Reedy <tjreedy@udel.edu>
Subject Re: Beazley 4E P.E.R, Page29: Unicode
Date 2013-07-14 03:08 -0400
References <51cbaddd-c29d-48a3-97ab-3beb1d944f1a@googlegroups.com>
Newsgroups comp.lang.python
Message-ID <mailman.4695.1373785719.3114.python-list@python.org>



On 7/13/2013 11:09 PM, vek.m1234@gmail.com wrote:
> http://stackoverflow.com/questions/17632246/beazley-4e-p-e-r-page29-unicode

Is this David Beazley? (You referred to 'DB' later.)

>  "directly writing a raw UTF-8 encoded string such as
> 'Jalape\xc3\xb1o' simply produces a nine-character string U+004A,
> U+0061, U+006C, U+0061, U+0070, U+0065, U+00C3, U+00B1, U+006F, which
> is probably not what you intended. This is because in UTF-8, the
> multi-byte sequence \xc3\xb1 is supposed to represent the single
> character U+00F1, not the two characters U+00C3 and U+00B1."
>
> My original question was: Shouldn't this be 8 characters - not 9? He
> says: \xc3\xb1 is supposed to represent the single character. However
> after some interaction with fellow Pythonistas, I'm even more
> confused.
>
> With reference to the above para: 1. What does he mean by "writing a
> raw UTF-8 encoded string"??
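In Python 3 terms, the count of nine in the quoted passage is correct: inside a text (str) literal, each \xNN escape denotes one code point, not one byte. A minimal sketch, not from the thread; the variable names `s` and `intended` are illustrative only:

```python
# Python 3: inside a str literal, '\xc3' is the code point U+00C3, not a byte.
s = 'Jalape\xc3\xb1o'
print(len(s))                                    # 9
print(' '.join('U+%04X' % ord(c) for c in s))
# U+004A U+0061 U+006C U+0061 U+0070 U+0065 U+00C3 U+00B1 U+006F

# Presumably intended: eight characters, ending in U+00F1 (n with tilde) + 'o'.
intended = 'Jalape\u00f1o'
print(len(intended))                             # 8

# Encoding the intended text as UTF-8 is where the bytes C3 B1 come from.
print(intended.encode('utf-8'))                  # b'Jalape\xc3\xb1o'
```

So the nine-character string arises from writing UTF-8 byte values as if they were character escapes; eight characters appear only when those values are treated as bytes and decoded as UTF-8.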

As much respect as I have for DB, I think this is a confused, impossible-to-parse 
statement, fueled by the Python 2 confusion between characters and bytes 
(a plain literal such as 'Jalape\xc3\xb1o' is 9 bytes in Python 2 but 9 
characters in Python 3). I suggest forgetting it and the discussion that followed. 
Bytes as bytes can carry any digital information, just as modulated sine 
waves can carry any analog information. In both cases, one can regard 
them as either purely what they are or as encoding information in some 
other form.
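The idea that the same bytes can be read either purely as bytes or as an encoding of something else can be sketched in Python 3 (an illustration, not from the thread; the name `data` is illustrative). The same nine bytes are viewed four ways:

```python
data = b'Jalape\xc3\xb1o'

# 1. Purely as bytes: a sequence of integers in range(256).
print(list(data))
# [74, 97, 108, 97, 112, 101, 195, 177, 111]

# 2. As UTF-8 encoded text: C3 B1 is one character, so 8 characters.
print(data.decode('utf-8'))      # Jalapeño

# 3. As Latin-1 encoded text: every byte is one character, so 9 characters.
print(data.decode('latin-1'))    # JalapeÃ±o

# 4. As a single unsigned big-endian integer.
print(int.from_bytes(data, 'big'))
```

None of these readings is built into the bytes themselves; the choice of interpretation is supplied by the program.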

-- 
Terry Jan Reedy



Thread

Beazley 4E P.E.R, Page29: Unicode vek.m1234@gmail.com - 2013-07-13 20:09 -0700
  Re: Beazley 4E P.E.R, Page29: Unicode Terry Reedy <tjreedy@udel.edu> - 2013-07-14 03:08 -0400
  Re: Beazley 4E P.E.R, Page29: Unicode Joshua Landau <joshua@landau.ws> - 2013-07-14 08:13 +0100
    Re: Beazley 4E P.E.R, Page29: Unicode vek.m1234@gmail.com - 2013-07-14 01:10 -0700
  Re: Beazley 4E P.E.R, Page29: Unicode Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-14 08:18 +0000
    Re: Beazley 4E P.E.R, Page29: Unicode vek.m1234@gmail.com - 2013-07-14 02:39 -0700
