
Re: How is unicode implemented behind the scenes?

From Dan Sommers <dan@tombstonezero.net>
Newsgroups comp.lang.python
Subject Re: How is unicode implemented behind the scenes?
Date 2014-03-09 05:46 +0000
Organization A noiseless patient Spider
Message-ID <lfgv6t$qmf$1@dont-email.me>
References <mailman.7942.1394330927.18130.python-list@python.org> <531bd709$0$29985$c3e8da3$5496439d@news.astraweb.com>



On Sun, 09 Mar 2014 03:50:49 +0000, Steven D'Aprano wrote:

> ... UTF-16 ... the letter "A" is stored as two bytes 0x0041 (or 0x4100
> depending on your platform's byte order) ...

At the risk of being pedantic, the two bytes are 0x00 and 0x41, and the
order in which they appear in memory depends on your platform and even
your particular view of that platform (do stacks grow up or down?  are
addresses of higher memory larger or smaller?).
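As a rough illustration of the point above (a sketch, not anything from the original thread), Python's explicit-endianness codecs show the same two bytes appearing in either order. The variable names here are made up for the example.

```python
import sys

# Encode "A" (U+0041) as UTF-16 with an explicit byte order.
# The "-le" / "-be" codecs do not prepend a byte order mark.
little = "A".encode("utf-16-le")
big = "A".encode("utf-16-be")

print(little.hex())   # 4100  (low byte 0x41 first)
print(big.hex())      # 0041  (high byte 0x00 first)

# The byte order this interpreter's platform uses natively.
print(sys.byteorder)
```

Either way there are exactly two bytes, 0x00 and 0x41; only their order in memory differs. The plain "utf-16" codec, by contrast, writes a byte order mark first and then uses the platform's native order.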

> ... UTF-32 ... "A" would be stored as 0x00000041 or 0x41000000 ...

Or even some other sequence if you're on a PDP-11.

See <http://www.catb.org/jargon/html/M/middle-endian.html>.
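For the curious, here is a minimal sketch of the three layouts for "A" in UTF-32. Python has no PDP-endian codec, so `pdp_endian_32` is a hypothetical helper written for this example; it assumes the usual description of the PDP-11 layout, in which the high 16-bit word comes first and each word is stored little-endian.

```python
import struct

def pdp_endian_32(value):
    """Pack a 32-bit value in PDP-11 "middle-endian" order (illustrative helper).

    Assumed layout: high 16-bit word first, each word little-endian.
    """
    high, low = (value >> 16) & 0xFFFF, value & 0xFFFF
    return struct.pack("<HH", high, low)

utf32_le = "A".encode("utf-32-le")
utf32_be = "A".encode("utf-32-be")
utf32_pdp = pdp_endian_32(ord("A"))

print(utf32_le.hex())    # 41000000
print(utf32_be.hex())    # 00000041
print(utf32_pdp.hex())   # 00004100
```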

But you knew that.  ;-)

Pedantic'ly yours,
Dan



