Re: How is unicode implemented behind the scenes?

Date 2014-03-09 02:40 +0000
From MRAB <python@mrabarnett.plus.com>
Subject Re: How is unicode implemented behind the scenes?
References <CAGGBd_rSN1bMHkQYix8Lo0TfXi3_k+Q9nu25vMokR1+Eumf5Cg@mail.gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.7943.1394332835.18130.python-list@python.org> (permalink)


On 2014-03-09 02:08, Dan Stromberg wrote:
> OK, I know that Unicode data is stored in an encoding on disk.
>
> But how is it stored in RAM?
>
> I realize I shouldn't write code that depends on any relevant
> implementation details, but knowing some of the more common
> implementation options would probably help build an intuition for
> what's going on internally.
>
> I've heard that characters are no longer all c bytes wide internally,
> so is it sometimes utf-8?
>
No.

From Python 3.3 onwards, a string is stored as an array using 1, 2 or 4 bytes per codepoint, with the width chosen by the largest codepoint in the string.

In Python terms:

if all(c <= '\xFF' for c in string):
    width = 1  # 1 byte per codepoint (all codepoints fit in Latin-1)
elif all(c <= '\uFFFF' for c in string):
    width = 2  # 2 bytes per codepoint (all codepoints are in the BMP)
else:
    width = 4  # 4 bytes per codepoint
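
If you want to see this for yourself, here is a rough sketch rather than anything definitive. It assumes CPython 3.3 or later, where sys.getsizeof() reflects the string's internal buffer. The helper names bytes_per_codepoint and measured_width are made up for this illustration. The first applies the rule above; the second estimates the real per-codepoint cost by comparing the sizes of two strings built from the same character, so the fixed object header cancels out.

```python
import sys

def bytes_per_codepoint(s):
    """Width predicted by the rule above (hypothetical helper)."""
    widest = max(map(ord, s), default=0)
    if widest <= 0xFF:
        return 1
    elif widest <= 0xFFFF:
        return 2
    return 4

def measured_width(ch, n=1000):
    """Estimate bytes per codepoint from the size difference of two strings.

    Assumes CPython 3.3+; other implementations may store strings differently.
    """
    longer = sys.getsizeof(ch * (n + 100))
    shorter = sys.getsizeof(ch * 100)
    return (longer - shorter) // n

# ASCII, Latin-1, BMP, and astral (non-BMP) sample characters.
for ch in ['a', '\xe9', '\u20ac', '\U0001F600']:
    print(hex(ord(ch)), bytes_per_codepoint(ch), measured_width(ch))
```

On CPython the predicted and measured columns should agree. Note that a single wide character is enough to widen the whole string, since the width is chosen per string, not per character.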
