| Newsgroups | comp.lang.python |
|---|---|
| Date | 2014-03-09 00:39 -0800 |
| References | <CAGGBd_rSN1bMHkQYix8Lo0TfXi3_k+Q9nu25vMokR1+Eumf5Cg@mail.gmail.com> <mailman.7943.1394332835.18130.python-list@python.org> |
| Message-ID | <751cbe5d-ebbe-4f4e-93a9-6012667297e3@googlegroups.com> (permalink) |
| Subject | Re: How is unicode implemented behind the scenes? |
| From | wxjmfauth@gmail.com |
On Sunday, 9 March 2014 03:40:28 UTC+1, MRAB wrote:
> On 2014-03-09 02:08, Dan Stromberg wrote:
> > OK, I know that Unicode data is stored in an encoding on disk.
> >
> > But how is it stored in RAM?
> >
> > I realize I shouldn't write code that depends on any relevant
> > implementation details, but knowing some of the more common
> > implementation options would probably help build an intuition for
> > what's going on internally.
> >
> > I've heard that characters are no longer all c bytes wide internally,
> > so is it sometimes utf-8?
>
> No.
>
> From Python 3.3, it's an array of 1, 2 or 4 bytes per codepoint.
>
> In Python terms:
>
> if all(c <= '\xFF' for c in string):
>     use 1 byte per codepoint
> elif all(c <= '\uFFFF' for c in string):
>     use 2 bytes per codepoint
> else:
>     use 4 bytes per codepoint

A very, very nice recursive mathematical absurdity.

jmf
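For readers who want to try the rule MRAB describes, here is a minimal sketch, not CPython's actual implementation. The names `storage_width` and `measured_width` are made up for illustration. The sketch compares integer code points via `ord()` so that the thresholds 0xFF and 0xFFFF are unambiguous. `measured_width` assumes CPython 3.3 or later, where `sys.getsizeof` reflects the per-string storage; other implementations may report something else.

```python
import sys


def storage_width(s: str) -> int:
    """Bytes per code point under the rule quoted above (1, 2 or 4)."""
    # The widest character decides the width for the whole string.
    highest = max(map(ord, s), default=0)
    if highest <= 0xFF:
        return 1
    if highest <= 0xFFFF:
        return 2
    return 4


def measured_width(ch: str, n: int = 1000) -> float:
    """Estimate bytes per code point from sys.getsizeof (CPython-specific)."""
    # Both strings contain only `ch`, so the fixed header cancels out
    # and the difference is the cost of n extra code points.
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) / n


if __name__ == "__main__":
    for sample in ("abc", "caf\xe9", "\u20ac", "\U0001F600"):
        print(repr(sample), storage_width(sample), measured_width(sample[-1]))
```

Note that a single character above U+FFFF moves the whole string to 4 bytes per code point, since the width is chosen per string, not per character.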
Re: How is unicode implemented behind the scenes? MRAB <python@mrabarnett.plus.com> - 2014-03-09 02:40 +0000
Re: How is unicode implemented behind the scenes? wxjmfauth@gmail.com - 2014-03-09 00:39 -0800
Re: How is unicode implemented behind the scenes? Rustom Mody <rustompmody@gmail.com> - 2014-03-09 03:32 -0700
Re: How is unicode implemented behind the scenes? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-09 14:53 +0000