Re: Could you verify this, Oh Great Unicode Experts of the Python-List?

References <CAN1F8qXgBwTGSkbP3N1uJZPJw1CY=4O4ptQurV-2=Gmm4UiYbw@mail.gmail.com>
Date 2013-08-11 07:24 +0100
Subject Re: Could you verify this, Oh Great Unicode Experts of the Python-List?
From Chris Angelico <rosuav@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.471.1376211637.1251.python-list@python.org>



On Sun, Aug 11, 2013 at 7:17 AM, Joshua Landau <joshua@landau.ws> wrote:
> Given tweet = b"caf\x65\xCC\x81".decode():
>
>     >>> tweet
>     'café'
>
> But:
>
>     >>> len(tweet)
>     5

You're now looking at the difference between glyphs and combining
characters: that 'é' is one glyph built from two code points, an 'e'
followed by U+0301 COMBINING ACUTE ACCENT. Twitter counts combining
characters, so when you build one "thing" out of lots of
separately-typed parts, it does count as more characters.
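
A minimal sketch of that distinction, using only the standard
unicodedata module. The variable names are illustrative, and nothing
here claims to reproduce how Twitter itself counts; it only shows that
the decomposed and precomposed forms of the same text have different
lengths in Python 3:

```python
import unicodedata

# The string from the quoted example: "cafe" followed by U+0301.
tweet = b"caf\x65\xCC\x81".decode("utf-8")

# len() counts code points, so the combining accent is counted separately.
decomposed_length = len(tweet)

# NFC normalization folds 'e' + U+0301 into the single code point U+00E9.
composed = unicodedata.normalize("NFC", tweet)
composed_length = len(composed)

# unicodedata.combining() returns a non-zero class for combining marks.
combining_marks = [c for c in tweet if unicodedata.combining(c)]

print(decomposed_length, composed_length, len(combining_marks))
```

Whether to normalize before counting depends on what you want "length"
to mean; NFC only helps where a precomposed code point exists.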

Read this article for some arguments on the subject, including a
number of references to Twitter itself:

http://unspecified.wordpress.com/2012/04/19/the-importance-of-language-level-abstract-unicode-strings/

ChrisA
