Re: Could you verify this, Oh Great Unicode Experts of the Python-List?

Started by: Chris Angelico <rosuav@gmail.com>
First post: 2013-08-11 07:24 +0100
Last post: 2013-08-11 07:24 +0100
Articles: 1 (1 participant)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


#52371 — Re: Could you verify this, Oh Great Unicode Experts of the Python-List?

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-08-11 07:24 +0100
Subject: Re: Could you verify this, Oh Great Unicode Experts of the Python-List?
Message-ID: <mailman.471.1376211637.1251.python-list@python.org>

On Sun, Aug 11, 2013 at 7:17 AM, Joshua Landau <joshua@landau.ws> wrote:
> Given tweet = b"caf\x65\xCC\x81".decode():
>
>     >>> tweet
>     'café'
>
> But:
>
>     >>> len(tweet)
>     5

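You're now looking at the difference between glyphs and code points.
The "é" you see is one glyph built from two code points: "e" (U+0065)
followed by U+0301 COMBINING ACUTE ACCENT, which is why len() says 5.
Twitter counts combining characters, so when you build one "thing"
out of several separately typed parts, it counts as more characters.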
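
The count in the quoted session can be reproduced and picked apart with
the standard unicodedata module. This is only a minimal sketch, assuming
Python 3; count_without_combining is a hypothetical helper made up for
illustration, and neither it nor the NFC step is claimed to be the rule
Twitter actually applies.

```python
import unicodedata

# 'cafe' followed by U+0301 COMBINING ACUTE ACCENT, as in the quoted post
tweet = b"caf\x65\xCC\x81".decode("utf-8")

# len() counts code points, not what the eye sees as characters
print(len(tweet))
print([unicodedata.name(ch) for ch in tweet])

# NFC normalization composes 'e' + U+0301 into the single code point U+00E9
composed = unicodedata.normalize("NFC", tweet)
print(len(composed))


def count_without_combining(text):
    """Hypothetical helper: count code points whose combining class is 0."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


print(count_without_combining(tweet))
```

Normalizing first gives the intuitive count for this string, but not
every combining sequence has a precomposed form, so the two approaches
can disagree on other input.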

Read this article for some arguments on the subject, including a
number of references to Twitter itself:

http://unspecified.wordpress.com/2012/04/19/the-importance-of-language-level-abstract-unicode-strings/

ChrisA
