


Could you verify this, Oh Great Unicode Experts of the Python-List?

From Joshua Landau <joshua@landau.ws>
Date 2013-08-11 07:17 +0100
Subject Could you verify this, Oh Great Unicode Experts of the Python-List?
Newsgroups comp.lang.python
Message-ID <mailman.468.1376201912.1251.python-list@python.org>



Basically, I think Twitter's broken.

For my full discussion on the matter, see:
http://www.reddit.com/r/learnpython/comments/1k2yrn/help_with_len_and_input_function_33/cbku5e8

Here's the first post of mine, ineffectually edited for this list:

"""
<strikethrough>The obvious solution [to getting the length of a tweet]
is wrong. Like, slightly wrong¹.</strikethrough>

Given tweet = b"caf\x65\xCC\x81".decode():

    >>> tweet
    'café'

But:

    >>> len(tweet)
    5

So the solution is:

    >>> import unicodedata
    >>> len(unicodedata.normalize("NFC", tweet))
    4
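
To see why the two lengths differ, it can help to list the code points
before and after normalization. This is only a minimal sketch using the
standard unicodedata module; the helper name describe() is made up for
illustration and is not part of the original post:

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text.

    Hypothetical helper, for illustration only.
    """
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

tweet = b"caf\x65\xCC\x81".decode("utf-8")

# As decoded: 'e' followed by a separate combining accent.
decomposed = describe(tweet)
# After NFC: the pair is composed into one precomposed code point.
composed = describe(unicodedata.normalize("NFC", tweet))

for label, pairs in (("as decoded", decomposed), ("after NFC", composed)):
    print(label, len(pairs))
    for code, name in pairs:
        print("   ", code, name)
```

The decoded string ends in U+0065 followed by U+0301 (a combining acute
accent), while the NFC form ends in the single code point U+00E9.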

<strikethrough>Read twitter's commentary¹ for proof.</strikethrough>

<strikethrough>There are additional complications I'm trying to sort
out.</strikethrough>
________________________________

After further testing (I don't actually use Twitter) it seems the
whole thing was just smoke and mirrors. The linked article is a lie,
at least on the user's end.

On Linux you can prove this by running:

    >>> import subprocess
    >>> p = subprocess.Popen(['xsel', '-bi'], stdin=subprocess.PIPE)
    >>> p.communicate(input=b"caf\x65\xCC\x81")
    (None, None)

"café" will be in your copy-paste buffer, and you can paste it into
the tweet-box. It takes 5 characters. So much for testing ;).

________________________________
¹ https://dev.twitter.com/docs/counting-characters#Definition_of_a_Character
"""


I know this isn't *really* Python-related, but there's Python involved
and you're the sort of people who'll be able to tell me what I've done
wrong, if anything.
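
For anyone checking the numbers above, here is a rough sketch that puts
the candidate "lengths" of the same string side by side. The function
name length_report() is hypothetical, and nothing here claims to
reproduce what Twitter's counter actually does; it only shows which
count corresponds to the 5 and the 4 seen in the post:

```python
import unicodedata

def length_report(text):
    """Return several candidate 'lengths' for the same text.

    Hypothetical helper, for illustration only.
    """
    return {
        "code points (as given)": len(text),
        "code points (NFC)": len(unicodedata.normalize("NFC", text)),
        "code points (NFD)": len(unicodedata.normalize("NFD", text)),
        "UTF-8 bytes": len(text.encode("utf-8")),
    }

report = length_report(b"caf\x65\xCC\x81".decode("utf-8"))
for label, value in report.items():
    print(f"{label}: {value}")
```

The 5 reported by the tweet-box matches the un-normalized code point
count, and the 4 matches the NFC count.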



