Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #92237

Re: Ah Python, you have spoiled me for all other languages

References (2 earlier) <2212595.DFZ6OqehRn@PointedEars.de> <55607a1b$0$13011$c3e8da3$5496439d@news.astraweb.com> <2c4d029c-8ea5-465b-8adc-6c35185bd150@googlegroups.com> <2483375.eHyISxeWLQ@PointedEars.de> <55742e0e$0$12980$c3e8da3$5496439d@news.astraweb.com>
Date 2015-06-07 22:08 +1000
Subject Re: Ah Python, you have spoiled me for all other languages
From Chris Angelico <rosuav@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.243.1433678894.13271.python-list@python.org> (permalink)

Show all headers | View raw


On Sun, Jun 7, 2015 at 9:42 PM, Steven D'Aprano <steve@pearwood.info> wrote:
> My opinion is that a programming language like Python or ECMAScript should
> operate on *code points*. If we want to call them "characters" informally,
> that should be allowed, but whenever there is ambiguity we should remember
> we're dealing with code points. The implementation shouldn't matter:
> compliant Python interpreters might choose to use UTF-8 internally, or
> UTF-16, or UTF-32, or something else, and still agree on how many
> characters a string contains. Normalisation is still an issue, of course,
> but any decent Unicode implementation will include a way to normalise or
> denormalise strings.

If by "normalise" you mean the NF[K]C/NF[K]D composition and
decomposition, then yes, any decent Unicode library will provide that.
I'm not sure it's critical to string handling itself, though; and
Python defers the operation to the unicodedata module:

>>> s1 = "\N{LATIN SMALL LETTER A}\N{COMBINING ACUTE ACCENT}"
>>> s2 = "\N{LATIN SMALL LETTER A WITH ACUTE}"
>>> s1 == s2
False
>>> unicodedata.normalize("NFC", s1) == s2
True

It's a useful operation to be able to do, but I would never expect
that *string comparison* or other operations should automatically
normalize. (Unless you want to say that all strings are guaranteed to
be NFC/NFD normalized, such that s1 and s2 would actually be
identical, which I suppose is plausible. I'm not sure what the
advantage would be, though. And certainly you wouldn't want to
K-normalize strings automatically.)

> The question of graphemes (what "ordinary people" consider letters and
> characters, e.g. "ch" is two letters to an English speaker but one letter
> to a Czech speaker) should be left to libraries. It's a much harder problem
> to solve in the full general case, requires localisation, and is overkill
> for many string-processing tasks.

Yeah. The basic challenge to a beginning programmer, "reverse this
string", becomes rather tricky in the presence of natural language.

>>> s1 += "e"
>>> s1
'áe'
>>> s1[::-1]
'éa'

Oops.

But hey. It's easier to understand what went wrong here than, say, if
you reverse the bytes in a UTF-8 stream. Or the code units in a UTF-16
stream. If you're lucky, those would give you instant errors... if
you're not, well, who knows.

ChrisA

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Ah Python, you have spoiled me for all other languages Steven D'Aprano <steve@pearwood.info> - 2015-05-23 00:58 +1000
  Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-23 01:29 +1000
    Re: Ah Python, you have spoiled me for all other languages wxjmfauth@gmail.com - 2015-05-22 10:57 -0700
    Re: Ah Python, you have spoiled me for all other languages Tim Daneliuk <tundra@tundraware.com> - 2015-05-22 16:40 -0500
    Re: Ah Python, you have spoiled me for all other languages Tim Daneliuk <tundra@tundraware.com> - 2015-05-22 16:40 -0500
      Re: Ah Python, you have spoiled me for all other languages Terry Reedy <tjreedy@udel.edu> - 2015-05-22 21:54 -0400
        Re: Ah Python, you have spoiled me for all other languages Tim Daneliuk <tundra@tundraware.com> - 2015-05-23 06:12 -0500
        Re: Ah Python, you have spoiled me for all other languages Tim Daneliuk <tundra@tundraware.com> - 2015-05-23 06:12 -0500
          Re: Ah Python, you have spoiled me for all other languages Terry Reedy <tjreedy@udel.edu> - 2015-05-23 13:26 -0400
      Re: Ah Python, you have spoiled me for all other languages Michael Torrie <torriem@gmail.com> - 2015-05-22 21:31 -0600
        Re: Ah Python, you have spoiled me for all other languages Johannes Bauer <dfnsonfsduifb@gmx.de> - 2015-05-23 08:55 +0200
          Re: Ah Python, you have spoiled me for all other languages Tim Daneliuk <tundra@tundraware.com> - 2015-05-23 06:21 -0500
            Re: Ah Python, you have spoiled me for all other languages Johannes Bauer <dfnsonfsduifb@gmx.de> - 2015-05-23 15:24 +0200
              Re: Ah Python, you have spoiled me for all other languages Marko Rauhamaa <marko@pacujo.net> - 2015-05-23 20:05 +0300
                Re: Ah Python, you have spoiled me for all other languages Johannes Bauer <dfnsonfsduifb@gmx.de> - 2015-05-24 20:29 +0200
          Re: Ah Python, you have spoiled me for all other languages Marko Rauhamaa <marko@pacujo.net> - 2015-05-23 15:44 +0300
            Re: Ah Python, you have spoiled me for all other languages Johannes Bauer <dfnsonfsduifb@gmx.de> - 2015-05-23 15:17 +0200
            Re: Ah Python, you have spoiled me for all other languages Steven D'Aprano <steve@pearwood.info> - 2015-05-24 00:00 +1000
              Re: Ah Python, you have spoiled me for all other languages Marko Rauhamaa <marko@pacujo.net> - 2015-05-23 19:53 +0300
                Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-24 03:41 +1000
                Re: Ah Python, you have spoiled me for all other languages Marko Rauhamaa <marko@pacujo.net> - 2015-05-23 22:02 +0300
                Re: Ah Python, you have spoiled me for all other languages Steven D'Aprano <steve@pearwood.info> - 2015-05-24 20:26 +1000
                Re: Ah Python, you have spoiled me for all other languages Marko Rauhamaa <marko@pacujo.net> - 2015-05-24 18:26 +0300
                Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-25 01:35 +1000
                Re: Ah Python, you have spoiled me for all other languages Marko Rauhamaa <marko@pacujo.net> - 2015-05-25 09:57 +0300
                Re: Ah Python, you have spoiled me for all other languages Laura Creighton <lac@openend.se> - 2015-05-25 11:39 +0200
                Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-25 21:09 +1000
            Re: Ah Python, you have spoiled me for all other languages Michael Torrie <torriem@gmail.com> - 2015-05-23 21:00 -0600
              Re: Ah Python, you have spoiled me for all other languages Marko Rauhamaa <marko@pacujo.net> - 2015-05-24 11:23 +0300
      Re: Ah Python, you have spoiled me for all other languages Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-22 22:10 -0600
      Re: Ah Python, you have spoiled me for all other languages amber <amber.of.luxor@gmail.com> - 2015-05-23 04:11 +0000
        Re: Ah Python, you have spoiled me for all other languages Tim Daneliuk <tundra@tundraware.com> - 2015-05-23 06:11 -0500
        Re: Ah Python, you have spoiled me for all other languages Tim Daneliuk <tundra@tundraware.com> - 2015-05-23 06:11 -0500
      Re: Ah Python, you have spoiled me for all other languages Ben Finney <ben+python@benfinney.id.au> - 2015-05-23 14:20 +1000
      Re: Ah Python, you have spoiled me for all other languages Michael Torrie <torriem@gmail.com> - 2015-05-22 22:30 -0600
        Re: Ah Python, you have spoiled me for all other languages Jon Ribbens <jon+usenet@unequivocal.co.uk> - 2015-05-23 11:10 +0000
          Re: Ah Python, you have spoiled me for all other languages Tim Chase <python.list@tim.thechases.com> - 2015-05-23 06:34 -0500
          Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-23 21:40 +1000
          Re: Ah Python, you have spoiled me for all other languages Michael Torrie <torriem@gmail.com> - 2015-05-23 20:57 -0600
          Re: Ah Python, you have spoiled me for all other languages Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-24 01:22 -0600
      Re: Ah Python, you have spoiled me for all other languages Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-22 22:29 -0600
      Re: Ah Python, you have spoiled me for all other languages Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-22 22:49 -0600
      Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-23 14:49 +1000
        Re: Ah Python, you have spoiled me for all other languages Tim Daneliuk <tundra@tundraware.com> - 2015-05-23 06:29 -0500
      Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-23 14:55 +1000
      Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-23 14:28 +1000
      Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-23 14:21 +1000
    Re: Ah Python, you have spoiled me for all other languages Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-05-23 14:33 +0200
      Re: Ah Python, you have spoiled me for all other languages Steven D'Aprano <steve@pearwood.info> - 2015-05-23 23:01 +1000
        Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-23 23:12 +1000
          Re: Ah Python, you have spoiled me for all other languages wxjmfauth@gmail.com - 2015-05-23 23:37 -0700
        Re: Ah Python, you have spoiled me for all other languages Ned Batchelder <ned@nedbatchelder.com> - 2015-05-23 06:35 -0700
          Re: Ah Python, you have spoiled me for all other languages Steven D'Aprano <steve@pearwood.info> - 2015-05-24 00:09 +1000
          Re: Ah Python, you have spoiled me for all other languages Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 10:21 +0200
            Re: Ah Python, you have spoiled me for all other languages Steven D'Aprano <steve@pearwood.info> - 2015-06-07 21:42 +1000
              Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-06-07 22:08 +1000
                Re: Ah Python, you have spoiled me for all other languages Steven D'Aprano <steve@pearwood.info> - 2015-06-07 23:24 +1000
                Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-06-08 00:47 +1000
              Re: Ah Python, you have spoiled me for all other languages random832@fastmail.us - 2015-06-07 10:58 -0400
                Re: Ah Python, you have spoiled me for all other languages Steven D'Aprano <steve@pearwood.info> - 2015-06-08 02:28 +1000
  Re: Ah Python, you have spoiled me for all other languages Tony the Tiger <tony@tiger.invalid> - 2015-05-22 16:31 +0000
    Re: Ah Python, you have spoiled me for all other languages Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-05-22 17:57 +0100
    Re: Ah Python, you have spoiled me for all other languages Tim Daneliuk <tundra@tundraware.com> - 2015-05-22 16:41 -0500
      Re: Ah Python, you have spoiled me for all other languages Tony the Tiger <tony@tiger.invalid> - 2015-05-23 20:25 +0000
  Re: Ah Python, you have spoiled me for all other languages Grant Edwards <invalid@invalid.invalid> - 2015-05-22 17:47 +0000
    Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-23 04:11 +1000
    Re: Ah Python, you have spoiled me for all other languages mm0fmf <none@mailinator.com> - 2015-05-22 19:19 +0100
    Re: Ah Python, you have spoiled me for all other languages Laura Creighton <lac@openend.se> - 2015-05-22 21:14 +0200
      Re: Ah Python, you have spoiled me for all other languages Steven D'Aprano <steve@pearwood.info> - 2015-05-23 11:36 +1000
    Re: Ah Python, you have spoiled me for all other languages MRAB <python@mrabarnett.plus.com> - 2015-05-22 20:34 +0100
    Re: Ah Python, you have spoiled me for all other languages Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-22 13:56 -0600
      Re: Ah Python, you have spoiled me for all other languages Marko Rauhamaa <marko@pacujo.net> - 2015-05-22 23:34 +0300
        Re: Ah Python, you have spoiled me for all other languages Tim Chase <python.list@tim.thechases.com> - 2015-05-22 15:55 -0500
        Re: Ah Python, you have spoiled me for all other languages Ethan Furman <ethan@stoneleaf.us> - 2015-05-22 14:15 -0700
        Re: Ah Python, you have spoiled me for all other languages Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-22 15:20 -0600
  Re: Ah Python, you have spoiled me for all other languages Paul Rubin <no.email@nospam.invalid> - 2015-05-22 16:00 -0700
    Re: Ah Python, you have spoiled me for all other languages Michael Torrie <torriem@gmail.com> - 2015-05-22 21:33 -0600

csiph-web