Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #92237
| Path | csiph.com!usenet.pasdenom.info!weretis.net!feeder1.news.weretis.net!feeder.erje.net!1.eu.feeder.erje.net!newsfeed.xs4all.nl!newsfeed1a.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail |
|---|---|
| Return-Path | <rosuav@gmail.com> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| X-Spam-Status | OK 0.007 |
| X-Spam-Evidence | '*H*': 0.99; '*S*': 0.00; 'subject:Python': 0.05; 'strings.': 0.07; 'utf-8': 0.07; 'ambiguity': 0.09; 'libraries.': 0.09; 'cc:addr:python-list': 0.10; 'python': 0.11; 'do,': 0.15; '*string': 0.16; 'czech': 0.16; 'decent': 0.16; 'from:addr:rosuav': 0.16; 'from:name:chris angelico': 0.16; 'informally,': 0.16; 'module:': 0.16; 'programmer,': 0.16; 'stream.': 0.16; 'string",': 0.16; 'wrote:': 0.16; "wouldn't": 0.16; 'string': 0.17; 'bytes': 0.18; 'else,': 0.18; 'say,': 0.18; "shouldn't": 0.18; 'language': 0.19; '>>>': 0.20; 'library': 0.20; 'handling': 0.20; 'cc:2**0': 0.21; 'cc:addr:python.org': 0.21; 'suppose': 0.22; 'programming': 0.23; '2015': 0.23; 'header:In- Reply-To:1': 0.24; 'question': 0.26; 'not,': 0.27; 'issue,': 0.27; 'message-id:@mail.gmail.com': 0.28; 'went': 0.28; "i'm": 0.29; 'allowed,': 0.29; 'itself,': 0.29; 'subject:other': 0.29; 'skip:u 20': 0.30; 'that.': 0.30; "we're": 0.30; 'becomes': 0.31; 'certainly': 0.31; 'e.g.': 0.31; 'operations': 0.31; 'code': 0.31; 'language.': 0.32; 'operate': 0.32; 'subject:all': 0.32; 'problem': 0.33; "d'aprano": 0.33; 'steven': 0.33; 'though.': 0.33; 'case,': 0.34; 'presence': 0.34; 'received:google.com': 0.34; 'useful': 0.35; 'wrong': 0.35; 'false': 0.35; 'unicode': 0.35; 'something': 0.35; 'but': 0.36; 'there': 0.36; 'basic': 0.36; 'beginning': 0.36; 'two': 0.37; 'should': 0.37; 'agree': 0.37; 'subject:: ': 0.37; 'rather': 0.38; 'say': 0.38; 'mean': 0.38; 'pm,': 0.39; 'expect': 0.39; 'sure': 0.40; 'challenge': 0.61; 'provide': 0.61; 'skip:n 10': 0.63; 'here': 0.66; 'compliant': 0.66; 'reverse': 0.66; 'tasks.': 0.66; 'letters': 0.67; 'guaranteed': 0.67; 'natural': 0.67; 'choose': 0.68; 'subject:have': 0.80; 'chrisa': 0.84; 'contains.': 0.84; 'people"': 0.84; 'subject:you': 0.88; 'to:none': 0.90; 'tricky': 0.93; 'instant': 0.98 |
| DKIM-Signature | v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:cc :content-type:content-transfer-encoding; bh=jLk2FY+iZyCfzl0ntFnbH1ttn/PKle/ORqs3pO/6nJs=; b=t/yr/cFl74eEeR835llDTGjXESTuNrHieKdVEKSlys/FC4Z0/8Zyaao3+r/Gv9ATVd XsVrMYA0WJV8OZMzTgV6YA1QO95yFGinA/kzJUGny2GomnOZO7Iz4ATb4jEg8M00tAoQ sr/BSvdcdW8BI6mveg0kHwLhMmTHcOJL0mEsihw/Gj91nTLXSfy2RfHQJ5XU6fVi0ufJ st4cAgtDFBxqHNJEBJl/0LQLsTxvHaI3qVmTcjL+Equ/58nWVSi0lvTS7KSB+STi9Bhc XlOk9ohc0jatn1lIuIMrDyTir3RLAbDcFteqDfzANlkR5/7wqqXKbA2mcZuNUKOE2J3w TQIg== |
| MIME-Version | 1.0 |
| X-Received | by 10.107.131.196 with SMTP id n65mr14313530ioi.53.1433678886421; Sun, 07 Jun 2015 05:08:06 -0700 (PDT) |
| In-Reply-To | <55742e0e$0$12980$c3e8da3$5496439d@news.astraweb.com> |
| References | <555f440a$0$12990$c3e8da3$5496439d@news.astraweb.com> <mailman.222.1432309028.17265.python-list@python.org> <2212595.DFZ6OqehRn@PointedEars.de> <55607a1b$0$13011$c3e8da3$5496439d@news.astraweb.com> <2c4d029c-8ea5-465b-8adc-6c35185bd150@googlegroups.com> <2483375.eHyISxeWLQ@PointedEars.de> <55742e0e$0$12980$c3e8da3$5496439d@news.astraweb.com> |
| Date | Sun, 7 Jun 2015 22:08:06 +1000 |
| Subject | Re: Ah Python, you have spoiled me for all other languages |
| From | Chris Angelico <rosuav@gmail.com> |
| Cc | "python-list@python.org" <python-list@python.org> |
| Content-Type | text/plain; charset=UTF-8 |
| Content-Transfer-Encoding | quoted-printable |
| X-BeenThere | python-list@python.org |
| X-Mailman-Version | 2.1.20+ |
| Precedence | list |
| List-Id | General discussion list for the Python programming language <python-list.python.org> |
| List-Unsubscribe | <https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe> |
| List-Archive | <http://mail.python.org/pipermail/python-list/> |
| List-Post | <mailto:python-list@python.org> |
| List-Help | <mailto:python-list-request@python.org?subject=help> |
| List-Subscribe | <https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.243.1433678894.13271.python-list@python.org> (permalink) |
| Lines | 59 |
| NNTP-Posting-Host | 2001:888:2000:d::a6 |
| X-Trace | 1433678894 news.xs4all.nl 2853 [2001:888:2000:d::a6]:33202 |
| X-Complaints-To | abuse@xs4all.nl |
| Xref | csiph.com comp.lang.python:92237 |
Show key headers only | View raw
On Sun, Jun 7, 2015 at 9:42 PM, Steven D'Aprano <steve@pearwood.info> wrote:
> My opinion is that a programming language like Python or ECMAScript should
> operate on *code points*. If we want to call them "characters" informally,
> that should be allowed, but whenever there is ambiguity we should remember
> we're dealing with code points. The implementation shouldn't matter:
> compliant Python interpreters might choose to use UTF-8 internally, or
> UTF-16, or UTF-32, or something else, and still agree on how many
> characters a string contains. Normalisation is still an issue, of course,
> but any decent Unicode implementation will include a way to normalise or
> denormalise strings.
If by "normalise" you mean the NF[K]C/NF[K]D composition and
decomposition, then yes, any decent Unicode library will provide that.
I'm not sure it's critical to string handling itself, though; and
Python defers the operation to the unicodedata module:
>>> s1 = "\N{LATIN SMALL LETTER A}\N{COMBINING ACUTE ACCENT}"
>>> s2 = "\N{LATIN SMALL LETTER A WITH ACUTE}"
>>> s1 == s2
False
>>> unicodedata.normalize("NFC", s1) == s2
True
It's a useful operation to be able to do, but I would never expect
that *string comparison* or other operations should automatically
normalize. (Unless you want to say that all strings are guaranteed to
be NFC/NFD normalized, such that s1 and s2 would actually be
identical, which I suppose is plausible. I'm not sure what the
advantage would be, though. And certainly you wouldn't want to
K-normalize strings automatically.)
> The question of graphemes (what "ordinary people" consider letters and
> characters, e.g. "ch" is two letters to an English speaker but one letter
> to a Czech speaker) should be left to libraries. It's a much harder problem
> to solve in the full general case, requires localisation, and is overkill
> for many string-processing tasks.
Yeah. The basic challenge to a beginning programmer, "reverse this
string", becomes rather tricky in the presence of natural language.
>>> s1 += "e"
>>> s1
'áe'
>>> s1[::-1]
'éa'
Oops.
But hey. It's easier to understand what went wrong here than, say, if
you reverse the bytes in a UTF-8 stream. Or the code units in a UTF-16
stream. If you're lucky, those would give you instant errors... if
you're not, well, who knows.
ChrisA
Back to comp.lang.python | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
Ah Python, you have spoiled me for all other languages Steven D'Aprano <steve@pearwood.info> - 2015-05-23 00:58 +1000
Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-23 01:29 +1000
Re: Ah Python, you have spoiled me for all other languages wxjmfauth@gmail.com - 2015-05-22 10:57 -0700
Re: Ah Python, you have spoiled me for all other languages Tim Daneliuk <tundra@tundraware.com> - 2015-05-22 16:40 -0500
Re: Ah Python, you have spoiled me for all other languages Tim Daneliuk <tundra@tundraware.com> - 2015-05-22 16:40 -0500
Re: Ah Python, you have spoiled me for all other languages Terry Reedy <tjreedy@udel.edu> - 2015-05-22 21:54 -0400
Re: Ah Python, you have spoiled me for all other languages Tim Daneliuk <tundra@tundraware.com> - 2015-05-23 06:12 -0500
Re: Ah Python, you have spoiled me for all other languages Tim Daneliuk <tundra@tundraware.com> - 2015-05-23 06:12 -0500
Re: Ah Python, you have spoiled me for all other languages Terry Reedy <tjreedy@udel.edu> - 2015-05-23 13:26 -0400
Re: Ah Python, you have spoiled me for all other languages Michael Torrie <torriem@gmail.com> - 2015-05-22 21:31 -0600
Re: Ah Python, you have spoiled me for all other languages Johannes Bauer <dfnsonfsduifb@gmx.de> - 2015-05-23 08:55 +0200
Re: Ah Python, you have spoiled me for all other languages Tim Daneliuk <tundra@tundraware.com> - 2015-05-23 06:21 -0500
Re: Ah Python, you have spoiled me for all other languages Johannes Bauer <dfnsonfsduifb@gmx.de> - 2015-05-23 15:24 +0200
Re: Ah Python, you have spoiled me for all other languages Marko Rauhamaa <marko@pacujo.net> - 2015-05-23 20:05 +0300
Re: Ah Python, you have spoiled me for all other languages Johannes Bauer <dfnsonfsduifb@gmx.de> - 2015-05-24 20:29 +0200
Re: Ah Python, you have spoiled me for all other languages Marko Rauhamaa <marko@pacujo.net> - 2015-05-23 15:44 +0300
Re: Ah Python, you have spoiled me for all other languages Johannes Bauer <dfnsonfsduifb@gmx.de> - 2015-05-23 15:17 +0200
Re: Ah Python, you have spoiled me for all other languages Steven D'Aprano <steve@pearwood.info> - 2015-05-24 00:00 +1000
Re: Ah Python, you have spoiled me for all other languages Marko Rauhamaa <marko@pacujo.net> - 2015-05-23 19:53 +0300
Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-24 03:41 +1000
Re: Ah Python, you have spoiled me for all other languages Marko Rauhamaa <marko@pacujo.net> - 2015-05-23 22:02 +0300
Re: Ah Python, you have spoiled me for all other languages Steven D'Aprano <steve@pearwood.info> - 2015-05-24 20:26 +1000
Re: Ah Python, you have spoiled me for all other languages Marko Rauhamaa <marko@pacujo.net> - 2015-05-24 18:26 +0300
Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-25 01:35 +1000
Re: Ah Python, you have spoiled me for all other languages Marko Rauhamaa <marko@pacujo.net> - 2015-05-25 09:57 +0300
Re: Ah Python, you have spoiled me for all other languages Laura Creighton <lac@openend.se> - 2015-05-25 11:39 +0200
Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-25 21:09 +1000
Re: Ah Python, you have spoiled me for all other languages Michael Torrie <torriem@gmail.com> - 2015-05-23 21:00 -0600
Re: Ah Python, you have spoiled me for all other languages Marko Rauhamaa <marko@pacujo.net> - 2015-05-24 11:23 +0300
Re: Ah Python, you have spoiled me for all other languages Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-22 22:10 -0600
Re: Ah Python, you have spoiled me for all other languages amber <amber.of.luxor@gmail.com> - 2015-05-23 04:11 +0000
Re: Ah Python, you have spoiled me for all other languages Tim Daneliuk <tundra@tundraware.com> - 2015-05-23 06:11 -0500
Re: Ah Python, you have spoiled me for all other languages Tim Daneliuk <tundra@tundraware.com> - 2015-05-23 06:11 -0500
Re: Ah Python, you have spoiled me for all other languages Ben Finney <ben+python@benfinney.id.au> - 2015-05-23 14:20 +1000
Re: Ah Python, you have spoiled me for all other languages Michael Torrie <torriem@gmail.com> - 2015-05-22 22:30 -0600
Re: Ah Python, you have spoiled me for all other languages Jon Ribbens <jon+usenet@unequivocal.co.uk> - 2015-05-23 11:10 +0000
Re: Ah Python, you have spoiled me for all other languages Tim Chase <python.list@tim.thechases.com> - 2015-05-23 06:34 -0500
Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-23 21:40 +1000
Re: Ah Python, you have spoiled me for all other languages Michael Torrie <torriem@gmail.com> - 2015-05-23 20:57 -0600
Re: Ah Python, you have spoiled me for all other languages Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-24 01:22 -0600
Re: Ah Python, you have spoiled me for all other languages Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-22 22:29 -0600
Re: Ah Python, you have spoiled me for all other languages Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-22 22:49 -0600
Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-23 14:49 +1000
Re: Ah Python, you have spoiled me for all other languages Tim Daneliuk <tundra@tundraware.com> - 2015-05-23 06:29 -0500
Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-23 14:55 +1000
Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-23 14:28 +1000
Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-23 14:21 +1000
Re: Ah Python, you have spoiled me for all other languages Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-05-23 14:33 +0200
Re: Ah Python, you have spoiled me for all other languages Steven D'Aprano <steve@pearwood.info> - 2015-05-23 23:01 +1000
Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-23 23:12 +1000
Re: Ah Python, you have spoiled me for all other languages wxjmfauth@gmail.com - 2015-05-23 23:37 -0700
Re: Ah Python, you have spoiled me for all other languages Ned Batchelder <ned@nedbatchelder.com> - 2015-05-23 06:35 -0700
Re: Ah Python, you have spoiled me for all other languages Steven D'Aprano <steve@pearwood.info> - 2015-05-24 00:09 +1000
Re: Ah Python, you have spoiled me for all other languages Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-06-07 10:21 +0200
Re: Ah Python, you have spoiled me for all other languages Steven D'Aprano <steve@pearwood.info> - 2015-06-07 21:42 +1000
Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-06-07 22:08 +1000
Re: Ah Python, you have spoiled me for all other languages Steven D'Aprano <steve@pearwood.info> - 2015-06-07 23:24 +1000
Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-06-08 00:47 +1000
Re: Ah Python, you have spoiled me for all other languages random832@fastmail.us - 2015-06-07 10:58 -0400
Re: Ah Python, you have spoiled me for all other languages Steven D'Aprano <steve@pearwood.info> - 2015-06-08 02:28 +1000
Re: Ah Python, you have spoiled me for all other languages Tony the Tiger <tony@tiger.invalid> - 2015-05-22 16:31 +0000
Re: Ah Python, you have spoiled me for all other languages Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-05-22 17:57 +0100
Re: Ah Python, you have spoiled me for all other languages Tim Daneliuk <tundra@tundraware.com> - 2015-05-22 16:41 -0500
Re: Ah Python, you have spoiled me for all other languages Tony the Tiger <tony@tiger.invalid> - 2015-05-23 20:25 +0000
Re: Ah Python, you have spoiled me for all other languages Grant Edwards <invalid@invalid.invalid> - 2015-05-22 17:47 +0000
Re: Ah Python, you have spoiled me for all other languages Chris Angelico <rosuav@gmail.com> - 2015-05-23 04:11 +1000
Re: Ah Python, you have spoiled me for all other languages mm0fmf <none@mailinator.com> - 2015-05-22 19:19 +0100
Re: Ah Python, you have spoiled me for all other languages Laura Creighton <lac@openend.se> - 2015-05-22 21:14 +0200
Re: Ah Python, you have spoiled me for all other languages Steven D'Aprano <steve@pearwood.info> - 2015-05-23 11:36 +1000
Re: Ah Python, you have spoiled me for all other languages MRAB <python@mrabarnett.plus.com> - 2015-05-22 20:34 +0100
Re: Ah Python, you have spoiled me for all other languages Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-22 13:56 -0600
Re: Ah Python, you have spoiled me for all other languages Marko Rauhamaa <marko@pacujo.net> - 2015-05-22 23:34 +0300
Re: Ah Python, you have spoiled me for all other languages Tim Chase <python.list@tim.thechases.com> - 2015-05-22 15:55 -0500
Re: Ah Python, you have spoiled me for all other languages Ethan Furman <ethan@stoneleaf.us> - 2015-05-22 14:15 -0700
Re: Ah Python, you have spoiled me for all other languages Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-22 15:20 -0600
Re: Ah Python, you have spoiled me for all other languages Paul Rubin <no.email@nospam.invalid> - 2015-05-22 16:00 -0700
Re: Ah Python, you have spoiled me for all other languages Michael Torrie <torriem@gmail.com> - 2015-05-22 21:33 -0600
csiph-web