Newsgroups: comp.lang.python
Date: Mon, 27 Aug 2012 12:16:27 -0700 (PDT)
Subject: Re: Flexible string representation, unicode, typography, ...
From: wxjmfauth@gmail.com

On Sunday, 26 August 2012 at 22:45:09 UTC+2, Dan Sommers wrote:
> On 2012-08-26 at 20:13:21 +0000,
> Steven D'Aprano wrote:
>
> > I note that not all 32-bit ints are valid code points. I suppose I can
> > see sense in having rune be a 32-bit integer value limited to those
> > valid code points. (But, dammit, why not call it a code point?) But if
> > rune is merely an alias for int32, why not just call it int32?
>
> Having a "code point" type is a good idea. If nothing else, human code
> readers can tell that you're doing something with characters rather than
> something with integers. If your language provides any sort of type
> safety, then you get that, too.
>
> Calling your code points int32 is a bad idea for the same reason that it
> turned out to be a bad idea to call all my old ASCII characters int8.
> Or all my pointers int (or unsigned int), for n in 16, 20, 24, 32,
> 36, 48, or 64 (or I'm sure other values of n that I never had the pain
> or pleasure of using).

And this is precisely the concept of rune: a real int that serves as a
name for a Unicode code point.

Go "has" the integers int32 and int64. A rune ensures the use of int32.
"Text libs" use runes. For text, Go has only bytes and runes.

If you do not like the word "perfection", this mechanism has at least
an ideal simplicity (with probably a lot of positive consequences).

rune -> int32 -> utf32 -> unicode code points.

- Why int32 and not uint32? No idea; I tried to find an answer without
  asking.

- I find the name "rune" elegant. "char" would have been too confusing.

End. This is supposed to be a Python forum.

jmf
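
Since this is a Python forum, here is a rough Python 3 sketch of the
"rune -> int32 -> utf32 -> unicode code points" chain above. It is only
an illustration, not Go's implementation: the helper names
(to_code_points, is_scalar_value, fits_int32) are made up for this
example. It also shows the point quoted above that not every 32-bit int
is a valid code point.

```python
import struct

MAX_CODE_POINT = 0x10FFFF  # highest code point defined by Unicode


def to_code_points(text):
    """Return the Unicode code points of text as plain ints."""
    return [ord(ch) for ch in text]


def is_scalar_value(n):
    """True if n is a code point in range and not a surrogate."""
    return 0 <= n <= MAX_CODE_POINT and not (0xD800 <= n <= 0xDFFF)


def fits_int32(n):
    """True if n can be stored in a signed 32-bit integer."""
    return -2**31 <= n <= 2**31 - 1


text = "ao\u00fbt"                 # the French word for August
points = to_code_points(text)      # [97, 111, 251, 116]

# In UTF-32 every code point occupies exactly four bytes, so the
# encoded bytes unpack straight back into the same integers.
utf32 = text.encode("utf-32-le")
unpacked = list(struct.unpack("<%dI" % len(text), utf32))

print(points == unpacked)               # same integers either way
print(fits_int32(MAX_CODE_POINT))       # every code point fits in int32
print(is_scalar_value(0x110000))        # a 32-bit int, but not a code point
```

The last two lines are the whole argument in miniature: the integer
type is wide enough for every code point, but the integer type alone
does not say which values are code points.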