From: Ian Kelly
Date: Sun, 26 Aug 2012 09:50:41 -0600
Subject: Re: Flexible string representation, unicode, typography, ...
Newsgroups: comp.lang.python

On Sun, Aug 26, 2012 at 12:59 AM, wrote:
> Sorry, you do not get it.
>
> The rune is an alias for int32. A sequence of runes is a
> sequence of int32's. Go do not spend its time in using a
> machinery to work with, to differentiate, to keep in memory
> this sequence according to the *characers* composing this
> "array of code points".
>
> The message is even stronger. Use runes to work comfortably [*]
> with unicode:
> rune -> int32 -> utf32 -> unicode (the perfect scheme, cann't be
> better)

I understand what a rune is. I think you've missed my complaint,
which is that although the rune is the basic building block of Unicode
strings -- representing a single Unicode character -- strings in Go
are not built from runes but from bytes. If you want to do any actual
work with Unicode strings, then you have to first convert them to
runes or arrays of runes.
The conceptual cost of this is that the object you're working with is
no longer a string.

You call this the "perfect scheme" for working with Unicode. Why does
the "perfect scheme" for Unicode make it *easier* to write buggy code
that only works for ASCII than to write correct code that works for
all characters?

This is IMO where Python 3 gets it right. When you want to work with
Unicode strings, you just work with Unicode strings -- none of this
nonsense of first explicitly converting the string to an array of ints
that looks nothing like a string at a high level. The only place
Python 3 makes you worry about converting strings is at the boundaries
of your program, where decoding from bytes to strings and back is
necessary.
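
To make that last point concrete, here is a minimal Python 3 sketch. The sample text and variable names are made up for illustration and are not taken from the thread; it assumes the incoming data is UTF-8.

```python
# Hypothetical input: UTF-8 bytes as they might arrive from a file or socket.
raw = "na\u00efve caf\u00e9".encode("utf-8")

# Boundary, inbound: decode bytes to str exactly once.
text = raw.decode("utf-8")

# Inside the program, str operations work on code points, not bytes.
code_point_count = len(text)   # counts characters (code points)
byte_count = len(raw)          # counts UTF-8 bytes, which is larger here
third_char = text[2]           # a one-character str, still a string
third_byte = raw[2]            # an int: the first byte of a 2-byte sequence

# Slicing and case conversion stay in "string land" with no int arrays.
shouted = text.upper()

# Boundary, outbound: encode str back to bytes exactly once.
out = shouted.encode("utf-8")

print(code_point_count, byte_count)
print(repr(third_char), hex(third_byte))
print(shouted)
```

Indexing the str yields another str, while indexing the bytes yields a bare integer; only the encode and decode calls at the edges deal with bytes at all.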