From: Ian Kelly
Date: Sun, 26 Aug 2012 09:40:13 -0600
Subject: Re: Flexible string representation, unicode, typography, ...
Newsgroups: comp.lang.python

On Sun, Aug 26, 2012 at 5:49 AM, Steven D'Aprano wrote:
>> Sorry, you do not get it.
>>
>> The rune is an alias for int32. A sequence of runes is a sequence of
>> int32's.
>
> It certainly is not. Runes are variable-width. Here, for example, are a
> number of Go functions which return a single rune and its width in bytes:
>
> http://golang.org/pkg/unicode/utf8/

I think the documentation for those functions is simply badly worded.
The "width in bytes" it returns is not the width of the rune (which as
jmf notes is simply an alias for int32 that stores a single code
point). It means the UTF-8 width of the character, i.e. the number of
UTF-8 bytes the function "consumed", presumably so that the caller can
then reslice the data with that many bytes fewer.
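
A minimal sketch of that reading, for anyone who wants to try it. The helper name decodeAll and the sample string are my own, purely for illustration; utf8.DecodeRuneInString is the real function from the package linked above. The loop reslices the string by the returned size each time, and main prints that size next to the in-memory size of the rune value itself.

```go
package main

import (
	"fmt"
	"unicode/utf8"
	"unsafe"
)

// decodeAll (a hypothetical helper) walks s one rune at a time and records
// each rune together with the number of UTF-8 bytes that
// utf8.DecodeRuneInString consumed to produce it.
func decodeAll(s string) (runes []rune, widths []int) {
	for len(s) > 0 {
		r, size := utf8.DecodeRuneInString(s)
		runes = append(runes, r)
		widths = append(widths, size)
		s = s[size:] // reslice past the bytes just consumed
	}
	return runes, widths
}

func main() {
	// "a", e-acute (U+00E9), and the euro sign (U+20AC), written as escapes.
	runes, widths := decodeAll("a\u00e9\u20ac")
	for i, r := range runes {
		fmt.Printf("U+%04X: consumed %d UTF-8 byte(s); rune value occupies %d bytes\n",
			r, widths[i], unsafe.Sizeof(r))
	}
}
```

The consumed size varies from character to character because it describes the encoded input, while the rune value holding each decoded code point stays a fixed-size int32.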