From: Ian Kelly
Date: Mon, 18 Nov 2013 05:29:09 -0700
Subject: Re: Oh look, another language (ceylon)
Newsgroups: comp.lang.python


On Nov 18, 2013 3:06 AM, "Chris Angelico" <rosuav@gmail.com> wrote:
>
> I'm trying to figure this out. Reading the docs hasn't answered this.
> If each character in a string is a 32-bit Unicode character, and (as
> can be seen in the examples) string indexing and slicing are
> supported, then does string indexing mean counting from the beginning
> to see if there were any surrogate pairs?
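A small sketch makes the surrogate-pair problem concrete. It is Python, not Ceylon. It assumes the string is stored as little-endian UTF-16 code units, and the helper names `utf16_units`, `is_high_surrogate` and `unit_offset` are hypothetical. A character outside the Basic Multilingual Plane takes two code units. Translating a code-point index into a buffer offset therefore means walking from the start and counting pairs:

```python
import struct


def utf16_units(s: str) -> list[int]:
    """Return the UTF-16 code units of s (little-endian, no BOM)."""
    data = s.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))


def is_high_surrogate(unit: int) -> bool:
    """True if unit is the first half of a surrogate pair."""
    return 0xD800 <= unit <= 0xDBFF


def unit_offset(units: list[int], index: int) -> int:
    """Map a code-point index to a code-unit offset (hypothetical helper).

    Any earlier code point might be a surrogate pair, so there is no way
    to jump straight to the target: we walk from the beginning, O(index).
    """
    if index < 0:
        raise IndexError("negative indices are not handled in this sketch")
    offset = 0
    for _ in range(index):
        if offset >= len(units):
            raise IndexError("code-point index out of range")
        # A high surrogate means this code point occupies two units.
        offset += 2 if is_high_surrogate(units[offset]) else 1
    if offset >= len(units):
        raise IndexError("code-point index out of range")
    return offset


text = "a\U0001F600b"          # 'a', U+1F600 (outside the BMP), 'b'
units = utf16_units(text)
print(len(text), len(units))   # 3 code points, but 4 code units
print(unit_offset(units, 2))   # 'b' is at code-unit offset 3, not 2
```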

The string reference says:

"""Since a String has an underlying UTF-16 encoding, certain operations are expensive, requiring iteration of the characters of the string. In particular, size requires iteration of the whole string, and get(), span(), and segment() require iteration from the beginning of the string to the given index."""

The get() and span() operations appear to be equivalent to indexing and slicing, so it looks like the answer is yes: indexing does count from the beginning of the string.
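The costs described in the quoted docs can be modelled in Python as a rough sketch. `Utf16String` is a hypothetical class that keeps only UTF-16 code units. Its `size`, `get` and `span` methods echo the names in the Ceylon docs, but their semantics here are assumptions, not Ceylon's API. In particular, `span` is half-open like a Python slice, and Ceylon's own span may differ. Python's `len()`, indexing and slicing are routed through these methods, so each one pays for a walk from the start:

```python
import struct
from collections.abc import Iterator


class Utf16String:
    """Hypothetical string type backed only by UTF-16 code units."""

    def __init__(self, s: str) -> None:
        data = s.encode("utf-16-le")
        self._units = struct.unpack(f"<{len(data) // 2}H", data)

    def _code_points(self) -> Iterator[str]:
        """Yield code points in order, decoding surrogate pairs."""
        units, i = self._units, 0
        while i < len(units):
            u = units[i]
            if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                    and 0xDC00 <= units[i + 1] <= 0xDFFF):
                low = units[i + 1]
                yield chr(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
                i += 2
            else:
                yield chr(u)
                i += 1

    def size(self) -> int:
        # O(n): every code unit must be visited to count the pairs.
        return sum(1 for _ in self._code_points())

    def get(self, index: int) -> str:
        # O(index): walk from the start until the index is reached.
        if index >= 0:
            for i, ch in enumerate(self._code_points()):
                if i == index:
                    return ch
        raise IndexError(index)

    def span(self, start: int, stop: int) -> str:
        # O(stop): walk from the start, collecting [start, stop).
        if start < 0 or stop < 0:
            raise ValueError("negative bounds are not handled in this sketch")
        out = []
        for i, ch in enumerate(self._code_points()):
            if i >= stop:
                break
            if i >= start:
                out.append(ch)
        return "".join(out)

    # Python's len(), indexing and slicing mapped onto size/get/span.
    def __len__(self) -> int:
        return self.size()

    def __getitem__(self, key):
        if isinstance(key, slice):
            if key.step not in (None, 1):
                raise ValueError("only step 1 is supported in this sketch")
            start = 0 if key.start is None else key.start
            stop = self.size() if key.stop is None else key.stop
            return self.span(start, stop)
        return self.get(key)
```

With `s = Utf16String("a\U0001F600b")`, both `s[2]` and `s[1:3]` walk the code units from index 0 on every call. That is the behaviour the docs warn about. A real implementation might cache a recent position to soften this, but the sketch deliberately does not.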
