From: Chris Angelico
Date: Wed, 13 May 2015 15:13:32 +1000
Subject: Re: uPy Unicode [was Re: Instead of deciding between Python or Lisp blah blah blah]
Newsgroups: comp.lang.python

On Wed, May 13, 2015 at 11:23 AM, Steven D'Aprano wrote:
> On Wed, 13 May 2015 03:26 am, Chris Angelico wrote:
>
>> back when MicroPython was debating the implementation of Unicode
>> strings, there was a lengthy discussion on python-dev about whether
>> it's okay for string subscripting to be O(n) instead of O(1), and the
>> final decision was that yes, that's an implementation detail.
>> (UTF-8 internal string representation, so iterating over a string
>> would still yield characters in overall O(n), but iterating up to the
>> string's length and subscripting for each character would become
>> O(n*n) on uPy.)
>
> o_O
>
> Got a link to that? I must have missed it.

Linking to python-dev is a bit fiddly and/or unstable due to URL
changes, plus the discussion there was pretty long and rambly.
Probably the best I can do is point you to the tracker issue where I
opened the original question:

https://github.com/micropython/micropython/issues/657

(The biggest issue was that uPy was, at the time, fundamentally
incompatible with Python's stipulated semantics - imagine all the
problems of a narrow build of CPython <3.3, only more frequent because
it's actually UTF-8.)

It was finally decided, I think, that Python-the-language didn't
actually mandate O(1) indexing, meaning that a microcontroller (on
which strings aren't going to be gigantic anyway) is welcome to use a
UTF-8 internal representation, with "Hello, world"[4] required to scan
across and count non-continuation bytes to find the right character.
Whether or not uPy actually ended up accepting the requirements of
proper Unicode support I don't know, as I'm no longer involved with
the project.

ChrisA
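
For illustration, the scan described above (counting non-continuation
bytes to find the right character) might look roughly like this in
plain Python. This is only a sketch under assumptions: the string is
held as well-formed UTF-8 bytes, negative indices are not handled, and
the names utf8_len and utf8_index are hypothetical, not MicroPython's
actual code.

```python
def utf8_len(data: bytes) -> int:
    """Count characters by counting bytes that are not continuation bytes."""
    # Continuation bytes have the bit pattern 10xxxxxx.
    return sum(1 for byte in data if byte & 0xC0 != 0x80)


def utf8_index(data: bytes, index: int) -> str:
    """Return the character at position `index` by scanning from the start."""
    if index < 0:
        raise IndexError("negative indices are not handled in this sketch")
    count = -1      # index of the character most recently started
    start = None    # byte offset where the wanted character begins
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:          # a new character starts here
            count += 1
            if count == index:
                start = pos
            elif count == index + 1:     # next character: the wanted one ended
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("string index out of range")
    return data[start:].decode("utf-8")  # wanted character was the last one


hello = "Hello, world".encode("utf-8")
print(utf8_index(hello, 4))
```

Each call rescans from the first byte, so a loop over
range(utf8_len(data)) that calls utf8_index(data, i) does O(n*n) work
overall, whereas a single pass over the bytes stays O(n).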