From: Chris Angelico
Date: Wed, 4 Jun 2014 17:16:13 +1000
Subject: Re: Micro Python -- a lean and efficient implementation of Python 3
Newsgroups: comp.lang.python

On Wed, Jun 4, 2014 at 2:40 PM, Rustom Mody wrote:
> On Wednesday, June 4, 2014 9:22:54 AM UTC+5:30, Chris Angelico wrote:
>> On Wed, Jun 4, 2014 at 1:37 PM, Rustom Mody wrote:
>> > And so a pure BMP-supporting implementation may be a reasonable
>> > compromise. [As long as no surrogate-pairs are there]
>
>> Not if you're working on the internet. There are several critical
>> groups of characters that aren't in the BMP, such as:
>
> Of course. But what has the internet to do with micropython?

Earlier you said:

> IOW from pov of a universally acceptable character set this is mostly
> rubbish

"Universally acceptable character set" and microcontrollers may well
not meet, but if you're talking about universality, you need Unicode.
It's that simple.
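
To make the BMP point concrete, here is a minimal sketch, assuming
CPython 3.3 or later (where strings are indexed by code point). The
sample character is an arbitrary pick for illustration, not one named
in the thread. It shows that a character outside the BMP has a code
point above 0xFFFF, so a 16-bit representation needs a surrogate pair
for it.

```python
# Illustrative only: U+1F600 is an arbitrary example of a non-BMP character.
ch = "\U0001F600"

code_point = ord(ch)
outside_bmp = code_point > 0xFFFF  # the BMP ends at U+FFFF

# In UTF-16 a non-BMP character takes two 16-bit units (a surrogate pair).
utf16 = ch.encode("utf-16-be")
high = int.from_bytes(utf16[:2], "big")
low = int.from_bytes(utf16[2:], "big")

print(hex(code_point), outside_bmp, len(ch), hex(high), hex(low))
```

A BMP-only implementation would have no way to represent this
character as a single string element.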

Maybe there's a use-case for a microcontroller that works in
ISO-8859-5 natively, thus using only eight bits per character, but
even if there is, I would expect a Python implementation on it to
expose Unicode codepoints in its strings. (Most of the time you won't
even be aware of the exact codepoint values. It's only when you put
\xNN or \uNNNN or \U000NNNNN escapes into your strings, or explicitly
use ord/chr or equivalent, that it'd make a difference.)

The point is not that you might be able to get away with sticking your
head in the sand and wishing Unicode would just go away. Even if you
can, it's not something Python 3 can ever do. And I don't think
anybody can, anyway. If your device is big enough to hold Python, it
should be big enough to handle Unicode; and then you don't have to say
"Oh, sorry rest-of-the-world, this only works in English... and only a
subset of English... and stuff".

ChrisA
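
A rough sketch of the ISO-8859-5 case above, using the standard
"iso8859_5" codec in CPython. The sample word is an arbitrary choice
for illustration. The device could hold one byte per character, while
the str still exposes Unicode codepoints, and the difference only
shows up through escapes and ord/chr.

```python
# Illustrative only: a six-letter Cyrillic word written with \uNNNN escapes.
text = "\u041f\u0440\u0438\u0432\u0435\u0442"

# Eight bits per character in the native encoding...
raw = text.encode("iso8859_5")

# ...but the decoded string exposes Unicode codepoints, not byte values.
decoded = raw.decode("iso8859_5")
code_points = [ord(c) for c in decoded]

print(len(raw), hex(raw[0]), hex(code_points[0]))

# The three escape forms and chr() all name characters by codepoint.
same_letter = "\x41" == chr(0x41) == "A"
same_cyrillic = "\u041f" == chr(0x041F)
same_astral = "\U0001F600" == chr(0x1F600)
```

Here the first byte is 0xBF in ISO-8859-5, but ord() on the first
character reports 0x41F, the Unicode codepoint.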