| Path | csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!newsfeed.xs4all.nl!newsfeed2.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail |
|---|---|
| Date | Sat, 16 Nov 2013 03:01:37 +1100 |
| Subject | Re: python 3.3 repr |
| From | Chris Angelico <rosuav@gmail.com> |
| To | python-list@python.org |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.2674.1384531302.18130.python-list@python.org> (permalink) |
On Sat, Nov 16, 2013 at 2:39 AM, Robin Becker <robin@reportlab.com> wrote:
>> Dealing with bytes and Unicode is complicated, and the 2->3 transition is
>> not easy, but let's please not spread the misunderstanding that somehow the
>> Flexible String Representation is at fault. However you store Unicode code
>> points, they are different than bytes, and it is complex having to deal with
>> both. You can't somehow make the dichotomy go away, you can only choose
>> where you want to think about it.
>>
>> --Ned.
>
> .......
> I don't think that's what I said; the flexible representation is just an
> added complexity that has come about because of the wish to store strings in
> a compact way. The requirement for such complexity is the unicode type
> itself (especially the storage requirements) which necessitated some
> remedial action.
>
> There's no point in fighting the change to using unicode. The type wasn't
> required for any technical reason as other languages didn't go this route
> and are reasonably ok, but there's no doubt the change made things more
> difficult.

There's no perceptible difference between a 3.2 wide build and the 3.3
flexible representation. (Differences with narrow builds are bugs, and
have now been fixed.) As far as your script's concerned, Python 3.3
always stores strings in UTF-32, four bytes per character. It just
happens to be way more efficient on memory, most of the time.

Other languages _have_ gone for at least some sort of Unicode support.
Unfortunately quite a few have done a half-way job and use UTF-16 as
their internal representation. That means there's no difference
between U+0012, U+0123, and U+1234, but U+12345 suddenly gets handled
differently. ECMAScript actually specifies the perverse behaviour of
treating codepoints >U+FFFF as two elements in a string, because it's
just too costly to change.

There are a small number of languages that guarantee correct Unicode
handling. I believe bash scripts get this right (though I haven't
tested; string manipulation in bash isn't nearly as rich as a proper
text parsing language, so I don't dig into it much); Pike is a very
Python-like language, and PEP 393 made Python even more Pike-like,
because Pike's string has been variable width for as long as I've
known it. A handful of other languages also guarantee UTF-32
semantics. All of them are really easy to work with; instead of
writing your code and then going "Oh, I wonder what'll happen if I
give this thing weird characters?", you just write your code, safe in
the knowledge that there is no such thing as a "weird character"
(except for a few in the ASCII set... you may find that code breaks if
given a newline in the middle of something, or maybe the slash
confuses you).

Definitely don't fight the change to Unicode, because it's not a
change at all... it's just fixing what was buggy. You already had a
difference between bytes and characters, you just thought you could
ignore it.

ChrisA
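For anyone who wants to poke at the distinction drawn above, here is a minimal sketch, assuming CPython 3.3 or later. The helper `utf16_code_units` and the sample strings are made up for illustration and are not from the thread. It compares the length Python reports (code points) with the length a UTF-16-based language would report, and then checks that the memory used per string depends on the widest character in it, which is an implementation detail of CPython's PEP 393 storage rather than a language guarantee.

```python
import sys


def utf16_code_units(text):
    """Count the 16-bit code units needed to encode text as UTF-16.

    This is roughly the "length" a UTF-16-based language would report.
    (Hypothetical helper, for illustration only.)
    """
    return len(text.encode("utf-16-le")) // 2


# The four code points mentioned in the post.
samples = ["\x12", "\u0123", "\u1234", "\U00012345"]

for ch in samples:
    # Python 3.3+ reports one element per code point in every case;
    # only U+12345 needs two UTF-16 code units (a surrogate pair).
    print("U+%04X  len=%d  utf16_units=%d" % (ord(ch), len(ch), utf16_code_units(ch)))

# Bytes and characters are different things: one character, two bytes in UTF-8.
e_acute = "\u00e9"
utf8_bytes = e_acute.encode("utf-8")
print(len(e_acute), len(utf8_bytes))

# Storage sketch: same number of characters, different widest code point.
# The exact sizes are a CPython implementation detail and vary by version.
ascii_text = "a" * 100
bmp_text = "\u1234" * 100
astral_text = "\U00012345" * 100
sizes = [sys.getsizeof(s) for s in (ascii_text, bmp_text, astral_text)]
print(sizes)
```

The three strings all have `len() == 100` and index the same way, which is the "no perceptible difference" point; only the reported memory footprint changes.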