From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: Flexible string representation, unicode, typography, ...
Date: Sat, 25 Aug 2012 21:19:38 +1000

On Sat, Aug 25, 2012 at 9:05 PM, Mark Lawrence wrote:
> I thought Terry Reedy had shot down any claims about performance overhead,
> and that the memory savings in many cases must be substantial and therefore
> worthwhile. Or have I misread something? Or what?

My reading of the thread(s) is/are that there are two reasons for the
debate to continue to rage:

1) Comparisons with a "narrow build" in which most characters take two
bytes but there are one or two characters that get encoded with
surrogates. The new system will allocate four bytes per character for
the whole string.

2) Arguments on the basis of huge strings that represent _all the
data_ that your program's working with, forgetting that there are
numerous strings all through everything that are ASCII-only.

ChrisA
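
Both points can be checked with a rough sketch like the one below. It assumes CPython 3.3 or later, where the flexible string representation (PEP 393) is in use; the helper name `bytes_per_char` is made up for this illustration. It estimates the marginal storage cost per character by differencing the sizes of two strings of the same kind, so the fixed object header cancels out.

```python
import sys

def bytes_per_char(make, n=1000):
    """Estimate marginal bytes per character (hypothetical helper).

    make(k) must return a string whose length grows by exactly k
    characters as k grows; differencing two sizes cancels the fixed
    object header and any constant extras.
    """
    return (sys.getsizeof(make(2 * n)) - sys.getsizeof(make(n))) / n

# ASCII-only text: the common case from point 2.
ascii_only = bytes_per_char(lambda k: "a" * k)

# Text made of BMP characters outside Latin-1 (here the euro sign).
bmp_only = bytes_per_char(lambda k: "\u20ac" * k)

# The same BMP text plus a single non-BMP character: the case from
# point 1, where the whole string is widened.
bmp_plus_one_astral = bytes_per_char(lambda k: "\u20ac" * k + "\U0001F600")

print("ASCII only:          ", ascii_only)
print("BMP only:            ", bmp_only)
print("BMP + one non-BMP:   ", bmp_plus_one_astral)
```

The numbers are specific to CPython's implementation and say nothing about speed, only about storage per character.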