Subject: Re: Py 3.3, unicode / upper()
From: Chris Angelico
Date: Thu, 20 Dec 2012 02:17:36 +1100
Newsgroups: comp.lang.python

On Thu, Dec 20, 2012 at 1:23 AM, wrote:
> But, this is not the problem.
> I was surprised to discover this:
>
> >>> 'Straße'.upper()
> 'STRASSE'
>
> I really, really do not know what I should think about that.
> (It is a complex subject.) And the real question is why?

Not all strings can be uppercased and lowercased cleanly. Unicode's full case mapping uppercases 'ß' to 'SS', and lowercasing 'STRASSE' gives 'strasse', not 'straße'. Please stop trotting out the old Box Hill-to-Camberwell argument[1] yet again.

For comparison, try this string:

'𝐇𝐞𝐥𝐥𝐨, 𝐰𝐨𝐫𝐥𝐝!'.upper()

And while you're at it, check out sys.getsizeof() on that sort of string and compare the result with your beloved 3.2. Oh, and also check out len() on it.

[1] Melbourne's current ticketing system is based on zones: Camberwell is in zone 1 and Box Hill is in zone 2. Detractors of public transport point out that it costs far more to take the train from Box Hill to Camberwell than to drive a car the same distance. It's the same contrived example that keeps getting trotted out time and time again.

ChrisA
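The experiments suggested in the post can be run directly. Below is a minimal sketch that assumes Python 3.3 or later on CPython.

The names `word`, `bold` and the helper `bytes_per_char` are hypothetical illustrations and do not come from the post. The storage-width figures rely on CPython's PEP 393 string layout as reported by `sys.getsizeof()`. That layout is an implementation detail, and other interpreters need not share it. The figure of 23 reflects how a narrow (UTF-16) 3.2 build counted surrogate pairs in `len()`.

```python
import sys

# Case mapping: Unicode's full case mapping uppercases 'ß' to 'SS',
# so upper() can change the length and is not reversible by lower().
word = "Straße"
upper = word.upper()
print(upper, len(word), len(upper))              # STRASSE 6 7
print(upper.lower() == word)                     # False: the round trip is lossy
print(word.casefold() == "STRASSE".casefold())   # True: casefold() (3.3+) suits caseless matching

# Mathematical bold letters (U+1D400 block) lie outside the BMP and
# have no case mappings, so upper() leaves them unchanged.
bold = ("\U0001D407\U0001D41E\U0001D425\U0001D425\U0001D428, "
        "\U0001D430\U0001D428\U0001D42B\U0001D425\U0001D41D!")
print(bold.upper() == bold)                      # True
print(len(bold))                                 # 13 code points on 3.3+
print(len(bold.encode("utf-16-le")) // 2)        # 23 UTF-16 code units (narrow-build len())


def bytes_per_char(ch):
    """Hypothetical helper: per-character storage width on CPython 3.3+.

    Under PEP 393 a fresh compact string occupies a fixed header plus
    (length + 1) * width bytes, so adding one character exposes the width.
    """
    return sys.getsizeof(ch * 11) - sys.getsizeof(ch * 10)


for ch in ("a", "\u00e9", "\u20ac", "\U0001D407"):
    print("U+%04X" % ord(ch), bytes_per_char(ch))  # widths 1, 1, 2, 4
```

The widths grow only when a string actually contains wider characters. A single astral character makes the whole string use 4 bytes per character, which is what `sys.getsizeof()` on the bold string reveals.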