From: Michael Torrie
Date: Thu, 25 Jul 2013 21:06:21 -0600
To: python-list@python.org
Subject: Re: RE Module Performance
Newsgroups: comp.lang.python

On 07/25/2013 01:07 PM, wxjmfauth@gmail.com wrote:
> Let start with a simple string \textemdash or \texttendash
>
> >>> sys.getsizeof('–')
> 40
> >>> sys.getsizeof('a')
> 26

That's meaningless. You're comparing the overhead of a string object
itself (a one-time cost anyway), not the overhead of storing the actual
characters. This is the only meaningful comparison:

>>> sys.getsizeof('––') - sys.getsizeof('–')

>>> sys.getsizeof('aa') - sys.getsizeof('a')

Actually, I'm not even sure what your point is after all this time of
railing against FSR. You have said in the past that Python penalizes
users of character sets that require wider byte encodings, but what
would you have us do? Use 4-byte characters and penalize everyone
equally? Use 2-byte characters that incorrectly expose surrogate pairs
for some characters? Use UTF-8 in memory and do O(n) indexing?

Are your programs (actual programs, not contrived benchmarks) actually
slower because of FSR? Is FSR incorrect? If so, according to what part
of the Unicode standard?

I'm not trying to troll, or feed the troll. I'm actually curious.

I think perhaps you feel that many of us who don't use Unicode often
don't understand Unicode because some of us don't understand you. If
so, I'm not sure that's actually true.
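
For anyone who wants to run the per-character comparison themselves,
here is a minimal sketch. It assumes CPython 3.3 or later (where FSR,
i.e. PEP 393, is in effect). The helper name per_char_cost is mine, not
something from the thread, and the absolute numbers from sys.getsizeof
vary by version and platform; only the differences matter here.

```python
import sys

def per_char_cost(ch, n=100):
    """Bytes added by appending one more copy of `ch` to a string of n copies.

    Subtracting two sizes cancels the fixed per-object overhead, leaving
    only the storage cost of a single character. CPython-specific.
    """
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)

# 'a' is ASCII, U+2013 is the en dash from the quoted post (in the BMP),
# and U+1F600 lies outside the BMP.
for label, ch in [("a", "a"), ("U+2013", "\u2013"), ("U+1F600", "\U0001F600")]:
    print(label, per_char_cost(ch))
```

On CPython 3.3+ I would expect this to report 1, 2 and 4 bytes per
character respectively, which is the point: a string pays only for the
widest character it actually contains.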