From: Ned Batchelder
Newsgroups: comp.lang.python
Subject: Re: Bytes indexing returns an int
Date: Wed, 08 Jan 2014 12:19:03 -0500

On 1/8/14 11:08 AM, wxjmfauth@gmail.com wrote:
> Byte strings (encoded code points) or native unicode is one
> thing.
>
> But on the other side, the problem is elsewhere. These very
> talented ascii narrow minded, unicode illiterate devs only
> succeded to produce this (I, really, do not wish to be rude).

If you don't want to be rude, you are failing. You've been told a
number of times that your obscure micro-benchmarks are meaningless. Now
you've taken to calling the core devs narrow-minded and Unicode
illiterate. They are neither of these things.

Continuing to post these comments with no interest in learning is rude.
Other recent threads have contained detailed rebuttals of your views,
which you have ignored. This is rude. Please stop.

--Ned.

>
> >>> import unicodedata
> >>> unicodedata.name('ǟ')
> 'LATIN SMALL LETTER A WITH DIAERESIS AND MACRON'
> >>> sys.getsizeof('a')
> 26
> >>> sys.getsizeof('ǟ')
> 40
> >>> timeit.timeit("unicodedata.normalize('NFKD', 'ǟ')", "import unicodedata")
> 0.8040018888575129
> >>> timeit.timeit("unicodedata.normalize('NFKD', 'zzz')", "import unicodedata")
> 0.3073749330963995
> >>> timeit.timeit("unicodedata.normalize('NFKD', 'z')", "import unicodedata")
> 0.2874013282653962
> >>>
> >>> timeit.timeit("len(unicodedata.normalize('NFKD', 'zzz'))", "import unicodedata")
> 0.3803570633857589
> >>> timeit.timeit("len(unicodedata.normalize('NFKD', 'ǟ'))", "import unicodedata")
> 0.9359970320201683
>
> pdf, typography, linguistic, scripts, ... in mind, in other word the real
> *unicode* world.
>
> jmf
>

-- 
Ned Batchelder, http://nedbatchelder.com