Date: Sun, 28 Jul 2013 10:45:25 +0200
From: Antoon Pardon
To: python-list@python.org
Subject: Re: RE Module Performance
Newsgroups: comp.lang.python

On 27-07-13 20:21, wxjmfauth@gmail.com wrote:
> Quickly. sys.getsizeof() at the light of what I explained.
>
> 1) As this FSR works with multiple encoding, it has to keep
> track of the encoding. it puts is in the overhead of str
> class (overhead = real overhead + encoding). In such
> a absurd way, that a
>
>>>> sys.getsizeof('€')
> 40
>
> needs 14 bytes more than a
>
>>>> sys.getsizeof('z')
> 26
>
> You may vary the length of the str. The problem is
> still here. Not bad for a coding scheme.
>
> 2) Take a look at this. Get rid of the overhead.
>
>>>> sys.getsizeof('b'*1000000 + 'c')
> 1000026
>>>> sys.getsizeof('b'*1000000 + '€')
> 2000040
>
> What does it mean? It means that Python has to
> reencode a str every time it is necessary because
> it works with multiple codings.

So? The same effect can be seen with other datatypes.

>>> nr = 32767
>>> sys.getsizeof(nr)
14
>>> nr += 1
>>> sys.getsizeof(nr)
16

> This FSR is not even a copy of the utf-8.
>>>> len(('b'*1000000 + '€').encode('utf-8'))
> 1000003

Why should it be?
Why should a unicode string be a copy of its utf-8 encoding? That
makes as much sense as expecting that a number would be a copy of
its string representation.

> utf-8 or any (utf) never need and never spend their time
> in reencoding.

So? That Python sometimes needs to do some kind of background
processing is not a problem. Whether it is garbage collection,
allocating more memory, shuffling data blocks around, or reencoding
a string doesn't matter. If you've got a real-world example where
one of those things noticeably slows your program down or makes it
behave incorrectly, then you have something that is worthy of
attention. Until then you are merely harboring a pet peeve.

-- 
Antoon Pardon
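
The size comparison in this thread can be reproduced with a short script. This is only a minimal sketch, assuming CPython 3.3 or later (where the flexible string representation applies); the helper name `per_char_bytes` is made up for illustration, and the absolute numbers from `sys.getsizeof()` differ between Python versions and platforms. It measures the storage cost per character by subtracting the sizes of two strings of different length, so the fixed object overhead cancels out.

```python
import sys


def per_char_bytes(ch, n=1000):
    """Estimate bytes stored per character for strings made only of `ch`.

    Comparing two strings that differ only in length cancels the fixed
    per-object overhead. `per_char_bytes` is a hypothetical helper, not
    part of any library.
    """
    short = ch * n
    longer = ch * (2 * n)
    return (sys.getsizeof(longer) - sys.getsizeof(short)) / n


# Storage width depends on the widest code point in the string
# (CPython implementation detail, assumed here).
ascii_width = per_char_bytes('b')
euro_width = per_char_bytes('\u20ac')  # the euro sign

# Integers show the same kind of effect: a larger value needs more storage.
small_int_size = sys.getsizeof(1)
big_int_size = sys.getsizeof(2 ** 100)

# The UTF-8 encoding is a separate bytes object with its own length:
# 1000 one-byte characters plus one three-byte character.
utf8_len = len(('b' * 1000 + '\u20ac').encode('utf-8'))

print(ascii_width, euro_width)
print(small_int_size, big_int_size)
print(utf8_len)
```

The in-memory size of a `str` or an `int` and the length of an encoded or printed form of it are different quantities, so there is no reason to expect them to match.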