| Header | Value |
|---|---|
| From | random832@fastmail.us |
| To | python-list@python.org |
| In-Reply-To | <04abbe99-ca1e-40b5-86c7-64b0e5d9de9c@googlegroups.com> |
| References | <4ce85ea8-4a4c-46cf-a546-ad999576a5f7@googlegroups.com> <m2a9jqq7g9.fsf@cochabamba.vanoostrum.org> <04abbe99-ca1e-40b5-86c7-64b0e5d9de9c@googlegroups.com> |
| Subject | Re: Chardet, file, ... and the Flexible String Representation |
| Date | Tue, 10 Sep 2013 11:36:33 -0400 |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.220.1378827397.5461.python-list@python.org> |
On Mon, Sep 9, 2013, at 10:28, wxjmfauth@gmail.com wrote:
*time performance differences*
>
> Comment: Such differences never happen with utf.
Why is this bad? Keeping in mind that otherwise they would all be almost
as slow as the UCS-4 case.
> >>> sys.getsizeof('a')
> 26
> >>> sys.getsizeof('€')
> 40
> >>> sys.getsizeof('\U0001d11e')
> 44
>
> Comment: 18 bytes more than latin-1
>
> Comment: With utf, a char (in string or not) never exceed 4
A string is an object and needs to store the length, along with any
overhead relating to object headers. I believe there is also an appended
null character. Also, ASCII strings are stored differently from Latin-1
strings.
>>> sys.getsizeof('a'*999)
1048 = 49 bytes overhead, 1 byte per character.
>>> sys.getsizeof('\xa4'*999)
1072 = 73 bytes overhead, 1 byte per character.
>>> sys.getsizeof('\u20ac'*999)
2072 = 74 bytes overhead, 2 bytes per character.
>>> sys.getsizeof('\U0001d11e'*999)
4072 = 76 bytes overhead, 4 bytes per character.
(I bet sys.getsizeof('\xa4') will return 38 on your system, so 44 is
only six bytes more, not 18)
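The split between fixed overhead and per-character cost can be measured rather than worked out by hand. The sketch below assumes CPython 3.3 or later; `measure` and `SAMPLES` are names made up for this example, and the overhead figures it prints depend on the Python version and on whether the build is 32-bit or 64-bit.

```python
import sys

def measure(ch, n=999):
    """Return (bytes_per_char, fixed_overhead) for a string of n copies of ch.

    The per-character cost is taken from the size difference between two
    strings of different lengths, so it does not depend on the overhead.
    """
    base = sys.getsizeof(ch * n)
    doubled = sys.getsizeof(ch * (2 * n))
    per_char = (doubled - base) // n
    overhead = base - per_char * n
    return per_char, overhead

# One sample character for each storage width discussed in the thread.
SAMPLES = [
    ("ASCII", "a"),
    ("Latin-1", "\xa4"),
    ("BMP", "\u20ac"),
    ("astral", "\U0001d11e"),
]

for label, ch in SAMPLES:
    per_char, overhead = measure(ch)
    print(f"{label:8} {per_char} byte(s) per character, {overhead} bytes overhead")
```

The overhead reported this way includes the trailing null character, since it is simply everything that is not length times width.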
If we did not have the FSR, everything would be 4 bytes per character.
We might have less overhead, but a one-byte-per-character string saves
three bytes per character, so a Latin-1 string only has to be 25
characters long (17 for ASCII) before the savings from the shorter
representation outweigh even having _no_ overhead, and every three bytes
of overhead reduces that number by one. And you have a 32-bit python
build, which has less overhead than mine - in yours, ASCII strings only
have to be nine characters long for the FSR to be worth it. Assume the
minimum possible overhead is two words for the object header, a size,
and a pointer - i.e. sixteen bytes, compared to the 25 you've
demonstrated for ASCII, and strings only need to be _four_ characters
long for the FSR to be a better deal than always using UCS4 strings.
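The break-even length follows from comparing the two total sizes, so it can be checked directly. This is a minimal sketch: `break_even` is a hypothetical helper, the overhead figures are taken from the `sys.getsizeof` results quoted in this message, and the sixteen-byte UCS4 overhead is the assumed minimum discussed above, not a measured value.

```python
def break_even(fsr_overhead, fsr_width=1, ucs4_overhead=0):
    """Smallest length n for which the FSR string is strictly smaller, i.e.

        fsr_overhead + fsr_width * n  <  ucs4_overhead + 4 * n

    fsr_width must be 1 or 2; at width 4 there are no per-character savings.
    """
    saved_per_char = 4 - fsr_width
    gap = fsr_overhead - ucs4_overhead
    if gap < 0:
        return 0  # the FSR string is already smaller when empty
    return gap // saved_per_char + 1

# 64-bit build, hypothetical UCS4 string with no overhead at all.
print(break_even(1048 - 999))   # ASCII overhead from the measurement above
print(break_even(1072 - 999))   # Latin-1 overhead from the measurement above

# 32-bit build: 25 bytes of ASCII overhead (sys.getsizeof('a') == 26).
print(break_even(25))
print(break_even(25, ucs4_overhead=16))  # against the assumed 16-byte minimum
```

Because the comparison is strict, a length at which the two sizes tie does not count as a win for the FSR.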
The need for four-byte-per-character strings would not go away by
eliminating the FSR, so you're basically saying that everything should
be constrained to the worst-case performance scenario.