


Re: chunking a long string?

Path csiph.com!usenet.pasdenom.info!weretis.net!feeder1.news.weretis.net!feeder.erje.net!eu.feeder.erje.net!xlned.com!feeder5.xlned.com!newsfeed.xs4all.nl!newsfeed1.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
Return-Path <rosuav@gmail.com>
X-Original-To python-list@python.org
Delivered-To python-list@mail.python.org
MIME-Version 1.0
X-Received by 10.68.254.164 with SMTP id aj4mr78805pbd.161.1383985580515; Sat, 09 Nov 2013 00:26:20 -0800 (PST)
In-Reply-To <39112f0b-f834-4e4a-86f2-ca19078e6de4@googlegroups.com>
References <mailman.2232.1383932895.18130.python-list@python.org> <c1bb3377-4425-4707-9ae7-aa7251cebc75@googlegroups.com> <527d85e8$0$29983$c3e8da3$5496439d@news.astraweb.com> <39112f0b-f834-4e4a-86f2-ca19078e6de4@googlegroups.com>
Date Sat, 9 Nov 2013 19:26:20 +1100
Subject Re: chunking a long string?
From Chris Angelico <rosuav@gmail.com>
To python-list@python.org
Content-Type text/plain; charset=ISO-8859-1
X-BeenThere python-list@python.org
X-Mailman-Version 2.1.15
Precedence list
List-Id General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe <https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive <http://mail.python.org/pipermail/python-list/>
List-Post <mailto:python-list@python.org>
List-Help <mailto:python-list-request@python.org?subject=help>
List-Subscribe <https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Newsgroups comp.lang.python
Message-ID <mailman.2283.1383985583.18130.python-list@python.org> (permalink)
Lines 43
NNTP-Posting-Host 2001:888:2000:d::a6
X-Trace 1383985583 news.xs4all.nl 15949 [2001:888:2000:d::a6]:41784
X-Complaints-To abuse@xs4all.nl
Xref csiph.com comp.lang.python:58915



On Sat, Nov 9, 2013 at 7:14 PM,  <wxjmfauth@gmail.com> wrote:
> If you wish to count the frequency of chars in a text
> and store the results in a dict, {char: number_of_that_char, ...},
> do not forget to save the key in utf-XXX, it saves memory.

Oh, if you're that concerned about memory usage of individual
characters, try storing them as integers:

>>> import sys
>>> sys.getsizeof("a")
26
>>> sys.getsizeof("a".encode("utf-32"))
25
>>> sys.getsizeof("a".encode("utf-8"))
18
>>> sys.getsizeof(ord("a"))
14

I really don't see that UTF-32 is much of an advantage here. UTF-8
happens to be, because I used an ASCII character, but the integer
beats them all, even for larger numbers:
>>> sys.getsizeof(ord("\U0001d11e"))
16
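Applied to the frequency-counting exercise from the quoted message, the integer suggestion might look like the minimal sketch below. It keys a `collections.Counter` by code point (`ord`) instead of by the character or its encoded bytes, and `chr` recovers the character for display. The function names are hypothetical. Because a dict only stores references to its keys, any real saving depends on the interpreter (CPython, for instance, already caches small integers and one-character Latin-1 strings).

```python
from collections import Counter


def count_code_points(text: str) -> Counter:
    """Count character frequencies, keyed by integer code point (hypothetical helper)."""
    return Counter(map(ord, text))


def as_characters(counts: Counter) -> dict:
    """Convert code-point keys back to one-character strings for display."""
    return {chr(cp): n for cp, n in counts.items()}


counts = count_code_points("abracadabra \U0001d11e")
print(counts[ord("a")])                      # 5
print(as_characters(counts)["\U0001d11e"])   # 1
```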

And there's even less difference on my Linux box, but of course, you
never compare against Linux because Python 3.2 wide builds don't suit
your numbers.
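Since these figures depend on the build (32-bit vs 64-bit, Windows vs Linux, Python version), a small helper like the sketch below lets you reproduce the comparison on your own interpreter. It assumes Python 3.3 or later, where an astral character such as "\U0001d11e" has length 1; on a 3.2 narrow build it would be two code units. The helper name and dictionary labels are hypothetical. Note that Python's "utf-32" codec prepends a 4-byte BOM, which is part of why the UTF-32 bytes object is larger than the UTF-8 one.

```python
import sys


def representation_sizes(ch: str) -> dict:
    """Return sys.getsizeof() for several ways of storing one character.

    The numbers are CPython implementation details and vary between
    32- and 64-bit builds and between Python versions.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return {
        "str": sys.getsizeof(ch),
        "utf-8 bytes": sys.getsizeof(ch.encode("utf-8")),
        # The "utf-32" codec adds a 4-byte byte-order mark before the data.
        "utf-32 bytes": sys.getsizeof(ch.encode("utf-32")),
        "int (ord)": sys.getsizeof(ord(ch)),
    }


for sample in ("a", "\u20ac", "\U0001d11e"):
    print(repr(sample), representation_sizes(sample))
```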

For longer strings, there's an even more efficient way to store them.
Just store the memory address - that's going to be 4 bytes or 8,
depending on whether it's a 32-bit or 64-bit build of Python. There's
a name for this method of comparing strings: interning. Some languages
do it automatically for all strings; others (like Python) do it only
for some strings, such as identifiers, and otherwise only when you ask
for it. Suddenly it doesn't matter at all what the storage format is -
if two interned strings are equal, their addresses are the same, and
vice versa. That's how to make it cheap.
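Python exposes interning through `sys.intern()`. The minimal sketch below (CPython assumed; the variable names are illustrative) builds two equal strings at run time, interns them, and then compares them by identity. In CPython, `id()` returns the object's memory address, so an identity check on interned strings is exactly the address comparison described above.

```python
import sys

# Build two equal strings at run time so they start out as separate objects
# (CPython does not automatically intern strings produced by str.join).
first = "".join(["chunking", " a long", " string"])
second = "".join(["chunking", " a long", " string"])

print(first == second)   # True: compares the characters
print(first is second)   # usually False in CPython: two distinct objects

# sys.intern returns one canonical object for each distinct string value.
a = sys.intern(first)
b = sys.intern(second)
print(a is b)            # True: both names refer to the same object
print(id(a) == id(b))    # True: equal interned strings share one address
```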

> Hint: If you attempt to do the same exercise with
> words in a "latin" text, never forget the average length
> of a word is approximately 1000 chars.

I think you're confusing length of word with value of picture.

ChrisA



