
Re: chunking a long string?

References <mailman.2232.1383932895.18130.python-list@python.org> <c1bb3377-4425-4707-9ae7-aa7251cebc75@googlegroups.com> <527d85e8$0$29983$c3e8da3$5496439d@news.astraweb.com> <39112f0b-f834-4e4a-86f2-ca19078e6de4@googlegroups.com>
Date 2013-11-09 19:26 +1100
Subject Re: chunking a long string?
From Chris Angelico <rosuav@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.2283.1383985583.18130.python-list@python.org> (permalink)



On Sat, Nov 9, 2013 at 7:14 PM,  <wxjmfauth@gmail.com> wrote:
> If you wish to count the frequency of chars in a text
> and store the results in a dict, {char: number_of_that_char, ...},
> do not forget to save the key in utf-XXX, it saves memory.

Oh, if you're that concerned about memory usage of individual
characters, try storing them as integers:

>>> import sys
>>> sys.getsizeof("a")
26
>>> sys.getsizeof("a".encode("utf-32"))
25
>>> sys.getsizeof("a".encode("utf-8"))
18
>>> sys.getsizeof(ord("a"))
14

I really don't see much advantage to UTF-32 here. UTF-8 does come out
ahead, but only because I used an ASCII character; the integer beats
them all, even for larger code points:
>>> sys.getsizeof(ord("\U0001d11e"))
16

And there's even less difference on my Linux box, but of course, you
never compare against Linux because Python 3.2 wide builds don't suit
your numbers.
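Below is a minimal sketch of the exercise being argued about, assuming CPython 3.3 or later. It counts character frequencies into a dict keyed either by one-character strings or by integer code points from ord(). The helper names (char_frequencies_by_str, char_frequencies_by_codepoint, key_bytes) are hypothetical. The byte totals it prints depend on the build, so no particular numbers are claimed. In CPython, small integers and single-character strings below U+0100 are also shared objects, so summing per-key sizes overstates what the dict really costs.

```python
import sys
from collections import Counter


def char_frequencies_by_str(text):
    """Count characters, keyed by one-character str objects."""
    return Counter(text)


def char_frequencies_by_codepoint(text):
    """Count characters, keyed by their integer code points (via ord)."""
    return Counter(ord(ch) for ch in text)


def key_bytes(counter):
    """Sum sys.getsizeof over the keys only (ignores the dict's own overhead)."""
    return sum(sys.getsizeof(k) for k in counter)


text = "chunking a long string? \U0001d11e"
by_str = char_frequencies_by_str(text)
by_int = char_frequencies_by_codepoint(text)

# Both tables hold the same counts; only the key type differs.
assert {ord(k): v for k, v in by_str.items()} == dict(by_int)

# Build-dependent figures: compare them on your own interpreter.
print("str keys:", key_bytes(by_str), "bytes")
print("int keys:", key_bytes(by_int), "bytes")
```

Keying by ord() only changes how each key is represented. The counting logic stays the same, and chr() turns the keys back into characters when you need them.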

For longer strings, there's an even more efficient way to store them:
just store the memory address, which takes 4 or 8 bytes depending on
whether it's a 32-bit or 64-bit build of Python. There's a name for
this way of storing and comparing strings: interning. Some languages
do it automatically for all strings; others (like Python) mostly only
when you ask for it, via sys.intern(). Suddenly it doesn't matter at
all what the storage format is: once both strings are interned, they
are equal exactly when their addresses are equal. That's how to make
it cheap.
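Here is a short sketch of interning in CPython 3, using the real sys.intern() function. The helpers make_word and interned_word_counts are hypothetical and exist only for illustration. Once two equal strings have been interned, they are the same object, so `is` (an address comparison) is enough to test equality. In CPython, a dict lookup also checks identity before falling back to a full string comparison. That is part of why interned keys are cheap to look up.

```python
import sys


def make_word(parts):
    """Build a string at run time so it is not a shared compile-time constant."""
    return "".join(parts)


a = make_word(["chunk", "ing"])
b = make_word(["chunk", "ing"])

# Equal contents; whether a and b are the same object is an implementation detail.
print(a == b)

# After interning, equal strings map to one shared object, so an identity
# check (a pointer comparison) is enough to tell whether they are equal.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)


def interned_word_counts(text):
    """Word-frequency table whose keys are interned strings."""
    counts = {}
    for word in text.split():
        key = sys.intern(word)  # every occurrence of a word shares one object
        counts[key] = counts.get(key, 0) + 1
    return counts


counts = interned_word_counts("a long string is a string")
print(counts)
```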

> Hint: If you attempt to do the same exercise with
> words in a "latin" text, never forget the length average
> of a word is approximately 1000 chars.

I think you're confusing length of word with value of picture.

ChrisA



Thread

chunking a long string? Roy Smith <roy@panix.com> - 2013-11-08 12:48 -0500
  Re: chunking a long string? wxjmfauth@gmail.com - 2013-11-08 12:43 -0800
    Re: chunking a long string? Chris Angelico <rosuav@gmail.com> - 2013-11-09 07:53 +1100
    Re: chunking a long string? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-08 20:57 +0000
    Re: chunking a long string? Tim Chase <python.list@tim.thechases.com> - 2013-11-08 15:04 -0600
    Re: chunking a long string? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-08 21:06 +0000
    Re: chunking a long string? Chris Angelico <rosuav@gmail.com> - 2013-11-09 08:04 +1100
    Re: chunking a long string? Chris Angelico <rosuav@gmail.com> - 2013-11-09 08:17 +1100
    Re: chunking a long string? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-09 00:46 +0000
      Re: chunking a long string? wxjmfauth@gmail.com - 2013-11-09 00:14 -0800
        Re: chunking a long string? Chris Angelico <rosuav@gmail.com> - 2013-11-09 19:26 +1100
          Re: chunking a long string? Roy Smith <roy@panix.com> - 2013-11-09 09:37 -0500
            Re: chunking a long string? Chris Angelico <rosuav@gmail.com> - 2013-11-10 02:02 +1100
              Re: chunking a long string? Roy Smith <roy@panix.com> - 2013-11-09 10:21 -0500
                Re: chunking a long string? Chris Angelico <rosuav@gmail.com> - 2013-11-10 02:30 +1100
                Re: chunking a long string? Roy Smith <roy@panix.com> - 2013-11-09 10:35 -0500
            Re: chunking a long string? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-09 15:37 +0000
              Re: chunking a long string? Chris Angelico <rosuav@gmail.com> - 2013-11-10 09:14 +1100
                Re: chunking a long string? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-10 06:39 +0000
                Re: chunking a long string? Chris Angelico <rosuav@gmail.com> - 2013-11-10 19:46 +1100
        Re: chunking a long string? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-09 10:13 +0000
  Re: chunking a long string? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-09 00:54 +0000
