Groups > comp.lang.python > #70202 > unrolled thread

MemoryError in data conversion

Started by	Mok-Kong Shen <mok-kong.shen@t-online.de>
First post	2014-04-14 03:46 +0200
Last post	2014-04-14 08:42 -0400
Articles	11 — 5 participants

Back to article view | Back to comp.lang.python

  MemoryError in data conversion Mok-Kong Shen <mok-kong.shen@t-online.de> - 2014-04-14 03:46 +0200
    Re: MemoryError in data conversion dieter <dieter@handshake.de> - 2014-04-14 08:14 +0200
    Re: MemoryError in data conversion Peter Otten <__peter__@web.de> - 2014-04-14 09:46 +0200
      Re: MemoryError in data conversion Mok-Kong Shen <mok-kong.shen@t-online.de> - 2014-04-14 14:26 +0200
        Re: MemoryError in data conversion Peter Otten <__peter__@web.de> - 2014-04-14 15:59 +0200
          Re: MemoryError in data conversion Mok-Kong Shen <mok-kong.shen@t-online.de> - 2014-04-14 23:20 +0200
            Re: MemoryError in data conversion Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-04-15 11:51 +1200
              Re: MemoryError in data conversion Mok-Kong Shen <mok-kong.shen@t-online.de> - 2014-04-15 10:55 +0200
                Re: MemoryError in data conversion Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-04-16 00:08 +1200
              Re: MemoryError in data conversion Peter Otten <__peter__@web.de> - 2014-04-15 11:59 +0200
    Re:MemoryError in data conversion Dave Angel <davea@davea.name> - 2014-04-14 08:42 -0400

#70202 — MemoryError in data conversion

From	Mok-Kong Shen <mok-kong.shen@t-online.de>
Date	2014-04-14 03:46 +0200
Subject	MemoryError in data conversion
Message-ID	<lifelc$ccj$1@news.albasani.net>

The code attached below produces in one of the two IMHO similar cases
(excepting the sizes of the lists involved) MemoryError. Could experts
kindly tell why that's so and whether there is any work-around feasible.

Thanks in advances.

M. K. Shen

-----------------------------------------------------------------

import ast

def buildhuffmantree(slist,flist):
   item=slist[:]
   freq=flist[:]
   while len(item)>2:
     mn=min(freq)
     id=freq.index(mn)
     u=item[id]
     del item[id]
     del freq[id]
     mn1=min(freq)
     id=freq.index(mn1)
     v=item[id]
     del item[id]
     del freq[id]
     item.append([u,v])
     freq.append(mn+mn1)
   return(item)

def processing(slist,flist):
   bintree=buildhuffmantree(slist,flist)
   print(bintree)
   byarray=bytearray(str(bintree),"latin-1")
   bintree1=ast.literal_eval(byarray.decode("latin-1"))
   print(bintree1)
   print(bintree==bintree1)

slist1=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 'eof']

flist1=[18, 16, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, -1]

slist2=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
         18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
         34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
         50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
         66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
         82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97,
         98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110,
         111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123,
         124, 125, 126, 127, 'eof']

flist2=[2, 2, 0, 2, 0, 0, 1, 2, 1, 0, 2, 0, 0, 1, 1, 0, 2, 0, 0, 0, 1,
         0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1, 0, 0, 0,
         0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 1, 0, 2, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, -1]

processing(slist1,flist1)  ### This works fine
print()

processing(slist2,flist2)  ### This leads to MemoryError

[toc] | [next] | [standalone]

#70204

From	dieter <dieter@handshake.de>
Date	2014-04-14 08:14 +0200
Message-ID	<mailman.9236.1397456067.18130.python-list@python.org>
In reply to	#70202

Mok-Kong Shen <mok-kong.shen@t-online.de> writes:

> The code attached below produces in one of the two IMHO similar cases
> (excepting the sizes of the lists involved) MemoryError. Could experts
> kindly tell why that's so and whether there is any work-around feasible.

"MemoryError" means: the Python process wants more memory from the
operating system than this can give.

Your options:

  *  increase the memory resources (RAM, swap space) of your system

  *  check the memory related configuration of your operating system
     (there may be a limit for memory allocated to processes -
     try to increase this)

  *  change your algorithm such that less memory is needed

[toc] | [prev] | [next] | [standalone]

#70208

From	Peter Otten <__peter__@web.de>
Date	2014-04-14 09:46 +0200
Message-ID	<mailman.9239.1397461622.18130.python-list@python.org>
In reply to	#70202

Mok-Kong Shen wrote:

> The code attached below produces in one of the two IMHO similar cases
> (excepting the sizes of the lists involved) MemoryError. Could experts
> kindly tell why that's so and whether there is any work-around feasible.

Here's a simpler way to reproduce the error:

>>> import ast
>>> def nested_list_literal(n):
...     return "[" * n + "42" + "]" * n
... 
>>> ast.literal_eval(nested_list_literal(98))
[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[42]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]
>>> ast.literal_eval(nested_list_literal(99))
s_push: parser stack overflow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.3/ast.py", line 47, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "/usr/lib/python3.3/ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
MemoryError

You ran into a limitation of the compiler. For us to suggest a workaround 
you'd have to explain why you want to convert the list returned from 
buildhuffmantree() into python source code and back.

> Thanks in advances.
> 
> M. K. Shen
> 
> -----------------------------------------------------------------
> 
> import ast
> 
> def buildhuffmantree(slist,flist):
>    item=slist[:]
>    freq=flist[:]
>    while len(item)>2:
>      mn=min(freq)
>      id=freq.index(mn)
>      u=item[id]
>      del item[id]
>      del freq[id]
>      mn1=min(freq)
>      id=freq.index(mn1)
>      v=item[id]
>      del item[id]
>      del freq[id]
>      item.append([u,v])
>      freq.append(mn+mn1)
>    return(item)
> 
> def processing(slist,flist):
>    bintree=buildhuffmantree(slist,flist)
>    print(bintree)
>    byarray=bytearray(str(bintree),"latin-1")
>    bintree1=ast.literal_eval(byarray.decode("latin-1"))
>    print(bintree1)
>    print(bintree==bintree1)
> 
> slist1=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 'eof']
> 
> flist1=[18, 16, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, -1]
> 
> slist2=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
>          18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
>          34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
>          50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
>          66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
>          82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97,
>          98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110,
>          111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123,
>          124, 125, 126, 127, 'eof']
> 
> flist2=[2, 2, 0, 2, 0, 0, 1, 2, 1, 0, 2, 0, 0, 1, 1, 0, 2, 0, 0, 0, 1,
>          0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1, 0, 0, 0,
>          0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>          0, 1, 0, 2, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
>          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
>          0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>          0, 0, -1]
> 
> processing(slist1,flist1)  ### This works fine
> print()
> 
> processing(slist2,flist2)  ### This leads to MemoryError

[toc] | [prev] | [next] | [standalone]

#70214

From	Mok-Kong Shen <mok-kong.shen@t-online.de>
Date	2014-04-14 14:26 +0200
Message-ID	<ligk5n$3jp$1@news.albasani.net>
In reply to	#70208

Am 14.04.2014 09:46, schrieb Peter Otten:

> You ran into a limitation of the compiler. For us to suggest a workaround
> you'd have to explain why you want to convert the list returned from
> buildhuffmantree() into python source code and back.

That list gives the Huffman encoding tree for compressing a given piece
of source text. I am writing a Python code to implement an algorithm
(not new, being first sketched in the literature since decades but yet
having no publically available implementation as far as I am aware) of
encryption processing that has Huffman data compression as its major
constituent. Now, for good security against cryptanalysis, this list
(which has to be included in the ciphertext for decryption by the
recipient) has to be well scrambled in some way. I choose to use 8-bit
bytes as units for the scrambling. Hence I convert the list to a
bytearray for performing scrambling. On decryption I reverse the
scrambling and get back the original bytearray and use ast to recover
from it the list so as to be able to do the decompression. Hopefully
this description is sufficiently clear.

M. K. Shen

[toc] | [prev] | [next] | [standalone]

#70220

From	Peter Otten <__peter__@web.de>
Date	2014-04-14 15:59 +0200
Message-ID	<mailman.9250.1397484008.18130.python-list@python.org>
In reply to	#70214

Mok-Kong Shen wrote:

> Am 14.04.2014 09:46, schrieb Peter Otten:
> 
>> You ran into a limitation of the compiler. For us to suggest a workaround
>> you'd have to explain why you want to convert the list returned from
>> buildhuffmantree() into python source code and back.
> 
> That list gives the Huffman encoding tree for compressing a given piece
> of source text. I am writing a Python code to implement an algorithm
> (not new, being first sketched in the literature since decades but yet
> having no publically available implementation as far as I am aware) of
> encryption processing that has Huffman data compression as its major
> constituent. Now, for good security against cryptanalysis, this list
> (which has to be included in the ciphertext for decryption by the
> recipient) has to be well scrambled in some way. I choose to use 8-bit
> bytes as units for the scrambling. Hence I convert the list to a
> bytearray for performing scrambling. On decryption I reverse the
> scrambling and get back the original bytearray and use ast to recover
> from it the list so as to be able to do the decompression. Hopefully
> this description is sufficiently clear.

You could use json, but you may run into the same problem with that, too 
(only later):

>>> import json
>>> items = []
>>> for i in range(1000):
...     s = json.dumps(items)
...     items = [items]
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python3.3/json/__init__.py", line 236, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.3/json/encoder.py", line 191, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.3/json/encoder.py", line 249, in iterencode
    return _iterencode(o, 0)
RuntimeError: maximum recursion depth exceeded while encoding a JSON object
>>> i
995

The safest option is probably to serialize the original flist and slist, and 
use them to create the tree on the fly.

[toc] | [prev] | [next] | [standalone]

#70233

From	Mok-Kong Shen <mok-kong.shen@t-online.de>
Date	2014-04-14 23:20 +0200
Message-ID	<lihjf0$4t6$1@news.albasani.net>
In reply to	#70220

Am 14.04.2014 15:59, schrieb Peter Otten:

> You could use json, but you may run into the same problem with that, too
> (only later):
>
>>>> import json
>>>> items = []
>>>> for i in range(1000):
> ...     s = json.dumps(items)
> ...     items = [items]
> ...
> Traceback (most recent call last):
>    File "<stdin>", line 2, in <module>
>    File "/usr/lib/python3.3/json/__init__.py", line 236, in dumps
>      return _default_encoder.encode(obj)
>    File "/usr/lib/python3.3/json/encoder.py", line 191, in encode
>      chunks = self.iterencode(o, _one_shot=True)
>    File "/usr/lib/python3.3/json/encoder.py", line 249, in iterencode
>      return _iterencode(o, 0)
> RuntimeError: maximum recursion depth exceeded while encoding a JSON object
>>>> i
> 995
>
> The safest option is probably to serialize the original flist and slist, and
> use them to create the tree on the fly.

Thank you very much for your efforts to help me.

I have yet a question out of curiosity: Why is my 2nd list structure,
that apparently is too complex for handling by eval and json, seemingly
not a problem for pickle?

M. K. Shen

[toc] | [prev] | [next] | [standalone]

#70238

From	Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date	2014-04-15 11:51 +1200
Message-ID	<br3ajeFdnj3U1@mid.individual.net>
In reply to	#70233

Mok-Kong Shen wrote:
> I have yet a question out of curiosity: Why is my 2nd list structure,
> that apparently is too complex for handling by eval and json, seemingly
> not a problem for pickle?

Pickle is intended for arbitrary data structures, so it
is designed to be able to handle deeply-nested and/or
recursive data. Eval only has to handle nesting to depths
likely to be encountered in source code. Apparently the
json parser also assumes you're not going to be using
very deep nesting.

-- 
Greg

[toc] | [prev] | [next] | [standalone]

#70262

From	Mok-Kong Shen <mok-kong.shen@t-online.de>
Date	2014-04-15 10:55 +0200
Message-ID	<liis6b$c1q$1@news.albasani.net>
In reply to	#70238

Am 15.04.2014 01:51, schrieb Gregory Ewing:
> Mok-Kong Shen wrote:
>> I have yet a question out of curiosity: Why is my 2nd list structure,
>> that apparently is too complex for handling by eval and json, seemingly
>> not a problem for pickle?
>
> Pickle is intended for arbitrary data structures, so it
> is designed to be able to handle deeply-nested and/or
> recursive data. Eval only has to handle nesting to depths
> likely to be encountered in source code. Apparently the
> json parser also assumes you're not going to be using
> very deep nesting.

What I need is to have my (complicated) list to be put into
a bytearray, do some proceesing, transfer it to the recipient.
The recipient reverses the processing, obtains a bytearray
that is the same as my original one and gets from it the same
list as mine. Using pickle I can manage to do it (in fact I
did try and succeed with my 2nd list), but that's IMHO a
rather unnatural/awkward work-around. (It means that I have
to pickle out the list to a file and read in the content of
the file in order to have it as a bytearray etc. etc.)

M. K. Shen

[toc] | [prev] | [next] | [standalone]

#70270

From	Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date	2014-04-16 00:08 +1200
Message-ID	<br4lpqFm06tU1@mid.individual.net>
In reply to	#70262

Mok-Kong Shen wrote:
> (It means that I have
> to pickle out the list to a file and read in the content of
> the file in order to have it as a bytearray etc. etc.)

No, you don't -- pickle.dumps() returns the pickled
data as a bytes object instead of writing it to a file.

-- 
Greg

[toc] | [prev] | [next] | [standalone]

#70268

From	Peter Otten <__peter__@web.de>
Date	2014-04-15 11:59 +0200
Message-ID	<mailman.9278.1397555958.18130.python-list@python.org>
In reply to	#70238

Gregory Ewing wrote:

> Mok-Kong Shen wrote:
>> I have yet a question out of curiosity: Why is my 2nd list structure,
>> that apparently is too complex for handling by eval and json, seemingly
>> not a problem for pickle?
> 
> Pickle is intended for arbitrary data structures, so it
> is designed to be able to handle deeply-nested and/or
> recursive data. Eval only has to handle nesting to depths
> likely to be encountered in source code. Apparently the
> json parser also assumes you're not going to be using
> very deep nesting.

But pickle does have the same limitation:

>>> def check(load, dump):
...     items = []
...     try:
...             for i in range(10**6):
...                     assert load(dump(items)) == items
...                     items = [items]
...     except RuntimeError:
...             return i
... 
>>> check(json.loads, json.dumps)
994
>>> check(pickle.loads, pickle.dumps)
499

Mok-Kong Shen, for pickle and json you can increase the limit a bit: 

>>> import sys
>>> sys.setrecursionlimit(2000)
>>> check(json.loads, json.dumps)
1994
>>> check(pickle.loads, pickle.dumps)
999

But be careful, if you choose the limit too high you'll see Python react 
like any other C program:

>>> sys.setrecursionlimit(100000)
>>> items = []
>>> for i in range(100000):
...     items = [items]
... 
>>> s = pickle.dumps(items)
Segmentation fault

For literal_eval() the limit is unfortunately hard-coded in the C source.

Mok-Kong Shen wrote:

> What I need is to have my (complicated) list to be put into
> a bytearray, do some proceesing, transfer it to the recipient.
> The recipient reverses the processing, obtains a bytearray
> that is the same as my original one and gets from it the same
> list as mine. Using pickle I can manage to do it (in fact I
> did try and succeed with my 2nd list), but that's IMHO a
> rather unnatural/awkward work-around. (It means that I have
> to pickle out the list to a file and read in the content of
> the file in order to have it as a bytearray etc. etc.)
 
The limit has been increased before, see

http://bugs.python.org/issue1881

and maybe you can get the core developers to increase it again, but 
generally speaking the existence of a recursion limit is the price you pay 
for easy interfacing with C. 

literal_eval() could allocate its stack dynamically, but that would probably 
complicate the code so that return on investment is questionable.

[toc] | [prev] | [next] | [standalone]

#70216

From	Dave Angel <davea@davea.name>
Date	2014-04-14 08:42 -0400
Message-ID	<mailman.9247.1397479021.18130.python-list@python.org>
In reply to	#70202

Mok-Kong Shen <mok-kong.shen@t-online.de> Wrote in message:
> 
> The code attached below produces in one of the two IMHO similar cases
> (excepting the sizes of the lists involved) MemoryError. Could experts
> kindly tell why that's so and whether there is any work-around feasible.

Where's your stack trace for the error?  If it happens that it
 gets the error in the Ast call, then examine byarray.  I expect
 it's too large or too complex for the compiler. 




-- 
DaveA

[toc] | [prev] | [standalone]

csiph-web

MemoryError in data conversion

Contents

#70202 — MemoryError in data conversion

#70204

#70208

#70214

#70220

#70233

#70238

#70262

#70270

#70268

#70216