Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #70202 > unrolled thread
| Started by | Mok-Kong Shen <mok-kong.shen@t-online.de> |
|---|---|
| First post | 2014-04-14 03:46 +0200 |
| Last post | 2014-04-14 08:42 -0400 |
| Articles | 11 — 5 participants |
Back to article view | Back to comp.lang.python
MemoryError in data conversion Mok-Kong Shen <mok-kong.shen@t-online.de> - 2014-04-14 03:46 +0200
Re: MemoryError in data conversion dieter <dieter@handshake.de> - 2014-04-14 08:14 +0200
Re: MemoryError in data conversion Peter Otten <__peter__@web.de> - 2014-04-14 09:46 +0200
Re: MemoryError in data conversion Mok-Kong Shen <mok-kong.shen@t-online.de> - 2014-04-14 14:26 +0200
Re: MemoryError in data conversion Peter Otten <__peter__@web.de> - 2014-04-14 15:59 +0200
Re: MemoryError in data conversion Mok-Kong Shen <mok-kong.shen@t-online.de> - 2014-04-14 23:20 +0200
Re: MemoryError in data conversion Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-04-15 11:51 +1200
Re: MemoryError in data conversion Mok-Kong Shen <mok-kong.shen@t-online.de> - 2014-04-15 10:55 +0200
Re: MemoryError in data conversion Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-04-16 00:08 +1200
Re: MemoryError in data conversion Peter Otten <__peter__@web.de> - 2014-04-15 11:59 +0200
Re:MemoryError in data conversion Dave Angel <davea@davea.name> - 2014-04-14 08:42 -0400
| From | Mok-Kong Shen <mok-kong.shen@t-online.de> |
|---|---|
| Date | 2014-04-14 03:46 +0200 |
| Subject | MemoryError in data conversion |
| Message-ID | <lifelc$ccj$1@news.albasani.net> |
The code attached below produces in one of the two IMHO similar cases
(excepting the sizes of the lists involved) MemoryError. Could experts
kindly tell why that's so and whether there is any work-around feasible.
Thanks in advances.
M. K. Shen
-----------------------------------------------------------------
import ast
def buildhuffmantree(slist,flist):
item=slist[:]
freq=flist[:]
while len(item)>2:
mn=min(freq)
id=freq.index(mn)
u=item[id]
del item[id]
del freq[id]
mn1=min(freq)
id=freq.index(mn1)
v=item[id]
del item[id]
del freq[id]
item.append([u,v])
freq.append(mn+mn1)
return(item)
def processing(slist,flist):
bintree=buildhuffmantree(slist,flist)
print(bintree)
byarray=bytearray(str(bintree),"latin-1")
bintree1=ast.literal_eval(byarray.decode("latin-1"))
print(bintree1)
print(bintree==bintree1)
slist1=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 'eof']
flist1=[18, 16, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, -1]
slist2=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97,
98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110,
111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123,
124, 125, 126, 127, 'eof']
flist2=[2, 2, 0, 2, 0, 0, 1, 2, 1, 0, 2, 0, 0, 1, 1, 0, 2, 0, 0, 0, 1,
0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 2, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, -1]
processing(slist1,flist1) ### This works fine
print()
processing(slist2,flist2) ### This leads to MemoryError
[toc] | [next] | [standalone]
| From | dieter <dieter@handshake.de> |
|---|---|
| Date | 2014-04-14 08:14 +0200 |
| Message-ID | <mailman.9236.1397456067.18130.python-list@python.org> |
| In reply to | #70202 |
Mok-Kong Shen <mok-kong.shen@t-online.de> writes:
> The code attached below produces in one of the two IMHO similar cases
> (excepting the sizes of the lists involved) MemoryError. Could experts
> kindly tell why that's so and whether there is any work-around feasible.
"MemoryError" means: the Python process wants more memory from the
operating system than this can give.
Your options:
* increase the memory resources (RAM, swap space) of your system
* check the memory related configuration of your operating system
(there may be a limit for memory allocated to processes -
try to increase this)
* change your algorithm such that less memory is needed
[toc] | [prev] | [next] | [standalone]
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2014-04-14 09:46 +0200 |
| Message-ID | <mailman.9239.1397461622.18130.python-list@python.org> |
| In reply to | #70202 |
Mok-Kong Shen wrote:
> The code attached below produces in one of the two IMHO similar cases
> (excepting the sizes of the lists involved) MemoryError. Could experts
> kindly tell why that's so and whether there is any work-around feasible.
Here's a simpler way to reproduce the error:
>>> import ast
>>> def nested_list_literal(n):
... return "[" * n + "42" + "]" * n
...
>>> ast.literal_eval(nested_list_literal(98))
[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[42]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]
>>> ast.literal_eval(nested_list_literal(99))
s_push: parser stack overflow
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.3/ast.py", line 47, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
File "/usr/lib/python3.3/ast.py", line 35, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
MemoryError
You ran into a limitation of the compiler. For us to suggest a workaround
you'd have to explain why you want to convert the list returned from
buildhuffmantree() into python source code and back.
> Thanks in advances.
>
> M. K. Shen
>
> -----------------------------------------------------------------
>
> import ast
>
> def buildhuffmantree(slist,flist):
> item=slist[:]
> freq=flist[:]
> while len(item)>2:
> mn=min(freq)
> id=freq.index(mn)
> u=item[id]
> del item[id]
> del freq[id]
> mn1=min(freq)
> id=freq.index(mn1)
> v=item[id]
> del item[id]
> del freq[id]
> item.append([u,v])
> freq.append(mn+mn1)
> return(item)
>
> def processing(slist,flist):
> bintree=buildhuffmantree(slist,flist)
> print(bintree)
> byarray=bytearray(str(bintree),"latin-1")
> bintree1=ast.literal_eval(byarray.decode("latin-1"))
> print(bintree1)
> print(bintree==bintree1)
>
> slist1=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 'eof']
>
> flist1=[18, 16, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, -1]
>
> slist2=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
> 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
> 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
> 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
> 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
> 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97,
> 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110,
> 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123,
> 124, 125, 126, 127, 'eof']
>
> flist2=[2, 2, 0, 2, 0, 0, 1, 2, 1, 0, 2, 0, 0, 1, 1, 0, 2, 0, 0, 0, 1,
> 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1, 0, 0, 0,
> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 1, 0, 2, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, -1]
>
> processing(slist1,flist1) ### This works fine
> print()
>
> processing(slist2,flist2) ### This leads to MemoryError
[toc] | [prev] | [next] | [standalone]
| From | Mok-Kong Shen <mok-kong.shen@t-online.de> |
|---|---|
| Date | 2014-04-14 14:26 +0200 |
| Message-ID | <ligk5n$3jp$1@news.albasani.net> |
| In reply to | #70208 |
Am 14.04.2014 09:46, schrieb Peter Otten: > You ran into a limitation of the compiler. For us to suggest a workaround > you'd have to explain why you want to convert the list returned from > buildhuffmantree() into python source code and back. That list gives the Huffman encoding tree for compressing a given piece of source text. I am writing a Python code to implement an algorithm (not new, being first sketched in the literature since decades but yet having no publically available implementation as far as I am aware) of encryption processing that has Huffman data compression as its major constituent. Now, for good security against cryptanalysis, this list (which has to be included in the ciphertext for decryption by the recipient) has to be well scrambled in some way. I choose to use 8-bit bytes as units for the scrambling. Hence I convert the list to a bytearray for performing scrambling. On decryption I reverse the scrambling and get back the original bytearray and use ast to recover from it the list so as to be able to do the decompression. Hopefully this description is sufficiently clear. M. K. Shen
[toc] | [prev] | [next] | [standalone]
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2014-04-14 15:59 +0200 |
| Message-ID | <mailman.9250.1397484008.18130.python-list@python.org> |
| In reply to | #70214 |
Mok-Kong Shen wrote:
> Am 14.04.2014 09:46, schrieb Peter Otten:
>
>> You ran into a limitation of the compiler. For us to suggest a workaround
>> you'd have to explain why you want to convert the list returned from
>> buildhuffmantree() into python source code and back.
>
> That list gives the Huffman encoding tree for compressing a given piece
> of source text. I am writing a Python code to implement an algorithm
> (not new, being first sketched in the literature since decades but yet
> having no publically available implementation as far as I am aware) of
> encryption processing that has Huffman data compression as its major
> constituent. Now, for good security against cryptanalysis, this list
> (which has to be included in the ciphertext for decryption by the
> recipient) has to be well scrambled in some way. I choose to use 8-bit
> bytes as units for the scrambling. Hence I convert the list to a
> bytearray for performing scrambling. On decryption I reverse the
> scrambling and get back the original bytearray and use ast to recover
> from it the list so as to be able to do the decompression. Hopefully
> this description is sufficiently clear.
You could use json, but you may run into the same problem with that, too
(only later):
>>> import json
>>> items = []
>>> for i in range(1000):
... s = json.dumps(items)
... items = [items]
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/usr/lib/python3.3/json/__init__.py", line 236, in dumps
return _default_encoder.encode(obj)
File "/usr/lib/python3.3/json/encoder.py", line 191, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python3.3/json/encoder.py", line 249, in iterencode
return _iterencode(o, 0)
RuntimeError: maximum recursion depth exceeded while encoding a JSON object
>>> i
995
The safest option is probably to serialize the original flist and slist, and
use them to create the tree on the fly.
[toc] | [prev] | [next] | [standalone]
| From | Mok-Kong Shen <mok-kong.shen@t-online.de> |
|---|---|
| Date | 2014-04-14 23:20 +0200 |
| Message-ID | <lihjf0$4t6$1@news.albasani.net> |
| In reply to | #70220 |
Am 14.04.2014 15:59, schrieb Peter Otten: > You could use json, but you may run into the same problem with that, too > (only later): > >>>> import json >>>> items = [] >>>> for i in range(1000): > ... s = json.dumps(items) > ... items = [items] > ... > Traceback (most recent call last): > File "<stdin>", line 2, in <module> > File "/usr/lib/python3.3/json/__init__.py", line 236, in dumps > return _default_encoder.encode(obj) > File "/usr/lib/python3.3/json/encoder.py", line 191, in encode > chunks = self.iterencode(o, _one_shot=True) > File "/usr/lib/python3.3/json/encoder.py", line 249, in iterencode > return _iterencode(o, 0) > RuntimeError: maximum recursion depth exceeded while encoding a JSON object >>>> i > 995 > > The safest option is probably to serialize the original flist and slist, and > use them to create the tree on the fly. Thank you very much for your efforts to help me. I have yet a question out of curiosity: Why is my 2nd list structure, that apparently is too complex for handling by eval and json, seemingly not a problem for pickle? M. K. Shen
[toc] | [prev] | [next] | [standalone]
| From | Gregory Ewing <greg.ewing@canterbury.ac.nz> |
|---|---|
| Date | 2014-04-15 11:51 +1200 |
| Message-ID | <br3ajeFdnj3U1@mid.individual.net> |
| In reply to | #70233 |
Mok-Kong Shen wrote: > I have yet a question out of curiosity: Why is my 2nd list structure, > that apparently is too complex for handling by eval and json, seemingly > not a problem for pickle? Pickle is intended for arbitrary data structures, so it is designed to be able to handle deeply-nested and/or recursive data. Eval only has to handle nesting to depths likely to be encountered in source code. Apparently the json parser also assumes you're not going to be using very deep nesting. -- Greg
[toc] | [prev] | [next] | [standalone]
| From | Mok-Kong Shen <mok-kong.shen@t-online.de> |
|---|---|
| Date | 2014-04-15 10:55 +0200 |
| Message-ID | <liis6b$c1q$1@news.albasani.net> |
| In reply to | #70238 |
Am 15.04.2014 01:51, schrieb Gregory Ewing: > Mok-Kong Shen wrote: >> I have yet a question out of curiosity: Why is my 2nd list structure, >> that apparently is too complex for handling by eval and json, seemingly >> not a problem for pickle? > > Pickle is intended for arbitrary data structures, so it > is designed to be able to handle deeply-nested and/or > recursive data. Eval only has to handle nesting to depths > likely to be encountered in source code. Apparently the > json parser also assumes you're not going to be using > very deep nesting. What I need is to have my (complicated) list to be put into a bytearray, do some proceesing, transfer it to the recipient. The recipient reverses the processing, obtains a bytearray that is the same as my original one and gets from it the same list as mine. Using pickle I can manage to do it (in fact I did try and succeed with my 2nd list), but that's IMHO a rather unnatural/awkward work-around. (It means that I have to pickle out the list to a file and read in the content of the file in order to have it as a bytearray etc. etc.) M. K. Shen
[toc] | [prev] | [next] | [standalone]
| From | Gregory Ewing <greg.ewing@canterbury.ac.nz> |
|---|---|
| Date | 2014-04-16 00:08 +1200 |
| Message-ID | <br4lpqFm06tU1@mid.individual.net> |
| In reply to | #70262 |
Mok-Kong Shen wrote: > (It means that I have > to pickle out the list to a file and read in the content of > the file in order to have it as a bytearray etc. etc.) No, you don't -- pickle.dumps() returns the pickled data as a bytes object instead of writing it to a file. -- Greg
[toc] | [prev] | [next] | [standalone]
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2014-04-15 11:59 +0200 |
| Message-ID | <mailman.9278.1397555958.18130.python-list@python.org> |
| In reply to | #70238 |
Gregory Ewing wrote: > Mok-Kong Shen wrote: >> I have yet a question out of curiosity: Why is my 2nd list structure, >> that apparently is too complex for handling by eval and json, seemingly >> not a problem for pickle? > > Pickle is intended for arbitrary data structures, so it > is designed to be able to handle deeply-nested and/or > recursive data. Eval only has to handle nesting to depths > likely to be encountered in source code. Apparently the > json parser also assumes you're not going to be using > very deep nesting. But pickle does have the same limitation: >>> def check(load, dump): ... items = [] ... try: ... for i in range(10**6): ... assert load(dump(items)) == items ... items = [items] ... except RuntimeError: ... return i ... >>> check(json.loads, json.dumps) 994 >>> check(pickle.loads, pickle.dumps) 499 Mok-Kong Shen, for pickle and json you can increase the limit a bit: >>> import sys >>> sys.setrecursionlimit(2000) >>> check(json.loads, json.dumps) 1994 >>> check(pickle.loads, pickle.dumps) 999 But be careful, if you choose the limit too high you'll see Python react like any other C program: >>> sys.setrecursionlimit(100000) >>> items = [] >>> for i in range(100000): ... items = [items] ... >>> s = pickle.dumps(items) Segmentation fault For literal_eval() the limit is unfortunately hard-coded in the C source. Mok-Kong Shen wrote: > What I need is to have my (complicated) list to be put into > a bytearray, do some proceesing, transfer it to the recipient. > The recipient reverses the processing, obtains a bytearray > that is the same as my original one and gets from it the same > list as mine. Using pickle I can manage to do it (in fact I > did try and succeed with my 2nd list), but that's IMHO a > rather unnatural/awkward work-around. (It means that I have > to pickle out the list to a file and read in the content of > the file in order to have it as a bytearray etc. etc.) The limit has been increased before, see http://bugs.python.org/issue1881 and maybe you can get the core developers to increase it again, but generally speaking the existence of a recursion limit is the price you pay for easy interfacing with C. literal_eval() could allocate its stack dynamically, but that would probably complicate the code so that return on investment is questionable.
[toc] | [prev] | [next] | [standalone]
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2014-04-14 08:42 -0400 |
| Message-ID | <mailman.9247.1397479021.18130.python-list@python.org> |
| In reply to | #70202 |
Mok-Kong Shen <mok-kong.shen@t-online.de> Wrote in message: > > The code attached below produces in one of the two IMHO similar cases > (excepting the sizes of the lists involved) MemoryError. Could experts > kindly tell why that's so and whether there is any work-around feasible. Where's your stack trace for the error? If it happens that it gets the error in the Ast call, then examine byarray. I expect it's too large or too complex for the compiler. -- DaveA
[toc] | [prev] | [standalone]
Back to top | Article view | comp.lang.python
csiph-web