Re: Generator using item[n-1] + item[n] memory

From Ian Kelly <ian.g.kelly@gmail.com>
Date Fri, 14 Feb 2014 15:59:05 -0700
Subject Re: Generator using item[n-1] + item[n] memory
To Python <python-list@python.org>
Newsgroups comp.lang.python
Message-ID <mailman.6943.1392418798.18130.python-list@python.org>


On Fri, Feb 14, 2014 at 3:27 PM, Nick Timkovich <prometheus235@gmail.com> wrote:
> I have a Python 3.x program that processes several large text files that
> contain sizeable arrays of data that can occasionally brush up against the
> memory limit of my puny workstation.  From some basic memory profiling, it
> seems like when using the generator, the memory usage of my script balloons
> to hold consecutive elements, using up to twice the memory I expect.
>
> I made a simple, standalone example to test the generator and I get similar
> results in Python 2.7, 3.3, and 3.4.  My test code follows; `memory_usage()`
> is a modified version of [this function from an SO
> question](http://stackoverflow.com/a/898406/194586) which uses
> `/proc/self/status` and agrees with `top` as I watch it.  `resource` is
> probably a more cross-platform method:
>
> ###############
>
> import sys, resource, gc, time
>
> def biggen():
>     sizes = 1, 1, 10, 1, 1, 10, 10, 1, 1, 10, 10, 20, 1, 1, 20, 20, 1, 1
>     for size in sizes:
>         data = [1] * int(size * 1e6)
>         #time.sleep(1)
>         yield data
>
> def consumer():
>     for data in biggen():
>         rusage = resource.getrusage(resource.RUSAGE_SELF)
>         peak_mb = rusage.ru_maxrss/1024.0
>         print('Peak: {0:6.1f} MB, Data Len: {1:6.1f} M'.format(
>                 peak_mb, len(data)/1e6))
>         #print(memory_usage())
>
>         data = None  # go
>         del data     # away
>         gc.collect() # please.
>
> # def memory_usage():
> #     """Memory usage of the current process, requires /proc/self/status"""
> #     # http://stackoverflow.com/a/898406/194586
> #     result = {'peak': 0, 'rss': 0}
> #     for line in open('/proc/self/status'):
> #         parts = line.split()
> #         key = parts[0][2:-1].lower()
> #         if key in result:
> #             result[key] = int(parts[1])/1024.0
> #     return 'Peak: {peak:6.1f} MB, Current: {rss:6.1f} MB'.format(**result)
>
> print(sys.version)
> consumer()
>
> ###############
>
> In practice I'll process data coming from such a generator loop, saving just
> what I need, then discard it.
>
> When I run the above script, and two large elements come in series (the data
> size can be highly variable), it seems like Python computes the next before
> freeing the previous, leading to up to double the memory usage.
>
> [...]
>
> The crazy belt-and-suspenders-and-duct-tape approach `data = None`, `del
> data`, and `gc.collect()` does nothing.

Because at the time you call gc.collect(), the generator still holds a
reference to the data, so it can't be collected.  Assuming this is
running in CPython and there are no reference cycles in the data, the
collection is unnecessary anyway, since CPython will automatically
free the data immediately when there are no references (but this is
not guaranteed for other implementations of Python).
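
A minimal sketch of both points, assuming CPython. `Payload` is a
hypothetical stand-in for the large list (a plain class, so that it can
be weakly referenced), and the function names are made up for
illustration:

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a large data object."""


def freed_without_gc():
    data = Payload()
    probe = weakref.ref(data)  # a weak reference does not keep the object alive
    del data                   # drop the only strong reference
    # On CPython the object is already gone here; no gc.collect() was needed.
    return probe() is None


def kept_alive_by_generator():
    def gen():
        data = Payload()
        yield data             # generator suspends with `data` still bound

    g = gen()
    data = next(g)
    probe = weakref.ref(data)
    data = None                # the consumer lets go of its reference...
    gc.collect()               # ...and forces a collection for good measure
    # The suspended generator frame still references the object.
    return probe() is not None


print(freed_without_gc())         # True on CPython
print(kept_alive_by_generator())  # True
```

On other implementations the first function may print False, since
immediate reclamation is a CPython reference-counting detail.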

> I'm pretty sure the generator itself is not doubling up on memory, because
> otherwise a single large value it yields would increase the peak usage in
> the *same iteration* in which the large object appeared; the increase only
> shows up for large consecutive objects.

Look again.  What happens to the data inside the generator between two
iterations?

1) the generator's data variable still holds the data from the prior iteration
2) the generator's loop jumps back up to the top
3) the data for the next iteration is constructed
4) the data for the next iteration is assigned to the data variable

It is not until step 4 that the variable stops referencing the data
from the prior iteration.  So there is a brief period where both of
these objects must still be in memory.
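
The ordering can be made visible with a small trace, sketched here under
the assumption of CPython. `Chunk` and `trace_overlap` are hypothetical
names: each `Chunk` logs when it is built and, via `weakref.finalize`,
when it is freed.

```python
import weakref


class Chunk:
    """Hypothetical stand-in for one large list; logs its own lifetime."""

    def __init__(self, label, events):
        self.label = label
        events.append("built %d" % label)
        # The callback runs when this object is reclaimed.
        weakref.finalize(self, events.append, "freed %d" % label)


def trace_overlap(n=3):
    events = []

    def gen():
        for i in range(n):
            # The new Chunk is fully constructed before the old one is
            # unbound from `data`, so both exist for a moment.
            data = Chunk(i, events)
            yield data

    for chunk in gen():
        chunk = None  # the consumer drops its reference right away

    return events


print(trace_overlap())
# On CPython:
# ['built 0', 'built 1', 'freed 0', 'built 2', 'freed 1', 'freed 2']
```

Each "built" entry precedes the "freed" entry of the previous chunk,
which is the window in which two large objects coexist.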

> How can I save my memory?

Try unreferencing the data in the generator at the end of each iteration,
for example with `del data` right after the `yield`.
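
A minimal sketch of that suggestion, assuming CPython and assuming the
consumer also drops its own reference before the loop cycles (as the
original script does). `Chunk` and `trace_fixed` are hypothetical names
used only to log object lifetimes:

```python
import weakref


class Chunk:
    """Hypothetical stand-in for one large list; logs its own lifetime."""

    def __init__(self, label, events):
        self.label = label
        events.append("built %d" % label)
        # The callback runs when this object is reclaimed.
        weakref.finalize(self, events.append, "freed %d" % label)


def trace_fixed(n=3):
    events = []

    def gen():
        for i in range(n):
            data = Chunk(i, events)
            yield data
            # Executed when the consumer asks for the next item: release
            # the generator's reference before building the next Chunk.
            del data

    for chunk in gen():
        chunk = None  # consumer must let go too, or the object survives

    return events


print(trace_fixed())
# On CPython:
# ['built 0', 'freed 0', 'built 1', 'freed 1', 'built 2', 'freed 2']
```

With the `del data` in place, each chunk is freed before the next one is
built, so at most one large object is alive at a time.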
