Date: Mon, 24 Sep 2012 21:14:23 -0400
From: Dave Angel
To: MrsEntity
Cc: python-list@python.org
Subject: Re: Memory usage per top 10x usage per heapy
Newsgroups: comp.lang.python

On 09/24/2012 05:59 PM, MrsEntity wrote:
> Hi all,
>
> I'm working on some code that parses a 500kb, 2M line file

Just curious: which is it, two million lines or half a million bytes?

> line by line and saves, per line, some derived strings into various data structures. I thus expect that memory use should monotonically increase. Currently, the program is taking up so much memory - even on 1/2 sized files - that on a 2GB machine

Which machine is 2GB, the Windows machine or the VM? You could get thrashing at either level.

> I'm thrashing swap. What's strange is that heapy (http://guppy-pe.sourceforge.net/) is showing that the code uses about 10x less memory than reported by top, and the heapy data seems consistent with what I was expecting based on the objects the code stores. I tried using memory_profiler (http://pypi.python.org/pypi/memory_profiler) but it didn't really provide any illuminating information. The code does create and discard a number of objects per line of the file, but they should not be stored anywhere, and heapy seems to confirm that. So, my questions are:
>
> 1) For those of you kind enough to help me figure out what's going on, what additional data would you like? I didn't want to swamp everyone with the code and heapy/memory_profiler output but I can do so if it's valuable.
> 2) How can I diagnose (and hopefully fix) what's causing the massive memory usage when it appears, from heapy, that the code is performing reasonably?
>
> Specs: Ubuntu 12.04 in Virtualbox on Win7/64, Python 2.7/64
>
> Thanks very much.

Tim raised most of my concerns, but I would point out that just because you free up memory in Python doesn't mean it gets released back to the system. The C runtime manages its own heap, and is pretty persistent about hanging onto memory once obtained. It's not normally a problem, since most small blocks are reused. But the heap can get fragmented, and then the process footprint that top reports stays well above what the live objects need.

And I have no idea how well VirtualBox maps the Linux memory map into the Windows one.

-- 
DaveA
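
One way to see the gap between what the process holds and what the live Python objects account for is to sample both around an allocation. The following is only a rough sketch under stated assumptions: `rss_bytes`, `python_level_bytes` and `demo` are hypothetical helper names, the resident size is read from `/proc/self/statm` (Linux only, so the helper returns None elsewhere), and `sys.getsizeof` stands in for heapy as a crude Python-level count. Whether the resident size drops after the `del` depends on the allocator and the platform, so no particular numbers are implied.

```python
import os
import sys


def rss_bytes():
    """Current resident set size in bytes, or None where /proc is unavailable.

    Assumption: Linux-style /proc/self/statm, whose second field is the
    number of resident pages.
    """
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
    except (IOError, OSError):
        return None
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def python_level_bytes(objs):
    """Shallow sum of sys.getsizeof over objs (a crude stand-in for heapy)."""
    return sum(sys.getsizeof(o) for o in objs)


def demo(n=200000):
    """Allocate n small strings, drop them, and sample memory at each step."""
    before = rss_bytes()
    data = [str(i) * 10 for i in range(n)]
    # Bytes attributable to the strings plus the list that holds them.
    held = python_level_bytes(data) + sys.getsizeof(data)
    during = rss_bytes()
    del data  # the objects are freed, but the heap may keep the pages
    after = rss_bytes()
    return {"before": before, "held": held, "during": during, "after": after}


if __name__ == "__main__":
    for name, value in sorted(demo().items()):
        print("%-7s %s" % (name, value))
```

If `after` stays close to `during` while `held` is much smaller than either, that is consistent with the heap retaining freed blocks rather than with the program leaking objects.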