| Path | csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!news.mixmin.net!feeds.phibee-telecom.net!newsfeed.xs4all.nl!newsfeed5.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail |
|---|---|
| Return-Path | <junkshops@gmail.com> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| Date | Mon, 24 Sep 2012 21:21:05 -0700 |
| From | Junkshops <junkshops@gmail.com> |
| User-Agent | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0) Gecko/20111105 Thunderbird/8.0 |
| MIME-Version | 1.0 |
| To | d@davea.name |
| Subject | Re: Memory usage per top 10x usage per heapy |
| References | <983c532f-3ff6-4bd2-bb48-07cf4d065a4b@googlegroups.com> <5061056F.6080702@davea.name> |
| In-Reply-To | <5061056F.6080702@davea.name> |
| Content-Type | text/plain; charset=ISO-8859-1; format=flowed |
| Content-Transfer-Encoding | 7bit |
| Cc | python-list@python.org |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.1267.1348546870.27098.python-list@python.org> (permalink) |
| Lines | 58 |
| NNTP-Posting-Host | 2001:888:2000:d::a6 |
| X-Trace | 1348546870 news.xs4all.nl 6935 [2001:888:2000:d::a6]:45464 |
| X-Complaints-To | abuse@xs4all.nl |
| Xref | csiph.com comp.lang.python:30005 |
> Just curious; which is it, two million lines, or half a million bytes?

I have, in fact, this very afternoon, invented a means of writing a carriage return character using only 2 bits of information. I am prepared to sell licenses to this revolutionary technology for the low price of $29.95 plus tax.

Sorry, that should've been a 500MB, 2M line file.

> which machine is 2gb, the Windows machine, or the VM?

VM. Winders is 4gb.

> ...but I would point out that just because
> you free up the memory from the Python doesn't mean it gets released
> back to the system. The C runtime manages its own heap, and is pretty
> persistent about hanging onto memory once obtained. It's not normally a
> problem, since most small blocks are reused. But it can get
> fragmented. And I have no idea how well Virtual Box maps the Linux
> memory map into the Windows one.

Right, I understand that - but what's confusing me is that, given the memory use is (I assume) monotonically increasing, the code should never use more than what's reported by heapy once all the data is loaded into memory, given that memory released by the code to the Python runtime is reused. To the best of my ability to tell I'm not storing anything I shouldn't, so the only thing I can think of is that all the object creation and destruction is, for some reason, preventing reuse of memory. I'm at a bit of a loss regarding what to try next.

Cheers,
MrsE

On 9/24/2012 6:14 PM, Dave Angel wrote:
> On 09/24/2012 05:59 PM, MrsEntity wrote:
>> Hi all,
>>
>> I'm working on some code that parses a 500kb, 2M line file
> Just curious; which is it, two million lines, or half a million bytes?
>
>> line by line and saves, per line, some derived strings into various data structures. I thus expect that memory use should monotonically increase. Currently, the program is taking up so much memory - even on 1/2 sized files - that on a 2GB machine
> which machine is 2gb, the Windows machine, or the VM? You could get
> thrashing at either level.
>
>> I'm thrashing swap. What's strange is that heapy (http://guppy-pe.sourceforge.net/) is showing that the code uses about 10x less memory than reported by top, and the heapy data seems consistent with what I was expecting based on the objects the code stores. I tried using memory_profiler (http://pypi.python.org/pypi/memory_profiler) but it didn't really provide any illuminating information. The code does create and discard a number of objects per line of the file, but they should not be stored anywhere, and heapy seems to confirm that. So, my questions are:
>>
>> 1) For those of you kind enough to help me figure out what's going on, what additional data would you like? I didn't want to swamp everyone with the code and heapy/memory_profiler output but I can do so if it's valuable.
>> 2) How can I diagnose (and hopefully fix) what's causing the massive memory usage when it appears, from heapy, that the code is performing reasonably?
>>
>> Specs: Ubuntu 12.04 in Virtualbox on Win7/64, Python 2.7/64
>>
>> Thanks very much.
> Tim raised most of my concerns, but I would point out that just because
> you free up the memory from the Python doesn't mean it gets released
> back to the system. The C runtime manages its own heap, and is pretty
> persistent about hanging onto memory once obtained. It's not normally a
> problem, since most small blocks are reused. But it can get
> fragmented. And I have no idea how well Virtual Box maps the Linux
> memory map into the Windows one.
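
One way to put numbers on the gap discussed above (what the Python-level profiler sees versus what the OS reports) is to record both in the same run. The following is only a rough sketch under several assumptions: it uses the Python 3 standard-library `tracemalloc` module as a stand-in for heapy (the thread itself is about Python 2.7), it relies on the Unix-only `resource` module for the peak resident set size, and `parse_lines`, `measure`, and the synthetic input lines are hypothetical placeholders, not the original poster's code.

```python
import resource
import sys
import tracemalloc


def peak_rss_bytes():
    """Peak resident set size of this process, as the OS accounts for it."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


def parse_lines(lines):
    """Hypothetical stand-in for the parser: store one derived string per
    line, creating and discarding a few temporary objects along the way."""
    kept = []
    for line in lines:
        fields = line.split(",")               # temporary, discarded
        scratch = [f.upper() for f in fields]  # temporary, discarded
        kept.append(scratch[0])                # the only thing stored
    return kept


def measure(n_lines=200_000):
    """Parse synthetic lines and report Python-level and OS-level memory."""
    lines = ("key%d,value%d,extra%d" % (i, i, i) for i in range(n_lines))
    tracemalloc.start()
    kept = parse_lines(lines)
    live, peak_traced = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "kept": len(kept),
        "python_live_bytes": live,         # memory still referenced by Python objects
        "python_peak_bytes": peak_traced,  # high-water mark of traced allocations
        "os_peak_rss_bytes": peak_rss_bytes(),
    }


if __name__ == "__main__":
    for name, value in measure().items():
        print(name, value)
```

The OS figure always includes the interpreter and loaded modules, so it will be larger than the traced figures even for a trivial program. What is worth watching is whether the OS figure keeps growing across input sizes while the Python-level "live" figure stays in line with the data actually stored.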
Memory usage per top 10x usage per heapy MrsEntity <junkshops@gmail.com> - 2012-09-24 14:59 -0700
Re: Memory usage per top 10x usage per heapy Tim Chase <python.list@tim.thechases.com> - 2012-09-24 18:22 -0500
Re: Memory usage per top 10x usage per heapy Junkshops <junkshops@gmail.com> - 2012-09-24 16:58 -0700
Re: Memory usage per top 10x usage per heapy bryanjugglercryptographer@yahoo.com - 2012-09-27 01:00 -0700
Re: Memory usage per top 10x usage per heapy Dave Angel <d@davea.name> - 2012-09-24 21:14 -0400
Re: Memory usage per top 10x usage per heapy Junkshops <junkshops@gmail.com> - 2012-09-24 21:21 -0700
Re: Memory usage per top 10x usage per heapy Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-09-25 00:41 -0400
Re: Memory usage per top 10x usage per heapy Tim Chase <python.list@tim.thechases.com> - 2012-09-25 05:51 -0500
Re: Memory usage per top 10x usage per heapy Dave Angel <d@davea.name> - 2012-09-25 07:06 -0400
Re: Memory usage per top 10x usage per heapy Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-25 12:10 +0100
Re: gracious responses (was: Memory usage per top 10x usage per heapy) Tim Chase <python.list@tim.thechases.com> - 2012-09-25 06:40 -0500
Re: gracious responses (was: Memory usage per top 10x usage per heapy) alex23 <wuwei23@gmail.com> - 2012-09-25 05:44 -0700
Re: gracious responses Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-25 13:53 +0100
Re: gracious responses Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-25 12:54 +0100
Re: gracious responses Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-25 15:17 +0000
Re: Memory usage per top 10x usage per heapy Dave Angel <d@davea.name> - 2012-09-25 14:50 -0400
Re: Memory usage per top 10x usage per heapy Junkshops <junkshops@gmail.com> - 2012-09-25 14:02 -0700
Re: Memory usage per top 10x usage per heapy Junkshops <junkshops@gmail.com> - 2012-09-25 14:35 -0700
Re: Memory usage per top 10x usage per heapy Tim Chase <python.list@tim.thechases.com> - 2012-09-25 17:10 -0500
Re: Memory usage per top 10x usage per heapy Ian Kelly <ian.g.kelly@gmail.com> - 2012-09-25 16:09 -0600
Re: Memory usage per top 10x usage per heapy Tim Chase <python.list@tim.thechases.com> - 2012-09-25 18:35 -0500