


RE: Python garbage collector/memory manager behaving strangely

Subject RE: Python garbage collector/memory manager behaving strangely
Date Mon, 17 Sep 2012 10:49:19 +0800
From "Jadhav, Alok" <alok.jadhav@credit-suisse.com>
To <d@davea.name>
Cc python-list@python.org
Newsgroups comp.lang.python
Message-ID <mailman.819.1347850176.27098.python-list@python.org>



I am thinking of calling a new subprocess which will do the
memory-hungry job and then release the memory, as specified in the link below:

http://stackoverflow.com/questions/1316767/how-can-i-explicitly-free-memory-in-python/1316799#1316799
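
A minimal sketch of that idea, assuming Python 3.7+ and a child program that does the whole memory-hungry parse and hands back only a small result. The names `CHILD_CODE` and `parse_day_in_child` are hypothetical, and in the real job the child would presumably write the HDF5 file itself rather than print a count:

```python
import subprocess
import sys

# Hypothetical child program: it performs the large read/split and prints
# only a small summary, so the big objects never exist in the parent.
CHILD_CODE = r"""
import sys
with open(sys.argv[1]) as f:
    rows = f.read().split('|\n')
print(len(rows))
"""


def parse_day_in_child(path):
    """Run the parse in a fresh interpreter and return its small result.

    When the child exits, the operating system reclaims all of its memory,
    so the next day starts from a clean address space.
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, path],
        capture_output=True,  # collect the child's stdout/stderr
        text=True,            # decode output as str
        check=True,           # raise CalledProcessError if the child fails
    )
    return int(result.stdout)
```

The parent loop would then call `parse_day_in_child()` once per day's file; only the returned integer stays in the long-running process.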

Regards,
Alok



-----Original Message-----
From: Dave Angel [mailto:d@davea.name] 
Sent: Monday, September 17, 2012 10:13 AM
To: Jadhav, Alok
Cc: python-list@python.org
Subject: Re: Python garbage collector/memory manager behaving strangely

On 09/16/2012 09:07 PM, Jadhav, Alok wrote:
> Hi Everyone,
>
> I have a simple program which reads a large file containing a few million
> rows, parses each row (`numpy array`), converts it into an array of
> doubles (`python array`), and later writes it into an `hdf5 file`. I repeat
> this loop for multiple days. After reading each file, I delete all the
> objects and call the garbage collector. When I run the program, the first
> day is parsed without any error, but on the second day I get a
> `MemoryError`. I monitored the memory usage of my program: during the
> first day of parsing, memory usage is around **1.5 GB**. When the first
> day's parsing is finished, memory usage goes down to **50 MB**. Now when
> the 2nd day starts and I try to read the lines from the file, I get a
> `MemoryError`. Following is the output of the program.
>
>     source file extracted at C:\rfadump\au\2012.08.07.txt
>     parsing started
>     current time: 2012-09-16 22:40:16.829000
>     500000 lines parsed
>     1000000 lines parsed
>     1500000 lines parsed
>     2000000 lines parsed
>     2500000 lines parsed
>     3000000 lines parsed
>     3500000 lines parsed
>     4000000 lines parsed
>     4500000 lines parsed
>     5000000 lines parsed
>     parsing done.
>     end time is 2012-09-16 23:34:19.931000
>     total time elapsed 0:54:03.102000
>     repacking file
>     done
>     > s:\users\aaj\projects\pythonhf\rfadumptohdf.py(132)generateFiles()
>     -> while single_date <= self.end_date:
>     (Pdb) c
>     *** 2012-08-08 ***
>     source file extracted at C:\rfadump\au\2012.08.08.txt
>     cought an exception while generating file for day 2012-08-08.
>     Traceback (most recent call last):
>       File "rfaDumpToHDF.py", line 175, in generateFile
>         lines = self.rawfile.read().split('|\n')
>     MemoryError
>
> I am very sure that the Windows Task Manager shows the memory usage
> as **50 MB** for this process. It looks like the garbage collector or
> memory manager for Python is not calculating the free memory correctly.
> There should be a lot of free memory, but it thinks there is not enough.
>
> Any idea?
>
> Thanks.
>
>
> Alok Jadhav
>
> CREDIT SUISSE AG
>
> GAT IT Hong Kong, KVAG 67
>
> International Commerce Centre | Hong Kong | Hong Kong
>
> Phone +852 2101 6274 | Mobile +852 9169 7172
>
> alok.jadhav@credit-suisse.com | www.credit-suisse.com
> <http://www.credit-suisse.com/> 
>
>  
>

Don't blame CPython.  You're trying to do a read() of a large file,
which will result in a single large string.  Then you split it into
lines.  Why not just read it in as lines, in which case the large string
isn't necessary?  Take a look at the readlines() method.  Chances are
that even that is unnecessary, but I can't tell without seeing more of
the code.

Instead of:

    lines = self.rawfile.read().split('|\n')

try:

    lines = self.rawfile.readlines()
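
One caveat is that readlines() still builds a list of every line, and each line keeps its trailing '|' and newline, unlike the pieces produced by split('|\n'). As a rough sketch of how the list could be avoided altogether, assuming each record sits on one physical line terminated by '|' and a newline, the file object can be iterated directly. The helper name `iter_records` is hypothetical:

```python
def iter_records(rawfile):
    """Yield one record at a time from an open text file.

    Iterating over the file object reads lines lazily, so neither one
    huge string nor a list of all lines is ever created.
    """
    for line in rawfile:
        line = line.rstrip('\n')   # drop the newline
        if line.endswith('|'):     # drop exactly one trailing delimiter,
            line = line[:-1]       # matching what split('|\n') removed
        yield line
```

Each record could then be parsed and appended to the output as it arrives, for example `for record in iter_records(self.rawfile): ...`.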

When a single large item is being allocated, it's not enough to have
sufficient free space; the space also has to be contiguous.  After a
program runs for a while, its space naturally gets more and more
fragmented.  It's the nature of the C runtime, and CPython is stuck with it.
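
If records may contain embedded newlines, so that the '|\n' delimiter really matters, one way to keep every allocation small is to read fixed-size chunks and split them incrementally. This is only a sketch under that assumption; `iter_delimited` and its `chunk_size` default are hypothetical names and values, not part of the original program:

```python
import io


def iter_delimited(fileobj, delimiter='|\n', chunk_size=1 << 20):
    """Yield delimiter-separated records from a text file object.

    Only about chunk_size characters plus one partial record are held
    at once, so no single large contiguous block is requested.
    """
    pending = ''
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last delimiter is complete; the final
        # piece may be a partial record (or a split delimiter), so keep it.
        *records, pending = pending.split(delimiter)
        yield from records
    if pending:
        yield pending  # last record when the file lacks a final delimiter


if __name__ == "__main__":
    sample = io.StringIO("a|\nb\nc|\nd")
    print(list(iter_delimited(sample, chunk_size=2)))
```

Unlike read().split('|\n'), this does not yield a trailing empty string when the file ends with the delimiter.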



-- 

DaveA




