Re: Python garbage collector/memory manager behaving strangely

Started by	Dave Angel <d@davea.name>
First post	2012-09-16 22:12 -0400
Last post	2012-10-31 19:40 -0500
Articles	2 — 2 participants

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Re: Python garbage collector/memory manager behaving strangely Dave Angel <d@davea.name> - 2012-09-16 22:12 -0400
    Re: Python garbage collector/memory manager behaving strangely Robert Miles <robertmiles@teranews.com> - 2012-10-31 19:40 -0500

#29356 — Re: Python garbage collector/memory manager behaving strangely

From	Dave Angel <d@davea.name>
Date	2012-09-16 22:12 -0400
Subject	Re: Python garbage collector/memory manager behaving strangely
Message-ID	<mailman.817.1347847997.27098.python-list@python.org>

On 09/16/2012 09:07 PM, Jadhav, Alok wrote:
> Hi Everyone,
>
>  
>
> I have a simple program which reads a large file containing few million
> rows, parses each row (`numpy array`) and converts into an array of
> doubles (`python array`) and later writes into an `hdf5 file`. I repeat
> this loop for multiple days. After reading each file, i delete all the
> objects and call garbage collector.  When I run the program, First day
> is parsed without any error but on the second day i get `MemoryError`. I
> monitored the memory usage of my program, during first day of parsing,
> memory usage is around **1.5 GB**. When the first day parsing is
> finished, memory usage goes down to **50 MB**. Now when 2nd day starts
> and i try to read the lines from the file I get `MemoryError`. Following
> is the output of the program.
>
>     source file extracted at C:\rfadump\au\2012.08.07.txt
>     parsing started
>     current time: 2012-09-16 22:40:16.829000
>     500000 lines parsed
>     1000000 lines parsed
>     1500000 lines parsed
>     2000000 lines parsed
>     2500000 lines parsed
>     3000000 lines parsed
>     3500000 lines parsed
>     4000000 lines parsed
>     4500000 lines parsed
>     5000000 lines parsed
>     parsing done.
>     end time is 2012-09-16 23:34:19.931000
>     total time elapsed 0:54:03.102000
>     repacking file
>     done
>     > s:\users\aaj\projects\pythonhf\rfadumptohdf.py(132)generateFiles()
>     -> while single_date <= self.end_date:
>     (Pdb) c
>     *** 2012-08-08 ***
>     source file extracted at C:\rfadump\au\2012.08.08.txt
>     cought an exception while generating file for day 2012-08-08.
>     Traceback (most recent call last):
>       File "rfaDumpToHDF.py", line 175, in generateFile
>         lines = self.rawfile.read().split('|\n')
>     MemoryError
>
> I am very sure that windows system task manager shows the memory usage
> as **50 MB** for this process. It looks like the garbage collector or
> memory manager for Python is not calculating the free memory correctly.
> There should be lot of free memory but it thinks there is not enough. 
>
>  
>
> Any idea?
>
>  
>
> Thanks.
>
>  
>
>  
>
> Alok Jadhav
>
> CREDIT SUISSE AG
>
> GAT IT Hong Kong, KVAG 67
>
> International Commerce Centre | Hong Kong | Hong Kong
>
> Phone +852 2101 6274 | Mobile +852 9169 7172
>
> alok.jadhav@credit-suisse.com | www.credit-suisse.com
> <http://www.credit-suisse.com/> 
>
>  
>

Don't blame CPython.  You're trying to do a read() of a large file,
which will result in a single large string.  Then you split it into
lines.  Why not just read it in as lines, in which case the large string
isn't necessary?  Take a look at the readlines() method.  Chances are
that even that is unnecessary, but I can't tell without seeing more of
the code.

Instead of

    lines = self.rawfile.read().split('|\n')

try

    lines = self.rawfile.readlines()

When a single large item is being allocated, it's not enough to have
sufficient free space; the space also has to be contiguous.  After a
program runs for a while, its address space naturally gets more and
more fragmented.  That's the nature of the C runtime, and CPython is
stuck with it.
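
As a rough sketch of the "read it in as lines" idea: iterating over the
file object yields one line at a time, so neither the single large string
nor a full list of lines is ever built.  The helper name iter_records and
the sample data are made up for illustration, and the sketch assumes every
record ends in '|\n' with no bare newline inside a record (otherwise it
will not match what split('|\n') returned).

```python
import io


def iter_records(fileobj, terminator="|\n"):
    """Yield records one at a time instead of building one giant string.

    Hypothetical helper: assumes each record ends with `terminator` and
    contains no embedded newline.
    """
    for line in fileobj:
        if line.endswith(terminator):
            # Drop the '|\n' that the original code used as the separator.
            yield line[:-len(terminator)]
        else:
            # Last line without a full terminator: drop only the newline.
            yield line.rstrip("\n")


# Small in-memory stand-in for the real dump file.
sample = io.StringIO("a,1|\nb,2|\nc,3|\n")
for record in iter_records(sample):
    print(record)
```

With the real file, the loop body would parse each record and hand it to
the HDF5 writer, so only one line is alive at any moment.  Unlike
split('|\n'), this does not produce a trailing empty string at the end.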



-- 

DaveA

#32546

From	Robert Miles <robertmiles@teranews.com>
Date	2012-10-31 19:40 -0500
Message-ID	<uyjks.8230$pn7.2199@newsfe18.iad>
In reply to	#29356

On 9/16/2012 9:12 PM, Dave Angel wrote:
> On 09/16/2012 09:07 PM, Jadhav, Alok wrote:
>> Hi Everyone,
>>
>>
>>
>> I have a simple program which reads a large file containing few million
>> rows, parses each row (`numpy array`) and converts into an array of
>> doubles (`python array`) and later writes into an `hdf5 file`. I repeat
>> this loop for multiple days. After reading each file, i delete all the
>> objects and call garbage collector.  When I run the program, First day
>> is parsed without any error but on the second day i get `MemoryError`. I
>> monitored the memory usage of my program, during first day of parsing,
>> memory usage is around **1.5 GB**. When the first day parsing is
>> finished, memory usage goes down to **50 MB**. Now when 2nd day starts
>> and i try to read the lines from the file I get `MemoryError`. Following
>> is the output of the program.

Is it a 32-bit program?  If so, expect the maximum amount of memory it
can use to hold the program, its current dataspace, and images of all
the files it has open to be about 3.5 GB, even if it is running on a
64-bit computer with over 4 GB of memory.  A 32-bit address can
refer to only 4 GB of memory, and part of that 4 GB
must be used for whatever the operating system needs for running
32-bit programs.  With some of the older compilers, only 2 GB can be
used for the program; the other 2 GB is reserved for the operating system.
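
One quick way to answer the 32-bit question is to ask the interpreter
itself.  This is only a sketch; pointer_bits is a made-up helper name, and
it reports the build of the running Python, not of the operating system.

```python
import struct
import sys


def pointer_bits():
    """Return the pointer width of the running interpreter, in bits."""
    # struct.calcsize("P") is the size in bytes of a C void pointer.
    return struct.calcsize("P") * 8


print("This interpreter is %d-bit" % pointer_bits())

# Cross-check: sys.maxsize only exceeds 2**32 on a 64-bit build.
print("64-bit build:", sys.maxsize > 2**32)
```

If this reports 32, the address-space limits described above apply no
matter how much memory the machine has.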

How practical would it be to have that program run twice a day?
The first time, it should ignore all the data for the second half
of the day; the second time, it should ignore all the data for the
first half of the day.
