| Started by | Dave Angel <d@davea.name> |
|---|---|
| First post | 2012-09-16 22:12 -0400 |
| Last post | 2012-10-31 19:40 -0500 |
| Articles | 2 — 2 participants |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Python garbage collector/memory manager behaving strangely Dave Angel <d@davea.name> - 2012-09-16 22:12 -0400
Re: Python garbage collector/memory manager behaving strangely Robert Miles <robertmiles@teranews.com> - 2012-10-31 19:40 -0500
| From | Dave Angel <d@davea.name> |
|---|---|
| Date | 2012-09-16 22:12 -0400 |
| Subject | Re: Python garbage collector/memory manager behaving strangely |
| Message-ID | <mailman.817.1347847997.27098.python-list@python.org> |
On 09/16/2012 09:07 PM, Jadhav, Alok wrote:
> Hi Everyone,
>
>
>
> I have a simple program which reads a large file containing few million
> rows, parses each row (`numpy array`) and converts into an array of
> doubles (`python array`) and later writes into an `hdf5 file`. I repeat
> this loop for multiple days. After reading each file, i delete all the
> objects and call garbage collector. When I run the program, First day
> is parsed without any error but on the second day i get `MemoryError`. I
> monitored the memory usage of my program, during first day of parsing,
> memory usage is around **1.5 GB**. When the first day parsing is
> finished, memory usage goes down to **50 MB**. Now when 2nd day starts
> and i try to read the lines from the file I get `MemoryError`. Following
> is the output of the program.
>
> source file extracted at C:\rfadump\au\2012.08.07.txt
> parsing started
> current time: 2012-09-16 22:40:16.829000
> 500000 lines parsed
> 1000000 lines parsed
> 1500000 lines parsed
> 2000000 lines parsed
> 2500000 lines parsed
> 3000000 lines parsed
> 3500000 lines parsed
> 4000000 lines parsed
> 4500000 lines parsed
> 5000000 lines parsed
> parsing done.
> end time is 2012-09-16 23:34:19.931000
> total time elapsed 0:54:03.102000
> repacking file
> done
> > s:\users\aaj\projects\pythonhf\rfadumptohdf.py(132)generateFiles()
> -> while single_date <= self.end_date:
> (Pdb) c
> *** 2012-08-08 ***
> source file extracted at C:\rfadump\au\2012.08.08.txt
> cought an exception while generating file for day 2012-08-08.
> Traceback (most recent call last):
> File "rfaDumpToHDF.py", line 175, in generateFile
> lines = self.rawfile.read().split('|\n')
> MemoryError
>
> I am very sure that windows system task manager shows the memory usage
> as **50 MB** for this process. It looks like the garbage collector or
> memory manager for Python is not calculating the free memory correctly.
> There should be lot of free memory but it thinks there is not enough.
>
>
>
> Any idea?
>
>
>
> Thanks.
>
>
>
>
>
> Alok Jadhav
>
> CREDIT SUISSE AG
>
> GAT IT Hong Kong, KVAG 67
>
> International Commerce Centre | Hong Kong | Hong Kong
>
> Phone +852 2101 6274 | Mobile +852 9169 7172
>
> alok.jadhav@credit-suisse.com | www.credit-suisse.com
> <http://www.credit-suisse.com/>
>
>
>
Don't blame CPython. You're trying to do a read() of a large file,
which will result in a single large string. Then you split it into
lines. Why not just read it in as lines, in which case the large string
isn't necessary? Take a look at the readlines() function. Chances are
that even that is unnecessary, but I can't tell without seeing more of
the code.
Instead of:
    lines = self.rawfile.read().split('|\n')
try:
    lines = self.rawfile.readlines()
When a single large item is being allocated, it's not enough to have
sufficient free space; the space also has to be contiguous. After a
program runs for a while, its space naturally gets more and more
fragmented. It's the nature of the C runtime, and CPython is stuck with it.
--
DaveA
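
A minimal sketch of the line-by-line approach suggested above, written for Python 3. It assumes every record ends with `|` followed by a newline and that no record contains an embedded newline; the original code is not shown in the thread, so `iter_records` and the sample data are hypothetical names used only for illustration. Iterating over the file object reads one line at a time, so neither the single large string from read() nor the full list from readlines() is ever built.

```python
import io


def iter_records(fileobj, terminator="|\n"):
    """Yield one record per line without loading the whole file.

    Hypothetical helper: assumes each record ends with `terminator`
    and that records never contain an embedded newline.
    """
    for line in fileobj:
        if line.endswith(terminator):
            # Drop the trailing "|\n" that split('|\n') used to remove.
            line = line[:-len(terminator)]
        else:
            # Last line of a file with no final terminator.
            line = line.rstrip("\n")
        yield line


# Stand-in for the real data file (illustrative data only).
sample = io.StringIO("a,1|\nb,2|\nc,3|\n")
records = list(iter_records(sample))
```

Unlike `read().split('|\n')`, this does not produce a trailing empty string when the file ends with the terminator. In real use the records would be parsed inside the loop rather than collected into a list.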
| From | Robert Miles <robertmiles@teranews.com> |
|---|---|
| Date | 2012-10-31 19:40 -0500 |
| Message-ID | <uyjks.8230$pn7.2199@newsfe18.iad> |
| In reply to | #29356 |
On 9/16/2012 9:12 PM, Dave Angel wrote:
> On 09/16/2012 09:07 PM, Jadhav, Alok wrote:
>> Hi Everyone,
>>
>> I have a simple program which reads a large file containing few million
>> rows, parses each row (`numpy array`) and converts into an array of
>> doubles (`python array`) and later writes into an `hdf5 file`. I repeat
>> this loop for multiple days. After reading each file, i delete all the
>> objects and call garbage collector. When I run the program, First day
>> is parsed without any error but on the second day i get `MemoryError`. I
>> monitored the memory usage of my program, during first day of parsing,
>> memory usage is around **1.5 GB**. When the first day parsing is
>> finished, memory usage goes down to **50 MB**. Now when 2nd day starts
>> and i try to read the lines from the file I get `MemoryError`. Following
>> is the output of the program.

Is it a 32-bit program? If so, expect the maximum amount of memory it
can use to hold the program, its current dataspace, and images of all
the files it has open to be about 3.5 GB, even if it is running on a
64-bit computer with over 4 GB of memory.

It seems that 32-bit addresses can only refer to 4 GB of memory, and
part of that 4 GB must be used for whatever the operating system needs
for running 32-bit programs. With some of the older compilers, only
2 GB can be used for the program; the other 2 GB is reserved for the
operating system.

How practical would it be to have that program run twice a day? The
first time, it should ignore all the data for the second half of the
day; the second time, it should ignore all the data for the first half
of the day.
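
One way to answer the 32-bit question from inside the interpreter is sketched below. It uses only the standard library; the function names `pointer_bits` and `is_32bit_python` are illustrative, not part of any API. This reports the build of the Python interpreter itself, which can be 32-bit even on a 64-bit operating system.

```python
import struct
import sys


def pointer_bits():
    """Return the pointer width of the running interpreter in bits."""
    # "P" is the struct format code for a native pointer (void *).
    return struct.calcsize("P") * 8


def is_32bit_python():
    """True if this interpreter is a 32-bit build."""
    # sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
    return sys.maxsize <= 2**32


print("pointer width:", pointer_bits(), "bits")
print("32-bit interpreter:", is_32bit_python())
```

If this reports a 32-bit build, the address-space limits described above apply regardless of how much physical memory the machine has.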