

Newsgroup: comp.lang.python (thread #39104)

memory management

Started by: Sudheer Joseph <sjo.india@gmail.com>
First post: 2013-02-18 07:29 -0800
Last post: 2013-02-18 17:13 -0800
Articles: 4 (2 participants)



Contents

  memory management Sudheer Joseph <sjo.india@gmail.com> - 2013-02-18 07:29 -0800
    Re: memory management Dave Angel <davea@davea.name> - 2013-02-18 19:10 -0500
      Re: memory management Sudheer Joseph <sjo.india@gmail.com> - 2013-02-18 17:13 -0800

#39104: memory management

From: Sudheer Joseph <sjo.india@gmail.com>
Date: 2013-02-18 07:29 -0800
Subject: memory management
Message-ID: <acd5afba-fa4f-4736-b941-778c646e20bf@googlegroups.com>
Hi,
        I have been trying to compute the cross-correlation between a time series at one location, f(1), and the time series of a spatial field, f(XYT), and to save the resulting correlation coefficients and lags in a fairly large 3-D array. The code I wrote for this works for a few iterations and then hangs, apparently because it runs out of memory. Can anybody suggest a better way to handle this, so that the computation and the storing of the data can run without hanging? Finally I intend to save the results as a netCDF file, which is not implemented yet. Below is the code I wrote for this purpose.

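from __future__ import print_function   # print() behaves the same in Python 2.7 and 3
from mpl_toolkits.basemap import Basemap as bm, shiftgrid, cm
import numpy as np
import matplotlib.pyplot as plt
from netCDF4 import Dataset
from math import pow, sqrt
import sys
from scipy.stats import t

indep = 120     # depth level (units of DEPTH1_29) for the reference series
nlags = 365     # maximum lag, in time steps

# Read coordinates and fields from the netCDF file.
ncin = Dataset('qu_ru.nc', 'r')
lons = ncin.variables['LON421_600'][:]
lats = ncin.variables['LAT81_220'][:]
dep = ncin.variables['DEPTH1_29'][:]
adep = (dep == indep).nonzero()
didx = int(adep[0][0])                    # index of the matching depth level
qu = ncin.variables['qu'][:, :, :]        # whole (time, lat, lon) field in memory
#qv = ncin.variables['QV'][0, :, :]
ru = ncin.variables['ru'][:, didx, 0, 0]  # reference series at a single point
ncin.close()

fig = plt.figure()
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
# Cylindrical equidistant map covering 30E-120E, 40S-30N.
m = bm(projection='cyl', llcrnrlon=30, llcrnrlat=-40, urcrnrlon=120, urcrnrlat=30)
# Grid size at 1-degree spacing (map units of the 'cyl' projection are degrees).
nx = int(m.xmax - m.xmin) + 1
ny = int(m.ymax - m.ymin) + 1

# Normalised reference series.
q = ru[1:2190]
qmean = np.mean(q)
qstd = np.std(q)
qnorm = (q - qmean) / qstd

# One value per lag (-nlags..nlags) at every grid point. The coefficient
# array must be floating point: an integer array (such as np.arange() gives)
# silently truncates every coefficient towards 0.
nlag_total = 2 * nlags + 1                # 731
lags3d = np.zeros((nlag_total, len(lats), len(lons)), dtype=int)
r3d = np.zeros((nlag_total, len(lats), len(lons)), dtype=float)

for i in range(len(lons)):
    for j in range(len(lats)):
        print(i, j)
        p = qu[1:2190, j, i].squeeze()
        pmean = np.mean(p)
        pstd = np.std(p)
        pnorm = (p - pmean) / pstd
        n = len(p)
        # fg = plt.figure()
        # Each of these calls also draws on the current axes.
        c = plt.xcorr(p, q, usevlines=True, maxlags=nlags, normed=True, lw=2)
        acp = plt.acorr(p, usevlines=True, maxlags=nlags, normed=True, lw=2)
        acq = plt.acorr(q, usevlines=True, maxlags=nlags, normed=True, lw=2)
        acp[1][nlags] = 0   # zero the lag-0 autocorrelation
        acq[1][nlags] = 0
        lags = c[0]
        r = c[1]
        lags3d[:, j, i] = lags
        r3d[:, j, i] = r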
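
plt.xcorr and plt.acorr are plotting calls: besides computing the correlation, each one draws vertical lines on the current axes, so a loop over 140 x 180 grid points keeps adding artists to the same figure. That may be what fills memory, although the thread does not confirm it. A minimal sketch of the same computation without any plotting, assuming matplotlib's defaults for xcorr (no detrending, normed=True), is a small NumPy function; xcorr_normed is a hypothetical helper name, not part of any library:

```python
import numpy as np


def xcorr_normed(x, y, maxlags):
    """Normalised cross-correlation of x and y at lags -maxlags..maxlags.

    Follows the computation matplotlib's xcorr(..., normed=True) performs
    with its default detrend (no mean removal), but draws nothing.
    Returns (lags, coefficients), each of length 2 * maxlags + 1.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    if not 1 <= maxlags < n:
        raise ValueError("maxlags must satisfy 1 <= maxlags < len(x)")
    # 'full' mode returns all 2n - 1 lags, ordered from -(n - 1) to n - 1.
    c = np.correlate(x, y, mode='full')
    c /= np.sqrt(np.dot(x, x) * np.dot(y, y))
    lags = np.arange(-maxlags, maxlags + 1)
    # Keep only the central 2 * maxlags + 1 values (lag 0 sits at index n - 1).
    return lags, c[n - 1 - maxlags:n + maxlags]
```

In the loop it could stand in for the plt.xcorr call, e.g. lags, r = xcorr_normed(p, q, nlags). The autocorrelation of q does not depend on i or j, so it only needs computing once, before the loop. If p or q are masked arrays with missing values, np.asarray drops the mask and uses the raw stored values, so such points would need handling first.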



#39143

From: Dave Angel <davea@davea.name>
Date: 2013-02-18 19:10 -0500
Message-ID: <mailman.1983.1361232670.2939.python-list@python.org>
In reply to: #39104
On 02/18/2013 10:29 AM, Sudheer Joseph wrote:
> Hi,
>          I have been trying to compute the cross-correlation between a time series at one location, f(1), and the time series of a spatial field, f(XYT), and to save the resulting correlation coefficients and lags in a fairly large 3-D array. The code I wrote for this works for a few iterations and then hangs, apparently because it runs out of memory. Can anybody suggest a better way to handle this, so that the computation and the storing of the data can run without hanging? Finally I intend to save the results as a netCDF file, which is not implemented yet. Below is the code I wrote for this purpose.
>

Python version and OS please.  And is the Python 32bit or 64bit?  How 
much RAM does the computer have, and how big are the swapfiles ?

"Fairly big" is fairly vague.  To some people, a list with 100k members 
is huge, but not to a modern computer.

How have you checked whether it's running out of memory?  Have you run 
'top' on it?  Or is that just a guess?
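
Besides watching top from outside, the script can report its own peak memory use from inside the loop. A minimal sketch using the standard-library resource module (Unix only); peak_rss_mib is a hypothetical helper name, and it assumes Linux, where ru_maxrss is reported in kilobytes:

```python
import resource


def peak_rss_mib():
    """Peak resident set size of the current process so far, in MiB.

    Assumes Linux, where ru_maxrss is in kilobytes (on macOS it is in bytes).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


# In the grid loop this could be logged next to the indices, e.g.
#     print(i, j, "%.0f MiB" % peak_rss_mib())
print("peak RSS so far: %.1f MiB" % peak_rss_mib())
```

If the printed value climbs steadily with every (i, j) step, something is being retained per iteration, rather than the input arrays simply being large.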

I haven't used numpy, scipy, nor matplotlib, and it's been a long time 
since I did correlations.  But are you sure you're not just implementing 
an O(n**3) algorithm or something, and it's just extremely slow?
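
Whether the script is merely slow, rather than stuck, can be checked by timing one correlation and extrapolating to the whole grid. A rough sketch, assuming synthetic random series of the length Sudheer reports (2091 time steps), a 140 x 180 grid and three correlation calls per grid point as in the posted loop; estimate_total_seconds is a hypothetical helper:

```python
import timeit

import numpy as np


def estimate_total_seconds(n_time=2091, n_lat=140, n_lon=180,
                           calls_per_point=3, number=20):
    """Extrapolate the run time of the correlation step to the whole grid.

    Times np.correlate(..., mode='full') on random series of length n_time;
    only the length affects the cost, not the actual data values.
    """
    rng = np.random.RandomState(0)
    x = rng.standard_normal(n_time)
    y = rng.standard_normal(n_time)
    per_call = timeit.timeit(lambda: np.correlate(x, y, mode='full'),
                             number=number) / number
    return per_call * calls_per_point * n_lat * n_lon


print("estimated correlation time for the full grid: %.0f s"
      % estimate_total_seconds())
```

This covers only the arithmetic, not the plotting the posted loop also does. np.correlate in 'full' mode evaluates every lag directly, so each call costs on the order of n**2 operations however small maxlags is.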


> from mpl_toolkits.basemap import Basemap as bm, shiftgrid, cm
> import numpy as np
> import matplotlib.pyplot as plt
> from netCDF4 import Dataset
> from math import pow, sqrt
> import sys
> from scipy.stats import t

  <snip>

-- 
DaveA



#39152

From: Sudheer Joseph <sjo.india@gmail.com>
Date: 2013-02-18 17:13 -0800
Message-ID: <b2deb7e8-b229-4198-ab17-5f87a606ad8c@googlegroups.com>
In reply to: #39143
> Python version and OS please.  And is the Python 32bit or 64bit?  How
> much RAM does the computer have, and how big are the swapfiles ?
>
Python 2.7.3
Ubuntu 12.04, 64-bit
4 GB RAM
>
> "Fairly big" is fairly vague.  To some people, a list with 100k members
> is huge, but not to a modern computer.

I have data loaded into memory from a netCDF file with 2091*140*180 grid points (2091 time steps, 140 latitudes, 180 longitudes). Apart from this, I define two 3-D arrays, r3d and lags3d, to store the output for writing to a netCDF file once the computation is complete.
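
Those sizes allow a rough estimate of how much memory the arrays themselves need. A sketch, assuming dense arrays with 4- or 8-byte elements (the stored type of qu is not stated in the thread) and ignoring any masks or temporary copies; array_megabytes is a hypothetical helper:

```python
import numpy as np


def array_megabytes(shape, dtype):
    """Size in MB (10**6 bytes) of a dense NumPy array of this shape and dtype."""
    n_elements = int(np.prod(shape))
    return n_elements * np.dtype(dtype).itemsize / 1e6


# Input field: 2091 time steps x 140 latitudes x 180 longitudes.
# The dtype of 'qu' is not given in the thread, so both cases are computed.
field_f32 = array_megabytes((2091, 140, 180), np.float32)
field_f64 = array_megabytes((2091, 140, 180), np.float64)

# Each output array: 2 * 365 + 1 = 731 lags per grid point, 8-byte elements.
out_f64 = array_megabytes((731, 140, 180), np.float64)

print("input field, float32: %.1f MB" % field_f32)
print("input field, float64: %.1f MB" % field_f64)
print("each output array, 8-byte elements: %.1f MB" % out_f64)
print("worst case total: %.1f MB" % (field_f64 + 2 * out_f64))
```

This gives about 210.8 MB or 421.5 MB for the input field, 147.4 MB per output array and about 716.3 MB in the worst case, well below the roughly 3 GB that top reports further down. That suggests, without proving it, that something other than these arrays is growing.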
>
> How have you checked whether it's running out of memory?  Have you run
> 'top' on it?  Or is that just a guess?

I had not checked that, but progress (judged from the printed i and j values) stalls after j=6, i.e. after only six grid points.

I will check with top as you suggested.

Here is the output of top; the process is using about 3 GB of memory:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 3069 sjo       20   0 3636m 3.0g 2504 D    3 78.7   3:07.44 python  
>
> I haven't used numpy, scipy, nor matplotlib, and it's been a long time
> since I did correlations.  But are you sure you're not just implementing
> an O(n**3) algorithm or something, and it's just extremely slow?
>
Correlation does not normally involve that kind of computation; I am not sure whether Python does something like that internally.
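
For reference, the quantity plt.xcorr(..., normed=True) returns (with its default of no detrending) for two series x and y of length N, at lags k = -K, ..., K with K = maxlags, can be written as below. Each lag is a sum of at most N products, and np.correlate in 'full' mode evaluates all 2N - 1 lags before the central 2K + 1 are kept, so the cost is of order N**2 per grid point: quadratic, not cubic, in the series length.

```latex
r_{xy}[k] \;=\; \frac{\displaystyle\sum_{n=\max(0,\,-k)}^{\min(N-1,\;N-1-k)} x_{n+k}\, y_{n}}
                      {\sqrt{\displaystyle\sum_{n=0}^{N-1} x_{n}^{2}\;\sum_{n=0}^{N-1} y_{n}^{2}}},
\qquad k = -K, \dots, K
```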
with best regards,
Sudheer
>
> > from mpl_toolkits.basemap import Basemap as bm, shiftgrid, cm
> > import numpy as np
> > import matplotlib.pyplot as plt
> > from netCDF4 import Dataset
> > from math import pow, sqrt
> > import sys
> > from scipy.stats import t
>
>   <snip>
>
> -- 
> DaveA
