| Started by | Sudheer Joseph <sjo.india@gmail.com> |
|---|---|
| First post | 2013-02-18 07:29 -0800 |
| Last post | 2013-02-18 17:13 -0800 |
| Articles | 4 — 2 participants |
memory management Sudheer Joseph <sjo.india@gmail.com> - 2013-02-18 07:29 -0800
Re: memory management Dave Angel <davea@davea.name> - 2013-02-18 19:10 -0500
Re: memory management Sudheer Joseph <sjo.india@gmail.com> - 2013-02-18 17:13 -0800
Re: memory management Sudheer Joseph <sjo.india@gmail.com> - 2013-02-18 17:13 -0800
| From | Sudheer Joseph <sjo.india@gmail.com> |
|---|---|
| Date | 2013-02-18 07:29 -0800 |
| Subject | memory management |
| Message-ID | <acd5afba-fa4f-4736-b941-778c646e20bf@googlegroups.com> |
Hi,
I have been trying to compute the cross-correlation between a time series at one location, f(1), and the time series of spatial data, f(XYT), and to save the resulting correlation coefficients and lags in a three-dimensional array that is fairly big. The code I wrote for this works for a few iterations and then hangs, apparently because of a memory crunch. Can anybody suggest a better way to handle this so that the computation and data storage can run without hanging? Eventually I intend to save the data as a netCDF file, which is not implemented yet. Below is the code I wrote for this purpose.
from mpl_toolkits.basemap import Basemap as bm, shiftgrid, cm
import numpy as np
import matplotlib.pyplot as plt
from netCDF4 import Dataset
from math import pow, sqrt
import sys
from scipy.stats import t
indep=120
nlags=365
ncin = Dataset('qu_ru.nc', 'r')
lons = ncin.variables['LON421_600'][:]
lats = ncin.variables['LAT81_220'][:]
dep = ncin.variables['DEPTH1_29'][:]
adep=(dep==indep).nonzero()
didx=int(adep[0])
qu = ncin.variables['qu'][:,:,:]
#qv = ncin.variables['QV'][0,:,:]
ru = ncin.variables['ru'][:,didx,0,0]
ncin.close()
fig = plt.figure()
ax = fig.add_axes([0.1,0.1,0.8,0.8])
# use major and minor sphere radii from WGS84 ellipsoid.
m = bm(projection='cyl', llcrnrlon=30, llcrnrlat=-40,urcrnrlon=120, urcrnrlat=30)
# transform to nx x ny regularly spaced 5km native projection grid
nx = int((m.xmax-m.xmin))+1; ny = int((m.ymax-m.ymin)+1)
q=ru[1:2190]
qmean=np.mean(q)
qstd=np.std(q)
qnorm=(q-qmean)/qstd
lags3d=np.arange(731*140*180).reshape(731,140,180)
r3d=np.arange(731*140*180).reshape(731,140,180)
for i in np.arange(len(lons)):
    for j in np.arange(len(lats)):
        print i,j
        p=qu[1:2190,j,i].squeeze()
        p.shape
        pmean=np.mean(p)
        pstd=np.std(p)
        pnorm=(p-pmean)/pstd
        n=len(p)
        # fg=plt.figure()
        c=plt.xcorr(p,q,usevlines=True,maxlags=nlags,normed=True,lw=2)
        acp=plt.acorr(p,usevlines=True,maxlags=nlags,normed=True,lw=2)
        acq=plt.acorr(q,usevlines=True,maxlags=nlags,normed=True,lw=2)
        acp[1][nlags]=0
        acq[1][nlags]=0
        lags=c[0]
        r=c[1]
        lags3d[:,j,i]=lags
        r3d[:,j,i]=r
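
One thing that may be worth checking, although nobody in this thread confirms it: plt.xcorr and plt.acorr draw onto the current axes every time they are called, so the figure keeps accumulating line artists across the loop. Also, np.arange(...).reshape(...) creates integer arrays, so coefficients assigned into r3d get truncated. Below is a minimal sketch, assuming only the lags and coefficients are needed, that computes the same kind of normalized cross-correlation with np.correlate and no plotting. The function name xcorr_no_plot and the small synthetic sizes are hypothetical and are not from the original post.

```python
import numpy as np

def xcorr_no_plot(x, y, maxlags):
    """Normalized cross-correlation of two equal-length 1-D series.

    No mean is subtracted here (an assumption, chosen to mirror calling
    plt.xcorr with normed=True and no detrending).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    if maxlags < 1 or maxlags >= n:
        raise ValueError("maxlags must satisfy 1 <= maxlags < len(x)")
    full = np.correlate(x, y, mode="full")        # length 2n-1, zero lag at n-1
    full /= np.sqrt(np.dot(x, x) * np.dot(y, y))  # normalize by the two norms
    lags = np.arange(-maxlags, maxlags + 1)
    return lags, full[n - 1 - maxlags:n + maxlags]

# Small synthetic demo (made-up sizes, not the 140 x 180 grid in the post).
rng = np.random.RandomState(0)
nt, nlat, nlon, nlags = 200, 3, 4, 20
q = rng.standard_normal(nt)
qu = rng.standard_normal((nt, nlat, nlon))

# Float output array, so the coefficients are stored without truncation.
r3d = np.empty((2 * nlags + 1, nlat, nlon), dtype=np.float32)
lags = np.arange(-nlags, nlags + 1)  # identical for every grid point

for i in range(nlon):
    for j in range(nlat):
        _, r3d[:, j, i] = xcorr_no_plot(qu[:, j, i], q, nlags)
```

Since the lag vector is the same at every grid point, a single 1-D array of lags may be enough, which would avoid the second 3-D array altogether.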
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-02-18 19:10 -0500 |
| Message-ID | <mailman.1983.1361232670.2939.python-list@python.org> |
| In reply to | #39104 |
On 02/18/2013 10:29 AM, Sudheer Joseph wrote:
> HI,
> I have been trying to compute cross correlation between a time series at a location f(1) and the timeseries of spatial data f(XYT) and saving the resulting correlation coefficients and lags in a 3 dimensional array which is of fairly big size. Though the code I made for this purpose works up to few iterations then it hangs due to apparent memory crunch. Can anybody suggest a better way to handle this situation so that the computation and data storing can be done with out hangups. Finally I intend to save the data as netcdf file which is not implemented as of now. Below is the piece of code I wrote for this purpose.
>

Python version and OS please. And is the Python 32bit or 64bit? How much RAM does the computer have, and how big are the swapfiles?

"Fairly big" is fairly vague. To some people, a list with 100k members is huge, but not to a modern computer.

How have you checked whether it's running out of memory? Have you run 'top' on it? Or is that just a guess?

I haven't used numpy, scipy, nor matplotlib, and it's been a long time since I did correlations. But are you sure you're not just implementing an O(n**3) algorithm or something, and it's just extremely slow?

> from mpl_toolkits.basemap import Basemap as bm, shiftgrid, cm
> import numpy as np
> import matplotlib.pyplot as plt
> from netCDF4 import Dataset
> from math import pow, sqrt
> import sys
> from scipy.stats import t

<snip>

--
DaveA
| From | Sudheer Joseph <sjo.india@gmail.com> |
|---|---|
| Date | 2013-02-18 17:13 -0800 |
| Message-ID | <b2deb7e8-b229-4198-ab17-5f87a606ad8c@googlegroups.com> |
| In reply to | #39143 |
> Python version and OS please. And is the Python 32bit or 64bit? How
> much RAM does the computer have, and how big are the swapfiles?

Python 2.7.3, Ubuntu 12.04, 64-bit, 4 GB RAM.

> "Fairly big" is fairly vague. To some people, a list with 100k members
> is huge, but not to a modern computer.

I have data loaded into memory from a netCDF file with 2091*140*180 grid points (2091 time steps, 140 latitudes, 180 longitudes). Apart from this, I define two 3D arrays, r3d and lags3d, to store the output for writing to a netCDF file after completion.

> How have you checked whether it's running out of memory? Have you run
> 'top' on it? Or is that just a guess?

I had not done this, but the progress (judged from the printed grid indices i and j) stops after j=6, i.e. after running 6 longitude grids. I checked top as you suggested. Here is the result; it used about 3 GB of memory:

PID  USER PR NI VIRT  RES  SHR  S %CPU %MEM TIME+   COMMAND
3069 sjo  20 0  3636m 3.0g 2504 D 3    78.7 3:07.44 python

> I haven't used numpy, scipy, nor matplotlib, and it's been a long time
> since I did correlations. But are you sure you're not just implementing
> an O(n**3) algorithm or something, and it's just extremely slow?

Correlation does not normally involve such computation. I am not sure whether Python does something like that internally.

With best regards,
Sudheer

> > from mpl_toolkits.basemap import Basemap as bm, shiftgrid, cm
> > import numpy as np
> > import matplotlib.pyplot as plt
> > from netCDF4 import Dataset
> > from math import pow, sqrt
> > import sys
> > from scipy.stats import t
>
> <snip>
>
> --
> DaveA
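
The array sizes quoted above can be turned into a rough memory estimate. This is only a back-of-envelope sketch, assuming 8-byte elements throughout (float64 input data, and the int64 arrays that np.arange typically produces on 64-bit Linux). The actual dtype of qu in the file is not stated in the thread, and the helper name array_mib is hypothetical.

```python
def array_mib(shape, itemsize=8):
    """Size in MiB of a dense array with the given shape and bytes per element."""
    n = 1
    for dim in shape:
        n *= dim
    return n * itemsize / float(2 ** 20)

# Input field as described in the post: 2091 time steps x 140 lat x 180 lon.
input_mib = array_mib((2091, 140, 180))

# Each output array in the posted code: 731 lags x 140 lat x 180 lon.
output_mib = array_mib((731, 140, 180))

total_mib = input_mib + 2 * output_mib
print("input: %.0f MiB, each output: %.0f MiB, total: %.0f MiB"
      % (input_mib, output_mib, total_mib))
```

Under these assumptions the three arrays together come to well under 1 GiB, noticeably less than the 3.0g resident size that top reports, which would suggest that something other than the arrays themselves is growing during the loop.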