Groups > comp.lang.python > #94435 > unrolled thread
| Started by | Heli Nix <hemla21@gmail.com> |
|---|---|
| First post | 2015-07-23 02:21 -0700 |
| Last post | 2015-07-29 07:23 -0700 |
| Articles | 5 — 4 participants |
Optimizing if statement check over a numpy value Heli Nix <hemla21@gmail.com> - 2015-07-23 02:21 -0700
Re: Optimizing if statement check over a numpy value MRAB <python@mrabarnett.plus.com> - 2015-07-23 10:55 +0100
Re: Optimizing if statement check over a numpy value Laura Creighton <lac@openend.se> - 2015-07-23 12:13 +0200
Re: Optimizing if statement check over a numpy value Jeremy Sanders <jeremy@jeremysanders.net> - 2015-07-23 13:42 +0200
Re: Optimizing if statement check over a numpy value Heli Nix <hemla21@gmail.com> - 2015-07-29 07:23 -0700
| From | Heli Nix <hemla21@gmail.com> |
|---|---|
| Date | 2015-07-23 02:21 -0700 |
| Subject | Optimizing if statement check over a numpy value |
| Message-ID | <65c45685-dee1-41f8-a16a-7a062f4e7b02@googlegroups.com> |
Dear all,
I have the following piece of code. I am reading a numpy dataset from an hdf5 file and changing values to a new value if they equal 1.
The condition (if id not in myList:) is true about 90 percent of the time and false about 10 percent of the time.
    with h5py.File(inputFile, 'r') as f1:
        with h5py.File(inputFile2, 'w') as f2:
            ds=f1["MyDataset"].value
            myList=[list of Indices that must not be given the new_value]
            new_value=1e-20
            for index,val in np.ndenumerate(ds):
                if val==1.0 :
                    id=index[0]+1
                    if id not in myList:
                        ds[index]=new_value
            dset1 = f2.create_dataset("Cell Ids", data=cellID_ds)
            dset2 = f2.create_dataset("Porosity", data=poros_ds)
My numpy array has 16M elements and the script takes 9 hours to run. If I comment out my if statement (if id not in myList:), it takes only 5 minutes to run.
Is there any way that I can optimize this if statement?
Thank you very much in advance for your help.
Best Regards,
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2015-07-23 10:55 +0100 |
| Subject | Re: Optimizing if statement check over a numpy value |
| Message-ID | <mailman.906.1437645368.3674.python-list@python.org> |
| In reply to | #94435 |
On 2015-07-23 10:21, Heli Nix wrote:
> Is there any way that I can optimize this if statement.
Checking for presence in a list is a linear search: in the worst case every
entry has to be compared, so the time taken is proportional to the length
of the list.
Checking for presence in a set, however, is a hash lookup, which takes
constant time on average.
Replace the list myList with a set.
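A minimal sketch of that suggestion, assuming the excluded ids are plain hashable integers. The names `excluded_list`, `excluded_set` and `probe` are made up for illustration and do not come from the original script:

```python
import timeit

# Hypothetical exclusion ids; the real contents of myList are not shown in the thread.
excluded_list = list(range(0, 200_000, 2))
excluded_set = set(excluded_list)  # one-off conversion, O(n)

probe = 199_999  # not present: the list has to be scanned to the end

# Both containers give the same answer; only the lookup cost differs.
same_answer = (probe in excluded_list) == (probe in excluded_set)

# Time 20 lookups in each container.
list_time = timeit.timeit(lambda: probe in excluded_list, number=20)
set_time = timeit.timeit(lambda: probe in excluded_set, number=20)
print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```

In the original loop this would mean building the set once before the loop starts and leaving the `if id not in ...` test unchanged.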
| From | Laura Creighton <lac@openend.se> |
|---|---|
| Date | 2015-07-23 12:13 +0200 |
| Subject | Re: Optimizing if statement check over a numpy value |
| Message-ID | <mailman.908.1437646424.3674.python-list@python.org> |
| In reply to | #94435 |
Take a look at the sorted collection recipe:
http://code.activestate.com/recipes/577197-sortedcollection/

You want myList to be a sorted List. You want lookups to be fast. See if that improves things enough for you.

It may be possible to have better speedups if instead of myList you write myTree and store the values in a tree, depending on what the values of id are -- it could be completely useless for you, as well.

Laura
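Sorting alone does not speed up `in` on a plain list, since the `in` operator still scans linearly. The lookup has to use binary search. A minimal sketch using the standard-library `bisect` module rather than the linked recipe; `contains_sorted` and `excluded` are hypothetical names:

```python
from bisect import bisect_left

def contains_sorted(sorted_ids, target):
    """Return True if target is in sorted_ids (which must already be sorted)."""
    pos = bisect_left(sorted_ids, target)  # O(log n) binary search
    return pos < len(sorted_ids) and sorted_ids[pos] == target

# Hypothetical exclusion ids, sorted once up front.
excluded = sorted([42, 7, 19, 3])

print(contains_sorted(excluded, 19))
print(contains_sorted(excluded, 20))
```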
| From | Jeremy Sanders <jeremy@jeremysanders.net> |
|---|---|
| Date | 2015-07-23 13:42 +0200 |
| Message-ID | <mailman.912.1437651747.3674.python-list@python.org> |
| In reply to | #94435 |
Heli Nix wrote:
> Is there any way that I can optimize this if statement.

Array processing is much faster in numpy. Maybe this is close to what you want

    import numpy as N
    # input data
    vals = N.array([42, 1, 5, 3.14, 53, 1, 12, 11, 1])
    # list of items to exclude
    exclude = [1]
    # convert to a boolean array
    exclbool = N.zeros(vals.shape, dtype=bool)
    exclbool[exclude] = True
    # do replacement
    ones = vals==1.0
    # Note: ~ is numpy.logical_not
    vals[ones & (~exclbool)] = 1e-20

I think you'll have to convert your HDF array into a numpy array first, using numpy.array().

Jeremy
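The example above excludes by position in a one-dimensional array. The original loop instead computes `id=index[0]+1`, a 1-based id taken from the first axis. A hedged sketch of the same masking idea under that assumption follows; `replace_ones` and `excluded_ids` are hypothetical names, and `np.isin` requires NumPy 1.13 or later:

```python
import numpy as np

def replace_ones(ds, excluded_ids, new_value=1e-20):
    """Set entries equal to 1.0 to new_value, in place, except in rows
    whose 1-based id (first-axis index + 1) is listed in excluded_ids."""
    row_ids = np.arange(1, ds.shape[0] + 1)
    keep_row = ~np.isin(row_ids, list(excluded_ids))
    # Give the per-row mask trailing axes of length 1 so it broadcasts over ds.
    row_mask = keep_row.reshape((-1,) + (1,) * (ds.ndim - 1))
    ds[(ds == 1.0) & row_mask] = new_value
    return ds

# Small made-up array; row id 2 (index 1) is excluded from replacement.
demo = np.array([[1.0, 2.0], [1.0, 1.0], [3.0, 1.0]])
replace_ones(demo, [2])
print(demo)
```

The whole-array comparison and mask replace the per-element Python loop, so no membership test runs inside Python at all.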
| From | Heli Nix <hemla21@gmail.com> |
|---|---|
| Date | 2015-07-29 07:23 -0700 |
| Message-ID | <d0a2be39-9cb6-4dec-92a0-e13e47642b6b@googlegroups.com> |
| In reply to | #94444 |
On Thursday, July 23, 2015 at 1:43:00 PM UTC+2, Jeremy Sanders wrote:
> Heli Nix wrote:
>
> > Is there any way that I can optimize this if statement.
>
> Array processing is much faster in numpy. Maybe this is close to what you
> want
> [...]

Dear all,

I tried the sorted python list, but this did not really help the runtime. I haven't had time to check the sorted collections. I solved my runtime problem by using the script from Jeremy quoted above. It was a lifesaver, and it is amazing how powerful numpy is. Thanks a lot, Jeremy, for this.

By the way, I did not have to do any array conversion. The array read from the hdf5 file using h5py is already a numpy array.

The runtime over an array of around 16M elements dropped from around 12 hours (previous script) to 3 seconds using numpy on the same machine.

Thanks a lot for your help,