Re: looping in array vs looping in a dic

Date 2012-09-20 20:09 +0100
From MRAB <python@mrabarnett.plus.com>
Subject Re: looping in array vs looping in a dic
References <007b2d71-3355-4085-b84f-204834b2c8d0@googlegroups.com>
Newsgroups comp.lang.python
Message-ID <mailman.968.1348168201.27098.python-list@python.org>

On 2012-09-20 19:31, giuseppe.amatulli@gmail.com wrote:
> Hi,
> I have this script in Python that I need to apply to very large arrays (arrays coming from satellite images).
> The script works great, but I would like to speed up the process.
> Most of the computation time is spent in the for loop.
> Is there a way to improve that part?
> Would it be better to use a dict instead of an np.ndarray for saving the results?
> And if so, how can I do the sum in a dict (like the corresponding matrix[row_c,1] = matrix[row_c,1] + valuesRaster[row,col])?
> If a dict is the solution, is it faster?
>
> Thanks
> Giuseppe
>
> import numpy  as  np
> import sys
> from time import time
>
> # create the arrays
>
> start = time()
> valuesRaster = np.random.random_integers(0, 100, 100).reshape(10, 10)
> valuesCategory = np.random.random_integers(1, 10, 100).reshape(10, 10)
>
> elapsed = (time() - start)
> print(elapsed , "create the data")
>
> start = time()
>
> categories = np.unique(valuesCategory)
> matrix = np.c_[ categories , np.zeros(len(categories))]
>
> elapsed = (time() - start)
> print(elapsed , "create the matrix and append a column of zeros ")
>
> rows = 10
> cols = 10
>
> start = time()
>
> for col in range(0,cols):
>      for row in range(0,rows):
>          for row_c in range(0,len(matrix)) :
>              if valuesCategory[row,col] == matrix[row_c,0] :
>                  matrix[row_c,1] = matrix[row_c,1] + valuesRaster[row,col]
>                  break
> elapsed = (time() - start)
> print(elapsed , "loop in the  data ")
>
> print (matrix)
>
If I understand the code correctly, 'matrix' contains the categories in
column 0 and the totals in column 1.

What you're doing is performing a linear search through the categories
and then adding to the corresponding total.

Linear searches are slow because on average you have to search through
half of the list. Using a dict would be much faster (although you
should of course measure it!).
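
As a rough sketch of the difference (the helper name linear_search_steps is made up for illustration, and the ten integer labels are just a stand-in for your categories), counting the comparisons shows the "half of the list" cost, whereas a dict goes to the right slot by hashing the key:

```python
def linear_search_steps(categories, target):
    """Return how many comparisons a linear search needs to find target."""
    for steps, cat in enumerate(categories, start=1):
        if cat == target:
            return steps
    raise KeyError(target)

categories = list(range(1, 11))  # ten category labels, as in the example data

# Average comparisons over all possible targets: (n + 1) / 2.
average_steps = sum(linear_search_steps(categories, c) for c in categories) / len(categories)
print(average_steps)

# A dict lookup hashes the key, so its cost does not grow with the
# number of categories the way the linear search does.
totals = dict.fromkeys(categories, 0)
totals[7] += 42
print(totals[7])
```

With the linear search that cost is paid once per pixel, so it adds up quickly on a large image.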

Try something like this:

import numpy as np
from time import time

# Create the arrays.

start = time()

# randint's upper bound is exclusive; random_integers is deprecated in newer NumPy.
valuesRaster = np.random.randint(0, 101, 100).reshape(10, 10)
valuesCategory = np.random.randint(1, 11, 100).reshape(10, 10)

elapsed = time() - start
print(elapsed, "Create the data.")

start = time()

categories = np.unique(valuesCategory)
totals = dict.fromkeys(categories, 0)

elapsed = time() - start
print(elapsed, "Create the totals dict.")

rows = 10
cols = 10

start = time()

for col in range(cols):
     for row in range(rows):
         cat = valuesCategory[row, col]
         ras = valuesRaster[row, col]
         totals[cat] += ras

elapsed = time() - start
print(elapsed, "Loop in the data.")

print(totals)
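
If the dict loop is still too slow on full-size images, one alternative (a different technique from the loop above, not a drop-in copy of it) is to let NumPy do the grouping. A minimal sketch, assuming the category labels are integers; it uses np.unique with return_inverse and np.bincount with weights, and the seeded generator is only there so the example is repeatable:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
valuesRaster = rng.integers(0, 101, size=(10, 10))
valuesCategory = rng.integers(1, 11, size=(10, 10))

# Map every pixel's category label to an index 0..k-1.
categories, inverse = np.unique(valuesCategory, return_inverse=True)

# Sum the raster values per category index in one vectorized call.
# ravel() flattens both arrays so they line up pixel by pixel.
sums = np.bincount(inverse.ravel(), weights=valuesRaster.ravel())

# Same shape of result as the dict loop, but the sums come back as floats.
totals = dict(zip(categories.tolist(), sums.tolist()))
print(totals)
```

As with the dict version, you should measure it on your real data before committing to it.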

Thread

looping in array vs looping in a dic giuseppe.amatulli@gmail.com - 2012-09-20 11:31 -0700
  Re: looping in array vs looping in a dic MRAB <python@mrabarnett.plus.com> - 2012-09-20 20:09 +0100
  Re: looping in array vs looping in a dic Ian Kelly <ian.g.kelly@gmail.com> - 2012-09-20 13:28 -0600
  Re: looping in array vs looping in a dic Ian Kelly <ian.g.kelly@gmail.com> - 2012-09-20 13:29 -0600
    Re: looping in array vs looping in a dic giuseppe.amatulli@gmail.com - 2012-09-20 16:35 -0700
      Re: looping in array vs looping in a dic MRAB <python@mrabarnett.plus.com> - 2012-09-21 00:58 +0100
    Re: looping in array vs looping in a dic giuseppe.amatulli@gmail.com - 2012-09-20 16:35 -0700