Path: csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail
From: srinivas devaki <mr.eightnoteight@gmail.com>
Newsgroups: comp.lang.python
Subject: Re: looping and searching in numpy array
Date: Mon, 14 Mar 2016 10:19:44 +0530
Lines: 70
Message-ID: <mailman.85.1457930987.12893.python-list@python.org>
References: <77bd470b-cc05-4117-9ed1-6309d7a5633a@googlegroups.com> <CACs7g=DL8UExWQYJOSbFEQDvNvY3M46MRP==jFBisS4tytxFxg@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
In-Reply-To: <CACs7g=DL8UExWQYJOSbFEQDvNvY3M46MRP==jFBisS4tytxFxg@mail.gmail.com>
Precedence: list
Xref: csiph.com comp.lang.python:104799

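The problem is in fact not related to numpy at all. The complexity of your
algorithm is O(len(npArray1) * len(npArray2)), because np.where scans all
of npArray2 once for every element of npArray1.

With around 300K values in each array, that means the number of
comparisons you are doing is in the range of 10**10 to 10**11.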
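
As a rough check of that estimate, here is the arithmetic written out. The
sizes are taken from the question (about 300K values now, 50M later); the
variable names are only illustrative.

```python
# Rough count of element comparisons done by the nested search.
# Assumption: both arrays have the sizes mentioned in the question.
len_array1 = 300_000
len_array2 = 300_000

# np.where(npArray2 == id) compares id against every element of npArray2,
# and the loop repeats that once per element of npArray1.
comparisons = len_array1 * len_array2

# The same estimate for the 50M-element arrays mentioned in the question.
comparisons_50m = 50_000_000 * 50_000_000

print(comparisons)      # 9 * 10**10
print(comparisons_50m)  # 2.5 * 10**15
```

The second number is why the nested search cannot simply be scaled up to the
larger arrays.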

If the difference between the maximum and minimum elements of npArray2 is
less than 10**6 (and the values are integers), you can improve your code by
precomputing the first occurrence of each number in an array that covers
that range (the aforementioned difference).

If your npArray2 doesn't have such a pattern, you have to precompute the
first occurrences using a dict (I don't know if numpy has such a data
structure).
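
On the question of a numpy equivalent: numpy has no dict type, but one
possible substitute is a sorted lookup built from np.unique and
np.searchsorted. This is a different technique from the dict (binary search
instead of hashing), shown only as a sketch. The function name
first_index_sorted is hypothetical, and it assumes a non-empty values array.

```python
import numpy as np

def first_index_sorted(values, queries):
    """For each query, return the index of its first occurrence in values, or -1."""
    values = np.asarray(values)
    queries = np.asarray(queries)
    # uniq is sorted; first_idx[k] is the index of the first occurrence of uniq[k].
    uniq, first_idx = np.unique(values, return_index=True)
    # Binary-search every query in the sorted unique values.
    pos = np.searchsorted(uniq, queries)
    # Clamp so that queries larger than every value do not index out of bounds.
    pos = np.minimum(pos, len(uniq) - 1)
    # A query is present only if the value at its insertion point matches it.
    found = uniq[pos] == queries
    return np.where(found, first_idx[pos], -1)

new_ids = first_index_sorted([5, 3, 5, 9, 3], [3, 5, 7, 9, 10])
```

The cost is roughly one sort of npArray2 plus one binary search per element
of npArray1, rather than one full scan per element.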

An optimised version in pseudocode would look like this:

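import numpy as np

offset = npArray2.min()
mmdiff = npArray2.max() - offset
if mmdiff < 10**6:
    # precom[v - offset] = index of the first occurrence of v in npArray2
    # (assumes integer values; -1 means "not present")
    precom = np.full(mmdiff + 1, -1)
    for i, x in enumerate(npArray2):
        if precom[x - offset] == -1:
            precom[x - offset] = i
    for id in npArray1:
        if 0 <= id - offset <= mmdiff and precom[id - offset] != -1:
            new_id = precom[id - offset]
            # your code
else:
    # precom[v] = index of the first occurrence of v in npArray2
    precom = {}
    for i, x in enumerate(npArray2):
        if x not in precom:
            precom[x] = i
    for id in npArray1:
        if id in precom:
            new_id = precom[id]
            # your code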


You can just use the else branch (the dict), which works in all cases, but
if your npArray2 has such a pattern then the array branch will perform
better.
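
For the small-range case, the Python-level loops can probably be avoided
altogether. The following is only a sketch under assumptions: both arrays
are non-empty and hold signed integers, and the function name
first_index_table and its max_span parameter are made up for illustration.
It uses np.minimum.at so that duplicate values keep the smallest (first)
index.

```python
import numpy as np

def first_index_table(values, queries, max_span=10**6):
    """Table lookup of first-occurrence indices; -1 where a query is absent."""
    values = np.asarray(values)
    queries = np.asarray(queries)
    offset = values.min()
    span = int(values.max() - offset)
    if span >= max_span:
        raise ValueError("value range too large for a lookup table")
    n = len(values)
    # n is used as a temporary "not present" marker; real indices are < n.
    table = np.full(span + 1, n, dtype=np.intp)
    # Unbuffered minimum: with duplicate values the smallest index wins.
    np.minimum.at(table, values - offset, np.arange(n, dtype=np.intp))
    table[table == n] = -1
    # Look up only the queries that fall inside the table.
    shifted = queries - offset
    in_range = (shifted >= 0) & (shifted <= span)
    result = np.full(queries.shape, -1, dtype=np.intp)
    result[in_range] = table[shifted[in_range]]
    return result

new_ids = first_index_table([5, 3, 5, 9, 3], [3, 5, 7, 9, 10, 1])
```

Each entry of new_ids corresponds to one id in npArray1, so the per-id work
("your code") would then operate on the whole result array or loop over it.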

Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
On Mar 10, 2016 5:15 PM, "Heli" <hemla21@gmail.com> wrote:

Dear all,

I need to loop over a numpy array and then do the following search. The
following is taking almost 60 s for arrays (npArray1 and npArray2 in the
example below) with around 300K values.


for id in np.nditer(npArray1):
    newId = np.where(npArray2 == id)[0][0]


Is there any way I can make the above faster? I need to run the script above
on much bigger arrays (50M). Please note that my two numpy arrays in the
lines above, npArray1 and npArray2, are not necessarily the same size, but
they are both 1d.


Thanks a lot for your help,
