Path: csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail From: srinivas devaki Newsgroups: comp.lang.python Subject: Re: looping and searching in numpy array Date: Mon, 14 Mar 2016 10:19:44 +0530 Lines: 70 Message-ID: References: <77bd470b-cc05-4117-9ed1-6309d7a5633a@googlegroups.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 X-Trace: news.uni-berlin.de xE1k4GNshdJCvlIX8hHBOwCXHbj8S0GA/3bFSPbgmRxw== Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.012 X-Spam-Evidence: '*H*': 0.98; '*S*': 0.00; 'else:': 0.03; 'below)': 0.07; 'cc:addr:python-list': 0.09; 'dict': 0.09; 'anyway': 0.11; 'size,': 0.13; '+91': 0.15; '-1:': 0.16; '2016': 0.16; 'numpy': 0.16; 'optimised': 0.16; 'received:io': 0.16; 'received:psf.io': 0.16; 'subject:array': 0.16; 'subject:looping': 0.16; 'wrote:': 0.16; '<': 0.18; 'element': 0.18; 'email addr:gmail.com>': 0.18; 'all,': 0.20; 'student': 0.20; 'cc:2**0': 0.20; 'cc:addr:python.org': 0.20; 'algorithm': 0.20; 'arrays': 0.22; 'junior': 0.22; '8bit%:5': 0.23; 'absolute': 0.23; 'bigger': 0.23; 'header:In-Reply-To:1': 0.24; 'all.': 0.24; 'script': 0.25; "doesn't": 0.26; 'example': 0.26; 'message-id:@mail.gmail.com': 0.27; 'search.': 0.29; 'array': 0.29; 'url:mailman': 0.30; 'code': 0.30; 'related': 0.32; 'help,': 0.32; 'run': 0.33; 'problem': 0.33; 'url:python': 0.33; 'values.': 0.33; 'url:listinfo': 0.34; 'received:google.com': 0.35; 'but': 0.36; 'there': 0.36; 'url:org': 0.36; 'lines': 0.36; 'received:209.85': 0.36; 'cases': 0.36; 'indian': 0.36; 'pm,': 0.36; 'subject:: ': 0.37; 'two': 0.37; 'skip:& 10': 0.37; 'thanks': 0.37; 'received:209.85.213': 0.37; 'doing': 0.38; 'difference': 0.38; 'received:209': 0.38; 'minimum': 0.38; 'means': 0.39; 'data': 0.39; 'skip:e 20': 0.39; 'url:mail': 0.40; 'your': 0.60; 'maximum': 0.61; 'school': 0.62; 'skip:n 10': 0.62; 'above,': 0.63; 'necessarily': 0.63; 'between': 0.65; 'mar': 0.65; 'better.': 0.66; 'skip:\xc2 10': 0.67; 'dear': 0.67; '8bit%:100': 0.70; '(3rd': 0.84; '5:15': 0.84; 'complexity': 0.84; 'ph:': 0.84 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc; bh=4rKS82e0ChjhttRw38IT9Z58Fb++2TbR7TuQ/AEkKuk=; b=Dfyfg/nvvscU3ULv8XieqAxgGAHPXp5XR0hK++XNxdJIsH+KizfpHT/gEqq0NF+E5l zZBStkGWdko2XOAuPsEcJQJUK1MMX0N5OiLGB2cgvQd7jB9rWBs05NSTTjWRioouLEM4 xl5q5PDRq6qd8qWVnGGMLJ1UoGtq0f3MYuf7kiuEptjZN3Tr6yEGOiUI5+Wt12QynmOn ZtgzHrGfe2Dy0Fng928yZbYFKR1onn/IJxQIsLxFoSsICR74PqLhOO6hefly3DFi/ixA Hz5qQWbwTy9drIel0l6Bz4Hhtsz50+8pJ083Byzgc0o9ttNPY8gt4Hy4/s5PVR8gIwnA rwZA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc; bh=4rKS82e0ChjhttRw38IT9Z58Fb++2TbR7TuQ/AEkKuk=; b=gSFt9mEAieKxRT+uVRn+Fz6JkAZ4GzAQAsTylrRCwSKb41HoC/AWJGcDS2zzWoCb49 EKaA7FtEPjKyH+P0pO3/d5LpHw7dvd3cNwnfCDPHiKLDDpPnsa7tpSrahm6BcE5PWkrY MQJvRNBYCD+cGNMqHRgUnpUgxDHkKMvR5+1XmGY7EHd23qSKr+IktKYn+bjgU5/R/xrk 5v19djC98WA3191H9Z520Fd3B4pfK1lgbr9URCFM5mJhzQGcWFsXhb03pfbikAYKXntT 4O1fbWNXJfcGchDAiZWjdEaxP9ZMFnsAsXGg+Ff11TOsy+u8cVISHu4pbAxyyqfxGbS8 w+zw== X-Gm-Message-State: AD7BkJKCHLMNcGpbJVN+EHV24YgFmMu5zbushD+iahn0Gop92N9ye/TZXAp59G5vk9fUXOCl41YITWvhTjMfAQ== X-Received: by 10.50.178.180 with SMTP id cz20mr14740039igc.44.1457930984751; Sun, 13 Mar 2016 21:49:44 -0700 (PDT) In-Reply-To: X-Content-Filtered-By: Mailman/MimeDel 2.1.21 X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Xref: csiph.com comp.lang.python:104799 problem is infact not related to numpy at all. the complexity of your algorithm is O(len(npArray1) * len(npArray2)) which means the number of computations that you are doing is in the range of 10**10, if the absolute difference between the maximum element and minimum element is less than 10**6, you can improve your code by pre-computing the first occurrence of a number by using an array of size of that difference(afore mentioned). if your npArray2 doesn't have such a pattern, you have to precompute it by using a dict (I don't know if numpy has such data structure) an optimised pseudo code would look like mmdiff = max(npArray2) - min(npArray2) if mmdiff < 10**6: precom = np.array([-1] * mmdiff) offset = min(npArray2) for i, x in enumerate(npArray2): precom[x - offset] = i for id in npArray1: if 0 <= id - offset < mmdiff and precom[id - offset] != -1: new_id = precom[id] # your code else: precom = {} for i, x in enumerate(npArray1): if x not in precom: precom[x] = i for id in npArray1: if id in precom: new_id = precom[id] # your code you can just use the else case which will work for all cases but if your npArray2 has such a pattern then the above code will perform better. Regards Srinivas Devaki Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad) Computer Science and Engineering Department ph: +91 9491 383 249 telegram_id: @eightnoteight On Mar 10, 2016 5:15 PM, "Heli" wrote: Dear all, I need to loop over a numpy array and then do the following search. The following is taking almost 60(s) for an array (npArray1 and npArray2 in the example below) with around 300K values. for id in np.nditer(npArray1): newId=(np.where(npArray2==id))[0][0] Is there anyway I can make the above faster? I need to run the script above on much bigger arrays (50M). Please note that my two numpy arrays in the lines above, npArray1 and npArray2 are not necessarily the same size, but they are both 1d. Thanks a lot for your help, -- https://mail.python.org/mailman/listinfo/python-list