From: Peter Otten <__peter__@web.de>
Newsgroups: comp.lang.python
Subject: Re: looping and searching in numpy array
Date: Thu, 10 Mar 2016 14:02:25 +0100

Heli wrote:

> Dear all,
>
> I need to loop over a numpy array and then do the following search. The
> following is taking almost 60(s) for an array (npArray1 and npArray2 in
> the example below) with around 300K values.
>
>
> for id in np.nditer(npArray1):
>     newId=(np.where(npArray2==id))[0][0]
>
>
> Is there any way I can make the above faster? I need to run the script
> above on much bigger arrays (50M). Please note that my two numpy arrays in
> the lines above, npArray1 and npArray2 are not necessarily the same size,
> but they are both 1d.

You mean you are looking for the index of the first occurrence in npArray2
for every value of npArray1?

I don't know how to do this in numpy (I'm not an expert), but even basic
Python might be acceptable:

    # Map each value in npArray2 to the index of its first occurrence.
    lookup = {}
    for i, v in enumerate(npArray2):
        if v not in lookup:
            lookup[v] = i

    # One dictionary lookup per value of npArray1.
    for v in npArray1:
        print(lookup.get(v, ""))

That way you iterate once (in Python) instead of 2*len(npArray1) times (in
C) over npArray2.
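
For comparison, the same first-occurrence lookup can probably be done without a Python-level loop. The following is only a sketch, not something from the original thread: the function name first_index_lookup and the missing=-1 sentinel are made up for illustration, and it assumes both arrays are 1d and of comparable dtype. It relies on np.unique(..., return_index=True) returning the sorted unique values together with the index of each value's first occurrence, and on np.searchsorted to locate every needle in that sorted array.

```python
import numpy as np

def first_index_lookup(haystack, needles, missing=-1):
    """For each value in needles, return the index of its first
    occurrence in haystack, or `missing` if it does not occur.
    (Hypothetical helper, named here for illustration only.)"""
    haystack = np.asarray(haystack)
    needles = np.asarray(needles)

    # Sorted unique values plus the index of each one's first occurrence.
    uniq, first_idx = np.unique(haystack, return_index=True)
    if uniq.size == 0:
        return np.full(needles.shape, missing)

    # Binary search for every needle at once; clip so that needles larger
    # than every haystack value still yield a valid position.
    pos = np.minimum(np.searchsorted(uniq, needles), uniq.size - 1)

    # A needle is present only if the value at its position matches.
    found = uniq[pos] == needles
    return np.where(found, first_idx[pos], missing)

npArray2 = np.array([5, 3, 5, 7, 3])
npArray1 = np.array([3, 5, 9, 7, 1])
print(first_index_lookup(npArray2, npArray1))
```

This sorts npArray2 once and then does a binary search per value of npArray1, so the cost is dominated by the sort rather than by a full scan of npArray2 for every value. Whether it beats the dictionary version for 50M elements would have to be measured.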