Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #97936 > unrolled thread

Nearest neighbours of points

Started byPoul Riis <priisdk@gmail.com>
First post2015-10-24 13:05 -0700
Last post2015-10-31 23:00 -0400
Articles 10 — 7 participants

Back to article view | Back to comp.lang.python


Contents

  Nearest neighbours of points Poul Riis <priisdk@gmail.com> - 2015-10-24 13:05 -0700
    Re: Nearest neighbours of points Christian Gollwitzer <auriocus@gmx.de> - 2015-10-24 22:45 +0200
    Re: Nearest neighbours of points Poul Riis <priisdk@gmail.com> - 2015-10-24 15:44 -0700
      Re: Nearest neighbours of points Christian Gollwitzer <auriocus@gmx.de> - 2015-10-25 09:55 +0100
        Re: Nearest neighbours of points Poul Riis <priisdk@gmail.com> - 2015-10-25 05:00 -0700
        Re: Nearest neighbours of points Peter Pearson <pkpearson@nowhere.invalid> - 2015-10-25 17:01 +0000
      Re: Nearest neighbours of points Bartc <bc@freeuk.com> - 2015-10-25 12:36 +0000
    Re: Nearest neighbours of points Fabien <fabien.maussion@gmail.com> - 2015-10-26 03:25 +0100
    Re: Nearest neighbours of points Tom P <werotizy@freent.dd> - 2015-10-31 22:28 +0100
      Re: Nearest neighbours of points Terry Reedy <tjreedy@udel.edu> - 2015-10-31 23:00 -0400

#97936 — Nearest neighbours of points

FromPoul Riis <priisdk@gmail.com>
Date2015-10-24 13:05 -0700
SubjectNearest neighbours of points
Message-ID<81947104-f6dd-48b3-ba7b-f03b0c5b6f39@googlegroups.com>
I have N points in 3D, organized in a list. I want to to point out the numbers of the two that have the smallest distance.
With scipy.spatial.distance.pdist I can make a list of all the distances, and I can point out the number of the minimum value of that list (see simple example below - the line with pts.append... should be indented three times). But I guess there is a standard (numpy?) routine which points out the numbers of the corresponding two points but I cannot find it. Can someone help? 

Poul Riis




import numpy as np
import scipy
from scipy.spatial.distance import pdist


pts=[]
for i in range(-1,2):
    for j in range(-1,2):
        for k in range(-1,2):                 pts.append((i+np.random.random()/10,j+np.random.random()/10,k+np.random.random()/10))
for i in range(0,len(pts)):
    print(pts[i])
distances=scipy.spatial.distance.pdist(pts)
n=np.argmin(distances)
for i in range(0,len(distances)):
    print(i,distances[i])
print('The minimum distance is: ',min(distances),' which has number ',n)

[toc] | [next] | [standalone]


#97937

FromChristian Gollwitzer <auriocus@gmx.de>
Date2015-10-24 22:45 +0200
Message-ID<n0gqhc$6l1$1@dont-email.me>
In reply to#97936
Am 24.10.15 um 22:05 schrieb Poul Riis:
> I have N points in 3D, organized in a list. I want to to point out the numbers of the two that have the smallest distance.
> With scipy.spatial.distance.pdist I can make a list of all the distances, and I can point out the number of the minimum value of that list (see simple example below - the line with pts.append... should be indented three times). But I guess there is a standard (numpy?) routine which points out the numbers of the corresponding two points but I cannot find it. Can someone help?

What makes you think that there should be a special function? I think 
the intent is quite clearly expressed using your code:

 > distances=scipy.spatial.distance.pdist(pts)
 > n=np.argmin(distances)

Are you hoping for a speed improvement? Your algorithm is O(n^2) in the 
number of points n. You can probably go down to O(n*log(n)) using a 
space partitioning tree like a kd-tree. I can't tell whether it's worth 
the effort, that depends on n. Also your code is invoking two compiled 
functions; I doubt it could be much faster, unless the algorithm is changed.

	Christian

[toc] | [prev] | [next] | [standalone]


#97938

FromPoul Riis <priisdk@gmail.com>
Date2015-10-24 15:44 -0700
Message-ID<2be81eec-e7dc-4c83-a578-f8c15eb4c1dc@googlegroups.com>
In reply to#97936
A program that does what I want can be found below.
However, I want to know if there is a faster way because I have many moving points and the two nearest neighbours must be identified a lot of times.

Poul Riis



import math
import numpy as np
import scipy
from scipy.spatial.distance import pdist

def calc_dist(p1,p2):
    return math.sqrt((p2[0] - p1[0]) ** 2 +
                     (p2[1] - p1[1]) ** 2 +
                     (p2[2] - p1[2]) ** 2)

pts=[]
for i in range(-1,2):
    for j in range(-1,2):
        for k in range(-1,2):
            pts.append((i+np.random.random()/10,j+np.random.random()/10,k+np.random.random()/10))
for i in range(0,len(pts)):
    print(i,pts[i])

#The minimum distance is easily found
distances=scipy.spatial.distance.pdist(pts)
distmin=min(distances)
n=np.argmin(distances)
for i in range(0,len(distances)):
    print(i,distances[i])
print('The minimum distance is: ',distmin,' which has number ',n)


#But that doesn't tell which two points it was...
n=-1
for i in range(0,27):
    for j in range(i+1,27):
        n=n+1
        dist=calc_dist(pts[i],pts[j])
        if n==0:
            distold=dist
        else:
            if dist <distold:
                distold=dist
                istore=i
                jstore=j
                neighbour=n
print('The two closest points are: ')
print(pts[istore])
print('and')
print(pts[jstore])
print('They have the numbers ',istore,' and ',jstore)
print('Their distance is ',calc_dist(pts[istore],pts[jstore]))
print('The index with the minimum value of the distance array is ',neighbour)

[toc] | [prev] | [next] | [standalone]


#97940

FromChristian Gollwitzer <auriocus@gmx.de>
Date2015-10-25 09:55 +0100
Message-ID<n0i5a0$qqs$1@dont-email.me>
In reply to#97938
Am 25.10.15 um 00:44 schrieb Poul Riis:
> A program that does what I want can be found below. However, I want
> to know if there is a faster way because I have many moving points
> and the two nearest neighbours must be identified a lot of times.
>

Ah, that changes the game. But I'm still puzzled - is this first scipy 
function fast enough? Because your second computation using teh nested 
for loops in my tests gave exactly the same index. I'm not sure it is 
guaranteed, but it seems that the pdist function uses the same ordering 
as your code. In which case you can simply calculate i and j from the 
index n.

For instance, if you have i and j, you can compute n by the formula:

print('The index computed: ', N*istore-istore-istore*(istore+1)/2+jstore-1)

Now you are left with determining i and j from n. You need to find the 
largest i such that the j from solving the above equation is not 
negative. If you are lazy, and it is fast enough, you can compute this 
value in a loop - it is O(n). Or you can try solvin it as a quadratic 
inequation for i.


	Christian

[toc] | [prev] | [next] | [standalone]


#97946

FromPoul Riis <priisdk@gmail.com>
Date2015-10-25 05:00 -0700
Message-ID<f6b73797-c758-47d6-a527-27120f7af70e@googlegroups.com>
In reply to#97940
Yes, I also have that formula - one equation with two unknowns. I guess there is some simple solution but I haven't yet managed to find it. But this is mathematics, not programming. Still, there might be some well-written fast algorithm out there...?


Poul Riis

[toc] | [prev] | [next] | [standalone]


#97950

FromPeter Pearson <pkpearson@nowhere.invalid>
Date2015-10-25 17:01 +0000
Message-ID<d94g7nFns35U1@mid.individual.net>
In reply to#97940
On Sun, 25 Oct 2015 09:55:30 +0100, Christian Gollwitzer wrote:
> Am 25.10.15 um 00:44 schrieb Poul Riis:
>> A program that does what I want can be found below. However, I want
>> to know if there is a faster way because I have many moving points
>> and the two nearest neighbours must be identified a lot of times.
>>
>
> Ah, that changes the game. But I'm still puzzled - is this first scipy 
> function fast enough? Because your second computation using teh nested 
> for loops in my tests gave exactly the same index. I'm not sure it is 
> guaranteed, but it seems that the pdist function uses the same ordering 
> as your code. In which case you can simply calculate i and j from the 
> index n.
>
> For instance, if you have i and j, you can compute n by the formula:
>
> print('The index computed: ', N*istore-istore-istore*(istore+1)/2+jstore-1)
>
> Now you are left with determining i and j from n. You need to find the 
> largest i such that the j from solving the above equation is not 
> negative. If you are lazy, and it is fast enough, you can compute this 
> value in a loop - it is O(n). Or you can try solvin it as a quadratic 
> inequation for i.

My algebra is a little rusty, but I think this boils down to

 i = floor(0.5*(2*N-3 - sqrt((2*N-3)**2 - 8*(n+1))))

Sorry I don't have time to test it right now.

-- 
To email me, substitute nowhere->runbox, invalid->com.

[toc] | [prev] | [next] | [standalone]


#97947

FromBartc <bc@freeuk.com>
Date2015-10-25 12:36 +0000
Message-ID<n0ii97$7fe$1@dont-email.me>
In reply to#97938
On 24/10/2015 23:44, Poul Riis wrote:

> def calc_dist(p1,p2):
>      return math.sqrt((p2[0] - p1[0]) ** 2 +
>                       (p2[1] - p1[1]) ** 2 +
>                       (p2[2] - p1[2]) ** 2)

>          dist=calc_dist(pts[i],pts[j])
>          if n==0:
>              distold=dist
>          else:
>              if dist <distold:

Just to compare two distances, you can probably work with the squares. 
You don't need the square root in calc_dist.

You will need it for the final report, but that's only one square root, 
not thousands or millions.

-- 
Bartc

[toc] | [prev] | [next] | [standalone]


#97953

FromFabien <fabien.maussion@gmail.com>
Date2015-10-26 03:25 +0100
Message-ID<n0k2u1$gqb$2@speranza.aioe.org>
In reply to#97936
On 10/24/2015 10:05 PM, Poul Riis wrote:
> With scipy.spatial.distance.pdist


I don't really understand your problem. With 
scipy.spatial.distance.pdist you'll have all you need:

 From the docs:

Returns:	
Returns a condensed distance matrix Y. For each i and j (where i<j<n), 
the metric dist(u=X[i], v=X[j]) is computed and stored in entry ij.

So you have the distances, the indexes of the points so you also have 
the value of these points...

[toc] | [prev] | [next] | [standalone]


#98024

FromTom P <werotizy@freent.dd>
Date2015-10-31 22:28 +0100
Message-ID<d9kq38Ft8k9U1@mid.individual.net>
In reply to#97936
On 10/24/2015 10:05 PM, Poul Riis wrote:
> I have N points in 3D, organized in a list. I want to to point out the numbers of the two that have the smallest distance.
> With scipy.spatial.distance.pdist I can make a list of all the distances, and I can point out the number of the minimum value of that list (see simple example below - the line with pts.append... should be indented three times). But I guess there is a standard (numpy?) routine which points out the numbers of the corresponding two points but I cannot find it. Can someone help?
>
> Poul Riis
>
>
>
>
> import numpy as np
> import scipy
> from scipy.spatial.distance import pdist
>
>
> pts=[]
> for i in range(-1,2):
>      for j in range(-1,2):
>          for k in range(-1,2):                 pts.append((i+np.random.random()/10,j+np.random.random()/10,k+np.random.random()/10))
> for i in range(0,len(pts)):
>      print(pts[i])
> distances=scipy.spatial.distance.pdist(pts)
> n=np.argmin(distances)
> for i in range(0,len(distances)):
>      print(i,distances[i])
> print('The minimum distance is: ',min(distances),' which has number ',n)
>

I won't claim to have the definitive answer but - is this a clustering 
problem? Did you look at any machine learning packages?

[toc] | [prev] | [next] | [standalone]


#98028

FromTerry Reedy <tjreedy@udel.edu>
Date2015-10-31 23:00 -0400
Message-ID<mailman.6.1446346873.4463.python-list@python.org>
In reply to#98024
On 10/31/2015 5:28 PM, Tom P wrote:
> On 10/24/2015 10:05 PM, Poul Riis wrote:
>> I have N points in 3D, organized in a list. I want to to point out the
>> numbers of the two that have the smallest distance.
>> With scipy.spatial.distance.pdist I can make a list of all the
>> distances, and I can point out the number of the minimum value of that
>> list (see simple example below

distances is conceptually a 2-d matrix, but since it would be 
symmetrical, I gather that it is stored as 1-d represention  You should 
be able to find the formula that converts the linear index to the 
indexes of the corresponding symmetric matrix.  It depends on whether 
distances represent the upper or lower triangle of the symmetric matrix 
and whether the triangular matrix is stored by rows or columns.

-- 
Terry Jan Reedy

[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.python


csiph-web