Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #97936 > unrolled thread
| Started by | Poul Riis <priisdk@gmail.com> |
|---|---|
| First post | 2015-10-24 13:05 -0700 |
| Last post | 2015-10-31 23:00 -0400 |
| Articles | 10 — 7 participants |
Back to article view | Back to comp.lang.python
Nearest neighbours of points Poul Riis <priisdk@gmail.com> - 2015-10-24 13:05 -0700
Re: Nearest neighbours of points Christian Gollwitzer <auriocus@gmx.de> - 2015-10-24 22:45 +0200
Re: Nearest neighbours of points Poul Riis <priisdk@gmail.com> - 2015-10-24 15:44 -0700
Re: Nearest neighbours of points Christian Gollwitzer <auriocus@gmx.de> - 2015-10-25 09:55 +0100
Re: Nearest neighbours of points Poul Riis <priisdk@gmail.com> - 2015-10-25 05:00 -0700
Re: Nearest neighbours of points Peter Pearson <pkpearson@nowhere.invalid> - 2015-10-25 17:01 +0000
Re: Nearest neighbours of points Bartc <bc@freeuk.com> - 2015-10-25 12:36 +0000
Re: Nearest neighbours of points Fabien <fabien.maussion@gmail.com> - 2015-10-26 03:25 +0100
Re: Nearest neighbours of points Tom P <werotizy@freent.dd> - 2015-10-31 22:28 +0100
Re: Nearest neighbours of points Terry Reedy <tjreedy@udel.edu> - 2015-10-31 23:00 -0400
| From | Poul Riis <priisdk@gmail.com> |
|---|---|
| Date | 2015-10-24 13:05 -0700 |
| Subject | Nearest neighbours of points |
| Message-ID | <81947104-f6dd-48b3-ba7b-f03b0c5b6f39@googlegroups.com> |
I have N points in 3D, organized in a list. I want to to point out the numbers of the two that have the smallest distance.
With scipy.spatial.distance.pdist I can make a list of all the distances, and I can point out the number of the minimum value of that list (see simple example below - the line with pts.append... should be indented three times). But I guess there is a standard (numpy?) routine which points out the numbers of the corresponding two points but I cannot find it. Can someone help?
Poul Riis
import numpy as np
import scipy
from scipy.spatial.distance import pdist
pts=[]
for i in range(-1,2):
for j in range(-1,2):
for k in range(-1,2): pts.append((i+np.random.random()/10,j+np.random.random()/10,k+np.random.random()/10))
for i in range(0,len(pts)):
print(pts[i])
distances=scipy.spatial.distance.pdist(pts)
n=np.argmin(distances)
for i in range(0,len(distances)):
print(i,distances[i])
print('The minimum distance is: ',min(distances),' which has number ',n)
[toc] | [next] | [standalone]
| From | Christian Gollwitzer <auriocus@gmx.de> |
|---|---|
| Date | 2015-10-24 22:45 +0200 |
| Message-ID | <n0gqhc$6l1$1@dont-email.me> |
| In reply to | #97936 |
Am 24.10.15 um 22:05 schrieb Poul Riis: > I have N points in 3D, organized in a list. I want to to point out the numbers of the two that have the smallest distance. > With scipy.spatial.distance.pdist I can make a list of all the distances, and I can point out the number of the minimum value of that list (see simple example below - the line with pts.append... should be indented three times). But I guess there is a standard (numpy?) routine which points out the numbers of the corresponding two points but I cannot find it. Can someone help? What makes you think that there should be a special function? I think the intent is quite clearly expressed using your code: > distances=scipy.spatial.distance.pdist(pts) > n=np.argmin(distances) Are you hoping for a speed improvement? Your algorithm is O(n^2) in the number of points n. You can probably go down to O(n*log(n)) using a space partitioning tree like a kd-tree. I can't tell whether it's worth the effort, that depends on n. Also your code is invoking two compiled functions; I doubt it could be much faster, unless the algorithm is changed. Christian
[toc] | [prev] | [next] | [standalone]
| From | Poul Riis <priisdk@gmail.com> |
|---|---|
| Date | 2015-10-24 15:44 -0700 |
| Message-ID | <2be81eec-e7dc-4c83-a578-f8c15eb4c1dc@googlegroups.com> |
| In reply to | #97936 |
A program that does what I want can be found below.
However, I want to know if there is a faster way because I have many moving points and the two nearest neighbours must be identified a lot of times.
Poul Riis
import math
import numpy as np
import scipy
from scipy.spatial.distance import pdist
def calc_dist(p1,p2):
return math.sqrt((p2[0] - p1[0]) ** 2 +
(p2[1] - p1[1]) ** 2 +
(p2[2] - p1[2]) ** 2)
pts=[]
for i in range(-1,2):
for j in range(-1,2):
for k in range(-1,2):
pts.append((i+np.random.random()/10,j+np.random.random()/10,k+np.random.random()/10))
for i in range(0,len(pts)):
print(i,pts[i])
#The minimum distance is easily found
distances=scipy.spatial.distance.pdist(pts)
distmin=min(distances)
n=np.argmin(distances)
for i in range(0,len(distances)):
print(i,distances[i])
print('The minimum distance is: ',distmin,' which has number ',n)
#But that doesn't tell which two points it was...
n=-1
for i in range(0,27):
for j in range(i+1,27):
n=n+1
dist=calc_dist(pts[i],pts[j])
if n==0:
distold=dist
else:
if dist <distold:
distold=dist
istore=i
jstore=j
neighbour=n
print('The two closest points are: ')
print(pts[istore])
print('and')
print(pts[jstore])
print('They have the numbers ',istore,' and ',jstore)
print('Their distance is ',calc_dist(pts[istore],pts[jstore]))
print('The index with the minimum value of the distance array is ',neighbour)
[toc] | [prev] | [next] | [standalone]
| From | Christian Gollwitzer <auriocus@gmx.de> |
|---|---|
| Date | 2015-10-25 09:55 +0100 |
| Message-ID | <n0i5a0$qqs$1@dont-email.me> |
| In reply to | #97938 |
Am 25.10.15 um 00:44 schrieb Poul Riis:
> A program that does what I want can be found below. However, I want
> to know if there is a faster way because I have many moving points
> and the two nearest neighbours must be identified a lot of times.
>
Ah, that changes the game. But I'm still puzzled - is this first scipy
function fast enough? Because your second computation using teh nested
for loops in my tests gave exactly the same index. I'm not sure it is
guaranteed, but it seems that the pdist function uses the same ordering
as your code. In which case you can simply calculate i and j from the
index n.
For instance, if you have i and j, you can compute n by the formula:
print('The index computed: ', N*istore-istore-istore*(istore+1)/2+jstore-1)
Now you are left with determining i and j from n. You need to find the
largest i such that the j from solving the above equation is not
negative. If you are lazy, and it is fast enough, you can compute this
value in a loop - it is O(n). Or you can try solvin it as a quadratic
inequation for i.
Christian
[toc] | [prev] | [next] | [standalone]
| From | Poul Riis <priisdk@gmail.com> |
|---|---|
| Date | 2015-10-25 05:00 -0700 |
| Message-ID | <f6b73797-c758-47d6-a527-27120f7af70e@googlegroups.com> |
| In reply to | #97940 |
Yes, I also have that formula - one equation with two unknowns. I guess there is some simple solution but I haven't yet managed to find it. But this is mathematics, not programming. Still, there might be some well-written fast algorithm out there...? Poul Riis
[toc] | [prev] | [next] | [standalone]
| From | Peter Pearson <pkpearson@nowhere.invalid> |
|---|---|
| Date | 2015-10-25 17:01 +0000 |
| Message-ID | <d94g7nFns35U1@mid.individual.net> |
| In reply to | #97940 |
On Sun, 25 Oct 2015 09:55:30 +0100, Christian Gollwitzer wrote:
> Am 25.10.15 um 00:44 schrieb Poul Riis:
>> A program that does what I want can be found below. However, I want
>> to know if there is a faster way because I have many moving points
>> and the two nearest neighbours must be identified a lot of times.
>>
>
> Ah, that changes the game. But I'm still puzzled - is this first scipy
> function fast enough? Because your second computation using teh nested
> for loops in my tests gave exactly the same index. I'm not sure it is
> guaranteed, but it seems that the pdist function uses the same ordering
> as your code. In which case you can simply calculate i and j from the
> index n.
>
> For instance, if you have i and j, you can compute n by the formula:
>
> print('The index computed: ', N*istore-istore-istore*(istore+1)/2+jstore-1)
>
> Now you are left with determining i and j from n. You need to find the
> largest i such that the j from solving the above equation is not
> negative. If you are lazy, and it is fast enough, you can compute this
> value in a loop - it is O(n). Or you can try solvin it as a quadratic
> inequation for i.
My algebra is a little rusty, but I think this boils down to
i = floor(0.5*(2*N-3 - sqrt((2*N-3)**2 - 8*(n+1))))
Sorry I don't have time to test it right now.
--
To email me, substitute nowhere->runbox, invalid->com.
[toc] | [prev] | [next] | [standalone]
| From | Bartc <bc@freeuk.com> |
|---|---|
| Date | 2015-10-25 12:36 +0000 |
| Message-ID | <n0ii97$7fe$1@dont-email.me> |
| In reply to | #97938 |
On 24/10/2015 23:44, Poul Riis wrote: > def calc_dist(p1,p2): > return math.sqrt((p2[0] - p1[0]) ** 2 + > (p2[1] - p1[1]) ** 2 + > (p2[2] - p1[2]) ** 2) > dist=calc_dist(pts[i],pts[j]) > if n==0: > distold=dist > else: > if dist <distold: Just to compare two distances, you can probably work with the squares. You don't need the square root in calc_dist. You will need it for the final report, but that's only one square root, not thousands or millions. -- Bartc
[toc] | [prev] | [next] | [standalone]
| From | Fabien <fabien.maussion@gmail.com> |
|---|---|
| Date | 2015-10-26 03:25 +0100 |
| Message-ID | <n0k2u1$gqb$2@speranza.aioe.org> |
| In reply to | #97936 |
On 10/24/2015 10:05 PM, Poul Riis wrote: > With scipy.spatial.distance.pdist I don't really understand your problem. With scipy.spatial.distance.pdist you'll have all you need: From the docs: Returns: Returns a condensed distance matrix Y. For each i and j (where i<j<n), the metric dist(u=X[i], v=X[j]) is computed and stored in entry ij. So you have the distances, the indexes of the points so you also have the value of these points...
[toc] | [prev] | [next] | [standalone]
| From | Tom P <werotizy@freent.dd> |
|---|---|
| Date | 2015-10-31 22:28 +0100 |
| Message-ID | <d9kq38Ft8k9U1@mid.individual.net> |
| In reply to | #97936 |
On 10/24/2015 10:05 PM, Poul Riis wrote:
> I have N points in 3D, organized in a list. I want to to point out the numbers of the two that have the smallest distance.
> With scipy.spatial.distance.pdist I can make a list of all the distances, and I can point out the number of the minimum value of that list (see simple example below - the line with pts.append... should be indented three times). But I guess there is a standard (numpy?) routine which points out the numbers of the corresponding two points but I cannot find it. Can someone help?
>
> Poul Riis
>
>
>
>
> import numpy as np
> import scipy
> from scipy.spatial.distance import pdist
>
>
> pts=[]
> for i in range(-1,2):
> for j in range(-1,2):
> for k in range(-1,2): pts.append((i+np.random.random()/10,j+np.random.random()/10,k+np.random.random()/10))
> for i in range(0,len(pts)):
> print(pts[i])
> distances=scipy.spatial.distance.pdist(pts)
> n=np.argmin(distances)
> for i in range(0,len(distances)):
> print(i,distances[i])
> print('The minimum distance is: ',min(distances),' which has number ',n)
>
I won't claim to have the definitive answer but - is this a clustering
problem? Did you look at any machine learning packages?
[toc] | [prev] | [next] | [standalone]
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2015-10-31 23:00 -0400 |
| Message-ID | <mailman.6.1446346873.4463.python-list@python.org> |
| In reply to | #98024 |
On 10/31/2015 5:28 PM, Tom P wrote: > On 10/24/2015 10:05 PM, Poul Riis wrote: >> I have N points in 3D, organized in a list. I want to to point out the >> numbers of the two that have the smallest distance. >> With scipy.spatial.distance.pdist I can make a list of all the >> distances, and I can point out the number of the minimum value of that >> list (see simple example below distances is conceptually a 2-d matrix, but since it would be symmetrical, I gather that it is stored as 1-d represention You should be able to find the formula that converts the linear index to the indexes of the corresponding symmetric matrix. It depends on whether distances represent the upper or lower triangle of the symmetric matrix and whether the triangular matrix is stored by rows or columns. -- Terry Jan Reedy
[toc] | [prev] | [standalone]
Back to top | Article view | comp.lang.python
csiph-web