如何在numpy数组中找到最近的值?例子:
np.find_nearest(array, value)
如何在numpy数组中找到最近的值?例子:
np.find_nearest(array, value)
当前回答
如果你有很多值需要搜索(值可以是多维数组),下面是@Dimitri的快速向量化解决方案:
# `values` should be sorted
def get_closest(array, values):
# make sure array is a numpy array
array = np.array(array)
# get insert positions
idxs = np.searchsorted(array, values, side="left")
# find indexes where previous index is closer
prev_idx_is_less = ((idxs == len(array))|(np.fabs(values - array[np.maximum(idxs-1, 0)]) < np.fabs(values - array[np.minimum(idxs, len(array)-1)])))
idxs[prev_idx_is_less] -= 1
return array[idxs]
基准
>使用@Demitri的解决方案比使用for循环快100倍”
>>> %timeit ar=get_closest(np.linspace(1, 1000, 100), np.random.randint(0, 1050, (1000, 1000)))
139 ms ± 4.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit ar=[find_nearest(np.linspace(1, 1000, 100), value) for value in np.random.randint(0, 1050, 1000*1000)]
took 21.4 seconds
其他回答
下面是一个使用2D数组的版本,如果用户拥有scipy的cdist函数,则使用它,如果用户没有,则使用更简单的距离计算。
默认情况下,输出是最接近输入值的索引,但您可以使用output关键字将其更改为'index', 'value'或'both'之一,其中'value'输出数组[index], 'both'输出索引,数组[index]。
对于非常大的数组,您可能需要使用kind='euclidean',因为默认的scipy cdist函数可能会耗尽内存。
这可能不是绝对最快的解决方案,但已经很接近了。
def find_nearest_2d(array, value, kind='cdist', output='index'):
# 'array' must be a 2D array
# 'value' must be a 1D array with 2 elements
# 'kind' defines what method to use to calculate the distances. Can choose one
# of 'cdist' (default) or 'euclidean'. Choose 'euclidean' for very large
# arrays. Otherwise, cdist is much faster.
# 'output' defines what the output should be. Can be 'index' (default) to return
# the index of the array that is closest to the value, 'value' to return the
# value that is closest, or 'both' to return index,value
import numpy as np
if kind == 'cdist':
try: from scipy.spatial.distance import cdist
except ImportError:
print("Warning (find_nearest_2d): Could not import cdist. Reverting to simpler distance calculation")
kind = 'euclidean'
index = np.where(array == value)[0] # Make sure the value isn't in the array
if index.size == 0:
if kind == 'cdist': index = np.argmin(cdist([value],array)[0])
elif kind == 'euclidean': index = np.argmin(np.sum((np.array(array)-np.array(value))**2.,axis=1))
else: raise ValueError("Keyword 'kind' must be one of 'cdist' or 'euclidean'")
if output == 'index': return index
elif output == 'value': return array[index]
elif output == 'both': return index,array[index]
else: raise ValueError("Keyword 'output' must be one of 'index', 'value', or 'both'")
import numpy as np
def find_nearest(array, value):
array = np.asarray(array)
idx = (np.abs(array - value)).argmin()
return array[idx]
使用示例:
array = np.random.random(10)
print(array)
# [ 0.21069679 0.61290182 0.63425412 0.84635244 0.91599191 0.00213826
# 0.17104965 0.56874386 0.57319379 0.28719469]
print(find_nearest(array, value=0.5))
# 0.568743859261
也许对ndarray有帮助:
def find_nearest(X, value):
return X[np.unravel_index(np.argmin(np.abs(X - value)), X.shape)]
所有的答案都有助于收集信息来编写高效的代码。但是,我已经编写了一个小的Python脚本来针对各种情况进行优化。如果提供的数组已排序,则将是最佳情况。如果搜索一个指定值的最近点的索引,那么对半模块是最省时的。当一个索引对应一个数组时,numpy searchsorted是最有效的。
import numpy as np
import bisect
xarr = np.random.rand(int(1e7))
srt_ind = xarr.argsort()
xar = xarr.copy()[srt_ind]
xlist = xar.tolist()
bisect.bisect_left(xlist, 0.3)
In[63]: %时间平分。bisect_left (xlist, 0.3) CPU次数:user 0ns, sys: 0ns, total: 0ns 壁时间:22.2µs
np.searchsorted(xar, 0.3, side="left")
In [64]: %time np。Searchsorted (xar, 0.3, side="left") CPU次数:user 0ns, sys: 0ns, total: 0ns 壁时间:98.9µs
randpts = np.random.rand(1000)
np.searchsorted(xar, randpts, side="left")
%的时间np。Searchsorted (xar, randpts, side="left") CPU次数:用户4ms, sys: 0ns, total: 4ms 壁时间:1.2 ms
如果我们遵循乘法规则,那么numpy应该花费~100 ms,这意味着快了~83倍。
下面是@Ari Onasafari的scipy版本,回答“在向量数组中找到最近的向量”
In [1]: from scipy import spatial
In [2]: import numpy as np
In [3]: A = np.random.random((10,2))*100
In [4]: A
Out[4]:
array([[ 68.83402637, 38.07632221],
[ 76.84704074, 24.9395109 ],
[ 16.26715795, 98.52763827],
[ 70.99411985, 67.31740151],
[ 71.72452181, 24.13516764],
[ 17.22707611, 20.65425362],
[ 43.85122458, 21.50624882],
[ 76.71987125, 44.95031274],
[ 63.77341073, 78.87417774],
[ 8.45828909, 30.18426696]])
In [5]: pt = [6, 30] # <-- the point to find
In [6]: A[spatial.KDTree(A).query(pt)[1]] # <-- the nearest point
Out[6]: array([ 8.45828909, 30.18426696])
#how it works!
In [7]: distance,index = spatial.KDTree(A).query(pt)
In [8]: distance # <-- The distances to the nearest neighbors
Out[8]: 2.4651855048258393
In [9]: index # <-- The locations of the neighbors
Out[9]: 9
#then
In [10]: A[index]
Out[10]: array([ 8.45828909, 30.18426696])