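How do I get the indices of the N maximum values in a NumPy array?

NumPy provides a way to get the index of the maximum value of an array via np.argmax.

I would like something similar, but returning the indices of the N maximum values.

For example, if I have the array [1, 3, 2, 4, 5], then nargmax(array, n=3) would return the indices [4, 3, 1], which correspond to the elements [5, 4, 3].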

Current answer

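I think the most time-efficient approach is to iterate through the array manually while maintaining a min-heap of size k, as others have mentioned.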
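A minimal sketch of that size-k min-heap idea, using the standard-library heapq module. The function name top_k_indices_heap is hypothetical, not part of NumPy; it accepts a NumPy array or any iterable of comparable values.

```python
import heapq


def top_k_indices_heap(arr, k):
    """Hypothetical helper: indices of the k largest values, largest first.

    Keeps a min-heap of at most k (value, index) pairs, so the smallest of the
    current top k sits at heap[0] and is evicted when a larger value arrives.
    """
    heap = []
    for i, v in enumerate(arr):
        if len(heap) < k:
            heapq.heappush(heap, (v, i))
        elif v > heap[0][0]:
            # Pop the current smallest and push the new pair in one step.
            heapq.heapreplace(heap, (v, i))
    # Order the survivors by value, largest first, and keep only the indices.
    return [i for v, i in sorted(heap, reverse=True)]


print(top_k_indices_heap([1, 3, 2, 4, 5], 3))  # [4, 3, 1]
```

This is O(n log k), but the loop runs in pure Python, so on large NumPy arrays a vectorized approach is usually faster in practice.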

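I also came up with a brute-force approach:

import numpy as np

work = my_array.astype(float)  # float copy: int arrays cannot hold -inf, and my_array stays intact
top_k_index_list = []
for i in range(k):
    top_k_index_list.append(np.argmax(work))
    work[top_k_index_list[-1]] = -np.inf  # hide this maximum from the next argmax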

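After using argmax to get its index, set the largest element to negative infinity, which compares below every real value. The next call to argmax then returns the second-largest element. If you run this directly on the original array instead of a copy, record the overwritten values so you can restore them afterwards.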
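A minimal sketch of the in-place variant with the restore step described above. The helper name top_k_by_repeated_argmax is hypothetical, and it assumes a float array (so -inf can be stored). The saved values are restored in reverse order so that, if an index were ever overwritten twice, the original value is the one written last.

```python
import numpy as np


def top_k_by_repeated_argmax(my_array, k):
    """Hypothetical helper: k rounds of argmax on a float array, then undo."""
    saved = []  # (index, original value) pairs to put back afterwards
    for _ in range(k):
        idx = int(np.argmax(my_array))
        saved.append((idx, my_array[idx]))
        my_array[idx] = -np.inf  # requires a floating-point dtype
    # Restore in reverse order so the earliest (true original) value wins.
    for idx, value in reversed(saved):
        my_array[idx] = value
    return [idx for idx, _ in saved]


a = np.array([1.0, 3.0, 2.0, 4.0, 5.0])
print(top_k_by_repeated_argmax(a, 3))  # [4, 3, 1]
print(a)  # unchanged after the call
```

Each argmax scans the whole array, so this costs O(k·n): fine for small k, but it degrades as k grows.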

2018-04-25 10:09:07

Other answers

The simplest I've been able to come up with is:

>>> import numpy as np
>>> arr = np.array([1, 3, 2, 4, 5])
>>> arr.argsort()[-3:][::-1]
array([4, 3, 1])

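This involves a full sort of the array. I wonder whether numpy provides a built-in way to do a partial sort; so far I haven't found one.

If this solution turns out to be too slow (especially for small n), it may be worth writing something in Cython.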
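NumPy has since added the partial sort wondered about here: np.argpartition (available from NumPy 1.8). A minimal sketch applying it to the question's array. The partition step runs in linear time on average, and only the n selected indices are then fully sorted.

```python
import numpy as np

arr = np.array([1, 3, 2, 4, 5])
n = 3
# With kth=-n, the last n positions hold the indices of the n largest values,
# in no guaranteed order.
top_n = np.argpartition(arr, -n)[-n:]
# Sort only those n indices by their values, largest first.
top_n = top_n[np.argsort(arr[top_n])[::-1]]
print(top_n)  # [4 3 1]
```

If the order among the top n does not matter, the second step can be dropped.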

2011-08-02 10:32:53

Use:

from operator import itemgetter
from heapq import nlargest
result = nlargest(N, enumerate(your_list), itemgetter(1))

result is now a list of N (index, value) tuples holding the N largest values, ordered from largest to smallest.
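Applied to the array from the question and keeping only the indices, the same nlargest call might look like this (key is passed by keyword here for readability; it is the same argument as the positional itemgetter(1) above):

```python
from heapq import nlargest
from operator import itemgetter

import numpy as np

arr = np.array([1, 3, 2, 4, 5])
# enumerate yields (index, value) pairs; rank them by the value (item 1).
pairs = nlargest(3, enumerate(arr), key=itemgetter(1))
indices = [i for i, _ in pairs]
print(indices)  # [4, 3, 1]
```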

2016-04-17 10:06:35

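Here is a very simple way to see the maximum elements and their positions. axis selects the dimension along which the maximum is taken: for a 2D array, axis = 0 gives the maximum of each column and axis = 1 the maximum of each row. For higher dimensions, pick whichever axis suits your data.

import numpy as np

M = np.random.random((3, 4))
print(M)
print(M.max(axis=1), M.argmax(axis=1))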
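argmax gives only the single largest entry per row. A minimal sketch extending the same axis idea to the top n entries per row with np.argsort and np.take_along_axis; M is fixed here instead of random so the result can be checked by hand.

```python
import numpy as np

M = np.array([[0.1, 0.9, 0.5, 0.3],
              [0.7, 0.2, 0.8, 0.4],
              [0.6, 0.0, 0.3, 0.95]])
n = 2
# argsort along axis=1 sorts each row independently (ascending); take the
# last n columns and reverse them to get the top-n indices, largest first.
idx = np.argsort(M, axis=1)[:, -n:][:, ::-1]
# Gather the matching values row by row.
vals = np.take_along_axis(M, idx, axis=1)
print(idx)
print(vals)
```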

2018-06-16 08:20:27

This outperforms argsort when top_k is much smaller than the length of the axis (top_k << axis_length).

import numpy as np

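def get_sorted_top_k(array, top_k=1, axis=-1, reverse=False):
    """Return (values, indices) of top_k entries along `axis`, sorted.

    reverse=True selects the largest values in descending order;
    reverse=False selects the smallest values in ascending order.
    """
    if reverse:
        axis_length = array.shape[axis]
        # argpartition moves the top_k largest into the last top_k slots (unordered)
        partition_index = np.take(np.argpartition(array, kth=-top_k, axis=axis),
                                  range(axis_length - top_k, axis_length), axis)
    else:
        # kth=top_k - 1 puts the top_k smallest into the first top_k slots (unordered)
        # and, unlike kth=top_k, also works when top_k equals the axis length
        partition_index = np.take(np.argpartition(array, kth=top_k - 1, axis=axis), range(0, top_k), axis)
    top_scores = np.take_along_axis(array, partition_index, axis)
    # sort only the top_k selected entries
    sorted_index = np.argsort(top_scores, axis=axis)
    if reverse:
        sorted_index = np.flip(sorted_index, axis=axis)
    top_sorted_scores = np.take_along_axis(top_scores, sorted_index, axis)
    top_sorted_indexes = np.take_along_axis(partition_index, sorted_index, axis)
    return top_sorted_scores, top_sorted_indexes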

if __name__ == "__main__":
    import time
    from sklearn.metrics.pairwise import cosine_similarity

    x = np.random.rand(10, 128)
    y = np.random.rand(1000000, 128)
    z = cosine_similarity(x, y)
    start_time = time.time()
    sorted_index_1 = get_sorted_top_k(z, top_k=3, axis=1, reverse=True)[1]
    print(time.time() - start_time)
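The timing above measures the partition-based function on its own. To compare it against argsort directly, here is a minimal sketch that times both on the same data with timeit. The helper names top_k_argsort and top_k_argpartition are hypothetical, and a random matrix stands in for the cosine-similarity output so sklearn is not needed. Results will vary with hardware and array shape.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
z = rng.random((10, 100_000))  # stand-in for the cosine-similarity matrix
k = 3


def top_k_argsort(a, k):
    # Full sort of every row, then the first k columns of the reversed order.
    return np.argsort(a, axis=1)[:, ::-1][:, :k]


def top_k_argpartition(a, k):
    # Partial sort: the k largest land in the last k columns, unordered.
    part = np.argpartition(a, -k, axis=1)[:, -k:]
    vals = np.take_along_axis(a, part, axis=1)
    # Sort just those k columns, largest first.
    order = np.argsort(vals, axis=1)[:, ::-1]
    return np.take_along_axis(part, order, axis=1)


print("argsort:     ", timeit.timeit(lambda: top_k_argsort(z, k), number=3))
print("argpartition:", timeit.timeit(lambda: top_k_argpartition(z, k), number=3))
```

With distinct values (as here) both helpers return identical index arrays, so only the running time differs.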

2021-01-13 09:05:10

Depending on the size of the original array and the size of your selection, this can be faster than a full sort:

>>> A = np.random.randint(0,10,10)
>>> A
array([5, 1, 5, 5, 2, 3, 2, 4, 1, 0])
>>> B = np.zeros(3, int)
>>> for i in range(3):
...     idx = np.argmax(A)
...     B[i] = idx; A[idx] = 0  # in general use a value below A.min(); 0 happens to suffice here
...     
>>> B
array([0, 2, 3])

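Of course, this tampers with the original array. If that matters, you can work around it by making a copy first or by putting the original values back afterwards, whichever is cheaper for your use case.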

2011-08-02 13:54:12

