Find unique rows in numpy.array

I need to find the unique rows in a numpy.array.

For example:

>>> a # I have
array([[1, 1, 1, 0, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [1, 1, 1, 0, 0, 0],
       [1, 1, 1, 1, 1, 0]])
>>> new_a # I want to get to
array([[1, 1, 1, 0, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [1, 1, 1, 1, 1, 0]])

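I know that I can create a set and loop over the array, but I am looking for an efficient pure numpy solution. I believe there is a way to set the data type to void and then just use numpy.unique, but I couldn't figure out how to make it work.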

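Current answer

Based on the answers on this page, I wrote a function that replicates the behavior of MATLAB's unique(input, 'rows'), with the additional feature of accepting a tolerance for the uniqueness check. It also returns the indices ia and ic such that c = data[ia, :] and data = c[ic, :]. Please report any discrepancies or errors you find.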

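import numpy as np

def unique_rows(data, prec=5):
    # Truncate to `prec` decimal places; adding 0.0 turns -0.0 into 0.0
    d_r = np.fix(data * 10 ** prec) / 10 ** prec + 0.0
    # View each row as one opaque (void) item so np.unique compares whole rows
    b = np.ascontiguousarray(d_r).view(np.dtype((np.void, d_r.dtype.itemsize * d_r.shape[1])))
    # One call returns the unique items, first-occurrence indices and inverse indices
    u, ia, ic = np.unique(b, return_index=True, return_inverse=True)
    # ravel() keeps ic one-dimensional so that c[ic, :] has the shape of data
    return u.view(d_r.dtype).reshape(-1, d_r.shape[1]), ia, ic.ravel()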
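
As a rough cross-check of the idea above, here is a minimal sketch that gets the same three outputs from the axis argument of np.unique (available since NumPy 1.13) instead of a void view. The helper name unique_rows_axis and the sample data are made up for illustration; this is not the answer's own code.

```python
import numpy as np

def unique_rows_axis(data, prec=5):
    # Hypothetical helper: truncate to `prec` decimals, then deduplicate rows
    scale = 10 ** prec
    d_r = np.fix(np.asarray(data, dtype=float) * scale) / scale + 0.0
    c, ia, ic = np.unique(d_r, axis=0, return_index=True, return_inverse=True)
    # ravel() keeps ic one-dimensional regardless of the NumPy version
    return c, ia, ic.ravel()

# Made-up data: the first two rows differ only beyond the fifth decimal place
data = np.array([[0.1234561, 1.0],
                 [0.1234569, 1.0],
                 [2.0, 3.0]])
c, ia, ic = unique_rows_axis(data)
print(c.shape)   # number of distinct rows after truncation, by number of columns
print(ic)        # position in c of each original row
```

With the default prec=5 the first two rows are treated as the same row, so c[ic] reproduces data only up to the truncation.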

2014-11-11 14:57:55

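Other answers

np.unique works by sorting a flattened array and then checking whether each item is equal to the previous one. This can be done manually, without flattening: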

ind = np.lexsort(a.T)
a[ind[np.concatenate(([True],np.any(a[ind[1:]]!=a[ind[:-1]],axis=1)))]]

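This method does not use tuples, and should be faster and simpler than the other methods given here.

NOTE: A previous version of this did not have the ind right after a[, which meant that the wrong indices were used. Also, Joe Kington makes a good point that this does create a variety of intermediate copies. The following method makes fewer, by creating one sorted copy and then using views of it: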

b = a[np.lexsort(a.T)]
b[np.concatenate(([True], np.any(b[1:] != b[:-1],axis=1)))]

This is faster and uses less memory.
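
To make the steps easier to follow, here is a minimal sketch that wraps the sorted-copy variant in a function. The name unique_rows_lexsort and the empty-input guard are additions for illustration, not part of the original answer.

```python
import numpy as np

def unique_rows_lexsort(a):
    """Hypothetical wrapper around the lexsort approach, for 2-D arrays."""
    a = np.asarray(a)
    if a.shape[0] == 0:
        # Nothing to compare; the mask construction below assumes at least one row
        return a.copy()
    # np.lexsort treats the LAST key as the primary one, so rows are ordered
    # by the last column first; any consistent ordering groups duplicates together
    b = a[np.lexsort(a.T)]
    # Keep the first row, then every row that differs from its predecessor
    keep = np.concatenate(([True], np.any(b[1:] != b[:-1], axis=1)))
    return b[keep]

a = np.array([[1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 0]])
new_a = unique_rows_lexsort(a)
print(new_a)
```

Note that the result comes back in lexsort order, which is not necessarily the order in which the rows first appear in a.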

Also, if you want to find the unique rows of an ndarray regardless of how many dimensions it has, the following will work:

b = a[np.lexsort(a.reshape((a.shape[0], -1)).T)]
b[np.concatenate(([True], np.any(b[1:]!=b[:-1],axis=tuple(range(1,a.ndim)))))]

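An interesting remaining question is how to sort/unique along an arbitrary axis of an array of arbitrary dimension, which would be more difficult.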
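
For what it is worth, NumPy 1.13 and later added an axis argument to np.unique that covers the arbitrary-axis case directly. A minimal sketch with a made-up 3-D array (the array and variable names are illustrative only):

```python
import numpy as np

# Hypothetical array of shape (2, 4, 2): along axis 1, the slices at
# positions 0 and 2 are identical
x = np.array([[[1, 2], [3, 4], [1, 2], [5, 6]],
              [[7, 8], [9, 0], [7, 8], [1, 1]]])

# Deduplicate the sub-arrays indexed by axis 1
u = np.unique(x, axis=1)
print(u.shape)
```

The duplicate slice is dropped and the remaining slices are returned in sorted order, so the other two axes keep their lengths.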

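Edit:

To demonstrate the speed differences, I ran a few tests in ipython on the three different methods described in the answers. With your exact a, there isn't much of a difference, though this version is slightly faster: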

In [87]: %timeit unique(a.view(dtype)).view('<i8')
10000 loops, best of 3: 48.4 us per loop

In [88]: %timeit ind = np.lexsort(a.T); a[np.concatenate(([True], np.any(a[ind[1:]]!= a[ind[:-1]], axis=1)))]
10000 loops, best of 3: 37.6 us per loop

In [89]: %timeit b = [tuple(row) for row in a]; np.unique(b)
10000 loops, best of 3: 41.6 us per loop

With a larger a, however, this version ends up being much faster:

In [96]: a = np.random.randint(0,2,size=(10000,6))

In [97]: %timeit unique(a.view(dtype)).view('<i8')
10 loops, best of 3: 24.4 ms per loop

In [98]: %timeit b = [tuple(row) for row in a]; np.unique(b)
10 loops, best of 3: 28.2 ms per loop

In [99]: %timeit ind = np.lexsort(a.T); a[np.concatenate(([True],np.any(a[ind[1:]]!= a[ind[:-1]],axis=1)))]
100 loops, best of 3: 3.25 ms per loop
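
For readers who want to rerun a comparison like this outside ipython, here is a minimal sketch using the standard timeit module. The three functions are stand-ins written for this sketch, not the exact expressions timed above, and the seed and repeat count are arbitrary. No timings are quoted because they depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed, for repeatability
a = rng.integers(0, 2, size=(10000, 6))   # same shape as the larger test above

def via_lexsort(a):
    # Sorted copy, then keep rows that differ from their predecessor
    b = a[np.lexsort(a.T)]
    return b[np.concatenate(([True], np.any(b[1:] != b[:-1], axis=1)))]

def via_tuples(a):
    # Pure-Python set of row tuples, converted back to an array
    return np.array(sorted({tuple(row) for row in a}))

def via_unique_axis(a):
    # Built-in row deduplication (NumPy 1.13+)
    return np.unique(a, axis=0)

for fn in (via_lexsort, via_tuples, via_unique_axis):
    seconds = timeit.timeit(lambda: fn(a), number=5) / 5
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```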

2013-06-06 20:13:15

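Here is another variation of @Greg's Pythonic answer:

# np.vstack expects a sequence (list or tuple); newer NumPy versions reject a bare set
np.vstack(list(set(map(tuple, a))))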

2014-12-11 14:45:28

Yet another possible solution:

# Wrapped in list() because np.vstack expects a sequence rather than a set
np.vstack(list({tuple(row) for row in a}))

2014-04-08 15:37:39

Let's get the entire numpy matrix as a list, drop the duplicates from that list, and finally convert the unique list back into a numpy matrix:

matrix_as_list = data.tolist()
# matrix_as_list:
# [[1, 1, 1, 0, 0, 0], [0, 1, 1, 1, 0, 0], [0, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 0]]

uniq_list = []
# Keep each row the first time it appears (quadratic membership test, fine for small inputs)
for item in matrix_as_list:
    if item not in uniq_list:
        uniq_list.append(item)

unique_matrix = np.array(uniq_list)
# unique_matrix:
# array([[1, 1, 1, 0, 0, 0],
#        [0, 1, 1, 1, 0, 0],
#        [1, 1, 1, 1, 1, 0]])
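
The same first-appearance order can be obtained without the repeated list scan. A minimal sketch, assuming Python 3.7+ (where dict preserves insertion order) and reusing the example array under the name data:

```python
import numpy as np

data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 1, 1, 1, 0, 0],
                 [0, 1, 1, 1, 0, 0],
                 [1, 1, 1, 0, 0, 0],
                 [1, 1, 1, 1, 1, 0]])

# Rows become hashable tuples; dict.fromkeys drops repeats while
# keeping the order in which each row was first seen
unique_matrix = np.array(list(dict.fromkeys(map(tuple, data.tolist()))))
print(unique_matrix)
```

Each row is hashed once, so this avoids the quadratic cost of testing membership in a growing list.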

2017-09-18 03:48:16

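None of these answers worked for me, I assume because my unique rows contained strings rather than numbers. However, this answer from another thread did work:

Source: https://stackoverflow.com/a/38461043/5402386

You can use the list methods .count() and .index():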

coor = np.array([[10, 10], [12, 9], [10, 5], [12, 9]])
coor_tuple = [tuple(x) for x in coor]
unique_coor = sorted(set(coor_tuple), key=lambda x: coor_tuple.index(x))
unique_count = [coor_tuple.count(x) for x in unique_coor]
unique_index = [coor_tuple.index(x) for x in unique_coor]
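
Because .count() and .index() rescan the list for every unique row, this gets slow on large inputs. Below is a minimal sketch of the same three results computed with np.unique and its axis argument (NumPy 1.13+), shown only for the numeric coor array above; the variable names rows, first_idx and order are made up for this sketch.

```python
import numpy as np

coor = np.array([[10, 10], [12, 9], [10, 5], [12, 9]])

# np.unique returns rows in sorted order, together with the index of each
# row's first occurrence and how many times it occurs
rows, first_idx, counts = np.unique(coor, axis=0, return_index=True, return_counts=True)

# Reorder everything by first occurrence to match the list-based version
order = np.argsort(first_idx)
unique_coor = rows[order]
unique_index = first_idx[order]
unique_count = counts[order]
print(unique_coor, unique_index, unique_count)
```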

2017-02-13 22:55:07
