Find unique rows in numpy.array

I need to find the unique rows in a numpy.array.

For example:

>>> a # I have
array([[1, 1, 1, 0, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [1, 1, 1, 0, 0, 0],
       [1, 1, 1, 1, 1, 0]])
>>> new_a # I want to get to
array([[1, 1, 1, 0, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [1, 1, 1, 1, 1, 0]])

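I know that I can create a set and loop over the array, but I am looking for an efficient pure numpy solution. I believe there is a way to set the data type to void and then just use numpy.unique, but I couldn't figure out how to make it work.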

Current answer

np.unique works when given a list of tuples:

>>> np.unique([(1, 1), (2, 2), (3, 3), (4, 4), (2, 2)])
Out[9]: 
array([[1, 1],
       [2, 2],
       [3, 3],
       [4, 4]])

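With a list of lists, it raises TypeError: unhashable type: 'list'. Note that this relies on the behavior of NumPy releases from that time; current releases flatten the input and return array([1, 2, 3, 4]) unless axis is given.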
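
For readers on a recent NumPy, here is a minimal sketch of the row-wise call, assuming NumPy 1.13 or later (where the axis argument of np.unique is available). The array is the one from the question; new_a is just the name the question uses for the desired result. The rows come back sorted, not in order of first appearance.

```python
import numpy as np

# The array from the question.
a = np.array([[1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 0]])

# axis=0 makes np.unique treat each row as one item instead of flattening.
new_a = np.unique(a, axis=0)
print(new_a)
```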

2013-06-06 19:59:24

Other answers

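Based on the answers on this page, I have written a function that replicates the functionality of MATLAB's unique(input,'rows') function, with the additional feature of accepting a tolerance for checking uniqueness. It also returns the indices such that c = data[ia,:] and data = c[ic,:]. Please report if you see any discrepancies or errors.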

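import numpy as np

def unique_rows(data, prec=5):
    # Truncate to `prec` decimals; adding 0.0 turns -0.0 into 0.0 so both compare equal
    d_r = np.fix(data * 10 ** prec) / 10 ** prec + 0.0
    # View each row as one opaque void item so np.unique compares whole rows
    b = np.ascontiguousarray(d_r).view(np.dtype((np.void, d_r.dtype.itemsize * d_r.shape[1]))).ravel()
    # A single call returns the unique items together with both index arrays
    u, ia, ic = np.unique(b, return_index=True, return_inverse=True)
    return u.view(d_r.dtype).reshape(-1, d_r.shape[1]), ia, ic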
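
To illustrate what the returned indices mean, here is a small sketch under stated assumptions. It is not the function above: the name unique_rows_tol is hypothetical, and it swaps the void view for np.unique(..., axis=0) to keep the example short. The sample data is made up so that two rows differ only beyond the fifth decimal.

```python
import numpy as np

def unique_rows_tol(data, prec=5):
    """Hypothetical variant: unique rows after truncating to `prec` decimals."""
    data = np.asarray(data, dtype=float)
    scale = 10.0 ** prec
    # Truncate toward zero, then add 0.0 so that -0.0 and 0.0 compare equal.
    d_r = np.fix(data * scale) / scale + 0.0
    c, ia, ic = np.unique(d_r, axis=0, return_index=True, return_inverse=True)
    # np.ravel guards against NumPy versions that return a 2-D inverse array.
    return c, ia, np.ravel(ic)

data = np.array([[0.5,       1.25],
                 [0.5000001, 1.25],   # equal to row 0 once truncated
                 [2.0,       3.0],
                 [0.5,       1.25]])

c, ia, ic = unique_rows_tol(data, prec=5)
print(c)    # the unique (truncated) rows
print(ia)   # index of the first occurrence of each unique row in data
print(ic)   # for each row of data, its position in c
```

Indexing with ia picks one representative per unique row, and c[ic] rebuilds the truncated data row by row.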

2014-11-11 14:57:55

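The most straightforward solution is to turn each row into a single item by converting it to a string. numpy can then compare each row as a whole to determine its uniqueness. This solution is generalizable: you just need to reshape and transpose your array for other combinations. Here is the solution for the problem provided.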

import numpy as np

original = np.array([[1, 1, 1, 0, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [1, 1, 1, 0, 0, 0],
       [1, 1, 1, 1, 1, 0]])

uniques, index = np.unique([str(i) for i in original], return_index=True)
cleaned = original[index]
print(cleaned)

Gives:

 array([[0, 1, 1, 1, 0, 0],
        [1, 1, 1, 0, 0, 0],
        [1, 1, 1, 1, 1, 0]])

Send my Nobel Prize in the mail
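
The result above is sorted, whereas new_a in the question keeps the rows in order of first appearance. If that order matters, one possible sketch is to sort the first-occurrence indices before indexing. It assumes NumPy 1.13 or later and uses np.unique(..., axis=0) in place of the string conversion; the name ordered is only illustrative.

```python
import numpy as np

original = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 1, 1, 1, 0, 0],
                     [0, 1, 1, 1, 0, 0],
                     [1, 1, 1, 0, 0, 0],
                     [1, 1, 1, 1, 1, 0]])

# return_index gives, for each unique row, the index of its first occurrence.
_, index = np.unique(original, axis=0, return_index=True)

# Sorting those indices restores the order in which the rows first appear.
ordered = original[np.sort(index)]
print(ordered)
```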

2016-09-22 07:53:27

Yet another possible solution

# list(...) because recent NumPy versions require a sequence, not a set
np.vstack(list({tuple(row) for row in a}))

2014-04-08 15:37:39

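I compared the suggested alternatives for speed and was surprised to find that the void view unique solution is even a bit faster than numpy's native unique with the axis argument. If you're looking for speed, you'll want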

numpy.unique(
    a.view(numpy.dtype((numpy.void, a.dtype.itemsize*a.shape[1])))
).view(a.dtype).reshape(-1, a.shape[1])
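
The one-liner can be unpacked into steps as a rough sketch. The function name unique_rows_void is hypothetical, and the np.ascontiguousarray call is an added precaution (not part of the expression above) for inputs such as strided slices, whose memory layout may not allow the view.

```python
import numpy as np

def unique_rows_void(a):
    """Hypothetical helper: unique rows of a 2-D array via a void view."""
    # Make sure each row occupies one contiguous chunk of memory.
    a = np.ascontiguousarray(a)
    # One void item spans the bytes of a whole row.
    row_dtype = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    rows = a.view(row_dtype).ravel()
    # Deduplicate the opaque row items, then reinterpret them as numbers.
    return np.unique(rows).view(a.dtype).reshape(-1, a.shape[1])

a = np.array([[1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 0]])

full = unique_rows_void(a)            # unique rows of the full array
strided = unique_rows_void(a[:, ::2]) # also works on a strided slice
print(full)
print(strided)
```

Because rows are compared as raw bytes, the order of the result is not guaranteed to match numeric row order in general.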

I've implemented the fastest variant in npx.unique_rows.

There is a bug report on GitHub for this, too.

Code to reproduce the plot:

import numpy
import perfplot


def unique_void_view(a):
    return (
        numpy.unique(a.view(numpy.dtype((numpy.void, a.dtype.itemsize * a.shape[1]))))
        .view(a.dtype)
        .reshape(-1, a.shape[1])
    )


def lexsort(a):
    ind = numpy.lexsort(a.T)
    return a[
        ind[numpy.concatenate(([True], numpy.any(a[ind[1:]] != a[ind[:-1]], axis=1)))]
    ]


def vstack(a):
    # Deduplicate through a set of tuples; vstack needs a sequence, hence list()
    return numpy.vstack(list({tuple(row) for row in a}))


def unique_axis(a):
    return numpy.unique(a, axis=0)


perfplot.show(
    setup=lambda n: numpy.random.randint(2, size=(n, 20)),
    kernels=[unique_void_view, lexsort, vstack, unique_axis],
    n_range=[2 ** k for k in range(15)],
    xlabel="len(a)",
    equality_check=None,
)
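
Since equality_check=None turns off perfplot's comparison of results, a separate sanity check can confirm that the kernels agree. The following is a minimal sketch: the helper canonical and the test array are assumptions made for illustration, and the kernels are restated so the snippet runs on its own.

```python
import numpy as np

def unique_void_view(a):
    row_dtype = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return np.unique(a.view(row_dtype)).view(a.dtype).reshape(-1, a.shape[1])

def unique_lexsort(a):
    ind = np.lexsort(a.T)
    # Keep a sorted row only if it differs from the row before it.
    keep = np.concatenate(([True], np.any(a[ind[1:]] != a[ind[:-1]], axis=1)))
    return a[ind[keep]]

def unique_axis(a):
    return np.unique(a, axis=0)

def canonical(rows):
    # Hypothetical helper: an order-independent form for comparing results.
    return sorted(map(tuple, rows.tolist()))

rng = np.random.default_rng(0)
a = rng.integers(2, size=(200, 4))  # many duplicate rows by construction

reference = canonical(unique_axis(a))
agree = all(canonical(kernel(a)) == reference
            for kernel in (unique_void_view, unique_lexsort))
print(agree)
```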

2017-07-09 14:25:58

The numpy_indexed package (disclaimer: I am its author) wraps the solution posted by Jaime in a nice and tested interface, plus many more features:

import numpy_indexed as npi
new_a = npi.unique(a)  # unique elements over axis=0 (rows) by default

2016-04-02 14:43:13
