Frequency counts for unique values in a NumPy array

How can I efficiently obtain the frequency count of each unique value in a NumPy array?

>>> x = np.array([1,1,1,2,2,2,5,25,1,1])
>>> freq_count(x)
[(1, 5), (2, 3), (5, 1), (25, 1)]

Top answer

Use numpy.unique with return_counts=True (NumPy 1.9+):

import numpy as np

x = np.array([1,1,1,2,2,2,5,25,1,1])
unique, counts = np.unique(x, return_counts=True)

>>> print(np.asarray((unique, counts)).T)
 [[ 1  5]
  [ 2  3]
  [ 5  1]
  [25  1]]
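To get exactly the list-of-tuples shape shown in the question, the two arrays returned by np.unique can be zipped together. This is a minimal sketch; the name freq_count is taken from the question and is not a NumPy function.

```python
import numpy as np

def freq_count(values):
    """Return (value, count) pairs sorted by value, as plain Python ints."""
    unique, counts = np.unique(values, return_counts=True)
    # tolist() converts NumPy scalars to built-in ints before pairing
    return list(zip(unique.tolist(), counts.tolist()))

x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])
print(freq_count(x))  # [(1, 5), (2, 3), (5, 1), (25, 1)]
```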

Compared with scipy.stats.itemfreq:

In [4]: x = np.random.randint(0, 101, 10**6)  # integers in [0, 100]; size must be an int, not 1e6

In [5]: %timeit unique, counts = np.unique(x, return_counts=True)
10 loops, best of 3: 31.5 ms per loop

In [6]: %timeit scipy.stats.itemfreq(x)
10 loops, best of 3: 170 ms per loop

2014-09-19 22:54:29

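Other answers

An old question, but I'd like to offer my own solution, which was the fastest in my benchmarks when the input is a plain list rather than an np.array (or when the array is converted to a list first).

Check it out if you run into the same situation.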

def count(a):
    results = {}
    for x in a:
        if x not in results:
            results[x] = 1
        else:
            results[x] += 1
    return results

For example,

>>>timeit count([1,1,1,2,2,2,5,25,1,1])

would return:

100000 loops, best of 3: 2.26 µs per loop

>>>timeit count(np.array([1,1,1,2,2,2,5,25,1,1]))

100000 loops, best of 3: 8.8 µs per loop

>>>timeit count(np.array([1,1,1,2,2,2,5,25,1,1]).tolist())

100000 loops, best of 3: 5.85 µs per loop

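The accepted answer, by contrast, would be slower, and the scipy.stats.itemfreq solution is worse still.

More in-depth testing on a larger dataset did not confirm that expectation.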

import numpy as np
from zmq import Stopwatch
aZmqSTOPWATCH = Stopwatch()

aDataSETasARRAY = ( 100 * abs( np.random.randn( 150000 ) ) ).astype( int )  # np.int was removed in NumPy 1.24
aDataSETasLIST  = aDataSETasARRAY.tolist()

import numba
@numba.jit
def numba_bincount( anObject ):
    np.bincount(    anObject )
    return

aZmqSTOPWATCH.start();np.bincount(    aDataSETasARRAY );aZmqSTOPWATCH.stop()
14328L

aZmqSTOPWATCH.start();numba_bincount( aDataSETasARRAY );aZmqSTOPWATCH.stop()
592L

aZmqSTOPWATCH.start();count(          aDataSETasLIST  );aZmqSTOPWATCH.stop()
148609L
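For readers unfamiliar with np.bincount as used in the timings above: it returns an array whose index is the value and whose entry is the count, so it only applies to non-negative integers. A small sketch of turning that result into (value, count) pairs, using the question's sample array:

```python
import numpy as np

x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])

counts = np.bincount(x)         # counts[v] is the number of occurrences of v
values = np.nonzero(counts)[0]  # keep only the values that actually occur
pairs = list(zip(values.tolist(), counts[values].tolist()))
print(pairs)  # [(1, 5), (2, 3), (5, 1), (25, 1)]
```

Note that the length of the bincount result is max(x) + 1, so it can use a lot of memory when the array contains a few very large values.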

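See the comments below about caching and other in-RAM side effects that influence the results of massively repeated tests on small datasets.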
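If you want to repeat this kind of comparison yourself, the following is a rough sketch that uses the standard-library timeit module in place of the zmq Stopwatch above. The helper best_ms, the fixed seed, and the repeat counts are my own assumptions, and the numbers you get will depend on your machine and library versions.

```python
import timeit
import numpy as np

def count(a):
    """Dictionary-based counter, same idea as the pure-Python answer above."""
    results = {}
    for v in a:
        results[v] = results.get(v, 0) + 1
    return results

# Same shape of data as in the test above: 150000 non-negative integers
rng = np.random.default_rng(0)
data = (100 * np.abs(rng.standard_normal(150_000))).astype(int)
data_list = data.tolist()

def best_ms(fn, repeat=3, number=3):
    """Best average time per call in milliseconds (hypothetical helper)."""
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number * 1e3

timings = {
    "np.unique": best_ms(lambda: np.unique(data, return_counts=True)),
    "np.bincount": best_ms(lambda: np.bincount(data)),
    "dict loop": best_ms(lambda: count(data_list)),
}
for name, ms in timings.items():
    print(f"{name:12s} {ms:8.3f} ms")
```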

2014-08-19 06:50:47

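from collections import Counter
import numpy as np

x = np.array( [1,1,1,2,2,2,5,25,1,1] )
counter = Counter(x)                  # maps each value to its count
mode = counter.most_common(1)[0][0]   # the most frequent value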
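The snippet above only extracts the most frequent value, but the same Counter also holds the full frequency table the question asks for. A minimal sketch, converting the array with tolist() first so the keys are plain Python ints:

```python
from collections import Counter
import numpy as np

x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])
counter = Counter(x.tolist())

pairs = sorted(counter.items())  # (value, count) pairs ordered by value
print(pairs)  # [(1, 5), (2, 3), (5, 1), (25, 1)]

mode_value, mode_count = counter.most_common(1)[0]
print(mode_value, mode_count)  # 1 5
```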

2020-03-02 16:16:27

Something like this should do it:

import numpy

#create 100 random integers in [0, 50]
arr = numpy.random.randint(0, 51, 100)

#create a dictionary of the unique values
d = dict([(i,0) for i in numpy.unique(arr)])
for number in arr:
    d[number]+=1   #increment when that value is found

Also, this earlier post on efficiently counting unique elements seems very similar to your question, unless I'm missing something.

2012-05-24 16:32:11

import pandas as pd
import numpy as np

print(pd.Series(name_of_array).value_counts())
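Here name_of_array is a placeholder for your own array. As a concrete sketch with the question's sample data: value_counts() orders by count by default, so sort_index() is added to order by value instead.

```python
import numpy as np
import pandas as pd

x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])

# value_counts() returns a Series indexed by the unique values
counts = pd.Series(x).value_counts().sort_index()
pairs = list(zip(counts.index.tolist(), counts.tolist()))
print(pairs)  # [(1, 5), (2, 3), (5, 1), (25, 1)]
```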

2018-05-31 04:48:13
