我如何有效地获得一个NumPy数组中每个唯一值的频率计数?
>>> x = np.array([1,1,1,2,2,2,5,25,1,1])
>>> freq_count(x)
[(1, 5), (2, 3), (5, 1), (25, 1)]
我如何有效地获得一个NumPy数组中每个唯一值的频率计数?
>>> x = np.array([1,1,1,2,2,2,5,25,1,1])
>>> freq_count(x)
[(1, 5), (2, 3), (5, 1), (25, 1)]
当前回答
多维频率计数,即计数数组。
>>> print(color_array )
array([[255, 128, 128],
[255, 128, 128],
[255, 128, 128],
...,
[255, 128, 128],
[255, 128, 128],
[255, 128, 128]], dtype=uint8)
>>> np.unique(color_array,return_counts=True,axis=0)
(array([[ 60, 151, 161],
[ 60, 155, 162],
[ 60, 159, 163],
[ 61, 143, 162],
[ 61, 147, 162],
[ 61, 162, 163],
[ 62, 166, 164],
[ 63, 137, 162],
[ 63, 169, 164],
array([ 1, 2, 2, 1, 4, 1, 1, 2,
3, 1, 1, 1, 2, 5, 2, 2,
898, 1, 1,
其他回答
import pandas as pd
import numpy as np
print(pd.Series(name_of_array).value_counts())
像这样的东西应该做到:
#create 100 random numbers
arr = numpy.random.random_integers(0,50,100)
#create a dictionary of the unique values
d = dict([(i,0) for i in numpy.unique(arr)])
for number in arr:
d[j]+=1 #increment when that value is found
另外,之前的这篇关于有效计算独特元素的文章似乎与您的问题非常相似,除非我遗漏了什么。
看看np.bincount:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html
import numpy as np
x = np.array([1,1,1,2,2,2,5,25,1,1])
y = np.bincount(x)
ii = np.nonzero(y)[0]
然后:
zip(ii,y[ii])
# [(1, 5), (2, 3), (5, 1), (25, 1)]
or:
np.vstack((ii,y[ii])).T
# array([[ 1, 5],
[ 2, 3],
[ 5, 1],
[25, 1]])
或者你想结合计数和唯一值。
我对此也很感兴趣,所以我做了一点性能比较(使用perfplot,我的一个爱好项目)。结果:
y = np.bincount(a)
ii = np.nonzero(y)[0]
out = np.vstack((ii, y[ii])).T
是目前为止最快的。(请注意对数缩放。)
代码生成的情节:
import numpy as np
import pandas as pd
import perfplot
from scipy.stats import itemfreq
def bincount(a):
y = np.bincount(a)
ii = np.nonzero(y)[0]
return np.vstack((ii, y[ii])).T
def unique(a):
unique, counts = np.unique(a, return_counts=True)
return np.asarray((unique, counts)).T
def unique_count(a):
unique, inverse = np.unique(a, return_inverse=True)
count = np.zeros(len(unique), dtype=int)
np.add.at(count, inverse, 1)
return np.vstack((unique, count)).T
def pandas_value_counts(a):
out = pd.value_counts(pd.Series(a))
out.sort_index(inplace=True)
out = np.stack([out.keys().values, out.values]).T
return out
b = perfplot.bench(
setup=lambda n: np.random.randint(0, 1000, n),
kernels=[bincount, unique, itemfreq, unique_count, pandas_value_counts],
n_range=[2 ** k for k in range(26)],
xlabel="len(a)",
)
b.save("out.png")
b.show()
使用numpy。唯一的return_counts=True (NumPy 1.9+):
import numpy as np
x = np.array([1,1,1,2,2,2,5,25,1,1])
unique, counts = np.unique(x, return_counts=True)
>>> print(np.asarray((unique, counts)).T)
[[ 1 5]
[ 2 3]
[ 5 1]
[25 1]]
与scipy.stats.itemfreq相比:
In [4]: x = np.random.random_integers(0,100,1e6)
In [5]: %timeit unique, counts = np.unique(x, return_counts=True)
10 loops, best of 3: 31.5 ms per loop
In [6]: %timeit scipy.stats.itemfreq(x)
10 loops, best of 3: 170 ms per loop