How can I efficiently get the frequency count of each unique value in a NumPy array?
>>> x = np.array([1,1,1,2,2,2,5,25,1,1])
>>> freq_count(x)
[(1, 5), (2, 3), (5, 1), (25, 1)]
Current answer
Take a look at np.bincount:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html
import numpy as np
x = np.array([1,1,1,2,2,2,5,25,1,1])
y = np.bincount(x)
ii = np.nonzero(y)[0]
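Then either:
list(zip(ii, y[ii]))  # wrap in list() on Python 3, where zip returns an iterator
# [(1, 5), (2, 3), (5, 1), (25, 1)]
or:
np.vstack((ii, y[ii])).T
# array([[ 1,  5],
#        [ 2,  3],
#        [ 5,  1],
#        [25,  1]])
or however you want to combine the counts and the unique values.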
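To tie this back to the question, here is a minimal sketch of a freq_count helper built on the bincount approach above. The function name comes from the question; the body is only one possible implementation, and it assumes the input is a 1-D array of non-negative integers, since np.bincount rejects anything else.

```python
import numpy as np

def freq_count(x):
    """Return (value, count) pairs; assumes x holds non-negative integers."""
    counts = np.bincount(x)          # counts[v] = number of times v occurs
    values = np.nonzero(counts)[0]   # keep only the values that actually occur
    # Convert NumPy scalars to plain Python ints for a clean list of tuples.
    return [(int(v), int(c)) for v, c in zip(values, counts[values])]

x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])
print(freq_count(x))  # [(1, 5), (2, 3), (5, 1), (25, 1)]
```

Note that bincount allocates one slot per integer up to the maximum value, so this is best suited to small, dense integer ranges.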
Other answers
Use numpy.unique with return_counts=True (NumPy 1.9+):
import numpy as np
x = np.array([1,1,1,2,2,2,5,25,1,1])
unique, counts = np.unique(x, return_counts=True)
>>> print(np.asarray((unique, counts)).T)
[[ 1  5]
 [ 2  3]
 [ 5  1]
 [25  1]]
Compared with scipy.stats.itemfreq:
In [4]: x = np.random.random_integers(0,100,1e6)
In [5]: %timeit unique, counts = np.unique(x, return_counts=True)
10 loops, best of 3: 31.5 ms per loop
In [6]: %timeit scipy.stats.itemfreq(x)
10 loops, best of 3: 170 ms per loop
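If a dictionary is more convenient than a 2-D array, the two arrays returned by np.unique can be zipped together. The following is a small sketch; the sample data (including negative values, which np.bincount would reject) is made up for illustration.

```python
import numpy as np

# Hypothetical sample data with negative values.
x = np.array([-3, -3, 0, 7, 7, 7])
unique, counts = np.unique(x, return_counts=True)

# tolist() converts NumPy scalars to plain Python ints before building the dict.
freq = dict(zip(unique.tolist(), counts.tolist()))
print(freq)  # {-3: 2, 0: 1, 7: 3}
```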
Something like this should do it:
import numpy

# create 100 random integers in the range 0..50 (inclusive)
arr = numpy.random.randint(0, 51, 100)
# create a dictionary with one zero-initialized counter per unique value
d = dict([(i, 0) for i in numpy.unique(arr)])
for number in arr:
    d[number] += 1  # increment when that value is found
Also, this earlier post on efficiently counting unique elements seems very similar to your question, unless I'm missing something.
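The explicit counting loop can also be expressed with collections.Counter from the standard library. This is a swapped-in alternative rather than what the answer above used, and the sample array is made up for illustration. Like the loop, it runs at Python speed, so it is likely slower than the NumPy-only approaches on large arrays.

```python
from collections import Counter

import numpy as np

# Hypothetical sample data.
arr = np.array([3, 1, 3, 2, 1, 3])

# tolist() yields plain Python ints, so the Counter keys are ordinary ints.
value_counts = Counter(arr.tolist())
print(sorted(value_counts.items()))  # [(1, 2), (2, 1), (3, 3)]
```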
Most simple problems get complicated because simple functionality like order() in R, which gives a statistical result in both ascending and descending order, is missing from various Python libraries. But if we keep in mind that this kind of statistical ordering is readily available in pandas, we can get a result sooner than by looking in 100 different places. Also, the development of R and pandas goes hand in hand because they were created for the same purpose. To solve this problem I use the following code, which gets me by anywhere:
import numpy as np
import pandas as pd

unique, counts = np.unique(x, return_counts=True)
d = {'unique': unique, 'counts': counts}  # collect both arrays in a dictionary
df = pd.DataFrame(d)  # a dictionary of arrays converts directly to a DataFrame
df.sort_values(by='counts', ascending=False, inplace=True)
df = df.reset_index(drop=True)  # optional, only if you want to use it further
numpy.bincount is probably the best choice. If your array contains anything besides small, dense integers, it may be useful to wrap it like this:
def count_unique(keys):
    uniq_keys = np.unique(keys)
    bins = uniq_keys.searchsorted(keys)
    return uniq_keys, np.bincount(bins)
For example:
>>> x = np.array([1,1,1,2,2,2,5,25,1,1])
>>> count_unique(x)
(array([ 1, 2, 5, 25]), array([5, 3, 1, 1]))
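As a rough illustration of why the wrapper helps, here is the same count_unique applied to string keys, which np.bincount cannot handle directly. The sample data is made up; the idea is that searchsorted maps every key to its index in the sorted unique array, so bincount only ever sees small, dense integers.

```python
import numpy as np

def count_unique(keys):
    uniq_keys = np.unique(keys)            # sorted unique values
    bins = uniq_keys.searchsorted(keys)    # index of each key within uniq_keys
    return uniq_keys, np.bincount(bins)    # count how often each index occurs

# Hypothetical sample data: strings instead of integers.
words = np.array(["b", "a", "b", "c", "b"])
uniq, counts = count_unique(words)
print(uniq.tolist(), counts.tolist())  # ['a', 'b', 'c'] [1, 3, 1]
```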