找到Python列表中最常见元素的有效方法是什么?

我的列表项可能不是可哈希的,所以不能使用字典。 同样,在抽取的情况下,应返回索引最低的项。例子:

>>> most_common(['duck', 'duck', 'goose'])
'duck'
>>> most_common(['goose', 'duck', 'duck', 'goose'])
'goose'

当前回答

在这里:

def most_common(l):
    max = 0
    maxitem = None
    for x in set(l):
        count =  l.count(x)
        if count > max:
            max = count
            maxitem = x
    return maxitem

我有一种模糊的感觉,在标准库的某个地方有一个方法可以给你每个元素的计数,但我找不到它。

其他回答

从这里借鉴,这可以在Python 2.7中使用:

from collections import Counter

def Most_Common(lst):
    data = Counter(lst)
    return data.most_common(1)[0][0]

比Alex的解决方案快4-6倍,比newacct提出的一行程序快50倍。

在CPython 3.6+(任何Python 3.7+)上,上面将选择第一个看到的元素。如果你在旧的Python上运行,为了检索列表中第一个出现的元素,你需要进行两次传递来保持顺序:

# Only needed pre-3.6!
def most_common(lst):
    data = Counter(lst)
    return max(lst, key=data.get)

我这样做使用scipy统计模块和lambda:

import scipy.stats
lst = [1,2,3,4,5,6,7,5]
most_freq_val = lambda x: scipy.stats.mode(x)[0][0]
print(most_freq_val(lst))

结果:

 most_freq_val = 5
#This will return the list sorted by frequency:

def orderByFrequency(list):

    listUniqueValues = np.unique(list)
    listQty = []
    listOrderedByFrequency = []
    
    for i in range(len(listUniqueValues)):
        listQty.append(list.count(listUniqueValues[i]))
    for i in range(len(listQty)):
        index_bigger = np.argmax(listQty)
        for j in range(listQty[index_bigger]):
            listOrderedByFrequency.append(listUniqueValues[index_bigger])
        listQty[index_bigger] = -1
    return listOrderedByFrequency

#And this will return a list with the most frequent values in a list:

def getMostFrequentValues(list):
    
    if (len(list) <= 1):
        return list
    
    list_most_frequent = []
    list_ordered_by_frequency = orderByFrequency(list)
    
    list_most_frequent.append(list_ordered_by_frequency[0])
    frequency = list_ordered_by_frequency.count(list_ordered_by_frequency[0])
    
    index = 0
    while(index < len(list_ordered_by_frequency)):
        index = index + frequency
        
        if(index < len(list_ordered_by_frequency)):
            testValue = list_ordered_by_frequency[index]
            testValueFrequency = list_ordered_by_frequency.count(testValue)
            
            if (testValueFrequency == frequency):
                list_most_frequent.append(testValue)
            else:
                break    
    
    return list_most_frequent

#tests:
print(getMostFrequentValues([]))
print(getMostFrequentValues([1]))
print(getMostFrequentValues([1,1]))
print(getMostFrequentValues([2,1]))
print(getMostFrequentValues([2,2,1]))
print(getMostFrequentValues([1,2,1,2]))
print(getMostFrequentValues([1,2,1,2,2]))
print(getMostFrequentValues([3,2,3,5,6,3,2,2]))
print(getMostFrequentValues([1,2,2,60,50,3,3,50,3,4,50,4,4,60,60]))

Results:
[]
[1]
[1]
[1, 2]
[2]
[1, 2]
[2]
[2, 3]
[3, 4, 50, 60]

如果排序和哈希都不可行,这是一个明显的缓慢的解决方案(O(n²)),但相等比较(==)可用:

def most_common(items):
  if not items:
    raise ValueError
  fitems = [] 
  best_idx = 0
  for item in items:   
    item_missing = True
    i = 0
    for fitem in fitems:  
      if fitem[0] == item:
        fitem[1] += 1
        d = fitem[1] - fitems[best_idx][1]
        if d > 0 or (d == 0 and fitems[best_idx][2] > fitem[2]):
          best_idx = i
        item_missing = False
        break
      i += 1
    if item_missing:
      fitems.append([item, 1, i])
  return items[best_idx]

但是,如果你的列表(n)的长度很大,那么让你的项目可哈希或可排序(正如其他答案所建议的那样)几乎总是能更快地找到最常见的元素。哈希时平均为O(n),排序时最差为O(n*log(n))。

你想要的在统计中被称为模式,Python当然有一个内置函数来为你做这件事:

>>> from statistics import mode
>>> mode([1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 6, 6, 6])
3

请注意,如果没有“最常见元素”,例如前两个元素并列的情况,这将在Python上引发StatisticsError <=3.7,从3.8开始,它将返回遇到的第一个。