找到Python列表中最常见元素的有效方法是什么?

我的列表项可能不是可哈希的,所以不能使用字典。 同样,在抽取的情况下,应返回索引最低的项。例子:

>>> most_common(['duck', 'duck', 'goose'])
'duck'
>>> most_common(['goose', 'duck', 'duck', 'goose'])
'goose'

当前回答

如果没有最低索引的要求,您可以使用集合。计数器:

from collections import Counter

a = [1936, 2401, 2916, 4761, 9216, 9216, 9604, 9801] 

c = Counter(a)

print(c.most_common(1)) # the one most common element... 2 would mean the 2 most common
[(9216, 2)] # a set containing the element, and it's count in 'a'

其他回答

如果排序和哈希都不可行,这是一个明显的缓慢的解决方案(O(n²)),但相等比较(==)可用:

def most_common(items):
  if not items:
    raise ValueError
  fitems = [] 
  best_idx = 0
  for item in items:   
    item_missing = True
    i = 0
    for fitem in fitems:  
      if fitem[0] == item:
        fitem[1] += 1
        d = fitem[1] - fitems[best_idx][1]
        if d > 0 or (d == 0 and fitems[best_idx][2] > fitem[2]):
          best_idx = i
        item_missing = False
        break
      i += 1
    if item_missing:
      fitems.append([item, 1, i])
  return items[best_idx]

但是,如果你的列表(n)的长度很大,那么让你的项目可哈希或可排序(正如其他答案所建议的那样)几乎总是能更快地找到最常见的元素。哈希时平均为O(n),排序时最差为O(n*log(n))。

ans  = [1, 1, 0, 0, 1, 1]
all_ans = {ans.count(ans[i]): ans[i] for i in range(len(ans))}
print(all_ans)
all_ans={4: 1, 2: 0}
max_key = max(all_ans.keys())

4

print(all_ans[max_key])

1

最常见的元素应该是在数组中出现超过N/2次的元素,其中N是len(数组)。下面的技术将以O(n)个时间复杂度完成,只消耗O(1)个辅助空间。

from collections import Counter

def majorityElement(arr):        
    majority_elem = Counter(arr)
    size = len(arr)
    for key, val in majority_elem.items():
        if val > size/2:
            return key
    return -1

对列表的一个副本排序并找到运行时间最长的。您可以在用每个元素的索引对列表排序之前对其进行修饰,然后在并列的情况下选择从最低索引开始的运行。

#This will return the list sorted by frequency:

def orderByFrequency(list):

    listUniqueValues = np.unique(list)
    listQty = []
    listOrderedByFrequency = []
    
    for i in range(len(listUniqueValues)):
        listQty.append(list.count(listUniqueValues[i]))
    for i in range(len(listQty)):
        index_bigger = np.argmax(listQty)
        for j in range(listQty[index_bigger]):
            listOrderedByFrequency.append(listUniqueValues[index_bigger])
        listQty[index_bigger] = -1
    return listOrderedByFrequency

#And this will return a list with the most frequent values in a list:

def getMostFrequentValues(list):
    
    if (len(list) <= 1):
        return list
    
    list_most_frequent = []
    list_ordered_by_frequency = orderByFrequency(list)
    
    list_most_frequent.append(list_ordered_by_frequency[0])
    frequency = list_ordered_by_frequency.count(list_ordered_by_frequency[0])
    
    index = 0
    while(index < len(list_ordered_by_frequency)):
        index = index + frequency
        
        if(index < len(list_ordered_by_frequency)):
            testValue = list_ordered_by_frequency[index]
            testValueFrequency = list_ordered_by_frequency.count(testValue)
            
            if (testValueFrequency == frequency):
                list_most_frequent.append(testValue)
            else:
                break    
    
    return list_most_frequent

#tests:
print(getMostFrequentValues([]))
print(getMostFrequentValues([1]))
print(getMostFrequentValues([1,1]))
print(getMostFrequentValues([2,1]))
print(getMostFrequentValues([2,2,1]))
print(getMostFrequentValues([1,2,1,2]))
print(getMostFrequentValues([1,2,1,2,2]))
print(getMostFrequentValues([3,2,3,5,6,3,2,2]))
print(getMostFrequentValues([1,2,2,60,50,3,3,50,3,4,50,4,4,60,60]))

Results:
[]
[1]
[1]
[1, 2]
[2]
[1, 2]
[2]
[2, 3]
[3, 4, 50, 60]