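How do I find the duplicates in a list and create another list with them?

How do I find the duplicates in a list of integers and create another list of the duplicates?

Current answer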

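def removeduplicates(a):
  # Collects each distinct value once. Note that this returns the unique
  # values of the list, not the values that occur more than once.
  seen = set()

  for i in a:
    if i not in seen:
      seen.add(i)
  return seen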

print(removeduplicates([1,1,2,2]))

2017-08-15 04:30:39

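Other answers

I noticed that most solutions have O(n * n) complexity and are very slow for large lists, so I thought I'd share the function I wrote. It supports integers or strings and is O(n) in the best case. For a list of 100,000 elements, the top solution takes over 30 seconds, while mine takes only 0.12 seconds.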

def get_duplicates(list1):
    '''Return all duplicates given a list. O(n) complexity for best case scenario.
    input: [1, 1, 1, 2, 3, 4, 4]
    output: [1, 1, 4]
    '''
    dic = {}
    for el in list1:
        try:
            dic[el] += 1
        except KeyError:
            dic[el] = 1
    dupes = []
    for key in dic.keys():
        for i in range(dic[key] - 1):
            dupes.append(key)
    return dupes


list1 = [1, 1, 1, 2, 3, 4, 4]
> print(get_duplicates(list1))
[1, 1, 4]

Or, to get each duplicate only once:

> print(list(set(get_duplicates(list1))))
[1, 4]
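
For comparison, the same counting idea can be sketched with collections.Counter from the standard library. This is not the answer author's code; the name get_duplicates_counter is made up for illustration, and the sketch assumes the elements are hashable.

```python
from collections import Counter

def get_duplicates_counter(items):
    """Return every surplus occurrence, e.g. [1, 1, 1, 4, 4] -> [1, 1, 4]."""
    counts = Counter(items)  # one pass over the input
    dupes = []
    for value, count in counts.items():
        # A value seen `count` times contributes count - 1 duplicates.
        dupes.extend([value] * (count - 1))
    return dupes

print(get_duplicates_counter([1, 1, 1, 2, 3, 4, 4]))
```

Counter keeps keys in first-seen order, so the result lists duplicates in the order their values first appear.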

2022-06-13 21:47:28

Method 1:

list(set([val for idx, val in enumerate(input_list) if val in input_list[idx+1:]]))

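Explanation: [val for idx, val in enumerate(input_list) if val in input_list[idx+1:]] is a list comprehension that returns an element if the same element occurs again somewhere after its current position in the list.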

Example: input_list = [42, 31, 42, 31, 3, 31, 31, 5, 6, 6, 6, 6, 6, 7, 42]

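Starting with the first element of the list, 42, at index 0, it checks whether 42 is present in input_list[1:] (that is, from index 1 to the end of the list). Because 42 is present in input_list[1:], it returns 42.

It then moves on to the next element, 31, at index 1, and checks whether 31 is present in input_list[2:] (that is, from index 2 to the end of the list). Because 31 is present in input_list[2:], it returns 31.

Similarly, it goes through every element in the list and returns only the repeated elements into a list.

Then, because that list itself contains duplicates, we need to pick one of each, i.e. remove the duplicates among the duplicates. To do so, we call Python's built-in set(), which removes them.

That leaves us with a set rather than a list, so to convert the set to a list we use the type conversion list(), which turns the set of elements into a list.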
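
The walkthrough can be followed step by step with a small script. This is only an illustrative sketch: the input list is an assumption chosen to match the walkthrough (42 at index 0, 31 at index 1), and the variable names repeated and unique_dupes are made up.

```python
input_list = [42, 31, 42, 31, 3, 31, 31, 5, 6, 6, 6, 6, 6, 7, 42]

# Step 1: keep every value that occurs again later in the list.
repeated = [val for idx, val in enumerate(input_list) if val in input_list[idx + 1:]]
print(repeated)  # [42, 31, 42, 31, 31, 6, 6, 6, 6]

# Step 2: set() collapses the repeats; list() converts the set back to a list.
unique_dupes = list(set(repeated))
print(sorted(unique_dupes))  # [6, 31, 42]
```

Note that each `val in input_list[idx + 1:]` test copies and scans the rest of the list, so this approach is quadratic in the length of the input.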

Method 2:

def dupes(ilist):
    temp_list = [] # initially, empty temporary list
    dupe_list = [] # initially, empty duplicate list
    for each in ilist:
        if each in temp_list: # Found a Duplicate element
            if each not in dupe_list: # Avoid duplicate elements in dupe_list
                dupe_list.append(each) # Add duplicate element to dupe_list
        else: 
            temp_list.append(each) # Add a new (non-duplicate) to temp_list

    return dupe_list

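Explanation: First, we create two empty lists. We then go through all the elements of the list and check whether each one already exists in temp_list (which is initially empty). If it is not in temp_list, we add it to temp_list using the append method.

If it already exists in temp_list, the current element is a duplicate, so we add it to dupe_list using the append method (unless it is already there).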
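
The same bookkeeping can be sketched with sets for the membership tests, which avoids scanning a list on every lookup. This is a variant under stated assumptions, not the answer's code: the name dupes_with_sets is hypothetical, and the elements must be hashable.

```python
def dupes_with_sets(ilist):
    seen = set()       # values encountered so far
    dupe_seen = set()  # duplicates that were already reported
    dupe_list = []     # duplicates, in the order they first repeat
    for each in ilist:
        if each in seen:
            if each not in dupe_seen:
                dupe_seen.add(each)
                dupe_list.append(each)
        else:
            seen.add(each)
    return dupe_list

print(dupes_with_sets([1, 2, 3, 2, 1, 5, 6, 5, 5, 5]))  # [2, 1, 5]
```

Keeping dupe_list as a list alongside the sets preserves the order in which duplicates are discovered, which a plain set would not guarantee.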

2019-02-05 01:43:28

One-line solution:

set([i for i in list if sum([1 for a in list if a == i]) > 1])

2015-07-11 17:10:23

You don't need the count, just whether or not the item was seen before. Adapting that answer to this problem:

def list_duplicates(seq):
  seen = set()
  seen_add = seen.add
  # adds all elements it doesn't know yet to seen and all other to seen_twice
  seen_twice = set( x for x in seq if x in seen or seen_add(x) )
  # turn the set into a list (as requested)
  return list( seen_twice )

a = [1,2,3,2,1,5,6,5,5,5]
list_duplicates(a) # yields [1, 2, 5]
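
The filter `x in seen or seen_add(x)` relies on set.add returning None. A minimal sketch of what happens for a new value and for a repeated one (the variable names first, second and kept are made up for illustration):

```python
seen = set()
seen_add = seen.add  # bound method, looked up once

first = 7 in seen or seen_add(7)   # 7 is new: set.add runs and returns None
second = 7 in seen or seen_add(7)  # 7 is already known: short-circuits to True
print(first, second)  # None True

# Inside a comprehension, only the truthy (already seen) cases pass the filter.
seen.clear()
kept = [x for x in [1, 2, 1, 1] if x in seen or seen_add(x)]
print(kept)  # [1, 1]
```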

Just in case speed matters, here are some timings:

# file: test.py
import collections

def thg435(l):
    return [x for x, y in collections.Counter(l).items() if y > 1]

def moooeeeep(l):
    seen = set()
    seen_add = seen.add
    # adds all elements it doesn't know yet to seen and all other to seen_twice
    seen_twice = set( x for x in l if x in seen or seen_add(x) )
    # turn the set into a list (as requested)
    return list( seen_twice )

def RiteshKumar(l):
    return list(set([x for x in l if l.count(x) > 1]))

def JohnLaRooy(L):
    seen = set()
    seen2 = set()
    seen_add = seen.add
    seen2_add = seen2.add
    for item in L:
        if item in seen:
            seen2_add(item)
        else:
            seen_add(item)
    return list(seen2)

l = [1,2,3,2,1,5,6,5,5,5]*100

Here are the results: (well done @JohnLaRooy!)

$ python -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'
10000 loops, best of 3: 74.6 usec per loop
$ python -mtimeit -s 'import test' 'test.moooeeeep(test.l)'
10000 loops, best of 3: 91.3 usec per loop
$ python -mtimeit -s 'import test' 'test.thg435(test.l)'
1000 loops, best of 3: 266 usec per loop
$ python -mtimeit -s 'import test' 'test.RiteshKumar(test.l)'
100 loops, best of 3: 8.35 msec per loop

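Interestingly, besides the timings themselves, the ranking also changes slightly when pypy is used. Most interestingly, the Counter-based approach benefits hugely from pypy's optimizations, whereas the method-caching approach I suggested seems to have almost no effect.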

$ pypy -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'
100000 loops, best of 3: 17.8 usec per loop
$ pypy -mtimeit -s 'import test' 'test.thg435(test.l)'
10000 loops, best of 3: 23 usec per loop
$ pypy -mtimeit -s 'import test' 'test.moooeeeep(test.l)'
10000 loops, best of 3: 39.3 usec per loop

Apparently this effect is related to the "duplicatedness" of the input data. I set l = [random.randrange(1000000) for i in xrange(10000)] and got these results:

$ pypy -mtimeit -s 'import test' 'test.moooeeeep(test.l)'
1000 loops, best of 3: 495 usec per loop
$ pypy -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'
1000 loops, best of 3: 499 usec per loop
$ pypy -mtimeit -s 'import test' 'test.thg435(test.l)'
1000 loops, best of 3: 1.68 msec per loop
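
To repeat a comparison like this on Python 3 without the command line, something along these lines could be used. It is only a sketch: the function names counter_based and two_sets are stand-ins for two of the approaches above, the seed is an arbitrary choice, and the numbers it prints will differ by machine and interpreter.

```python
import random
import timeit
from collections import Counter

def counter_based(l):
    return [x for x, y in Counter(l).items() if y > 1]

def two_sets(l):
    seen, seen2 = set(), set()
    for item in l:
        if item in seen:
            seen2.add(item)
        else:
            seen.add(item)
    return list(seen2)

random.seed(0)  # fixed seed so repeated runs use the same data
data = [random.randrange(1000000) for _ in range(10000)]

for func in (counter_based, two_sets):
    # Time 100 calls and report the average per call.
    seconds = timeit.timeit(lambda: func(data), number=100)
    print(f"{func.__name__}: {seconds / 100 * 1e6:.1f} usec per call")
```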

2012-03-23 09:25:06

Python 3.8 one-liner, if you don't care to write your own algorithm or use third-party libraries:

from itertools import groupby

l = [1,2,3,2,1,5,6,5,5,5]

res = [(x, count) for x, g in groupby(sorted(l)) if (count := len(list(g))) > 1]

print(res)

This prints the items and their counts:

[(1, 2), (2, 2), (5, 4)]

groupby accepts a grouping function, so you can define your groupings in different ways and return additional tuple fields as needed.
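
As an illustration of a custom grouping function, the sketch below treats strings that differ only in case as duplicates and returns the matching originals as an extra tuple field. The sample data and the choice of str.lower as the key are assumptions made for this example.

```python
from itertools import groupby

words = ["apple", "Apple", "banana", "Cherry", "cherry", "CHERRY"]

# Sort and group with the same key so that equal keys end up adjacent.
key = str.lower
res = []
for k, g in groupby(sorted(words, key=key), key=key):
    members = list(g)
    if len(members) > 1:
        res.append((k, len(members), members))

print(res)
# [('apple', 2, ['apple', 'Apple']), ('cherry', 3, ['Cherry', 'cherry', 'CHERRY'])]
```

Because sorted is stable, the members of each group keep their original relative order.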

2020-04-02 02:38:50
