如何在整数列表中找到重复项并创建重复项的另一个列表?


当前回答

我是很晚才开始讨论这个问题的。尽管如此,我还是想用一句话来解决这个问题。因为这就是Python的魅力所在。 如果我们只是想把副本放到一个单独的列表(或任何集合)中,我建议这样做。假设我们有一个重复的列表我们称之为目标

    target=[1,2,3,4,4,4,3,5,6,8,4,3]

现在如果我们想要得到副本,我们可以使用下面的一行代码:

    duplicates=dict(set((x,target.count(x)) for x in filter(lambda rec : target.count(rec)>1,target)))

这段代码将把复制的记录作为键,并将其作为值放入字典'duplicate '中。“复制”字典将如下所示:

    {3: 3, 4: 4} #it saying 3 is repeated 3 times and 4 is 4 times

如果你只是想在一个列表中单独列出所有重复的记录,它的代码也更短:

    duplicates=filter(lambda rec : target.count(rec)>1,target)

输出将是:

    [3, 4, 4, 4, 3, 4, 3]

这在python 2.7中完美地工作。X +版本

其他回答

你不需要计数,只需要该物品之前是否被看到过。把这个答案用在这个问题上:

def list_duplicates(seq):
  seen = set()
  seen_add = seen.add
  # adds all elements it doesn't know yet to seen and all other to seen_twice
  seen_twice = set( x for x in seq if x in seen or seen_add(x) )
  # turn the set into a list (as requested)
  return list( seen_twice )

a = [1,2,3,2,1,5,6,5,5,5]
list_duplicates(a) # yields [1, 2, 5]

以防速度很重要,这里有一些时间安排:

# file: test.py
import collections

def thg435(l):
    return [x for x, y in collections.Counter(l).items() if y > 1]

def moooeeeep(l):
    seen = set()
    seen_add = seen.add
    # adds all elements it doesn't know yet to seen and all other to seen_twice
    seen_twice = set( x for x in l if x in seen or seen_add(x) )
    # turn the set into a list (as requested)
    return list( seen_twice )

def RiteshKumar(l):
    return list(set([x for x in l if l.count(x) > 1]))

def JohnLaRooy(L):
    seen = set()
    seen2 = set()
    seen_add = seen.add
    seen2_add = seen2.add
    for item in L:
        if item in seen:
            seen2_add(item)
        else:
            seen_add(item)
    return list(seen2)

l = [1,2,3,2,1,5,6,5,5,5]*100

以下是结果:(做得好@JohnLaRooy!)

$ python -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'
10000 loops, best of 3: 74.6 usec per loop
$ python -mtimeit -s 'import test' 'test.moooeeeep(test.l)'
10000 loops, best of 3: 91.3 usec per loop
$ python -mtimeit -s 'import test' 'test.thg435(test.l)'
1000 loops, best of 3: 266 usec per loop
$ python -mtimeit -s 'import test' 'test.RiteshKumar(test.l)'
100 loops, best of 3: 8.35 msec per loop

有趣的是,除了计时本身,当使用pypy时,排名也略有变化。最有趣的是,基于counter的方法极大地受益于pypy的优化,而我建议的方法缓存方法似乎几乎没有任何效果。

$ pypy -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'
100000 loops, best of 3: 17.8 usec per loop
$ pypy -mtimeit -s 'import test' 'test.thg435(test.l)'
10000 loops, best of 3: 23 usec per loop
$ pypy -mtimeit -s 'import test' 'test.moooeeeep(test.l)'
10000 loops, best of 3: 39.3 usec per loop

显然,这种效应与输入数据的“重复性”有关。我设置了l = [random.randrange(1000000) for I in xrange(10000)],得到了这些结果:

$ pypy -mtimeit -s 'import test' 'test.moooeeeep(test.l)'
1000 loops, best of 3: 495 usec per loop
$ pypy -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'
1000 loops, best of 3: 499 usec per loop
$ pypy -mtimeit -s 'import test' 'test.thg435(test.l)'
1000 loops, best of 3: 1.68 msec per loop

在列表中使用list.count()方法查找给定列表的重复元素

arr=[]
dup =[]
for i in range(int(input("Enter range of list: "))):
    arr.append(int(input("Enter Element in a list: ")))
for i in arr:
    if arr.count(i)>1 and i not in dup:
        dup.append(i)
print(dup)

使用Set函数 如:-

arr=[1,4,2,5,2,3,4,1,4,5,2,3]
arr2=list(set(arr))
print(arr2)

输出:- [1,2,3,4,5]

使用array删除副本

eg:-

arr=[1,4,2,5,2,3,4,1,4,5,2,3]
arr3=[]
for i in arr:
    if(i not in arr3):
     arr3.append(i)
print(arr3)

输出: [1,4,2,5,3]

使用Lambda函数

eg:-

rem_duplicate_func=lambda arr:set(arr)
print(rem_duplicate_func(arr))

输出: {1,2,3,4,5}

从字典中删除重复值

eg:-

dict1={
    'car':["Ford","Toyota","Ford","Toyota"],
    'brand':["Mustang","Ranz","Mustang","Ranz"] } dict2={} for key,value in dict1.items():
    dict2[key]=set(value) print(dict2)

输出: {“车”:{“丰田”、“福特”},“品牌”:{“主攻”、“野马”}}

对称差异-删除重复元素

eg:-

set1={1,2,4,5}
set2={2,1,5,7}
rem_dup_ele=set1.symmetric_difference(set2)
print(rem_dup_ele)

输出: {4 7}

假设我们有这个元素列表:

a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]

我们可以只使用集合来找到唯一的元素:

unique = set()
for num in a:
    if num not in unique:
        unique.add(num)
    else:
        unique = unique - set([num])

最后:

>>> unique
{3, 6}

如果你想要得到副本,你可以简单地做:

>>> duplicates = set(a) - unique
>>> duplicates
{1, 2, 5}

注:

集合中的元素查找是O(1) 从集合中移除的元素是O(1)

集合。Counter是python 2.7中的新功能:


Python 2.5.4 (r254:67916, May 31 2010, 15:03:39) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
a = [1,2,3,2,1,5,6,5,5,5]
import collections
print [x for x, y in collections.Counter(a).items() if y > 1]
Type "help", "copyright", "credits" or "license" for more information.
  File "", line 1, in 
AttributeError: 'module' object has no attribute 'Counter'
>>> 

在早期版本中,你可以使用传统的字典:

a = [1,2,3,2,1,5,6,5,5,5]
d = {}
for elem in a:
    if elem in d:
        d[elem] += 1
    else:
        d[elem] = 1

print [x for x, y in d.items() if y > 1]