如何在整数列表中找到重复项并创建重复项的另一个列表?
当前回答
试试这个检查副本
>>> def checkDuplicate(List):
duplicate={}
for i in List:
## checking whether the item is already present in dictionary or not
## increasing count if present
## initializing count to 1 if not present
duplicate[i]=duplicate.get(i,0)+1
return [k for k,v in duplicate.items() if v>1]
>>> checkDuplicate([1,2,3,"s",1,2,3])
[1, 2, 3]
其他回答
使用toolz时:
from toolz import frequencies, valfilter
a = [1,2,2,3,4,5,4]
>>> list(valfilter(lambda count: count > 1, frequencies(a)).keys())
[2,4]
通过检查出现的次数,简单地遍历列表中的每个元素,然后将它们添加到一个集,然后打印重复的元素。希望这能帮助到一些人。
myList = [2 ,4 , 6, 8, 4, 6, 12];
newList = set()
for i in myList:
if myList.count(i) >= 2:
newList.add(i)
print(list(newList))
## [4 , 6]
还有其他测试。当然要做……
set([x for x in l if l.count(x) > 1])
...代价太大了。使用下一个final方法大约快500倍(数组越长结果越好):
def dups_count_dict(l):
d = {}
for item in l:
if item not in d:
d[item] = 0
d[item] += 1
result_d = {key: val for key, val in d.iteritems() if val > 1}
return result_d.keys()
只有2个循环,没有非常昂贵的l.count()操作。
下面是一个比较方法的代码。代码如下,输出如下:
dups_count: 13.368s # this is a function which uses l.count()
dups_count_dict: 0.014s # this is a final best function (of the 3 functions)
dups_count_counter: 0.024s # collections.Counter
测试代码:
import numpy as np
from time import time
from collections import Counter
class TimerCounter(object):
def __init__(self):
self._time_sum = 0
def start(self):
self.time = time()
def stop(self):
self._time_sum += time() - self.time
def get_time_sum(self):
return self._time_sum
def dups_count(l):
return set([x for x in l if l.count(x) > 1])
def dups_count_dict(l):
d = {}
for item in l:
if item not in d:
d[item] = 0
d[item] += 1
result_d = {key: val for key, val in d.iteritems() if val > 1}
return result_d.keys()
def dups_counter(l):
counter = Counter(l)
result_d = {key: val for key, val in counter.iteritems() if val > 1}
return result_d.keys()
def gen_array():
np.random.seed(17)
return list(np.random.randint(0, 5000, 10000))
def assert_equal_results(*results):
primary_result = results[0]
other_results = results[1:]
for other_result in other_results:
assert set(primary_result) == set(other_result) and len(primary_result) == len(other_result)
if __name__ == '__main__':
dups_count_time = TimerCounter()
dups_count_dict_time = TimerCounter()
dups_count_counter = TimerCounter()
l = gen_array()
for i in range(3):
dups_count_time.start()
result1 = dups_count(l)
dups_count_time.stop()
dups_count_dict_time.start()
result2 = dups_count_dict(l)
dups_count_dict_time.stop()
dups_count_counter.start()
result3 = dups_counter(l)
dups_count_counter.stop()
assert_equal_results(result1, result2, result3)
print 'dups_count: %.3f' % dups_count_time.get_time_sum()
print 'dups_count_dict: %.3f' % dups_count_dict_time.get_time_sum()
print 'dups_count_counter: %.3f' % dups_count_counter.get_time_sum()
一个非常简单的解决方案,但是复杂度是O(n*n)。
>>> xs = [1,2,3,4,4,5,5,6,1]
>>> set([x for x in xs if xs.count(x) > 1])
set([1, 4, 5])
我们可以使用itertools。Groupby,以便找到所有有dup的项:
from itertools import groupby
myList = [2, 4, 6, 8, 4, 6, 12]
# when the list is sorted, groupby groups by consecutive elements which are similar
for x, y in groupby(sorted(myList)):
# list(y) returns all the occurences of item x
if len(list(y)) > 1:
print x
输出将是:
4
6
推荐文章
- Numpy Max vs amax vs maximum
- 我应该在.gitignore文件中添加Django迁移文件吗?
- 每n行有熊猫
- 实例属性attribute_name定义在__init__之外
- 如何获取在Python中捕获的异常的名称?
- 第一次出现的值大于现有值的Numpy
- 如何从Python函数中返回两个值?
- 前一个月的Python日期
- Python中方括号括起来的列表和圆括号括起来的列表有什么区别?
- Python日志记录不输出任何东西
- 每n秒运行特定代码
- SQLAlchemy是否有与Django的get_or_create等价的函数?
- 如何将python datetime转换为字符串,具有可读格式的日期?
- 美丽的汤和提取div及其内容的ID
- 在Python中重置生成器对象