How do I find the duplicates in a list of integers and create another list of those duplicates?
Current answer
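Just for fun, a single expression (plus the reduce import that Python 3 requires). It returns a tuple of two sets: the distinct items and the duplicates.
from functools import reduce
(lambda iterable: reduce(lambda acc, item: (acc[0], acc[1] | {item}) if item in acc[0] else (acc[0] | {item}, acc[1]), iterable, (set(), set())))(some_iterable)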
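For readers who find the one-liner hard to follow, the same fold can be written as an ordinary loop. This is only a sketch of the equivalent logic; the function name `split_unique_and_duplicates` is made up for illustration and does not appear in the answer.

```python
def split_unique_and_duplicates(iterable):
    """Return (seen, duplicates) as two sets, mirroring the reduce one-liner."""
    seen = set()        # every distinct item encountered so far
    duplicates = set()  # items encountered at least twice
    for item in iterable:
        if item in seen:
            duplicates.add(item)
        else:
            seen.add(item)
    return seen, duplicates


seen, duplicates = split_unique_and_duplicates([1, 2, 3, 3, 3, 4, 5, 6, 6, 7])
print(sorted(duplicates))  # [3, 6]
```

The loop mutates two sets in place instead of building new sets with `|` on every step, so it also avoids the repeated copying that the one-liner does.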
Other answers
I would do this with pandas, since I use pandas a lot.
import pandas as pd
a = [1,2,3,3,3,4,5,6,6,7]
vc = pd.Series(a).value_counts()
vc[vc > 1].index.tolist()
which gives
[3,6]
It is probably not very efficient, but it is certainly less code than many of the other answers, so I thought I would contribute it.
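If pandas is not already a dependency, the standard library's `collections.Counter` can do the same counting. A minimal sketch, reusing the list `a` from the pandas example:

```python
from collections import Counter

a = [1, 2, 3, 3, 3, 4, 5, 6, 6, 7]

# Counter maps each value to the number of times it occurs.
counts = Counter(a)

# Keep only the values that occur more than once.
duplicates = [value for value, count in counts.items() if count > 1]
print(duplicates)  # [3, 6]
```

Because `Counter` is a dict subclass, the result follows the order in which values first appear in the list, whereas `value_counts()` orders by frequency.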
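There are several ways to solve this problem. The two below are common solutions, but when using them in a real scenario we also have to consider their time complexity.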
import random
import time

dupl_list = [random.randint(1, 1000) for x in range(500)]
print("List with duplicate integers")
print(dupl_list)

# Method 1 - nested loops
print("******************Method 1 *************")
def Repeat_num(x):
    _size = len(x)
    repeated = []
    for i in range(_size):
        # compare x[i] with every element that comes after it
        k = i + 1
        for j in range(k, _size):
            if x[i] == x[j] and x[i] not in repeated:
                repeated.append(x[i])
    return repeated

start = time.time()
print(Repeat_num(dupl_list))
end = time.time()
print("The time of execution of above program is :", (end - start) * 10**3, "ms")

print("***************Method 2****************")
# Method 2 - using count()
def repeast_count(dup_list):
    new = []
    for a in dup_list:
        # count the occurrences of this element
        n = dup_list.count(a)
        # if it occurs more than once, add it to the output list
        if n > 1:
            if new.count(a) == 0:  # only add it if it is not already there
                new.append(a)
    return new

start = time.time()
print(repeast_count(dupl_list))
end = time.time()
print("The time of execution of above program is :", (end - start) * 10**3, "ms")
# Sample output:
List with duplicate integers
[5, 45, 28, 81, 32, 98, 8, 83, 47, 95, 41, 49, 4, 1, 85, 26, 38, 82, 54, 11]
******************Method 1 *************
[]
The time of execution of above program is : 1.1069774627685547 ms
***************Method 2****************
[]
The time of execution of above program is : 0.1881122589111328 ms
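Method 1 is fine for understanding the idea, but for a real implementation I prefer Method 2, since it took less time than Method 1 in the run above.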
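Both methods are still quadratic: Method 1 uses nested loops, and Method 2 calls `list.count`, which scans the whole list, once per element. A single `time.time()` measurement is also fairly noisy. The sketch below is one way to compare approaches more reliably with `timeit`. The functions `repeat_count` and `repeat_linear` are written here for illustration; `repeat_linear` is a hypothetical single-pass alternative that is not part of the answer.

```python
import random
import timeit

random.seed(0)  # fixed seed so repeated runs use the same data
dupl_list = [random.randint(1, 1000) for _ in range(500)]


def repeat_count(values):
    """Quadratic approach in the style of Method 2: list.count() per element."""
    result = []
    for value in values:
        if values.count(value) > 1 and value not in result:
            result.append(value)
    return result


def repeat_linear(values):
    """Single pass using sets; hypothetical alternative for comparison."""
    seen = set()
    repeated = set()
    for value in values:
        if value in seen:
            repeated.add(value)
        else:
            seen.add(value)
    return list(repeated)


# Both functions should agree on which values are repeated.
assert sorted(repeat_count(dupl_list)) == sorted(repeat_linear(dupl_list))

# timeit calls each function `number` times and returns the total seconds.
for func in (repeat_count, repeat_linear):
    seconds = timeit.timeit(lambda: func(dupl_list), number=20)
    print(f"{func.__name__}: {seconds / 20 * 1e3:.3f} ms per call")
```

Actual timings depend on the machine and the input, so no numbers are quoted here. Note that `repeat_linear` does not preserve the order of first appearance, because it collects the results in a set.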
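I noticed that most of the solutions have O(n * n) complexity, which is very slow for large lists. So I want to share a function I wrote that supports integers or strings and is O(n) in the best case. For a list of 100,000 elements, the top solution takes more than 30 seconds, while mine takes only 0.12 seconds.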
def get_duplicates(list1):
    '''Return all duplicates given a list. O(n) complexity for best case scenario.
    input: [1, 1, 1, 2, 3, 4, 4]
    output: [1, 1, 4]
    '''
    # count how many times each element occurs
    dic = {}
    for el in list1:
        try:
            dic[el] += 1
        except KeyError:
            dic[el] = 1
    # emit each element once for every occurrence beyond the first
    dupes = []
    for key in dic.keys():
        for i in range(dic[key] - 1):
            dupes.append(key)
    return dupes

>>> list1 = [1, 1, 1, 2, 3, 4, 4]
>>> print(get_duplicates(list1))
[1, 1, 4]
Or, to get each duplicate only once:
>>> print(list(set(get_duplicates(list1))))
[1, 4]
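The counting step can also be written without `try`/`except` by using `dict.get` with a default value. The sketch below keeps the same behaviour (each value is emitted once per extra occurrence) and also runs it on strings, since the answer says strings are supported. The name `get_duplicates_get` is illustrative only.

```python
def get_duplicates_get(items):
    """Return each value once for every occurrence beyond its first."""
    counts = {}
    for el in items:
        # dict.get returns 0 when the key has not been seen yet.
        counts[el] = counts.get(el, 0) + 1

    dupes = []
    for key, count in counts.items():
        # A value seen `count` times contributes `count - 1` duplicates.
        dupes.extend([key] * (count - 1))
    return dupes


print(get_duplicates_get([1, 1, 1, 2, 3, 4, 4]))           # [1, 1, 4]
print(get_duplicates_get(["a", "b", "a", "a", "c", "b"]))  # ['a', 'a', 'b']
```

Any hashable element works as a dictionary key, so the same function handles tuples as well; unhashable elements such as lists would raise a `TypeError`.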
The third example in the accepted answer gives a wrong result and makes no attempt to return the duplicates. Here is a correct version:
number_lst = [1, 1, 2, 3, 5, ...]
seen_set = set()
duplicate_set = set(x for x in number_lst if x in seen_set or seen_set.add(x))
unique_set = seen_set - duplicate_set
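The comprehension relies on `set.add` returning `None`. For a first occurrence, `x in seen_set` is false, so `seen_set.add(x)` runs, records the value, and the whole condition is still falsy. A small worked run, using a made-up list because the original elides its values:

```python
# Hypothetical sample data; the original list is truncated with "...".
number_lst = [1, 1, 2, 3, 5, 5, 5]

seen_set = set()
# First occurrence: `x in seen_set` is False and add() returns None -> filtered out.
# Later occurrences: `x in seen_set` is True -> kept as a duplicate.
duplicate_set = set(x for x in number_lst if x in seen_set or seen_set.add(x))
unique_set = seen_set - duplicate_set

print(sorted(duplicate_set))  # [1, 5]
print(sorted(unique_set))     # [2, 3]
```

After the comprehension finishes, `seen_set` holds every distinct value, which is why subtracting `duplicate_set` leaves the values that occur exactly once.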
some_list = ['a', 'b', 'c', 'b', 'd', 'm', 'n', 'n']

# count the occurrences of each element
some_dictionary = {}
for element in some_list:
    if element not in some_dictionary:
        some_dictionary[element] = 1
    else:
        some_dictionary[element] += 1

# print every element that occurs more than once
for key, value in some_dictionary.items():
    if value > 1:
        print(key, end = ' ')

# another way
duplicates = []
for x in some_list:
    if some_list.count(x) > 1 and x not in duplicates:
        duplicates.append(x)
print()
print(duplicates)