在Python中,计算两个列表之间的差值的最佳方法是什么?
例子
A = [1,2,3,4]
B = [2,5]
A - B = [1,3,4]
B - A = [5]
在Python中,计算两个列表之间的差值的最佳方法是什么?
例子
A = [1,2,3,4]
B = [2,5]
A - B = [1,3,4]
B - A = [5]
当前回答
有三种选择,其中两种是可以接受的,另一种不应该这样做。
在较高的级别上,这3个选项是:
减去两组(有时最好) 检查每个列表项是否存在于一个集合中(大多数情况下最好) 检查每个列表项是否存在于列表中(不做)
选项3)永远不应该超过选项2)。根据应用程序的需要,您可能更喜欢选项1)或2),而在大多数用例中,2)可能是首选方法。2)与1)的性能非常相似,因为两者都具有O(m + n)个时间复杂度。相比之下,2)在空间复杂度上比1)有边际优势,并且既保持了原始列表的顺序,又保持了原始列表中的任何重复。
如果你想删除重复,不关心顺序,那么1)可能是最适合你的。
import time
def fun1(l1, l2):
# Order and duplications in l1 are both lost, O(m) + O(n)
return set(l1) - set(l2)
def fun2(l1, l2):
# Order and duplications in l1 are both preserved, O(m) + O(n)
l2_set = set(l2)
return [item for item in l1 if item not in l2_set]
def fun3(l1, l2):
# Order and duplications in l1 are both preserved, O(m * n)
# Don't do
return [item for item in l1 if item not in l2]
A = list(range(7500))
B = list(range(5000, 10000))
loops = 100
start = time.time()
for _ in range(loops):
fun1(A, B)
print(f"fun1 time: {time.time() - start}")
start = time.time()
for _ in range(loops):
fun2(A, B)
print(f"fun2 time: {time.time() - start}")
start = time.time()
for _ in range(loops):
fun3(A, B)
print(f"fun3 time: {time.time() - start}")
fun1 time: 0.03749704360961914
fun2 time: 0.04109621047973633
fun3 time: 32.55076885223389
其他回答
一个衬套:
diff = lambda l1,l2: [x for x in l1 if x not in l2]
diff(A,B)
diff(B,A)
Or:
diff = lambda l1,l2: filter(lambda x: x not in l2, l1)
diff(A,B)
diff(B,A)
当查看in -operator的TimeComplexity时,在最坏的情况下它与O(n)一起工作。即使是集合。
因此,当比较两个数组时,最好情况下的TimeComplexity为O(n),最坏情况下为O(n²)。
另一种(但不幸的是更复杂)解决方案,在最好和最坏的情况下都适用于O(n):
# Compares the difference of list a and b
# uses a callback function to compare items
def diff(a, b, callback):
a_missing_in_b = []
ai = 0
bi = 0
a = sorted(a, callback)
b = sorted(b, callback)
while (ai < len(a)) and (bi < len(b)):
cmp = callback(a[ai], b[bi])
if cmp < 0:
a_missing_in_b.append(a[ai])
ai += 1
elif cmp > 0:
# Item b is missing in a
bi += 1
else:
# a and b intersecting on this item
ai += 1
bi += 1
# if a and b are not of same length, we need to add the remaining items
for ai in xrange(ai, len(a)):
a_missing_in_b.append(a[ai])
return a_missing_in_b
e.g.
>>> a=[1,2,3]
>>> b=[2,4,6]
>>> diff(a, b, cmp)
[1, 3]
添加一个答案来处理我们想要严格区别重复的情况,也就是说,在第一个列表中有我们想要保留在结果中的重复。例如,得到,
[1, 1, 1, 2] - [1, 1] --> [1, 2]
我们可以用一个额外的计数器来得到一个优雅的差分函数。
from collections import Counter
def diff(first, second):
secondCntr = Counter(second)
second = set(second)
res = []
for i in first:
if i not in second:
res.append(i)
elif i in secondCntr:
if secondCntr[i] > 0:
secondCntr[i] -= 1
else:
res.append(i)
return res
最简单的方法,
使用设置().difference(设置())
list_a = [1,2,3]
list_b = [2,3]
print set(list_a).difference(set(list_b))
答案设置([1])
你可以做一个
list(set(A)-set(B))
and
list(set(B)-set(A))