What is the best way to compute the difference between two lists in Python?
Example:
A = [1,2,3,4]
B = [2,5]
A - B = [1,3,4]
B - A = [5]
Current answer
For a list of dicts, the plain list-comprehension solution works, whereas the set-based solution raises
TypeError: unhashable type: 'dict'
Test case:
def diff(a, b):
    return [aa for aa in a if aa not in b]
d1 = {"a":1, "b":1}
d2 = {"a":2, "b":2}
d3 = {"a":3, "b":3}
>>> diff([d1, d2, d3], [d2, d3])
[{'a': 1, 'b': 1}]
>>> diff([d1, d2, d3], [d1])
[{'a': 2, 'b': 2}, {'a': 3, 'b': 3}]
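To make the failure mode concrete, here is a small sketch contrasting the two approaches on the same dicts. The helper names diff_list and diff_set are only illustrative and do not come from the answer above.

```python
def diff_list(a, b):
    # Relies only on ==, so unhashable items such as dicts are fine.
    return [item for item in a if item not in b]

def diff_set(a, b):
    # Builds a set from b, which requires every item to be hashable.
    lookup = set(b)
    return [item for item in a if item not in lookup]

d1 = {"a": 1, "b": 1}
d2 = {"a": 2, "b": 2}
d3 = {"a": 3, "b": 3}

print(diff_list([d1, d2, d3], [d2, d3]))

try:
    diff_set([d1, d2, d3], [d2, d3])
except TypeError as exc:
    print("set-based version failed:", exc)
```

The trade-off is speed: the list-based membership test scans b for every item of a, so it is the slower of the two on large inputs.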
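Other answers
The examples above trivialize the problem of computing differences. Assuming sorted or de-duplicated input certainly makes the difference easier to compute, but if your comparison cannot afford those assumptions, you will need a non-trivial implementation of a diff algorithm. See difflib in the Python standard library.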
#! /usr/bin/python2
from difflib import SequenceMatcher
A = [1,2,3,4]
B = [2,5]
squeeze=SequenceMatcher( None, A, B )
print "A - B = [%s]"%( reduce( lambda p,q: p+q,
                               map( lambda t: squeeze.a[t[1]:t[2]],
                                    filter(lambda x:x[0]!='equal',
                                           squeeze.get_opcodes() ) ) ) )
Or for Python 3…
#! /usr/bin/python3
from difflib import SequenceMatcher
from functools import reduce
A = [1,2,3,4]
B = [2,5]
squeeze=SequenceMatcher( None, A, B )
print( "A - B = [%s]"%( reduce( lambda p,q: p+q,
                                map( lambda t: squeeze.a[t[1]:t[2]],
                                     filter(lambda x:x[0]!='equal',
                                            squeeze.get_opcodes() ) ) ) ) )
Output:
A - B = [[1, 3, 4]]
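The same opcode filtering can also be written as a plain loop, which may be easier to follow. This is only a sketch, and the function name seq_diff is made up for illustration. Note that SequenceMatcher compares the lists as ordered sequences, so the result depends on element order and is not a membership-based difference.

```python
from difflib import SequenceMatcher

def seq_diff(a, b):
    # Collect the slices of a that get_opcodes() does not tag as 'equal'.
    matcher = SequenceMatcher(None, a, b)
    removed = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != 'equal':
            removed.extend(a[i1:i2])
    return removed

A = [1, 2, 3, 4]
B = [2, 5]
print("A - B =", seq_diff(A, B))
print("B - A =", seq_diff(B, A))
```

Unlike a reduce call without an initial value, the loop returns an empty list when the two sequences are identical.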
If you don't care about item order or duplicates, use a set. If you do, use a list comprehension:
>>> def diff(first, second):
...     second = set(second)
...     return [item for item in first if item not in second]
...
>>> diff(A, B)
[1, 3, 4]
>>> diff(B, A)
[5]
>>>
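For the first case, where order and duplicates don't matter, a minimal sketch using set difference directly could look like this. The variable names are illustrative, and sorted() is added only to give a predictable order for display.

```python
A = [1, 2, 3, 4]
B = [2, 5]

# Set difference drops duplicates and does not preserve the original order.
a_minus_b = set(A) - set(B)
b_minus_a = set(B) - set(A)

# sorted() is used only to get a stable order for printing.
print(sorted(a_minus_b))  # [1, 3, 4]
print(sorted(b_minus_a))  # [5]
```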
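Looking at the TimeComplexity of the in operator, in the worst case it is O(n), even for sets (the average case for a set is O(1)).
So when comparing two arrays, the time complexity is O(n) in the best case and O(n²) in the worst case.
An alternative (but unfortunately more complex) solution, whose merge pass is O(n) in both the best and worst case, although the initial sort adds O(n log n):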
from functools import cmp_to_key

# Returns the items of list a that are missing from list b.
# callback(x, y) is a three-way comparison: negative, zero, or positive.
def diff(a, b, callback):
    a_missing_in_b = []
    ai = 0
    bi = 0
    # Python 3: sorted() takes a key, so wrap the comparison function
    a = sorted(a, key=cmp_to_key(callback))
    b = sorted(b, key=cmp_to_key(callback))
    while (ai < len(a)) and (bi < len(b)):
        order = callback(a[ai], b[bi])
        if order < 0:
            a_missing_in_b.append(a[ai])
            ai += 1
        elif order > 0:
            # Item b is missing in a
            bi += 1
        else:
            # a and b intersecting on this item
            ai += 1
            bi += 1
    # if b ran out first, the remaining items of a are all missing from b
    a_missing_in_b.extend(a[ai:])
    return a_missing_in_b
e.g. (Python 3 has no built-in cmp, so define one first)
>>> def cmp(x, y):
...     return (x > y) - (x < y)
...
>>> a=[1,2,3]
>>> b=[2,4,6]
>>> diff(a, b, cmp)
[1, 3]
A one-liner:
diff = lambda l1,l2: [x for x in l1 if x not in l2]
diff(A,B)
diff(B,A)
Or:
diff = lambda l1,l2: list(filter(lambda x: x not in l2, l1))
diff(A,B)
diff(B,A)