在Python中,计算两个列表之间的差值的最佳方法是什么?
例子
A = [1,2,3,4]
B = [2,5]
A - B = [1,3,4]
B - A = [5]
在Python中,计算两个列表之间的差值的最佳方法是什么?
例子
A = [1,2,3,4]
B = [2,5]
A - B = [1,3,4]
B - A = [5]
当前回答
您可能希望使用集合而不是列表。
其他回答
您可能希望使用集合而不是列表。
在这个线程中,我没有看到保留a中的重复的解决方案。当a中的一个元素与B中的一个元素匹配时,这个元素必须在B中删除,这样当相同的元素在a中再次出现时,如果这个元素在B中只出现一次,那么它必须出现在差异中。
def diff(first, second):
l2 = list(second)
l3 = []
for el in first:
if el in l2:
l2.remove(el)
else:
l3 += [el]
return l3
l1 = [1, 2, 1, 3, 4]
l2 = [1, 2, 3, 3]
diff(l1, l2)
>>> [1, 4]
如果顺序无关紧要,你可以简单地计算集合差值:
>>> set([1,2,3,4]) - set([2,5])
set([1, 4, 3])
>>> set([2,5]) - set([1,2,3,4])
set([5])
A = [1,2,3,4]
B = [2,5]
#A - B
x = list(set(A) - set(B))
#B - A
y = list(set(B) - set(A))
print x
print y
Python 2.7.3(默认,2014年2月27日,19:58:35)- IPython 1.1.0 - timeit:(github gist)
def diff(a, b):
b = set(b)
return [aa for aa in a if aa not in b]
def set_diff(a, b):
return list(set(a) - set(b))
diff_lamb_hension = lambda l1,l2: [x for x in l1 if x not in l2]
diff_lamb_filter = lambda l1,l2: filter(lambda x: x not in l2, l1)
from difflib import SequenceMatcher
def squeezer(a, b):
squeeze = SequenceMatcher(None, a, b)
return reduce(lambda p,q: p+q, map(
lambda t: squeeze.a[t[1]:t[2]],
filter(lambda x:x[0]!='equal',
squeeze.get_opcodes())))
结果:
# Small
a = range(10)
b = range(10/2)
timeit[diff(a, b)]
100000 loops, best of 3: 1.97 µs per loop
timeit[set_diff(a, b)]
100000 loops, best of 3: 2.71 µs per loop
timeit[diff_lamb_hension(a, b)]
100000 loops, best of 3: 2.1 µs per loop
timeit[diff_lamb_filter(a, b)]
100000 loops, best of 3: 3.58 µs per loop
timeit[squeezer(a, b)]
10000 loops, best of 3: 36 µs per loop
# Medium
a = range(10**4)
b = range(10**4/2)
timeit[diff(a, b)]
1000 loops, best of 3: 1.17 ms per loop
timeit[set_diff(a, b)]
1000 loops, best of 3: 1.27 ms per loop
timeit[diff_lamb_hension(a, b)]
1 loops, best of 3: 736 ms per loop
timeit[diff_lamb_filter(a, b)]
1 loops, best of 3: 732 ms per loop
timeit[squeezer(a, b)]
100 loops, best of 3: 12.8 ms per loop
# Big
a = xrange(10**7)
b = xrange(10**7/2)
timeit[diff(a, b)]
1 loops, best of 3: 1.74 s per loop
timeit[set_diff(a, b)]
1 loops, best of 3: 2.57 s per loop
timeit[diff_lamb_filter(a, b)]
# too long to wait for
timeit[diff_lamb_filter(a, b)]
# too long to wait for
timeit[diff_lamb_filter(a, b)]
# TypeError: sequence index must be integer, not 'slice'
@roman-bodnarchuk列表推导函数def diff(a, b)似乎更快。