What is the best way to compute the difference between two lists in Python?
Example:
A = [1,2,3,4]
B = [2,5]
A - B = [1,3,4]
B - A = [5]
Current answer
If you don't care about item order or duplicates, use set. If you do, use a list comprehension:
>>> def diff(first, second):
...     second = set(second)
...     return [item for item in first if item not in second]
...
>>> diff(A, B)
[1, 3, 4]
>>> diff(B, A)
[5]
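To make the trade-off concrete, here is a small sketch that runs both approaches on a list with a repeated item. The input values are made up for illustration; `diff` is the function from this answer.

```python
def diff(first, second):
    # Convert once so each membership test is O(1) on average.
    second = set(second)
    return [item for item in first if item not in second]

# Illustrative inputs (not from the question): note the repeated 3 and the order.
A = [4, 3, 3, 1, 2]
B = [2, 5]

ordered = diff(A, B)          # keeps the order and duplicates of A
unordered = set(A) - set(B)   # a set: no duplicates, no guaranteed order

print(ordered)            # [4, 3, 3, 1]
print(sorted(unordered))  # [1, 3, 4]
```

The list comprehension keeps both copies of 3 in their original positions, while the set version collapses them and has to be sorted to get a predictable order.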
Other answers
Simple code that gives you the difference with multiple (repeated) items, if that is what you want:
import copy

a=[1,2,3,3,4]
b=[2,4]
tmp = copy.deepcopy(a)
for k in b:
    # remove() drops only the first matching occurrence
    if k in tmp:
        tmp.remove(k)
print(tmp)  # [1, 3, 3]
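The loop above uses `in` and `list.remove` on a list, each of which is a linear scan, so it can get slow for long inputs. As a rough sketch, assuming the items are hashable, the same "remove one occurrence per match" behaviour can be had in a single pass with `collections.Counter`. The name `multiset_diff` is made up for this example.

```python
from collections import Counter

def multiset_diff(a, b):
    """Return a with one occurrence removed for each matching item in b."""
    remaining = Counter(b)  # how many copies of each item are still to be removed
    result = []
    for item in a:
        if remaining[item] > 0:
            # This occurrence is cancelled out by one item from b.
            remaining[item] -= 1
        else:
            result.append(item)
    return result

a = [1, 2, 3, 3, 4]
b = [2, 4]
print(multiset_diff(a, b))  # [1, 3, 3]
```

Like the loop version, this keeps the order of `a` and drops the earliest matching occurrences first.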
Python 2.7.3 (default, Feb 27 2014, 19:58:35) - IPython 1.1.0 - timeit: (github gist)
def diff(a, b):
    b = set(b)
    return [aa for aa in a if aa not in b]

def set_diff(a, b):
    return list(set(a) - set(b))

diff_lamb_hension = lambda l1,l2: [x for x in l1 if x not in l2]

diff_lamb_filter = lambda l1,l2: filter(lambda x: x not in l2, l1)

from difflib import SequenceMatcher
def squeezer(a, b):
    squeeze = SequenceMatcher(None, a, b)
    return reduce(lambda p,q: p+q, map(
        lambda t: squeeze.a[t[1]:t[2]],
            filter(lambda x:x[0]!='equal',
                squeeze.get_opcodes())))
Results:
# Small
a = range(10)
b = range(10/2)
timeit[diff(a, b)]
100000 loops, best of 3: 1.97 µs per loop
timeit[set_diff(a, b)]
100000 loops, best of 3: 2.71 µs per loop
timeit[diff_lamb_hension(a, b)]
100000 loops, best of 3: 2.1 µs per loop
timeit[diff_lamb_filter(a, b)]
100000 loops, best of 3: 3.58 µs per loop
timeit[squeezer(a, b)]
10000 loops, best of 3: 36 µs per loop
# Medium
a = range(10**4)
b = range(10**4/2)
timeit[diff(a, b)]
1000 loops, best of 3: 1.17 ms per loop
timeit[set_diff(a, b)]
1000 loops, best of 3: 1.27 ms per loop
timeit[diff_lamb_hension(a, b)]
1 loops, best of 3: 736 ms per loop
timeit[diff_lamb_filter(a, b)]
1 loops, best of 3: 732 ms per loop
timeit[squeezer(a, b)]
100 loops, best of 3: 12.8 ms per loop
# Big
a = xrange(10**7)
b = xrange(10**7/2)
timeit[diff(a, b)]
1 loops, best of 3: 1.74 s per loop
timeit[set_diff(a, b)]
1 loops, best of 3: 2.57 s per loop
timeit[diff_lamb_hension(a, b)]
# too long to wait for
timeit[diff_lamb_filter(a, b)]
# too long to wait for
timeit[squeezer(a, b)]
# TypeError: sequence index must be integer, not 'slice'
@roman-bodnarchuk's list comprehension function, def diff(a, b), appears to be the fastest.
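The session above is Python 2 (`xrange`, integer `/`, built-in `reduce`). For readers on Python 3, a minimal sketch of the same kind of measurement with the standard `timeit` module might look like this. Only the two fastest functions are included, and the input size and repeat count are arbitrary choices, so no timings are claimed here.

```python
import timeit

def diff(a, b):
    b = set(b)
    return [aa for aa in a if aa not in b]

def set_diff(a, b):
    return list(set(a) - set(b))

# Sizes chosen for illustration only.
a = list(range(10**4))
b = list(range(10**4 // 2))  # // is integer division in Python 3

for func in (diff, set_diff):
    # number=100 is an arbitrary repeat count; results depend on the machine.
    seconds = timeit.timeit(lambda: func(a, b), number=100)
    print(f"{func.__name__}: {seconds / 100 * 1e3:.3f} ms per call")
```

Both functions return the same elements here; only `diff` is guaranteed to return them in the order of `a`.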
You can do
list(set(A)-set(B))
and
list(set(B)-set(A))
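As a small variation, the `set.difference` method accepts any iterable, so the second list does not have to be converted explicitly. The sketch below uses the lists from the question and sorts the results only to make the printed order predictable, since sets are unordered.

```python
A = [1, 2, 3, 4]
B = [2, 5]

# set.difference accepts any iterable, so B need not be wrapped in set() first.
a_minus_b = set(A).difference(B)
b_minus_a = set(B).difference(A)

# Sets are unordered; sort if a predictable list is needed.
print(sorted(a_minus_b))  # [1, 3, 4]
print(sorted(b_minus_a))  # [5]
```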
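There are three options; two of them are acceptable and one should not be used.
At a high level, the three options are:
1) Subtract two sets (sometimes best)
2) Check whether each list item exists in a set (best in most cases)
3) Check whether each list item exists in a list (do not do this)
Option 3) should never be chosen over option 2). Depending on the needs of your application, you may prefer option 1) or 2), and in most use cases 2) is likely the preferred approach. 2) performs very similarly to 1), since both have O(m + n) time complexity. In contrast, 2) has a marginal advantage over 1) in space complexity, and it preserves both the order of the original list and any duplicates in it.
If you want to remove duplicates and don't care about order, then 1) is probably the best fit for you.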
import time

def fun1(l1, l2):
    # Order and duplications in l1 are both lost, O(m) + O(n)
    return set(l1) - set(l2)

def fun2(l1, l2):
    # Order and duplications in l1 are both preserved, O(m) + O(n)
    l2_set = set(l2)
    return [item for item in l1 if item not in l2_set]

def fun3(l1, l2):
    # Order and duplications in l1 are both preserved, O(m * n)
    # Don't do
    return [item for item in l1 if item not in l2]

A = list(range(7500))
B = list(range(5000, 10000))
loops = 100

start = time.time()
for _ in range(loops):
    fun1(A, B)
print(f"fun1 time: {time.time() - start}")

start = time.time()
for _ in range(loops):
    fun2(A, B)
print(f"fun2 time: {time.time() - start}")

start = time.time()
for _ in range(loops):
    fun3(A, B)
print(f"fun3 time: {time.time() - start}")
fun1 time: 0.03749704360961914
fun2 time: 0.04109621047973633
fun3 time: 32.55076885223389
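If order doesn't matter and the items in both lists are hashable, you can use a symmetric difference between the two sets.
This returns the values that appear in either set A or set B, but not in both.
For example, the question shows the results of taking the difference in each direction between list A and list B.
If we instead convert both lists to sets and take the symmetric difference, we get the combined result of both in a single operation.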
A = [1,2,3,4]
B = [2,5]
print(set(A) ^ set(B))
# {1, 3, 4, 5}
Adding this answer because I haven't seen symmetric difference offered in the existing answers.
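To show how the symmetric difference relates to the two one-directional differences from the question, here is a short sketch. The variable names are chosen for this example only.

```python
A = [1, 2, 3, 4]
B = [2, 5]

set_a, set_b = set(A), set(B)

a_minus_b = set_a - set_b   # items only in A
b_minus_a = set_b - set_a   # items only in B
sym = set_a ^ set_b         # items in exactly one of the two

print(sorted(sym))                    # [1, 3, 4, 5]
print(sym == (a_minus_b | b_minus_a)) # True
```

Note that the symmetric difference no longer tells you which list each item came from, so it only fits when the combined result is what you need.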