假设我有两个表,l1和l2。我想执行l1 - l2,返回l1中不在l2中的所有元素。
我可以想出一个简单的循环方法来做这个,但那真的很低效。python式的高效方法是什么?
举个例子,如果l1 = [1,2,6,8], l2 = [2,3,5,8], l1 - l2应该返回[1,6]
假设我有两个表,l1和l2。我想执行l1 - l2,返回l1中不在l2中的所有元素。
我可以想出一个简单的循环方法来做这个,但那真的很低效。python式的高效方法是什么?
举个例子,如果l1 = [1,2,6,8], l2 = [2,3,5,8], l1 - l2应该返回[1,6]
当前回答
Python有一个称为列表推导式的语言特性,它非常适合使这类事情变得极其简单。下面的语句完全是你想要的,并将结果存储在l3中:
l3 = [x for x in l1 if x not in l2]
L3将包含[1,6]。
其他回答
扩展Donut的答案和这里的其他答案,通过使用生成器推导式而不是列表推导式,以及使用集合数据结构(因为in操作符在列表上是O(n),而在集合上是O(1)),您可以得到更好的结果。
这里有一个函数适合你:
def filter_list(full_list, excludes):
s = set(excludes)
return (x for x in full_list if x not in s)
结果将是一个可迭代对象,它将惰性地获取过滤后的列表。如果你需要一个真正的列表对象(例如,如果你需要对结果执行len()),那么你可以很容易地像这样构建一个列表:
filtered_list = list(filter_list(full_list, excludes))
性能比较
在Python 3.9.1和Python 2.7.16上比较这里提到的所有答案的性能。
Python 3.9.1
答案按表现顺序列出:
Arkku's set difference using subtraction "-" operation - (91.3 nsec per loop) mquadri$ python3 -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1 - l2" 5000000 loops, best of 5: 91.3 nsec per loop Moinuddin Quadri's using set().difference()- (133 nsec per loop) mquadri$ python3 -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1.difference(l2)" 2000000 loops, best of 5: 133 nsec per loop Moinuddin Quadri's list comprehension with set based lookup- (366 nsec per loop) mquadri$ python3 -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "[x for x in l1 if x not in l2]" 1000000 loops, best of 5: 366 nsec per loop Donut's list comprehension on plain list - (489 nsec per loop) mquadri$ python3 -m timeit -s "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "[x for x in l1 if x not in l2]" 500000 loops, best of 5: 489 nsec per loop Daniel Pryden's generator expression with set based lookup and type-casting to list - (583 nsec per loop) : Explicitly type-casting to list to get the final object as list, as requested by OP. If generator expression is replaced with list comprehension, it'll become same as Moinuddin Quadri's list comprehension with set based lookup. mquadri$ mquadri$ python3 -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "list(x for x in l1 if x not in l2)" 500000 loops, best of 5: 583 nsec per loop Moinuddin Quadri's using filter() and explicitly type-casting to list (need to explicitly type-cast as in Python 3.x, it returns iterator) - (681 nsec per loop) mquadri$ python3 -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "list(filter(lambda x: x not in l2, l1))" 500000 loops, best of 5: 681 nsec per loop Akshay Hazari's using combination of functools.reduce + filter -(3.36 usec per loop) : Explicitly type-casting to list as from Python 3.x it started returned returning iterator. Also we need to import functools to use reduce in Python 3.x mquadri$ python3 -m timeit "from functools import reduce; l1 = [1,2,6,8]; l2 = [2,3,5,8];" "list(reduce(lambda x,y : filter(lambda z: z!=y,x) ,l1,l2))" 100000 loops, best of 5: 3.36 usec per loop
Python 2.7.16
答案按表现顺序列出:
Arkku's set difference using subtraction "-" operation - (0.0783 usec per loop) mquadri$ python -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1 - l2" 10000000 loops, best of 3: 0.0783 usec per loop Moinuddin Quadri's using set().difference()- (0.117 usec per loop) mquadri$ mquadri$ python -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1.difference(l2)" 10000000 loops, best of 3: 0.117 usec per loop Moinuddin Quadri's list comprehension with set based lookup- (0.246 usec per loop) mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "[x for x in l1 if x not in l2]" 1000000 loops, best of 3: 0.246 usec per loop Donut's list comprehension on plain list - (0.372 usec per loop) mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "[x for x in l1 if x not in l2]" 1000000 loops, best of 3: 0.372 usec per loop Moinuddin Quadri's using filter() - (0.593 usec per loop) mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "filter(lambda x: x not in l2, l1)" 1000000 loops, best of 3: 0.593 usec per loop Daniel Pryden's generator expression with set based lookup and type-casting to list - (0.964 per loop) : Explicitly type-casting to list to get the final object as list, as requested by OP. If generator expression is replaced with list comprehension, it'll become same as Moinuddin Quadri's list comprehension with set based lookup. mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "list(x for x in l1 if x not in l2)" 1000000 loops, best of 3: 0.964 usec per loop Akshay Hazari's using combination of functools.reduce + filter -(2.78 usec per loop) mquadri$ python -m timeit "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "reduce(lambda x,y : filter(lambda z: z!=y,x) ,l1,l2)" 100000 loops, best of 3: 2.78 usec per loop
使用filterfalse而不使用lambda-expression
When using functions like filter or filterfalse and similar from itertools you can usually save performance by avoiding lambda-expressions and using already existing functions. Instances of list and set defines a __contains__-method to use for containment checks. The in-operator calls this method under the hood, so using x in l2 can be replaced by l2.__contains__(x). Usually this replacement is not really prettier but in this specific case it allows us to gain better performance than using a lambda-expression, when used in combination with filterfalse:
>>> from itertools import filterfalse
>>> l1 = [1, 2, 6, 8]
>>> l2 = [2, 3, 5, 8]
>>> list(filterfalse(l2.__contains__, l1))
[1, 6]
Filterfalse创建一个迭代器,该迭代器生成的所有元素在用作12.2 .__contains__的参数时返回false。
Sets有一个更快的__contains__实现,所以更好的是:
>>> from itertools import filterfalse
>>> l1 = [1, 2, 6, 8]
>>> l2 = set([2, 3, 5, 8])
>>> list(filterfalse(l2.__contains__, l1))
[1, 6]
性能
使用列表:
$ python3 -m timeit -s "from itertools import filterfalse; l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "list(filterfalse(l2.__contains__, l1))"
500000 loops, best of 5: 522 nsec per loop
使用设置:
$ python3 -m timeit -s "from itertools import filterfalse; l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "list(filterfalse(l2.__contains__, l1))"
1000000 loops, best of 5: 359 nsec per loop
一种方法是使用集合:
>>> set([1,2,6,8]) - set([2,3,5,8])
set([1, 6])
但是请注意,集合不会保留元素的顺序,并且会删除任何重复的元素。元素也需要是可哈希的。如果这些限制是可以容忍的,那么这通常是最简单和性能最高的选项。
试试这个:
l1=[1,2,6,8]
l2=[2,3,5,8]
r=[]
for x in l1:
if x in l2:
continue
r=r+[x]
print(r)