假设我有两个表,l1和l2。我想执行l1 - l2,返回l1中不在l2中的所有元素。

我可以想出一个简单的循环方法来做这个,但那真的很低效。python式的高效方法是什么?

举个例子,如果l1 = [1,2,6,8], l2 = [2,3,5,8], l1 - l2应该返回[1,6]


当前回答

性能比较

在Python 3.9.1和Python 2.7.16上比较这里提到的所有答案的性能。

Python 3.9.1

答案按表现顺序列出:

Arkku's set difference using subtraction "-" operation - (91.3 nsec per loop) mquadri$ python3 -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1 - l2" 5000000 loops, best of 5: 91.3 nsec per loop Moinuddin Quadri's using set().difference()- (133 nsec per loop) mquadri$ python3 -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1.difference(l2)" 2000000 loops, best of 5: 133 nsec per loop Moinuddin Quadri's list comprehension with set based lookup- (366 nsec per loop) mquadri$ python3 -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "[x for x in l1 if x not in l2]" 1000000 loops, best of 5: 366 nsec per loop Donut's list comprehension on plain list - (489 nsec per loop) mquadri$ python3 -m timeit -s "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "[x for x in l1 if x not in l2]" 500000 loops, best of 5: 489 nsec per loop Daniel Pryden's generator expression with set based lookup and type-casting to list - (583 nsec per loop) : Explicitly type-casting to list to get the final object as list, as requested by OP. If generator expression is replaced with list comprehension, it'll become same as Moinuddin Quadri's list comprehension with set based lookup. mquadri$ mquadri$ python3 -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "list(x for x in l1 if x not in l2)" 500000 loops, best of 5: 583 nsec per loop Moinuddin Quadri's using filter() and explicitly type-casting to list (need to explicitly type-cast as in Python 3.x, it returns iterator) - (681 nsec per loop) mquadri$ python3 -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "list(filter(lambda x: x not in l2, l1))" 500000 loops, best of 5: 681 nsec per loop Akshay Hazari's using combination of functools.reduce + filter -(3.36 usec per loop) : Explicitly type-casting to list as from Python 3.x it started returned returning iterator. Also we need to import functools to use reduce in Python 3.x mquadri$ python3 -m timeit "from functools import reduce; l1 = [1,2,6,8]; l2 = [2,3,5,8];" "list(reduce(lambda x,y : filter(lambda z: z!=y,x) ,l1,l2))" 100000 loops, best of 5: 3.36 usec per loop

Python 2.7.16

答案按表现顺序列出:

Arkku's set difference using subtraction "-" operation - (0.0783 usec per loop) mquadri$ python -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1 - l2" 10000000 loops, best of 3: 0.0783 usec per loop Moinuddin Quadri's using set().difference()- (0.117 usec per loop) mquadri$ mquadri$ python -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1.difference(l2)" 10000000 loops, best of 3: 0.117 usec per loop Moinuddin Quadri's list comprehension with set based lookup- (0.246 usec per loop) mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "[x for x in l1 if x not in l2]" 1000000 loops, best of 3: 0.246 usec per loop Donut's list comprehension on plain list - (0.372 usec per loop) mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "[x for x in l1 if x not in l2]" 1000000 loops, best of 3: 0.372 usec per loop Moinuddin Quadri's using filter() - (0.593 usec per loop) mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "filter(lambda x: x not in l2, l1)" 1000000 loops, best of 3: 0.593 usec per loop Daniel Pryden's generator expression with set based lookup and type-casting to list - (0.964 per loop) : Explicitly type-casting to list to get the final object as list, as requested by OP. If generator expression is replaced with list comprehension, it'll become same as Moinuddin Quadri's list comprehension with set based lookup. mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "list(x for x in l1 if x not in l2)" 1000000 loops, best of 3: 0.964 usec per loop Akshay Hazari's using combination of functools.reduce + filter -(2.78 usec per loop) mquadri$ python -m timeit "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "reduce(lambda x,y : filter(lambda z: z!=y,x) ,l1,l2)" 100000 loops, best of 3: 2.78 usec per loop

其他回答

扩展Donut的答案和这里的其他答案,通过使用生成器推导式而不是列表推导式,以及使用集合数据结构(因为in操作符在列表上是O(n),而在集合上是O(1)),您可以得到更好的结果。

这里有一个函数适合你:

def filter_list(full_list, excludes):
    s = set(excludes)
    return (x for x in full_list if x not in s)

结果将是一个可迭代对象,它将惰性地获取过滤后的列表。如果你需要一个真正的列表对象(例如,如果你需要对结果执行len()),那么你可以很容易地像这样构建一个列表:

filtered_list = list(filter_list(full_list, excludes))

使用set.difference ():

你可以使用set.difference()来获取新的set,其中包含集合中不存在于其他集合中的元素。即set(A).difference(B)将返回包含A中存在而B中不存在的项的set。例如:

>>> set([1,2,6,8]).difference([2,3,5,8])
{1, 6}

它是Arkku的答案中提到的一种函数方法(它使用算术减法-运算符来获得集差)。

因为集合是无序的,你将失去初始列表中元素的顺序。(如果你想保持元素的顺序,请继续阅读下一节)

使用列表理解和基于集的查找

如果你想保持从初始列表的顺序,那么甜甜圈的基于列表理解的答案将做的技巧。但是,通过在内部使用set来检查元素是否存在于其他列表中,可以从接受的答案中获得更好的性能。例如:

l1, l2 = [1,2,6,8], [2,3,5,8]
s2 = set(l2)  # Type-cast `l2` to `set`

l3 = [x for x in l1 if x not in s2]
                             #   ^ Doing membership checking on `set` s2

如果你有兴趣了解为什么与列表相比,set的成员资格检查更快,请阅读这篇文章:是什么使得set比list更快?


使用filter()和lambda表达式

下面是使用filter()和lambda表达式的另一种替代方法。在这里添加它只是为了参考,但它不是有效的性能:

>>> l1 = [1,2,6,8]
>>> l2 = set([2,3,5,8])

#     v  `filter` returns the a iterator object. Here I'm type-casting 
#     v  it to `list` in order to display the resultant value
>>> list(filter(lambda x: x not in l2, l1))
[1, 6]

Python 3.8上的集合和列表理解基准

(加起来就是Moinuddin Quadri的基准)

tldr:使用Arkku的集合解决方案,相比之下比承诺的还要快!

根据列表检查现有文件

在我的例子中,我发现使用Arkku的集合解决方案要比python的列表理解方法快40倍(!)。

列表理解:

%%time
import glob
existing = [int(os.path.basename(x).split(".")[0]) for x in glob.glob("*.txt")]
wanted = list(range(1, 100000))
[i for i in wanted if i not in existing]

墙壁时间:28.2秒

Sets

%%time
import glob
existing = [int(os.path.basename(x).split(".")[0]) for x in glob.glob("*.txt")]
wanted = list(range(1, 100000))
set(wanted) - set(existing)

壁时间:689毫秒

性能比较

在Python 3.9.1和Python 2.7.16上比较这里提到的所有答案的性能。

Python 3.9.1

答案按表现顺序列出:

Arkku's set difference using subtraction "-" operation - (91.3 nsec per loop) mquadri$ python3 -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1 - l2" 5000000 loops, best of 5: 91.3 nsec per loop Moinuddin Quadri's using set().difference()- (133 nsec per loop) mquadri$ python3 -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1.difference(l2)" 2000000 loops, best of 5: 133 nsec per loop Moinuddin Quadri's list comprehension with set based lookup- (366 nsec per loop) mquadri$ python3 -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "[x for x in l1 if x not in l2]" 1000000 loops, best of 5: 366 nsec per loop Donut's list comprehension on plain list - (489 nsec per loop) mquadri$ python3 -m timeit -s "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "[x for x in l1 if x not in l2]" 500000 loops, best of 5: 489 nsec per loop Daniel Pryden's generator expression with set based lookup and type-casting to list - (583 nsec per loop) : Explicitly type-casting to list to get the final object as list, as requested by OP. If generator expression is replaced with list comprehension, it'll become same as Moinuddin Quadri's list comprehension with set based lookup. mquadri$ mquadri$ python3 -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "list(x for x in l1 if x not in l2)" 500000 loops, best of 5: 583 nsec per loop Moinuddin Quadri's using filter() and explicitly type-casting to list (need to explicitly type-cast as in Python 3.x, it returns iterator) - (681 nsec per loop) mquadri$ python3 -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "list(filter(lambda x: x not in l2, l1))" 500000 loops, best of 5: 681 nsec per loop Akshay Hazari's using combination of functools.reduce + filter -(3.36 usec per loop) : Explicitly type-casting to list as from Python 3.x it started returned returning iterator. Also we need to import functools to use reduce in Python 3.x mquadri$ python3 -m timeit "from functools import reduce; l1 = [1,2,6,8]; l2 = [2,3,5,8];" "list(reduce(lambda x,y : filter(lambda z: z!=y,x) ,l1,l2))" 100000 loops, best of 5: 3.36 usec per loop

Python 2.7.16

答案按表现顺序列出:

Arkku's set difference using subtraction "-" operation - (0.0783 usec per loop) mquadri$ python -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1 - l2" 10000000 loops, best of 3: 0.0783 usec per loop Moinuddin Quadri's using set().difference()- (0.117 usec per loop) mquadri$ mquadri$ python -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1.difference(l2)" 10000000 loops, best of 3: 0.117 usec per loop Moinuddin Quadri's list comprehension with set based lookup- (0.246 usec per loop) mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "[x for x in l1 if x not in l2]" 1000000 loops, best of 3: 0.246 usec per loop Donut's list comprehension on plain list - (0.372 usec per loop) mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "[x for x in l1 if x not in l2]" 1000000 loops, best of 3: 0.372 usec per loop Moinuddin Quadri's using filter() - (0.593 usec per loop) mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "filter(lambda x: x not in l2, l1)" 1000000 loops, best of 3: 0.593 usec per loop Daniel Pryden's generator expression with set based lookup and type-casting to list - (0.964 per loop) : Explicitly type-casting to list to get the final object as list, as requested by OP. If generator expression is replaced with list comprehension, it'll become same as Moinuddin Quadri's list comprehension with set based lookup. mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "list(x for x in l1 if x not in l2)" 1000000 loops, best of 3: 0.964 usec per loop Akshay Hazari's using combination of functools.reduce + filter -(2.78 usec per loop) mquadri$ python -m timeit "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "reduce(lambda x,y : filter(lambda z: z!=y,x) ,l1,l2)" 100000 loops, best of 3: 2.78 usec per loop

Python有一个称为列表推导式的语言特性,它非常适合使这类事情变得极其简单。下面的语句完全是你想要的,并将结果存储在l3中:

l3 = [x for x in l1 if x not in l2]

L3将包含[1,6]。