2024-10-29 07:00:04

计算列表差值

在Python中,计算两个列表之间的差值的最佳方法是什么?

例子

A = [1,2,3,4]
B = [2,5]

A - B = [1,3,4]
B - A = [5]

当前回答

最简单的方法,

使用设置().difference(设置())

list_a = [1,2,3]
list_b = [2,3]
print set(list_a).difference(set(list_b))

答案设置([1])

其他回答

一个衬套:

diff = lambda l1,l2: [x for x in l1 if x not in l2]
diff(A,B)
diff(B,A)

Or:

diff = lambda l1,l2: filter(lambda x: x not in l2, l1)
diff(A,B)
diff(B,A)

最简单的方法,

使用设置().difference(设置())

list_a = [1,2,3]
list_b = [2,3]
print set(list_a).difference(set(list_b))

答案设置([1])

如果你想要递归地深入到列表中的项目,我已经为python编写了一个包:https://github.com/erasmose/deepdiff

安装

从PyPi安装:

pip install deepdiff

如果你是Python3,你还需要安装:

pip install future six

示例使用

>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> from __future__ import print_function

同一对象返回空

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = t1
>>> ddiff = DeepDiff(t1, t2)
>>> print (ddiff.changes)
    {}

项目类型发生变化

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:"2", 3:3}
>>> ddiff = DeepDiff(t1, t2)
>>> print (ddiff.changes)
    {'type_changes': ["root[2]: 2=<type 'int'> vs. 2=<type 'str'>"]}

某项的值已更改

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:4, 3:3}
>>> ddiff = DeepDiff(t1, t2)
>>> print (ddiff.changes)
    {'values_changed': ['root[2]: 2 ====>> 4']}

项目添加和/或删除

>>> t1 = {1:1, 2:2, 3:3, 4:4}
>>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes)
    {'dic_item_added': ['root[5, 6]'],
     'dic_item_removed': ['root[4]'],
     'values_changed': ['root[2]: 2 ====>> 4']}

字符串的区别

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
>>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
    { 'values_changed': [ 'root[2]: 2 ====>> 4',
                          "root[4]['b']:\n--- \n+++ \n@@ -1 +1 @@\n-world\n+world!"]}
>>>
>>> print (ddiff.changes['values_changed'][1])
    root[4]['b']:
    --- 
    +++ 
    @@ -1 +1 @@
    -world
    +world!

字符串差2

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!\nGoodbye!\n1\n2\nEnd"}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n1\n2\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
    { 'values_changed': [ "root[4]['b']:\n--- \n+++ \n@@ -1,5 +1,4 @@\n-world!\n-Goodbye!\n+world\n 1\n 2\n End"]}
>>>
>>> print (ddiff.changes['values_changed'][0])
    root[4]['b']:
    --- 
    +++ 
    @@ -1,5 +1,4 @@
    -world!
    -Goodbye!
    +world
     1
     2
     End

类型变化

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n\n\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
    { 'type_changes': [ "root[4]['b']: [1, 2, 3]=<type 'list'> vs. world\n\n\nEnd=<type 'str'>"]}

列表的区别

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
    { 'list_removed': ["root[4]['b']: [3]"]}

区别2:注意它不考虑顺序

>>> # Note that it DOES NOT take order into account
... t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
    { }

包含字典的列表:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
    { 'dic_item_removed': ["root[4]['b'][2][2]"],
      'values_changed': ["root[4]['b'][2][1]: 1 ====>> 3"]}

有三种选择,其中两种是可以接受的,另一种不应该这样做。

在较高的级别上,这3个选项是:

减去两组(有时最好) 检查每个列表项是否存在于一个集合中(大多数情况下最好) 检查每个列表项是否存在于列表中(不做)

选项3)永远不应该超过选项2)。根据应用程序的需要,您可能更喜欢选项1)或2),而在大多数用例中,2)可能是首选方法。2)与1)的性能非常相似,因为两者都具有O(m + n)个时间复杂度。相比之下,2)在空间复杂度上比1)有边际优势,并且既保持了原始列表的顺序,又保持了原始列表中的任何重复。

如果你想删除重复,不关心顺序,那么1)可能是最适合你的。

import time

def fun1(l1, l2):
    # Order and duplications in l1 are both lost, O(m) + O(n)
    return set(l1) - set(l2)

def fun2(l1, l2):
    # Order and duplications in l1 are both preserved, O(m) + O(n)
    l2_set = set(l2)
    return [item for item in l1 if item not in l2_set]

def fun3(l1, l2):
    # Order and duplications in l1 are both preserved, O(m * n)
    # Don't do
    return [item for item in l1 if item not in l2]

A = list(range(7500))
B = list(range(5000, 10000))

loops = 100

start = time.time()
for _ in range(loops):
    fun1(A, B)
print(f"fun1 time: {time.time() - start}")

start = time.time()
for _ in range(loops):
    fun2(A, B)
print(f"fun2 time: {time.time() - start}")

start = time.time()
for _ in range(loops):
    fun3(A, B)
print(f"fun3 time: {time.time() - start}")
fun1 time: 0.03749704360961914
fun2 time: 0.04109621047973633
fun3 time: 32.55076885223389

您可能希望使用集合而不是列表。