我需要合并多个字典,这是我有例如:

dict1 = {1:{"a":{A}}, 2:{"b":{B}}}

dict2 = {2:{"c":{C}}, 3:{"d":{D}}}

A、B、C和D是树的叶子,比如{"info1":"value", "info2":"value2"}

字典的级别(深度)未知,可能是{2:{"c":{"z":{"y":{c}}}}}

在我的例子中,它表示一个目录/文件结构,节点是文档,叶子是文件。

我想将它们合并得到:

 dict3 = {1:{"a":{A}}, 2:{"b":{B},"c":{C}}, 3:{"d":{D}}}

我不确定如何用Python轻松做到这一点。


当前回答

这是我做的递归合并字典到无限深度的解决方案。传递给函数的第一个字典是主字典——其中的值将覆盖第二个字典中相同键的值。

def merge(dict1: dict, dict2: dict) -> dict:
    merged = dict1

    for key in dict2:
        if type(dict2[key]) == dict:
            merged[key] = merge(dict1[key] if key in dict1 else {}, dict2[key])
        else:
            if key not in dict1.keys():
                merged[key] = dict2[key]

    return merged

其他回答

这应该有助于将所有项从dict2合并到dict1:

for item in dict2:
    if item in dict1:
        for leaf in dict2[item]:
            dict1[item][leaf] = dict2[item][leaf]
    else:
        dict1[item] = dict2[item]

请测试一下,告诉我们这是否是你想要的。

编辑:

上述解决方案只合并了一个级别,但正确地解决了op给出的例子。如果合并多个级别,应该使用递归。

from collections import defaultdict
from itertools import chain

class DictHelper:

@staticmethod
def merge_dictionaries(*dictionaries, override=True):
    merged_dict = defaultdict(set)
    all_unique_keys = set(chain(*[list(dictionary.keys()) for dictionary in dictionaries]))  # Build a set using all dict keys
    for key in all_unique_keys:
        keys_value_type = list(set(filter(lambda obj_type: obj_type != type(None), [type(dictionary.get(key, None)) for dictionary in dictionaries])))
        # Establish the object type for each key, return None if key is not present in dict and remove None from final result
        if len(keys_value_type) != 1:
            raise Exception("Different objects type for same key: {keys_value_type}".format(keys_value_type=keys_value_type))

        if keys_value_type[0] == list:
            values = list(chain(*[dictionary.get(key, []) for dictionary in dictionaries]))  # Extract the value for each key
            merged_dict[key].update(values)

        elif keys_value_type[0] == dict:
            # Extract all dictionaries by key and enter in recursion
            dicts_to_merge = list(filter(lambda obj: obj != None, [dictionary.get(key, None) for dictionary in dictionaries]))
            merged_dict[key] = DictHelper.merge_dictionaries(*dicts_to_merge)

        else:
            # if override => get value from last dictionary else make a list of all values
            values = list(filter(lambda obj: obj != None, [dictionary.get(key, None) for dictionary in dictionaries]))
            merged_dict[key] = values[-1] if override else values

    return dict(merged_dict)



if __name__ == '__main__':
  d1 = {'aaaaaaaaa': ['to short', 'to long'], 'bbbbb': ['to short', 'to long'], "cccccc": ["the is a test"]}
  d2 = {'aaaaaaaaa': ['field is not a bool'], 'bbbbb': ['field is not a bool']}
  d3 = {'aaaaaaaaa': ['filed is not a string', "to short"], 'bbbbb': ['field is not an integer']}
  print(DictHelper.merge_dictionaries(d1, d2, d3))

  d4 = {"a": {"x": 1, "y": 2, "z": 3, "d": {"x1": 10}}}
  d5 = {"a": {"x": 10, "y": 20, "d": {"x2": 20}}}
  print(DictHelper.merge_dictionaries(d4, d5))

输出:

{'bbbbb': {'to long', 'field is not an integer', 'to short', 'field is not a bool'}, 
'aaaaaaaaa': {'to long', 'to short', 'filed is not a string', 'field is not a bool'}, 
'cccccc': {'the is a test'}}

{'a': {'y': 20, 'd': {'x1': 10, 'x2': 20}, 'z': 3, 'x': 10}}

如果有人想要另一种方法来解决这个问题,这是我的解决方案。

优点:简洁、声明性和函数式风格(递归,没有突变)。

潜在缺点:这可能不是你想要的合并。查阅文档字符串以了解语义。

def deep_merge(a, b):
    """
    Merge two values, with `b` taking precedence over `a`.

    Semantics:
    - If either `a` or `b` is not a dictionary, `a` will be returned only if
      `b` is `None`. Otherwise `b` will be returned.
    - If both values are dictionaries, they are merged as follows:
        * Each key that is found only in `a` or only in `b` will be included in
          the output collection with its value intact.
        * For any key in common between `a` and `b`, the corresponding values
          will be merged with the same semantics.
    """
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if b is None else b
    else:
        # If we're here, both a and b must be dictionaries or subtypes thereof.

        # Compute set of all keys in both dictionaries.
        keys = set(a.keys()) | set(b.keys())

        # Build output dictionary, merging recursively values with common keys,
        # where `None` is used to mean the absence of a value.
        return {
            key: deep_merge(a.get(key), b.get(key))
            for key in keys
        }

andrew cookes的回答有一个小问题:在某些情况下,当你修改返回的dict时,它会修改第二个参数b。具体来说是因为这句话:

if key in a:
    ...
else:
    a[key] = b[key]

如果b[key]是一个字典,它将被简单地赋给a,这意味着对该字典的任何后续修改将同时影响a和b。

a={}
b={'1':{'2':'b'}}
c={'1':{'3':'c'}}
merge(merge(a,b), c) # {'1': {'3': 'c', '2': 'b'}}
a # {'1': {'3': 'c', '2': 'b'}} (as expected)
b # {'1': {'3': 'c', '2': 'b'}} <----
c # {'1': {'3': 'c'}} (unmodified)

为了解决这个问题,这一行必须用这个替换:

if isinstance(b[key], dict):
    a[key] = clone_dict(b[key])
else:
    a[key] = b[key]

其中clone_dict为:

def clone_dict(obj):
    clone = {}
    for key, value in obj.iteritems():
        if isinstance(value, dict):
            clone[key] = clone_dict(value)
        else:
            clone[key] = value
    return

不动。这显然没有考虑到list, set和其他东西,但我希望它说明了合并字典时的陷阱。

为了完整起见,这里是我的版本,在那里你可以传递它多个字典:

def merge_dicts(*args):
    def clone_dict(obj):
        clone = {}
        for key, value in obj.iteritems():
            if isinstance(value, dict):
                clone[key] = clone_dict(value)
            else:
                clone[key] = value
        return

    def merge(a, b, path=[]):
        for key in b:
            if key in a:
                if isinstance(a[key], dict) and isinstance(b[key], dict):
                    merge(a[key], b[key], path + [str(key)])
                elif a[key] == b[key]:
                    pass
                else:
                    raise Exception('Conflict at `{path}\''.format(path='.'.join(path + [str(key)])))
            else:
                if isinstance(b[key], dict):
                    a[key] = clone_dict(b[key])
                else:
                    a[key] = b[key]
        return a
    return reduce(merge, args, {})

我有一个迭代的解决方案-工作得更好的大字典&很多(例如jsons等):

import collections


def merge_dict_with_subdicts(dict1: dict, dict2: dict) -> dict:
    """
    similar behaviour to builtin dict.update - but knows how to handle nested dicts
    """
    q = collections.deque([(dict1, dict2)])
    while len(q) > 0:
        d1, d2 = q.pop()
        for k, v in d2.items():
            if k in d1 and isinstance(d1[k], dict) and isinstance(v, dict):
                q.append((d1[k], v))
            else:
                d1[k] = v

    return dict1

注意,这将使用d2中的值来覆盖d1,以防它们都不是字典。(与python的dict.update()相同)

一些测试:

def test_deep_update():
    d = dict()
    merge_dict_with_subdicts(d, {"a": 4})
    assert d == {"a": 4}

    new_dict = {
        "b": {
            "c": {
                "d": 6
            }
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d == {
        "a": 4,
        "b": {
            "c": {
                "d": 6
            }
        }
    }

    new_dict = {
        "a": 3,
        "b": {
            "f": 7
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d == {
        "a": 3,
        "b": {
            "c": {
                "d": 6
            },
            "f": 7
        }
    }

    # test a case where one of the dicts has dict as value and the other has something else
    new_dict = {
        'a': {
            'b': 4
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d['a']['b'] == 4

我已经测试了大约1200个字典——这种方法花了0.4秒,而递归的解决方案花了2.5秒。