我需要合并多个字典,这是我有例如:

dict1 = {1:{"a":{A}}, 2:{"b":{B}}}

dict2 = {2:{"c":{C}}, 3:{"d":{D}}}

A、B、C和D是树的叶子,比如{"info1":"value", "info2":"value2"}

字典的级别(深度)未知,可能是{2:{"c":{"z":{"y":{c}}}}}

在我的例子中,它表示一个目录/文件结构,节点是文档,叶子是文件。

我想将它们合并得到:

 dict3 = {1:{"a":{A}}, 2:{"b":{B},"c":{C}}, 3:{"d":{D}}}

我不确定如何用Python轻松做到这一点。


当前回答

我有两个字典(a和b),每个字典可以包含任意数量的嵌套字典。我想递归地合并它们,b优先于a。

将嵌套字典视为树,我想要的是:

更新a,使b中每个叶结点的每条路径都表示在a中 如果在b的对应路径中找到了叶子,则覆盖a的子树 保持所有b个叶节点都是叶节点的不变式。

现有的答案对我来说有点复杂,有些细节被束之高阁。我将以下内容整合在一起,它们通过了我的数据集的单元测试。

  def merge_map(a, b):
    if not isinstance(a, dict) or not isinstance(b, dict):
      return b

    for key in b.keys():
      a[key] = merge_map(a[key], b[key]) if key in a else b[key]
    return a

示例(为清晰起见,已格式化):

 a = {
    1 : {'a': 'red', 
         'b': {'blue': 'fish', 'yellow': 'bear' },
         'c': { 'orange': 'dog'},
    },
    2 : {'d': 'green'},
    3: 'e'
  }

  b = {
    1 : {'b': 'white'},
    2 : {'d': 'black'},
    3: 'e'
  }


  >>> merge_map(a, b)
  {1: {'a': 'red', 
       'b': 'white',
       'c': {'orange': 'dog'},},
   2: {'d': 'black'},
   3: 'e'}

b中需要维护的路径为:

1 -> 'b' -> 'white' 2 -> 'd' -> 'black' 3 -> 'e'。

A拥有独特且不冲突的路径:

1 -> 'a' -> 'red' 1 -> 'c' -> 'orange' -> 'dog'

所以它们仍然在合并后的映射中表示。

其他回答

这个问题的一个问题是字典的值可以是任意复杂的数据块。基于这些和其他答案,我得出了以下代码:

class YamlReaderError(Exception):
    pass

def data_merge(a, b):
    """merges b into a and return merged result

    NOTE: tuples and arbitrary objects are not handled as it is totally ambiguous what should happen"""
    key = None
    # ## debug output
    # sys.stderr.write("DEBUG: %s to %s\n" %(b,a))
    try:
        if a is None or isinstance(a, str) or isinstance(a, unicode) or isinstance(a, int) or isinstance(a, long) or isinstance(a, float):
            # border case for first run or if a is a primitive
            a = b
        elif isinstance(a, list):
            # lists can be only appended
            if isinstance(b, list):
                # merge lists
                a.extend(b)
            else:
                # append to list
                a.append(b)
        elif isinstance(a, dict):
            # dicts must be merged
            if isinstance(b, dict):
                for key in b:
                    if key in a:
                        a[key] = data_merge(a[key], b[key])
                    else:
                        a[key] = b[key]
            else:
                raise YamlReaderError('Cannot merge non-dict "%s" into dict "%s"' % (b, a))
        else:
            raise YamlReaderError('NOT IMPLEMENTED "%s" into "%s"' % (b, a))
    except TypeError, e:
        raise YamlReaderError('TypeError "%s" in key "%s" when merging "%s" into "%s"' % (e, key, b, a))
    return a

我的用例是合并YAML文件,其中我只需要处理可能的数据类型的子集。因此我可以忽略元组和其他对象。对我来说,合理的合并逻辑意味着

取代标量 添加列表 通过添加缺失键和更新现有键来合并字典

其他任何事情和不可预见的事情都会导致错误。

这里有一个使用生成器的简单方法:

def mergedicts(dict1, dict2):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            if isinstance(dict1[k], dict) and isinstance(dict2[k], dict):
                yield (k, dict(mergedicts(dict1[k], dict2[k])))
            else:
                # If one of the values is not a dict, you can't continue merging it.
                # Value from second dict overrides one in first and we move on.
                yield (k, dict2[k])
                # Alternatively, replace this with exception raiser to alert you of value conflicts
        elif k in dict1:
            yield (k, dict1[k])
        else:
            yield (k, dict2[k])

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}

print dict(mergedicts(dict1,dict2))

这个打印:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

这个简单的递归过程将一个字典合并到另一个字典,同时覆盖冲突的键:

#!/usr/bin/env python2.7

def merge_dicts(dict1, dict2):
    """ Recursively merges dict2 into dict1 """
    if not isinstance(dict1, dict) or not isinstance(dict2, dict):
        return dict2
    for k in dict2:
        if k in dict1:
            dict1[k] = merge_dicts(dict1[k], dict2[k])
        else:
            dict1[k] = dict2[k]
    return dict1

print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {2:{"c":"C"}, 3:{"d":"D"}}))
print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {1:{"a":"A"}, 2:{"b":"C"}}))

输出:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}
{1: {'a': 'A'}, 2: {'b': 'C'}}

下面的函数将b合并为a。

def mergedicts(a, b):
    for key in b:
        if isinstance(a.get(key), dict) or isinstance(b.get(key), dict):
            mergedicts(a[key], b[key])
        else:
            a[key] = b[key]
    return a

这个版本的函数将处理N个字典,并且只处理字典——不能传递不恰当的参数,否则将引发TypeError。合并本身解释了键冲突,它不是覆盖来自合并链下的字典的数据,而是创建一组值并追加到该值;没有数据丢失。

它可能不是页面上最有效的,但它是最彻底的,当你合并2到N字典时,你不会丢失任何信息。

def merge_dicts(*dicts):
    if not reduce(lambda x, y: isinstance(y, dict) and x, dicts, True):
        raise TypeError, "Object in *dicts not of type dict"
    if len(dicts) < 2:
        raise ValueError, "Requires 2 or more dict objects"


    def merge(a, b):
        for d in set(a.keys()).union(b.keys()):
            if d in a and d in b:
                if type(a[d]) == type(b[d]):
                    if not isinstance(a[d], dict):
                        ret = list({a[d], b[d]})
                        if len(ret) == 1: ret = ret[0]
                        yield (d, sorted(ret))
                    else:
                        yield (d, dict(merge(a[d], b[d])))
                else:
                    raise TypeError, "Conflicting key:value type assignment"
            elif d in a:
                yield (d, a[d])
            elif d in b:
                yield (d, b[d])
            else:
                raise KeyError

    return reduce(lambda x, y: dict(merge(x, y)), dicts[1:], dicts[0])

print merge_dicts({1:1,2:{1:2}},{1:2,2:{3:1}},{4:4})

输出:{1:[1,2],2:{1:2,3:1},4:4}