I need to merge multiple dictionaries. Here is what I have, for example:
dict1 = {1:{"a":{A}}, 2:{"b":{B}}}
dict2 = {2:{"c":{C}}, 3:{"d":{D}}}
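A, B, C and D are the leaves of the tree, for example {"info1":"value", "info2":"value2"}
The number of levels (the depth) of the dictionaries is unknown; it could be {2:{"c":{"z":{"y":{C}}}}}
In my case it represents a directory/file structure, with nodes being docs and leaves being files.
I want to merge them to obtain: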
dict3 = {1:{"a":{A}}, 2:{"b":{B},"c":{C}}, 3:{"d":{D}}}
I am not sure how I could do that easily with Python.
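In case someone wants yet another approach to this problem, here is my solution.
Virtues: short, declarative, and functional in style (recursive, no mutation).
Potential drawback: this might not be the merge you are looking for. Consult the docstring for the semantics.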
def deep_merge(a, b):
    """
    Merge two values, with `b` taking precedence over `a`.

    Semantics:
    - If either `a` or `b` is not a dictionary, `a` will be returned only if
      `b` is `None`. Otherwise `b` will be returned.
    - If both values are dictionaries, they are merged as follows:
        * Each key that is found only in `a` or only in `b` will be included in
          the output collection with its value intact.
        * For any key in common between `a` and `b`, the corresponding values
          will be merged with the same semantics.
    """
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if b is None else b
    else:
        # If we're here, both a and b must be dictionaries or subtypes thereof.
        # Compute set of all keys in both dictionaries.
        keys = set(a.keys()) | set(b.keys())
        # Build output dictionary, merging recursively values with common keys,
        # where `None` is used to mean the absence of a value.
        return {
            key: deep_merge(a.get(key), b.get(key))
            for key in keys
        }
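To make the `None` semantics from the docstring concrete, here is a small sketch that applies a condensed copy of `deep_merge` to dictionaries shaped like the ones in the question. The leaf contents are placeholders made up for illustration, not data from the original post.

```python
def deep_merge(a, b):
    """Condensed copy of the function above, repeated so this snippet runs on its own."""
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if b is None else b
    return {key: deep_merge(a.get(key), b.get(key)) for key in set(a) | set(b)}


# Placeholder leaves standing in for A, B, C and D from the question.
dict1 = {1: {"a": {"info1": "A"}}, 2: {"b": {"info1": "B"}}}
dict2 = {2: {"c": {"info1": "C"}}, 3: {"d": {"info1": "D"}}}

# Key 2 exists on both sides, so its sub-dictionaries are merged recursively.
merged = deep_merge(dict1, dict2)

# Caveat: an explicit None in `b` is treated as "absent", so it cannot overwrite `a`.
kept = deep_merge({"x": 1}, {"x": None})
```

If you need `None` to act as a real value that overrides `a`, a dedicated sentinel object for "missing" would probably be the safer choice.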
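One issue with this question is that the values of the dictionary can be arbitrarily complex pieces of data. Based on these and other answers, I came up with the following code: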
class YamlReaderError(Exception):
    pass


def data_merge(a, b):
    """Merge b into a and return the merged result.

    NOTE: tuples and arbitrary objects are not handled, as it is totally
    ambiguous what should happen.
    """
    key = None
    # ## debug output
    # sys.stderr.write("DEBUG: %s to %s\n" % (b, a))
    try:
        # Python 3: `str` covers the old `unicode` and `int` covers the old `long`.
        if a is None or isinstance(a, (str, int, float)):
            # border case for first run or if a is a primitive
            a = b
        elif isinstance(a, list):
            # lists can be only appended
            if isinstance(b, list):
                # merge lists
                a.extend(b)
            else:
                # append to list
                a.append(b)
        elif isinstance(a, dict):
            # dicts must be merged
            if isinstance(b, dict):
                for key in b:
                    if key in a:
                        a[key] = data_merge(a[key], b[key])
                    else:
                        a[key] = b[key]
            else:
                raise YamlReaderError('Cannot merge non-dict "%s" into dict "%s"' % (b, a))
        else:
            raise YamlReaderError('NOT IMPLEMENTED "%s" into "%s"' % (b, a))
    except TypeError as e:
        raise YamlReaderError('TypeError "%s" in key "%s" when merging "%s" into "%s"' % (e, key, b, a))
    return a
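My use case is merging YAML files, where I only have to deal with a subset of the possible data types. Hence I can ignore tuples and other objects. For me, a sensible merge logic means:
- replace scalars
- append to lists
- merge dicts by adding missing keys and updating existing keys
Everything else, and anything unforeseen, results in an error.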
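The three rules above can be sketched in a form that leaves both inputs untouched, in case the in-place behaviour of `data_merge` is not wanted. `merge_values` and the sample data are hypothetical names introduced only for this illustration; this is one possible reading of the rules, not the author's code.

```python
import copy


def merge_values(a, b):
    """Hypothetical non-mutating take on the three rules listed above."""
    if isinstance(a, dict):
        if not isinstance(b, dict):
            raise TypeError("cannot merge non-dict %r into dict %r" % (b, a))
        merged = copy.deepcopy(a)
        for key, value in b.items():
            if key in a:
                # Existing key: update it by merging recursively.
                merged[key] = merge_values(a[key], value)
            else:
                # Missing key: add it.
                merged[key] = copy.deepcopy(value)
        return merged
    if isinstance(a, list):
        # Lists only grow: extend with another list, or append a single value.
        extra = b if isinstance(b, list) else [b]
        return copy.deepcopy(a) + copy.deepcopy(extra)
    if a is None or isinstance(a, (str, int, float)):
        # Scalars are replaced.
        return copy.deepcopy(b)
    raise TypeError("unsupported type %r" % type(a).__name__)


base = {"name": "demo", "tags": ["x"], "db": {"host": "localhost", "port": 5432}}
override = {"name": "prod", "tags": ["y"], "db": {"port": 6543, "user": "app"}}
result = merge_values(base, override)
```

Here `name` is replaced, `tags` grows to two entries, and `db` keeps `host` while gaining `user` and the new `port`. The deep copies cost extra time on large documents, which is presumably one reason to prefer the in-place version.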
How about another answer?!? This one also avoids mutation/side effects:
def merge(dict1, dict2):
    output = {}
    # Despite its name, this holds the union of both key sets;
    # values from `dict1` win where a key exists in both.
    intersection = {**dict2, **dict1}
    for k_intersect, v_intersect in intersection.items():
        if k_intersect not in dict1:
            v_dict2 = dict2[k_intersect]
            output[k_intersect] = v_dict2
        elif k_intersect not in dict2:
            output[k_intersect] = v_intersect
        elif isinstance(v_intersect, dict) and isinstance(dict2[k_intersect], dict):
            # Recurse only when both sides are dicts; otherwise unpacking would fail.
            v_dict2 = dict2[k_intersect]
            output[k_intersect] = merge(v_intersect, v_dict2)
        else:
            # Conflict on a non-dict value: keep the value from `dict1`.
            output[k_intersect] = v_intersect
    return output
dict1 = {1:{"a":{"A"}}, 2:{"b":{"B"}}}
dict2 = {2:{"c":{"C"}}, 3:{"d":{"D"}}}
dict3 = {1:{"a":{"A"}}, 2:{"b":{"B"},"c":{"C"}}, 3:{"d":{"D"}}}
assert dict3 == merge(dict1, dict2)
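Since the question asks about merging multiple dictionaries, a two-argument merge can be folded over any number of inputs with `functools.reduce`. The sketch below uses its own small helper so it runs on its own; `merge_two` and `merge_all` are hypothetical names, and unlike `merge` above (where `dict1` takes precedence) the right-hand dictionary wins on conflicts here.

```python
from functools import reduce


def merge_two(left, right):
    """Hypothetical helper: recursive merge where `right` wins on conflicts."""
    result = dict(left)  # shallow copy, so `left` itself is never modified
    for key, value in right.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = merge_two(result[key], value)
        else:
            result[key] = value
    return result


def merge_all(*dicts):
    """Fold any number of dictionaries from left to right."""
    return reduce(merge_two, dicts, {})


trees = [
    {1: {"a": "A"}},
    {2: {"b": "B"}},
    {2: {"c": "C"}, 3: {"d": "D"}},
]
combined = merge_all(*trees)
```

Subtrees that appear in only one input are shared by reference with the result rather than copied, so mutating the result later may affect the inputs.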
Here is an easy way to do it using generators:
def mergedicts(dict1, dict2):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            if isinstance(dict1[k], dict) and isinstance(dict2[k], dict):
                yield (k, dict(mergedicts(dict1[k], dict2[k])))
            else:
                # If one of the values is not a dict, you can't continue merging it.
                # Value from second dict overrides one in first and we move on.
                yield (k, dict2[k])
                # Alternatively, replace this with exception raiser to alert you of value conflicts
        elif k in dict1:
            yield (k, dict1[k])
        else:
            yield (k, dict2[k])
dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}
print(dict(mergedicts(dict1, dict2)))
This prints (key order may vary):
{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}
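The comment in the generator suggests raising an exception on value conflicts instead of letting the second dictionary win. A minimal sketch of that variant follows; `MergeConflict` and `mergedicts_strict` are hypothetical names, and treating equal values as "no conflict" is an assumption of mine rather than something stated in the answer.

```python
class MergeConflict(ValueError):
    """Hypothetical exception raised when two non-dict values collide."""


def mergedicts_strict(dict1, dict2):
    for k in set(dict1) | set(dict2):
        if k in dict1 and k in dict2:
            if isinstance(dict1[k], dict) and isinstance(dict2[k], dict):
                yield (k, dict(mergedicts_strict(dict1[k], dict2[k])))
            elif dict1[k] == dict2[k]:
                # Assumption: identical values are not treated as a conflict.
                yield (k, dict1[k])
            else:
                raise MergeConflict(
                    "conflicting values for key %r: %r vs %r" % (k, dict1[k], dict2[k])
                )
        elif k in dict1:
            yield (k, dict1[k])
        else:
            yield (k, dict2[k])


merged = dict(mergedicts_strict({1: {"a": "A"}}, {1: {"b": "B"}, 2: "x"}))

try:
    dict(mergedicts_strict({1: {"a": "A"}}, {1: {"a": "Z"}}))
    conflict_detected = False
except MergeConflict:
    conflict_detected = True
```

Because generators are lazy, the exception is raised only once the result is consumed, for example by `dict(...)`, not when `mergedicts_strict` is called.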