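I need to merge multiple dictionaries. Here's what I have, for example:
    dict1 = {1:{"a":{A}}, 2:{"b":{B}}}
    dict2 = {2:{"c":{C}}, 3:{"d":{D}}}
A, B, C and D are the leaves of the tree, like {"info1":"value", "info2":"value2"}.
The depth of the dictionaries is unknown; it could be {2:{"c":{"z":{"y":{C}}}}}.
In my case it represents a directory/file structure, with nodes being docs and leaves being files.
I want to merge them to obtain:
    dict3 = {1:{"a":{A}}, 2:{"b":{B},"c":{C}}, 3:{"d":{D}}}
I'm not sure how I could do that easily with Python.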
If your dictionaries have an unknown depth, I would suggest a recursive function:
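    def combineDicts(dictionary1, dictionary2):
        output = {}
        for item, value in dictionary1.items():
            if item in dictionary2:
                if isinstance(value, dict) and isinstance(dictionary2[item], dict):
                    # both values are dicts: merge them recursively
                    output[item] = combineDicts(value, dictionary2[item])
                else:
                    # non-dict conflict: the value from dictionary2 wins
                    output[item] = dictionary2[item]
            else:
                output[item] = value
        # add the keys that only exist in dictionary2
        for item, value in dictionary2.items():
            if item not in output:
                output[item] = value
        return output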
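Since the question asks about merging several dictionaries, here is a minimal sketch of folding the function over a list with `functools.reduce`. It includes a Python 3 rendition of `combineDicts` so the snippet runs on its own. The string leaves, the third input `dict4`, and the names `dicts` and `merged` are placeholders for illustration and are not part of the original post.

```python
from functools import reduce

def combineDicts(dictionary1, dictionary2):
    # Python 3 rendition: returns a new dict, dictionary2 wins on non-dict conflicts
    output = {}
    for item, value in dictionary1.items():
        if item in dictionary2:
            if isinstance(value, dict) and isinstance(dictionary2[item], dict):
                output[item] = combineDicts(value, dictionary2[item])
            else:
                output[item] = dictionary2[item]
        else:
            output[item] = value
    for item, value in dictionary2.items():
        if item not in output:
            output[item] = value
    return output

# placeholder string leaves standing in for A, B, C and D from the question
dict1 = {1: {"a": "A"}, 2: {"b": "B"}}
dict2 = {2: {"c": "C"}, 3: {"d": "D"}}
dict4 = {3: {"e": "E"}}  # hypothetical third input

dicts = [dict1, dict2, dict4]
# fold the two-argument merge over the list, starting from an empty dict
merged = reduce(combineDicts, dicts, {})
print(merged)  # {1: {'a': 'A'}, 2: {'b': 'B', 'c': 'C'}, 3: {'d': 'D', 'e': 'E'}}
```

Later dictionaries in the list take precedence when two non-dict values collide, so the order of `dicts` matters.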
This returns a merged dictionary without affecting the input dictionaries.
    from copy import deepcopy
    from typing import Dict, List, Optional

    def _merge_dicts(dictA: Dict, dictB: Dict) -> Dict:
        # merge into a deep copy of `dictA` so that neither input is modified
        return _merge_dicts_aux(dictA, dictB, deepcopy(dictA))

    def _merge_dicts_aux(dictA: Dict, dictB: Dict, result: Dict,
                         path: Optional[List[str]] = None) -> Dict:
        # keys traversed so far, used to report where a conflict occurred
        if path is None:
            path = []
        for key in dictB:
            # if the key doesn't exist in A, add a copy of the B element to the result
            if key not in dictA:
                result[key] = deepcopy(dictB[key])
            else:
                # if the key value is a dict, both in A and in B, merge the dicts
                if isinstance(dictA[key], dict) and isinstance(dictB[key], dict):
                    _merge_dicts_aux(dictA[key], dictB[key], result[key], path + [str(key)])
                # if the key value is the same in A and in B, ignore
                elif dictA[key] == dictB[key]:
                    pass
                # if the key value differs in A and in B, raise error
                else:
                    err: str = f"Conflict at {'.'.join(path + [str(key)])}"
                    raise Exception(err)
        return result
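Whether the inputs really stay untouched depends on how the working copy of `dictA` is made. A shallow copy duplicates only the top-level dict, so nested dicts are still shared with the input and a recursive merge would write into them. Below is a small sketch of the difference using only the standard `copy` module; the variable names are illustrative.

```python
from copy import copy, deepcopy

original = {"a": {"x": 1}}

shallow = copy(original)      # new outer dict, but the inner dict is shared
deep = deepcopy(original)     # inner dicts are copied as well

shallow["a"]["y"] = 2         # also visible through `original`
deep["a"]["z"] = 3            # not visible through `original`

print(original)  # {'a': {'x': 1, 'y': 2}}
print(deep)      # {'a': {'x': 1, 'z': 3}}
```

For a merge that recurses into nested dicts, a deep copy (or building fresh dicts at every level) is what keeps the caller's data intact.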
Inspired by @andrew cooke's solution.
Here is another, slightly different solution:
    def deepMerge(d1, d2, inconflict=lambda v1, v2: v2):
        ''' merge d2 into d1. using inconflict function to resolve the leaf conflicts '''
        for k in d2:
            if k in d1:
                if isinstance(d1[k], dict) and isinstance(d2[k], dict):
                    deepMerge(d1[k], d2[k], inconflict)
                elif d1[k] != d2[k]:
                    d1[k] = inconflict(d1[k], d2[k])
            else:
                d1[k] = d2[k]
        return d1
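By default it resolves conflicts in favor of the values from the second dict, but you can easily override this. With some witchery you can even throw exceptions out of it. :)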
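As a rough illustration of overriding the conflict handler, the sketch below passes different `inconflict` callables. A `lambda` cannot contain a `raise` statement, so the exception case uses an ordinary function; `raise_on_conflict` is a hypothetical name chosen for this example, and `deepMerge` is repeated only so the snippet runs on its own.

```python
def deepMerge(d1, d2, inconflict=lambda v1, v2: v2):
    """Copy of the function above, repeated so the snippet is self-contained."""
    for k in d2:
        if k in d1:
            if isinstance(d1[k], dict) and isinstance(d2[k], dict):
                deepMerge(d1[k], d2[k], inconflict)
            elif d1[k] != d2[k]:
                d1[k] = inconflict(d1[k], d2[k])
        else:
            d1[k] = d2[k]
    return d1

def raise_on_conflict(v1, v2):
    # hypothetical resolver: refuse to choose between two different leaves
    raise ValueError(f"Conflicting values: {v1!r} vs {v2!r}")

# default: the value from the second dict wins
print(deepMerge({"a": {"x": 1}}, {"a": {"x": 2, "y": 3}}))  # {'a': {'x': 2, 'y': 3}}

# keep the value from the first dict instead
print(deepMerge({"a": {"x": 1}}, {"a": {"x": 2}}, lambda v1, v2: v1))  # {'a': {'x': 1}}

# treat any leaf conflict as an error
try:
    deepMerge({"a": {"x": 1}}, {"a": {"x": 2}}, raise_on_conflict)
except ValueError as exc:
    print(exc)  # Conflicting values: 1 vs 2
```

Note that `deepMerge` modifies `d1` in place, so if the resolver raises partway through, `d1` may already contain some keys from `d2`.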
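One problem with this question is that the values of the dict can be arbitrarily complex pieces of data. Based on these and other answers, I came up with this code: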
    class YamlReaderError(Exception):
        pass

    def data_merge(a, b):
        """merges b into a and return merged result
        NOTE: tuples and arbitrary objects are not handled as it is totally ambiguous what should happen"""
        key = None
        # ## debug output
        # sys.stderr.write("DEBUG: %s to %s\n" %(b,a))
        try:
            if a is None or isinstance(a, (str, int, float)):
                # border case for first run or if a is a primitive
                a = b
            elif isinstance(a, list):
                # lists can be only appended
                if isinstance(b, list):
                    # merge lists
                    a.extend(b)
                else:
                    # append to list
                    a.append(b)
            elif isinstance(a, dict):
                # dicts must be merged
                if isinstance(b, dict):
                    for key in b:
                        if key in a:
                            a[key] = data_merge(a[key], b[key])
                        else:
                            a[key] = b[key]
                else:
                    raise YamlReaderError('Cannot merge non-dict "%s" into dict "%s"' % (b, a))
            else:
                raise YamlReaderError('NOT IMPLEMENTED "%s" into "%s"' % (b, a))
        except TypeError as e:
            raise YamlReaderError('TypeError "%s" in key "%s" when merging "%s" into "%s"' % (e, key, b, a))
        return a
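My use case is merging YAML files, where I only have to deal with a subset of the possible data types, so I can ignore tuples and other objects. For me, a sensible merge logic means:
- replace scalars
- append to lists
- merge dicts by adding missing keys and updating existing keys
Everything else, including anything unforeseen, results in an error.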
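The rules above can be sketched compactly as follows. This is a hypothetical condensed variant written for illustration, not the answer's own code: `merge_sketch` is an invented name, it returns new containers instead of modifying `a` in place, and it raises a plain `TypeError` rather than `YamlReaderError`. The sample data is made up.

```python
def merge_sketch(a, b):
    """Hypothetical condensed version of the merge rules listed above."""
    if a is None or isinstance(a, (str, int, float)):
        # scalars (and a missing value) are replaced
        return b
    if isinstance(a, list):
        # lists are appended to: extend with a list, append anything else
        return a + (b if isinstance(b, list) else [b])
    if isinstance(a, dict) and isinstance(b, dict):
        # dicts are merged: update existing keys, add missing ones
        merged = dict(a)
        for key, value in b.items():
            merged[key] = merge_sketch(a[key], value) if key in a else value
        return merged
    # everything else is treated as an error
    raise TypeError(f"Cannot merge {b!r} into {a!r}")

base = {"name": "old", "tags": ["x"], "opts": {"debug": False}}
override = {"name": "new", "tags": ["y"], "opts": {"verbose": True}}

result = merge_sketch(base, override)
print(result)
# {'name': 'new', 'tags': ['x', 'y'], 'opts': {'debug': False, 'verbose': True}}
```

Merging a non-dict into a dict, or merging into a tuple or other unsupported object, falls through to the final `raise`, which mirrors the "everything else is an error" rule.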