假设你有一本这样的字典:
{'a': 1,
'c': {'a': 2,
'b': {'x': 5,
'y' : 10}},
'd': [1, 2, 3]}
你会如何把它平摊成这样:
{'a': 1,
'c_a': 2,
'c_b_x': 5,
'c_b_y': 10,
'd': [1, 2, 3]}
假设你有一本这样的字典:
{'a': 1,
'c': {'a': 2,
'b': {'x': 5,
'y' : 10}},
'd': [1, 2, 3]}
你会如何把它平摊成这样:
{'a': 1,
'c_a': 2,
'c_b_x': 5,
'c_b_y': 10,
'd': [1, 2, 3]}
当前回答
这里有一个使用堆栈的解决方案。没有递归。
def flatten_nested_dict(nested):
stack = list(nested.items())
ans = {}
while stack:
key, val = stack.pop()
if isinstance(val, dict):
for sub_key, sub_val in val.items():
stack.append((f"{key}_{sub_key}", sub_val))
else:
ans[key] = val
return ans
其他回答
在Python3.5中提供功能和性能的解决方案如何?
from functools import reduce
def _reducer(items, key, val, pref):
if isinstance(val, dict):
return {**items, **flatten(val, pref + key)}
else:
return {**items, pref + key: val}
def flatten(d, pref=''):
return(reduce(
lambda new_d, kv: _reducer(new_d, *kv, pref),
d.items(),
{}
))
这是更有表现力的:
def flatten(d, pref=''):
return(reduce(
lambda new_d, kv: \
isinstance(kv[1], dict) and \
{**new_d, **flatten(kv[1], pref + kv[0])} or \
{**new_d, pref + kv[0]: kv[1]},
d.items(),
{}
))
在使用:
my_obj = {'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y': 10}}, 'd': [1, 2, 3]}
print(flatten(my_obj))
# {'d': [1, 2, 3], 'cby': 10, 'cbx': 5, 'ca': 2, 'a': 1}
原始海报需要考虑两大因素:
Are there keyspace clobbering issues? For example, {'a_b':{'c':1}, 'a':{'b_c':2}} would result in {'a_b_c':???}. The below solution evades the problem by returning an iterable of pairs. If performance is an issue, does the key-reducer function (which I hereby refer to as 'join') require access to the entire key-path, or can it just do O(1) work at every node in the tree? If you want to be able to say joinedKey = '_'.join(*keys), that will cost you O(N^2) running time. However if you're willing to say nextKey = previousKey+'_'+thisKey, that gets you O(N) time. The solution below lets you do both (since you could merely concatenate all the keys, then postprocess them).
(Performance is not likely an issue, but I'll elaborate on the second point in case anyone else cares: In implementing this, there are numerous dangerous choices. If you do this recursively and yield and re-yield, or anything equivalent which touches nodes more than once (which is quite easy to accidentally do), you are doing potentially O(N^2) work rather than O(N). This is because maybe you are calculating a key a then a_1 then a_1_i..., and then calculating a then a_1 then a_1_ii..., but really you shouldn't have to calculate a_1 again. Even if you aren't recalculating it, re-yielding it (a 'level-by-level' approach) is just as bad. A good example is to think about the performance on {1:{1:{1:{1:...(N times)...{1:SOME_LARGE_DICTIONARY_OF_SIZE_N}...}}}})
下面是一个函数,我写的扁平化dict (d, join=…, lift=…),可以适应许多用途,可以做你想做的事情。遗憾的是,要在不产生上述性能损失的情况下创建这个函数的惰性版本是相当困难的(许多python内置程序,如chain.from_iterable实际上并不是有效的,这是我在确定这个代码之前对三个不同版本的代码进行了大量测试后才意识到的)。
from collections import Mapping
from itertools import chain
from operator import add
_FLAG_FIRST = object()
def flattenDict(d, join=add, lift=lambda x:(x,)):
results = []
def visit(subdict, results, partialKey):
for k,v in subdict.items():
newKey = lift(k) if partialKey==_FLAG_FIRST else join(partialKey,lift(k))
if isinstance(v,Mapping):
visit(v, results, newKey)
else:
results.append((newKey,v))
visit(d, results, _FLAG_FIRST)
return results
为了更好地理解这是怎么回事,下面是一个图表,供那些不熟悉reduce(左),也称为“左折叠”的人使用。有时用一个初始值来代替k0(不是列表的一部分,传递给函数)。这里,J是我们的连接函数。我们用升力(k)对每kn进行预处理。
[k0,k1,...,kN].foldleft(J)
/ \
... kN
/
J(k0,J(k1,J(k2,k3)))
/ \
/ \
J(J(k0,k1),k2) k3
/ \
/ \
J(k0,k1) k2
/ \
/ \
k0 k1
这实际上与functools相同。Reduce,但是我们的函数对树的所有键路径都做了这个。
>>> reduce(lambda a,b:(a,b), range(5))
((((0, 1), 2), 3), 4)
演示(否则我将放在docstring中):
>>> testData = {
'a':1,
'b':2,
'c':{
'aa':11,
'bb':22,
'cc':{
'aaa':111
}
}
}
from pprint import pprint as pp
>>> pp(dict( flattenDict(testData) ))
{('a',): 1,
('b',): 2,
('c', 'aa'): 11,
('c', 'bb'): 22,
('c', 'cc', 'aaa'): 111}
>>> pp(dict( flattenDict(testData, join=lambda a,b:a+'_'+b, lift=lambda x:x) ))
{'a': 1, 'b': 2, 'c_aa': 11, 'c_bb': 22, 'c_cc_aaa': 111}
>>> pp(dict( (v,k) for k,v in flattenDict(testData, lift=hash, join=lambda a,b:hash((a,b))) ))
{1: 12416037344,
2: 12544037731,
11: 5470935132935744593,
22: 4885734186131977315,
111: 3461911260025554326}
性能:
from functools import reduce
def makeEvilDict(n):
return reduce(lambda acc,x:{x:acc}, [{i:0 for i in range(n)}]+range(n))
import timeit
def time(runnable):
t0 = timeit.default_timer()
_ = runnable()
t1 = timeit.default_timer()
print('took {:.2f} seconds'.format(t1-t0))
>>> pp(makeEvilDict(8))
{7: {6: {5: {4: {3: {2: {1: {0: {0: 0,
1: 0,
2: 0,
3: 0,
4: 0,
5: 0,
6: 0,
7: 0}}}}}}}}}
import sys
sys.setrecursionlimit(1000000)
forget = lambda a,b:''
>>> time(lambda: dict(flattenDict(makeEvilDict(10000), join=forget)) )
took 0.10 seconds
>>> time(lambda: dict(flattenDict(makeEvilDict(100000), join=forget)) )
[1] 12569 segmentation fault python
... 唉,别以为那是我的错……
[由于中庸问题,不重要的历史注释]
关于所谓的重复扁平化字典的字典(2层深)列表
That question's solution can be implemented in terms of this one by doing sorted( sum(flatten(...),[]) ). The reverse is not possible: while it is true that the values of flatten(...) can be recovered from the alleged duplicate by mapping a higher-order accumulator, one cannot recover the keys. (edit: Also it turns out that the alleged duplicate owner's question is completely different, in that it only deals with dictionaries exactly 2-level deep, though one of the answers on that page gives a general solution.)
这里有一个优雅的、就地替换的算法。使用Python 2.7和Python 3.5进行测试。使用点字符作为分隔符。
def flatten_json(json):
if type(json) == dict:
for k, v in list(json.items()):
if type(v) == dict:
flatten_json(v)
json.pop(k)
for k2, v2 in v.items():
json[k+"."+k2] = v2
例子:
d = {'a': {'b': 'c'}}
flatten_json(d)
print(d)
unflatten_json(d)
print(d)
输出:
{'a.b': 'c'}
{'a': {'b': 'c'}}
我在这里发布了这段代码以及匹配的unflat_json函数。
Davoud的解决方案非常好,但当嵌套的字典也包含字典列表时,并不能给出令人满意的结果,但他的代码可以适应这种情况:
def flatten_dict(d):
items = []
for k, v in d.items():
try:
if (type(v)==type([])):
for l in v: items.extend(flatten_dict(l).items())
else:
items.extend(flatten_dict(v).items())
except AttributeError:
items.append((k, v))
return dict(items)
这里有一个使用堆栈的解决方案。没有递归。
def flatten_nested_dict(nested):
stack = list(nested.items())
ans = {}
while stack:
key, val = stack.pop()
if isinstance(val, dict):
for sub_key, sub_val in val.items():
stack.append((f"{key}_{sub_key}", sub_val))
else:
ans[key] = val
return ans