假设你有一本这样的字典:
{'a': 1,
'c': {'a': 2,
'b': {'x': 5,
'y' : 10}},
'd': [1, 2, 3]}
你会如何把它平摊成这样:
{'a': 1,
'c_a': 2,
'c_b_x': 5,
'c_b_y': 10,
'd': [1, 2, 3]}
假设你有一本这样的字典:
{'a': 1,
'c': {'a': 2,
'b': {'x': 5,
'y' : 10}},
'd': [1, 2, 3]}
你会如何把它平摊成这样:
{'a': 1,
'c_a': 2,
'c_b_x': 5,
'c_b_y': 10,
'd': [1, 2, 3]}
当前回答
简单的函数来平嵌套字典。对于Python 3,用.items()替换.iteritems()
def flatten_dict(init_dict):
res_dict = {}
if type(init_dict) is not dict:
return res_dict
for k, v in init_dict.iteritems():
if type(v) == dict:
res_dict.update(flatten_dict(v))
else:
res_dict[k] = v
return res_dict
这个想法/要求是: 获取不保留父键的平面字典。
用法示例:
dd = {'a': 3,
'b': {'c': 4, 'd': 5},
'e': {'f':
{'g': 1, 'h': 2}
},
'i': 9,
}
flatten_dict(dd)
>> {'a': 3, 'c': 4, 'd': 5, 'g': 1, 'h': 2, 'i': 9}
保留父密钥也很简单。
其他回答
原始海报需要考虑两大因素:
Are there keyspace clobbering issues? For example, {'a_b':{'c':1}, 'a':{'b_c':2}} would result in {'a_b_c':???}. The below solution evades the problem by returning an iterable of pairs. If performance is an issue, does the key-reducer function (which I hereby refer to as 'join') require access to the entire key-path, or can it just do O(1) work at every node in the tree? If you want to be able to say joinedKey = '_'.join(*keys), that will cost you O(N^2) running time. However if you're willing to say nextKey = previousKey+'_'+thisKey, that gets you O(N) time. The solution below lets you do both (since you could merely concatenate all the keys, then postprocess them).
(Performance is not likely an issue, but I'll elaborate on the second point in case anyone else cares: In implementing this, there are numerous dangerous choices. If you do this recursively and yield and re-yield, or anything equivalent which touches nodes more than once (which is quite easy to accidentally do), you are doing potentially O(N^2) work rather than O(N). This is because maybe you are calculating a key a then a_1 then a_1_i..., and then calculating a then a_1 then a_1_ii..., but really you shouldn't have to calculate a_1 again. Even if you aren't recalculating it, re-yielding it (a 'level-by-level' approach) is just as bad. A good example is to think about the performance on {1:{1:{1:{1:...(N times)...{1:SOME_LARGE_DICTIONARY_OF_SIZE_N}...}}}})
下面是一个函数,我写的扁平化dict (d, join=…, lift=…),可以适应许多用途,可以做你想做的事情。遗憾的是,要在不产生上述性能损失的情况下创建这个函数的惰性版本是相当困难的(许多python内置程序,如chain.from_iterable实际上并不是有效的,这是我在确定这个代码之前对三个不同版本的代码进行了大量测试后才意识到的)。
from collections import Mapping
from itertools import chain
from operator import add
_FLAG_FIRST = object()
def flattenDict(d, join=add, lift=lambda x:(x,)):
results = []
def visit(subdict, results, partialKey):
for k,v in subdict.items():
newKey = lift(k) if partialKey==_FLAG_FIRST else join(partialKey,lift(k))
if isinstance(v,Mapping):
visit(v, results, newKey)
else:
results.append((newKey,v))
visit(d, results, _FLAG_FIRST)
return results
为了更好地理解这是怎么回事,下面是一个图表,供那些不熟悉reduce(左),也称为“左折叠”的人使用。有时用一个初始值来代替k0(不是列表的一部分,传递给函数)。这里,J是我们的连接函数。我们用升力(k)对每kn进行预处理。
[k0,k1,...,kN].foldleft(J)
/ \
... kN
/
J(k0,J(k1,J(k2,k3)))
/ \
/ \
J(J(k0,k1),k2) k3
/ \
/ \
J(k0,k1) k2
/ \
/ \
k0 k1
这实际上与functools相同。Reduce,但是我们的函数对树的所有键路径都做了这个。
>>> reduce(lambda a,b:(a,b), range(5))
((((0, 1), 2), 3), 4)
演示(否则我将放在docstring中):
>>> testData = {
'a':1,
'b':2,
'c':{
'aa':11,
'bb':22,
'cc':{
'aaa':111
}
}
}
from pprint import pprint as pp
>>> pp(dict( flattenDict(testData) ))
{('a',): 1,
('b',): 2,
('c', 'aa'): 11,
('c', 'bb'): 22,
('c', 'cc', 'aaa'): 111}
>>> pp(dict( flattenDict(testData, join=lambda a,b:a+'_'+b, lift=lambda x:x) ))
{'a': 1, 'b': 2, 'c_aa': 11, 'c_bb': 22, 'c_cc_aaa': 111}
>>> pp(dict( (v,k) for k,v in flattenDict(testData, lift=hash, join=lambda a,b:hash((a,b))) ))
{1: 12416037344,
2: 12544037731,
11: 5470935132935744593,
22: 4885734186131977315,
111: 3461911260025554326}
性能:
from functools import reduce
def makeEvilDict(n):
return reduce(lambda acc,x:{x:acc}, [{i:0 for i in range(n)}]+range(n))
import timeit
def time(runnable):
t0 = timeit.default_timer()
_ = runnable()
t1 = timeit.default_timer()
print('took {:.2f} seconds'.format(t1-t0))
>>> pp(makeEvilDict(8))
{7: {6: {5: {4: {3: {2: {1: {0: {0: 0,
1: 0,
2: 0,
3: 0,
4: 0,
5: 0,
6: 0,
7: 0}}}}}}}}}
import sys
sys.setrecursionlimit(1000000)
forget = lambda a,b:''
>>> time(lambda: dict(flattenDict(makeEvilDict(10000), join=forget)) )
took 0.10 seconds
>>> time(lambda: dict(flattenDict(makeEvilDict(100000), join=forget)) )
[1] 12569 segmentation fault python
... 唉,别以为那是我的错……
[由于中庸问题,不重要的历史注释]
关于所谓的重复扁平化字典的字典(2层深)列表
That question's solution can be implemented in terms of this one by doing sorted( sum(flatten(...),[]) ). The reverse is not possible: while it is true that the values of flatten(...) can be recovered from the alleged duplicate by mapping a higher-order accumulator, one cannot recover the keys. (edit: Also it turns out that the alleged duplicate owner's question is completely different, in that it only deals with dictionaries exactly 2-level deep, though one of the answers on that page gives a general solution.)
代码:
test = {'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3]}
def parse_dict(init, lkey=''):
ret = {}
for rkey,val in init.items():
key = lkey+rkey
if isinstance(val, dict):
ret.update(parse_dict(val, key+'_'))
else:
ret[key] = val
return ret
print(parse_dict(test,''))
结果:
$ python test.py
{'a': 1, 'c_a': 2, 'c_b_x': 5, 'd': [1, 2, 3], 'c_b_y': 10}
我使用python3.2,更新为您的python版本。
在Python3.5中提供功能和性能的解决方案如何?
from functools import reduce
def _reducer(items, key, val, pref):
if isinstance(val, dict):
return {**items, **flatten(val, pref + key)}
else:
return {**items, pref + key: val}
def flatten(d, pref=''):
return(reduce(
lambda new_d, kv: _reducer(new_d, *kv, pref),
d.items(),
{}
))
这是更有表现力的:
def flatten(d, pref=''):
return(reduce(
lambda new_d, kv: \
isinstance(kv[1], dict) and \
{**new_d, **flatten(kv[1], pref + kv[0])} or \
{**new_d, pref + kv[0]: kv[1]},
d.items(),
{}
))
在使用:
my_obj = {'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y': 10}}, 'd': [1, 2, 3]}
print(flatten(my_obj))
# {'d': [1, 2, 3], 'cby': 10, 'cbx': 5, 'ca': 2, 'a': 1}
Davoud的解决方案非常好,但当嵌套的字典也包含字典列表时,并不能给出令人满意的结果,但他的代码可以适应这种情况:
def flatten_dict(d):
items = []
for k, v in d.items():
try:
if (type(v)==type([])):
for l in v: items.extend(flatten_dict(l).items())
else:
items.extend(flatten_dict(v).items())
except AttributeError:
items.append((k, v))
return dict(items)
def flatten(unflattened_dict, separator='_'):
flattened_dict = {}
for k, v in unflattened_dict.items():
if isinstance(v, dict):
sub_flattened_dict = flatten(v, separator)
for k2, v2 in sub_flattened_dict.items():
flattened_dict[k + separator + k2] = v2
else:
flattened_dict[k] = v
return flattened_dict