假设你有一本这样的字典:
{'a': 1,
'c': {'a': 2,
'b': {'x': 5,
'y' : 10}},
'd': [1, 2, 3]}
你会如何把它平摊成这样:
{'a': 1,
'c_a': 2,
'c_b_x': 5,
'c_b_y': 10,
'd': [1, 2, 3]}
假设你有一本这样的字典:
{'a': 1,
'c': {'a': 2,
'b': {'x': 5,
'y' : 10}},
'd': [1, 2, 3]}
你会如何把它平摊成这样:
{'a': 1,
'c_a': 2,
'c_b_x': 5,
'c_b_y': 10,
'd': [1, 2, 3]}
当前回答
def flatten(unflattened_dict, separator='_'):
flattened_dict = {}
for k, v in unflattened_dict.items():
if isinstance(v, dict):
sub_flattened_dict = flatten(v, separator)
for k2, v2 in sub_flattened_dict.items():
flattened_dict[k + separator + k2] = v2
else:
flattened_dict[k] = v
return flattened_dict
其他回答
原始海报需要考虑两大因素:
Are there keyspace clobbering issues? For example, {'a_b':{'c':1}, 'a':{'b_c':2}} would result in {'a_b_c':???}. The below solution evades the problem by returning an iterable of pairs. If performance is an issue, does the key-reducer function (which I hereby refer to as 'join') require access to the entire key-path, or can it just do O(1) work at every node in the tree? If you want to be able to say joinedKey = '_'.join(*keys), that will cost you O(N^2) running time. However if you're willing to say nextKey = previousKey+'_'+thisKey, that gets you O(N) time. The solution below lets you do both (since you could merely concatenate all the keys, then postprocess them).
(Performance is not likely an issue, but I'll elaborate on the second point in case anyone else cares: In implementing this, there are numerous dangerous choices. If you do this recursively and yield and re-yield, or anything equivalent which touches nodes more than once (which is quite easy to accidentally do), you are doing potentially O(N^2) work rather than O(N). This is because maybe you are calculating a key a then a_1 then a_1_i..., and then calculating a then a_1 then a_1_ii..., but really you shouldn't have to calculate a_1 again. Even if you aren't recalculating it, re-yielding it (a 'level-by-level' approach) is just as bad. A good example is to think about the performance on {1:{1:{1:{1:...(N times)...{1:SOME_LARGE_DICTIONARY_OF_SIZE_N}...}}}})
下面是一个函数,我写的扁平化dict (d, join=…, lift=…),可以适应许多用途,可以做你想做的事情。遗憾的是,要在不产生上述性能损失的情况下创建这个函数的惰性版本是相当困难的(许多python内置程序,如chain.from_iterable实际上并不是有效的,这是我在确定这个代码之前对三个不同版本的代码进行了大量测试后才意识到的)。
from collections import Mapping
from itertools import chain
from operator import add
_FLAG_FIRST = object()
def flattenDict(d, join=add, lift=lambda x:(x,)):
results = []
def visit(subdict, results, partialKey):
for k,v in subdict.items():
newKey = lift(k) if partialKey==_FLAG_FIRST else join(partialKey,lift(k))
if isinstance(v,Mapping):
visit(v, results, newKey)
else:
results.append((newKey,v))
visit(d, results, _FLAG_FIRST)
return results
为了更好地理解这是怎么回事,下面是一个图表,供那些不熟悉reduce(左),也称为“左折叠”的人使用。有时用一个初始值来代替k0(不是列表的一部分,传递给函数)。这里,J是我们的连接函数。我们用升力(k)对每kn进行预处理。
[k0,k1,...,kN].foldleft(J)
/ \
... kN
/
J(k0,J(k1,J(k2,k3)))
/ \
/ \
J(J(k0,k1),k2) k3
/ \
/ \
J(k0,k1) k2
/ \
/ \
k0 k1
这实际上与functools相同。Reduce,但是我们的函数对树的所有键路径都做了这个。
>>> reduce(lambda a,b:(a,b), range(5))
((((0, 1), 2), 3), 4)
演示(否则我将放在docstring中):
>>> testData = {
'a':1,
'b':2,
'c':{
'aa':11,
'bb':22,
'cc':{
'aaa':111
}
}
}
from pprint import pprint as pp
>>> pp(dict( flattenDict(testData) ))
{('a',): 1,
('b',): 2,
('c', 'aa'): 11,
('c', 'bb'): 22,
('c', 'cc', 'aaa'): 111}
>>> pp(dict( flattenDict(testData, join=lambda a,b:a+'_'+b, lift=lambda x:x) ))
{'a': 1, 'b': 2, 'c_aa': 11, 'c_bb': 22, 'c_cc_aaa': 111}
>>> pp(dict( (v,k) for k,v in flattenDict(testData, lift=hash, join=lambda a,b:hash((a,b))) ))
{1: 12416037344,
2: 12544037731,
11: 5470935132935744593,
22: 4885734186131977315,
111: 3461911260025554326}
性能:
from functools import reduce
def makeEvilDict(n):
return reduce(lambda acc,x:{x:acc}, [{i:0 for i in range(n)}]+range(n))
import timeit
def time(runnable):
t0 = timeit.default_timer()
_ = runnable()
t1 = timeit.default_timer()
print('took {:.2f} seconds'.format(t1-t0))
>>> pp(makeEvilDict(8))
{7: {6: {5: {4: {3: {2: {1: {0: {0: 0,
1: 0,
2: 0,
3: 0,
4: 0,
5: 0,
6: 0,
7: 0}}}}}}}}}
import sys
sys.setrecursionlimit(1000000)
forget = lambda a,b:''
>>> time(lambda: dict(flattenDict(makeEvilDict(10000), join=forget)) )
took 0.10 seconds
>>> time(lambda: dict(flattenDict(makeEvilDict(100000), join=forget)) )
[1] 12569 segmentation fault python
... 唉,别以为那是我的错……
[由于中庸问题,不重要的历史注释]
关于所谓的重复扁平化字典的字典(2层深)列表
That question's solution can be implemented in terms of this one by doing sorted( sum(flatten(...),[]) ). The reverse is not possible: while it is true that the values of flatten(...) can be recovered from the alleged duplicate by mapping a higher-order accumulator, one cannot recover the keys. (edit: Also it turns out that the alleged duplicate owner's question is completely different, in that it only deals with dictionaries exactly 2-level deep, though one of the answers on that page gives a general solution.)
简单的函数来平嵌套字典。对于Python 3,用.items()替换.iteritems()
def flatten_dict(init_dict):
res_dict = {}
if type(init_dict) is not dict:
return res_dict
for k, v in init_dict.iteritems():
if type(v) == dict:
res_dict.update(flatten_dict(v))
else:
res_dict[k] = v
return res_dict
这个想法/要求是: 获取不保留父键的平面字典。
用法示例:
dd = {'a': 3,
'b': {'c': 4, 'd': 5},
'e': {'f':
{'g': 1, 'h': 2}
},
'i': 9,
}
flatten_dict(dd)
>> {'a': 3, 'c': 4, 'd': 5, 'g': 1, 'h': 2, 'i': 9}
保留父密钥也很简单。
这是一种“功能性的”、“单行程序”实现。它是递归的,基于条件表达式和字典理解。
def flatten_dict(dd, separator='_', prefix=''):
return { prefix + separator + k if prefix else k : v
for kk, vv in dd.items()
for k, v in flatten_dict(vv, separator, kk).items()
} if isinstance(dd, dict) else { prefix : dd }
测试:
In [2]: flatten_dict({'abc':123, 'hgf':{'gh':432, 'yu':433}, 'gfd':902, 'xzxzxz':{"432":{'0b0b0b':231}, "43234":1321}}, '.')
Out[2]:
{'abc': 123,
'gfd': 902,
'hgf.gh': 432,
'hgf.yu': 433,
'xzxzxz.432.0b0b0b': 231,
'xzxzxz.43234': 1321}
使用发电机:
def flat_dic_helper(prepand,d):
if len(prepand) > 0:
prepand = prepand + "_"
for k in d:
i = d[k]
if isinstance(i, dict):
r = flat_dic_helper(prepand + k,i)
for j in r:
yield j
else:
yield (prepand + k,i)
def flat_dic(d):
return dict(flat_dic_helper("",d))
d = {'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3]}
print(flat_dic(d))
>> {'a': 1, 'c_a': 2, 'c_b_x': 5, 'd': [1, 2, 3], 'c_b_y': 10}
这并不局限于字典,而是实现.items()的每个映射类型。进一步列表更快,因为它避免了if条件。尽管如此,功劳还是归于伊姆兰:
def flatten(d, parent_key=''):
items = []
for k, v in d.items():
try:
items.extend(flatten(v, '%s%s_' % (parent_key, k)).items())
except AttributeError:
items.append(('%s%s' % (parent_key, k), v))
return dict(items)