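I have a big dictionary object with several key-value pairs (about 16), but I am only interested in 3 of them. What is the best way (shortest/most efficient/most elegant) to subset such a dictionary?
The best I know is: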
bigdict = {'a':1,'b':2,....,'z':26}
subdict = {'l':bigdict['l'], 'm':bigdict['m'], 'n':bigdict['n']}
I am sure there is a more elegant way than this.
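Current answer
Another approach for Python 3.8+ that avoids None values for keys missing from big_dict uses the walrus operator. The explicit "is not None" test keeps falsy values such as 0 or '', which a bare truthiness test would silently drop:
small_dict = {key: val for key in ('l', 'm', 'n') if (val := big_dict.get(key)) is not None}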
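The condition inside the comprehension decides which keys survive, so it is worth seeing how a truthiness test and an explicit None test differ. The following is a minimal sketch; the sample data and the names wanted, by_truthiness and by_not_none are made up for illustration.

```python
# Hypothetical sample data: 'm' maps to 0 (a falsy value) and 'n' is missing.
big_dict = {'l': 12, 'm': 0, 'z': 26}
wanted = ('l', 'm', 'n')

# Truthiness test: drops the missing key 'n', but also drops 'm' because 0 is falsy.
by_truthiness = {key: val for key in wanted if (val := big_dict.get(key))}

# Explicit None test: drops only keys whose lookup returned None.
by_not_none = {key: val for key in wanted if (val := big_dict.get(key)) is not None}

print(by_truthiness)  # {'l': 12}
print(by_not_none)    # {'l': 12, 'm': 0}
```

Neither variant can tell a missing key apart from a key that is stored with the value None; if that distinction matters, testing "key in big_dict" is the safer choice.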
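Other answers
This answer uses a dictionary comprehension similar to the selected answer, but it does not raise an exception when a requested key is missing; missing keys are simply skipped.
Python 2 version: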
{k:v for k, v in bigDict.iteritems() if k in ('l', 'm', 'n')}
Python 3 version:
{k:v for k, v in bigDict.items() if k in ('l', 'm', 'n')}
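To make the missing-key behaviour concrete, here is a small sketch that contrasts filtering the items with indexing each wanted key directly. The sample data and the variable names are assumptions chosen for the demonstration, not part of the answer above.

```python
big_dict = {'l': 12, 'm': 13, 'z': 26}  # hypothetical sample data
wanted = ('l', 'm', 'n')                # 'n' is not in big_dict

# Filtering the items never looks up a missing key, so nothing is raised.
filtered = {k: v for k, v in big_dict.items() if k in wanted}

# Iterating over the wanted keys with a membership guard gives the same result
# without scanning every item of big_dict.
guarded = {k: big_dict[k] for k in wanted if k in big_dict}

# Direct indexing without a guard raises KeyError as soon as it reaches 'n'.
try:
    direct = {k: big_dict[k] for k in wanted}
except KeyError as exc:
    direct = None
    missing_key = exc.args[0]

print(filtered)     # {'l': 12, 'm': 13}
print(guarded)      # {'l': 12, 'm': 13}
print(missing_key)  # n
```

The guarded form is usually the better fit when the dictionary is large and only a few keys are wanted, since its cost depends on the number of wanted keys rather than on the size of the dictionary.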
Solution
from operator import itemgetter
from typing import Dict, List, Union


def subdict(d: Union[Dict, List[Dict]], columns: List[str]) -> Union[Dict, List[Dict]]:
    """Return a dict or list of dicts with subset of
    columns from the d argument.
    """
    getter = itemgetter(*columns)

    def pick(record: Dict) -> Dict:
        values = getter(record)
        # itemgetter returns a bare value, not a 1-tuple, for a single key
        return dict(zip(columns, values if len(columns) > 1 else (values,)))

    if isinstance(d, list):
        return [pick(record) for record in d]
    elif isinstance(d, dict):
        return pick(d)
    raise ValueError('Unsupported type for `d`')
Usage examples
# pure dict
d = dict(a=1, b=2, c=3)
print(subdict(d, ['a', 'c']))
# prints: {'a': 1, 'c': 3}
# list of dicts
d = [
    dict(a=1, b=2, c=3),
    dict(a=2, b=4, c=6),
    dict(a=4, b=8, c=12),
]
print(subdict(d, ['a', 'c']))
# prints: [{'a': 1, 'c': 3}, {'a': 2, 'c': 6}, {'a': 4, 'c': 12}]
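You can also use map (which is a very useful function to know anyway). Here d is the large dictionary and keys is the list of keys you want:
sd = dict(map(lambda k: (k, d.get(k, None)), keys))
Example: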
large_dictionary = {'a1':123, 'a2':45, 'a3':344}
list_of_keys = ['a1', 'a3']
small_dictionary = dict(map(lambda key: (key, large_dictionary.get(key, None)), list_of_keys))
PS: I borrowed the .get(key, None) from a previous answer :)
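One consequence of .get(key, None) is that a requested key that does not exist still appears in the result, mapped to None. A short sketch of that behaviour follows; the extra key 'a9' is a made-up missing key added for illustration.

```python
large_dictionary = {'a1': 123, 'a2': 45, 'a3': 344}
list_of_keys = ['a1', 'a3', 'a9']  # 'a9' is a hypothetical key that is not present

# Each wanted key becomes a (key, value) pair; missing keys get the default None.
small_dictionary = dict(map(lambda key: (key, large_dictionary.get(key, None)), list_of_keys))

print(small_dictionary)  # {'a1': 123, 'a3': 344, 'a9': None}
```

Whether that is desirable depends on the caller: it guarantees every requested key exists in the result, at the price of not distinguishing absent keys from keys stored as None.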
A speed comparison of all the methods mentioned:
Updated on 2020.07.13 (thanks to @user3780389): only for keys taken from bigdict.
IPython 5.5.0 -- An enhanced Interactive Python.
Python 2.7.18 (default, Aug 8 2019, 00:00:00)
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux2
import numpy.random as nprnd
...: keys = nprnd.randint(100000, size=10000)
...: bigdict = dict([(_, nprnd.rand()) for _ in range(100000)])
...:
...: %timeit {key:bigdict[key] for key in keys}
...: %timeit dict((key, bigdict[key]) for key in keys)
...: %timeit dict(map(lambda k: (k, bigdict[k]), keys))
...: %timeit {key:bigdict[key] for key in set(keys) & set(bigdict.keys())}
...: %timeit dict(filter(lambda i:i[0] in keys, bigdict.items()))
...: %timeit {key:value for key, value in bigdict.items() if key in keys}
100 loops, best of 3: 2.36 ms per loop
100 loops, best of 3: 2.87 ms per loop
100 loops, best of 3: 3.65 ms per loop
100 loops, best of 3: 7.14 ms per loop
1 loop, best of 3: 577 ms per loop
1 loop, best of 3: 563 ms per loop
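As expected, the dict comprehension that looks up each wanted key directly is the fastest option (2.36 ms). The two variants that scan all of bigdict.items() are more than 200 times slower (about 570 ms).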
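The session above ran on Python 2.7 with NumPy. For readers who want to repeat the comparison on Python 3 with only the standard library, a rough sketch might look like the one below. The function names, the fixed seed and the repeat count are assumptions, the scan variant tests membership against a set rather than a NumPy array, and the timings will differ from machine to machine.

```python
import random
import timeit

random.seed(0)  # fixed seed so repeated runs use the same data
bigdict = {i: random.random() for i in range(100_000)}
keys = [random.randrange(100_000) for _ in range(10_000)]
key_set = set(keys)  # set membership is O(1) on average


def by_lookup():
    # Touch only the wanted keys.
    return {key: bigdict[key] for key in keys}


def by_scan():
    # Walk every item of bigdict and keep the wanted ones.
    return {key: value for key, value in bigdict.items() if key in key_set}


for func in (by_lookup, by_scan):
    seconds = timeit.timeit(func, number=20)
    print(f"{func.__name__}: {seconds / 20 * 1000:.2f} ms per call")
```

Both functions build the same dictionary, so the comparison isolates the cost of the iteration strategy rather than of the result.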