我有一个大字典对象,它有几个键值对(大约16个),但我只对其中3个感兴趣。将这样的字典子集化的最佳方法(最短/有效/最优雅)是什么?

我知道的最好的是:

bigdict = {'a':1,'b':2,....,'z':26} 
subdict = {'l':bigdict['l'], 'm':bigdict['m'], 'n':bigdict['n']}

我相信有比这更优雅的方式。


当前回答

interesting_keys = ('l', 'm', 'n')
subdict = {x: bigdict[x] for x in interesting_keys if x in bigdict}

其他回答

还有一个问题(我更喜欢Mark Longair的答案)

di = {'a':1,'b':2,'c':3}
req = ['a','c','w']
dict([i for i in di.iteritems() if i[0] in di and i[0] in req])

解决方案

from operator import itemgetter
from typing import List, Dict, Union


def subdict(d: Union[Dict, List], columns: List[str]) -> Union[Dict, List[Dict]]:
    """Return a dict or list of dicts with subset of 
    columns from the d argument.
    """
    getter = itemgetter(*columns)

    if isinstance(d, list):
        result = []
        for subset in map(getter, d):
            record = dict(zip(columns, subset))
            result.append(record)
        return result
    elif isinstance(d, dict):
        return dict(zip(columns, getter(d)))

    raise ValueError('Unsupported type for `d`')

使用实例

# pure dict

d = dict(a=1, b=2, c=3)
print(subdict(d, ['a', 'c']))

>>> In [5]: {'a': 1, 'c': 3}
# list of dicts

d = [
    dict(a=1, b=2, c=3),
    dict(a=2, b=4, c=6),
    dict(a=4, b=8, c=12),
]

print(subdict(d, ['a', 'c']))

>>> In [5]: [{'a': 1, 'c': 3}, {'a': 2, 'c': 6}, {'a': 4, 'c': 12}]

你可以试试:

dict((k, bigdict[k]) for k in ('l', 'm', 'n'))

... 或Python 3 Python 2.7或更高版本(感谢Fábio Diniz指出它在2.7中也适用):

{k: bigdict[k] for k in ('l', 'm', 'n')}

更新:正如Håvard S指出的那样,我假设你知道键将在字典中-如果你不能做出这样的假设,请参阅他的答案。或者,正如timbo在评论中指出的那样,如果你想要bigdict中缺少的键映射到None,你可以这样做:

{k: bigdict.get(k, None) for k in ('l', 'm', 'n')}

如果你正在使用python3,并且你只想要新字典中的键实际上存在于原始字典中,你可以使用fact来查看对象,实现一些set操作:

{k: bigdict[k] for k in bigdict.keys() & {'l', 'm', 'n'}}

至少要短一点:

wanted_keys = ['l', 'm', 'n'] # The keys you want
dict((k, bigdict[k]) for k in wanted_keys if k in bigdict)

比较一下所有提到的方法的速度:

更新于2020.07.13(谢谢@user3780389): 仅用于bigdict中的键。

 IPython 5.5.0 -- An enhanced Interactive Python.
Python 2.7.18 (default, Aug  8 2019, 00:00:00) 
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux2
import numpy.random as nprnd
  ...: keys = nprnd.randint(100000, size=10000)
  ...: bigdict = dict([(_, nprnd.rand()) for _ in range(100000)])
  ...: 
  ...: %timeit {key:bigdict[key] for key in keys}
  ...: %timeit dict((key, bigdict[key]) for key in keys)
  ...: %timeit dict(map(lambda k: (k, bigdict[k]), keys))
  ...: %timeit {key:bigdict[key] for key in set(keys) & set(bigdict.keys())}
  ...: %timeit dict(filter(lambda i:i[0] in keys, bigdict.items()))
  ...: %timeit {key:value for key, value in bigdict.items() if key in keys}
100 loops, best of 3: 2.36 ms per loop
100 loops, best of 3: 2.87 ms per loop
100 loops, best of 3: 3.65 ms per loop
100 loops, best of 3: 7.14 ms per loop
1 loop, best of 3: 577 ms per loop
1 loop, best of 3: 563 ms per loop

正如预期的那样:字典推导式是最好的选择。