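Extract a subset of key-value pairs from a dictionary?

I have a big dictionary object with several key-value pairs (about 16), but I am only interested in 3 of them. What is the best way (shortest/most efficient/most elegant) to subset such a dictionary?

The best I know is: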

bigdict = {'a':1,'b':2,....,'z':26} 
subdict = {'l':bigdict['l'], 'm':bigdict['m'], 'n':bigdict['n']}

I am sure there is a more elegant way than this.

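Current answer

Using map (halfdanrump's answer) works best for me, though I haven't timed it...

But if you go for a dictionary, and if you have a big_dict:

Make absolutely sure you loop over req. This is crucial and affects the running time of the algorithm (big O, theta, you name it). Also write it generically enough to avoid errors when a key is missing.

For example: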

big_dict = {'a':1,'b':2,'c':3,................................................}
req = ['a','c','w']

{k: big_dict.get(k, None) for k in req}
# or
{k: big_dict[k] for k in req if k in big_dict}

Note that in the opposite case, where req is big but my_dict is small, you should loop over my_dict instead.
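
A minimal sketch of that opposite case, assuming a small dictionary and a long list of requested keys (the names `small_dict`, `req_set` and the sample data are made up for illustration). Converting `req` to a set first keeps each membership test at roughly constant time:

```python
# Hypothetical data: few dictionary entries, many requested keys.
small_dict = {'a': 1, 'b': 2, 'c': 3}
req = ['k%d' % i for i in range(100000)] + ['a', 'c']

# Build the set once so that `k in req_set` is an O(1) average lookup.
req_set = set(req)

# Loop over the small dictionary instead of the long request list.
subset = {k: v for k, v in small_dict.items() if k in req_set}
print(subset)  # {'a': 1, 'c': 3}
```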

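In general, we are solving an intersection problem, whose complexity is O(min(len(dict), len(req))). Python's own implementation of intersection takes the size of both sets into account, so it seems optimal. Also, being written in C and part of the core library, it is probably faster than most unoptimized Python statements. The solution I would therefore consider is: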

dic = {'a':1,'b':2,'c':3,................................................}
req = ['a','c','w',...................]

{k: dic[k] for k in set(req).intersection(dic.keys())}

It moves the critical operation into Python's C code and works in all cases.
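
For reference, a runnable version of the same idea with concrete data might look like the following. The contents of `dic` and `req` are invented for the example, and `'w'` is deliberately absent from the dictionary:

```python
dic = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
req = ['a', 'c', 'w']

# set.intersection accepts any iterable, so the keys view can be passed directly.
common = set(req).intersection(dic.keys())
result = {k: dic[k] for k in common}

print(sorted(result.items()))  # [('a', 1), ('c', 3)]
```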

2020-11-11 10:08:59

Other answers

You could try:

dict((k, bigdict[k]) for k in ('l', 'm', 'n'))

... or, in Python 3 and Python 2.7 or later (thanks to Fábio Diniz for pointing out that it works in 2.7 too):

{k: bigdict[k] for k in ('l', 'm', 'n')}

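Update: As Håvard S points out, I am assuming that you know the keys will be in the dictionary. If you cannot make that assumption, see his answer. Alternatively, as timbo points out in the comments, if you want a key that is missing from bigdict to map to None, you can do: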

{k: bigdict.get(k, None) for k in ('l', 'm', 'n')}

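If you are using Python 3 and only want keys in the new dict that actually exist in the original one, you can use the fact that view objects implement some set operations: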

{k: bigdict[k] for k in bigdict.keys() & {'l', 'm', 'n'}}
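
One thing worth knowing about this form: `bigdict.keys() & {...}` yields a plain `set`, so the key order of the resulting dictionary is whatever order that set happens to iterate in. A small sketch, assuming order matters and using a hypothetical `wanted` tuple, that keeps the order of the requested keys instead:

```python
bigdict = {'l': 12, 'm': 13, 'n': 14, 'o': 15}
wanted = ('n', 'x', 'l')  # 'x' is not in bigdict

# Keys views support set operations; the result is a set.
present = bigdict.keys() & set(wanted)
print(type(present).__name__)  # set

# Iterating over `wanted` and filtering keeps the requested order.
ordered = {k: bigdict[k] for k in wanted if k in bigdict}
print(ordered)  # {'n': 14, 'l': 12}
```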

2011-03-18 13:28:01

Another way, in py3.8+, to avoid None values for keys missing from big_dict uses the walrus operator:

small_dict = {key: val for key in ('l', 'm', 'n') if (val := big_dict.get(key))}
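
Note that the condition above tests the truthiness of the value, so keys whose values are falsy (`0`, `''`, `None`, empty containers) are dropped along with the missing ones. If that matters, one possible variant compares against a sentinel object. `_MISSING` is a name made up for this sketch:

```python
big_dict = {'l': 0, 'm': 13, 'n': ''}
_MISSING = object()  # hypothetical sentinel, distinct from any real value

# Truthiness test: 'l' and 'n' vanish because 0 and '' are falsy.
truthy_only = {key: val for key in ('l', 'm', 'n', 'x')
               if (val := big_dict.get(key))}
print(truthy_only)  # {'m': 13}

# Sentinel test: only the genuinely missing key 'x' is skipped.
small_dict = {key: val for key in ('l', 'm', 'n', 'x')
              if (val := big_dict.get(key, _MISSING)) is not _MISSING}
print(small_dict)  # {'l': 0, 'm': 13, 'n': ''}
```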

2022-11-10 03:41:02

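OK, this is something that has bothered me a few times, so thank you for asking.

The answers above seem like a good solution, but if you are using this throughout your code, I think it makes sense to wrap the functionality. Also, there are two possible use cases here: one where you care whether all the keywords are in the original dictionary, and one where you don't. It would be nice to treat both equally.

So, for my two cents' worth, I suggest writing a subclass of dict, e.g.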

class my_dict(dict):
    def subdict(self, keywords, fragile=False):
        d = {}
        for k in keywords:
            try:
                d[k] = self[k]
            except KeyError:
                if fragile:
                    raise
        return d

Now you can pull out a sub-dictionary with orig_dict.subdict(keywords)

Usage example:

#
## our keywords are letters of the alphabet
keywords = 'abcdefghijklmnopqrstuvwxyz'
#
## our dictionary maps letters to their index
d = my_dict([(k,i) for i,k in enumerate(keywords)])
print('Original dictionary:\n%r\n\n' % (d,))
#
## constructing a sub-dictionary with good keywords
oddkeywords = keywords[::2]
subd = d.subdict(oddkeywords)
print('Dictionary from odd numbered keys:\n%r\n\n' % (subd,))
#
## constructing a sub-dictionary with mixture of good and bad keywords
somebadkeywords = keywords[1::2] + 'A'
try:
    # fragile=True makes the missing key 'A' raise KeyError
    subd2 = d.subdict(somebadkeywords, fragile=True)
    print("We shouldn't see this message")
except KeyError:
    print("subd2 construction fails:")
    print("\toriginal dictionary doesn't contain some keys\n\n")
#
## Trying again with fragile set to false
try:
    subd3 = d.subdict(somebadkeywords, fragile=False)
    print('Dictionary constructed using some bad keys:\n%r\n\n' % (subd3,))
except KeyError:
    print("We shouldn't see this message")

If you run all of the above code, you should see (something like) the following output:

Original dictionary:
{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3, 'g': 6, 'f': 5, 'i': 8, 'h': 7, 'k': 10, 'j': 9, 'm': 12, 'l': 11, 'o': 14, 'n': 13, 'q': 16, 'p': 15, 's': 18, 'r': 17, 'u': 20, 't': 19, 'w': 22, 'v': 21, 'y': 24, 'x': 23, 'z': 25}

Dictionary from odd numbered keys:
{'a': 0, 'c': 2, 'e': 4, 'g': 6, 'i': 8, 'k': 10, 'm': 12, 'o': 14, 'q': 16, 's': 18, 'u': 20, 'w': 22, 'y': 24}

subd2 construction fails:
	original dictionary doesn't contain some keys

Dictionary constructed using some bad keys:
{'b': 1, 'd': 3, 'f': 5, 'h': 7, 'j': 9, 'l': 11, 'n': 13, 'p': 15, 'r': 17, 't': 19, 'v': 21, 'x': 23, 'z': 25}

2015-03-11 12:15:34

A bit shorter, at least:

wanted_keys = ['l', 'm', 'n'] # The keys you want
dict((k, bigdict[k]) for k in wanted_keys if k in bigdict)

2011-03-18 13:28:55

A speed comparison of all the methods mentioned:

Updated 2020.07.13 (thanks @user3780389): only for keys that are in bigdict.

 IPython 5.5.0 -- An enhanced Interactive Python.
Python 2.7.18 (default, Aug  8 2019, 00:00:00) 
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux2
import numpy.random as nprnd
  ...: keys = nprnd.randint(100000, size=10000)
  ...: bigdict = dict([(_, nprnd.rand()) for _ in range(100000)])
  ...: 
  ...: %timeit {key:bigdict[key] for key in keys}
  ...: %timeit dict((key, bigdict[key]) for key in keys)
  ...: %timeit dict(map(lambda k: (k, bigdict[k]), keys))
  ...: %timeit {key:bigdict[key] for key in set(keys) & set(bigdict.keys())}
  ...: %timeit dict(filter(lambda i:i[0] in keys, bigdict.items()))
  ...: %timeit {key:value for key, value in bigdict.items() if key in keys}
100 loops, best of 3: 2.36 ms per loop
100 loops, best of 3: 2.87 ms per loop
100 loops, best of 3: 3.65 ms per loop
100 loops, best of 3: 7.14 ms per loop
1 loop, best of 3: 577 ms per loop
1 loop, best of 3: 563 ms per loop

As expected, the dictionary comprehension is the best option.
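
The two slowest variants scan `keys` once per dictionary entry, which is most likely where their time goes. A rough sketch for re-running a reduced comparison on Python 3 with only the standard library follows. The sizes are much smaller than in the session above, the function names and `key_set` are invented here, and no particular timings are claimed:

```python
import random
import timeit

random.seed(0)
bigdict = {i: random.random() for i in range(5000)}
keys = [random.randrange(5000) for _ in range(500)]
key_set = set(keys)  # assumption: a set makes `k in ...` a hash lookup

def by_keys():
    # Iterate over the requested keys only.
    return {k: bigdict[k] for k in keys}

def by_items_list():
    # Iterate over the whole dict; `k in keys` scans the list each time.
    return {k: v for k, v in bigdict.items() if k in keys}

def by_items_set():
    # Same loop, but membership is tested against a set.
    return {k: v for k, v in bigdict.items() if k in key_set}

for fn in (by_keys, by_items_list, by_items_set):
    seconds = timeit.timeit(fn, number=3)
    print('%-14s %.4f s for 3 runs' % (fn.__name__, seconds))
```

All three functions build the same dictionary, so any difference in the printed times comes from how the keys are looked up rather than from the result.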

2016-03-29 09:48:44
