假设我有这个:

[
  {"name": "Tom", "age": 10},
  {"name": "Mark", "age": 5},
  {"name": "Pam", "age": 7}
]

通过搜索“Pam”作为名称,我想检索相关的字典:{name:“Pam”,年龄:7}

如何做到这一点?


当前回答

鸭子将比列表理解或过滤器快得多。它在你的对象上建立一个索引,这样查找就不需要扫描每一个项目。

PIP安装鸭

from ducks import Dex

dicts = [
  {"name": "Tom", "age": 10},
  {"name": "Mark", "age": 5},
  {"name": "Pam", "age": 7}
]

# Build the index
dex = Dex(dicts, {'name': str, 'age': int})

# Find matching objects
dex[{'name': 'Pam', 'age': 7}]

结果:[{'name': 'Pam', 'age': 7}]

其他回答

people = [
{'name': "Tom", 'age': 10},
{'name': "Mark", 'age': 5},
{'name': "Pam", 'age': 7}
]

def search(name):
    for p in people:
        if p['name'] == name:
            return p

search("Pam")

@Frédéric Hamidi的回答很好。在Python 3中。X .next()的语法稍有改变。因此有一个小小的修改:

>>> dicts = [
     { "name": "Tom", "age": 10 },
     { "name": "Mark", "age": 5 },
     { "name": "Pam", "age": 7 },
     { "name": "Dick", "age": 12 }
 ]
>>> next(item for item in dicts if item["name"] == "Pam")
{'age': 7, 'name': 'Pam'}

正如@Matt在评论中提到的,你可以添加一个默认值:

>>> next((item for item in dicts if item["name"] == "Pam"), False)
{'name': 'Pam', 'age': 7}
>>> next((item for item in dicts if item["name"] == "Sam"), False)
False
>>>

你试过熊猫套餐吗?它非常适合这类搜索任务,也进行了优化。

import pandas as pd

listOfDicts = [
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]

# Create a data frame, keys are used as column headers.
# Dict items with the same key are entered into the same respective column.
df = pd.DataFrame(listOfDicts)

# The pandas dataframe allows you to pick out specific values like so:

df2 = df[ (df['name'] == 'Pam') & (df['age'] == 7) ]

# Alternate syntax, same thing

df2 = df[ (df.name == 'Pam') & (df.age == 7) ]

我在下面添加了一些基准测试,以说明熊猫在更大范围内(即10万+条目)的更快运行时间:

setup_large = 'dicts = [];\
[dicts.extend(({ "name": "Tom", "age": 10 },{ "name": "Mark", "age": 5 },\
{ "name": "Pam", "age": 7 },{ "name": "Dick", "age": 12 })) for _ in range(25000)];\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(dicts);'

setup_small = 'dicts = [];\
dicts.extend(({ "name": "Tom", "age": 10 },{ "name": "Mark", "age": 5 },\
{ "name": "Pam", "age": 7 },{ "name": "Dick", "age": 12 }));\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(dicts);'

method1 = '[item for item in dicts if item["name"] == "Pam"]'
method2 = 'df[df["name"] == "Pam"]'

import timeit
t = timeit.Timer(method1, setup_small)
print('Small Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_small)
print('Small Method Pandas: ' + str(t.timeit(100)))

t = timeit.Timer(method1, setup_large)
print('Large Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_large)
print('Large Method Pandas: ' + str(t.timeit(100)))

#Small Method LC: 0.000191926956177
#Small Method Pandas: 0.044392824173
#Large Method LC: 1.98827004433
#Large Method Pandas: 0.324505090714

将接受的答案放在函数中,以便于重用

def get_item(collection, key, target):
    return next((item for item in collection if item[key] == target), None)

也可以写成

   get_item_lambda = lambda collection, key, target : next((item for item in collection if item[key] == target), None)

结果

    key = "name"
    target = "Pam"
    print(get_item(target_list, key, target))
    print(get_item_lambda(target_list, key, target))

    #{'name': 'Pam', 'age': 7}
    #{'name': 'Pam', 'age': 7}

如果键可能不在目标字典中,请使用dict。get和避免KeyError

def get_item(collection, key, target):
    return next((item for item in collection if item.get(key, None) == target), None)

get_item_lambda = lambda collection, key, target : next((item for item in collection if item.get(key, None) == target), None)

这里是一个比较,使用遍历列表,使用过滤器+lambda或重构(如果需要或对你的情况有效)你的代码dict of dicts而不是list of dicts

import time

# Build list of dicts
list_of_dicts = list()
for i in range(100000):
    list_of_dicts.append({'id': i, 'name': 'Tom'})

# Build dict of dicts
dict_of_dicts = dict()
for i in range(100000):
    dict_of_dicts[i] = {'name': 'Tom'}


# Find the one with ID of 99

# 1. iterate through the list
lod_ts = time.time()
for elem in list_of_dicts:
    if elem['id'] == 99999:
        break
lod_tf = time.time()
lod_td = lod_tf - lod_ts

# 2. Use filter
f_ts = time.time()
x = filter(lambda k: k['id'] == 99999, list_of_dicts)
f_tf = time.time()
f_td = f_tf- f_ts

# 3. find it in dict of dicts
dod_ts = time.time()
x = dict_of_dicts[99999]
dod_tf = time.time()
dod_td = dod_tf - dod_ts


print 'List of Dictionries took: %s' % lod_td
print 'Using filter took: %s' % f_td
print 'Dict of Dicts took: %s' % dod_td

输出是这样的:

List of Dictionries took: 0.0099310874939
Using filter took: 0.0121960639954
Dict of Dicts took: 4.05311584473e-06

结论: 显然,字典是最有效的搜索方法在这种情况下,你知道,你只能通过id搜索。 有趣的是,使用过滤器是最慢的解决方案。