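I need to write a weighted version of random.choice (each element in the list has a different probability of being selected). This is what I came up with: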

import random

def weightedChoice(choices):
    """Like random.choice, but each element can have a different chance of
    being selected.

    choices can be any iterable containing iterables with two items each.
    Technically, they can have more than two items, the rest will just be
    ignored.  The first item is the thing being chosen, the second item is
    its weight.  The weights can be any numeric values, what matters is the
    relative differences between them.
    """
    space = {}
    current = 0
    for choice, weight in choices:
        if weight > 0:
            space[current] = choice
            current += weight
    rand = random.uniform(0, current)
    for key in sorted(list(space) + [current]):  # list(...) so this also runs on Python 3
        if rand < key:
            return choice
        choice = space[key]
    return None

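This function seems overly complex to me, and ugly. I'm hoping everyone here can offer suggestions for improving it, or alternative ways of doing this. Efficiency matters less to me than clean, readable code.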


Current answer

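If you haven't decided in advance how many items you want to pick (so you aren't doing something like k=10) and you only have probabilities, you can do the following. Note that your probabilities don't need to add up to 1; they can be independent of each other: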

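import random

soup_items = ['pepper', 'onion', 'tomato', 'celery']
items_probability = [0.2, 0.3, 0.9, 0.1]

# Each item is kept or dropped independently with its own probability.
selected_items = [item for item, p in zip(soup_items, items_probability) if random.random() < p]
print(selected_items)
# Example output (varies from run to run): ['pepper', 'tomato']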

Other answers

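One approach is to draw a random number over the sum of all the weights, then use the cumulative weights as the cut-off points for each item. Here is a crude implementation as a generator.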

import random

def rand_weighted(weights):
    """
    Generator which uses the weights to generate
    weighted random values
    """
    sum_weights = sum(weights.values())
    # Map each key to the running total of weights up to and including it.
    cum_weights = {}
    current_weight = 0
    for key, value in sorted(weights.items()):
        current_weight += value
        cum_weights[key] = current_weight
    while True:
        # Draw in [0, sum_weights); no int() so fractional weights also work.
        sel = random.random() * sum_weights
        for key, value in sorted(cum_weights.items()):
            if sel < value:
                break
        yield key
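
The cumulative-weight idea above can also be sketched with the standard library's itertools.accumulate and bisect, which swaps the inner linear scan for a binary search. The function name weighted_pick and the sample weights are made up for illustration, and the sketch assumes non-negative weights with a positive total.

```python
import bisect
import itertools
import random

def weighted_pick(weights, rng=random):
    """Pick one key from a {key: weight} dict, proportionally to its weight."""
    keys = list(weights)
    # Running totals: for weights 1, 2, 7 this gives [1, 3, 10].
    cumulative = list(itertools.accumulate(weights[k] for k in keys))
    # Draw a point in [0, total) and find the first running total above it.
    point = rng.random() * cumulative[-1]
    index = bisect.bisect_right(cumulative, point)
    # Guard against floating-point rounding pushing point up to the total.
    return keys[min(index, len(keys) - 1)]

sample_weights = {"a": 1, "b": 2, "c": 7}  # hypothetical example data
print(weighted_pick(sample_weights))
```

If many draws are needed from the same weights, the cumulative list could be built once and reused, which is what the generator above does with its dictionary.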

Since Python 3.6, the random module has a choices method.

In [1]: import random

In [2]: random.choices(
...:     population=[['a','b'], ['b','a'], ['c','b']],
...:     weights=[0.2, 0.2, 0.6],
...:     k=10
...: )

Out[2]:
[['c', 'b'],
 ['c', 'b'],
 ['b', 'a'],
 ['c', 'b'],
 ['c', 'b'],
 ['b', 'a'],
 ['c', 'b'],
 ['b', 'a'],
 ['c', 'b'],
 ['c', 'b']]

Note that random.choices samples with replacement, per the docs:

Return a k sized list of elements chosen from the population with replacement.

For completeness of the answer, note:

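When a sampling unit is drawn from a finite population and is returned to that population, after its characteristic(s) have been recorded, before the next unit is drawn, the sampling is said to be "with replacement". It basically means each element may be chosen more than once.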
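
A small sketch of what "with replacement" means in practice: asking random.choices for more picks than there are elements still works, because elements can repeat. The population and weights below are illustrative values only.

```python
import random

population = ["a", "b", "c"]  # hypothetical example data
weights = [0.2, 0.2, 0.6]

# k is larger than the population, so at least one element must repeat.
picks = random.choices(population, weights=weights, k=10)
print(picks)
print(len(picks), "picks,", len(set(picks)), "distinct")
```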

If you need to sample without replacement, then as @ronan-paixão's brilliant answer states, you can use numpy.random.choice, whose replace argument controls this behaviour.

If you don't mind using numpy, you can use numpy.random.choice.

For example:

import numpy

items = [["item1", 0.2], ["item2", 0.3], ["item3", 0.45], ["item4", 0.05]]
elems = [i[0] for i in items]
probs = [i[1] for i in items]

trials = 1000
results = [0] * len(items)
for i in range(trials):
    # numpy.random.choice needs a 1-D sequence, so pass elems rather than items.
    res = numpy.random.choice(elems, p=probs)  # This is where the item is selected!
    results[elems.index(res)] += 1
results = [r / float(trials) for r in results]
print("item\texpected\tactual")
for i in range(len(probs)):
    print("%s\t%0.4f\t%0.4f" % (elems[i], probs[i], results[i]))

If you know in advance how many selections you need to make, you can do it without a loop, like this:

numpy.random.choice(elems, trials, p=probs)

If you have a weighted dictionary instead of a list, you can write this:

items = { "a": 10, "b": 5, "c": 1 } 
random.choice([k for k in items for dummy in range(items[k])])

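Note that [k for k in items for dummy in range(items[k])] produces this list: ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'c', 'b', 'b', 'b', 'b', 'b'] (the order of the groups follows the dictionary's iteration order).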
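
The expanded list grows with the sum of the weights and only works for integer weights. On Python 3.6+, one possible alternative is to pass the dictionary's keys and values straight to random.choices. This sketch swaps in random.choices for the list-expansion trick rather than reproducing it.

```python
import random

items = {"a": 10, "b": 5, "c": 1}

# Keys and values are iterated in the same order, so they line up.
keys = list(items)
weights = list(items.values())

# random.choices returns a list, so take the single element out of it.
pick = random.choices(keys, weights=weights, k=1)[0]
print(pick)
```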

I'm probably too late to contribute anything useful, but here's a simple, short, and very efficient snippet:

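import random

def choose_index(probabilities):
    cmf = probabilities[0]
    choice = random.random()
    for k in range(len(probabilities) - 1):
        if choice <= cmf:
            return k
        cmf += probabilities[k + 1]
    # Last index (also the fallback if rounding leaves cmf slightly below choice).
    return len(probabilities) - 1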

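There's no need to sort your probabilities or build a vector with your cmf, and it terminates as soon as it finds its choice. Memory: O(1), time: O(N), with an average running time of ~N/2.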

If you have weights, just normalize them first:

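import random

def choose_index(weights):
    # Normalize with a list comprehension so plain lists work, not only numpy arrays.
    total = float(sum(weights))
    probabilities = [w / total for w in weights]
    cmf = probabilities[0]
    choice = random.random()
    for k in range(len(probabilities) - 1):
        if choice <= cmf:
            return k
        cmf += probabilities[k + 1]
    # Last index (also the fallback if rounding leaves cmf slightly below choice).
    return len(probabilities) - 1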
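
Normalizing can also be skipped by scaling the random draw to the total weight instead. The following is a minimal sketch of that variant, assuming non-negative weights with a positive sum. The name choose_index_scaled is made up for illustration.

```python
import random

def choose_index_scaled(weights):
    """Return an index with probability proportional to its weight."""
    total = sum(weights)
    # Draw in [0, total) rather than dividing every weight by total.
    point = random.random() * total
    running = 0.0
    for k, w in enumerate(weights):
        running += w
        if point < running:
            return k
    # Only reachable through floating-point rounding; fall back to the last index.
    return len(weights) - 1

print(choose_index_scaled([10, 5, 1]))
```

The strict comparison means an entry with weight 0 is never selected by the loop, since the running total does not move past the drawn point on that step.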