我需要写一个加权版的random。选择(列表中的每个元素有不同的被选择的概率)。这是我想到的:

def weightedChoice(choices):
    """Like random.choice, but each element can have a different chance of
    being selected.

    choices can be any iterable containing iterables with two items each.
    Technically, they can have more than two items, the rest will just be
    ignored.  The first item is the thing being chosen, the second item is
    its weight.  The weights can be any numeric values, what matters is the
    relative differences between them.
    """
    space = {}
    current = 0
    for choice, weight in choices:
        if weight > 0:
            space[current] = choice
            current += weight
    rand = random.uniform(0, current)
    for key in sorted(space.keys() + [current]):
        if rand < key:
            return choice
        choice = space[key]
    return None

这个函数对我来说太复杂了,而且很丑。我希望这里的每个人都能提供一些改进的建议或其他方法。对我来说,效率没有代码的整洁和可读性重要。


当前回答

下面是Python 3.6标准库中包含的版本:

import itertools as _itertools
import bisect as _bisect

class Random36(random.Random):
    "Show the code included in the Python 3.6 version of the Random class"

    def choices(self, population, weights=None, *, cum_weights=None, k=1):
        """Return a k sized list of population elements chosen with replacement.

        If the relative weights or cumulative weights are not specified,
        the selections are made with equal probability.

        """
        random = self.random
        if cum_weights is None:
            if weights is None:
                _int = int
                total = len(population)
                return [population[_int(random() * total)] for i in range(k)]
            cum_weights = list(_itertools.accumulate(weights))
        elif weights is not None:
            raise TypeError('Cannot specify both weights and cumulative weights')
        if len(cum_weights) != len(population):
            raise ValueError('The number of weights does not match the population')
        bisect = _bisect.bisect
        total = cum_weights[-1]
        return [population[bisect(cum_weights, random() * total)] for i in range(k)]

来源:https://hg.python.org/cpython/file/tip/Lib/random.py l340

其他回答

下面是使用numpy的另一个版本的weighted_choice。传入weights向量,它将返回一个由0组成的数组,其中包含一个1,表示所选择的bin。该代码默认只进行一次绘制,但您可以传入绘制的数量,并且将返回每个绘制的bin的计数。

如果权重向量的和不等于1,它将被规范化,使之等于1。

import numpy as np

def weighted_choice(weights, n=1):
    if np.sum(weights)!=1:
        weights = weights/np.sum(weights)

    draws = np.random.random_sample(size=n)

    weights = np.cumsum(weights)
    weights = np.insert(weights,0,0.0)

    counts = np.histogram(draws, bins=weights)
    return(counts[0])

假设你有

items = [11, 23, 43, 91] 
probability = [0.2, 0.3, 0.4, 0.1]

你有一个函数,它生成一个介于[0,1)之间的随机数(我们可以在这里使用random.random())。 现在求概率的前缀和

prefix_probability=[0.2,0.5,0.9,1]

现在,我们只需取一个0-1之间的随机数,然后使用二分搜索来查找该数字在prefix_probability中的位置。这个索引就是你的答案

代码是这样的

return items[bisect.bisect(prefix_probability,random.random())]

步骤1:生成您感兴趣的CDF F

步骤2:生成u.r.v. u

步骤3:求z=F^{-1}(u)

这种建模在概率论或随机过程课程中有描述。这是适用的,因为您有简单的CDF。

从版本1.7.0开始,NumPy有一个支持概率分布的选择函数。

from numpy.random import choice
draw = choice(list_of_candidates, number_of_items_to_pick,
              p=probability_distribution)

注意,probability_distribution是一个与list_of_candidate顺序相同的序列。您还可以使用关键字replace=False来更改行为,这样绘制的项就不会被替换。

我看了指向的其他线程,并在我的编码风格中提出了这种变化,这返回了用于计数的索引,但返回字符串很简单(注释返回替代):

import random
import bisect

try:
    range = xrange
except:
    pass

def weighted_choice(choices):
    total, cumulative = 0, []
    for c,w in choices:
        total += w
        cumulative.append((total, c))
    r = random.uniform(0, total)
    # return index
    return bisect.bisect(cumulative, (r,))
    # return item string
    #return choices[bisect.bisect(cumulative, (r,))][0]

# define choices and relative weights
choices = [("WHITE",90), ("RED",8), ("GREEN",2)]

tally = [0 for item in choices]

n = 100000
# tally up n weighted choices
for i in range(n):
    tally[weighted_choice(choices)] += 1

print([t/sum(tally)*100 for t in tally])