A weighted version of random.choice

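I need to write a weighted version of random.choice (each element in the list has a different probability of being selected). This is what I came up with: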

import random

def weightedChoice(choices):
    """Like random.choice, but each element can have a different chance of
    being selected.

    choices can be any iterable containing iterables with two items each.
    Technically, they can have more than two items, the rest will just be
    ignored.  The first item is the thing being chosen, the second item is
    its weight.  The weights can be any numeric values, what matters is the
    relative differences between them.
    """
    space = {}
    current = 0
    for choice, weight in choices:
        if weight > 0:
            space[current] = choice
            current += weight
    rand = random.uniform(0, current)
    # list(...) is needed on Python 3, where dict.keys() returns a view
    for key in sorted(list(space.keys()) + [current]):
        if rand < key:
            return choice
        choice = space[key]
    return None

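This function seems overly complex to me, and ugly. I am hoping people here can suggest improvements or alternative approaches. Efficiency matters less to me than clean, readable code.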

Current answer

Another approach, assuming the weights line up by index with the elements of the item array.

import numpy as np
weights = [0.1, 0.3, 0.5] #weights for the item at index 0,1,2
# NumPy expects probabilities here: sum(weights[:-1]) must be <= 1 and the last entry is treated as the remainder,
# so divide each weight by the sum of all weights if they do not already sum to 1.
trials = 1 #number of trials
num_item = 1 #number of items that can be picked in each trial
selected_item_arr = np.random.multinomial(num_item, weights, trials)
# gives number of times an item was selected at a particular index
# this assumes selection with replacement
# one possible output
# selected_item_arr
# array([[0, 0, 1]])
# say if trials = 5, then a possible output could be
# selected_item_arr
# array([[1, 0, 0],
#   [0, 0, 1],
#   [0, 0, 1],
#   [0, 1, 0],
#   [0, 0, 1]])
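
The one-hot row returned for a single draw can be mapped back to the chosen element with np.argmax. A minimal sketch, assuming a hypothetical `items` list aligned by index with the weights; `items`, `probs` and `pick_one` are illustrative names that do not appear in the answer:

```python
import numpy as np

items = ["R", "G", "B"]              # hypothetical items, aligned with weights
weights = np.array([0.1, 0.3, 0.5])
probs = weights / weights.sum()      # normalise so the probabilities sum to 1

def pick_one(items, probs):
    """Draw one item; multinomial(1, probs) returns a one-hot count vector."""
    one_hot = np.random.multinomial(1, probs)
    return items[int(np.argmax(one_hot))]

print(pick_one(items, probs))
```

This only makes sense for a single draw per trial; with more items per trial the row holds counts rather than a single 1.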

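Now suppose we want to draw 3 items in a single trial. You can picture three kinds of balls, R, G and B, available in large quantities, with their weights given by the weights array. A possible result looks like this: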

num_item = 3
trials = 1
selected_item_arr = np.random.multinomial(num_item, weights, trials)
# selected_item_arr can give output like :
# array([[1, 0, 2]])

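You can also think of the number of items to select as the number of binomial/multinomial trials in one set, so the example above can equally be written as: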

num_binomial_trial = 5
weights = [0.1,0.9] #say an unfair coin weights for H/T
num_experiment_set = 1
selected_item_arr = np.random.multinomial(num_binomial_trial, weights, num_experiment_set)
# possible output
# selected_item_arr
# array([[1, 4]])
# i.e. H came up 1 time and T came up 4 times in 5 binomial trials; one set contains 5 binomial trials.

2019-10-24 12:42:57

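Other answers

Here is another version of weighted_choice that uses numpy. Pass in a weights vector and it returns an array of zeros containing a single 1 that marks the chosen bin. By default the code makes a single draw, but you can pass in the number of draws and it returns the count of draws per bin.

If the weights vector does not sum to 1, it is normalized so that it does.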

import numpy as np

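def weighted_choice(weights, n=1):
    # Accept plain lists as well as arrays, and normalise so the weights sum to 1.
    weights = np.asarray(weights, dtype=float)
    weights = weights / np.sum(weights)

    # n uniform draws in [0, 1)
    draws = np.random.random_sample(size=n)

    # Bin edges: 0 followed by the cumulative weights
    edges = np.insert(np.cumsum(weights), 0, 0.0)

    # Count how many draws fall into each bin
    counts, _ = np.histogram(draws, bins=edges)
    return counts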

2013-11-04 03:33:10

A general solution:

import random
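def weighted_choice(choices, weights):
    total = sum(weights)
    threshold = random.uniform(0, total)
    for k, weight in enumerate(weights):
        total -= weight
        if total < threshold:
            return choices[k]
    # Reached only in rare edge cases (a threshold of exactly 0, or float rounding)
    return choices[-1]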

2013-06-09 15:26:44

Suppose you have

items = [11, 23, 43, 91] 
probability = [0.2, 0.3, 0.4, 0.1]

You have a function that generates a random number in [0, 1) (random.random() works here). Now compute the prefix sums of the probabilities:

prefix_probability=[0.2,0.5,0.9,1]

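Now we just take a random number between 0 and 1 and use binary search to find where that number falls in prefix_probability. That index is your answer.

The code looks like this: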

return items[bisect.bisect(prefix_probability,random.random())]
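
Put together, the prefix-sum and binary-search steps might look like the sketch below. The function name `weighted_pick` is illustrative. As assumptions beyond the answer, the draw is scaled by the actual total (so the weights need not sum to exactly 1) and the index is clamped to guard against floating-point rounding at the upper edge:

```python
import bisect
import itertools
import random

def weighted_pick(items, weights):
    """Pick one item by binary search over the cumulative weights."""
    cumulative = list(itertools.accumulate(weights))
    # Scale the draw by the actual total so the weights need not sum to 1.
    r = random.random() * cumulative[-1]
    index = bisect.bisect(cumulative, r)
    # Guard against floating-point rounding at the upper edge.
    return items[min(index, len(items) - 1)]

items = [11, 23, 43, 91]
probability = [0.2, 0.3, 0.4, 0.1]
print(weighted_pick(items, probability))
```

If many draws are needed from the same weights, the cumulative list can be computed once and reused, so each draw costs only the binary search.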

2022-11-29 07:35:14

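I needed to do something like this very quickly and very simply, and after searching for ideas I ended up with this template. The idea is to receive the weighted values from an API as JSON, simulated here by a dict.

The dict is then converted into a list in which each value is repeated in proportion to its weight, and random.choice is used to pick a value from that list.

I tried it with 10, 100 and 1000 iterations. The distribution seems fairly stable.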

import random

def weighted_choice(weighted_dict):
    """Input example: dict(apples=60, oranges=30, pineapples=10)

    Weights must be non-negative integers, since each key is repeated that many times.
    """
    weight_list = []
    for key, weight in weighted_dict.items():
        weight_list += [key] * weight
    return random.choice(weight_list)
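
One way to check the "distribution seems stable" observation is to count the outcomes over many draws. The sketch below is only an illustration: `expand_and_choose` is a hypothetical stand-in for the function above, and the seed and the 10,000 iterations are arbitrary choices.

```python
import random
from collections import Counter

def expand_and_choose(weighted_dict):
    """Same idea as above: repeat each key by its integer weight, then pick."""
    pool = [key for key, weight in weighted_dict.items() for _ in range(weight)]
    return random.choice(pool)

weights = dict(apples=60, oranges=30, pineapples=10)
random.seed(0)  # fixed seed only so that repeated runs are comparable
counts = Counter(expand_and_choose(weights) for _ in range(10_000))
for key, weight in weights.items():
    print(key, counts[key] / 10_000, "expected about", weight / 100)
```

Note that the expanded list grows with the sum of the weights, so this approach suits small integer weights best.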

2018-10-23 12:30:39

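Since Python 3.6, random.choices can be used to return a list of elements of a specified size from a given population, with optional weights.

random.choices(population, weights=None, *, cum_weights=None, k=1)

population: the sequence to draw from. (Raises IndexError if it is empty.)
weights: the relative weights used to make the selections.
cum_weights: the cumulative weights used to make the selections.
k: the size (len) of the list to return. (Default: 1.)

A few caveats:

1) It performs weighted sampling with replacement, so an item that has been drawn can be drawn again later. The absolute values in the weights sequence do not matter; only their relative ratios do.

np.random.choice only accepts probabilities as weights, and the individual probabilities must sum to 1. There is no such restriction here: any numeric type works (int, float, fraction), with the exception of Decimal.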

>>> import random
# weights being integers
>>> random.choices(["white", "green", "red"], [12, 12, 4], k=10)
['green', 'red', 'green', 'white', 'white', 'white', 'green', 'white', 'red', 'white']
# weights being floats
>>> random.choices(["white", "green", "red"], [.12, .12, .04], k=10)
['white', 'white', 'green', 'green', 'red', 'red', 'white', 'green', 'white', 'green']
# weights written as ratios (in Python 3 these expressions evaluate to floats)
>>> random.choices(["white", "green", "red"], [12/100, 12/100, 4/100], k=10)
['green', 'green', 'white', 'red', 'green', 'red', 'white', 'green', 'green', 'green']
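
For comparison with the np.random.choice remark above, here is a minimal sketch of the same draw with NumPy, where the raw weights have to be turned into probabilities first. The variable names are illustrative.

```python
import numpy as np

colors = ["white", "green", "red"]
raw_weights = np.array([12, 12, 4], dtype=float)
# np.random.choice needs probabilities that sum to 1, so normalise first.
p = raw_weights / raw_weights.sum()

draws = np.random.choice(colors, size=10, p=p)
print(draws)
```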

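2) If neither weights nor cum_weights is specified, selections are made with equal probability. If a weights sequence is supplied, it must be the same length as the population sequence.

Specifying both weights and cum_weights raises a TypeError.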

>>> random.choices(["white", "green", "red"], k=10)
['white', 'white', 'green', 'red', 'red', 'red', 'white', 'white', 'white', 'green']

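3) cum_weights is typically the result of itertools.accumulate, which is very handy in this situation.

From the linked documentation: internally, the relative weights are converted to cumulative weights before making selections, so supplying the cumulative weights saves work.

So supplying either weights=[12, 12, 4] or cum_weights=[12, 24, 28] gives the same result for our example, and the latter should be slightly faster and more efficient.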
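
The equivalence can be checked with a short sketch: the cumulative weights are built with itertools.accumulate and both calls are made under the same seed. The seed value and variable names are arbitrary choices for illustration.

```python
import itertools
import random

population = ["white", "green", "red"]
weights = [12, 12, 4]
cum_weights = list(itertools.accumulate(weights))  # [12, 24, 28]

random.seed(42)
by_weights = random.choices(population, weights=weights, k=10)
random.seed(42)
by_cum_weights = random.choices(population, cum_weights=cum_weights, k=10)

# Both calls consume the same random numbers, so the selections match.
print(by_weights == by_cum_weights)
```

Precomputing cum_weights pays off mainly when random.choices is called repeatedly with the same weights.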

2017-01-10 09:06:25
