我需要写一个加权版的random。选择(列表中的每个元素有不同的被选择的概率)。这是我想到的:
def weightedChoice(choices):
"""Like random.choice, but each element can have a different chance of
being selected.
choices can be any iterable containing iterables with two items each.
Technically, they can have more than two items, the rest will just be
ignored. The first item is the thing being chosen, the second item is
its weight. The weights can be any numeric values, what matters is the
relative differences between them.
"""
space = {}
current = 0
for choice, weight in choices:
if weight > 0:
space[current] = choice
current += weight
rand = random.uniform(0, current)
for key in sorted(space.keys() + [current]):
if rand < key:
return choice
choice = space[key]
return None
这个函数对我来说太复杂了,而且很丑。我希望这里的每个人都能提供一些改进的建议或其他方法。对我来说,效率没有代码的整洁和可读性重要。
为random.choice()提供一个预先加权的列表:
解决方案和测试:
import random
options = ['a', 'b', 'c', 'd']
weights = [1, 2, 5, 2]
weighted_options = [[opt]*wgt for opt, wgt in zip(options, weights)]
weighted_options = [opt for sublist in weighted_options for opt in sublist]
print(weighted_options)
# test
counts = {c: 0 for c in options}
for x in range(10000):
counts[random.choice(weighted_options)] += 1
for opt, wgt in zip(options, weights):
wgt_r = counts[opt] / 10000 * sum(weights)
print(opt, counts[opt], wgt, wgt_r)
输出:
['a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'd', 'd']
a 1025 1 1.025
b 1948 2 1.948
c 5019 5 5.019
d 2008 2 2.008
如果你没有提前定义你想要选择多少项(所以,你没有做k=10这样的事情),你只有概率,你可以做下面的事情。注意,你的概率加起来不需要等于1,它们可以相互独立:
soup_items = ['pepper', 'onion', 'tomato', 'celery']
items_probability = [0.2, 0.3, 0.9, 0.1]
selected_items = [item for item,p in zip(soup_items,items_probability) if random.random()<p]
print(selected_items)
>>>['pepper','tomato']
假设你有
items = [11, 23, 43, 91]
probability = [0.2, 0.3, 0.4, 0.1]
你有一个函数,它生成一个介于[0,1)之间的随机数(我们可以在这里使用random.random())。
现在求概率的前缀和
prefix_probability=[0.2,0.5,0.9,1]
现在,我们只需取一个0-1之间的随机数,然后使用二分搜索来查找该数字在prefix_probability中的位置。这个索引就是你的答案
代码是这样的
return items[bisect.bisect(prefix_probability,random.random())]
我可能已经来不及提供任何有用的东西了,但这里有一个简单,简短,非常有效的片段:
def choose_index(probabilies):
cmf = probabilies[0]
choice = random.random()
for k in xrange(len(probabilies)):
if choice <= cmf:
return k
else:
cmf += probabilies[k+1]
不需要排序你的概率或用你的cmf创建一个向量,它一旦找到它的选择就会终止。内存:O(1),时间:O(N),平均运行时间~ N/2。
如果你有权重,只需添加一行:
def choose_index(weights):
probabilities = weights / sum(weights)
cmf = probabilies[0]
choice = random.random()
for k in xrange(len(probabilies)):
if choice <= cmf:
return k
else:
cmf += probabilies[k+1]