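I needed to write a weighted version of random.choice (each element in the list has a different probability of being selected). This is what I came up with: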
import random

def weightedChoice(choices):
    """Like random.choice, but each element can have a different chance of
    being selected.
    choices can be any iterable containing iterables with two items each.
    The first item is the thing being chosen, the second item is its weight.
    The weights can be any numeric values, what matters is the relative
    differences between them.
    """
    # Map the start of each choice's interval on [0, total weight) to the choice.
    space = {}
    current = 0
    for choice, weight in choices:
        if weight > 0:
            space[current] = choice
            current += weight
    if not space:
        # Nothing has a positive weight, so there is nothing to choose.
        return None
    rand = random.uniform(0, current)
    # Walk the interval starts in order; the first start above rand means
    # rand fell inside the previous choice's interval.
    for key in sorted(space):
        if rand < key:
            return choice
        choice = space[key]
    # rand fell inside the last interval.
    return choice
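This function seems overly complex to me, and ugly. I'm hoping people here can suggest improvements or alternative ways of doing this. Efficiency matters less to me than clean, readable code.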
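One possible simplification, assuming Python 3.6 or later, is to lean on the standard library's random.choices, which accepts a weights argument. The sketch below keeps the same (item, weight) input shape as the function above; the name weighted_choice_stdlib is made up for illustration, and the filtering of non-positive weights is an assumption meant to mirror the `weight > 0` check in the original.

```python
import random

def weighted_choice_stdlib(choices):
    """Pick one item from (item, weight) pairs (hypothetical helper name)."""
    # Keep only positively weighted pairs, mirroring the `weight > 0` check.
    pairs = [(item, weight) for item, weight in choices if weight > 0]
    if not pairs:
        return None
    items, weights = zip(*pairs)
    # random.choices returns a list of k picks; take the single element.
    return random.choices(items, weights=weights, k=1)[0]

print(weighted_choice_stdlib([("red", 3), ("green", 1), ("blue", 0)]))
```

Only the relative sizes of the weights matter here too, so they do not need to sum to 1.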
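Here is another version of weighted_choice that uses numpy. Pass in the weights vector and it returns an array of 0s containing a single 1 that marks the chosen bin. By default the code makes a single draw, but you can pass in the number of draws, and the count for each bin is returned.
If the weights vector does not sum to 1, it is normalized so that it does.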
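import numpy as np

def weighted_choice(weights, n=1):
    # Accept plain lists as well as arrays.
    weights = np.asarray(weights, dtype=float)
    if np.sum(weights) != 1:
        weights = weights / np.sum(weights)
    draws = np.random.random_sample(size=n)
    # Turn the weights into bin edges on [0, 1]: 0, w0, w0 + w1, ...
    weights = np.cumsum(weights)
    weights = np.insert(weights, 0, 0.0)
    # Count how many draws fall into each bin.
    counts = np.histogram(draws, bins=weights)
    return counts[0]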
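For comparison, numpy's np.random.multinomial produces the same kind of per-bin counts directly. The following is only a sketch of that alternative, not the code above rewritten; the name weighted_counts is made up for illustration, and the weights are always normalized here rather than only when they do not sum to 1.

```python
import numpy as np

def weighted_counts(weights, n=1):
    """Return how many of n draws land in each bin (hypothetical helper name)."""
    weights = np.asarray(weights, dtype=float)
    # multinomial expects probabilities that sum to 1, so normalize first.
    probs = weights / weights.sum()
    return np.random.multinomial(n, probs)

print(weighted_counts([1, 3, 6]))         # one draw: a single 1 among 0s
print(weighted_counts([1, 3, 6], n=100))  # counts per bin, summing to 100
```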
If you don't mind using numpy, you can use numpy.random.choice.
For example:
import numpy

items = [["item1", 0.2], ["item2", 0.3], ["item3", 0.45], ["item4", 0.05]]
elems = [i[0] for i in items]
probs = [i[1] for i in items]

trials = 1000
results = [0] * len(items)
for i in range(trials):
    # This is where the item is selected! choice() needs a 1-D sequence,
    # so pass the element names rather than the [name, probability] pairs.
    res = numpy.random.choice(elems, p=probs)
    results[elems.index(res)] += 1
results = [r / float(trials) for r in results]

print("item\texpected\tactual")
for i in range(len(probs)):
    print("%s\t%0.4f\t%0.4f" % (elems[i], probs[i], results[i]))
If you know in advance how many selections you need, you can do it without a loop, like this:
numpy.random.choice(elems, trials, p=probs)
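As a rough sketch of how the loop-free call could be tallied into the same expected-versus-actual table, the snippet below counts the draws with numpy.unique. The variable names values, counts, and observed are introduced here for illustration only.

```python
import numpy

elems = ["item1", "item2", "item3", "item4"]
probs = [0.2, 0.3, 0.45, 0.05]
trials = 1000

# Draw all samples in one call instead of looping.
draws = numpy.random.choice(elems, trials, p=probs)

# numpy.unique returns the distinct values and how often each occurred.
values, counts = numpy.unique(draws, return_counts=True)
observed = {str(v): int(c) / float(trials) for v, c in zip(values, counts)}

print("item\texpected\tactual")
for elem, prob in zip(elems, probs):
    # An element that was never drawn is missing from observed, so default to 0.
    print("%s\t%0.4f\t%0.4f" % (elem, prob, observed.get(elem, 0.0)))
```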