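Given a function which produces a random integer in the range 1 to 5, write a function which produces a random integer in the range 1 to 7.
Current answer
This is equivalent to Adam Rosenfield's solution, but may be a bit clearer for some readers. It assumes rand5() is a function that returns a statistically random integer in the range 1 through 5 inclusive.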
int rand7()
{
    int vals[5][5] = {
        { 1, 2, 3, 4, 5 },
        { 6, 7, 1, 2, 3 },
        { 4, 5, 6, 7, 1 },
        { 2, 3, 4, 5, 6 },
        { 7, 0, 0, 0, 0 }
    };

    int result = 0;
    while (result == 0)
    {
        int i = rand5();
        int j = rand5();
        result = vals[i-1][j-1];
    }
    return result;
}
How does it work? Think of it like this: imagine printing out this double-dimension array on paper, tacking it up to a dart board and randomly throwing darts at it. If you hit a non-zero value, it's a statistically random value between 1 and 7, since there are an equal number of non-zero values to choose from. If you hit a zero, just keep throwing the dart until you hit a non-zero. That's what this code is doing: the i and j indexes randomly select a location on the dart board, and if we don't get a good result, we keep throwing darts.
Like Adam said, this can run forever in the worst case, but statistically the worst case never happens. :)
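The dart-board argument can be checked by counting the cells of the table. The sketch below is a Python transcription of the vals array above; the names VALS, counts and expected_calls are mine, and the expected-calls figure assumes every rand5() call is independent and uniform.

```python
from collections import Counter
from fractions import Fraction

# Python transcription of the 5x5 table from the C code above.
VALS = [
    [1, 2, 3, 4, 5],
    [6, 7, 1, 2, 3],
    [4, 5, 6, 7, 1],
    [2, 3, 4, 5, 6],
    [7, 0, 0, 0, 0],
]

counts = Counter(v for row in VALS for v in row)

# Each of 1..7 occupies the same number of cells, so a dart that lands on a
# non-zero cell is uniform over 1..7.
print({k: counts[k] for k in range(8)})

# A throw lands on a non-zero cell with probability 21/25, so the expected
# number of throws is 25/21, and each throw costs two rand5() calls.
p_hit = Fraction(25 - counts[0], 25)
expected_calls = 2 / p_hit
print(expected_calls, float(expected_calls))
```

Under those assumptions the loop costs 50/21 (about 2.38) calls to rand5() per result on average.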
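Other answers
The premise behind Adam Rosenfield's correct answer is:
1. x = 5^n (in his case: n = 2)
2. manipulate n rand5 calls to get a number y within the range [1, x]
3. z = ((int)(x / 7)) * 7
4. if y > z, try again; otherwise return y % 7 + 1
When n equals 2, you have 4 throw-away possibilities: y = {22, 23, 24, 25}. If you use n equals 6, you only have 1 throw-away: y = {15625}.
5^6 = 15625
7 * 2232 = 15624
You call rand5 more times. However, you have a much lower chance of getting a throw-away value (or an infinite loop). If there is a way to get no possible throw-away value for y, I haven't found it yet.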
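The four steps above can be sketched in Python as follows. This is only an illustration: rand5() is simulated with the standard random module, and the names rand7_n and throwaways are mine, not from the answer.

```python
import random

def rand5():
    # Assumed source: uniform integer in [1, 5].
    return random.randint(1, 5)

def rand7_n(n=2):
    """Rejection sampling using n calls to rand5() per attempt."""
    x = 5 ** n
    z = (x // 7) * 7                      # largest multiple of 7 that is <= x
    while True:
        y = 0
        for _ in range(n):
            y = y * 5 + (rand5() - 1)     # build a base-5 number in [0, x-1]
        y += 1                            # shift to [1, x]
        if y <= z:
            return y % 7 + 1              # uniform in [1, 7]

def throwaways(n):
    """Number of values of y in [1, 5^n] that must be rejected."""
    return 5 ** n % 7

print([throwaways(n) for n in range(1, 7)])
print([rand7_n(6) for _ in range(10)])
```

Since 7 is prime and does not divide 5, 5^n % 7 is never 0, so this particular scheme always leaves at least one throw-away value, whatever n is.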
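In addition to my first answer, I'd like to add another one. This answer attempts to minimize the number of calls to rand5() per call to rand7(), to maximize the usage of randomness. That is, if you consider randomness to be a precious resource, we want to use as much of it as possible, without throwing away any random bits. This answer also has some similarities with the logic presented in Ivan's answer.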
The entropy of a random variable is a well-defined quantity. For a random variable which takes on N states with equal probabilities (a uniform distribution), the entropy is log2 N. Thus, rand5() has approximately 2.32193 bits of entropy, and rand7() has about 2.80735 bits of entropy. If we hope to maximize our use of randomness, we need to use all 2.32193 bits of entropy from each call to rand5(), and apply them to generating 2.80735 bits of entropy needed for each call to rand7(). The fundamental limit, then, is that we can do no better than log(7)/log(5) = 1.20906 calls to rand5() per call to rand7().
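The figures quoted above can be reproduced with a few lines of Python; the variable names are mine.

```python
import math

bits_rand5 = math.log2(5)   # entropy of one rand5() call, in bits
bits_rand7 = math.log2(7)   # entropy needed for one rand7() result, in bits
min_calls = bits_rand7 / bits_rand5

print(f"{bits_rand5:.5f} {bits_rand7:.5f} {min_calls:.5f}")
# prints: 2.32193 2.80735 1.20906
```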
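Side note: all logarithms in this answer are base 2 unless specified otherwise. rand5() will be assumed to return numbers in the range [0, 4], and rand7() will be assumed to return numbers in the range [0, 6]. Adjusting the ranges to [1, 5] and [1, 7] respectively is trivial.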
So how do we do it? We generate an infinitely precise random real number between 0 and 1 (pretend for the moment that we could actually compute and store such an infinitely precise number -- we'll fix this later). We can generate such a number by generating its digits in base 5: we pick the random number 0.a_1 a_2 a_3..., where each digit a_i is chosen by a call to rand5(). For example, if our RNG chose a_i = 1 for all i, then ignoring the fact that that isn't very random, that would correspond to the real number 1/5 + 1/5^2 + 1/5^3 + ... = 1/4 (sum of a geometric series).
Ok, so we've picked a random real number between 0 and 1. I now claim that such a random number is uniformly distributed. Intuitively, this is easy to understand, since each digit was picked uniformly, and the number is infinitely precise. However, a formal proof of this is somewhat more involved, since now we're dealing with a continuous distribution instead of a discrete distribution, so we need to prove that the probability that our number lies in an interval [a, b] equals the length of that interval, b - a. The proof is left as an exercise for the reader =).
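Now that we have a random real number selected uniformly from the range [0, 1], we need to convert it to a series of uniformly random numbers in the range [0, 6] to generate the output of rand7(). How do we do this? Just the reverse of what we just did: we convert it to an infinitely precise decimal in base 7, and then each base 7 digit will correspond to one output of rand7().
Taking the example from earlier, if our rand5() produces an infinite stream of 1's, then our random real number will be 1/4. Converting 1/4 to base 7, we get the infinite decimal 0.15151515..., so we will produce as output 1, 5, 1, 5, 1, 5, etc.
Ok, so we have the main idea, but we have two problems left: we can't actually compute or store an infinitely precise real number, so how do we deal with only a finite portion of it? Secondly, how do we actually convert it to base 7?
One way we can convert a number between 0 and 1 to base 7 is as follows:
1. Multiply by 7
2. The integral part of the result is the next base 7 digit
3. Subtract off the integral part, leaving only the fractional part
4. Go to step 1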
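The conversion loop can be tried out with exact rational arithmetic. This is a minimal sketch using Python's fractions module; the function name to_base7 is mine.

```python
from fractions import Fraction

def to_base7(x, k):
    """Return the first k base-7 digits of x, where 0 <= x < 1."""
    digits = []
    for _ in range(k):
        x *= 7                              # step 1: multiply by 7
        d = x.numerator // x.denominator    # step 2: integral part is the digit
        digits.append(d)
        x -= d                              # step 3: keep the fractional part
    return digits

print(to_base7(Fraction(1, 4), 6))  # [1, 5, 1, 5, 1, 5]
```

The result matches the 0.151515... expansion of 1/4 used in the example.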
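To deal with the problem of infinite precision, we compute a partial result, and we also store an upper bound on what the result could be. That is, suppose we've called rand5() twice and it returned 1 both times. The number we've generated so far is 0.11 (base 5). Whatever the rest of the infinite series of calls to rand5() produces, the random real number we're generating will never be larger than 0.12: it is always true that 0.11 ≤ 0.11xyz... < 0.12.
So, keeping track of the current number so far and the maximum value it could ever take, we convert both numbers to base 7. If they agree on the first k digits, then we can safely output the next k digits: regardless of what the infinite stream of base 5 digits is, they will never affect the next k digits of the base 7 representation!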
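To make the bounds argument concrete, here is a small sketch that takes a finite base-5 prefix, computes the lower and upper bounds as exact fractions, and reports the base-7 digits on which both bounds agree. The function name safe_digits and its max_digits parameter are mine; treating the upper bound as inclusive is a conservative simplification that can only delay a digit, never emit a wrong one.

```python
from fractions import Fraction

def safe_digits(base5_prefix, max_digits=8):
    """Base-7 digits already fixed by a finite prefix of base-5 digits."""
    n = len(base5_prefix)
    # Lower bound: the prefix followed by zeros.
    low = sum(Fraction(d, 5 ** (i + 1)) for i, d in enumerate(base5_prefix))
    # Upper bound: the prefix with its last digit incremented.
    high = low + Fraction(1, 5 ** n)
    out = []
    for _ in range(max_digits):
        low *= 7
        high *= 7
        d_low, d_high = int(low), int(high)
        if d_low != d_high:
            break                 # bounds disagree: more base-5 digits needed
        out.append(d_low)
        low -= d_low
        high -= d_high
    return out

print(safe_digits([1, 1]))   # prefix 0.11 (base 5) -> [1]
```

For the prefix 0.11 (base 5) the bounds are 6/25 and 7/25, which are 0.14... and 0.16... in base 7, so only the first digit, 1, is settled.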
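And that's the algorithm: to generate the next output of rand7(), we generate only as many digits of rand5() as we need to ensure that we know with certainty the value of the next digit in the conversion of the random real number to base 7. Here is a Python implementation, with a test harness: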
import math
import random

rand5_calls = 0

def rand5():
    global rand5_calls
    rand5_calls += 1
    return random.randint(0, 4)

def rand7_gen():
    state = 0
    pow5 = 1
    pow7 = 7
    while True:
        # Floor division: compare the next base-7 digit of the lower and
        # upper bounds on the number generated so far.
        if state // pow5 == (state + pow7) // pow5:
            result = state // pow5
            state = (state - result * pow5) * 7
            pow7 *= 7
            yield result
        else:
            # Not enough information yet: consume one more base-5 digit.
            state = 5 * state + pow7 * rand5()
            pow5 *= 5

if __name__ == '__main__':
    r7 = rand7_gen()
    N = 10000
    x = list(next(r7) for i in range(N))
    distr = [x.count(i) for i in range(7)]
    expmean = N / 7.0
    expstddev = math.sqrt(N * (1.0/7.0) * (6.0/7.0))

    print('%d TRIALS' % N)
    print('Expected mean: %.1f' % expmean)
    print('Expected standard deviation: %.1f' % expstddev)
    print()
    print('DISTRIBUTION:')
    for i in range(7):
        print('%d: %d (%+.3f stddevs)' % (i, distr[i], (distr[i] - expmean) / expstddev))
    print()
    print('Calls to rand5: %d (average of %f per call to rand7)' % (rand5_calls, float(rand5_calls) / N))
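Note that rand7_gen() returns a generator, since it has internal state involving the conversion of the number to base 7. The test harness calls next(r7) 10000 times to produce 10000 random numbers, and then it measures their distribution. Only integer math is used, so the results are exactly correct.
Also note that the numbers here get very big, very fast. Powers of 5 and 7 grow quickly. Hence, performance will start to degrade noticeably after generating lots of random numbers, due to bignum arithmetic. But remember, my goal was to maximize the usage of random bits, not to maximize performance (although that is a secondary goal).
In one run of this, I made 12091 calls to rand5() for 10000 calls to rand7(), achieving the minimum of log(7)/log(5) calls on average to 4 significant figures, and the resulting output was uniform.
In order to port this code to a language that doesn't have arbitrarily large integers built in, you'll have to cap the values of pow5 and pow7 to the maximum value of your native integral type; if they get too big, then reset everything and start over. This will increase the average number of calls to rand5() per call to rand7() very slightly, but hopefully it shouldn't increase too much even for 32- or 64-bit integers.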
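One possible reading of that porting advice is sketched below, still in Python so it can be run, but with every intermediate value kept at or below a cap. The cap LIMIT (a signed 64-bit maximum), the name rand7_bounded and the exact overflow checks are my assumptions, not part of the answer, and the sketch makes no claim that discarding state at a reset preserves exact uniformity.

```python
import random

LIMIT = 2**63 - 1  # assumption: the native type is a signed 64-bit integer

def rand5():
    # Assumed source: uniform integer in [0, 4], as in the answer above.
    return random.randint(0, 4)

def rand7_bounded():
    state, pow5, pow7 = 0, 1, 7
    while True:
        if state // pow5 == (state + pow7) // pow5:
            result = state // pow5
            yield result
            rem = state - result * pow5
            # The update multiplies rem and pow7 by 7, and the comparison
            # above needs state + pow7 to fit afterwards.
            if rem + pow7 > LIMIT // 7:
                state, pow5, pow7 = 0, 1, 7      # reset and start over
            else:
                state, pow7 = rem * 7, pow7 * 7
        else:
            # The update computes at most 5 * state + 4 * pow7, and pow5 * 5.
            if pow5 > LIMIT // 5 or state + pow7 > LIMIT // 5:
                state, pow5, pow7 = 0, 1, 7      # reset and start over
            else:
                state = 5 * state + pow7 * rand5()
                pow5 *= 5

gen = rand7_bounded()
print([next(gen) for _ in range(20)])
```

Each reset throws away whatever partial information was left in state, which is where the slightly higher rand5() call count mentioned above would come from.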
PHP solution
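<?php
function random_5(){
    return rand(1,5);
}

function random_7(){
    // Sum seven rolls of random_5() and reduce modulo 7.
    // Note: the 5^7 = 78125 equally likely roll sequences cannot be split
    // evenly into 7 buckets, so the result is only approximately uniform.
    $total = 0;
    for($i=0;$i<7;$i++){
        $total += random_5();
    }
    return ($total%7)+1;
}

echo random_7();
?>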
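How close the PHP approach above comes to uniform can be measured exactly by enumerating all roll sequences. The sketch below is a Python check, not part of the PHP answer; it mirrors random_7() by summing seven values in 1..5 and applying % 7 + 1.

```python
from collections import Counter
from itertools import product

# Enumerate every possible sequence of seven rand5 results (5^7 = 78125).
counts = Counter(
    sum(rolls) % 7 + 1
    for rolls in product(range(1, 6), repeat=7)
)
total = sum(counts.values())

for value in range(1, 8):
    print(value, counts[value], counts[value] / total)
```

Because 78125 is not a multiple of 7, the seven counts cannot all be equal; the printed table shows how large the imbalance is.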
// Assumes rand5() returns 0..4; the result is in 0..6.
int rand7() {
    int value = rand5()
              + rand5() * 2
              + rand5() * 3
              + rand5() * 4
              + rand5() * 5
              + rand5() * 6;
    return value%7;
}
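Unlike the chosen solution, this algorithm runs in constant time. It does, however, make 2 more calls to rand5 than the chosen solution makes on average.
Note that this generator is not perfect (the number 0 has a 0.0064% higher chance than any other number), but for most practical purposes the guarantee of constant time probably outweighs this inaccuracy.
Explanation
This solution is derived from the fact that the number 15,624 is divisible by 7, and thus if we can randomly and uniformly generate numbers from 0 to 15,624 and then take mod 7, we get a near-uniform rand7 generator. Numbers from 0 to 15,624 can be generated uniformly by rolling rand5 6 times and using the results as the digits of a base 5 number as follows: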
rand5 * 5^5 + rand5 * 5^4 + rand5 * 5^3 + rand5 * 5^2 + rand5 * 5 + rand5
Properties of mod 7, however, allow us to simplify the equation a bit:
5^5 = 3 mod 7
5^4 = 2 mod 7
5^3 = 6 mod 7
5^2 = 4 mod 7
5^1 = 5 mod 7
So
rand5 * 5^5 + rand5 * 5^4 + rand5 * 5^3 + rand5 * 5^2 + rand5 * 5 + rand5
becomes
rand5 * 3 + rand5 * 2 + rand5 * 6 + rand5 * 4 + rand5 * 5 + rand5
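Both the reduced coefficients and the size of the bias can be confirmed by brute force over all 5^6 digit combinations. This is a sketch in Python; it assumes, as the explanation does, that rand5 yields 0..4, and the variable names are mine.

```python
from collections import Counter
from itertools import product

# Powers of 5 reduced mod 7, for exponents 0..5.
powers_mod7 = [pow(5, k, 7) for k in range(6)]
print(powers_mod7)

# Exact distribution of the weighted sum used by rand7() above. The weights
# 1..6 are a reordering of powers_mod7, which does not matter because the
# six rand5 results are interchangeable.
counts = Counter(
    sum(c * d for c, d in zip(range(1, 7), digits)) % 7
    for digits in product(range(5), repeat=6)
)
for r in range(7):
    print(r, counts[r])
print("extra chance of 0: %.4f%%" % (100.0 * (counts[0] - counts[1]) / 5 ** 6))
```

Result 0 occurs 2233 times out of 15625 and every other result 2232 times, which is the 0.0064% excess mentioned above.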
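Theory
The number 15,624 was not chosen at random; it can be found using Fermat's little theorem, which states that if p is a prime number and a is not divisible by p, then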
a^(p-1) = 1 mod p
This gives us
(5^6)-1 = 0 mod 7
(5^6)-1 is equal to
4 * 5^5 + 4 * 5^4 + 4 * 5^3 + 4 * 5^2 + 4 * 5 + 4
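This is a number in base 5 form, and thus we can see that this method can be used to go from any random number generator to any other random number generator. However, a small bias towards 0 is always introduced when using the exponent p-1.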
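Both facts used in this section are easy to check numerically. A short sketch (the helper name base5_digits is mine):

```python
def base5_digits(n):
    """Digits of n in base 5, most significant first."""
    digits = []
    while n:
        n, d = divmod(n, 5)
        digits.append(d)
    return digits[::-1]

print(pow(5, 6, 7))              # 5^(7-1) mod 7
print(base5_digits(5 ** 6 - 1))  # 15624 written in base 5

# Fermat's little theorem for a few small primes p and all a in 1..p-1.
print(all(pow(a, p - 1, p) == 1 for p in (3, 5, 7, 11, 13) for a in range(1, p)))
```

The first line prints 1 and the second prints six 4s, i.e. 15624 is the largest six-digit base-5 number.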
To generalize this approach and to be more accurate, we can have a function like this:
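import random

def getRandomUniform(n):
    # Assumed helper (not defined in the original answer):
    # uniform integer in [0, n-1].
    return random.randrange(n)

def getRandomconverted(frm, to):
    # Build a number with `to` digits in base `frm` from uniform draws.
    s = 0
    for i in range(to):
        s += getRandomUniform(frm) * frm**i
    # s is uniform over frm**to values; accept only the first mx of them,
    # where mx is the largest multiple of `to`, so that s % to is uniform.
    mx = (frm**to // to) * to
    if s < mx:
        return s % to
    else:
        return getRandomconverted(frm, to)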
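With a rolling total, you can both
- maintain an equal distribution; and
- not have to sacrifice any element in the random sequence.
Both these problems are an issue with the simplistic rand(5)+rand(5)...-type solutions. The following Python code shows how to implement it (most of this is proving the distribution).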
import random

x = []
for i in range(0, 7):
    x.append(0)
t = 0
tt = 0
for i in range(0, 700000):
    ########################################
    #####            qq.py             #####
    r = int(random.random() * 5)
    t = (t + r) % 7
    ########################################
    #####       qq_notsogood.py        #####
    #r = 20
    #while r > 6:
    #    r = int(random.random() * 5)
    #    r = r + int(random.random() * 5)
    #t = r
    ########################################
    x[t] = x[t] + 1
    tt = tt + 1

high = x[0]
low = x[0]
for i in range(0, 7):
    print("%d: %7d %.5f" % (i, x[i], 100.0 * x[i] / tt))
    if x[i] < low:
        low = x[i]
    if x[i] > high:
        high = x[i]
diff = high - low
print("Variation = %d (%.5f%%)" % (diff, 100.0 * diff / tt))
This output shows the results:
pax$ python qq.py
0: 99908 14.27257
1: 100029 14.28986
2: 100327 14.33243
3: 100395 14.34214
4: 99104 14.15771
5: 99829 14.26129
6: 100408 14.34400
Variation = 1304 (0.18629%)
pax$ python qq.py
0: 99547 14.22100
1: 100229 14.31843
2: 100078 14.29686
3: 99451 14.20729
4: 100284 14.32629
5: 100038 14.29114
6: 100373 14.33900
Variation = 922 (0.13171%)
pax$ python qq.py
0: 100481 14.35443
1: 99188 14.16971
2: 100284 14.32629
3: 100222 14.31743
4: 99960 14.28000
5: 99426 14.20371
6: 100439 14.34843
Variation = 1293 (0.18471%)
A simplistic rand(5)+rand(5), ignoring those cases where this returns more than 6, has a typical variation of 18%, 100 times that of the method shown above:
pax$ python qq_notsogood.py
0: 31756 4.53657
1: 63304 9.04343
2: 95507 13.64386
3: 127825 18.26071
4: 158851 22.69300
5: 127567 18.22386
6: 95190 13.59857
Variation = 127095 (18.15643%)
pax$ python qq_notsogood.py
0: 31792 4.54171
1: 63637 9.09100
2: 95641 13.66300
3: 127627 18.23243
4: 158751 22.67871
5: 126782 18.11171
6: 95770 13.68143
Variation = 126959 (18.13700%)
pax$ python qq_notsogood.py
0: 31955 4.56500
1: 63485 9.06929
2: 94849 13.54986
3: 127737 18.24814
4: 159687 22.81243
5: 127391 18.19871
6: 94896 13.55657
Variation = 127732 (18.24743%)
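The percentages in the qq_notsogood.py runs can also be derived exactly, without sampling. A minimal sketch, assuming both draws are uniform over 0..4 and that sums above 6 are redrawn, as in the commented-out block of the script:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 25 equally likely pairs of draws; keep only sums that are not rejected.
counts = Counter(a + b for a, b in product(range(5), repeat=2) if a + b <= 6)
accepted = sum(counts.values())

probs = [Fraction(counts[s], accepted) for s in range(7)]
for s, p in enumerate(probs):
    print("%d: %s = %.5f%%" % (s, p, 100 * float(p)))

variation = max(probs) - min(probs)
print("Variation = %s (%.5f%%)" % (variation, 100 * float(variation)))
```

The exact probabilities are 1/22, 2/22, 3/22, 4/22, 5/22, 4/22 and 3/22, which gives a variation of 4/22, about 18.18%, in line with the sampled runs.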
And, on the advice of Nixuz, I've cleaned the script up so you can just extract and use the rand7... stuff:
import random

# rand5() returns 0 through 4 inclusive.
def rand5():
    return int(random.random() * 5)

# rand7() generator returns 0 through 6 inclusive (using rand5()).
def rand7():
    rand7ret = 0
    while True:
        rand7ret = (rand7ret + rand5()) % 7
        yield rand7ret

# Number of test runs.
count = 700000

# Work out distribution.
distrib = [0,0,0,0,0,0,0]
rgen = rand7()
for i in range(0, count):
    r = next(rgen)
    distrib[r] = distrib[r] + 1

# Print distributions and calculate variation.
high = distrib[0]
low = distrib[0]
for i in range(0, 7):
    print("%d: %7d %.5f" % (i, distrib[i], 100.0 * distrib[i] / count))
    if distrib[i] < low:
        low = distrib[i]
    if distrib[i] > high:
        high = distrib[i]
diff = high - low
print("Variation = %d (%.5f%%)" % (diff, 100.0 * diff / count))
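The harness above measures only how often each value occurs. A separate property that may be worth checking is how consecutive outputs relate to each other. The sketch below repeats the rolling-total generator so it is self-contained and records which steps (next minus previous, mod 7) ever occur; the steps variable is my own addition.

```python
import random

def rand5():
    # Returns 0 through 4 inclusive, as in the script above.
    return int(random.random() * 5)

def rand7():
    # Rolling total modulo 7, as in the script above.
    total = 0
    while True:
        total = (total + rand5()) % 7
        yield total

gen = rand7()
prev = next(gen)
steps = set()
for _ in range(10000):
    cur = next(gen)
    steps.add((cur - prev) % 7)   # how far this output moved from the last one
    prev = cur
print(sorted(steps))
```

Because each step adds a value in [0, 4], an output is never followed by the value 5 or 6 places above it (mod 7). So although each value appears equally often in the long run, consecutive outputs are not independent, which may matter depending on what the numbers are used for.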