我知道未初始化的局部变量是未定义的行为(UB),而且值可能有陷阱表示,这可能会影响进一步的操作,但有时我想使用随机数仅为视觉表示,而不会在程序的其他部分进一步使用它们,例如,在视觉效果中设置随机颜色的东西,例如:

void updateEffect(){
    for(int i=0;i<1000;i++){
        int r;
        int g;
        int b;
        star[i].setColor(r%255,g%255,b%255);
        bool isVisible;
        star[i].setVisible(isVisible);
    }
}

比那么快吗

void updateEffect(){
    for(int i=0;i<1000;i++){
        star[i].setColor(rand()%255,rand()%255,rand()%255);
        star[i].setVisible(rand()%2==0?true:false);
    }
}

也比其他随机数生成器快吗?


当前回答

把我们的逻辑依赖于语言未定义的行为不是一个好主意。除了在这篇文章中所提到/讨论的内容之外,我想提到的是,使用现代c++方法/风格,这样的程序可能无法编译。

这在我之前的文章中提到过,其中包含了自动功能的优势和相同的有用链接。

https://stackoverflow.com/a/26170069/2724703

因此,如果我们改变上面的代码并用auto替换实际的类型,程序甚至无法编译。

void updateEffect(){
    for(int i=0;i<1000;i++){
        auto r;
        auto g;
        auto b;
        star[i].setColor(r%255,g%255,b%255);
        auto isVisible;
        star[i].setVisible(isVisible);
    }
}

其他回答

There are certain situations in which uninitialized memory may be safely read using type "unsigned char*" [e.g. a buffer returned from malloc]. Code may read such memory without having to worry about the compiler throwing causality out the window, and there are times when it may be more efficient to have code be prepared for anything memory might contain than to ensure that uninitialized data won't be read (a commonplace example of this would be using memcpy on partially-initialized buffer rather than discretely copying all of the elements that contain meaningful data).

然而,即使在这种情况下,人们也应该始终假设,如果字节的任何组合特别烦人,那么读取它总是会产生字节的模式(如果某个模式在生产中是烦人的,但在开发中不是,那么这种模式直到代码进入生产中才会出现)。

Reading uninitialized memory might be useful as part of a random-generation strategy in an embedded system where one can be sure the memory has never been written with substantially-non-random content since the last time the system was powered on, and if the manufacturing process used for the memory causes its power-on state to vary in semi-random fashion. Code should work even if all devices always yield the same data, but in cases where e.g. a group of nodes each need to select arbitrary unique IDs as quickly as possible, having a "not very random" generator which gives half the nodes the same initial ID might be better than not having any initial source of randomness at all.

还没有提到,但是调用未定义行为的代码路径可以做编译器想做的任何事情,例如:

void updateEffect(){}

这肯定比你正确的循环快,因为有UB,是完全符合的。

如果操作得当,使用未初始化的数据来获得随机性并不一定是件坏事。事实上,OpenSSL正是这样做的,以播种它的PRNG。

显然,这种用法并没有很好地记录下来,因为有人注意到Valgrind抱怨使用未初始化的数据,并“修复”了它,导致了PRNG中的一个错误。

所以你可以这样做,但你需要知道你在做什么,并确保任何阅读你的代码的人都理解这一点。

有很多很好的答案,但请允许我补充另一个并强调一点,在确定性计算机中,没有什么是随机的。对于伪rng生成的数字和堆栈上为C/ c++局部变量保留的内存区域中发现的看似“随机”的数字都是如此。

但是…这里有一个关键的区别。

由优秀的伪随机生成器生成的数字具有统计上与真正的随机抽取相似的属性。例如,分布是均匀的。循环长度很长:在循环重复之前,你可以得到数百万个随机数。序列不是自相关的:例如,如果你取第2个、第3个或第27个数字,或者查看生成的数字中的特定数字,你不会开始看到奇怪的模式出现。

相比之下,留在堆栈上的“随机”数字没有任何这些属性。它们的值和明显的随机性完全取决于程序的构造方式、编译方式以及编译器对程序的优化方式。举例来说,这是你的想法的一个变体,作为一个自包含的程序:

#include <stdio.h>

notrandom()
{
        int r, g, b;

        printf("R=%d, G=%d, B=%d", r&255, g&255, b&255);
}

int main(int argc, char *argv[])
{
        int i;
        for (i = 0; i < 10; i++)
        {
                notrandom();
                printf("\n");
        }

        return 0;
}

当我在Linux机器上用GCC编译这段代码并运行它时,结果是相当不愉快的确定性:

R=0, G=19, B=0
R=130, G=16, B=255
R=130, G=16, B=255
R=130, G=16, B=255
R=130, G=16, B=255
R=130, G=16, B=255
R=130, G=16, B=255
R=130, G=16, B=255
R=130, G=16, B=255
R=130, G=16, B=255

If you looked at the compiled code with a disassembler, you could reconstruct what was going on, in detail. The first call to notrandom() used an area of the stack that was not used by this program previously; who knows what was in there. But after that call to notrandom(), there is a call to printf() (which the GCC compiler actually optimizes to a call to putchar(), but never mind) and that overwrites the stack. So the next and subsequent times, when notrandom() is called, the stack will contain stale data from the execution of putchar(), and since putchar() is always called with the same arguments, this stale data will always be the same, too.

因此,这种行为绝对不是随机的,通过这种方式获得的数字也不具有编写良好的伪随机数生成器的任何理想属性。事实上,在大多数现实场景中,它们的值是重复的并且高度相关的。

事实上,和其他人一样,我也会认真考虑解雇那些试图把这个想法当作“高性能RNG”的人。

正如其他人所注意到的,这就是未定义行为(UB)。

实际上,它(可能)实际上(有点)管用。在x86[-64]架构上读取未初始化的寄存器确实会产生垃圾结果,而且可能不会做任何坏事(与例如Itanium相反,那里的寄存器可以被标记为无效,因此读取会传播NaN之类的错误)。

但这里存在两个主要问题:

It won't be particularly random. In this case, you're reading from the stack, so you'll get whatever was there previously. Which might be effectively random, completely structured, the password you entered ten minutes ago, or your grandmother's cookie recipe. It's Bad (capital 'B') practice to let things like this creep into your code. Technically, the compiler could insert reformat_hdd(); every time you read an undefined variable. It won't, but you shouldn't do it anyway. Don't do unsafe things. The fewer exceptions you make, the safer you are from accidental mistakes all the time.

UB更紧迫的问题是它使整个程序的行为没有定义。现代编译器可以使用它来省略大量的代码,甚至可以回溯到过去。玩UB就像维多利亚时代的工程师拆除一个活跃的核反应堆。有无数的事情会出错,而且您可能连一半的基本原则或实现的技术都不知道。这可能没什么,但你仍然不应该让它发生。看看其他漂亮的答案来了解细节。

还有,我会炒了你。