I have used unions comfortably before; today I was alarmed when I read this post and learned that this code

union ARGB
{
    uint32_t colour;

    struct componentsTag
    {
        uint8_t b;
        uint8_t g;
        uint8_t r;
        uint8_t a;
    } components;

} pixel;

pixel.colour = 0xff040201;  // ARGB::colour is the active member from now on

// somewhere down the line, without any edit to pixel

if(pixel.components.a)      // accessing the non-active member ARGB::components

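is actually undefined behaviour, i.e. reading from a member of the union other than the one most recently written to leads to undefined behaviour. If this isn't the intended usage of unions, what is? Can someone please explain it in detail?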

Update:

I wanted to clarify a few things in hindsight.

The answer to the question isn't the same for C and C++; my ignorant younger self tagged it as both C and C++.

After scouring through C++11's standard I couldn't conclusively say that it calls out accessing/inspecting a non-active union member as undefined/unspecified/implementation-defined. All I could find was:

§9.5/1: "If a standard-layout union contains several standard-layout structs that share a common initial sequence, and if an object of this standard-layout union type contains one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of standard-layout struct members."

§9.2/19: "Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members."

While in C (C99 TC3 - DR 283 onwards) it's legal to do so (thanks to Pascal Cuoq for bringing this up). However, attempting to do it can still lead to undefined behavior if the value read happens to be invalid (a so-called "trap representation") for the type it is read through. Otherwise, the value read is implementation-defined.

C89/90 called this out under unspecified behavior (Annex J), and K&R's book says it's implementation-defined. Quote from K&R:

"This is the purpose of a union - a single variable that can legitimately hold any one of several types. [...] so long as the usage is consistent: the type retrieved must be the type most recently stored. It is the programmer's responsibility to keep track of which type is currently stored in a union; the results are implementation-dependent if something is stored as one type and extracted as another."

Extract from Stroustrup's TC++PL (emphasis mine):

"Use of unions can be essential for compactness of data [...] sometimes misused for "type conversion"."
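To make the §9.5/1 rule quoted above concrete, here is a minimal sketch of a union whose members share a common initial sequence. The names `Circle`, `Rect`, `Shape` and `kindOf` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>

// Two standard-layout structs whose first member (int kind) has the
// same type in both, so it is part of their common initial sequence.
struct Circle { int kind; double radius; };
struct Rect   { int kind; double width; double height; };

union Shape
{
    Circle circle;
    Rect   rect;
};

// Reads 'kind' through 'circle' regardless of which member is active.
// This is the one kind of cross-member read that §9.5/1 permits.
inline int kindOf(const Shape &s)
{
    return s.circle.kind;
}
```

Reading `s.circle.radius` while `rect` is the active member is a separate question; the sketch only relies on the shared `kind` field.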

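Most importantly, this question (whose title has remained unchanged since I asked it) was posed to understand the purpose of unions, not what the standard allows. For example, using inheritance for code reuse is of course allowed by the C++ standard, but that wasn't the purpose or original intention of introducing inheritance as a C++ language feature. This is why Andrey's answer remains the accepted one.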


Current answer

You can use unions to create structs like the following, which contain a field telling us which component of the union is actually in use:

struct VAROBJECT
{
    enum o_t { Int, Double, String } objectType;

    union
    {
        int intValue;
        double dblValue;
        char *strValue;
    } value;
} object;
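A minimal sketch of how such a tagged struct might be used, so that only the member named by the tag is ever read. The helper names (`setInt`, `setDouble`, `setString`, `describe`) are hypothetical, and `strValue` is declared `const char *` here as an assumption so that string literals can be stored.

```cpp
#include <cassert>
#include <string>

// Same shape as VAROBJECT above, renamed to avoid clashing with it.
struct VarObject
{
    enum o_t { Int, Double, String } objectType;

    union
    {
        int intValue;
        double dblValue;
        const char *strValue;   // assumption: const, to accept literals
    } value;
};

// Each setter writes one member and records which one is now active.
inline void setInt(VarObject &o, int v)
{
    o.objectType = VarObject::Int;
    o.value.intValue = v;
}

inline void setDouble(VarObject &o, double v)
{
    o.objectType = VarObject::Double;
    o.value.dblValue = v;
}

inline void setString(VarObject &o, const char *s)
{
    assert(s != nullptr);
    o.objectType = VarObject::String;
    o.value.strValue = s;   // stores the pointer only, does not copy
}

// Reads back only the member that the tag says is active.
inline std::string describe(const VarObject &o)
{
    switch (o.objectType)
    {
    case VarObject::Int:    return "int: " + std::to_string(o.value.intValue);
    case VarObject::Double: return "double: " + std::to_string(o.value.dblValue);
    case VarObject::String: return std::string("string: ") + o.value.strValue;
    }
    return "unknown";
}
```

Because every write goes through a setter, the tag and the active member cannot drift apart, which is the bookkeeping K&R leaves to the programmer.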

Other answers

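From the language's point of view, the behaviour is undefined. Consider that different platforms can have different constraints on memory alignment and endianness. Code on a big-endian machine will update the values in the struct differently from code on a little-endian machine. Fixing the behaviour in the language would require all implementations to use the same endianness (and memory alignment constraints...), which would limit its use.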

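If you are using C++ (you used both tags) and you really care about portability, you can use just the struct and provide a setter that takes the uint32_t and sets the fields appropriately through bitmask operations. The same can be done in C with a function.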
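A minimal sketch of that setter approach, assuming the same 0xAARRGGBB packing as the question's example. `ArgbPixel`, `set` and `colour` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Plain struct instead of a union: the packed value is split and
// rebuilt with shifts and masks, so the result does not depend on
// the byte order of the machine.
struct ArgbPixel
{
    std::uint8_t b;
    std::uint8_t g;
    std::uint8_t r;
    std::uint8_t a;

    // Splits a packed 0xAARRGGBB value into its four components.
    void set(std::uint32_t colour)
    {
        a = static_cast<std::uint8_t>((colour >> 24) & 0xffu);
        r = static_cast<std::uint8_t>((colour >> 16) & 0xffu);
        g = static_cast<std::uint8_t>((colour >> 8) & 0xffu);
        b = static_cast<std::uint8_t>(colour & 0xffu);
    }

    // Rebuilds the packed 0xAARRGGBB value from the components.
    std::uint32_t colour() const
    {
        return (static_cast<std::uint32_t>(a) << 24) |
               (static_cast<std::uint32_t>(r) << 16) |
               (static_cast<std::uint32_t>(g) << 8) |
                static_cast<std::uint32_t>(b);
    }
};
```

With this version, `set(0xff040201)` gives `a == 0xff` on both little-endian and big-endian targets, which the union version does not guarantee.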

Edit: I was expecting AProgrammer to write down an answer to vote for, and to close this one. As some comments have pointed out, endianness is dealt with in other parts of the standard by letting each implementation decide what to do, and alignment and padding can also be handled differently. Now, the strict aliasing rules that AProgrammer implicitly refers to are an important point here. The compiler is allowed to make assumptions about the modification (or lack of modification) of variables. In the case of the union, the compiler could reorder instructions and move the read of each colour component over the write to the colour variable.

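As you say, this is strictly undefined behaviour, though it will "work" on many platforms. The real reason for using unions is to create variant records.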

union A {
   int i;
   double d;
};

A a[10];    // records in "a" can be either ints or doubles 
a[0].i = 42;
a[1].d = 1.23;

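Of course, you also need some sort of discriminator to say what the variant actually contains. Note that in C++ (before C++11) unions are of limited use because they can only contain POD types, effectively those without constructors and destructors.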

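The behaviour may be undefined, but that just means there isn't a "standard". All decent compilers offer #pragmas to control packing and alignment, but they may have different defaults. The defaults will also change depending on the optimisation settings used.

Also, unions are not just for saving space. They can help modern compilers with type punning. If you reinterpret_cast<> everything, the compiler can't make assumptions about what you are doing. It may have to throw away what it knows about your type and start again (forcing a write back to memory, which is very inefficient compared to CPU clock speed).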

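In C++, Boost Variant implements a safe version of the union, designed to prevent undefined behaviour as much as possible.

Its performance is identical to the enum + union construct (it is stack-allocated too, etc.), but it uses a template list of types instead of the enum :)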
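As a rough illustration of the idea, the sketch below uses `std::variant` from C++17 rather than Boost Variant itself. That is a substitution made here so the example needs only the standard library, and the two are similar in spirit but not identical. `Value` and `describeValue` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <variant>

// The list of template arguments plays the role of the enum:
// the variant itself tracks which alternative is active.
using Value = std::variant<int, double, std::string>;

// Inspects the active alternative with std::get_if, which returns
// a null pointer instead of reading an inactive member.
inline std::string describeValue(const Value &v)
{
    if (const int *i = std::get_if<int>(&v))
        return "int: " + std::to_string(*i);
    if (const double *d = std::get_if<double>(&v))
        return "double: " + std::to_string(*d);
    return "string: " + std::get<std::string>(v);
}
```

Asking for the wrong alternative with `std::get<int>(v)` throws `std::bad_variant_access` rather than producing undefined behaviour, which is the safety property the answer refers to.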