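This is a somewhat odd question. My goals are to understand the language design decision and to identify what is possible for reflection in C++.
Why didn't the C++ language committee implement reflection in the language? Is reflection too difficult in a language that does not run on a virtual machine (as Java does)? If one were to implement reflection in C++, what would the challenges be?
I assume the uses of reflection are well known: editors can be written more easily, program code will be smaller, mocks can be generated for unit tests, and so on. But it would be great if you could comment on the uses of reflection too.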
Reflection requires some metadata about types to be stored somewhere it can be queried. Since C++ compiles to native machine code, and that code changes heavily under optimization, the high-level view of the application is largely lost during compilation, so it cannot be queried at run time. Java and .NET use a very high-level representation in the binary code for their virtual machines, which makes this level of reflection possible. In some C++ implementations, however, there is something called Run-Time Type Information (RTTI), which can be considered a stripped-down version of reflection.
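As a rough illustration of how limited RTTI is compared with full reflection, here is a minimal sketch. The types Shape, Circle and Square and the helper functions are made up for this example. RTTI can report the dynamic type of an object and perform checked downcasts, but it cannot list a type's members.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical types used only for this illustration.
struct Shape {
    virtual ~Shape() = default;  // a virtual member makes the type polymorphic, so RTTI tracks it
};
struct Circle : Shape { double radius = 1.0; };
struct Square : Shape { double side = 2.0; };

// typeid applied to a reference to a polymorphic type reports the dynamic type.
bool is_exactly_circle(const Shape& s) {
    return typeid(s) == typeid(Circle);
}

// dynamic_cast on a pointer is a checked downcast: it yields nullptr on a mismatch.
double radius_or_zero(const Shape& s) {
    if (const auto* c = dynamic_cast<const Circle*>(&s)) {
        return c->radius;
    }
    return 0.0;
}

// Usage: the type can be identified at run time, but nothing here can
// enumerate the members "radius" or "side" by name.
void rtti_demo() {
    Circle c;
    Square q;
    assert(is_exactly_circle(c));
    assert(!is_exactly_circle(q));
    assert(radius_or_zero(q) == 0.0);
}
```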
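Reflection can be implemented in C++, and it has been.
It is not a native C++ feature because it carries a heavy cost (memory and speed) that the language should not impose by default; the language is oriented toward "maximum performance by default".
Because you shouldn't pay for what you don't need, and because, as you say yourself, editors need it more than other applications do, it should be implemented only where you need it rather than "forced" onto all code (you don't need reflection on all the data you will work with in an editor or similar application).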
The reason C++ doesn't have reflection is that it would require compilers to add symbol information to the object files: which members a class type has, information about those members, about the functions, and so on. This would essentially render include files useless, because the information carried by declarations would then be read from those object files (which would then be modules). In C++, a type definition can occur multiple times in a program through inclusion of the respective headers (provided that all those definitions are the same), so it would have to be decided where to put the information about that type, to name just one complication. The aggressive optimization done by a C++ compiler, which can optimize out dozens of class template instantiations, is another strong point. It's possible, but since C++ is compatible with C, this would become an awkward combination.
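If you really want to understand the design decisions behind C++, find a copy of The Annotated C++ Reference Manual by Ellis and Stroustrup. It is not up to date with the latest standard, but it goes through the original standard and explains how things work and how they are implemented.
There are several problems with reflection in C++.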
It's a lot of work to add, and the C++ committee is fairly conservative and doesn't spend time on radical new features unless it is sure they'll pay off. (A suggestion for adding a module system similar to .NET assemblies has been made, and while I think there's general consensus that it'd be nice to have, it's not their top priority at the moment, and has been pushed back until well after C++0x. The motivation for this feature is to get rid of the #include system, but it would also enable at least some metadata.)
You don't pay for what you don't use. That's one of the most basic design philosophies underlying C++. Why should my code carry around metadata if I may never need it? Moreover, the addition of metadata may inhibit the compiler from optimizing. Why should I pay that cost in my code if I may never need that metadata?
Which leads us to another big point: C++ makes very few guarantees about the compiled code. The compiler is allowed to do pretty much anything it likes, as long as the resulting functionality is what is expected. For example, your classes aren't required to actually be there. The compiler can optimize them away, inline everything they do, and it frequently does just that, because even simple template code tends to create quite a few template instantiations. The C++ standard library relies on this aggressive optimization. Functors are only performant if the overhead of instantiating and destructing the object can be optimized away. operator[] on a vector is only comparable to raw array indexing in performance because the entire operator can be inlined and thus removed entirely from the compiled code.
C# and Java make a lot of guarantees about the output of the compiler. If I define a class in C#, then that class will exist in the resulting assembly. Even if I never use it. Even if all calls to its member functions could be inlined. The class has to be there, so that reflection can find it.
Part of this is alleviated by C# compiling to bytecode, which means that the JIT compiler can remove class definitions and inline functions if it likes, even if the initial C# compiler can't. In C++, you only have one compiler, and it has to output efficient code. If you were allowed to inspect the metadata of a C++ executable, you'd expect to see every class it defined, which means that the compiler would have to preserve all the defined classes, even if they're not necessary.
And then there are templates. Templates in C++ are nothing like generics in other languages. Every template instantiation creates a new type. std::vector<int> is a completely separate class from std::vector<float>. That adds up to a lot of different types in an entire program. What should our reflection see? The template std::vector? But how can it, since that's a source-code construct, which has no meaning at runtime? It'd have to see the separate classes std::vector<int> and std::vector<float>. And std::vector<int>::iterator and std::vector<float>::iterator, same for const_iterator and so on.
And once you step into template metaprogramming, you quickly end up instantiating hundreds of templates, all of which get inlined and removed again by the compiler. They have no meaning, except as part of a compile-time metaprogram. Should all these hundreds of classes be visible to reflection? They'd have to be, because otherwise our reflection would be useless, if it doesn't even guarantee that the classes I defined will actually be there.
And a side problem is that the template class doesn't exist until it is instantiated. Imagine a program which uses std::vector<int>. Should our reflection system be able to see std::vector<int>::iterator? On one hand, you'd certainly expect so. It's an important class, and it's defined in terms of std::vector<int>, which does exist in the metadata.
On the other hand, if the program never actually uses this iterator class template, its type will never have been instantiated, and so the compiler won't have generated the class in the first place. And it's too late to create it at runtime, since it requires access to the source code.
And finally, reflection isn't quite as vital in C++ as it is in C#. The reason is, again, template metaprogramming. It can't solve everything, but for many cases where you'd otherwise resort to reflection, it's possible to write a metaprogram which does the same thing at compile time. boost::type_traits is a simple example. You want to know about type T? Check its type_traits. In C#, you'd have to fish around after its type using reflection. Reflection would still be useful for some things (the main use I can see, which metaprogramming can't easily replace, is for autogenerated serialization code), but it would carry some significant costs for C++, and it's just not necessary as often as it is in other languages.
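To make the two template points above concrete, here is a small sketch. It uses the standard <type_traits> header (available since C++11) rather than boost::type_traits, and the describe helper is invented for this example. It shows that each instantiation is a distinct type, and that questions about a type can be answered at compile time without any runtime metadata.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Every instantiation of a class template is a distinct, unrelated type.
static_assert(!std::is_same<std::vector<int>, std::vector<float>>::value,
              "vector<int> and vector<float> are different types");

// Compile-time questions about a type, answered by traits.
static_assert(std::is_integral<int>::value, "int is an integral type");
static_assert(!std::is_integral<float>::value, "float is not an integral type");
static_assert(std::is_same<std::vector<int>::value_type, int>::value,
              "vector<int> stores int");

// Hypothetical helper: the overload is selected by a trait at compile time.
template <class T>
typename std::enable_if<std::is_integral<T>::value, const char*>::type
describe(T) { return "integral"; }

template <class T>
typename std::enable_if<std::is_floating_point<T>::value, const char*>::type
describe(T) { return "floating point"; }

// Usage: no type information is looked up at run time here.
void traits_demo() {
    assert(std::string(describe(42)) == "integral");
    assert(std::string(describe(1.5)) == "floating point");
}
```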
Edit: In response to comments:
cdleary: Yes, debug symbols do something similar, in that they store metadata about the types used in the executable. But they also suffer from the problems I described. If you've ever tried debugging a release build, you'll know what I mean. There are large logical gaps where you created a class in the source code, which has gotten inlined away in the final code. If you were to use reflection for anything useful, you'd need it to be more reliable and consistent. As it is, types would be appearing and disappearing almost every time you compile. You change a tiny little detail, and the compiler decides to change which types get inlined and which ones don't, as a response. How do you extract anything useful from that, when you're not even guaranteed that the most relevant types will be represented in your metadata? The type you were looking for may have been there in the last build, but now it's gone. And tomorrow, someone will check in a small innocent change to a small innocent function, which makes the type just big enough that it won't get completely inlined, so it'll be back again. That's still useful for debug symbols, but not much more than that. I'd hate trying to generate serialization code for a class under those terms.
Evan Teran: Of course these issues could be resolved. But that falls back to my point #1. It'd take a lot of work, and the C++ committee has plenty of things they feel are more important. Is the benefit of getting some limited reflection (and it would be limited) in C++ really big enough to justify focusing on that at the expense of other features? Is there really a huge benefit in adding features to the core language which can already (mostly) be done through libraries and preprocessors like Qt's? Perhaps, but the need is a lot less urgent than if such libraries didn't exist. For your specific suggestions though, I believe disallowing it on templates would make it completely useless. You'd be unable to use reflection on the standard library, for example. What kind of reflection wouldn't let you see a std::vector? Templates are a huge part of C++. A feature that doesn't work on templates is basically useless.
But you're right, some form of reflection could be implemented. But it'd be a major change in the language. As it is now, types are exclusively a compile-time construct. They exist for the benefit of the compiler, and nothing else. Once the code has been compiled, there are no classes. If you stretch yourself, you could argue that functions still exist, but really, all there is is a bunch of jump assembler instructions, and a lot of stack push/pop's. There's not much to go on, when adding such metadata.
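But like I said, there is a proposal to change the compilation model by adding self-contained modules that store metadata for selected types and allow other modules to reference them without having to use #includes. That's a good start, and to be honest, I'm surprised the standards committee didn't just throw the proposal out for being too big a change. So perhaps in 5-10 years? :)
For languages that have reflection, the question is how much of the source code the compiler is willing to keep in the object code to enable reflection, and how much analysis machinery is available to interpret the reflected information. Unless the compiler keeps all of the source code around, reflection will be limited in its ability to analyze the available facts about the source code.
The C++ compiler doesn't keep anything around (well, ignoring RTTI), so you don't get reflection in the language. (Java and C# compilers only keep classes, method names and return types, so you get a little bit of reflection data, but you can't inspect expressions or program structure. That means that even in those "reflection-enabled" languages the information you can get is pretty sparse, and consequently you can't really do much analysis.)
But you can step outside the language and get full reflection capabilities. The answer to another Stack Overflow discussion on reflection in C discusses this.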
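I believe that reflection in C++ is crucially important if C++ is to be used as a language for database access, web session handling/HTTP, and GUI development. The lack of reflection holds back ORMs (like Hibernate or LINQ), XML and JSON parsers that instantiate classes, data serialization, and many other things (where initially typeless data has to be used to create an instance of a class).
A compile-time switch, available to the software developer during the build process, can be used to eliminate this "you pay for what you use" concern.
If I am a firmware developer and don't need reflection to read data from a serial port, then fine, I don't use the switch. But as a database developer who wants to keep using C++, I am constantly faced with horrible, difficult-to-maintain code that maps data between data members and database constructs.
Neither Boost serialization nor any other mechanism really solves the reflection problem. It has to be done by the compiler, and once it is done, C++ will again be taught in schools and used in software that deals with data processing.
To me this is issue #1 (and native threading primitives are issue #2).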
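If C++ could have:
class member data for variable names, variable types, and the const modifier
a function argument iterator (position only, not names)
class member data for function names, return types, and the const modifier
a list of parent classes (in the same order as defined)
data for template members and parent classes; the expanded template (meaning the actual type would be available to the reflection API, not the "template information of how to get there")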
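That would be enough to create very easy-to-use libraries at the crux of typeless data processing, which is so prevalent in today's web and database applications (all ORMs, messaging mechanisms, XML/JSON parsers, data serialization, etc.).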
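To show what such libraries have to do by hand today, here is a minimal sketch. The Person type, the Field descriptor and the serialize function are all made up for this example. Each data member is described manually by a name and a pointer to member, which is roughly the information the list above asks the compiler to provide.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical record type that might map to a database row or a JSON object.
struct Person {
    std::string name;
    int age;
};

// One hand-written descriptor per data member: its name plus a pointer to member.
template <class Class, class T>
struct Field {
    const char* name;
    T Class::* ptr;
};

// Writes "name=value;" for a single field of obj.
template <class Class, class T>
void write_field(std::ostream& out, const Class& obj, const Field<Class, T>& f) {
    out << f.name << '=' << obj.*(f.ptr) << ';';
}

// The field list below is the boilerplate that compiler-provided member
// data would make unnecessary; it must be kept in sync with Person by hand.
std::string serialize(const Person& p) {
    std::ostringstream out;
    write_field(out, p, Field<Person, std::string>{"name", &Person::name});
    write_field(out, p, Field<Person, int>{"age", &Person::age});
    return out.str();
}

// Usage example.
void serialize_demo() {
    Person p{"Ada", 36};
    assert(serialize(p) == "name=Ada;age=36;");
}
```

Adding a member to Person without also adding a line to serialize silently drops that member from the output, which is the maintenance problem described above.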
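For example, the basic information supported by the Q_PROPERTY macro (part of the Qt framework), http://qt.nokia.com/doc/4.5/properties.html, expanded to cover class methods and e) (the last item in the list above), would be extraordinarily beneficial to C++ and to the software community in general.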
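Certainly, the reflection I am referring to would not cover semantic meaning or more complex issues (like comments, source code line numbers, data flow analysis, etc.), but I don't think those need to be part of a language standard either.
According to Alistair Cockburn, subtyping can't be guaranteed in a reflective environment.
Reflection is more relevant to latent typing systems. In C++, you know what type you've got and you know what you can do with it.
Not every language should try to incorporate every feature of every other language.
C++ is essentially a very, very sophisticated macro assembler. It is NOT (in the traditional sense) a high-level language like C#, Java, Objective-C, Smalltalk, etc.
It is good to have different tools for different jobs. If all we have is hammers, everything is going to look like a nail. Scripting languages are useful for some jobs, reflective OO languages (Java, Obj-C, C#) are useful for another class of jobs, and super-efficient, bare-bones, close-to-the-machine languages are useful for yet another class of jobs (C++, C, Assembler).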
C++ does an amazing job of extending Assembler technology to incredible levels of complexity management, and abstractions to make programming larger, more complex tasks vastly more possible for human beings. But it is not necessarily a language that is the best suited for those who are approaching their problem from a strictly high-level perspective (Lisp, Smalltalk, Java, C#). If you need a language with those features to best implement a solution to your problems, then thank those who've created such languages for all of us to use!
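But C++ is for those who, for whatever reason, need a strong correlation between their code and the underlying machine's operation. Whether it's efficiency, programming device drivers, interacting with lower-level OS services, or whatever else, C++ is better suited to those tasks.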
C#, Java, Objective-C all require a much larger, richer runtime system to support their execution. That runtime has to be delivered to the system in question, preinstalled to support the operation of your software. And that layer has to be maintained for various target systems, customized by SOME OTHER LANGUAGE to make it work on that platform. And that middle layer, the adaptive layer between the host OS and your code (the runtime), is almost always written in a language like C or C++, where efficiency is #1 and where the exact interaction between software and hardware can be well understood, predicted, and manipulated to maximum gain.
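I love Smalltalk, Objective-C, and having a rich runtime system with reflection, metadata, garbage collection, etc. Amazing code can be written to take advantage of these facilities! But that is simply a higher layer on the stack, a layer that must rest on lower layers, which themselves must ultimately sit on the OS and the hardware. And we will always need a language that is best suited to building that layer: C++/C/Assembler.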
Addendum: C++11/14 are continuing to expand C++'s ability to support higher-level abstractions and systems. Threading, synchronization, precise memory models, and more precise abstract machine definitions are enabling C++ developers to achieve many of the high-level abstractions that some of these high-level-only languages used to have exclusive domain over, while continuing to provide close-to-metal performance and excellent predictability (i.e. minimal runtime subsystems). Perhaps reflection facilities will be selectively enabled in a future revision of C++, for those who want it, or perhaps a library will provide such runtime services (maybe there is one now, or the beginnings of one in Boost?).
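There are plenty of use cases for reflection in C++ that cannot be adequately addressed with compile-time constructs such as template metaprogramming.
N3340 proposes rich pointers as a way to introduce reflection in C++. Among other things, it addresses the issue of not paying for a feature unless you use it.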
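Reflection could be optional, like a preprocessor directive. Something like
#pragma enable reflection
That way we can have the best of both worlds: without this pragma, libraries would be built without reflection (and without any overhead), and it would then be up to the individual developer to decide whether they want speed or ease of use.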
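It's basically because it's an "optional extra". Many people choose C++ over languages like Java and C# so that they have more control over the compiler output, e.g. a smaller and/or faster program.
If you do choose to add reflection, there are various solutions available.
People have been trying to add reflection to C++ for the past 10 years. The latest proposal targets C++23, and may or may not make it in.
Unlike reflection in most languages, the plan for C++ reflection is compile-time reflection. So at compile time you can reflect over struct members, function and method parameters and properties, enumeration values and names, etc.
You can then do limited reification, injecting information about what you reflected over to generate other types and code.
While this is a bit strange, it means that programs that don't use reflection pay no runtime cost for it. It is also extremely powerful.
The simplest example is that you can use it to implement runtime reflection.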
struct Member {
    std::string_view name;
    std::any_ref value;  // hypothetical type-erased reference; not part of the standard library
};
struct Reflectable {
    virtual std::span<Member> GetMembers() const = 0;
    virtual std::span<Member> GetMembers() = 0;
};
template<class D>
struct ImplReflectable : Reflectable {
    std::span<Member> GetMembers() const final;
    std::span<Member> GetMembers() final;
};
template<class D>
std::span<Member> ImplReflectable<D>::GetMembers() const {
    // compile time reflection code on D here
}
template<class D>
std::span<Member> ImplReflectable<D>::GetMembers() {
    // compile time reflection code on D here
}
You write the above code once, and suddenly, for any type you want to be reflectable, you can just do:
struct Point : ImplReflectable<Point> {
    int x, y;
};
and a reflection system is attached to Point.
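The library that implements this runtime reflection can be as complex and powerful as you like. Each type has to do some work (as above) to opt in, but for a UI library (for example) that is not a serious problem. Types that don't opt in continue with the C++ assumption: "if you don't use it, you don't pay for it".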
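The opt-in pattern can be approximated in C++17 today, without any reflection proposal. The following is only a sketch under stated assumptions. A hand-written fields() function stands in for the compile-time reflection step, std::any (which holds a copy) replaces the hypothetical std::any_ref, and std::vector replaces std::span. Only the const overload is shown.

```cpp
#include <any>
#include <cassert>
#include <string_view>
#include <tuple>
#include <utility>
#include <vector>

struct Member {
    std::string_view name;
    std::any value;  // holds a copy of the member's value, not a reference
};

struct Reflectable {
    virtual ~Reflectable() = default;
    virtual std::vector<Member> GetMembers() const = 0;
};

// CRTP base: D opts in by deriving from ImplReflectable<D> and providing D::fields().
template <class D>
struct ImplReflectable : Reflectable {
    std::vector<Member> GetMembers() const final {
        const D& self = static_cast<const D&>(*this);
        std::vector<Member> out;
        // Expand the tuple of (name, pointer-to-member) pairs at compile time.
        std::apply(
            [&](const auto&... field) {
                (out.push_back(Member{field.first, std::any(self.*(field.second))}), ...);
            },
            D::fields());
        return out;
    }
};

struct Point : ImplReflectable<Point> {
    int x = 0, y = 0;
    // Assumption: hand-written stand-in for what compile-time reflection would generate.
    static auto fields() {
        return std::make_tuple(std::make_pair(std::string_view("x"), &Point::x),
                               std::make_pair(std::string_view("y"), &Point::y));
    }
};

// Usage: callers only need the abstract Reflectable interface.
void reflectable_demo() {
    Point p;
    p.x = 3;
    p.y = 4;
    const Reflectable& r = p;
    const std::vector<Member> members = r.GetMembers();
    assert(members.size() == 2);
    assert(members[0].name == "x");
    assert(std::any_cast<int>(members[1].value) == 4);
}
```

Types that do not derive from ImplReflectable carry none of this machinery, which matches the opt-in cost model described above.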
But this is just the start. One proposal, metaclasses, allows:
interface Reflectable {
    std::span<Member> GetMembers() const;
    std::span<Member> GetMembers();
};
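You can have metaclasses, or functions that take types and return types. This lets you define metaclasses of classes, like "interface", written in the language itself. Now, interface is a bit of a toy, but you could write QObject or Reflectable or PolymorphicValueType or NetworkProtocol metaclasses that modify what your class definition means.
This may or may not make it into C++23. It keeps getting better, but it also keeps getting pushed back. You can try out multiple compile-time reflection implementations for most major C++ compilers. The syntax is in flux: there are reflection libraries based on symbol operators and operator reflection libraries based on reflexpr, some where the reflected data is types, and others where it is constexpr objects and consteval functions.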