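string_view was a proposed feature in the C++ Library Fundamentals TS (N3921) and was added to C++17.
As far as I understand, it is a type that represents some kind of string "concept": a view of any type of container that can store something viewable as a string.
Is this right? Should the canonical const std::string& parameter type become string_view? Is there anything else important about string_view to take into consideration?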
Current answer
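The purpose of any and all kinds of "string reference" and "array reference" proposals is to avoid copying data that is already owned somewhere else and of which only a non-mutating view is required. The string_view in question is one such proposal; earlier ones were called string_ref and array_ref.
The idea is always to store a pair: a pointer to the first element of some existing data array or string, and its size.
Such a view-handle class can be passed around cheaply by value and offers cheap substring operations (which can be implemented as a simple pointer increment and size adjustment).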
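To make the pointer-plus-size idea concrete, here is a minimal sketch of such a view handle. The type name text_view and its members are hypothetical, written only for illustration; in C++17 the standard type playing this role is std::string_view. Note that substr only moves the pointer and shrinks the size; no characters are copied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical hand-rolled view type, for illustration only.
// It stores just a pointer and a length and never owns or copies characters.
struct text_view {
    const char* data;  // first character of some existing string
    std::size_t size;  // number of characters in the view

    // View over a null-terminated string (the terminator is not included).
    static text_view from_cstr(const char* s) { return {s, std::strlen(s)}; }

    // Cheap substring: a pointer increment and a size adjustment, no copy.
    // Like std::string_view::substr, the count is clamped to what remains.
    text_view substr(std::size_t pos, std::size_t count) const {
        assert(pos <= size);
        std::size_t n = count < size - pos ? count : size - pos;
        return {data + pos, n};
    }

    // Compare the viewed characters with a null-terminated string.
    bool equals(const char* s) const {
        return std::strlen(s) == size && std::memcmp(data, s, size) == 0;
    }
};
```

The original characters stay where they are; a text_view is only as valid as the storage it points into.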
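Many uses of strings don't require actually owning the string, and the string in question is often already owned by someone else. So there is real potential for improving efficiency by avoiding unneeded copies (think of all the allocations and exceptions you can save).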
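One way to see the copy being avoided is to check where the parameter's characters live. In this sketch (the helper names same_storage_ref and same_storage_view are made up for illustration), passing a const char* to a const std::string& parameter has to build a temporary std::string holding its own copy of the characters, while a std::string_view parameter simply points at the caller's existing data.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helpers: does the parameter refer to the caller's characters
// at address p, or to a separate copy of them?
bool same_storage_ref(const std::string& s, const char* p) {
    // If the caller passed a const char*, s is a temporary std::string
    // that owns a freshly made copy of the characters.
    return s.data() == p;
}

bool same_storage_view(std::string_view s, const char* p) {
    // A string_view is just (pointer, length) into the caller's data.
    return s.data() == p;
}
```

When the caller already holds a std::string, both versions see the same buffer; the difference only shows up when the data lives in some other form, such as a literal or a char buffer.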
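The original C strings suffered from the problem that the null terminator was part of the string APIs, so you couldn't easily create substrings without mutating the underlying string (à la strtok). In C++, this is easily solved by storing the length separately and wrapping the pointer and the size into one class.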
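As a sketch of substringing without touching the underlying data, the hypothetical split helper below tokenizes a string using std::string_view::find, substr and remove_prefix. Unlike strtok, it never writes terminators into its input; each token is just a (pointer, length) pair into the original characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: split text on delim without modifying or copying the
// characters. Each token points into text, so the tokens are only valid
// while the storage behind text is alive.
std::vector<std::string_view> split(std::string_view text, char delim) {
    std::vector<std::string_view> tokens;
    while (true) {
        std::size_t pos = text.find(delim);
        // substr clamps npos to the end of the view, so the last token
        // is everything that remains.
        tokens.push_back(text.substr(0, pos));
        if (pos == std::string_view::npos) break;
        text.remove_prefix(pos + 1);  // only advances the view's pointer
    }
    return tokens;
}
```

Empty fields between adjacent delimiters come back as empty views rather than being skipped, which is another difference from strtok.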
The one major obstacle and divergence from the C++ standard library philosophy that I can think of is that such "referential view" classes have completely different ownership semantics from the rest of the standard library. Basically, everything else in the standard library is unconditionally safe and correct (if it compiles, it's correct). With reference classes like this, that's no longer true. The correctness of your program depends on the ambient code that uses these classes. So that's harder to check and to teach.
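A small illustration of that ownership caveat, using hypothetical helpers first_word and make_greeting: the returned std::string_view refers to the caller's data, so whether it is valid depends entirely on how the surrounding code manages lifetimes. The dangerous variant is left as a comment because it compiles but using the result would be undefined behavior.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: the part of s before the first space.
// The result points into the caller's data; it owns nothing.
std::string_view first_word(std::string_view s) {
    return s.substr(0, s.find(' '));
}

// Hypothetical source of a temporary std::string.
std::string make_greeting() { return "hello world"; }

// Safe: the temporary returned by make_greeting() lives until the end of the
// full-expression, and we copy the word into an owning std::string first.
std::string first_word_owned() {
    return std::string(first_word(make_greeting()));
}

// Unsafe (do not do this): it compiles, but the temporary std::string is
// destroyed at the end of the statement, leaving the view dangling.
//   std::string_view dangling = first_word(make_greeting());
```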
Other answers
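(Educating myself in 2021)
From Microsoft's <string_view> documentation:
The string_view family of template specializations provides an efficient way to pass a read-only, exception-safe, non-owning handle to the character data of any string-like objects with the first element of the sequence at position 0. (...)
From the Microsoft C++ Team Blog post "std::string_view: The Duct Tape of String Types" (August 21, 2018; retrieved April 1, 2021):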
string_view solves the “every platform and library has its own string type” problem for parameters. It can bind to any sequence of characters, so you can just write your function as accepting a string view:

    void f(wstring_view); // string_view that uses wchar_t's

and call it without caring what stringlike type the calling code is using (and for (char*, length) argument pairs just add {} around them). (...)

(...) Today, the most common “lowest common denominator” used to pass string data around is the null-terminated string (or as the standard calls it, the Null-Terminated Character Type Sequence). This has been with us since long before C++, and provides clean “flat C” interoperability. However, char* and its support library are associated with exploitable code, because length information is an in-band property of the data and susceptible to tampering. Moreover, the null used to delimit the length prohibits embedded nulls and causes one of the most common string operations, asking for the length, to be linear in the length of the string. (...)

Each programming domain makes up their own new string type, lifetime semantics, and interface, but a lot of text processing code out there doesn’t care about that. Allocating entire copies of the data to process just to make differing string types happy is suboptimal for performance and reliability.
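A minimal sketch of the parameter pattern described in the quote, using std::string_view rather than wstring_view; count_char is a made-up example function. The same function accepts a std::string, a string literal, a const char*, and a (pointer, length) pair wrapped in braces, without constructing a std::string along the way.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical example function: count occurrences of c.
// Taking std::string_view by value lets callers pass any contiguous
// character data without first building a std::string.
std::size_t count_char(std::string_view s, char c) {
    std::size_t n = 0;
    for (char ch : s) {
        if (ch == c) ++n;
    }
    return n;
}
```

The same approach works for wide strings with std::wstring_view. Because the length is stored next to the pointer, size() is constant time and embedded '\0' characters are allowed; for example, std::string_view("a\0b", 3) has size 3.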