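In Java, the simple .substring(…) shares the internally used char array with the original String object (this was the behavior in older JDK releases); new String(…) can then copy it to a new array when needed (to avoid hindering garbage collection of the original array).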
Precisely because strings are immutable, .Substring must copy at least a portion of the original string. Copying n bytes takes O(n) time.
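Consider what would happen in .NET if substrings shared storage: a string takes up multiple megabytes, and someone calls .Substring(n, 4) on it (for any n in the middle of the string).
Now the entire string cannot be garbage collected just because one reference is holding on to 4 characters? That seems like a ridiculous waste of space.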
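Additionally, tracking references to substrings (which may even be inside other substrings), and trying to copy at optimal times to avoid defeating the GC (as described above), makes the concept a nightmare. It is far simpler, and more reliable, to copy on .Substring and maintain the straightforward immutable model.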
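The string "Hello there" is thus represented as a 4-byte little-endian length (0B 00 00 00 = 11), the eleven UTF-16 code units, and a terminating '\0':
0B 00 00 00 48 00 65 00 6C 00 6C 00 6F 00 20 00 74 00 68 00 65 00 72 00 65 00 00 00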
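As a rough sketch, that byte sequence can be rebuilt by hand. This does not read the CLR's actual object memory; it only assembles a length prefix, the UTF-16LE characters, and a null terminator in the order the answer describes. The variable names are illustrative.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

string s = "Hello there";

// 4-byte little-endian length prefix (the BStr-like part).
byte[] prefix = new byte[4];
BinaryPrimitives.WriteInt32LittleEndian(prefix, s.Length);

// UTF-16LE code units: two bytes per char for this ASCII-only text.
byte[] chars = Encoding.Unicode.GetBytes(s);

// Prefix + characters + two zero bytes for the '\0' terminator (the CStr-like part).
byte[] layout = new byte[prefix.Length + chars.Length + 2];
prefix.CopyTo(layout, 0);
chars.CopyTo(layout, prefix.Length);

string hex = BitConverter.ToString(layout).Replace('-', ' ');
Console.WriteLine(hex);
```

The result is 28 bytes: 4 for the length, 22 for the eleven characters, and 2 for the terminator.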
string s = "Hello there";
ReadOnlySpan<char> hello = s.AsSpan(0, 5);
ReadOnlySpan<char> ell = hello.Slice(1, 3);
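The ReadOnlySpan<char> "substring" stores its length independently, and it does not guarantee that there is a '\0' after the end of the value. It can be used "like a string" in many ways, but it is not "a string", because it has neither the BStr nor the CStr characteristic (much less both). If you never (directly) P/Invoke, there is not much of a difference (unless the API you want to call has no ReadOnlySpan<char> overload).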
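ReadOnlySpan<char> cannot be used as a field of a reference type, so there is also ReadOnlyMemory<char> (s.AsMemory(0, 5)), which is an indirect way of having a ReadOnlySpan<char>, so the same differences from string apply.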
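A minimal sketch of that difference, assuming the same "Hello there" string as above: ReadOnlyMemory<char> values can be stored on the heap (here in a List<T>; a class field works the same way), and a span is obtained only at the point of use. The pieces and ellText names are illustrative.

```csharp
using System;
using System.Collections.Generic;

string s = "Hello there";

// ReadOnlyMemory<char> is an ordinary struct, so it can live on the heap.
var pieces = new List<ReadOnlyMemory<char>>
{
    s.AsMemory(0, 5),   // "Hello"
    s.AsMemory(6, 5),   // "there"
};

// List<ReadOnlySpan<char>> would not compile, because ReadOnlySpan<char>
// is a ref struct and is restricted to the stack.

// Get a span only where it is needed, then slice it without copying.
ReadOnlySpan<char> ell = pieces[0].Span.Slice(1, 3);

// ToString() is where the characters are finally copied into a real string.
string ellText = ell.ToString();
Console.WriteLine(ellText);
```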
Some of the answers/comments on previous answers talked about it being wasteful to have the garbage collector have to keep a million-character string around while you continue to talk about 5 characters. That is precisely the behavior you can get with the ReadOnlySpan<char> approach. If you're just doing short computations, the ReadOnlySpan approach is probably better. If you need to persist it for a while and you're going to keep only a small percentage of the original string, doing a proper substring (to trim off the excess data) is probably better. There's a transition point somewhere in the middle, but it depends on your specific usage.
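The trade-off can be made visible with a small experiment. This is a sketch under the assumption that the memory was created from a string with AsMemory; MemoryMarshal.TryGetString then reports which string backs it. The million-character size is only for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

// A large string whose last five characters are the part we care about.
string big = new string('x', 1_000_000) + "Hello";

// No copy: the memory refers to the original string's characters.
ReadOnlyMemory<char> view = big.AsMemory(big.Length - 5, 5);

// Copy: a new, independent five-character string.
string copy = big.Substring(big.Length - 5, 5);

// Ask which string (if any) backs the memory, and at what offset.
bool backed = MemoryMarshal.TryGetString(view, out var owner, out int start, out int length);
bool sharesOriginal = backed && ReferenceEquals(owner, big);

Console.WriteLine($"view keeps the original alive: {sharesOriginal}");
Console.WriteLine($"copy holds {copy.Length} chars of its own");
```

As long as view is reachable, so is the whole of big; once only copy is kept, the large string can be collected.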
People typically use "substring" to extract a short string -- say, ten or twenty characters -- out of a somewhat longer string -- maybe a couple hundred characters. You have a line of text in a comma-separated file and you want to extract the third field, which is a last name. The line will be maybe a couple hundred characters long, the name will be a couple dozen. String allocation and memory copying of fifty bytes is astonishingly fast on modern hardware. That making a new data structure that consists of a pointer to the middle of an existing string plus a length is also astonishingly fast is irrelevant; "fast enough" is by definition fast enough.
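The typical case described above might look like the following sketch. The record contents are made up, and real CSV parsing (quoted fields, escaped commas) needs more than IndexOf.

```csharp
using System;

// Hypothetical record: id, first name, last name, city.
string line = "1042,Ada,Lovelace,London";

// Locate the commas that bracket the third field.
int first = line.IndexOf(',');
int second = line.IndexOf(',', first + 1);
int third = line.IndexOf(',', second + 1);

// Substring(startIndex, length) allocates a new string and copies a few bytes.
string lastName = line.Substring(second + 1, third - second - 1);
Console.WriteLine(lastName);
```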
If the substring operations people typically did on strings were completely different, then it would make sense to go with a persistent approach. If people typically had million-character strings, and were extracting thousands of overlapping substrings with sizes in the hundred-thousand-character range, and those substrings lived a long time on the heap, then it would make perfect sense to go with a persistent substring approach; it would be wasteful and foolish not to. But most line-of-business programmers do not do anything even vaguely like those sorts of things. .NET is not a platform that is tailored for the needs of the Human Genome Project; DNA analysis programmers have to solve problems with those string usage characteristics every day; odds are good that you do not. The few who do build their own persistent data structures that closely match their usage scenarios.
For example, my team writes programs that do on-the-fly analysis of C# and VB code as you type it. Some of those code files are enormous and thus we cannot be doing O(n) string manipulation to extract substrings or insert or delete characters. We have built a bunch of persistent immutable data structures for representing edits to a text buffer that permit us to quickly and efficiently re-use the bulk of the existing string data and the existing lexical and syntactic analyses upon a typical edit. This was a hard problem to solve and its solution was narrowly tailored to the specific domain of C# and VB code editing. It would be unrealistic to expect the built-in string type to solve this problem for us.