Why did they decide to make strings immutable in Java and .NET (and some other languages)? Why didn't they make them mutable?


Current answer

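It's a trade-off. Strings go into the String pool, and when you create multiple identical strings they share the same memory. The designers figured this memory-saving technique would work well in the common case, since programs tend to work with the same strings over and over.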
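A minimal sketch of that sharing in Java, assuming two identical literals in the same program. The class and method names are made up for illustration; the only thing relied on is that identical string literals refer to one pooled object.

```java
class LiteralPoolDemo {
    // Returns true when two identical literals resolve to the same pooled object.
    static boolean literalsShareOneObject() {
        String a = "hello";
        String b = "hello";
        return a == b; // reference comparison on purpose, not equals()
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneObject()); // true
    }
}
```

Sharing like this is only safe because neither reference can change the characters the other one sees.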

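The downside is that concatenation creates a lot of extra strings that are only transitional and immediately become garbage, which actually hurts memory performance. For those cases you have StringBuffer and StringBuilder (in Java; .NET also has StringBuilder) to save memory.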
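As a rough illustration of the difference, here is a sketch that builds the same comma-separated string both ways. The method names are hypothetical; the first version allocates a fresh String on every `+=`, while the second appends into one growing buffer.

```java
class ConcatDemo {
    // Each += creates a new String and leaves the previous one as garbage.
    static String joinWithConcat(int n) {
        String result = "";
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                result += ",";
            }
            result += i;
        }
        return result;
    }

    // A single StringBuilder is mutated in place; one String is created at the end.
    static String joinWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWithBuilder(5)); // 0,1,2,3,4
    }
}
```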

Other answers

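Immutability is good. See Effective Java. If you had to copy a String every time you passed it around, that would be a lot of error-prone code. You would also be confused about which modifications affect which references. In the same way that Integer has to be immutable to behave like int, Strings have to be immutable to act like primitives. In C++, passing strings by value does this without any explicit mention in the source code.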

String is not a primitive type, yet you normally want to use it with value semantics, i.e. like a value.

A value is something you can trust not to change behind your back. If you write String str = someExpr(); you don't want it to change unless you do something with str.

String, being an Object, naturally has pointer semantics; to get value semantics as well, it needs to be immutable.
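To make the "behind your back" point concrete, here is a small sketch contrasting a mutable StringBuilder with a String. The `shout` helpers are hypothetical callees invented for this example.

```java
class ValueSemanticsDemo {
    // Hypothetical callee that modifies the object it was handed.
    static void shout(StringBuilder sb) {
        sb.append("!");
    }

    // With a String the callee can only rebind its own local variable.
    static void shout(String s) {
        s = s + "!";
    }

    static String mutableCase() {
        StringBuilder sb = new StringBuilder("hi");
        shout(sb);
        return sb.toString(); // the caller's object was changed
    }

    static String immutableCase() {
        String s = "hi";
        shout(s);
        return s; // the caller's value is untouched
    }

    public static void main(String[] args) {
        System.out.println(mutableCase());   // hi!
        System.out.println(immutableCase()); // hi
    }
}
```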

Actually, the reasons strings are immutable in Java don't have much to do with security. The two main reasons are the following:

Thread safety:

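Strings are an extremely widely used type of object. It is therefore more or less guaranteed that they will be used in a multi-threaded environment. Strings are immutable to make sure that it is safe to share them among threads. Having immutable strings ensures that when thread A passes a string to another thread B, thread B cannot unexpectedly modify thread A's string.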

Not only does this help simplify the already pretty complicated task of multi-threaded programming, but it also helps with performance of multi-threaded applications. Access to mutable objects must somehow be synchronized when they can be accessed from multiple threads, to make sure that one thread doesn't attempt to read the value of your object while it is being modified by another thread. Proper synchronization is both hard to do correctly for the programmer, and expensive at runtime. Immutable objects cannot be modified and therefore do not need synchronization.
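A minimal sketch of what that looks like in practice: several threads read the same String with no locks at all. The class and method names are made up for this example; the only synchronization is `join()`, which is there to collect the results, not to protect the string.

```java
class SharedStringDemo {
    // Starts threadCount threads that all read the same String without locking,
    // then returns the sum of the lengths they observed.
    static int totalLengthSeenByThreads(String shared, int threadCount) {
        int[] lengths = new int[threadCount];
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            final int slot = i;
            // Each thread writes only to its own slot; the String itself is read-only.
            threads[i] = new Thread(() -> lengths[slot] = shared.length());
            threads[i].start();
        }
        int total = 0;
        try {
            for (int i = 0; i < threadCount; i++) {
                threads[i].join(); // wait for the reader, then collect its result
                total += lengths[i];
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for readers", e);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalLengthSeenByThreads("abc", 4)); // 12
    }
}
```

With a mutable buffer in place of the String, every reader would need a lock or a private copy to get the same guarantee.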

Performance:

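While String interning has been mentioned, it only represents a small gain in memory efficiency for Java programs. Only string literals (and other compile-time constants) are interned automatically. This means that only strings that are identical in your source code will share the same String object. If your program dynamically creates strings that are equal, they will be represented by different objects unless you explicitly call intern().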
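The following sketch shows the distinction, assuming a string assembled at runtime with StringBuilder. The class and method names are invented for the example; `String.intern()` is the standard library call that returns the pooled instance.

```java
class InternDemo {
    // A string built at runtime is a separate object from the equal literal.
    static boolean runtimeStringIsPooled() {
        String literal = "hello";
        String built = new StringBuilder("hel").append("lo").toString();
        return literal == built; // reference comparison on purpose
    }

    // intern() returns the canonical pooled instance for an equal string.
    static boolean internedStringIsPooled() {
        String literal = "hello";
        String built = new StringBuilder("hel").append("lo").toString();
        return literal == built.intern();
    }

    public static void main(String[] args) {
        System.out.println(runtimeStringIsPooled());  // false
        System.out.println(internedStringIsPooled()); // true
    }
}
```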

More importantly, making strings immutable allows them to share their internal data. For many string operations, this means that the underlying array of characters does not need to be copied. For example, say you want to take the first five characters of a String. In Java, you would call myString.substring(0,5). In older JDKs (before Java 7 update 6), what the substring() method does is simply create a new String object that shares myString's underlying char[] but knows that it starts at index 0 and ends at index 5 of that char[]; newer JDKs copy the characters instead. To put the shared-array version in graphical form, you would end up with the following:

 |               myString                  |
 v                                         v
"The quick brown fox jumps over the lazy dog"   <-- shared char[]
 ^   ^
 |   |  myString.substring(0,5)

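With a shared array, this kind of operation is extremely cheap, O(1), since it depends neither on the length of the original string nor on the length of the substring we need to extract. This behavior also has some memory benefits, since many strings can share their underlying char[].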
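To show the idea without depending on any particular JDK version, here is a toy sketch of a string-like type that shares its backing array. `SharedSlice` and its methods are hypothetical and are not how java.lang.String is implemented; the point is only that slicing copies nothing, and that this is safe because the array is never modified.

```java
final class SharedSlice {
    private final char[] data;  // never modified after construction
    private final int offset;   // where this view starts inside data
    private final int length;   // how many characters this view covers

    SharedSlice(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedSlice(char[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    // O(1): no characters are copied, only a new (offset, length) view is created.
    SharedSlice slice(int begin, int end) {
        if (begin < 0 || end > length || begin > end) {
            throw new IndexOutOfBoundsException();
        }
        return new SharedSlice(data, offset + begin, end - begin);
    }

    // True when both views are backed by the very same array.
    boolean sharesStorageWith(SharedSlice other) {
        return data == other.data;
    }

    @Override
    public String toString() {
        return new String(data, offset, length);
    }

    public static void main(String[] args) {
        SharedSlice whole = new SharedSlice("The quick brown fox jumps over the lazy dog");
        SharedSlice part = whole.slice(0, 5);
        System.out.println(part);                          // The q
        System.out.println(part.sharesStorageWith(whole)); // true
    }
}
```

One known cost of this design is that a tiny slice keeps the whole original array alive.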

I know this is a bump, but... are they really immutable? Consider the following.

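// Requires compiling with unsafe code enabled.
public static unsafe void MutableReplaceIndex(string s, char c, int i)
{
    // Pin the string so the GC cannot move it, then write through the raw pointer.
    fixed (char* ptr = s)
    {
        ptr[i] = c;
    }
}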

...

string s = "abc";
MutableReplaceIndex(s, '1', 0);
MutableReplaceIndex(s, '2', 1);
MutableReplaceIndex(s, '3', 2);
Console.WriteLine(s); // Prints 123

You could even make it an extension method.

public static class Extensions
{
    public static unsafe void MutableReplaceIndex(this string s, char c, int i)
    {
        fixed (char* ptr = s)
        {
            ptr[i] = c;
        }
    }
}

Which makes the following work:

s.MutableReplaceIndex('1', 0);
s.MutableReplaceIndex('2', 1);
s.MutableReplaceIndex('3', 2);

Conclusion: They're in an immutable state which is known by the compiler. Of course, the above only applies to .NET strings, as Java doesn't have pointers. However, a string can be made entirely mutable using pointers in C#. This is not how pointers are intended to be used, it has no practical use, and it is not safe; it is nevertheless possible, thus bending the whole "immutable" rule. You normally cannot modify an index of a string directly, and this is the only way. This could be prevented by disallowing pointers to strings or by making a copy when a string is pointed to, but neither is done, which makes strings in C# not entirely immutable.

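Thread safety and performance. If a string cannot be modified, it is safe and quick to pass a reference around among multiple threads. If strings were mutable, you would always have to copy all of the bytes of the string to a new instance, or provide synchronization. A typical application will read a string 100 times for every time that string needs to be modified. See Wikipedia on immutability.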