Why did they decide to make strings immutable in Java and .NET (and some other languages)? Why didn't they make them mutable?


It's mainly for security reasons. It's a lot harder to secure a system if you can't trust that your strings are tamper-proof.


There are at least two reasons.

First, security: http://www.javafaq.nu/java-article1060.html

The main reason why String made immutable was security. Look at this example: We have a file open method with login check. We pass a String to this method to process authentication which is necessary before the call will be passed to OS. If String was mutable it was possible somehow to modify its content after the authentication check before OS gets request from program then it is possible to request any file. So if you have a right to open text file in user directory but then on the fly when somehow you manage to change the file name you can request to open "passwd" file or any other. Then a file can be modified and it will be possible to login directly to OS.

Second, memory efficiency: http://hikrish.blogspot.com/2006/07/why-string-class-is-immutable.html

JVM internally maintains the "String Pool". To achieve the memory efficiency, JVM will refer the String object from pool. It will not create the new String objects. So, whenever you create a new string literal, JVM will check in the pool whether it already exists or not. If already present in the pool, just give the reference to the same object or create the new object in the pool. There will be many references point to the same String objects, if someone changes the value, it will affect all the references. So, Sun decided to make it immutable.


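Thread safety and performance. If a string cannot be modified, it is safe and quick to pass a reference around among multiple threads. If strings were mutable, you would always have to copy all of the bytes of the string to a new instance, or provide synchronization. A typical application will read a string 100 times for every time that string needs to be modified. See Wikipedia on immutability.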


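One factor is that, if strings were mutable, objects storing strings would have to be careful to store copies, lest their internal data change without notice. Given that strings are a fairly primitive type, like numbers, it is nice when one can treat them as if they were passed by value, even if they are passed by reference (which also helps to save on memory).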
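To make the "store copies" point concrete, here is a minimal Java sketch. StringBuilder stands in for a hypothetical mutable string, and the class names (DefensiveCopyDemo, LeakyAccount, SafeAccount) are invented for illustration.

```java
final class DefensiveCopyDemo {
    /** Keeps a reference to a mutable buffer: the caller can still change it later. */
    static final class LeakyAccount {
        private final StringBuilder owner;
        LeakyAccount(StringBuilder owner) { this.owner = owner; }
        String owner() { return owner.toString(); }
    }

    /** Keeps an immutable String: no defensive copy is needed. */
    static final class SafeAccount {
        private final String owner;
        SafeAccount(String owner) { this.owner = owner; }
        String owner() { return owner; }
    }

    public static void main(String[] args) {
        StringBuilder name = new StringBuilder("alice");
        LeakyAccount leaky = new LeakyAccount(name);
        SafeAccount safe = new SafeAccount(name.toString());

        // The caller mutates its buffer after handing it over.
        name.setLength(0);
        name.append("mallory");

        System.out.println(leaky.owner()); // the stored value changed behind the object's back
        System.out.println(safe.owner());  // the stored value is unaffected
    }
}
```

With a mutable type, LeakyAccount would have to copy the buffer in its constructor to be safe; with String that step simply disappears.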


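It's a trade-off. String literals go into the String pool, and when you create multiple identical strings they share the same memory. The designers figured this memory-saving technique would work well for the common case, since programs tend to grind over the same strings a lot.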

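The downside is that concatenation produces a lot of extra strings that are only transitional and immediately become garbage, which actually harms memory performance. For these cases you have StringBuffer and StringBuilder (in Java; .NET also has StringBuilder) to conserve memory.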
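A small Java sketch of the two approaches; the class and method names are made up for illustration, and no timing claims are implied.

```java
final class ConcatDemo {
    /** Repeated concatenation: every += produces a brand-new String and the old one becomes garbage. */
    static String joinWithConcat(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    /** StringBuilder: appends into one growable buffer, then creates a single String at the end. */
    static String joinWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWithConcat(5));
        System.out.println(joinWithBuilder(5));
    }
}
```

Both methods return the same text; the difference is only in how many intermediate String objects are created along the way.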


According to Effective Java, chapter 4, page 73, 2nd edition:

"There are many good reasons for this: Immutable classes are easier to design, implement, and use than mutable classes. They are less prone to error and are more secure. [...] "Immutable objects are simple. An immutable object can be in exactly one state, the state in which it was created. If you make sure that all constructors establish class invariants, then it is guaranteed that these invariants will remain true for all time, with no effort on your part. [...] Immutable objects are inherently thread-safe; they require no synchronization. They cannot be corrupted by multiple threads accessing them concurrently. This is far and away the easiest approach to achieving thread safety. In fact, no thread can ever observe any effect of another thread on an immutable object. Therefore, immutable objects can be shared freely [...]

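Other points from the same chapter:

Not only can you share immutable objects, but you can share their internals. […] Immutable objects make great building blocks for other objects, whether mutable or immutable. […] The only real disadvantage of immutable classes is that they require a separate object for each distinct value.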


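Immutability is good. See Effective Java. If you had to copy a String every time you passed it around, that would be a lot of error-prone code. You would also be confused about which modifications affect which references. In the same way that Integer has to be immutable to behave like int, String has to be immutable to behave like a primitive. In C++, passing strings by value does this without any explicit mention in the source code.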


String is not a primitive type, yet you normally want to use it with value semantics, i.e. like a value.

A value is something you can trust won't change behind your back. If you write String str = someExpr(); you don't want it to change unless you do something with str.

String, as an Object, naturally has pointer semantics; to get value semantics as well, it needs to be immutable.


The decision to make strings mutable in C++ causes a lot of problems; see the excellent article by Kevlin Henney about Mad COW Disease.

COW = Copy On Write.


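One should really ask, "why should X be mutable?" It's better to default to immutability, because of the benefits already mentioned by Princess Fluff. It should be the exception for something to be mutable.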

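Unfortunately most current programming languages default to mutability, but hopefully in the future the default will lean more toward immutability (see A Wish List for the Next Mainstream Programming Language).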


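Strings in Java are not truly immutable; you can change their values using reflection and/or class loading. You should not depend on that property for security. For examples see: Magic Trick In Java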


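Actually, the reasons strings are immutable in Java don't have much to do with security. The two main reasons are the following:

Thread safety:

Strings are an extremely widely used type of object. They are therefore more or less guaranteed to be used in a multi-threaded environment. Strings are immutable to make sure that it is safe to share them among threads. Having an immutable string ensures that when thread A passes a string to another thread B, thread B cannot unexpectedly modify thread A's string.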

Not only does this help simplify the already pretty complicated task of multi-threaded programming, but it also helps with performance of multi-threaded applications. Access to mutable objects must somehow be synchronized when they can be accessed from multiple threads, to make sure that one thread doesn't attempt to read the value of your object while it is being modified by another thread. Proper synchronization is both hard to do correctly for the programmer, and expensive at runtime. Immutable objects cannot be modified and therefore do not need synchronization.

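Performance:

While String interning has been mentioned, it only represents a small gain in memory efficiency for Java programs. Only string literals are interned. This means that only the strings which are the same in your source code will share the same String object. If your program dynamically creates strings that are the same, they will be represented by different objects.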
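A short Java sketch of that distinction, using reference comparison (==) to check object identity. The class and method names are invented for illustration; the "dynamic" string is built at runtime with a StringBuilder so that it cannot be a compile-time constant.

```java
final class InternDemo {
    /** Two identical literals refer to the same pooled String object. */
    static boolean literalsShareObject() {
        String a = "dog";
        String b = "dog";
        return a == b;
    }

    /** A string built at runtime is a separate object, even with equal contents. */
    static boolean dynamicSharesObject() {
        String a = "dog";
        String b = new StringBuilder("d").append("og").toString();
        return a == b;
    }

    /** Calling intern() explicitly returns the pooled instance. */
    static boolean internedSharesObject() {
        String a = "dog";
        String b = new StringBuilder("d").append("og").toString().intern();
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareObject());  // true
        System.out.println(dynamicSharesObject());  // false
        System.out.println(internedSharesObject()); // true
    }
}
```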

More importantly, immutable strings allow them to share their internal data. For many string operations, this means that the underlying array of characters does not need to be copied. For example, say you want to take the first five characters of a String. In Java, you would call myString.substring(0,5). In this case, what the substring() method does is simply create a new String object that shares myString's underlying char[] but knows that it starts at index 0 and ends at index 5 of that char[]. (This describes older JDKs; since Java 7 update 6, substring() copies the characters into a new array instead.) To put this in graphical form, you would end up with the following:

 |               myString                  |
 v                                         v
"The quick brown fox jumps over the lazy dog"   <-- shared char[]
 ^   ^
 |   |  myString.substring(0,5)

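This makes this kind of operation extremely cheap, O(1), since the operation depends neither on the length of the original string nor on the length of the substring we need to extract. This behavior also has some memory benefits, since many strings can share their underlying char[].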
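The sharing idea can be sketched with a tiny immutable wrapper. SharedSlice is a made-up class for illustration only, not the JDK's String implementation; it just shows why immutability makes it safe for two objects to look at the same char[].

```java
final class SharedSlice {
    private final char[] data; // never modified after construction
    private final int offset;  // where this slice starts inside data
    private final int length;  // how many chars belong to this slice

    SharedSlice(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedSlice(char[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    /** O(1): reuses the same array; only the window (offset, length) changes. */
    SharedSlice substring(int begin, int end) {
        if (begin < 0 || end > length || begin > end) {
            throw new IndexOutOfBoundsException();
        }
        return new SharedSlice(data, offset + begin, end - begin);
    }

    /** True if both slices are views over the very same array. */
    boolean sharesStorageWith(SharedSlice other) {
        return data == other.data;
    }

    @Override
    public String toString() {
        return new String(data, offset, length);
    }

    public static void main(String[] args) {
        SharedSlice whole = new SharedSlice("The quick brown fox jumps over the lazy dog");
        SharedSlice head = whole.substring(0, 5);
        System.out.println(head);                          // The q
        System.out.println(head.sharesStorageWith(whole)); // true
    }
}
```

The flip side of this design is that a short slice keeps the whole original array alive, which is a known cost of sharing.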


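Wow! I can't believe the misinformation here. Strings being immutable has nothing to do with security. If someone already has access to the objects in a running application (which has to be assumed if you are trying to guard against someone "hacking" a String in your app), then they certainly have plenty of other opportunities for hacking.

It's quite a novel idea that the immutability of String addresses threading issues. Hmmm ... I have an object that is being changed by two different threads. How do I resolve this? Synchronize access to the object? Naawww ... let's not let anyone change the object at all; that'll fix all of our messy concurrency issues! In fact, let's make all objects immutable, and then we can remove the synchronized construct from the Java language.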

The real reason (pointed out by others above) is memory optimization. It is quite common in any application for the same string literal to be used repeatedly. It is so common, in fact, that decades ago, many compilers made the optimization of storing only a single instance of a String literal. The drawback of this optimization is that runtime code that modifies a String literal introduces a problem because it is modifying the instance for all other code that shares it. For example, it would be not good for a function somewhere in an application to change the String literal "dog" to "cat". A printf("dog") would result in "cat" being written to stdout. For that reason, there needed to be a way of guarding against code that attempts to change String literals (i. e., make them immutable). Some compilers (with support from the OS) would accomplish this by placing String literal into a special readonly memory segment that would cause a memory fault if a write attempt was made.

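In Java this is known as interning. The Java compiler here is just following a standard memory optimization that compilers have done for decades. And to address the same issue of these String literals being modified at runtime, Java simply makes the String class immutable (i.e., it gives you no setters that would allow you to change the String content). Strings would not have to be immutable if interning of String literals did not occur.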


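A "string" is, for the most part, (used/treated/thought of/assumed to be) a meaningful atomic unit, just like a number.

Asking why the individual characters of a string are not mutable is therefore like asking why the individual bits of an integer are not mutable.

You should know why. Just think about it.

I hate to say it, but unfortunately we're debating this because our language sucks, and we're trying to use a single word, string, to describe a complex, contextually situated concept or class of object.

We perform calculations and comparisons with "strings" much as we do with numbers. If strings (or integers) were mutable, we'd have to write special code to lock their values into immutable local forms in order to perform any kind of calculation reliably. It is therefore best to think of a string as a numeric identifier, except that instead of being 16, 32, or 64 bits long, it could be hundreds of bits long.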

When someone says "string", we all think of different things. Those who think of it simply as a set of characters, with no particular purpose in mind, will of course be appalled that someone just decided that they should not be able to manipulate those characters. But the "string" class isn't just an array of characters. It's a STRING, not a char[]. There are some basic assumptions about the concept we refer to as a "string", and it generally can be described as meaningful, atomic unit of coded data like a number. When people talk about "manipulating strings", perhaps they're really talking about manipulating characters to build strings, and a StringBuilder is great for that. Just think a bit about what the word "string" truly means.

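Consider what it would be like if strings were mutable. The following API function could be tricked into returning information for a different user if the mutable username string were intentionally or unintentionally modified by another thread while the function is using it: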

string GetPersonalInfo( string username, string password )
{
    string stored_password = DBQuery.GetPasswordFor( username );
    if (password == stored_password)
    {
        // another thread modifies the mutable 'username' string here
        return DBQuery.GetPersonalInfoFor( username );
    }
    return null; // authentication failed
}
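A Java rendering of the same check-then-use problem, as a sketch only. StringBuilder stands in for a hypothetical mutable string, the two maps stand in for the DBQuery calls, and the Runnable simulates whatever another thread might do between the check and the use. All names here are invented for illustration.

```java
import java.util.Map;

final class CheckThenUseDemo {
    // Hypothetical stand-ins for the database lookups.
    static final Map<String, String> PASSWORDS = Map.of("alice", "pw-a", "bob", "pw-b");
    static final Map<String, String> INFO = Map.of("alice", "alice's data", "bob", "bob's data");

    /** Unsafe: the mutable username can change between the check and the use. */
    static String getPersonalInfoMutable(StringBuilder username, String password, Runnable otherThread) {
        if (!password.equals(PASSWORDS.get(username.toString()))) {
            return null; // authentication failed
        }
        otherThread.run(); // simulates another thread running at the worst moment
        return INFO.get(username.toString());
    }

    /** Safe: an immutable String cannot change between the check and the use. */
    static String getPersonalInfoImmutable(String username, String password, Runnable otherThread) {
        if (!password.equals(PASSWORDS.get(username))) {
            return null; // authentication failed
        }
        otherThread.run();
        return INFO.get(username);
    }

    public static void main(String[] args) {
        StringBuilder user = new StringBuilder("alice");
        // The "other thread" rewrites the buffer after the password check has passed.
        String leaked = getPersonalInfoMutable(user, "pw-a", () -> user.replace(0, user.length(), "bob"));
        System.out.println(leaked); // bob's data

        String ok = getPersonalInfoImmutable("alice", "pw-a", () -> { });
        System.out.println(ok); // alice's data
    }
}
```

With a mutable parameter the method would have to snapshot the username before the check; with String that snapshot is implicit.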

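Security isn't just about "access control"; it's also about "safety" and "guaranteeing correctness". If a method can't be easily written and depended upon to perform a simple calculation or comparison reliably, then it's not safe to call it, but it would be safe to call the programming language itself into question.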


Immutability is not so closely tied to security. For that, at least in .NET, you get the SecureString class.

Later edit: In Java you will find GuardedString, a similar implementation.


There is an exception to almost every rule:

using System;
using System.Runtime.InteropServices;

namespace Guess
{
    class Program
    {
        static void Main(string[] args)
        {
            const string str = "ABC";

            Console.WriteLine(str);
            Console.WriteLine(str.GetHashCode());

            var handle = GCHandle.Alloc(str, GCHandleType.Pinned);

            try
            {
                Marshal.WriteInt16(handle.AddrOfPinnedObject(), 4, 'Z');

                Console.WriteLine(str);
                Console.WriteLine(str.GetHashCode());
            }
            finally
            {
                handle.Free();
            }
        }
    }
}

I know this is a bump, but ... are they really immutable? Consider the following.

public static unsafe void MutableReplaceIndex(string s, char c, int i)
{
    fixed (char* ptr = s)
    {
        *((char*)(ptr + i)) = c;
    }
}

...

string s = "abc";
MutableReplaceIndex(s, '1', 0);
MutableReplaceIndex(s, '2', 1);
MutableReplaceIndex(s, '3', 2);
Console.WriteLine(s); // Prints 123

You could even make it an extension method.

public static class Extensions
{
    public static unsafe void MutableReplaceIndex(this string s, char c, int i)
    {
        fixed (char* ptr = s)
        {
            *((char*)(ptr + i)) = c;
        }
    }
}

Which makes the following work:

s.MutableReplaceIndex('1', 0);
s.MutableReplaceIndex('2', 1);
s.MutableReplaceIndex('3', 2);

Conclusion: They're in an immutable state which is known by the compiler. Of course the above only applies to .NET strings, as Java doesn't have pointers. However, a string can be made entirely mutable using pointers in C#. That is not how pointers are intended to be used, and it is neither practical nor safe; it is nevertheless possible, thus bending the whole "immutable" rule. You normally cannot modify a character of a string by index directly, and this is the only way. This could be prevented by disallowing pointers to strings or by making a copy when a string is pointed to, but neither is done, which makes strings in C# not entirely immutable.