What is String interning in Java, when should I use it, and why?


http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#intern()

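Basically, calling String.intern() on a series of strings ensures that all strings with the same contents share the same memory. So if you have a list of names in which "john" appears 1000 times, interning ensures that only one "john" is actually allocated memory.

This can be useful for reducing the memory requirements of your program. But be aware that the cache is maintained by the JVM in the permanent memory pool, which is usually limited in size compared to the heap, so you should not use intern if you don't have many duplicate values.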
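The deduplication described above can be sketched as follows. This is only an illustration: the class and method names (InternDedupDemo, makeNames, internAll, distinctInstances) are made up for this example, and an IdentityHashMap is used purely as a way to count objects by reference rather than by equals().

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class InternDedupDemo {

    // Counts how many distinct String objects (by identity, not by equals) the list holds.
    static int distinctInstances(List<String> strings) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String s : strings) {
            seen.put(s, Boolean.TRUE);
        }
        return seen.size();
    }

    // Builds `count` separate String objects that all contain "john".
    static List<String> makeNames(int count) {
        List<String> names = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            names.add(new String("john")); // new String(...) always creates a fresh object
        }
        return names;
    }

    // Returns a list in which every element is replaced by its interned instance.
    static List<String> internAll(List<String> strings) {
        List<String> result = new ArrayList<>(strings.size());
        for (String s : strings) {
            result.add(s.intern());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = makeNames(1000);
        System.out.println(distinctInstances(names));            // 1000
        System.out.println(distinctInstances(internAll(names))); // 1
    }
}
```

Once the original list is dropped, the 1000 separate copies become garbage and only the single interned "john" stays reachable.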


More on the memory constraints of using intern()

On one hand, it is true that you can remove String duplicates by internalizing them. The problem is that the internalized strings go to the Permanent Generation, which is an area of the JVM that is reserved for non-user objects, like Classes, Methods and other internal JVM objects. The size of this area is limited, and is usually much smaller than the heap. Calling intern() on a String has the effect of moving it out from the heap into the permanent generation, and you risk running out of PermGen space.

-- From: http://www.codeinstructions.com/2009/01/busting-javalangstringintern-myths.html


Starting with JDK 7 (I mean in HotSpot), something has changed.

In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation, and thus may require heap sizes to be adjusted. Most applications will see only relatively small differences in heap usage due to this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.

-- From Java SE 7 Features and Enhancements

Update: From Java 7 onwards, interned strings are stored in the main heap. http://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html#jdk7changes


There are some "catchy" interview questions, such as why you get "equals!" printed when you execute the piece of code below.

String s1 = "testString";
String s2 = "testString";
if(s1 == s2) System.out.println("equals!");

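If you want to compare strings, you should use equals(). The code above prints "equals!" because the compiler has already interned testString for you. You can intern strings yourself using the intern method, as shown in the previous answers.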
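As a rough sketch of the difference between == and equals() discussed here, the following puts the four interesting comparisons side by side. The class name LiteralComparisonDemo and the compare() helper are hypothetical, chosen only for this illustration.

```java
class LiteralComparisonDemo {

    // Returns, in order: literal == literal, literal == new String,
    // literal.equals(new String), literal == new String interned.
    static boolean[] compare() {
        String s1 = "testString";             // literal, interned automatically
        String s2 = "testString";             // same literal, same instance
        String s3 = new String("testString"); // explicit new object on the heap
        return new boolean[] {
            s1 == s2,
            s1 == s3,
            s1.equals(s3),
            s1 == s3.intern()
        };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println("literal == literal:        " + r[0]); // true
        System.out.println("literal == new String:     " + r[1]); // false
        System.out.println("literal.equals(new):       " + r[2]); // true
        System.out.println("literal == new.intern():   " + r[3]); // true
    }
}
```

Only equals() gives the answer most code actually wants, regardless of how the strings were created.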


JLS

JLS 7 3.10.5 defines it and gives a practical example:

Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.

Example 3.10.5-1. String Literals

The program consisting of the compilation unit (§7.3):

package testPackage;
class Test {
    public static void main(String[] args) {
        String hello = "Hello", lo = "lo";
        System.out.print((hello == "Hello") + " ");
        System.out.print((Other.hello == hello) + " ");
        System.out.print((other.Other.hello == hello) + " ");
        System.out.print((hello == ("Hel"+"lo")) + " ");
        System.out.print((hello == ("Hel"+lo)) + " ");
        System.out.println(hello == ("Hel"+lo).intern());
    }
}
class Other { static String hello = "Hello"; }

and the compilation unit:

package other;
public class Other { public static String hello = "Hello"; }

produces the output:

true true true true false true
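One detail the JLS example leaves implicit is that "Hel"+lo is false only because lo is not a constant. The sketch below, with hypothetical names (ConstantExpressionDemo and its three methods), assumes the JLS rule that a final variable initialised with a literal is a constant variable, so a concatenation using it is a constant expression and is interned like a literal.

```java
class ConstantExpressionDemo {

    // lo is final and initialised with a literal, so "Hel" + lo is a
    // compile-time constant expression and refers to the interned "Hello".
    static boolean withFinalLocal() {
        String hello = "Hello";
        final String lo = "lo";
        return hello == ("Hel" + lo);
    }

    // lo is an ordinary variable, so the concatenation happens at run time
    // and yields a newly created String object.
    static boolean withPlainLocal() {
        String hello = "Hello";
        String lo = "lo";
        return hello == ("Hel" + lo);
    }

    // Interning the run-time result maps it back to the pooled instance.
    static boolean withPlainLocalInterned() {
        String hello = "Hello";
        String lo = "lo";
        return hello == ("Hel" + lo).intern();
    }

    public static void main(String[] args) {
        System.out.println(withFinalLocal());         // true
        System.out.println(withPlainLocal());         // false
        System.out.println(withPlainLocalInterned()); // true
    }
}
```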

JVMS

JVMS 7 5.1 says that interning is implemented magically and efficiently with a dedicated CONSTANT_String_info struct (unlike most other objects, which have more generic representations):

A string literal is a reference to an instance of class String, and is derived from a CONSTANT_String_info structure (§4.4.3) in the binary representation of a class or interface. The CONSTANT_String_info structure gives the sequence of Unicode code points constituting the string literal.

The Java programming language requires that identical string literals (that is, literals that contain the same sequence of code points) must refer to the same instance of class String (JLS §3.10.5). In addition, if the method String.intern is called on any string, the result is a reference to the same class instance that would be returned if that string appeared as a literal. Thus, the following expression must have the value true:

("a" + "b" + "c").intern() == "abc"

To derive a string literal, the Java Virtual Machine examines the sequence of code points given by the CONSTANT_String_info structure.

If the method String.intern has previously been called on an instance of class String containing a sequence of Unicode code points identical to that given by the CONSTANT_String_info structure, then the result of string literal derivation is a reference to that same instance of class String.

Otherwise, a new instance of class String is created containing the sequence of Unicode code points given by the CONSTANT_String_info structure; a reference to that class instance is the result of string literal derivation. Finally, the intern method of the new String instance is invoked.
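The guarantee quoted above can be checked with a string assembled at run time. This is a minimal sketch with made-up names (InternContractDemo, build, contractHolds); it also exercises the rule stated in the String.intern() Javadoc that s.intern() == t.intern() is true if and only if s.equals(t) is true.

```java
class InternContractDemo {

    // Concatenates the parts at run time; StringBuilder.toString() allocates a new String.
    static String build(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    // Checks that the interned references are identical exactly when the contents are equal.
    static boolean contractHolds(String s, String t) {
        return (s.intern() == t.intern()) == s.equals(t);
    }

    public static void main(String[] args) {
        String abc = build("a", "b", "c");
        System.out.println(abc == "abc");               // false: abc is a separate object
        System.out.println(abc.intern() == "abc");      // true: intern() yields the literal's instance
        System.out.println(contractHolds(abc, "abc"));  // true
        System.out.println(contractHolds(abc, "abd"));  // true
    }
}
```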

Bytecode

Let's decompile some OpenJDK 7 bytecode to see interning in action.

If we decompile:

public class StringPool {
    public static void main(String[] args) {
        String a = "abc";
        String b = "abc";
        String c = new String("abc");
        System.out.println(a);
        System.out.println(b);
        System.out.println(a == c);
    }
}

we have on the constant pool:

#2 = String             #32   // abc
[...]
#32 = Utf8               abc

and main:

 0: ldc           #2          // String abc
 2: astore_1
 3: ldc           #2          // String abc
 5: astore_2
 6: new           #3          // class java/lang/String
 9: dup
10: ldc           #2          // String abc
12: invokespecial #4          // Method java/lang/String."<init>":(Ljava/lang/String;)V
15: astore_3
16: getstatic     #5          // Field java/lang/System.out:Ljava/io/PrintStream;
19: aload_1
20: invokevirtual #6          // Method java/io/PrintStream.println:(Ljava/lang/String;)V
23: getstatic     #5          // Field java/lang/System.out:Ljava/io/PrintStream;
26: aload_2
27: invokevirtual #6          // Method java/io/PrintStream.println:(Ljava/lang/String;)V
30: getstatic     #5          // Field java/lang/System.out:Ljava/io/PrintStream;
33: aload_1
34: aload_3
35: if_acmpne     42
38: iconst_1
39: goto          43
42: iconst_0
43: invokevirtual #7          // Method java/io/PrintStream.println:(Z)V

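Note how:

- 0 and 3: the same ldc #2 constant is loaded (the literals)
- 12: a new string instance is created (with #2 as argument)
- 35: a and c are compared as regular objects with if_acmpne

The representation of constant strings is quite magic in the bytecode:

- it has a dedicated CONSTANT_String_info structure, unlike regular objects (e.g. new String)
- the struct points to a CONSTANT_Utf8_info structure that contains the data. That is the only data needed to represent the string.

The JVMS quote above seems to say that whenever the Utf8 pointed to is the same, identical instances are loaded by ldc.

I have done similar tests for fields, and:

- static final String s = "abc" points to the constant table through the ConstantValue attribute
- non-final fields don't have that attribute, but can still be initialized with ldc

Conclusion: there is direct bytecode support for the string pool, and the memory representation is efficient.

Bonus: compare that to the Integer pool, which does not have direct bytecode support (i.e. no CONSTANT_String_info analogue).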
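For comparison, the Integer pool mentioned above is implemented in library code rather than in the class file format. The sketch below uses a hypothetical helper, sameBoxed, and relies only on the documented behaviour of Integer.valueOf, which always caches values from -128 to 127 and may cache others.

```java
class IntegerCacheDemo {

    // Boxes the same int twice and reports whether both boxes are the same object.
    static boolean sameBoxed(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b;
    }

    public static void main(String[] args) {
        // Values in -128..127 are always served from the cache.
        System.out.println(sameBoxed(127));
        System.out.println(sameBoxed(-128));
        // Outside that range the result depends on the JVM and its settings,
        // so no particular output should be relied on here.
        System.out.println(sameBoxed(1000));
        // equals() compares the numeric value and is always safe.
        System.out.println(Integer.valueOf(1000).equals(Integer.valueOf(1000)));
    }
}
```

As with strings, == on boxed integers only happens to work for pooled values, so equals() remains the safe comparison.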


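String interning is an optimization technique used by the compiler. If you have two identical string literals in one compilation unit, the generated code ensures that only one string object is created for all instances of that literal (characters enclosed in double quotes) within the assembly.

I come from a C# background, so I can explain with an example from there: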

object obj = "Int32";
string str1 = "Int32";
string str2 = typeof(int).Name;

Output of the following comparisons:

Console.WriteLine(obj == str1); // true
Console.WriteLine(str1 == str2); // true    
Console.WriteLine(obj == str2); // false !?

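Note 1: Objects are compared by reference.

Note 2: typeof(int).Name is evaluated by a reflection method, so it is not evaluated at compile time. Which kind of comparison is used (by reference or by string contents) is decided at compile time from the static types of the operands.

Analysis of the results:

1) true, because both contain the same literal, so the generated code has only one object referencing "Int32". See Note 1.

2) true, because both operands are strings and their contents are the same.

3) false, because obj is compared by reference and str2 does not refer to the same literal object; it was created at run time. See Note 2.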


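Update for Java 8 or later. In Java 8, the PermGen (Permanent Generation) space was removed and replaced by Metaspace. The String pool lives in the JVM heap, where it had already been moved in Java 7.

Compared with Java 7, the String pool size in the heap is increased. Therefore, you have more space for interned Strings, but less memory for the rest of the application.

One more thing: as you already know, when comparing 2 objects in Java, '==' compares the object references and 'equals' compares the object contents.

Let's check this code: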

String value1 = "70";
String value2 = "70";
String value3 = new Integer(70).toString();

Result:

value1 == value2 ---> true

value1 == value3 ---> false

value1.equals(value3) ---> true

value1 == value3.intern() ---> true

That's why you should use 'equals' to compare 2 String objects. And that is where intern() comes in useful.


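Since Strings are objects, and since all objects in Java are always stored only in the heap space, all Strings are stored in the heap space. However, Java keeps Strings created without using the new keyword in a special area of the heap space, which is called the "string pool". Java keeps Strings created using the new keyword in the regular heap space.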

The purpose of the string pool is to maintain a set of unique strings. Any time you create a new string without using the new keyword, Java checks whether the same string already exists in the string pool. If it does, Java returns a reference to the same String object and if it does not, Java creates a new String object in the string pool and returns its reference. So, for example, if you use the string "hello" twice in your code as shown below, you will get a reference to the same string. We can actually test this theory out by comparing two different reference variables using the == operator as shown in the following code:

String str1 = "hello";
String str2 = "hello";
System.out.println(str1 == str2); //prints true

String str3 = new String("hello");
String str4 = new String("hello");

System.out.println(str1 == str3); //prints false
System.out.println(str3 == str4); //prints false 

The == operator simply checks whether two references point to the same object and returns true if they do. In the above code, str2 gets the reference to the same String object which was created earlier. However, str3 and str4 get references to two entirely different String objects. That is why str1 == str2 returns true but str1 == str3 and str3 == str4 return false. In fact, when you do new String("hello"); two String objects are created instead of just one if this is the first time the string "hello" is used anywhere in the program - one in the string pool because of the use of a quoted string, and one in the regular heap space because of the use of the new keyword.

The string pool is Java's way of saving program memory by avoiding the creation of multiple String objects containing the same value. For a string created using the new keyword, you can get the corresponding string from the string pool by using String's intern method. This is called "interning" of string objects. For example,

String str1 = "hello";
String str2 = new String("hello");
String str3 = str2.intern(); //get an interned string obj

System.out.println(str1 == str2); //prints false
System.out.println(str1 == str3); //prints true

OCP Java SE 11 Programmer, Deshmukh


Java's intern() method checks whether an equal String object is already present in the SCP (string constant pool). If it is, intern() returns that object; if not, it adds the string to the SCP and returns a reference to it.

For example:

String s1 = new String("abc");
String s2 = "abc";
String s3 = "abc";

s1 == s2 // false: s1 refers to an object on the heap; the literal "abc" also created an object in the SCP (s1 holds no reference to it), and s2 refers to that SCP object
s2 == s3 // true

Now if we intern s1:

s1 = s1.intern();

The JVM checks whether the pool already contains a string with the value "abc". Since it does, a reference to that string is returned.
Notice that we assign the result, s1 = s1.intern(), so s1 now refers to the string pool object with the value "abc".
At this point all three variables refer to the same object in the string pool, so s1 == s2 now returns true.

If we have a heap object reference and want the corresponding SCP object reference, we should use the intern() method.

Example:

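class InternDemo
{
    public static void main(String[] args)
    {
        String s1 = new String("smith"); // object on the regular heap
        String s2 = s1.intern();         // reference to the SCP object
        String s3 = "smith";             // the literal refers to the same SCP object
        System.out.println(s2 == s3);    // true
    }
}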

[Figure: flow chart of intern()]