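This has been a long-standing complaint about Java, but it's largely meaningless, and usually based on looking at the wrong information. The usual phrasing is something like "Hello World on Java takes 10 megabytes! Why does it need that?" Well, here's a way to make Hello World on a 64-bit JVM claim to take over 4 gigabytes ... at least by one form of measurement.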
java -Xms1024m -Xmx4096m com.example.Hello
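Different Ways to Measure Memory
On Linux, the top command gives you several different numbers for memory. Here's what it says about the Hello World example: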
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2120 kgregory 20 0 4373m 15m 7152 S 0 0.2 0:00.10 java
VIRT is the virtual memory space: the sum of everything in the virtual memory map (see below). It is largely meaningless, except when it isn't (see below).
RES is the resident set size: the number of pages that are currently resident in RAM. In almost all cases, this is the only number that you should use when saying "too big." But it's still not a very good number, especially when talking about Java.
SHR is the amount of resident memory that is shared with other processes. For a Java process, this is typically limited to shared libraries and memory-mapped JAR files. In this example, I only had one Java process running, so I suspect that the roughly 7 MB shown (top reports this column as 7152 KiB) is a result of libraries used by the OS.
The SWAP column isn't displayed by default, and isn't shown here. It indicates the amount of virtual memory that is currently resident on disk, whether or not it's actually in the swap space. The OS is very good about keeping active pages in RAM, and the only cures for swapping are (1) buy more memory, or (2) reduce the number of processes, so it's best to ignore this number.
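To make the VIRT and RES numbers less abstract, here is a minimal sketch of a Java program reading its own figures on Linux. It assumes the process can read /proc/self/status and that the VmSize and VmRSS entries there correspond to top's VIRT and RES; the class name ProcStatusMemory is made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reads the "Vm*" entries of /proc/self/status (Linux only).
class ProcStatusMemory {

    // Parses lines such as "VmRSS:     15360 kB" into a map of name -> size in kB.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String line : lines) {
            if (!line.startsWith("Vm")) {
                continue;                              // not a memory entry
            }
            String[] parts = line.split("\\s+");       // e.g. ["VmRSS:", "15360", "kB"]
            if (parts.length < 2) {
                continue;
            }
            try {
                result.put(parts[0].replace(":", ""), Long.parseLong(parts[1]));
            } catch (NumberFormatException e) {
                // skip anything that does not look like "<name>: <number> kB"
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path status = Paths.get("/proc/self/status");
        if (!Files.isReadable(status)) {
            System.out.println("/proc/self/status is not available on this system");
            return;
        }
        Map<String, Long> vm = parse(Files.readAllLines(status));
        System.out.println("VmSize (compare with VIRT): " + vm.get("VmSize") + " kB");
        System.out.println("VmRSS  (compare with RES):  " + vm.get("VmRSS") + " kB");
    }
}
```

Running it with different -Xmx values should move VmSize far more than VmRSS, which is the point of the comparison above.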
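The situation for Windows Task Manager is a bit more complicated. Under Windows XP, there are "Memory Usage" and "Virtual Memory Size" columns, but the official documentation is silent on what they mean. Windows Vista and Windows 7 add more columns, and they're actually documented. Of these, the "Working Set" measurement is the most useful; it roughly corresponds to the sum of RES and SHR on Linux.
Understanding the Virtual Memory Map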
The virtual memory consumed by a process is the total of everything that's in the process memory map. This includes data (e.g., the Java heap), but also all of the shared libraries and memory-mapped files used by the program. On Linux, you can use the pmap command to see all of the things mapped into the process space (from here on out I'm only going to refer to Linux, because it's what I use; I'm sure there are equivalent tools for Windows). Here's an excerpt from the memory map of the "Hello World" program; the entire memory map is over 100 lines long, and it's not unusual to have a thousand-line list.
0000000040000000 36K r-x-- /usr/local/java/jdk-1.6-x64/bin/java
0000000040108000 8K rwx-- /usr/local/java/jdk-1.6-x64/bin/java
0000000040eba000 676K rwx-- [ anon ]
00000006fae00000 21248K rwx-- [ anon ]
00000006fc2c0000 62720K rwx-- [ anon ]
0000000700000000 699072K rwx-- [ anon ]
000000072aab0000 2097152K rwx-- [ anon ]
00000007aaab0000 349504K rwx-- [ anon ]
00000007c0000000 1048576K rwx-- [ anon ]
...
00007fa1ed00d000 1652K r-xs- /usr/local/java/jdk-1.6-x64/jre/lib/rt.jar
...
00007fa1ed1d3000 1024K rwx-- [ anon ]
00007fa1ed2d3000 4K ----- [ anon ]
00007fa1ed2d4000 1024K rwx-- [ anon ]
00007fa1ed3d4000 4K ----- [ anon ]
...
00007fa1f20d3000 164K r-x-- /usr/local/java/jdk-1.6-x64/jre/lib/amd64/libjava.so
00007fa1f20fc000 1020K ----- /usr/local/java/jdk-1.6-x64/jre/lib/amd64/libjava.so
00007fa1f21fb000 28K rwx-- /usr/local/java/jdk-1.6-x64/jre/lib/amd64/libjava.so
...
00007fa1f34aa000 1576K r-x-- /lib/x86_64-linux-gnu/libc-2.13.so
00007fa1f3634000 2044K ----- /lib/x86_64-linux-gnu/libc-2.13.so
00007fa1f3833000 16K r-x-- /lib/x86_64-linux-gnu/libc-2.13.so
00007fa1f3837000 4K rwx-- /lib/x86_64-linux-gnu/libc-2.13.so
...
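A quick explanation of the format: each row starts with the virtual memory address of the segment. This is followed by the segment size, permissions, and the source of the segment. This last item is either a file or "anon", which indicates a block of memory allocated via mmap.
Starting from the top, we have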
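As an illustration of that row format, here is a small sketch that splits one pmap row into its four fields. It assumes the layout shown in the excerpt above (hex address, size with a trailing K, permission flags, then the source); the PmapLine class is hypothetical and is not part of any pmap tooling.

```java
// Hypothetical value class for one row of pmap output, as laid out in the excerpt above.
class PmapLine {
    final long address;        // start of the segment in the virtual address space
    final long sizeKb;         // segment size in kilobytes
    final String permissions;  // e.g. "rwx--" or "r-xs-"
    final String source;       // a file path, or "[ anon ]" for anonymous memory

    PmapLine(long address, long sizeKb, String permissions, String source) {
        this.address = address;
        this.sizeKb = sizeKb;
        this.permissions = permissions;
        this.source = source;
    }

    static PmapLine parse(String line) {
        // Limit the split to 4 pieces so that "[ anon ]" stays in one field.
        String[] f = line.trim().split("\\s+", 4);
        if (f.length < 4 || !f[1].endsWith("K")) {
            throw new IllegalArgumentException("unexpected pmap row: " + line);
        }
        long address = Long.parseUnsignedLong(f[0], 16);
        long sizeKb = Long.parseLong(f[1].substring(0, f[1].length() - 1));
        return new PmapLine(address, sizeKb, f[2], f[3]);
    }

    boolean isAnonymous() {
        return source.contains("anon");
    }

    public static void main(String[] args) {
        PmapLine row = parse("00000007c0000000 1048576K rwx-- [ anon ]");
        System.out.println(row.sizeKb / 1024 + " MB, anonymous=" + row.isAnonymous());
    }
}
```

Summing sizeKb over every row of a full map should land close to the VIRT figure that top reports.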
The JVM loader (i.e., the program that gets run when you type java). This is very small; all it does is load in the shared libraries where the real JVM code is stored.
A bunch of anon blocks holding the Java heap and internal data. This is a Sun JVM, so the heap is broken into multiple generations, each of which is its own memory block. Note that the JVM allocates virtual memory space based on the -Xmx value; this allows it to have a contiguous heap. The -Xms value is used internally to say how much of the heap is "in use" when the program starts, and to trigger garbage collection as that limit is approached.
A memory-mapped JAR file, in this case the file that holds the "JDK classes." When you memory-map a JAR, you can access the files within it very efficiently (versus reading it from the start each time). The Sun JVM will memory-map all JARs on the classpath; if your application code needs to access a JAR, you can also memory-map it.
Per-thread data for two threads. The 1M block is the thread stack. I didn't have a good explanation for the 4k block, but @ericsoe identified it as a "guard block": it does not have read/write permissions, so it will cause a segmentation fault if accessed, and the JVM catches that and translates it to a StackOverflowError. For a real app, you will see dozens if not hundreds of these entries repeated through the memory map.
One of the shared libraries that holds the actual JVM code. There are several of these.
The shared library for the C standard library. This is just one of many things that the JVM loads that are not strictly part of Java.
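The list above mentions that application code can memory-map files too. As a rough sketch of what that looks like from Java, the following maps a temporary file with FileChannel.map and reads it back; the class and method names are invented for the example, and a plain text file stands in for a JAR. While the mapping is alive I would expect it to appear in pmap as a file-backed segment, much like the rt.jar row.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo: maps a file into the process address space and reads it.
class MappedFileDemo {

    // Maps the whole file read-only and copies its bytes into a String.
    static String readMapped(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);   // pages are pulled into RAM only as they are touched
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // Writes text to a temporary file, reads it back through a mapping, then cleans up.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("mapped-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readMapped(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Hello World"));
    }
}
```

The mapping adds the file's size to the virtual memory total straight away, while resident memory grows only for the pages that are actually read.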
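The shared libraries are particularly interesting: each shared library has at least two segments: a read-only segment containing the library code, and a read-write segment that contains global per-process data for the library (I don't know what the segment with no permissions is; I've only seen it on x64 Linux). The read-only portion of the library can be shared between all processes that use the library; for example, libc has 1.5M of virtual memory space that can be shared.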
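When is Virtual Memory Size Important?
The virtual memory map contains a lot of stuff. Some of it is read-only, some of it is shared, and some of it is allocated but never touched (e.g., almost all of the 4 GB of heap in this example). But the operating system is smart enough to only load what it needs, so the virtual memory size is largely irrelevant.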
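The gap between heap that is reserved and heap that is actually touched can be seen from inside the JVM. This is a minimal sketch using the standard MemoryMXBean; the class name HeapReservation and the mb helper are made up for the example. Started with the -Xms1024m -Xmx4096m flags from the top of this answer, I would expect max to be near 4 GB and used to be only a few megabytes, though the exact figures depend on the JVM and collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical demo: compares used, committed, and maximum heap sizes.
class HeapReservation {

    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    // Formats a byte count as whole megabytes; MemoryUsage reports -1 for "undefined".
    static String mb(long bytes) {
        return bytes < 0 ? "undefined" : (bytes / (1024 * 1024)) + " MB";
    }

    public static void main(String[] args) {
        MemoryUsage usage = heap();
        System.out.println("used:      " + mb(usage.getUsed()));       // live objects plus garbage not yet collected
        System.out.println("committed: " + mb(usage.getCommitted()));  // memory the JVM has claimed for the heap
        System.out.println("max:       " + mb(usage.getMax()));        // upper bound, driven by -Xmx
    }
}
```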
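Where virtual memory size is important is if you're running on a 32-bit operating system, where you can only allocate 2 GB (or, in some cases, 3 GB) of process address space. In that case you're dealing with a scarce resource, and might have to make tradeoffs, such as reducing your heap size in order to memory-map a large file or create lots of threads.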
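To put rough numbers on that tradeoff, here is a back-of-the-envelope sketch. All of the figures in main are invented for illustration (a 1 GB heap, 200 threads with 1 MB stacks, one 512 MB mapped file), and the calculation ignores the JVM's own code, shared libraries, and other internal data, so the real headroom would be smaller.

```java
// Hypothetical budget calculator for a 32-bit process address space.
class AddressSpaceBudget {

    static final long MB = 1024L * 1024L;

    // Address space left after the heap, the thread stacks, and the mapped files.
    static long remainingBytes(long addressSpaceBytes, long heapBytes,
                               int threads, long stackBytesPerThread,
                               long mappedFileBytes) {
        return addressSpaceBytes
                - heapBytes
                - (long) threads * stackBytesPerThread
                - mappedFileBytes;
    }

    public static void main(String[] args) {
        long remaining = remainingBytes(
                2048 * MB,   // 2 GB process address space
                1024 * MB,   // assumed -Xmx1024m heap
                200, 1 * MB, // assumed 200 threads, 1 MB stack each
                512 * MB);   // assumed one 512 MB memory-mapped file
        System.out.println("remaining: " + remaining / MB + " MB");
    }
}
```

With these assumed figures only 312 MB of address space is left, so doubling the heap or the mapped file would not fit.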
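But, given that 64-bit machines are ubiquitous, I don't think it will be long before virtual memory size is a completely irrelevant statistic.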
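When is Resident Set Size Important?
Resident set size (RSS) is the portion of the virtual memory space that is actually in RAM. If your RSS grows to be a significant portion of your total physical memory, it might be time to start worrying. If your RSS grows to take up all your physical memory, and your system starts swapping, it's well past time to start worrying.
But RSS is also misleading, especially on a lightly loaded machine. The operating system doesn't expend a lot of effort reclaiming the pages used by a process. There's little benefit to be gained by doing so, and there's the potential for an expensive page fault if the process touches the page in the future. As a result, the RSS statistic may include lots of pages that aren't in active use.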
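Bottom Line
Unless you're swapping, don't get overly concerned about what the various memory statistics are telling you. The one caveat is that an ever-growing RSS may indicate some sort of memory leak.
With a Java program, it's far more important to pay attention to what's happening in the heap. The total amount of space consumed is important, and there are some steps that you can take to reduce it. More important is the amount of time that you spend in garbage collection, and which parts of the heap are getting collected.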
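For the garbage collection side, one way to get a first look from inside the program is the standard GarbageCollectorMXBean interface. The sketch below prints, for each collector, how many collections have run, the accumulated time, and which memory pools it manages; the class name GcReport is invented, and the collector and pool names printed will vary with the JVM and its settings.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Arrays;

// Hypothetical demo: summarises garbage collection activity for the running JVM.
class GcReport {

    // Sums the approximate accumulated collection time across all collectors, in milliseconds.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();   // -1 means the value is undefined
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms"
                    + ", pools=" + Arrays.toString(gc.getMemoryPoolNames()));
        }
        System.out.println("total GC time: " + totalGcMillis() + " ms");
    }
}
```

Calling this periodically and comparing the totals against wall-clock time gives a rough idea of how much of the run is spent collecting.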
Accessing the disk (i.e., a database) is expensive, and memory is cheap. If you can trade one for the other, do so.