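After answering a question about how to force-free objects in Java with System.gc() (the person was clearing a 1.5 GB HashMap), I was told that calling System.gc() manually is bad practice, but the comments were not entirely convincing. In addition, no one seemed to dare to upvote or downvote my answer.

I was told there that it is bad practice, but then I was also told that garbage collector runs no longer systematically stop the world, and that the JVM may effectively treat the call only as a hint, so I am somewhat at a loss.

I understand that the JVM usually knows better than you when it needs to reclaim memory. I also understand that worrying about a few kilobytes of data is silly, and that even megabytes of data are not what they were a few years ago. But still, 1.5 gigabytes? And you know there is roughly 1.5 GB of data hanging around in memory; it is not a shot in the dark. Is System.gc() systematically bad, or is there some point at which it becomes okay?

So the question is actually twofold: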

Why is or isn't it bad practice to call System.gc()? Is it really merely a hint to the JVM under certain implementations, or is it always a full collection cycle? Are there really garbage collector implementations that can do their work without stopping the world? Please shed some light on the various assertions people have made in the comments to my answer.

Where's the threshold? Is it never a good idea to call System.gc(), or are there times when it's acceptable? If so, what are those times?


Current answer

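First, there is a difference between the spec and reality. The spec says that System.gc() is a hint that the GC should run and that the VM is free to ignore it. The reality is that the VM will never ignore a call to System.gc().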

Calling for a GC comes with a non-trivial overhead, and if you do it at some random point in time you will likely see no reward for your efforts. On the other hand, a naturally triggered collection is very likely to recoup the cost of the call. If you have information indicating that a GC should be run, then you can call System.gc() and you should see benefits. However, in my experience this happens only in a few edge cases, as it is very unlikely that you will have enough information to know if and when System.gc() should be called.

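One example listed here is hitting the garbage can in your IDE. If you are off to a meeting, why not hit it? The overhead is not going to affect you, and the heap might be cleaned up by the time you get back. Do this in a production system, however, and frequent calls to collect will bring it to a grinding halt! Even occasional calls, such as those made by RMI, can be disruptive to performance.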

Other answers

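Sometimes (not often!) you really do know more about past, current and future memory usage than the runtime does. This does not happen very often, and I would claim it never happens in a web application while normal pages are being served.

Many years ago I worked on a report generator that:

Had a single thread.
Read a "report request" from a queue.
Loaded the data needed for the report from the database.
Generated the report and emailed it out.
Repeated forever, sleeping when there were no outstanding requests.
Did not reuse any data between reports and did not do any caching.

Firstly, since it was not real time and the users expected to wait for a report, a delay while the GC ran was not an issue, but we did need to produce reports faster than they were requested.

Looking at the above outline of the process, it is clear that:

We know there will be very few live objects just after a report has been emailed out, as the next request has not started being processed yet.
It is well known that the cost of running a garbage collection cycle depends on the number of live objects; the amount of garbage has little effect on the cost of a GC run.
When the queue is empty there is nothing better to do than run the GC.

Therefore it was clearly well worthwhile doing a GC run whenever the request queue was empty; there was no downside to this.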
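The worker loop described above might be sketched roughly as follows. This is only an illustration under stated assumptions: ReportWorker, runUntilIdle, generateAndEmail and the injectable gcHint are hypothetical names, not the original product's code, and the idle timeout exists only so that the sketch terminates (the real service looped forever).

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical single-threaded report worker; all names here are illustrative.
final class ReportWorker {
    private final BlockingQueue<String> requests;
    private final Runnable gcHint;   // e.g. System::gc; injectable so it can be observed
    private int reportsGenerated;

    ReportWorker(BlockingQueue<String> requests, Runnable gcHint) {
        this.requests = requests;
        this.gcHint = gcHint;
    }

    int reportsGenerated() {
        return reportsGenerated;
    }

    // Processes requests until the queue stays empty for idleTimeoutMillis.
    void runUntilIdle(long idleTimeoutMillis) {
        try {
            while (true) {
                String request = requests.poll();   // non-blocking check
                if (request == null) {
                    // Queue is empty: few live objects, nothing better to do.
                    gcHint.run();
                    request = requests.poll(idleTimeoutMillis, TimeUnit.MILLISECONDS);
                    if (request == null) {
                        return;                     // still idle: end the sketch
                    }
                }
                generateAndEmail(request);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();     // preserve interrupt status and stop
        }
    }

    // Placeholder for loading the data, rendering the report and emailing it.
    private void generateAndEmail(String request) {
        reportsGenerated++;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("report-1");
        queue.add("report-2");
        ReportWorker worker = new ReportWorker(queue, System::gc);
        worker.runUntilIdle(100);
        System.out.println("Reports generated: " + worker.reportsGenerated());
    }
}
```

The hint is issued once each time the queue is found empty, not once per loop iteration, so a busy queue never pays for it.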

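It may be worth doing a GC run after each report is emailed, since we know this is a good time for one. However, if the computer has enough RAM, better results may be obtained by delaying the GC run.

This behaviour was configured on a per-installation basis; for some customers, enabling a forced GC after each report greatly sped up report production. (I expect this was due to low memory on their servers and many other processes running, so a well-timed forced GC reduced paging.)

We never detected an installation that did not benefit from a forced GC run every time the work queue was empty.

But, to be clear, the above is not a common case.

These days I would be more inclined to run each report in a separate process, leaving the operating system to clear up memory rather than the garbage collector, and to have the custom queue manager service use multiple worker processes on large servers.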

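It has already been explained that calling System.gc() may do nothing, and that any code that "needs" the garbage collector to run is broken.

However, the pragmatic reason that calling System.gc() is bad practice is that it is inefficient. In the worst case, it is horribly inefficient! Let me explain.

A typical GC algorithm identifies garbage by traversing all non-garbage objects in the heap and inferring that any object not visited must be garbage. From this, we can model the total work of a garbage collection as one part that is proportional to the amount of live data and another part that is proportional to the amount of garbage; i.e. work = (live * W1 + garbage * W2).

Now suppose that you do the following in a single-threaded application.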

System.gc(); System.gc();

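The first call will (we predict) do (live * W1 + garbage * W2) work and get rid of the outstanding garbage.

The second call will do (live * W1 + 0 * W2) work and reclaim nothing. In other words, we have done (live * W1) work and achieved absolutely nothing.

We can model the efficiency of the collector as the amount of work needed to collect one unit of garbage; i.e. efficiency = (live * W1 + garbage * W2) / garbage. Lower values are better here. So to make the GC as efficient as possible, we need to maximize the value of garbage when we run the GC; i.e. wait until the heap is full. (And also make the heap as big as possible. But that is a separate topic.)

If the application does not interfere (by calling System.gc()), the GC will wait until the heap is full before running, resulting in efficient collection of garbage (see note 1). But if the application forces the GC to run, chances are the heap will not be full, and the garbage will be collected inefficiently. The more often the application forces GC, the more inefficient the GC becomes.

Note: the above explanation glosses over the fact that a typical modern GC partitions the heap into "spaces", that the GC may dynamically expand the heap, that the application's working set of non-garbage objects may vary, and so on. Even so, the same basic principle applies across the board to all true garbage collectors (see note 2). It is inefficient to force the GC to run.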
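To put numbers on the model above, here is a small sketch. The text leaves W1 and W2 unspecified, so the values used in main (and the class and method names) are made up purely for illustration.

```java
// Toy version of the cost model from the explanation above. W1 and W2 are
// unspecified constants in the text; the values used in main are invented.
final class GcCostModel {
    // work = live * W1 + garbage * W2
    static double work(double live, double garbage, double w1, double w2) {
        return live * w1 + garbage * w2;
    }

    // Work needed per unit of garbage reclaimed (lower is better).
    // With live > 0 and garbage == 0 this is positive infinity:
    // work was done and nothing was reclaimed.
    static double workPerUnitOfGarbage(double live, double garbage, double w1, double w2) {
        return work(live, garbage, w1, w2) / garbage;
    }

    public static void main(String[] args) {
        double live = 100, w1 = 1, w2 = 1;
        for (double garbage : new double[] {0, 10, 100, 1000}) {
            System.out.printf("garbage=%.0f work=%.0f work/garbage=%.2f%n",
                    garbage,
                    work(live, garbage, w1, w2),
                    workPerUnitOfGarbage(live, garbage, w1, w2));
        }
    }
}
```

With these made-up weights and 100 units of live data, reclaiming 10 units of garbage costs 11 units of work per unit reclaimed, while reclaiming 1000 units costs 1.1. The second of two back-to-back System.gc() calls corresponds to the garbage = 0 row.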


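1 - This is how the "throughput" collector works. Concurrent collectors such as CMS and G1 use different criteria to decide when to start a collection.

2 - I am also excluding memory managers that use reference counting exclusively, but no current Java implementation uses that approach ... for good reason.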

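People have been doing a good job of explaining why NOT to use it, so I will tell you a few situations where you should use it:

(The following comments apply to Hotspot running on Linux with the CMS collector, where I feel confident saying that System.gc() does in fact always invoke a full garbage collection.)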

1. After the initial work of starting up your application, you may be in a terrible state of memory usage. Half your tenured generation could be full of garbage, meaning that you are that much closer to your first CMS. In applications where that matters, it is not a bad idea to call System.gc() to "reset" your heap to the starting state of live data.

2. Along the same lines as #1, if you monitor your heap usage closely, you want to have an accurate reading of what your baseline memory usage is. If the first 2 minutes of your application's uptime is all initialization, your data is going to be messed up unless you force (ahem... "suggest") the full GC up front.

3. You may have an application that is designed to never promote anything to the tenured generation while it is running. But maybe you need to initialize some data up front that is not so huge as to be moved to the tenured generation automatically. Unless you call System.gc() after everything is set up, your data could sit in the new generation until the time comes for it to get promoted. All of a sudden your super-duper low-latency, low-GC application gets hit with a HUGE (relatively speaking, of course) latency penalty for promoting those objects during normal operations.

4. It is sometimes useful to have a System.gc() call available in a production application for verifying the existence of a memory leak. If you know that the set of live data at time X should exist in a certain ratio to the set of live data at time Y, then it could be useful to call System.gc() at time X and time Y and compare memory usage.
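The comparison of memory usage at time X and time Y could be sketched like this. HeapBaseline and its methods are hypothetical names, the expected ratio and tolerance are arbitrary example values, and since System.gc() is only a request, the readings should be treated as approximate.

```java
// Illustrative helper for comparing heap usage at two points in time.
final class HeapBaseline {
    // Requests a collection, then reads how much heap is currently in use.
    // System.gc() is only a request, so the result is approximate.
    static long usedHeapAfterGcHint() {
        System.gc();
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // True if usage at Y exceeds the expected multiple of usage at X
    // by more than the given relative tolerance.
    static boolean growsBeyondExpected(long usedAtX, long usedAtY,
                                       double expectedRatio, double tolerance) {
        double actualRatio = (double) usedAtY / usedAtX;
        return actualRatio > expectedRatio * (1.0 + tolerance);
    }

    public static void main(String[] args) {
        long atX = usedHeapAfterGcHint();
        // ... the application does its normal work between X and Y ...
        long atY = usedHeapAfterGcHint();
        // Example policy: live data should stay flat, within 10%.
        System.out.println("Possible leak: " + growsBeyondExpected(atX, atY, 1.0, 0.10));
    }
}
```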

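Some of what I am about to write is simply a summary of what has already been written in other answers, and some of it is new.

The question "Why is it bad practice to call System.gc()?" does not compute. It assumes that it is bad practice, when it is not. It greatly depends on what you are trying to accomplish.

The vast majority of programmers have no need for System.gc(), and in the vast majority of use cases it will never do anything useful for them. So, for the majority, calling it is bad practice because it will not do whatever they think it will do; it will only add overhead.

However, there are a few rare cases where invoking System.gc() is actually beneficial: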

When you are absolutely sure that you have some CPU time to spare now, and you want to improve the throughput of code that will run later. For example, a web server that discovers that there are no pending web requests at the moment can initiate garbage collection now, so as to reduce the chances that garbage collection will be needed during the processing of a barrage of web requests later on. (Of course this can hurt if a web request arrives during collection, but the web server could be smart about it and abandon collection if a request comes in.) Desktop GUIs are another example: on the idle event (or, more broadly, after a period of inactivity) you can give the JVM a hint that if it has any garbage collection to do, now is better than later.

When you want to detect memory leaks. This is often done in combination with a debug-mode-only finalizer, or with the java.lang.ref.Cleaner class from Java 9 onwards. The idea is that by forcing garbage collection now, and thus discovering memory leaks now as opposed to some random point in time in the future, you can detect the memory leaks as soon as possible after they have happened, and therefore be in a better position to tell precisely which piece of code has leaked memory and why. (Incidentally, this is also one of, or perhaps the only, legitimate use cases for finalizers or the Cleaner. The practice of using finalization for recycling of unmanaged resources is flawed, despite being very widespread and even officially recommended, because it is non-deterministic. For more on this topic, read this: https://blog.michael.gr/2021/01/object-lifetime-awareness.html)

When you are measuring the performance of code (benchmarking), in order to reduce/minimize the chances of garbage collection occurring during the benchmark, or in order to guarantee that whatever overhead is suffered due to garbage collection during the benchmark is due to garbage generated by the code under benchmark, and not by unrelated code. A good benchmark always starts with as thorough a garbage collection as possible.

When you are measuring the memory consumption of code, in order to determine how much garbage is generated by a piece of code. The idea is to perform a full garbage collection so as to start in a clean state, run the code under measurement, obtain the heap size, then do another full garbage collection, obtain the heap size again, and take the difference. (Incidentally, the ability to temporarily suppress garbage collection while running the code under measurement would be useful here; alas, the JVM does not support that. This is deplorable.)
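The memory-consumption measurement described in the last item might look roughly like the sketch below. GarbageMeter and estimateGarbageBytes are hypothetical names. The result is only an estimate: System.gc() may not perform a full collection, and if the JVM collects on its own while the measured code runs, the figure will understate the garbage produced.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Rough estimate of how much garbage a piece of code leaves behind.
final class GarbageMeter {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    // Heap bytes currently in use, as reported by the JVM.
    static long usedHeapBytes() {
        return MEMORY.getHeapMemoryUsage().getUsed();
    }

    // Collect, run the code, read the heap, collect again, read again,
    // and return the difference between the two readings.
    static long estimateGarbageBytes(Runnable codeUnderMeasurement) {
        System.gc();                         // try to start from a clean state
        codeUnderMeasurement.run();
        long afterRun = usedHeapBytes();     // live data plus uncollected garbage
        System.gc();                         // try to reclaim the garbage
        long afterGc = usedHeapBytes();      // ideally live data only
        return afterRun - afterGc;           // may be noisy, or even negative
    }

    public static void main(String[] args) {
        long estimate = estimateGarbageBytes(() -> {
            for (int i = 0; i < 10_000; i++) {
                byte[] scratch = new byte[1024];   // becomes garbage immediately
            }
        });
        System.out.println("Estimated garbage (bytes): " + estimate);
    }
}
```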

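Note that of the above use cases, only one is a production scenario; the rest are testing/diagnostic scenarios.

This means that System.gc() can be quite useful under some circumstances, which in turn means that its being "only a hint" is problematic.

(For as long as the JVM does not offer some deterministic and guaranteed means of controlling garbage collection, the JVM is broken in this respect.)

Here is how you can turn System.gc() into a bit less of a hint: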

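import java.lang.ref.WeakReference;

// Keeps hinting until a freshly allocated, weakly referenced object has been
// collected, which shows that at least one collection has actually run.
private static void runGarbageCollection()
{
    for( WeakReference<Object> ref = new WeakReference<>( new Object() ); ; )
    {
        System.gc(); //optional: only a hint to the JVM
        Runtime.getRuntime().runFinalization(); //optional (deprecated since Java 18)
        if( ref.get() == null )
            break; //the sentinel object has been collected
        Thread.yield(); //give other threads a chance to run before retrying
    }
}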

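This still does not guarantee that you will get a full GC, but it gets a lot closer. Specifically, it will give you some amount of garbage collection even if the -XX:+DisableExplicitGC VM option has been used. (So it truly uses System.gc() as a hint; it does not rely on it.)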

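In my experience, using System.gc() is effectively a platform-specific form of optimization (where "platform" is the combination of hardware architecture, OS, JVM version and possibly further runtime parameters such as available RAM), because its behaviour, while roughly predictable on a specific platform, can (and will) vary considerably between platforms.

Yes, there are situations where System.gc() will improve (perceived) performance. One example is when delays are tolerable in some parts of your app but not in others (as in the game example mentioned above, where you want GC to happen at the start of a level rather than during the level).

However, whether it helps or hurts (or does nothing) depends heavily on the platform (as defined above).

So I think it is valid as a last-resort, platform-specific optimization (i.e. if other performance optimizations are not enough). But you should never call it just because you believe it might help (without specific benchmarks), because chances are it will not.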