Tuning Garbage Collection for Mission-Critical Java Applications


Original article: http://blog.mgm-tp.com/2013/03/garbage-collection-tuning/

I recently had the opportunity to test and tune the performance of several shop and portal applications built with Java and running on the Sun/Oracle JVM, among them some of the most visited in Germany. In many cases garbage collection is a key aspect of Java server performance. In the following article we take a look at the state-of-the-art advanced GC algorithms and important tuning options and compare them for diverse real-world scenarios.

Seen from the point of view of garbage collection, Java server applications have widely varying requirements:

  1. Some are high-traffic applications serving a huge number of requests and creating a huge number of objects. Sometimes, moderate-traffic applications using wasteful software frameworks do the same thing. Either way, cleaning up these objects efficiently is a challenge for the garbage collector.
  2. Others have extremely long uptimes and require a constant quality of service during that uptime without slow degradation or the risk of sudden deterioration.
  3. Some place tight limits on their user response times (as in the online gaming or betting area) which do not leave much room for extended GC pauses.

In many cases you will find a combination of several of these requirements with different priorities. Several of my sample shops and portals were very demanding with respect to point 1, one put extreme priority on point 2 but most applications are not extremely demanding in all of the three aspects at the same time. This leaves you the necessary room to choose the right tradeoffs.

Out-of-the-Box GC Performance

JVMs have improved a lot but still cannot do your job of optimizing the runtime for your application. Default JVM settings have a fourth priority in mind in addition to the 3 mentioned above: minimizing the memory footprint. They need to support millions of users who do not run Java on a server with plenty of memory. This is even true for many e-business products which are most of the time preconfigured to run on developer notebooks instead of production servers. As a consequence, if you run your server with a minimal set of heap and GC parameters like the following

java -Xmx1024m -XX:MaxPermSize=256m -cp Portal.jar my.portal.Portal

you will almost certainly obtain results which are not good enough for efficient server operation. In the first step, it is good practice to configure not only memory limits but also initial sizes to avoid costly step-by-step increases during server startup. Whenever you know how much memory is enough for your server (which you should try to find out in time) it is best to make initial sizes and limits equal by adding

-Xms1024m -XX:PermSize=256m

The last basic option frequently found in JVM configurations is a similar setting for the size of the so-called New generation heap:

-XX:NewSize=200m -XX:MaxNewSize=200m
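To check that the initial and maximum sizes set this way are really what the JVM uses, the memory pools can be listed from inside the application. The following is only a minimal sketch using the standard java.lang.management API; the class name HeapPools is made up for illustration, and the pool names it prints depend on the JVM version and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original article.
final class HeapPools {

    // One line per memory pool with its initial, committed and maximum size.
    static List<String> describePools() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            lines.add(String.format("%-30s init=%s committed=%s max=%s",
                    pool.getName(), mb(usage.getInit()),
                    mb(usage.getCommitted()), mb(usage.getMax())));
        }
        return lines;
    }

    // MemoryUsage reports -1 when a value is undefined.
    static String mb(long bytes) {
        return bytes < 0 ? "n/a" : (bytes / (1024 * 1024)) + "MB";
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```

When initial sizes and limits are set to equal values, the init and max columns of the heap pools should be close to each other right after startup.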

These and other more sophisticated settings are explained in the next sections, but let’s first look at how the garbage collector works with them in a load test for one of our portal samples on a rather slow test server:


Figure 1. GC behavior of a JVM with little heap tuning (-Xms1024m -Xmx1024m -XX:NewSize=200m -XX:MaxNewSize=200m) over a period of about 25 hours.

The blue curve shows the occupied total heap as a function of time, vertical grey lines show the duration of GC pauses.

In addition to these graphs, key indicators of GC operation and performance are shown on the right-hand side. First we have a look at the average amount of garbage created (and collected) in this test run. The value of 30.5 MB/s is marked in yellow because it is a considerable but still moderate garbage creation rate, just about right for an introductory GC tuning example. Other values indicate how well the JVM copes with cleaning up that amount of garbage: 99.55% of that garbage is cleaned up in the New generation and only 0.45% in the Old generation which is rather good and therefore marked green.

Why this is good can be seen from the pauses the GC activity imposes on the JVM (and all the worker threads executing user requests): There are numerous and rather short New generation GC pauses. They occurred on average every 6 seconds and lasted less than 50 milliseconds. Such pauses stopped the JVM during 0.77% of wall time but any single pause is unnoticeable to the users waiting for the server’s response.

On the other hand, Old generation GC pauses stop the JVM during only 0.19% of time. But given the fact that during that time they only clean up 0.45% of the garbage while 99.55% is cleaned up during the 0.77% New generation pause time this shows how extremely inefficient Old generation garbage collection is compared to New generation GC. In addition, Old generation pauses on average occurred less than once per hour but lasted as much as almost 8 seconds on average with a single outlier even reaching 19 seconds. As these are true pauses for all the JVM’s threads processing user requests, they should be as infrequent and short as possible.

From these observations follows the basic tuning goal for generational garbage collection:

Collect as much garbage as possible already in New generation and make Old generation pauses as infrequent and short as possible.
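The indicators discussed above (number of collections, average duration and share of elapsed time) can also be read roughly from a running JVM. The sketch below uses the standard GarbageCollectorMXBean interface; the class name GcOverhead is an assumption for illustration, and the accumulated time reported by the JVM is only an approximation of pause time, in particular for collectors that do part of their work concurrently.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of the original article.
final class GcOverhead {

    // Share of elapsed time spent in collections, in percent.
    static double overheadPercent(long gcMillis, long elapsedMillis) {
        return elapsedMillis <= 0 ? 0.0 : 100.0 * gcMillis / elapsedMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined
            long time = gc.getCollectionTime();   // accumulated milliseconds, -1 if undefined
            if (count <= 0 || time < 0) {
                System.out.println(gc.getName() + ": no collections recorded yet");
                continue;
            }
            System.out.printf("%s: %d collections, avg %.1f ms, %.2f%% of uptime%n",
                    gc.getName(), count, (double) time / count,
                    overheadPercent(time, uptime));
        }
    }
}
```

As a plausibility check of the numbers above, a pause of roughly 48 milliseconds every 6 seconds, i.e. overheadPercent(48, 6000), gives about 0.8%, which is in line with the 0.77% measured for the New generation.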

Basic Ideas of Generational Garbage Collection and Heap Sizing

Start from what you see in a JDK tool like jstat or jvisualvm and its visualgc plugin:


Figure 2. The JVM heap structure including the sub-segments of the New generation (right column).

The Java heap is made up of the Perm, Old and New (sometimes called Young) generations. The New generation is further made up of Eden space where objects are created and Survivor spaces S0 and S1 where they are kept later for a limited number of New generation garbage collection cycles. If you want more details, you might want to read Sun/Oracle’s whitepaper “Memory Management in the Java HotSpot Virtual Machine”.

By default the New generation as a whole, and the survivor spaces in particular, are too small to hold objects long enough until most of them are no longer needed and can be collected. Therefore, they are moved to the Old generation prematurely which will then fill up too fast and need to be cleaned up frequently which causes relatively many of the Full GC stops visible in figure 1 above.

Note: the two Survivor spaces are used alternately. In one New generation collection, the objects that survive in Eden and S0 are copied to S1 and S0 is emptied; in the next collection, the survivors of Eden and S1 are copied to S0, and so on.


Tuning the Generation Sizes

Tuning generational GC means making the New generation as a whole and in particular the survivor spaces larger than they are out-of-the-box. But to do this you also have to consider the GC algorithm used:

The default GC algorithm of a Sun/Oracle JVM running on today’s hardware is called ParallelGC and if it were not the default it could be configured explicitly using the JVM parameter

-XX:+UseParallelGC

This algorithm by default does not work with fixed sizes for Eden and the survivor spaces but uses a policy called “AdaptiveSizePolicy”, which is an adjustment-controlled automatic sizing strategy. As described above, it delivers reasonable behavior for many scenarios including non-server usage but it is not optimal for server operation. To switch it off and start setting your survivor sizes explicitly to fixed values use the following JVM configuration switch:

-XX:-UseAdaptiveSizePolicy
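Whether a switch like this one has actually reached the JVM can be verified at runtime. The following sketch assumes a HotSpot JVM of version 7 or later and uses com.sun.management.HotSpotDiagnosticMXBean, which is HotSpot-specific; the class name FlagCheck is made up for illustration. Flags unknown to the running JVM are reported instead of causing a failure.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of the original article. HotSpot only.
final class FlagCheck {

    // Returns "name=value (origin)" or a note if the flag cannot be read.
    static String describe(String flag) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return flag + ": not a HotSpot JVM";
        }
        try {
            VMOption option = bean.getVMOption(flag);
            // The origin tells whether the value is a default or was set at startup.
            return flag + "=" + option.getValue() + " (" + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            return flag + ": not available on this JVM";
        }
    }

    public static void main(String[] args) {
        String[] flags = { "UseParallelGC", "UseAdaptiveSizePolicy",
                           "NewSize", "MaxNewSize", "SurvivorRatio" };
        for (String flag : flags) {
            System.out.println(describe(flag));
        }
    }
}
```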

Once this has been done, we can not only further increase the New generation but also effectively set the survivor sizes to a suitable value:

-XX:NewSize=400m -XX:MaxNewSize=400m -XX:SurvivorRatio=6

“SurvivorRatio=6” means that each survivor space is 1/6 of the Eden size, or 1/8 of the total New generation size (Eden plus two survivor spaces), which in this case means 50 MB, while adaptive sizing usually works with much smaller sizes in the range of only a few MB. By repeating the same load test as above with these settings we got the following result:

Figure 3. GC behavior of a JVM with tuned heap sizes (-Xms1024m -Xmx1024m -XX:NewSize=400m -XX:MaxNewSize=400m -XX:-UseAdaptiveSizePolicy -XX:SurvivorRatio=6) over a period of 50 hours.

Note that during this test run of doubled duration there was on average almost the same garbage creation rate as before (30.2 compared to 30.5 MB/s). Nevertheless, there were only two Old generation (Full) GC pauses, no more than one in 25 hours. This was achieved by decreasing the rate of garbage ending up in the Old generation (the so-called promotion rate) from 137 kB/s to 6 kB/s or only 0.02% of all garbage. At the same time New generation GC pause duration increased only slightly from an average of 48 to 57 milliseconds and the average interval between pauses rose from 6 to 10 seconds. Altogether, switching off adaptive sizing and fine tuning the heap sizes decreased GC pause time from 0.95% to 0.59% of elapsed time which is an excellent result.
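The sizes and percentages quoted for the two test runs can be reproduced with a few lines of arithmetic. This is only a sketch: the class and method names are made up for illustration, and it assumes 1 MB = 1000 kB when comparing the promotion rate with the garbage creation rate.

```java
// Hypothetical helper, not part of the original article.
final class NewGenMath {

    // New generation = Eden + 2 survivor spaces, each survivor = Eden / ratio.
    static double survivorMb(double newGenMb, int survivorRatio) {
        return newGenMb / (survivorRatio + 2);
    }

    static double edenMb(double newGenMb, int survivorRatio) {
        return newGenMb * survivorRatio / (survivorRatio + 2);
    }

    // Share of all garbage that ends up in the Old generation, in percent.
    // Assumes 1 MB = 1000 kB.
    static double promotedPercent(double promotionKbPerSec, double garbageMbPerSec) {
        return 100.0 * promotionKbPerSec / (garbageMbPerSec * 1000.0);
    }

    public static void main(String[] args) {
        System.out.printf("survivor space: %.0f MB%n", survivorMb(400, 6)); // 50 MB
        System.out.printf("Eden:           %.0f MB%n", edenMb(400, 6));     // 300 MB
        // Little heap tuning: 137 kB/s promoted at 30.5 MB/s of garbage.
        System.out.printf("before tuning:  %.2f%%%n", promotedPercent(137, 30.5));
        // Tuned heap sizes: 6 kB/s promoted at 30.2 MB/s of garbage.
        System.out.printf("after tuning:   %.2f%%%n", promotedPercent(6, 30.2));
    }
}
```

The last two lines come out at about 0.45% and 0.02%, matching the Old generation shares reported for the two runs.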

Similar results after tuning can be obtained with the ParNew algorithm as an alternative to the default ParallelGC. It was developed for compatibility with the CMS algorithm mentioned below and can be configured by -XX:+UseParNewGC. It does not use adaptive sizing but works with fixed values for the survivor sizes. Therefore, and with the default of SurvivorRatio=8, it usually delivers much better out-of-the-box results for server usage than the ParallelGC.

Getting rid of long Old Generation GC Pauses

The only remaining problem with the latest result above is the long Old generation (Full) GC pauses of about 8 seconds on average. These pauses have been made rare by proper generation tuning, but when they occur they are still a nuisance to users because during their duration the JVM is not executing worker threads (stop-the-world GC). In our case, these 8 seconds are caused by an old and slow test server and could be up to a factor of 3 faster on modern hardware. On the other hand, today’s applications typically also use larger heaps than 1 GB and have larger amounts of live objects in the heap than in this example. Web applications nowadays work with heaps up to 64 GB and (at least temporarily) need half of that for their live objects. In such cases, 8 seconds is short for Old generation pauses. They can easily come close to one minute, which is totally unacceptable for an interactive web application.

One option to alleviate the problem is the use of parallel processing for Old generation GC. By default, the ParallelGC and ParNew GC algorithms in Java 6 used multiple GC threads only for young generation collections while Old generation collections were single-threaded. In the case of the ParallelGC collector this can be changed by adding

-XX:+UseParallelOldGC

Since Java 7 this option is activated by default together with -XX:+UseParallelGC. However, even with 4 or 8 CPU cores in your system you should not expect much more than an improvement by a factor of 2, often less. In some cases, as in our 8 seconds example above, this can be a welcome improvement but in other, more extreme cases it is not enough. The solution is to use low-latency GC algorithms.

The Concurrent Mark and Sweep (CMS) Collector

The CMS garbage collector is the first and most widely used low-latency collector. It has been available since Java 1.4.2 but suffered from instability issues in the beginning. Solving them required quite a few Java 5 releases.

As indicated by its name the CMS collector uses a concurrent approach where most of the work is done by a GC thread that runs concurrently with the worker threads processing user requests. A single normal Old generation stop-the-world GC run is split up into two much shorter stop-the-world pauses plus 5 concurrent phases where worker threads are allowed to go on with their work. Find a more detailed description of the CMS in the article “Java SE 6 HotSpot Virtual Machine Garbage Collection Tuning”.

The CMS collector is activated by

-XX:+UseConcMarkSweepGC

Applying this to our sample application from above (under higher load than before) led to the following result:

Figure 4. GC behavior of a JVM with tuned heap sizes and CMS (-Xms1200m -Xmx1200m -XX:NewSize=400m -XX:MaxNewSize=400m -XX:SurvivorRatio=6 -XX:+UseConcMarkSweepGC) over a period of 50 hours.

It is visible that the Old generation pauses in the 8 seconds range are now gone. For each Old generation collection (in our case 5 of them in 50 hours) there are now two pauses and all of them are below 1 second.

By default, the CMS collector uses the ParNew collector to execute the New generation collections. If the ParNew collector runs together with the CMS its pauses tend to be a bit longer than when it runs without it because their cooperation requires some extra effort. In addition to the slightly higher average New generation pause times compared to the previous results, this can be seen from the frequent outliers in New generation pause times which reach up to 0.5 seconds in the test run shown. But they are all short enough to make the CMS/ParNew collector pair a good low-latency option for many applications.

A more important disadvantage of the CMS collector is related to the fact that it cannot be started when the Old generation heap is full. Once the Old generation is full, it is too late for the CMS and it must then fall back to the usual stop-the-world strategy (announced by a “concurrent mode failure” in the GC log). To reach its low-latency goal the CMS is started whenever Old generation occupation reaches a threshold set by

-XX:CMSInitiatingOccupancyFraction=80

The CMS is started once 80% of the Old generation is occupied. For our application this reasonable value (which at the same time is also the default) worked well, but if the threshold is set too high, a concurrent mode failure can at any time bring back the long Old generation GC pauses. If, on the other hand, it is set too low (below the size of the live part of the heap), the CMS might run concurrently all the time and thus consume the processing power of one CPU entirely. If an application experiences brisk changes in its object creation and heap usage behavior, e.g. by the start of specialized tasks either interactively or by a timed trigger, it can be hard to set this threshold right to avoid both risks at all times.
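To get a feeling for what the threshold means in absolute numbers, here is a small worked example for the heap layout used in the CMS test run (1200 MB heap, 400 MB New generation). It is a sketch only: the class name is made up, and the promotion rate of 100 kB/s used in the last step is a purely hypothetical value, not a measurement from the test.

```java
// Hypothetical helper, not part of the original article.
final class CmsThreshold {

    // -Xmx covers the New and Old generations, so Old = heap - New.
    static double oldGenMb(double heapMb, double newGenMb) {
        return heapMb - newGenMb;
    }

    // Old generation occupancy at which a concurrent cycle is started.
    static double triggerMb(double oldGenMb, int occupancyFraction) {
        return oldGenMb * occupancyFraction / 100.0;
    }

    // Time until the remaining headroom is used up at a given promotion rate
    // (assumes 1 MB = 1000 kB and a constant promotion rate).
    static double secondsUntilFull(double headroomMb, double promotionKbPerSec) {
        return headroomMb * 1000.0 / promotionKbPerSec;
    }

    public static void main(String[] args) {
        double old = oldGenMb(1200, 400);     // 800 MB
        double trigger = triggerMb(old, 80);  // 640 MB
        double headroom = old - trigger;      // 160 MB
        System.out.printf("Old generation: %.0f MB, CMS starts at %.0f MB%n", old, trigger);
        // Hypothetical promotion rate of 100 kB/s:
        System.out.printf("headroom of %.0f MB lasts %.0f s%n",
                headroom, secondsUntilFull(headroom, 100));
    }
}
```

The concurrent cycle has to finish within the time the headroom lasts; a burst of promotions shortens that time and is what can lead to a concurrent mode failure.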

The Specter of Fragmentation

The biggest disadvantage of the CMS, however, is related to the fact that it does not compact the Old generation heap. It therefore carries the risk of heap fragmentation and severe operations degradation over time. Two factors increase this risk: a tight Old generation heap and frequent CMS runs. The first factor can be improved by making the Old generation heap larger than what would be needed with the ParallelGC collector (which I did from 1024 to 1200 MB as can be seen in the previous figures). The second factor can be improved by proper generation sizing as described above. We actually saw how infrequent Old generation GC can be made by it. To demonstrate how essential it is to fine tune the generation sizes before switching to the CMS let’s have a look at what might happen if we do not follow this rule and apply the CMS directly to the little tuned heap of figure 1:

Figure 5. GC behavior and sudden degradation by fragmentation when the CMS is applied to the poorly tuned heap of figure 1 (GC indicators on the right from the first 14 hours only).

It is obvious that with these settings the JVM worked well for almost 14 hours under load-test conditions (in production and with lower load this treacherously benign period may last much longer). Then suddenly there were very long GC pauses which actually stopped the JVM for about half of the remaining time. Not only did attempts to clean up the mess in the Old generation last more than 10 seconds, but even New generation GC pauses were in the seconds range because the collector spent a lot of time searching for space in the Old generation when it tried to promote objects from New to Old generation.

The fragmentation risk is the price to pay for the low-latency advantage of the CMS. This risk can be minimized but it is always there and it is hard to predict when it will strike. With proper GC tuning and monitoring, however, the risk can be managed.
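One simple form of such monitoring is to scan the GC log for warning signs. The sketch below counts log lines containing “concurrent mode failure”, the message mentioned earlier; “promotion failed” is included on the assumption that this is the text HotSpot logs when a New generation collection cannot find room in the Old generation. The class name GcLogScan is made up, and the log file is assumed to have been written with GC logging enabled (for example via -Xloggc).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of the original article.
final class GcLogScan {

    // Messages treated as warning signs; the second one is an assumption.
    static final String[] WARNING_SIGNS = { "concurrent mode failure", "promotion failed" };

    // Counts the lines that contain at least one warning sign.
    static int countWarnings(List<String> lines) {
        int hits = 0;
        for (String line : lines) {
            for (String sign : WARNING_SIGNS) {
                if (line.contains(sign)) {
                    hits++;
                    break; // count each line only once
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GcLogScan <gc-log-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        System.out.println(countWarnings(lines) + " warning line(s) in " + args[0]);
    }
}
```

A count that starts to rise after a long quiet period is the kind of signal that should trigger a closer look before the situation of figure 5 develops.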

The Promise of the Garbage First (G1) Collector

The G1 collector was designed to achieve low-latency behavior without the risk of heap fragmentation. As such, it is announced as a long-term replacement for the CMS collector by Oracle. G1 avoids the fragmentation risks because it is a compacting collector. As far as GC pauses are concerned, it does not aim at the shortest possible pauses but at controlling pauses by placing an upper limit on their duration which is maintained in a best-effort approach. Readers can find more details about the G1 collector in the great tutorial “Getting Started with the G1 Garbage Collector”, German readers also in Angelika Langer’s article “Der Garbage-First Garbage Collector (G1) – Übersicht über die Funktionalität”.

Before we examine the current state of the G1 collector by comparing its performance on our sample application with the performance of the classic collectors described above, let me summarize two important pieces of information about the G1 collector:

  • G1 is officially supported by Oracle since Java 7u4, but for G1 you should go for the most recent Java 7 update available. The Oracle GC team is working hard on G1 and improvements in recent Java updates (7u7 to 7u9) have been noticeable. On the other hand, G1 has been in no way production-ready in any Java 6 release and the by far superior Java 7 implementation will probably never be backported.
  • The generation sizing approach I described above is obsolete with G1. Setting generation sizes is in conflict with setting pause time targets and will prevent the G1 collector from doing what it was designed for. With G1 you set the overall memory size using “-Xms” and “-Xmx” and (optionally) a GC pause time target and usually leave all the rest to the G1 collector. It follows a similar approach to the ParallelGC collector’s AdaptiveSizePolicy and automatically adjusts the generation sizes in such a way as to fulfill the pause time target.

Once these guidelines were followed, the G1 collector delivered the following result out-of-the-box:


Figure 6. GC behavior of a JVM with G1 and minimal configuration (-Xms1024m -Xmx1024m -XX:+UseG1GC) over a period of 26 hours.

In this case, we used the default GC pause time target of 200 milliseconds. As can be seen from the indicators this target was almost met on average and the longest GC pauses were as good as with the CMS (figure 4). G1 apparently had very good control of GC pauses because outliers compared to the average duration were rather rare and limited.

On the other hand, average GC pause times were much longer than with the CMS collector (270 ms vs. 100 ms), and because they were even more frequent this also means that accumulated GC pause time, i.e. the overhead for GC itself, was more than 4 times higher than with CMS (6.96% vs. 1.66% of elapsed time).

Just like the CMS the G1 works with GC pauses and with concurrent GC phases. In similar ways as the CMS, it starts concurrent phases based on an occupation threshold. It is visible in figure 6 that the available heap of 1GB is by far not fully used. This is because the G1’s default occupation threshold is much lower than the CMS’ threshold. It is also reported that the G1 in general tends to be satisfied with less heap than the other collectors.

Quantitative Comparison of Garbage Collectors

The following table summarizes some key performance indicators achieved with the 4 most important garbage collectors of Oracle Java 7 running the same load test on the same application but with different levels of load (indicated by the garbage creation rate shown in column 2):

Table: Comparison of several garbage collectors.

All the collectors were run with about 1GB of total heap size; the traditional collectors (ParallelGC, ParNewGC and CMS) in addition used the following heap settings:

-XX:NewSize=400m -XX:MaxNewSize=400m -XX:SurvivorRatio=6

while the G1 collector ran without additional heap size settings and used the default pause time target of 200 milliseconds which can also be set explicitly by

-XX:MaxGCPauseMillis=200

As can be seen from this table the traditional collectors execute New generation collections (column 3) in similar time. This is true for the ParallelGC and the ParNewGC collectors but also for the CMS which in fact uses the ParNewGC to execute New generation collections. Promotion from new to Old generation, however, requires some coordination between ParNewGC and CMS during New generation GC pauses. This coordination creates an extra cost which translates into slightly longer New generation pauses for the CMS.

Column 7 summarizes the time lost in GC pauses as percentage of elapsed time. This number is a good measure of GC overhead because concurrent GC time (last column) and the CPU usage overhead it implies may be neglected. With heap sizes tuned as described above and thus with rare Old generation collections, column 7 is largely dominated by New generation pause time. New generation pause time is the product of New generation pause duration (column 3) and New generation pause frequency. New generation pause frequency is a function of the New generation size which was the same (400 MB) for all of the traditional collectors. Therefore and for these collectors column 7 more or less mirrors column 3 (for similar load).
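The relationship between New generation size, pause frequency and overhead can be made concrete with a rough estimate. The sketch below assumes that a New generation collection is triggered whenever Eden is full and that all objects are allocated in Eden; both are simplifications, and the class and method names are made up for illustration.

```java
// Hypothetical helper, not part of the original article.
final class YoungGcEstimate {

    // Time it takes to fill Eden at a given garbage creation rate.
    static double intervalSeconds(double edenMb, double garbageMbPerSec) {
        return edenMb / garbageMbPerSec;
    }

    // Share of elapsed time spent in New generation pauses, in percent.
    static double pauseSharePercent(double pauseMs, double intervalSeconds) {
        return 100.0 * pauseMs / (intervalSeconds * 1000.0);
    }

    public static void main(String[] args) {
        // Tuned ParallelGC run: 400 MB New generation with SurvivorRatio=6
        // gives 300 MB of Eden; garbage creation rate 30.2 MB/s; 57 ms pauses.
        double interval = intervalSeconds(300, 30.2);
        double share = pauseSharePercent(57, interval);
        System.out.printf("estimated interval: %.1f s%n", interval);
        System.out.printf("estimated overhead: %.2f%%%n", share);
    }
}
```

The estimate yields an interval of just under 10 seconds and an overhead of a little less than 0.6%, close to the 10 seconds and 0.59% measured in the tuned ParallelGC run.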

The benefit of the CMS collector in this picture is evident from column 6: it trades much (one order of magnitude) shorter Old generation GC pauses against a slightly higher overhead. For many real world applications this is a very good deal.

How well does the G1 collector compete for our application? Column 6 (and 5) tells us that it successfully competes with the CMS in reducing Old generation GC pauses. But column 7 indicates that it pays a rather high price to achieve this: GC overhead was 7% compared to 1.6% for the CMS under the same load.

I will examine the conditions under which this higher overhead occurs as well as the strengths and weaknesses of the G1 compared to other collectors (in particular to the CMS collector) in a follow-up to this article as it is a vast and newsworthy subject in its own right.

Summary and Outlook

For all the classic Java GC algorithms (SerialGC, ParallelGC, ParNewGC and CMS) generation sizing is an essential tuning and fine tuning procedure which in many real-world applications is not practiced sufficiently. The consequences are suboptimal application performance and the risk of operations degradation (loss of performance and even application standstill for extended periods of time if it is not well monitored).

Generation sizing can improve application performance noticeably and reduce the occurrence of long GC pauses to a minimum. Elimination of long GC pauses, however, requires the usage of a low-latency collector. The preferred and most proven low-latency collector has been (and still is as of today) the CMS collector which in many cases does what is needed and, with proper tuning, also provides long-term stability in spite of its inherent heap fragmentation risk. The intended replacement, the G1 collector, is now (as of Java 7u9) a supported and usable alternative but there is still room for improvement. For many applications, it will deliver acceptable but not yet better results than the CMS collector. The details of its strengths and weaknesses deserve closer examination.

