Java Garbage Collectors

Though the details differ somewhat, all garbage collectors work by splitting the heap into different generations. These are called the old (or tenured) generation and the young generation. The young generation is further divided into sections known as eden and the survivor spaces (though sometimes, eden is incorrectly used to refer to the entire young generation).

Garbage collectors are designed to take advantage of the fact that many (and sometimes most) objects are used only temporarily. This is where the generational design comes in. Objects are first allocated in the young generation, which is some subset of the entire heap. When the young generation fills up, the garbage collector will stop all the application threads and empty out the young generation. Objects that are no longer in use are discarded, and objects that are still in use are moved elsewhere. This operation is called a minor GC.
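The allocation pattern the collector exploits shows up in ordinary code. In the hypothetical sketch below (the class and method names are invented for illustration), every iteration allocates objects that are unreachable by the next iteration, so nearly all of them die in eden before a minor GC ever has to examine them:

```java
// Sketch: most objects here are short-lived and die in eden,
// which is exactly the behavior the generational design exploits.
public class ShortLived {
    // Builds a label; the StringBuilder and its internal buffer
    // become garbage as soon as the method returns.
    static String label(int i) {
        StringBuilder sb = new StringBuilder();
        sb.append("request-").append(i);
        return sb.toString();
    }

    public static void main(String[] args) {
        long matches = 0;
        for (int i = 0; i < 100_000; i++) {
            // Each iteration's allocations are classic eden garbage:
            // nothing holds a reference past the loop body.
            if (label(i).endsWith("42")) matches++;
        }
        System.out.println("labels ending in 42: " + matches);
    }
}
```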

There are two performance advantages to this design. First, because the young generation is only a portion of the entire heap, processing it is faster than processing the entire heap. This means that the application threads are stopped for a much shorter period of time than if the entire heap were processed at once. You probably see a trade-off there, since it also means that the application threads are stopped more frequently than they would be if the JVM waited to perform GC until the entire heap were full; that trade-off will be explored in more detail later in this chapter. For now, though, it is almost always a big advantage to have the shorter pauses even though they will be more frequent.

The second advantage arises from the way objects are allocated in the young generation. Objects are allocated in eden (which comprises the vast majority of the young generation). When the young generation is cleared during a collection, all objects in eden are either moved or discarded: all live objects are moved either to one of the survivor spaces or to the old generation. Since all objects are moved, the young generation is automatically compacted when it is collected.

All GC algorithms have stop-the-world pauses during collection of the young generation. 

As objects are moved to the old generation, eventually it too will fill up, and the JVM will need to find any objects within the old generation that are no longer in use and discard them. This is where GC algorithms have their biggest differences. The simpler algorithms stop all application threads, find the unused objects and free their memory, and then compact the heap. This process is called a full GC, and it generally causes a long pause for the application threads. 

On the other hand, it is possible—though more computationally complex—to find unused objects while application threads are running; CMS and G1 both take that approach. Because the phase where they scan for unused objects can occur without stopping application threads, CMS and G1 are called concurrent collectors. They are also called low-pause (and sometimes—incorrectly—pauseless) collectors, since they minimize the need to stop all the application threads. Concurrent collectors also take different approaches to compacting the old generation. 

When using the CMS or G1 collector, an application will typically experience fewer (and much shorter) pauses. The trade-off is that the application will use more CPU overall. 

CMS and G1 may also perform a long, full GC pause (and avoiding those is one of the key factors to consider when tuning those algorithms).

As you consider which garbage collector is appropriate for your situation, think about the overall performance goals that must be met. There are trade-offs in every situation. In an application (such as a Java EE server) measuring the response time of individual requests, consider these points:

  • The individual requests will be impacted by pause times—and more importantly by long pause times for full GCs. If minimizing the effect of pauses on response times is the goal, a concurrent collector will be more appropriate.

  • If the average response time is more important than the outliers (e.g., the 90th percentile response time), the throughput collector will usually yield better results.

  • The benefit of avoiding long pause times with a concurrent collector comes at the expense of extra CPU usage.

    Similarly, the choice of garbage collector in a batch application is guided by the following trade-off:

    • If enough CPU is available, using the concurrent collector to avoid full GC pauses will allow the job to finish faster.

    • If CPU is limited, then the extra CPU consumption of the concurrent collector will cause the batch job to take more time.

      Quick Summary

  1. All GC algorithms divide the heap into old and young generations.

  2. All GC algorithms employ a stop-the-world approach to clearing objects from the young generation, which is usually a very quick operation.

    GC Algorithms

The JVM provides four different algorithms for performing GC.

The serial garbage collector 

The serial collector uses a single thread to process the heap. It will stop all application threads as the heap is processed (for either a minor or full GC). During a full GC, it will fully compact the old generation.

The serial collector is enabled by using the -XX:+UseSerialGC flag (though usually it is the default in those cases where it might be used). Note that unlike with most JVM flags, the serial collector is not disabled by changing the plus sign to a minus sign (i.e., by specifying -XX:-UseSerialGC). On systems where the serial collector is the default, it is disabled by specifying a different GC algorithm.

The throughput collector 

The throughput collector uses multiple threads to collect the young generation, which makes minor GCs much faster than when the serial collector is used. The throughput collector can use multiple threads to process the old generation as well. That is the default behavior in JDK 7u4 and later releases, and that behavior can be enabled in earlier JDK 7 JVMs by specifying the -XX:+UseParallelOldGC flag. Because it uses multiple threads, the throughput collector is often called the parallel collector.

The throughput collector stops all application threads during both minor and full GCs, and it fully compacts the old generation during a full GC. Since it is the default in most situations where it would be used, it needn’t be explicitly enabled. To enable it where necessary, use the flags -XX:+UseParallelGC -XX:+UseParallelOldGC.

The CMS collector 

The CMS collector is designed to eliminate the long pauses associated with the full GC cycles of the throughput and serial collectors. CMS stops all application threads during a minor GC, which it also performs with multiple threads. Notably, though, CMS uses a different algorithm to collect the young generation (-XX:+UseParNewGC) than the throughput collector uses (-XX:+UseParallelGC). 

Instead of stopping the application threads during a full GC, CMS uses one or more background threads to periodically scan through the old generation and discard unused objects. This makes CMS a low-pause collector: application threads are only paused during minor collections, and for some very short periods of time at certain points as the background threads scan the old generation. The overall amount of time that application threads are stopped is much less than with the throughput collector. 

The trade-off here comes with increased CPU usage: there must be adequate CPU available for the background GC thread(s) to scan the heap at the same time the application threads are running. In addition, the background threads do not perform any compaction, which means that the heap can become fragmented. If the CMS background threads don’t get enough CPU to complete their tasks, or if the heap becomes too fragmented to allocate an object, CMS reverts to the behavior of the serial collector: it stops all application threads in order to clean and compact the old generation using a single thread. Then it begins its concurrent, background processing again (until, possibly, the next time the heap becomes too fragmented).

CMS is enabled by specifying the flags -XX:+UseConcMarkSweepGC -XX:+UseParNewGC (both of which are false by default).

The G1 collector 

The G1 (or Garbage First) collector is designed to process large heaps (greater than about 4 GB) with minimal pauses. It divides the heap into a number of regions, but it is still a generational collector. Some number of those regions comprise the young generation, and the young generation is still collected by stopping all application threads and moving all objects that are alive into the old generation or the survivor spaces. As in the other algorithms, this occurs using multiple threads.

G1 is a concurrent collector: the old generation is processed by background threads that don’t need to stop the application threads to perform most of their work. Because the old generation is divided into regions, G1 can clean up objects from the old generation by copying from one region into another, which means that it (at least partially) compacts the heap during normal processing. Hence, a G1 heap is much less likely to be subject to fragmentation—though that is still possible.

Like CMS, the trade-off for avoiding the full GC cycles is CPU time: the multiple background threads must have CPU cycles available at the same time the application threads are running. G1 is enabled by specifying the flag -XX:+UseG1GC (which by default is false).

Causing and Disabling Explicit Garbage Collection

GC is typically caused when the JVM decides GC is necessary: a minor GC will be triggered when the new generation is full, a full GC will be triggered when the old generation is full, or a concurrent GC (if applicable) will be triggered when the heap starts to fill up.

Java provides a mechanism for applications to force a GC to occur: the System.gc() method. Calling that method is almost always a bad idea. This call always triggers a full GC (even if the JVM is running with CMS or G1), so application threads will be stopped for a relatively long period of time. And calling this method will not make the application any more efficient; it will cause a GC to occur sooner than might have happened otherwise, but that is really just shifting the performance impact.

There are exceptions to every rule, particularly when doing performance monitoring or benchmarking. For small benchmarks that run a bunch of code to properly warm up the JVM, forcing a GC before the measurement cycle may make sense. Similarly when doing heap analysis, it is usually a good idea to force a full GC before taking the heap dump. Most techniques to obtain a heap dump will perform a full GC anyway, but there are also other ways you can force a full GC: you can execute jcmd <process id> GC.run, or you can connect to the JVM using jconsole and click the Perform GC button in the Memory panel.
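The benchmarking exception can be sketched as follows. This is a minimal, hypothetical harness (the class name and placeholder workload are invented): the point is only that System.gc() runs between warmup and measurement, so warmup garbage is collected before the timed interval rather than during it.

```java
// Sketch: forcing a full GC between warmup and measurement so that
// garbage from the warmup phase does not distort the timed run.
public class BenchHarness {
    // Placeholder workload: sum of squares (stands in for real code).
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += (long) i * i;
        return sum;
    }

    public static void main(String[] args) {
        // Warmup: let the JIT compile the hot path.
        for (int i = 0; i < 1_000; i++) work(10_000);

        // Force a full GC so warmup garbage is cleaned up *before*
        // the measured interval, not in the middle of it.
        System.gc();

        long start = System.nanoTime();
        long result = work(10_000);
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println("result=" + result + " elapsed=" + elapsedMicros + "us");
    }
}
```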

Another exception is RMI, which calls System.gc() every hour as part of its distributed garbage collector. That timing can be changed by setting a different value for these two system properties: -Dsun.rmi.dgc.server.gcInterval=N and -Dsun.rmi.dgc.client.gcInterval=N. The values for N are in milliseconds, and the default value in Java 7 (which is changed from earlier releases) is 3600000 (one hour).

If you end up running third-party code that incorrectly calls the System.gc() method, those GCs can be prevented entirely by including -XX:+DisableExplicitGC in the JVM arguments; by default that flag is false.

Basic GC Tuning 

Sizing the Heap 

The size of the heap is controlled by two values: an initial value (specified with -XmsN) and a maximum value (-XmxN). The defaults for these vary depending on the operating system, the amount of system RAM, and the JVM in use. The defaults can be affected by other flags on the command line as well; heap sizing is one of the JVM’s core ergonomic tunings.

Sizing the Generations 

The command-line flags to tune the generation sizes all adjust the size of the young generation; the old generation gets everything that is left over. There are a variety of flags that can be used to size the young generation:

-XX:NewRatio=N
Set the ratio of the young generation to the old generation.

-XX:NewSize=N
Set the initial size of the young generation. 

-XX:MaxNewSize=N

Set the maximum size of the young generation.

-XmnN
Shorthand for setting both NewSize and MaxNewSize to the same value. 

The young generation is first sized by the NewRatio, which has a default value of 2. Parameters that affect the sizing of heap spaces are generally specified as ratios; the value is used in an equation to determine the percentage of space affected. The NewRatio value is used in this formula:

    Initial Young Gen Size = Initial Heap Size / (1 + NewRatio)


Plugging in the initial size of the heap and the NewRatio yields the value that becomes the setting for the young generation. By default, then, the young generation starts out at 33% of the initial heap size.
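The arithmetic is simple enough to check directly. The sketch below (class and method names are invented for illustration) applies the formula from the text:

```java
// Sketch: the initial young-generation size implied by NewRatio.
public class GenSizing {
    // Initial Young Gen Size = Initial Heap Size / (1 + NewRatio)
    static long initialYoungGen(long initialHeapBytes, int newRatio) {
        return initialHeapBytes / (1 + newRatio);
    }

    public static void main(String[] args) {
        long heap = 3L * 1024 * 1024 * 1024; // a 3 GB initial heap
        // Default NewRatio of 2: the young generation gets one third.
        System.out.println(initialYoungGen(heap, 2) / (1024 * 1024) + " MB");
    }
}
```

With the default NewRatio of 2 and a 3 GB initial heap, the young generation starts at 1 GB, i.e., one third of the heap, matching the 33% figure above.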

Alternately, the size of the young generation can be set explicitly by specifying the NewSize flag. If that option is set, it will take precedence over the value calculated from the NewRatio. There is no default for this flag (though PrintFlagsFinal will report a value of 1 MB). If the flag isn’t set, the initial young generation size will be based on the NewRatio calculation.

As the heap expands, the young generation size will expand as well, up to the maximum size specified by the MaxNewSize flag. By default, that maximum is also set using the NewRatio value, though it is based on the maximum (rather than initial) heap size.

Tuning the young generation by specifying a range for its minimum and maximum sizes ends up being fairly difficult. When a heap size is fixed (by setting -Xms equal to -Xmx), it is usually preferable to use -Xmn to specify a fixed size for the young generation as well. If an application needs a dynamically sized heap and requires a larger (or smaller) young generation, then focus on setting the NewRatio value. 

Sizing Permgen and Metaspace

When the JVM loads classes, it must keep track of certain metadata about those classes. From the perspective of an end user, this is all just bookkeeping information. This data is held in a separate heap space. In Java 7, this is called the permgen (or permanent generation), and in Java 8, this is called the metaspace.

Permgen and metaspace are not exactly the same thing. In Java 7, permgen contains some miscellaneous objects that are unrelated to class data; these are moved into the regular heap in Java 8. Java 8 also fundamentally changes the kind of metadata that is held in this special region—though since end users don’t know what that data is in the first place, that change doesn’t really affect us. As end users, all we need to know is that permgen/metaspace holds a bunch of class-related data, and that there are certain circumstances where the size of that region needs to be tuned.

Note that permgen/metaspace does not hold the actual instance of the class (the Class objects), nor reflection objects (e.g., Method objects); those are held in the regular heap. Information in permgen/metaspace is really only used by the compiler and JVM runtime, and the data it holds is referred to as class metadata.

There isn’t a good way to calculate in advance how much space a particular program needs for its permgen/metaspace. The size will be proportional to the number of classes it uses, so bigger applications will need bigger areas. One of the advantages to phasing out permgen is that the metaspace rarely needs to be sized—because (unlike permgen) metaspace will by default use as much space as it needs. 

These memory regions behave just like a separate instance of the regular heap. They are sized dynamically based on an initial size and will increase as needed to a maximum size. For permgen, the sizes are specified via these flags: -XX:PermSize=N and -XX:MaxPermSize=N. Metaspace is sized with these flags: -XX:MetaspaceSize=N and -XX:MaxMetaspaceSize=N.

Resizing these regions requires a full GC, so it is an expensive operation. If there are a lot of full GCs during the startup of a program (as it is loading classes), it is often because permgen or metaspace is being resized, so increasing the initial size is a good idea to improve startup in that case. Java 7 applications that define a lot of classes should increase the maximum size as well. Application servers, for example, typically specify a maximum permgen size of 128 MB, 192 MB, or more.

Contrary to its name, data stored in permgen is not permanent (metaspace, then, is a much better name). In particular, classes can be eligible for GC just like anything else. This is a very common occurrence in an application server, which creates new classloaders every time an application is deployed (or redeployed). The old classloaders are then unreferenced and eligible for GC, as are any classes that they defined. In a long development cycle in an application server, it is not unusual to see full GCs triggered during deployment: permgen or metaspace has filled up with the new class information, but the old class metadata can be freed.
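The classloader lifecycle can be sketched in a few lines. This is a simplified illustration (the class name is invented, and real application servers use far more elaborate loaders): once nothing references a classloader or any class it defined, the loader and its class metadata become eligible for collection.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Sketch: a classloader, and the class metadata it defined, becomes
// eligible for GC once nothing references the loader or its classes.
public class LoaderLifecycle {
    public static void main(String[] args) throws Exception {
        URLClassLoader loader = new URLClassLoader(new URL[0]);
        // While 'loader' is reachable, its metadata stays in permgen/metaspace.
        System.out.println(loader.loadClass("java.lang.String").getName());
        loader.close();
        loader = null; // the loader (and any classes it defined) can now be collected
    }
}
```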

Heap dumps (see Chapter 7) can be used to diagnose what classloaders exist, which in turn can help determine if a classloader leak is filling up permgen (or metaspace). Otherwise, jmap can be used with the argument -permstat (in Java 7) or -clstats (in Java 8) to print out information about the classloaders. That particular command isn’t the most stable, though, and it cannot be recommended. 

Controlling Parallelism 

All GC algorithms except the serial collector use multiple threads. The number of these threads is controlled by the -XX:ParallelGCThreads=N flag. The value of this flag affects the number of threads used for the following operations:

  • Collection of the young generation when using -XX:+UseParallelGC

  • Collection of the old generation when using -XX:+UseParallelOldGC

  • Collection of the young generation when using -XX:+UseParNewGC

  • Collection of the young generation when using -XX:+UseG1GC

  • Stop-the-world phases of CMS (though not full GCs)

  • Stop-the-world phases of G1 (though not full GCs) 

Because these GC operations stop the application threads from executing, the JVM attempts to use as many CPU resources as it can in order to minimize the pause time. By default, that means the JVM will run one thread for each CPU on a machine, up to eight. Once that threshold has been reached, the JVM only adds a new thread for every five-eighths of a CPU. So the total number of threads (where N is the number of CPUs) on a machine with more than eight CPUs is:

    ParallelGCThreads = 8 + ((N - 8) * 5 / 8)

There are times when this number is too large. An application using a small heap (say, 1 GB) on a machine with eight CPUs will be slightly more efficient with four or six threads dividing up that heap. On a 128-CPU machine, 83 GC threads is too many for all but the largest heaps.
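The formula is easy to encode and check against the numbers quoted in the text (the class and method names below are invented for illustration):

```java
// Sketch: the default ParallelGCThreads count as a function of CPU count.
public class GcThreads {
    static int defaultParallelGcThreads(int cpus) {
        if (cpus <= 8) return cpus;      // one thread per CPU, up to eight
        return 8 + ((cpus - 8) * 5 / 8); // then five more per eight extra CPUs
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {4, 8, 16, 128}) {
            System.out.println(cpus + " CPUs -> "
                    + defaultParallelGcThreads(cpus) + " GC threads");
        }
    }
}
```

Plugging in 128 CPUs yields the 83 threads mentioned above, and 16 CPUs yields the 13 threads used in the multi-JVM example below.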

Additionally, if more than one JVM is running on the machine, it is a good idea to limit the total number of GC threads among all JVMs. When they run, the GC threads are quite efficient and each will consume 100% of a single CPU (this is why the average CPU usage for the throughput collector was higher than expected in previous examples). In machines with eight or fewer CPUs, GC will consume 100% of the CPU on the machine. On machines with more CPUs and multiple JVMs, there will still be too many GC threads running in parallel.

Take the example of a 16-CPU machine running four JVMs; each JVM will have by default 13 GC threads. If all four JVMs execute GC at the same time, the machine will have 52 CPU-hungry threads contending for CPU time. That results in a fair amount of contention; it will be more efficient if each JVM is limited to four GC threads. Even though it may be unlikely for all four JVMs to perform a GC operation at the same time, one JVM executing GC with 13 threads means that the application threads in the remaining JVMs now have to compete for CPU resources on a machine where 13 of 16 CPUs are 100% busy executing GC tasks. Giving each JVM four GC threads provides a better balance in this case.

Note that this flag does not set the number of background threads used by CMS or G1 (though it does affect that). Details on that are given in the next chapter. 

Adaptive Sizing

The sizes of the heap, the generations, and the survivor spaces can vary during execution as the JVM attempts to find the optimal performance according to its policies and tunings.

This is a best-effort solution, and it relies on past performance: the assumption is that future GC cycles will look similar to the GC cycles in the recent past. That turns out to be a reasonable assumption for many workloads, and even if the allocation rate suddenly changes, the JVM will readapt its sizes based on the new information.

Adaptive sizing provides benefits in two important ways. First, it means that small applications don’t need to worry about overspecifying the size of their heap. Consider the administrative command-line programs used to adjust the operations of things like an application server—those programs are usually very short-lived and use minimal memory resources. These applications will use 16 (or 64) MB of heap even though the default heap could potentially grow to 1 GB. Because of adaptive sizing, applications like that don’t need to be specifically tuned; the platform defaults ensure that they will not use a large amount of memory.

Second, it means that many applications don’t really need to worry about tuning their heap size at all—or if they need a larger heap than the platform default, they can just specify that larger heap and forget about the other details. The JVM can autotune the heap and generation sizes to use an optimal amount of memory given the GC algorithm’s performance goals. Adaptive sizing is what allows that autotuning to work.

Still, doing the adjustment of the sizes takes a small amount of time—which occurs for the most part during a GC pause. If you have taken the time to finely tune GC parameters and the size constraints of the application’s heap, adaptive sizing can be disabled. Disabling adaptive sizing is also useful for applications that go through markedly different phases, if you want to optimally tune GC for one of those phases.

At a global level, adaptive sizing is disabled by specifying -XX:-UseAdaptiveSizePolicy (the UseAdaptiveSizePolicy flag is true by default). With the exception of the survivor spaces (which are examined in detail in the next chapter), adaptive sizing is also effectively turned off if the minimum and maximum heap sizes are set to the same value, and the initial and maximum sizes of the new generation are set to the same value.

To see how the JVM is resizing the spaces in an application, set the -XX:+PrintAdaptiveSizePolicy flag. When a GC is performed, the GC log will contain information detailing how the various generations were resized during a collection.

GC Tools

Since GC is central to the performance of Java, there are many tools that monitor its performance.

The best way to see what effect GC has on the performance of an application is to become familiar with the GC log, which is a record of every GC operation during the program’s execution. 

The details in the GC log vary depending on the GC algorithm, but the basic management of the log is always the same. That management is covered here, and more details on the contents of the log are given in the algorithm-specific tuning sections in the next chapter.

There are multiple ways to enable the GC log: specifying either of the flags -verbose:gc or -XX:+PrintGC will create a simple GC log (the flags are aliases for each other, and by default the log is disabled). The -XX:+PrintGCDetails flag will create a log with much more information. This flag is recommended (it is also false by default); it is often too difficult to diagnose what is happening with GC using only the simple log. In conjunction with the detailed log, it is recommended to include -XX:+PrintGCTimeStamps or -XX:+PrintGCDateStamps, so that the time between GC operations can be determined. The difference in those two arguments is that the timestamps are relative to 0 (based on when the JVM starts), while the date stamps are an actual date string. That makes the date stamps ever-so-slightly less efficient as the dates are formatted, though it is an infrequent enough operation that its effect is unlikely to be noticed.

The GC log is written to standard output, though that location can be changed with the -Xloggc:filename flag. Using -Xloggc automatically enables the simple GC log unless PrintGCDetails has also been enabled. The amount of data that is kept in the GC log can be limited using log rotation; this is quite useful for a long-running server that might otherwise fill up its disk with logs over several months. Logfile rotation is controlled with these flags: -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=N -XX:GCLogFileSize=N. By default, UseGCLogFileRotation is disabled. When that flag is enabled, the default number of files is 0 (meaning unlimited), and the default logfile size is 0 (meaning unlimited). Hence, values must be specified for all these options in order for log rotation to work as expected. Note that a logfile size will be rounded up to 8 KB for values less than that.

You can parse and peruse the GC logfiles on your own, although there are several tools that will do it for you. One such tool is GC Histogram, which reads in a GC log and provides several charts and tables about the data in that log.

For a scriptable solution, jstat is the tool of choice. jstat provides nine options to print different information about the heap; jstat -options will provide the full list. One useful option is -gcutil, which displays the time spent in GC as well as the percentage of each GC area that is currently filled. Other options to jstat will display the GC sizes in terms of KB.

Remember that jstat takes an optional argument—the number of milliseconds to repeat the command—so it can monitor over time the effect of GC in an application. Here is some sample output repeated every second: 

% jstat -gcutil process_id 1000

  S0     S1     E      O      P     YGC    YGCT    FGC   FGCT     GCT

     51.71   0.00  99.12  60.00  99.93     98    1.985     8    2.397    4.382
      0.00  42.08   5.55  60.98  99.93     99    2.016     8    2.397    4.413
      0.00  42.08   6.32  60.98  99.93     99    2.016     8    2.397    4.413
      0.00  42.08  68.06  60.98  99.93     99    2.016     8    2.397    4.413
      0.00  42.08  82.27  60.98  99.93     99    2.016     8    2.397    4.413
      0.00  42.08  96.67  60.98  99.93     99    2.016     8    2.397    4.413
      0.00  42.08  99.30  60.98  99.93     99    2.016     8    2.397    4.413


     44.54   0.00   1.38  60.98  99.93    100    2.042     8    2.397    4.439
     44.54   0.00   1.91  60.98  99.93    100    2.042     8    2.397    4.439


When monitoring started, the program had already performed 98 collections of the young generation (YGC), which took a total of 1.985 seconds (YGCT). It had also performed eight full GCs (FGC) requiring 2.397 seconds (FGCT); hence the total time in GC (GCT) was 4.382 seconds.

All three sections of the young generation are displayed here: the two survivor spaces (S0 and S1) and eden (E). The monitoring started just as eden was filling up (99.12% full), so in the next second there was a young collection: eden reduced to 5.55% full, the survivor spaces switched places, and a small amount of memory was promoted to the old generation (O), which increased to using 60.98% of its space. As is typical, there is little or no change in the permanent generation (P) since all necessary classes have already been loaded by the application.
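The column totals in the sample are simple to reproduce (the class and method names below are invented for illustration): GCT is just the sum of the young-collection and full-collection times.

```java
// Sketch: GCT in the jstat -gcutil output is YGCT + FGCT.
public class JstatTotals {
    static double totalGcSeconds(double ygct, double fgct) {
        return ygct + fgct;
    }

    public static void main(String[] args) {
        // Values from the first line of the sample output above:
        // 98 young GCs took 1.985 s, 8 full GCs took 2.397 s.
        System.out.printf("GCT = %.3f%n", totalGcSeconds(1.985, 2.397));
    }
}
```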

If you’ve forgotten to enable GC logging, this is a good substitute to watch how GC operates over time. 

Reposted from: https://my.oschina.net/u/1778261/blog/311278
