How a garbage collector works (from Thinking in Java, 4th edition)

If you come from a programming language where allocating objects on the heap is expensive, you may naturally assume that Java’s scheme of allocating everything (except primitives) on the heap is also expensive. However, it turns out that the garbage collector can significantly increase the speed of object creation. This might sound a bit odd at first (that storage release affects storage allocation), but it’s the way some JVMs work, and it means that allocating storage for heap objects in Java can be nearly as fast as creating storage on the stack in other languages.

For example, you can think of the C++ heap as a yard where each object stakes out its own piece of turf. This real estate can become abandoned sometime later and must be reused. In some JVMs, the Java heap is quite different; it’s more like a conveyor belt that moves forward every time you allocate a new object. This means that object storage allocation is remarkably rapid. The “heap pointer” is simply moved forward into virgin territory, so it’s effectively the same as C++’s stack allocation. (Of course, there’s a little extra overhead for bookkeeping, but it’s nothing like searching for storage.)
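The “heap pointer” idea can be sketched as a bump allocator over a fixed number of abstract slots. This is a toy model only: `BumpHeap`, `allocate`, and `used` are hypothetical names, not part of any JVM. Allocation is a bounds check plus an addition, with no search for free space.

```java
// Toy model of "conveyor belt" allocation: a bump-pointer heap.
// BumpHeap is a hypothetical illustration, not a real JVM class.
class BumpHeap {
    private final int capacity; // total number of slots in the heap
    private int top = 0;        // the "heap pointer": next free slot

    BumpHeap(int capacity) {
        this.capacity = capacity;
    }

    /** Returns the start offset of the new object, or -1 if the heap is full. */
    int allocate(int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        if (top + size > capacity) {
            return -1; // out of space: this is where a collector would step in
        }
        int start = top;
        top += size; // just move the pointer forward; no free-list search
        return start;
    }

    int used() {
        return top;
    }

    public static void main(String[] args) {
        BumpHeap heap = new BumpHeap(16);
        System.out.println(heap.allocate(4)); // 0
        System.out.println(heap.allocate(8)); // 4
        System.out.println(heap.allocate(8)); // -1 (only 4 slots left)
        System.out.println(heap.used());      // 12
    }
}
```

Contrast this with a free-list allocator, which has to walk a list of holes to find one big enough for each request.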

You might observe that the heap isn’t in fact a conveyor belt, and if you treat it that way, you’ll start paging memory (moving it on and off disk, so that you can appear to have more memory than you actually do). Paging significantly impacts performance. Eventually, after you create enough objects, you’ll run out of memory. The trick is that the garbage collector steps in, and while it collects the garbage it compacts all the objects in the heap so that you’ve effectively moved the “heap pointer” closer to the beginning of the conveyor belt and farther away from a page fault. The garbage collector rearranges things and makes it possible for the high-speed, infinite-free-heap model to be used while allocating storage.

To understand garbage collection in Java, it’s helpful to learn how garbage-collection schemes work in other systems. A simple but slow garbage-collection technique is called reference counting. This means that each object contains a reference counter, and every time a reference is attached to that object, the reference count is increased. Every time a reference goes out of scope or is set to null, the reference count is decreased. Thus, managing reference counts is a small but constant overhead that happens throughout the lifetime of your program. The garbage collector moves through the entire list of objects, and when it finds one with a reference count of zero it releases that storage (however, reference counting schemes often release an object as soon as the count goes to zero). The one drawback is that if objects circularly refer to each other they can have nonzero reference counts while still being garbage. Locating such self-referential groups requires significant extra work for the garbage collector. Reference counting is commonly used to explain one kind of garbage collection, but it doesn’t seem to be used in any JVM implementations.
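A minimal sketch of reference counting, assuming a hypothetical `RcObject` class whose count is adjusted by explicit `retain`/`release` calls (Java itself exposes no reference counts). It frees an object as soon as its count reaches zero, and it shows the drawback described above: two objects that point at each other keep nonzero counts after the last outside reference is dropped.

```java
import java.util.ArrayList;
import java.util.List;

class RefCountDemo {
    public static void main(String[] args) {
        RcObject a = new RcObject("a");
        RcObject b = new RcObject("b");
        a.retain();   // local variable a refers to object a
        b.retain();   // local variable b refers to object b
        a.pointTo(b); // a.field -> b
        b.pointTo(a); // b.field -> a (a cycle)

        a.release();  // local a goes out of scope
        b.release();  // local b goes out of scope

        // Both objects are unreachable, but each still has a count of 1.
        System.out.println(a.refCount + " " + b.refCount); // 1 1
        System.out.println(a.freed || b.freed);             // false
    }
}

// Hypothetical reference-counted object; the JVM does not manage memory this way.
class RcObject {
    final String name;
    int refCount = 0;
    boolean freed = false;
    final List<RcObject> fields = new ArrayList<>(); // outgoing references

    RcObject(String name) { this.name = name; }

    void retain() { refCount++; }

    // Called when a reference goes out of scope or is set to null.
    void release() {
        if (--refCount == 0) {
            freed = true; // released immediately when the count hits zero
            // Freeing an object drops the references it holds.
            for (RcObject child : fields) child.release();
            fields.clear();
        }
    }

    // Store a reference to target in one of this object's fields.
    void pointTo(RcObject target) {
        target.retain();
        fields.add(target);
    }
}
```

Without the cycle, the same calls would drive each count to zero and free both objects; that extra search for cycles is the work the text attributes to the collector.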

In faster schemes, garbage collection is not based on reference counting. Instead, it is based on the idea that any non-dead object must ultimately be traceable back to a reference that lives either on the stack or in static storage. The chain might go through several layers of objects. Thus, if you start in the stack and in the static storage area and walk through all the references, you’ll find all the live objects. For each reference that you find, you must trace into the object that it points to and then follow all the references in that object, tracing into the objects they point to, etc., until you’ve moved through the entire web that originated with the reference on the stack or in static storage. Each object that you move through must still be alive. Note that there is no problem with detached self-referential groups: these are simply not found, and are therefore automatically garbage.
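The tracing idea is a graph search. The sketch below assumes objects are modeled by a hypothetical `Node` class holding a list of outgoing references, and that the roots (stack and static references) are handed in as a plain list. Everything reached from the roots is live; a detached cycle is simply never visited.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class TraceDemo {
    // Hypothetical heap object with outgoing references.
    static class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Returns every object reachable from the roots (compared by identity).
    static Set<Node> findLive(List<Node> roots) {
        Set<Node> live = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (live.add(n)) {       // first visit: trace into its fields
                work.addAll(n.refs);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Node root = new Node("root"), a = new Node("a"), b = new Node("b");
        Node x = new Node("x"), y = new Node("y");
        root.refs.add(a);
        a.refs.add(b);
        x.refs.add(y); // x and y form a detached cycle
        y.refs.add(x);

        Set<Node> live = findLive(List.of(root));
        System.out.println(live.size());                          // 3
        System.out.println(live.contains(x) || live.contains(y)); // false
    }
}
```

The visited set also stops the walk from looping forever on cycles that *are* reachable.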

In the approach described here, the JVM uses an adaptive garbage-collection scheme, and what it does with the live objects that it locates depends on the variant currently being used. One of these variants is stop-and-copy. This means that, for reasons that will become apparent, the program is first stopped (this is not a background collection scheme). Then, each live object is copied from one heap to another, leaving behind all the garbage. In addition, as the objects are copied into the new heap, they are packed end-to-end, thus compacting the new heap (and allowing new storage to simply be reeled off the end as previously described).

Of course, when an object is moved from one place to another, all references that point at the object must be changed. The reference that goes from the stack or the static storage area to the object can be changed right away, but there can be other references pointing to this object that will be encountered later during the “walk.” These are fixed up as they are found (you could imagine a table that maps old addresses to new ones).
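Stop-and-copy with a “table that maps old addresses to new ones” can be sketched on a toy heap where references are list indices (-1 meaning null). The `Obj` representation, the method names, and the use of a separate `HashMap` as the forwarding table are assumptions for illustration; a real collector works on raw memory and commonly stores the forwarding address in the old copy instead. The scan loop follows the well-known Cheney breadth-first copying scheme.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CopyDemo {
    // Hypothetical heap object: a name plus references stored as indices (-1 = null).
    static class Obj {
        final String name;
        final int[] refs;
        Obj(String name, int... refs) { this.name = name; this.refs = refs; }
    }

    // Copies everything reachable from roots into a new, compacted space.
    // The roots array is updated in place to point into the new space.
    static List<Obj> collect(List<Obj> from, int[] roots) {
        List<Obj> to = new ArrayList<>();
        Map<Integer, Integer> forward = new HashMap<>(); // old index -> new index
        for (int i = 0; i < roots.length; i++) {
            roots[i] = copy(from, to, forward, roots[i]); // root refs fixed right away
        }
        // Scan the copied objects in order, fixing up their references as found.
        for (int scan = 0; scan < to.size(); scan++) {
            int[] refs = to.get(scan).refs;
            for (int j = 0; j < refs.length; j++) {
                refs[j] = copy(from, to, forward, refs[j]);
            }
        }
        return to; // anything not copied is garbage left behind in 'from'
    }

    // Copies one object unless already copied; returns its new index.
    private static int copy(List<Obj> from, List<Obj> to, Map<Integer, Integer> forward, int old) {
        if (old < 0) return old;                 // null reference
        Integer done = forward.get(old);
        if (done != null) return done;           // already moved: look it up in the table
        Obj o = from.get(old);
        to.add(new Obj(o.name, o.refs.clone())); // packed end to end
        int newIndex = to.size() - 1;
        forward.put(old, newIndex);
        return newIndex;
    }

    public static void main(String[] args) {
        List<Obj> from = List.of(
                new Obj("garbage", 1), // index 0: unreachable, points at a
                new Obj("a", 3),       // index 1: root object
                new Obj("garbage2"),   // index 2: unreachable
                new Obj("b", 1));      // index 3: points back at a
        int[] roots = {1};
        List<Obj> to = collect(from, roots);
        for (Obj o : to) {
            System.out.println(o.name + " -> " + Arrays.toString(o.refs));
        }
        System.out.println("root -> " + roots[0]);
    }
}
```

Here `a` and `b` end up at indices 0 and 1 of the new space, with `b`’s back-reference rewritten through the table when the scan reaches it; the two garbage objects are never touched.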

There are two issues that make these so-called “copy collectors” inefficient. The first is the idea that you have two heaps and you slosh all the memory back and forth between these two separate heaps, maintaining twice as much memory as you actually need. Some JVMs deal with this by allocating the heap in chunks as needed and simply copying from one chunk to another.

The second issue is the copying process itself. Once your program becomes stable, it might be generating little or no garbage. Despite that, a copy collector will still copy all the memory from one place to another, which is wasteful. To prevent this, some JVMs detect that no new garbage is being generated and switch to a different scheme (this is the “adaptive” part). This other scheme is called mark-and-sweep, and it’s what earlier versions of Sun’s JVM used all the time. For general use, mark-and-sweep is fairly slow, but when you know you’re generating little or no garbage, it’s fast.
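The “adaptive” decision can be sketched as a small policy object. The thresholds and the names `AdaptivePolicy`, `afterCopy`, and `afterMarkSweep` are invented for illustration; the text does not say what measurements or numbers any JVM actually uses. The policy moves to mark-and-sweep when a copying pass finds little garbage, and (as the text also describes) back to stop-and-copy when the heap becomes fragmented.

```java
// Hypothetical sketch of an adaptive collector-selection policy.
// The thresholds are made-up illustration values, not taken from any real JVM.
class AdaptivePolicy {
    enum Mode { STOP_AND_COPY, MARK_AND_SWEEP }

    static final double LOW_GARBAGE = 0.10;        // under 10% of the heap was garbage
    static final double HIGH_FRAGMENTATION = 0.30; // assumed fragmentation metric in [0, 1]

    Mode mode = Mode.STOP_AND_COPY;

    // After a copying pass: if nearly everything survived, copying was wasted work.
    void afterCopy(long bytesInUseBefore, long liveBytesCopied) {
        if (bytesInUseBefore <= 0) return;
        double garbageRatio = 1.0 - (double) liveBytesCopied / bytesInUseBefore;
        if (garbageRatio < LOW_GARBAGE) mode = Mode.MARK_AND_SWEEP;
    }

    // After a mark-and-sweep pass: a fragmented heap favors copying again.
    void afterMarkSweep(double fragmentation) {
        if (fragmentation > HIGH_FRAGMENTATION) mode = Mode.STOP_AND_COPY;
    }

    public static void main(String[] args) {
        AdaptivePolicy p = new AdaptivePolicy();
        p.afterCopy(1000, 500);     // half the heap was garbage: keep copying
        System.out.println(p.mode); // STOP_AND_COPY
        p.afterCopy(1000, 980);     // only 2% garbage: copying is wasteful
        System.out.println(p.mode); // MARK_AND_SWEEP
        p.afterMarkSweep(0.5);      // the heap has become fragmented
        System.out.println(p.mode); // STOP_AND_COPY
    }
}
```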

Mark-and-sweep follows the same logic of starting from the stack and static storage, and tracing through all the references to find live objects. However, each time it finds a live object, that object is marked by setting a flag in it, but the object isn’t collected yet. Only when the marking process is finished does the sweep occur. During the sweep, the dead objects are released. However, no copying happens, so if the collector chooses to compact a fragmented heap, it does so by shuffling objects around.
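A mark-and-sweep sketch, assuming a hypothetical `Cell` class that carries the mark flag and a heap modeled as a plain list (“releasing” an object here just removes it from the list, where a real collector would return its memory to a free list). Marking sets the flag on everything reachable from the roots; the sweep then walks the whole heap, keeps marked objects, clears their flags for the next cycle, and drops the rest. Nothing is copied, and this sketch does not compact.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class MarkSweepDemo {
    // Hypothetical heap object carrying the mark flag.
    static class Cell {
        final String name;
        final List<Cell> refs = new ArrayList<>();
        boolean marked = false;
        Cell(String name) { this.name = name; }
    }

    // Mark phase: flag every object reachable from the roots.
    static void mark(List<Cell> roots) {
        Deque<Cell> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Cell c = work.pop();
            if (!c.marked) {
                c.marked = true;
                work.addAll(c.refs);
            }
        }
    }

    // Sweep phase: release unmarked objects; returns how many were freed.
    static int sweep(List<Cell> heap) {
        int freed = 0;
        for (Iterator<Cell> it = heap.iterator(); it.hasNext(); ) {
            Cell c = it.next();
            if (c.marked) {
                c.marked = false; // reset for the next collection
            } else {
                it.remove();      // dead: release it
                freed++;
            }
        }
        return freed;
    }

    public static void main(String[] args) {
        Cell root = new Cell("root"), a = new Cell("a"), dead = new Cell("dead");
        root.refs.add(a);
        List<Cell> heap = new ArrayList<>(List.of(root, a, dead));
        mark(List.of(root));
        System.out.println(sweep(heap)); // 1
        System.out.println(heap.size()); // 2
    }
}
```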

“Stop-and-copy” refers to the idea that this type of garbage collection is not done in the background; instead, the program is stopped while the garbage collection occurs. In the Sun literature you’ll find many references to garbage collection as a low-priority background process, but it turns out that the garbage collection was not implemented that way in earlier versions of the Sun JVM. Instead, the Sun garbage collector stopped the program when memory got low. Mark-and-sweep also requires that the program be stopped.

As previously mentioned, in the JVM described here memory is allocated in big blocks. If you allocate a large object, it gets its own block. Strict stop-and-copy requires copying every live object from the source heap to a new heap before you can free the old one, which translates to lots of memory. With blocks, the garbage collector can typically copy objects to dead blocks as it collects. Each block has a generation count to keep track of whether it’s alive. In the normal case, only the blocks created since the last garbage collection are compacted; all other blocks get their generation count bumped if they have been referenced from somewhere. This handles the normal case of lots of short-lived temporary objects. Periodically, a full sweep is made: large objects are still not copied (they just get their generation count bumped), and blocks containing small objects are copied and compacted. The JVM monitors the efficiency of garbage collection, and if it becomes a waste of time because all objects are long-lived, then it switches to mark-and-sweep. Similarly, the JVM keeps track of how successful mark-and-sweep is, and if the heap starts to become fragmented, it switches back to stop-and-copy. This is where the “adaptive” part comes in, so you end up with a mouthful: “Adaptive generational stop-and-copy mark-and-sweep.”
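The “normal case” collection over blocks can be sketched as follows. This is heavily simplified and rests on several assumptions: liveness and the per-block `referenced` flag are taken as already computed by a trace; `BlockHeapDemo`, `Block`, `Obj`, `minorCollect`, and the block capacity are hypothetical; and unreferenced older blocks are simply kept, because the text does not say how a normal-case collection treats them. Young small-object blocks are compacted into fresh survivor blocks; every other block (including large-object blocks) is never copied and only has its generation bumped if referenced.

```java
import java.util.ArrayList;
import java.util.List;

class BlockHeapDemo {
    static final int BLOCK_CAPACITY = 4; // objects per small-object block (assumed value)

    // Hypothetical object: only its liveness matters (assumed set by a prior trace).
    static class Obj {
        final String name;
        final boolean live;
        Obj(String name, boolean live) { this.name = name; this.live = live; }
    }

    // Hypothetical block with a generation count.
    static class Block {
        final boolean large;        // a large object gets a block of its own
        int generation = 0;         // 0 = created since the last collection
        boolean referenced = false; // assumed: set by the trace if anything points into it
        final List<Obj> objects = new ArrayList<>();
        Block(boolean large) { this.large = large; }
    }

    // Compact only the young small-object blocks; bump the generation of
    // every other block that is still referenced.
    static List<Block> minorCollect(List<Block> heap) {
        List<Block> result = new ArrayList<>();
        Block target = null; // current destination block for survivors
        for (Block b : heap) {
            if (b.generation == 0 && !b.large) {
                for (Obj o : b.objects) {
                    if (!o.live) continue;             // garbage is left behind
                    if (target == null || target.objects.size() == BLOCK_CAPACITY) {
                        target = new Block(false);
                        target.generation = 1;         // survivors are no longer new
                        result.add(target);
                    }
                    target.objects.add(o);             // packed end to end
                }
                // The emptied young block is dropped from the result: it can be reused.
            } else {
                if (b.referenced) b.generation++;      // not copied, just bumped
                result.add(b);                         // unreferenced blocks kept (assumption)
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Block old = new Block(false);
        old.generation = 3;
        old.referenced = true;
        Block big = new Block(true);
        big.referenced = true;
        big.objects.add(new Obj("bigArray", true));
        Block young = new Block(false);
        young.objects.add(new Obj("temp", false));
        young.objects.add(new Obj("keep", true));

        List<Block> after = minorCollect(List.of(old, young, big));
        for (Block b : after) {
            System.out.println((b.large ? "large" : "small") + " gen=" + b.generation
                    + " objects=" + b.objects.size());
        }
    }
}
```

A periodic full sweep would differ mainly in also compacting the older small-object blocks.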

There are a number of additional speedups possible in a JVM. An especially important one involves the operation of the loader and what is called a just-in-time (JIT) compiler. A JIT compiler partially or fully converts a program into native machine code so that it doesn’t need to be interpreted by the JVM and thus runs much faster. When a class must be loaded (typically, the first time you want to create an object of that class), the .class file is located, and the bytecodes for that class are brought into memory. At this point, one approach is to simply JIT compile all the code, but this has two drawbacks: it takes a little more time, which, compounded throughout the life of the program, can add up; and it increases the size of the executable (bytecodes are significantly more compact than expanded JIT code), and this might cause paging, which definitely slows down a program. An alternative approach is lazy evaluation, which means that the code is not JIT compiled until necessary. Thus, code that never gets executed might never be JIT compiled. The Java HotSpot technologies in recent JDKs take a similar approach by increasingly optimizing a piece of code each time it is executed, so the more the code is executed, the faster it gets.
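Lazy, counter-driven compilation can be illustrated by analogy with a call counter placed in front of two versions of the same function. This is only an analogy under stated assumptions: `TieredFunction`, the threshold, and the “compiled” version are hypothetical stand-ins, and HotSpot’s real thresholds, profiling, and compilation tiers are not modeled. Code that is never called never triggers the “compiler” at all.

```java
import java.util.function.IntUnaryOperator;
import java.util.function.Supplier;

// Hypothetical stand-in for lazy, counter-driven compilation (not HotSpot's real mechanism).
class TieredFunction implements IntUnaryOperator {
    private final IntUnaryOperator interpreted;        // slow path, usable immediately
    private final Supplier<IntUnaryOperator> compiler; // builds the fast path on demand
    private final int threshold;                       // calls before "compiling" (assumed)
    private IntUnaryOperator compiled = null;          // stays null for code that never gets hot
    private int calls = 0;

    TieredFunction(IntUnaryOperator interpreted, Supplier<IntUnaryOperator> compiler, int threshold) {
        this.interpreted = interpreted;
        this.compiler = compiler;
        this.threshold = threshold;
    }

    @Override
    public int applyAsInt(int x) {
        if (compiled != null) {
            return compiled.applyAsInt(x); // hot code runs the "compiled" version
        }
        if (++calls >= threshold) {
            compiled = compiler.get();     // compile lazily, only once the code is hot
        }
        return interpreted.applyAsInt(x);
    }

    boolean isCompiled() {
        return compiled != null;
    }

    public static void main(String[] args) {
        TieredFunction square = new TieredFunction(
                x -> x * x,                          // "interpreted" version
                () -> x -> Math.multiplyExact(x, x), // "compiled" version, same result
                3);
        for (int i = 1; i <= 5; i++) {
            System.out.println(square.applyAsInt(i) + " compiled=" + square.isCompiled());
        }
    }
}
```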

