Think in Java notes: How a garbage collector works

sutonline

Categories: JAVA, Think in Java notes. Tags: java, heap


If you come from a programming language where allocating objects on the heap is expensive, you may naturally assume that Java’s scheme of allocating everything (except primitives) on the heap is also expensive. However, it turns out that the garbage collector can have a significant impact on increasing the speed of object creation. This might sound a bit odd at first (that storage release affects storage allocation), but it’s the way some JVMs work, and it means that allocating storage for heap objects in Java can be nearly as fast as creating storage on the stack in other languages.

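Note: if your previous language made heap allocation expensive, you might assume that Java's approach of putting everything except primitives on the heap is expensive too. In fact the garbage collector can speed up object creation considerably. It may seem odd that releasing storage affects allocating it, but that is how some JVMs work, and it makes heap allocation in Java nearly as fast as stack allocation in other languages.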

For example, you can think of the C++ heap as a yard where each object stakes out its own piece of turf. This real estate can become abandoned sometime later and must be reused. In some JVMs, the Java heap is quite different; it’s more like a conveyor belt that moves forward every time you allocate a new object. This means that object storage allocation is remarkably rapid. The “heap pointer” is simply moved forward into virgin territory, so it’s effectively the same as C++’s stack allocation. (Of course, there’s a little extra overhead for bookkeeping, but it’s nothing like searching for storage.)

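Note: picture the C++ heap as a yard in which each object claims its own plot; plots get abandoned and must later be reused. In some JVMs the heap is more like a conveyor belt that advances on every allocation, so allocating is very fast: the heap pointer just moves forward into unused space, much like C++ stack allocation. (There is some bookkeeping overhead, but nothing like searching for free storage.)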
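The conveyor-belt idea can be sketched as a bump-pointer allocator. This is only an illustration: the class `BumpAllocator`, its byte-array arena, and the `-1` failure value are assumptions made for the example, not how any particular JVM lays out its heap.

```java
// Hypothetical sketch: a fixed-size arena with a bump ("heap") pointer.
final class BumpAllocator {
    private final byte[] arena;
    private int top = 0; // the "heap pointer": offset of the next free byte

    BumpAllocator(int capacity) {
        arena = new byte[capacity];
    }

    /** Returns the offset of the new block, or -1 if the arena is full. */
    int allocate(int size) {
        if (size < 0 || size > arena.length - top) {
            return -1; // a real JVM would trigger a collection here
        }
        int address = top;
        top += size; // allocating is just moving the pointer forward
        return address;
    }

    int used() {
        return top;
    }

    public static void main(String[] args) {
        BumpAllocator heap = new BumpAllocator(64);
        System.out.println(heap.allocate(16)); // 0
        System.out.println(heap.allocate(24)); // 16
        System.out.println(heap.allocate(32)); // -1, only 24 bytes remain
    }
}
```

There is no search for a free slot anywhere in `allocate`, which is the point the passage makes about allocation speed.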

You might observe that the heap isn’t in fact a conveyor belt, and if you treat it that way, you’ll start paging memory (moving it on and off disk, so that you can appear to have more memory than you actually do). Paging significantly impacts performance. Eventually, after you create enough objects, you’ll run out of memory. The trick is that the garbage collector steps in, and while it collects the garbage it compacts all the objects in the heap so that you’ve effectively moved the “heap pointer” closer to the beginning of the conveyor belt and farther away from a page fault. The garbage collector rearranges things and makes it possible for the high-speed, infinite-free-heap model to be used while allocating storage.

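Note: the heap is not really a conveyor belt. Treated as one, it would eventually start paging memory to and from disk, which hurts performance badly, and after enough allocations memory would run out. This is where the garbage collector steps in: while collecting, it compacts the live objects, which moves the heap pointer back toward the start of the belt and away from a page fault. That rearrangement is what lets allocation use the fast, seemingly infinite heap model.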

To understand garbage collection in Java, it’s helpful to learn how garbage-collection schemes work in other systems. A simple but slow garbage-collection technique is called reference counting. This means that each object contains a reference counter, and every time a reference is attached to that object, the reference count is increased. Every time a reference goes out of scope or is set to null, the reference count is decreased. Thus, managing reference counts is a small but constant overhead that happens throughout the lifetime of your program. The garbage collector moves through the entire list of objects, and when it finds one with a reference count of zero it releases that storage (however, reference counting schemes often release an object as soon as the count goes to zero). The one drawback is that if objects circularly refer to each other they can have nonzero reference counts while still being garbage. Locating such self-referential groups requires significant extra work for the garbage collector. Reference counting is commonly used to explain one kind of garbage collection, but it doesn’t seem to be used in any JVM implementations.

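Note: to understand Java's garbage collection it helps to look at schemes used elsewhere. Reference counting is simple but slow: each object carries a counter that goes up when a reference is attached and down when a reference leaves scope or is set to null. The bookkeeping is small, but it is paid throughout the program's lifetime. Objects whose count reaches zero are released (often immediately). The drawback is that objects referring to each other in a cycle keep nonzero counts even when they are garbage, and finding such groups takes a lot of extra work. Reference counting is often used to explain garbage collection, but it does not appear to be used in any JVM implementation.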
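The cycle problem is easy to reproduce in a toy model. The sketch below simulates reference counts by hand; `RefCountDemo`, `Node`, `addRef`, `release`, and `link` are hypothetical names invented for the illustration, and Java itself exposes no such counters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of reference counting; not a JVM mechanism.
final class RefCountDemo {
    static final class Node {
        final String name;
        int refCount = 0;
        boolean freed = false;
        final List<Node> fields = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    /** A new reference now points at target. */
    static void addRef(Node target) {
        target.refCount++;
    }

    /** A reference to target went away; free it when the count hits zero. */
    static void release(Node target) {
        if (--target.refCount == 0) {
            target.freed = true;
            for (Node child : target.fields) {
                release(child); // freeing an object drops its outgoing references
            }
            target.fields.clear();
        }
    }

    /** Stores a reference to "to" inside "from". */
    static void link(Node from, Node to) {
        from.fields.add(to);
        addRef(to);
    }

    public static void main(String[] args) {
        // Acyclic case: c -> d. Dropping the only outside reference frees both.
        Node c = new Node("c"), d = new Node("d");
        addRef(c);
        link(c, d);
        release(c);
        System.out.println(c.freed + " " + d.freed); // true true

        // Cyclic case: a <-> b. Both are unreachable afterwards, yet neither is freed.
        Node a = new Node("a"), b = new Node("b");
        addRef(a);
        addRef(b);
        link(a, b);
        link(b, a);
        release(a);
        release(b);
        System.out.println(a.freed + " " + b.freed); // false false
    }
}
```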

In faster schemes, garbage collection is not based on reference counting. Instead, it is based on the idea that any non-dead object must ultimately be traceable back to a reference that lives either on the stack or in static storage. The chain might go through several layers of objects. Thus, if you start in the stack and in the static storage area and walk through all the references, you’ll find all the live objects. For each reference that you find, you must trace into the object that it points to and then follow all the references in that object, tracing into the objects they point to, etc., until you’ve moved through the entire web that originated with the reference on the stack or in static storage. Each object that you move through must still be alive. Note that there is no problem with detached self-referential groups: these are simply not found, and are therefore automatically garbage.

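Note: faster schemes do not count references. They rely on the fact that every live object can be traced back to a reference on the stack or in static storage, possibly through several layers of objects. Starting from the stack and static storage and following every reference, then every reference inside each object reached, visits the whole web of live objects. Detached self-referential groups are no problem: they are never reached, so they are automatically garbage.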
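Tracing from roots can be sketched as a plain graph walk. The example below is a minimal model, assuming a hypothetical `Obj` class whose `refs` list stands in for an object's reference fields and a `roots` list that stands in for the stack and static storage.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical model of root tracing; Obj and findLive are illustrative names.
final class Tracer {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();

        Obj(String name) {
            this.name = name;
        }
    }

    /** Returns every object reachable from the roots (compared by identity). */
    static Set<Obj> findLive(List<Obj> roots) {
        Set<Obj> live = Collections.newSetFromMap(new IdentityHashMap<Obj, Boolean>());
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj current = pending.pop();
            if (live.add(current)) {          // first visit only
                pending.addAll(current.refs); // follow its references later
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Obj root = new Obj("root"), a = new Obj("a"), b = new Obj("b");
        root.refs.add(a);
        a.refs.add(b);

        // A detached cycle: c <-> d, not referenced from any root.
        Obj c = new Obj("c"), d = new Obj("d");
        c.refs.add(d);
        d.refs.add(c);

        Set<Obj> live = findLive(Arrays.asList(root));
        System.out.println(live.size());      // 3
        System.out.println(live.contains(c)); // false
    }
}
```

The detached cycle needs no special handling: the walk never reaches it, so it is simply absent from the live set.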

In the approach described here, the JVM uses an adaptive garbage-collection scheme, and what it does with the live objects that it locates depends on the variant currently being used. One of these variants is stop-and-copy. This means that, for reasons that will become apparent, the program is first stopped (this is not a background collection scheme). Then, each live object is copied from one heap to another, leaving behind all the garbage. In addition, as the objects are copied into the new heap, they are packed end-to-end, thus compacting the new heap (and allowing new storage to simply be reeled off the end as previously described).

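Note: the JVM described here uses an adaptive scheme, and what it does with the live objects it finds depends on the variant in use. One variant is stop-and-copy: the program is stopped first (this is not a background collector), then every live object is copied to a new heap, leaving all the garbage behind. Objects are packed end to end as they arrive, so the new heap is compact.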

Of course, when an object is moved from one place to another, all references that point at the object must be changed. The reference that goes from the heap or the static storage area to the object can be changed right away, but there can be other references pointing to this object that will be encountered later during the “walk.” These are fixed up as they are found (you could imagine a table that maps old addresses to new ones).

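Note: when an object moves, every reference to it must be updated. Some references can be updated right away; others that still point at the old location are found later in the walk and fixed up then (picture a table mapping old addresses to new ones).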
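The copy step and the old-to-new table can be sketched together. In this toy model a "heap" is just a list and "moving" an object means creating a fresh `Obj`; `CopyCollector`, `forwarding`, and `toSpace` are names assumed for the illustration, and a real collector works on raw addresses instead.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of stop-and-copy with a forwarding table.
final class CopyCollector {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();

        Obj(String name) {
            this.name = name;
        }
    }

    // Forwarding table: old object -> its copy in the new heap.
    private final Map<Obj, Obj> forwarding = new IdentityHashMap<>();
    // The new heap; copies are appended, so it is compact by construction.
    final List<Obj> toSpace = new ArrayList<>();

    /** Copies old (and everything it reaches) and returns the new location. */
    Obj copy(Obj old) {
        Obj moved = forwarding.get(old);
        if (moved != null) {
            return moved; // already moved: just fix up this reference
        }
        moved = new Obj(old.name);
        forwarding.put(old, moved); // record before recursing, so cycles terminate
        toSpace.add(moved);
        for (Obj child : old.refs) {
            moved.refs.add(copy(child));
        }
        return moved;
    }

    public static void main(String[] args) {
        Obj root = new Obj("root"), a = new Obj("a"), b = new Obj("b");
        Obj shared = new Obj("shared");
        root.refs.add(a);
        root.refs.add(b);
        a.refs.add(shared);
        b.refs.add(shared); // second reference, met later in the walk
        Obj garbage = new Obj("garbage"); // unreachable, so never copied

        CopyCollector gc = new CopyCollector();
        Obj newRoot = gc.copy(root);
        System.out.println(gc.toSpace.size()); // 4
        // Both paths lead to the same copy of "shared".
        System.out.println(newRoot.refs.get(0).refs.get(0) == newRoot.refs.get(1).refs.get(0)); // true
    }
}
```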

There are two issues that make these so-called “copy collectors” inefficient. The first is the idea that you have two heaps and you slosh all the memory back and forth between these two separate heaps, maintaining twice as much memory as you actually need. Some JVMs deal with this by allocating the heap in chunks as needed and simply copying from one chunk to another.

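Note: two issues make copy collectors inefficient. The first is having two heaps and shuttling all memory between them, which means maintaining twice the memory you actually need. Some JVMs handle this by allocating the heap in chunks as needed and copying from one chunk to another.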

The second issue is the copying process itself. Once your program becomes stable, it might be generating little or no garbage. Despite that, a copy collector will still copy all the memory from one place to another, which is wasteful. To prevent this, some JVMs detect that no new garbage is being generated and switch to a different scheme (this is the “adaptive” part). This other scheme is called mark-and-sweep, and it’s what earlier versions of Sun’s JVM used all the time. For general use, mark-and-sweep is fairly slow, but when you know you’re generating little or no garbage, it’s fast.

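Note: the second issue is the copying itself. A program that has become stable may produce little or no garbage, yet a copy collector still copies everything, which is wasteful. To avoid this, some JVMs detect that no new garbage is being produced and switch to a different scheme (the “adaptive” part) called mark-and-sweep, which earlier versions of Sun's JVM used all the time. Mark-and-sweep is fairly slow in general use, but fast when there is little or no garbage.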

Mark-and-sweep follows the same logic of starting from the stack and static storage, and tracing through all the references to find live objects. However, each time it finds a live object, that object is marked by setting a flag in it, but the object isn’t collected yet. Only when the marking process is finished does the sweep occur. During the sweep, the dead objects are released. However, no copying happens, so if the collector chooses to compact a fragmented heap, it does so by shuffling objects around.

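Note: mark-and-sweep also starts from the stack and static storage and traces all references to find live objects. Each live object found is marked with a flag, and nothing is collected yet. The sweep happens only after marking is finished, and it releases the dead objects. No copying takes place, so a collector that decides to compact a fragmented heap has to do it by shuffling objects around.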
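The two phases can be sketched on a toy heap. The model below assumes a hypothetical `MarkSweep` class that keeps every allocated `Obj` in a list; the `marked` field plays the role of the flag set during marking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of mark-and-sweep over a list-based "heap".
final class MarkSweep {
    static final class Obj {
        final String name;
        boolean marked; // the flag set during the mark phase
        final List<Obj> refs = new ArrayList<>();

        Obj(String name) {
            this.name = name;
        }
    }

    final List<Obj> heap = new ArrayList<>();

    Obj alloc(String name) {
        Obj o = new Obj(name);
        heap.add(o);
        return o;
    }

    /** Mark phase: flag everything reachable from o. Nothing is freed yet. */
    private void mark(Obj o) {
        if (o.marked) {
            return;
        }
        o.marked = true;
        for (Obj child : o.refs) {
            mark(child);
        }
    }

    /** Sweep phase: release unmarked objects and clear flags for the next cycle. */
    private int sweep() {
        int freed = 0;
        for (Iterator<Obj> it = heap.iterator(); it.hasNext(); ) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false;
            } else {
                it.remove(); // dead object released; survivors are not moved
                freed++;
            }
        }
        return freed;
    }

    /** Runs a full collection and returns the number of objects released. */
    int collect(List<Obj> roots) {
        for (Obj root : roots) {
            mark(root);
        }
        return sweep();
    }

    public static void main(String[] args) {
        MarkSweep gc = new MarkSweep();
        Obj root = gc.alloc("root"), a = gc.alloc("a");
        root.refs.add(a);
        gc.alloc("b"); // unreachable
        Obj c = gc.alloc("c"), d = gc.alloc("d"); // unreachable cycle
        c.refs.add(d);
        d.refs.add(c);

        System.out.println(gc.collect(Arrays.asList(root))); // 3
        System.out.println(gc.heap.size());                  // 2
    }
}
```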

“Stop-and-copy” refers to the idea that this type of garbage collection is not done in the background; instead, the program is stopped while the garbage collection occurs. In the Sun literature you’ll find many references to garbage collection as a low-priority background process, but it turns out that the garbage collection was not implemented that way in earlier versions of the Sun JVM. Instead, the Sun garbage collector stopped the program when memory got low. Mark-and-sweep also requires that the program be stopped.

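Note: stop-and-copy is not done in the background; the program is stopped while collection runs. Sun's literature often describes garbage collection as a low-priority background process, but earlier versions of the Sun JVM did not implement it that way: the collector stopped the program when memory got low. Mark-and-sweep also requires the program to be stopped.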

As previously mentioned, in the JVM described here memory is allocated in big blocks. If you allocate a large object, it gets its own block. Strict stop-and-copy requires copying every live object from the source heap to a new heap before you can free the old one, which translates to lots of memory. With blocks, the garbage collection can typically copy objects to dead blocks as it collects. Each block has a generation count to keep track of whether it’s alive. In the normal case, only the blocks created since the last garbage collection are compacted; all other blocks get their generation count bumped if they have been referenced from somewhere. This handles the normal case of lots of short-lived temporary objects. Periodically, a full sweep is made: large objects are still not copied (they just get their generation count bumped), and blocks containing small objects are copied and compacted. The JVM monitors the efficiency of garbage collection and if it becomes a waste of time because all objects are long-lived, then it switches to mark-and-sweep. Similarly, the JVM keeps track of how successful mark-and-sweep is, and if the heap starts to become fragmented, it switches back to stop-and-copy. This is where the “adaptive” part comes in, so you end up with a mouthful: “Adaptive generational stop-and-copy mark-and-sweep.”

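Note: in this JVM memory is allocated in big blocks, and a large object gets a block of its own. Strict stop-and-copy must copy every live object to a new heap before the old one can be freed, which takes a lot of memory; with blocks, the collector can usually copy objects into dead blocks as it goes. Each block has a generation count recording whether it is alive. Normally only the blocks created since the last collection are compacted, and the other blocks have their count bumped if something references them, which suits the common case of many short-lived temporaries. Periodically a full sweep runs: large objects are still not copied, while blocks of small objects are copied and compacted. If collection becomes a waste of time because all objects are long-lived, the JVM switches to mark-and-sweep; if the heap then becomes fragmented, it switches back to stop-and-copy. Hence the name: adaptive generational stop-and-copy mark-and-sweep.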
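The switching logic might look roughly like the sketch below. The text gives no concrete criteria, so everything here is an assumption: the class `AdaptivePolicy`, the two measurements passed in, and both threshold values are invented purely to show the shape of the decision.

```java
// Hypothetical policy sketch; thresholds and inputs are invented for illustration.
final class AdaptivePolicy {
    enum Mode { STOP_AND_COPY, MARK_AND_SWEEP }

    // Assumed threshold: below this share of garbage, copying is mostly wasted work.
    static final double MIN_GARBAGE_RATIO = 0.10;
    // Assumed threshold: above this fragmentation, compaction by copying pays off again.
    static final double MAX_FRAGMENTATION = 0.30;

    private Mode mode = Mode.STOP_AND_COPY;

    /**
     * Called after each collection with what that collection observed:
     * the fraction of the heap that was garbage and the fraction that is fragmented.
     */
    Mode afterCollection(double garbageRatio, double fragmentation) {
        if (mode == Mode.STOP_AND_COPY && garbageRatio < MIN_GARBAGE_RATIO) {
            mode = Mode.MARK_AND_SWEEP; // objects are long-lived, stop copying them
        } else if (mode == Mode.MARK_AND_SWEEP && fragmentation > MAX_FRAGMENTATION) {
            mode = Mode.STOP_AND_COPY;  // heap is fragmented, compact by copying
        }
        return mode;
    }

    public static void main(String[] args) {
        AdaptivePolicy policy = new AdaptivePolicy();
        System.out.println(policy.afterCollection(0.50, 0.00)); // STOP_AND_COPY
        System.out.println(policy.afterCollection(0.05, 0.00)); // MARK_AND_SWEEP
        System.out.println(policy.afterCollection(0.05, 0.40)); // STOP_AND_COPY
    }
}
```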

There are a number of additional speedups possible in a JVM. An especially important one involves the operation of the loader and what is called a just-in-time (JIT) compiler. A JIT compiler partially or fully converts a program into native machine code so that it doesn’t need to be interpreted by the JVM and thus runs much faster. When a class must be loaded (typically, the first time you want to create an object of that class), the .class file is located, and the bytecodes for that class are brought into memory. At this point, one approach is to simply JIT compile all the code, but this has two drawbacks: It takes a little more time, which, compounded throughout the life of the program, can add up; and it increases the size of the executable (bytecodes are significantly more compact than expanded JIT code), and this might cause paging, which definitely slows down a program. An alternative approach is lazy evaluation, which means that the code is not JIT compiled until necessary. Thus, code that never gets executed might never be JIT compiled. The Java HotSpot technologies in recent JDKs take a similar approach by increasingly optimizing a piece of code each time it is executed, so the more the code is executed, the faster it gets.

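Note: a JVM has other ways to speed things up. An especially important one involves the loader and the just-in-time (JIT) compiler, which converts some or all of a program into native machine code so that it no longer has to be interpreted and therefore runs faster. When a class is loaded, its .class file is located and its bytecodes are brought into memory. One approach is to JIT compile everything at that point, but this has two drawbacks: the extra time adds up over the life of the program, and the larger executable may cause paging, which slows the program down. The alternative is lazy evaluation: code is JIT compiled only when necessary, so code that never runs may never be compiled. The Java HotSpot technologies in recent JDKs take a similar approach, optimizing a piece of code further each time it runs, so the more it runs, the faster it gets.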
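Lazy compilation can be pictured as a per-method counter. The simulation below is only a sketch: `LazyJitSketch`, the string results, and the threshold of 3 are assumptions made for the illustration and say nothing about how HotSpot actually decides what to compile.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation of "compile only when needed"; not HotSpot's real policy.
final class LazyJitSketch {
    static final int COMPILE_THRESHOLD = 3; // assumed value, chosen for the demo

    private final Map<String, Integer> invocations = new HashMap<>();
    private final Set<String> compiled = new HashSet<>();

    /** Simulates one call and reports how that call was executed. */
    String invoke(String method) {
        if (compiled.contains(method)) {
            return "native";
        }
        int count = invocations.merge(method, 1, Integer::sum);
        if (count >= COMPILE_THRESHOLD) {
            compiled.add(method); // hot enough: later calls run compiled code
        }
        return "interpreted";
    }

    boolean isCompiled(String method) {
        return compiled.contains(method);
    }

    public static void main(String[] args) {
        LazyJitSketch jvm = new LazyJitSketch();
        for (int i = 0; i < 4; i++) {
            System.out.println(jvm.invoke("hot")); // interpreted x3, then native
        }
        System.out.println(jvm.isCompiled("cold")); // false: never run, never compiled
    }
}
```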

I did not fully understand the last paragraph.