This post is Part 8 in the series of posts on Garbage Collection (GC). Please see the index here.
One of the primary disadvantages discussed in the post on mark-sweep garbage collection is that it introduces very long system pauses when the entire heap is marked and swept. One of the primary optimizations used to solve this issue is generational garbage collection. This optimization is based on the following observations:
- Most objects die young
- Over 90% of the garbage collected in a GC cycle was created after the previous GC cycle
- If an object survives a GC cycle, the chance of it becoming garbage in the short term is low, so the GC wastes time marking it again and again in each cycle
The optimization based on the above observations is to segregate objects by age into multiple generations and collect each generation at a different frequency.
This scheme has proven to work rather well and is widely used in many modern systems (including .NET).
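Understanding: Mark-sweep garbage collection causes long system pauses when the whole heap has to be marked and swept. One solution is generational garbage collection. This optimization rests on the observations that most objects die young and that an object which survives one GC is unlikely to be reclaimed in the short term, so the GC spends a lot of time marking such objects to no purpose in every cycle. Based on these observations, objects are segregated into generations by age and each generation is collected separately.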
Detailed algorithm
Objects can be segregated into age-based generations in different ways, e.g. by time of creation. However, one common way is to consider a newly created object to be in Generation 0 (Gen0); if it is not collected by a cycle of garbage collection, it is promoted to the next higher generation, Gen1. Similarly, if an object in Gen1 survives a GC, it is promoted to Gen2.
Lower generations are collected more often, which keeps system pauses short. Higher generations are collected less often.
How many generations are employed varies from system to system. In .NET, three generations are used. Here, for simplicity, we will consider a two-generation system, but the concepts are easily extended to more than two.
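Understanding: Objects can be divided into generations, for example by creation time. Usually a newly created object is placed in Gen0; if it is not reclaimed by a GC cycle, it is moved to Gen1. Similarly, objects in Gen1 can be promoted to Gen2. Younger generations are collected more frequently and older generations less often. How many generations are used differs from system to system. .NET uses three generations; to keep the discussion simple, we use two.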
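The promotion rule above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the names `Obj`, `MAX_GEN` and `promote_survivors` are made up for this post, and a real collector would move the object in memory rather than update a field.

```python
from dataclasses import dataclass

MAX_GEN = 2  # three generations (Gen0..Gen2), as in .NET


@dataclass
class Obj:
    name: str
    gen: int = 0  # every new object starts in Gen0


def promote_survivors(survivors):
    """Move each object that survived a collection up one generation."""
    for obj in survivors:
        # The oldest generation has nowhere further to go.
        obj.gen = min(obj.gen + 1, MAX_GEN)


a = Obj("a")
promote_survivors([a])        # a survives its first GC
gen_after_first = a.gen
promote_survivors([a])
promote_survivors([a])        # later survivals cannot go past Gen2
gen_after_third = a.gen
print(gen_after_first, gen_after_third)
```

With two generations, as used in the rest of this post, `MAX_GEN` would simply be 1.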
Let us assume that memory is divided into two contiguous blocks, one for Gen1 and the other for Gen0. At the start, memory is allocated only from the Gen0 area, as follows:
So we have 4 objects in Gen0. Now one of the references is released.
If a GC is triggered now, it will use mark and sweep on the Gen0 objects and clean up the two objects that are no longer reachable. The final state after cleaning up is:
The two surviving objects are then promoted to Gen1. Promotion includes copying the two objects to the Gen1 area and then updating the references to them.
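Understanding: Assume memory is divided into two contiguous blocks, one holding Gen0 objects and the other holding Gen1 objects. Figure (1): two references and four objects. Figure (2): one reference is released. Figure (3): mark-sweep reclaims the two objects that are no longer reachable. Figure (4): the two surviving objects are promoted to Gen1 and the references to them are updated.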
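The walkthrough above can be replayed with a small Python sketch. Everything here is an assumption made for illustration: the names `Obj` and `collect_gen0`, and the object graph itself (`a` references `b`, `c` references `d`), which is one shape that matches "two references, four objects". Python objects do not move, so promotion is modeled by changing a generation tag instead of copying the object and patching references. The sketch also ignores references from Gen1 into Gen0, which is safe here only because Gen1 starts out empty.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.gen = 0      # new objects start in Gen0
        self.refs = []    # outgoing references to other objects


def collect_gen0(roots, gen0, gen1):
    """Mark-sweep Gen0 starting from the roots, then promote the survivors."""
    marked = set()
    stack = [r for r in roots if r.gen == 0]
    while stack:                              # mark phase
        obj = stack.pop()
        if id(obj) in marked:
            continue
        marked.add(id(obj))
        stack.extend(child for child in obj.refs if child.gen == 0)
    survivors = [o for o in gen0 if id(o) in marked]
    for obj in survivors:                     # promotion (modeled as a tag change)
        obj.gen = 1
        gen1.append(obj)
    gen0.clear()                              # sweep: unmarked objects are dropped
    return survivors


a, b, c, d = (Obj(n) for n in "abcd")
a.refs.append(b)
c.refs.append(d)
gen0, gen1 = [a, b, c, d], []
roots = [a, c]

roots.remove(c)                   # one of the references is released
collect_gen0(roots, gen0, gen1)
print([o.name for o in gen1])     # the two survivors now live in Gen1
```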
Now assume that many allocations and de-allocations have happened. Since new allocations go into Gen0, the memory layout looks like this:
The whole purpose of segregating objects into generations is to reduce the number of objects inspected during marking. The first root is followed during marking because it points to a Gen0 object. With the second root, the moment the marker sees that the reference points into Gen1 it stops following it, which speeds up marking.
Now if we consider only the Gen0 objects for marking, we mark only the objects indicated by ✓. The marking algorithm fails to find the references from Gen1 to Gen0 (shown in red), so some live Gen0 objects are left unmarked; sweeping them would leave dangling pointers.
One way to handle this is to record all references from Gen1 to Gen0 (how to do that is covered in the next section) and then use the Gen1 objects holding them as new roots for the marking phase. With this method we get a new set of marked objects, as follows:
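Understanding: Figure (5) shows the memory layout after a series of allocations and de-allocations. The purpose of segregating objects is to reduce the number of objects that need to be marked. The object that the first root points to is marked directly because it is in Gen0. The second root points to an object in Gen1, so the marker does not follow it, which speeds up marking. If only Gen0 objects are considered, that is, only the objects with a check mark are marked, the algorithm misses the Gen0 objects referenced from Gen1 (red arrows), which leads to dangling pointers (held by the purple objects in Gen1). One solution is to record every reference from Gen1 into Gen0 and treat the Gen1 objects holding them as new roots.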
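A minimal sketch of the problem and the fix, again with made-up names (`Obj`, `mark_gen0`, `remembered`) and an object graph of our own choosing. The `remembered` argument stands for the recorded Gen1 objects that hold references into Gen0; how that record is maintained is the topic of the next section.

```python
class Obj:
    def __init__(self, name, gen):
        self.name = name          # names are assumed unique in this sketch
        self.gen = gen
        self.refs = []


def mark_gen0(roots, remembered=()):
    """Return the names of the Gen0 objects found without tracing through Gen1."""
    stack = [r for r in roots if r.gen == 0]     # roots into Gen1 are not followed
    for old in remembered:                       # recorded Gen1 objects act as extra roots
        stack.extend(child for child in old.refs if child.gen == 0)
    marked = set()
    while stack:
        obj = stack.pop()
        if obj.name in marked:
            continue
        marked.add(obj.name)
        stack.extend(child for child in obj.refs if child.gen == 0)
    return marked


young_a = Obj("young_a", gen=0)
young_b = Obj("young_b", gen=0)
young_c = Obj("young_c", gen=0)
old = Obj("old", gen=1)
old.refs.append(young_b)          # a Gen1 -> Gen0 reference (a "red arrow")
young_b.refs.append(young_c)
roots = [young_a, old]

without_record = mark_gen0(roots)                  # young_b and young_c are missed
with_record = mark_gen0(roots, remembered=[old])   # all three Gen0 objects are found
print(sorted(without_record), sorted(with_record))
```

Sweeping after the first call would free `young_b` while `old` still points at it, which is exactly the dangling pointer described above.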
This now gives the full set of marked objects. After another GC and promotion of the surviving objects to the higher generation, we get:
At this point the next cycle as above resumes…
Tracking higher to lower generation references
In typical applications there are very few references of this type (some studies show < 1% of all references). However, they all need to be recorded. There are two general approaches to doing this.
Write barrier + card-table
First a table called a card table is created. This is essentially an array of bits. Each bit indicates if a given range of memory is dirty (contains a write to a lower generation object). E.g. we can use a single bit to mark a 4KB block.
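Understanding: First a card table is created, which is simply an array of bits. Each bit indicates whether a given region of memory is dirty. For example, a single bit can mark whether a 4KB block of memory is dirty.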
Whenever a reference assignment is made in user code, instead of directly doing the assignment it is redirected to a small thunk (in the case of .NET, the JITter does this). The thunk compares the assignee's address to the Gen1 memory range. If the address falls within that range, the thunk updates the corresponding bit in the card table to indicate that the range which the bit covers is now dirty (shown as red).
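Understanding: When user code assigns a reference, the assignment is not performed directly but is redirected to a thunk (this is what happens in .NET). The thunk compares the address of the assignee with the Gen1 address range. If it lies inside the Gen1 range, the thunk updates the corresponding bit to indicate that the memory block it covers is dirty (the red part).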
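The thunk can be sketched as follows. This is a simulation under stated assumptions, not how .NET implements it: the address range `GEN1_START`..`GEN1_END`, the `heap` dictionary standing in for memory, and the names `write_ref` and `is_dirty` are all hypothetical. As in the description above, the barrier only checks where the written slot lives, not what it points to.

```python
CARD_SIZE = 4096                            # one bit covers a 4KB block, as in the text
GEN1_START, GEN1_END = 0x10000, 0x20000     # hypothetical Gen1 address range (16 cards)

# One bit per card, packed eight to a byte.
card_table = bytearray((GEN1_END - GEN1_START) // CARD_SIZE // 8)
heap = {}   # simulated memory: address of a reference slot -> address it points to


def write_ref(slot_addr, target_addr):
    """Write barrier: do the store, then dirty the card if the slot is in Gen1."""
    heap[slot_addr] = target_addr
    if GEN1_START <= slot_addr < GEN1_END:
        card = (slot_addr - GEN1_START) // CARD_SIZE
        card_table[card // 8] |= 1 << (card % 8)


def is_dirty(card):
    return bool(card_table[card // 8] & (1 << (card % 8)))


write_ref(GEN1_START + 5000, 0x30000)   # slot inside Gen1: lands in card 1
write_ref(0x30008, 0x30010)             # slot outside Gen1: no card is touched
print([n for n in range(16) if is_dirty(n)])
```

Every reference store now pays for a range check and, sometimes, a bit update.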
Marking first starts from the roots and covers only Gen0 objects. Once this is over, the GC inspects the card table to locate dirty blocks. It then treats every object in a dirty block as a new root and marks the objects reachable from it.
As you can see, covering 4KB with a single bit is just a trade-off that keeps the card table small. If we make the granularity finer, down to one bit per object, we save marking time because only one object has to be considered (instead of all objects in a 4KB range), but the card table becomes significantly larger.
One downside is that the thunk makes reference assignment slower.
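Understanding: Marking first covers the Gen0 objects. Once that is finished, the card table is checked to locate dirty blocks. Every object inside a dirty block becomes a new root, and the objects reachable from it are marked. The 4KB block size is a trade-off that keeps the card table small. With a finer granularity, down to a single object, marking time is saved because only one object has to be considered instead of everything in a 4KB range, but the table becomes much larger. The thunk also makes reference assignment slower.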
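To make the granularity trade-off concrete, here is a rough sketch with invented numbers. The object addresses, the 64MB Gen1 size and the 16-byte "per object" card size are assumptions picked for the arithmetic, and the helper names `extra_roots` and `card_table_bytes` are ours. For simplicity an object is assigned to the card containing its start address.

```python
GEN1_START = 0x10000   # hypothetical base address of Gen1


def extra_roots(gen1_addresses, dirty_cards, card_size):
    """Every Gen1 object that starts in a dirty card becomes an additional root."""
    return [addr for addr in gen1_addresses
            if (addr - GEN1_START) // card_size in dirty_cards]


def card_table_bytes(gen1_bytes, card_size):
    """Size in bytes of a one-bit-per-card table covering gen1_bytes of memory."""
    cards = -(-gen1_bytes // card_size)    # ceiling division
    return -(-cards // 8)


gen1_addresses = [0x10000, 0x10800, 0x11000, 0x11400, 0x12000]

# Suppose a reference was written into the object at 0x11400.
coarse = extra_roots(gen1_addresses, dirty_cards={1}, card_size=4096)
fine = extra_roots(gen1_addresses, dirty_cards={5}, card_size=1024)

gen1_size = 64 * 1024 * 1024
coarse_table = card_table_bytes(gen1_size, 4096)   # one bit per 4KB block
fine_table = card_table_bytes(gen1_size, 16)       # roughly one bit per small object
print(len(coarse), len(fine), coarse_table, fine_table)
```

With 4KB cards the GC has to rescan two objects for a single write, while the finer cards narrow it to one; the price is a table that grows from 2KB to 512KB for the same 64MB of Gen1.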
HW support
Hardware support also uses a card table, but instead of a thunk it relies on features exposed by the HW+OS for notification of dirty writes. E.g. it can use the Win32 API GetWriteWatch to get the list of pages that were written to and use that information to fill in the card table entries.
However, this kind of support is not available on all platforms (or on older versions of platforms) and hence is less widely used.