golang的垃圾回收算法之九写屏障

最新推荐文章于 2024-08-17 12:25:15 发布

fpcc

最新推荐文章于 2024-08-17 12:25:15 发布

阅读量1.3k

点赞数

分类专栏： Golang 文章标签： GC golang

本文链接：https://blog.csdn.net/fpcc/article/details/126361944

版权

Golang 专栏收录该内容

10 篇文章 4 订阅

订阅专栏

本文深入探讨了Go语言中垃圾回收（GC）的写屏障机制，包括强三色不变和弱三色不变的概念，以及插入屏障和删除屏障的作用。在Go 1.8中引入的混合写屏障机制旨在确保并发GC的安全性和效率。文章通过源码分析展示了写屏障如何在内存管理中确保数据一致性，并防止误GC。同时，文中提到了写屏障在栈和全局变量中的应用，以及对内存顺序的考虑。

摘要由CSDN通过智能技术生成

说明

为了方便阅读把写屏障，先写出来，然后再接着写垃圾回收的清理等操作。

一、写屏障

在前面提到了，在标记的过程中是要插入写屏障的。在三色标记算法中，由于支持并行，就需要保证标记过程的安全性。标记有两大类强弱三色不变。应用在具体的场景有插入屏障和删除屏障。在前面的增量式GC中提到过，三色标记算法如果没有写屏障很容易导致误GC，安全性无法得到保障。那么如何才能保障GC的安全也就是正确的标记相关的内存对象呢？其实就是启动屏障机制。
写屏障其实就是一种机制，它能够在各种环境下保证数据的一致性。这里首先说明一下强弱三色不变：
1、强三色
强三色就是没有黑色对象引用白色对象的指针。结合前面增量垃圾回收中的说明可以知道，这样做的目的是为了解决内存对象的遗漏问题。防止被引用的白色对象被误清理。

2、弱三色
弱三色指被黑色对象引用的白色对象都处于灰色保护状。

根据上述的两种三色标记，Golang引入了插入屏障和删除屏障两种机制：
1、插入屏障
使用强三色标记，A引用的对象必须指定为灰色。

2、删除屏障
使用弱三色标记，内存对象被删除时，如果自身为灰色或者白色，则标记为灰色。这样可以保证被删除费用的对象如果还有其它引用，就不会被误清理。

在Golang1.8中的混合写屏障机制：
在实际应用中可以看到，插入写屏障结束时需要STW来重新扫描栈，标记栈上引用的白色对象的存活；删除写屏障回收精度低，GC开始时STW扫描堆栈来记录初始快照，这个过程会保护开始时刻的所有存活对象。换句话说，这个对STW是一种强需求，那么可不可以有一种更优的机制呢？在golang1.8中便引入了混合写屏障机制：
1、GC开始将栈上的对象全部扫描并标记为黑色(不再二次扫描，不用STW)，
2、GC期间，栈上创建的新对象设置为黑色。
3、被删除的对象标记为灰色。
4、被添加的对象标记为灰色。
这是一种符合弱三色不变标记的屏障

** 上述的机制在网上和书上有的是相关介绍，这里就不再拾人牙慧，重复了。建议大家在网上搜索相关即有大把的文章，或者找一些相关的GC算法书籍，也有很多介绍。

二、源码分析

下面就来分析一下相关的源码，在前面的分析代码中提到，在gcStart中，有一个启动写屏障：

//go:nosplit
func setGCPhase(x uint32) {
	atomic.Store(&gcphase, x)
	writeBarrier.needed = gcphase == _GCmark || gcphase == _GCmarktermination
	writeBarrier.enabled = writeBarrier.needed || writeBarrier.cgo
}

mstkbar.go和mbarrier.go这两个文件中是对写屏障的基本的汇集文件。
golang 内部是通过一个队列+掩码位图来实现三色状态的，并没有大家想象的有三个标志属性即：
白色对象：对象所在 span 的 gcmarkBits 中对应的 bit 为 0，不在队列；
灰色对象：对象所在 span 的 gcmarkBits 中对应的 bit 为 1，且对象在扫描队列中；
黑色对象：对象所在 span 的 gcmarkBits 中对应的 bit 为 1，且对象已经从扫描队列中处理并摘除掉。
在前面分析过堆上的标记过程，后面会分析栈上的扫描过程，这其中都用到了写屏障：

// cgoCheckWriteBarrier is called whenever a pointer is stored into memory.
// It throws if the program is storing a Go pointer into non-Go memory.
//go:nosplit
//go:nowritebarrier
func cgoCheckWriteBarrier(dst *uintptr, src uintptr) {
	if !cgoIsGoPointer(unsafe.Pointer(src)) {
		return
	}
	if cgoIsGoPointer(unsafe.Pointer(dst)) {
		return
	}

	// If we are running on the system stack then dst might be an
	// address on the stack, which is OK.
	g := getg()
	if g == g.m.g0 || g == g.m.gsignal {
		return
	}

	// Allocating memory can write to various mfixalloc structs
	// that look like they are non-Go memory.
	if g.m.mallocing != 0 {
		return
	}

	systemstack(func() {
		println("write of Go pointer", hex(src), "to non-Go memory", hex(uintptr(unsafe.Pointer(dst))))
		throw(cgoWriteBarrierFail)
	})
}

// cgoCheckMemmove is called when moving a block of memory.
// dst and src point off bytes into the value to copy.
// size is the number of bytes to copy.
// It throws if the program is copying a block that contains a Go pointer
// into non-Go memory.
//go:nosplit
//go:nowritebarrier
func cgoCheckMemmove(typ *_type, dst, src unsafe.Pointer, off, size uintptr) {
	if typ.kind&kindNoPointers != 0 {
		return
	}
	if !cgoIsGoPointer(src) {
		return
	}
	if cgoIsGoPointer(dst) {
		return
	}
	cgoCheckTypedBlock(typ, src, off, size)
}

// cgoCheckSliceCopy is called when copying n elements of a slice from
// src to dst.  typ is the element type of the slice.
// It throws if the program is copying slice elements that contain Go pointers
// into non-Go memory.
//go:nosplit
//go:nowritebarrier
func cgoCheckSliceCopy(typ *_type, dst, src slice, n int) {
	if typ.kind&kindNoPointers != 0 {
		return
	}
	if !cgoIsGoPointer(src.array) {
		return
	}
	if cgoIsGoPointer(dst.array) {
		return
	}
	p := src.array
	for i := 0; i < n; i++ {
		cgoCheckTypedBlock(typ, p, 0, typ.size)
		p = add(p, typ.size)
	}
}

// cgoCheckTypedBlock checks the block of memory at src, for up to size bytes,
// and throws if it finds a Go pointer. The type of the memory is typ,
// and src is off bytes into that type.
//go:nosplit
//go:nowritebarrier
func cgoCheckTypedBlock(typ *_type, src unsafe.Pointer, off, size uintptr) {
	// Anything past typ.ptrdata is not a pointer.
	if typ.ptrdata <= off {
		return
	}
	if ptrdataSize := typ.ptrdata - off; size > ptrdataSize {
		size = ptrdataSize
	}

	if typ.kind&kindGCProg == 0 {
		cgoCheckBits(src, typ.gcdata, off, size)
		return
	}

	// The type has a GC program. Try to find GC bits somewhere else.
	for _, datap := range activeModules() {
		if cgoInRange(src, datap.data, datap.edata) {
			doff := uintptr(src) - datap.data
			cgoCheckBits(add(src, -doff), datap.gcdatamask.bytedata, off+doff, size)
			return
		}
		if cgoInRange(src, datap.bss, datap.ebss) {
			boff := uintptr(src) - datap.bss
			cgoCheckBits(add(src, -boff), datap.gcbssmask.bytedata, off+boff, size)
			return
		}
	}

	aoff := uintptr(src) - mheap_.arena_start
	idx := aoff >> _PageShift
	s := mheap_.spans[idx]
	if s.state == _MSpanStack {
		// There are no heap bits for value stored on the stack.
		// For a channel receive src might be on the stack of some
		// other goroutine, so we can't unwind the stack even if
		// we wanted to.
		// We can't expand the GC program without extra storage
		// space we can't easily get.
		// Fortunately we have the type information.
		systemstack(func() {
			cgoCheckUsingType(typ, src, off, size)
		})
		return
	}

	// src must be in the regular heap.

	hbits := heapBitsForAddr(uintptr(src))
	for i := uintptr(0); i < off+size; i += sys.PtrSize {
		bits := hbits.bits()
		if i >= off && bits&bitPointer != 0 {
			v := *(*unsafe.Pointer)(add(src, i))
			if cgoIsGoPointer(v) {
				systemstack(func() {
					throw(cgoWriteBarrierFail)
				})
			}
		}
		hbits = hbits.next()
	}
}

// cgoCheckBits checks the block of memory at src, for up to size
// bytes, and throws if it finds a Go pointer. The gcbits mark each
// pointer value. The src pointer is off bytes into the gcbits.
//go:nosplit
//go:nowritebarrier
func cgoCheckBits(src unsafe.Pointer, gcbits *byte, off, size uintptr) {
	skipMask := off / sys.PtrSize / 8
	skipBytes := skipMask * sys.PtrSize * 8
	ptrmask := addb(gcbits, skipMask)
	src = add(src, skipBytes)
	off -= skipBytes
	size += off
	var bits uint32
	for i := uintptr(0); i < size; i += sys.PtrSize {
		if i&(sys.PtrSize*8-1) == 0 {
			bits = uint32(*ptrmask)
			ptrmask = addb(ptrmask, 1)
		} else {
			bits >>= 1
		}
		if off > 0 {
			off -= sys.PtrSize
		} else {
			if bits&1 != 0 {
				v := *(*unsafe.Pointer)(add(src, i))
				if cgoIsGoPointer(v) {
					systemstack(func() {
						throw(cgoWriteBarrierFail)
					})
				}
			}
		}
	}
}

// cgoCheckUsingType is like cgoCheckTypedBlock, but is a last ditch
// fall back to look for pointers in src using the type information.
// We only use this when looking at a value on the stack when the type
// uses a GC program, because otherwise it's more efficient to use the
// GC bits. This is called on the system stack.
//go:nowritebarrier
//go:systemstack
func cgoCheckUsingType(typ *_type, src unsafe.Pointer, off, size uintptr) {
	if typ.kind&kindNoPointers != 0 {
		return
	}

	// Anything past typ.ptrdata is not a pointer.
	if typ.ptrdata <= off {
		return
	}
	if ptrdataSize := typ.ptrdata - off; size > ptrdataSize {
		size = ptrdataSize
	}

	if typ.kind&kindGCProg == 0 {
		cgoCheckBits(src, typ.gcdata, off, size)
		return
	}
	switch typ.kind & kindMask {
	default:
		throw("can't happen")
	case kindArray:
		at := (*arraytype)(unsafe.Pointer(typ))
		for i := uintptr(0); i < at.len; i++ {
			if off < at.elem.size {
				cgoCheckUsingType(at.elem, src, off, size)
			}
			src = add(src, at.elem.size)
			skipped := off
			if skipped > at.elem.size {
				skipped = at.elem.size
			}
			checked := at.elem.size - skipped
			off -= skipped
			if size <= checked {
				return
			}
			size -= checked
		}
	case kindStruct:
		st := (*structtype)(unsafe.Pointer(typ))
		for _, f := range st.fields {
			if off < f.typ.size {
				cgoCheckUsingType(f.typ, src, off, size)
			}
			src = add(src, f.typ.size)
			skipped := off
			if skipped > f.typ.size {
				skipped = f.typ.size
			}
			checked := f.typ.size - skipped
			off -= skipped
			if size <= checked {
				return
			}
			size -= checked
		}
	}
}

需要注意的是，写屏障是在编译期确定的，之后就不再动了。这就涉及到一个问题即“内存逃逸”，这个在前面分析过，也就是说，只有确定的了内存分配到底在堆上还是栈上（堆才有写屏障），才能正确的应用写屏障。有兴趣可以看看前面的相关文章。
也就是说，上面写代码不是在函数调用时产生的，所以需要看一下相关的汇编代码：

// void gogo(Gobuf*)
// restore state from Gobuf; longjmp
TEXT runtime·gogo(SB), NOSPLIT, $16-8
	MOVQ	buf+0(FP), BX		// gobuf

	// If ctxt is not nil, invoke deletion barrier before overwriting.
	MOVQ	gobuf_ctxt(BX), AX
	TESTQ	AX, AX
	JZ	nilctxt
	LEAQ	gobuf_ctxt(BX), AX
	MOVQ	AX, 0(SP)
	MOVQ	$0, 8(SP)
	CALL	runtime·writebarrierptr_prewrite(SB)
	MOVQ	buf+0(FP), BX
......
// void gogo(Gobuf*)
// restore state from Gobuf; longjmp
TEXT runtime·gogo(SB), NOSPLIT, $16-8
	MOVQ	buf+0(FP), BX		// gobuf

	// If ctxt is not nil, invoke deletion barrier before overwriting.
	MOVQ	gobuf_ctxt(BX), AX
	TESTQ	AX, AX
	JZ	nilctxt
	LEAQ	gobuf_ctxt(BX), AX
	MOVQ	AX, 0(SP)
	MOVQ	$0, 8(SP)
	CALL	runtime·writebarrierptr_prewrite(SB)
	MOVQ	buf+0(FP), BX

可以看到，在这些汇编文件中调用了相关的写屏障的代码。这里最重要的其实是第一个函数gcmarkwb_m:

func gcmarkwb_m(slot *uintptr, ptr uintptr) {
	if writeBarrier.needed {
		// Note: This turns bad pointer writes into bad
		// pointer reads, which could be confusing. We avoid
		// reading from obviously bad pointers, which should
		// take care of the vast majority of these. We could
		// patch this up in the signal handler, or use XCHG to
		// combine the read and the write. Checking inheap is
		// insufficient since we need to track changes to
		// roots outside the heap.
		if slot1 := uintptr(unsafe.Pointer(slot)); slot1 >= minPhysPageSize {
			if optr := *slot; optr != 0 {
				shade(optr)
			}
		}
		// TODO: Make this conditional on the caller's stack color.
		if ptr != 0 && inheap(ptr) {
			shade(ptr)
		}
	}
}

在这个文件的开头的说明中，对此函数进行了简化：
gcmarkwb_m is the mark-phase write barrier, the only barrier we have.The rest of this file exists only to make calls to this function. This is a hybrid barrier that combines a Yuasa-style deletion barrier—which shades the object whose reference is being overwritten—with Dijkstra insertion barrier—which shades the object whose reference is being written. The insertion part of the barrier is necessary while the calling goroutine’s stack is grey. In pseudocode, the barrier is:

writePointer(slot, ptr):
shade(*slot)
if current stack is grey:
shade(ptr)
*slot = ptr
slot is the destination in Go code. ptr is the value that goes into the slot in Go code.

整个开头文件的说明还是非常清晰的，英文好的可以直接看，不好的去Google翻译上看。不要想当然，这是最可怕的。
slot is the destination in Go code.
ptr is the value that goes into the slot in Go code.

Shade indicates that it has seen a white pointer by adding the referent
to wbuf as well as marking it.

The two shades and the condition work together to prevent a mutator
from hiding an object from the garbage collector:

shade(*slot) prevents a mutator from hiding an object by moving
the sole pointer to it from the heap to its stack. If it attempts
to unlink an object from the heap, this will shade it.
shade(ptr) prevents a mutator from hiding an object by moving
the sole pointer to it from its stack into a black object in the
heap. If it attempts to install the pointer into a black object,
this will shade it.
Once a goroutine’s stack is black, the shade(ptr) becomes
unnecessary. shade(ptr) prevents hiding an object by moving it from
the stack to the heap, but this requires first having a pointer
hidden on the stack. Immediately after a stack is scanned, it only
points to shaded objects, so it’s not hiding anything, and the
shade(*slot) prevents it from hiding any other pointers on its
stack.

For a detailed description of this barrier and proof of
correctness, see https://github.com/golang/proposal/blob/master/design/17503-eliminate-rescan.md
Dealing with memory ordering:

Both the Yuasa and Dijkstra barriers can be made conditional on the
color of the object containing the slot. We chose not to make these
conditional because the cost of ensuring that the object holding
the slot doesn’t concurrently change color without the mutator
noticing seems prohibitive.

Consider the following example where the mutator writes into
a slot and then loads the slot’s mark bit while the GC thread
writes to the slot’s mark bit and then as part of scanning reads
the slot.

Initially both [slot] and [slotmark] are 0 (nil)
Mutator thread GC thread
st [slot], ptr st [slotmark], 1

ld r1, [slotmark] ld r2, [slot]

Without an expensive memory barrier between the st and the ld, the final
result on most HW (including 386/amd64) can be r1r20. This is a classic
example of what can happen when loads are allowed to be reordered with older
stores (avoiding such reorderings lies at the heart of the classic
Peterson/Dekker algorithms for mutual exclusion). Rather than require memory
barriers, which will slow down both the mutator and the GC, we always grey
the ptr object regardless of the slot’s color.
//
Another place where we intentionally omit memory barriers is when
accessing mheap_.arena_used to check if a pointer points into the
heap. On relaxed memory machines, it’s possible for a mutator to
extend the size of the heap by updating arena_used, allocate an
object from this new region, and publish a pointer to that object,
but for tracing running on another processor to observe the pointer
but use the old value of arena_used. In this case, tracing will not
mark the object, even though it’s reachable. However, the mutator
is guaranteed to execute a write barrier when it publishes the
pointer, so it will take care of marking the object. A general
consequence of this is that the garbage collector may cache the
value of mheap_.arena_used. (See issue #9984.)

Stack writes:
The compiler omits write barriers for writes to the current frame,
but if a stack pointer has been passed down the call stack, the
compiler will generate a write barrier for writes through that
pointer (because it doesn’t know it’s not a heap pointer).

One might be tempted to ignore the write barrier if slot points
into to the stack. Don’t do it! Mark termination only re-scans
frames that have potentially been active since the concurrent scan,
so it depends on write barriers to track changes to pointers in
stack frames that have not been active.

Global writes:

The Go garbage collector requires write barriers when heap pointers
are stored in globals. Many garbage collectors ignore writes to
globals and instead pick up global -> heap pointers during
termination. This increases pause time, so we instead rely on write
barriers for writes to globals so that we don’t have to rescan
global during mark termination.

Publication ordering:

三、总结

所谓写屏障其实就类似于多线程中的内存保护，保证每个线程访问相关的内存都是安全。而不能每个线程想删除就删除想更改就更改。并行和并发在大多数情况下确实是能提高效率的，不然也不会朝着这个方向发展。但一个事务的出现，不光有好的一面，必然也有负面的一面。并行和并发的引入，复杂性和不易维护、难于调试就是负面的东西。但人总不能因噎废食，强化好的一方面，尽量减少不好的一方面，这才是每个程序员应该考虑的。
看看Golang代码中很多的Debug部分，就明白，写一个程序真得需要花精气神的。