A Brief Analysis of How sync.Pool Works in Go 1.19.3

metabit

Last modified 2024-01-24 18:59:45

First published 2023-01-20 16:47:27

Article link: https://blog.csdn.net/dawnto/article/details/128741756

sync.Pool

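sync.Pool is a concurrency-safe object pool that must not be copied after first use. Frequently creating temporary objects of the same kind puts pressure on memory and on the Go GC; reusing objects relieves that pressure, and pooling is the usual way to do it: an object that is no longer needed goes back into the pool until someone asks for one again. Objects in a sync.Pool do not live forever. Each GC cycle moves the pool's contents into a victim cache and drops the previous victim cache, so an object that nobody picks up is released after at most two GC cycles.
The implementation of sync.Pool is fairly involved and closely tied to the Go runtime. Internally it uses a lock-free ring buffer organized as a double-ended queue: the head supports both push and pop, while the tail supports only pop. In producer/consumer terms there is a single producer and there may be many consumers; the producer pushes and pops at one end, and the consumers can only pop at the other end.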
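
Before reading the source, it may help to see the typical calling pattern. The following is a minimal sketch, assuming a package-level pool of *bytes.Buffer; the names bufPool and greet are made up for illustration and do not come from the standard library.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool is a hypothetical package-level pool of *bytes.Buffer values.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// greet builds a greeting in a pooled buffer and returns it as a string.
func greet(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // a reused buffer may still hold old contents
	defer bufPool.Put(buf) // hand the buffer back when we are done

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies the bytes, so the result outlives Put
}

func main() {
	fmt.Println(greet("gopher"))
}
```

Resetting the buffer right after Get, rather than trusting whoever called Put, keeps the example correct no matter what state a recycled buffer is in.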

// A Pool is a set of temporary objects that may be individually saved and
// retrieved.
//
// Any item stored in the Pool may be removed automatically at any time without
// notification. If the Pool holds the only reference when this happens, the
// item might be deallocated.
//
// A Pool is safe for use by multiple goroutines simultaneously.
//
// Pool's purpose is to cache allocated but unused items for later reuse,
// relieving pressure on the garbage collector. That is, it makes it easy to
// build efficient, thread-safe free lists. However, it is not suitable for all
// free lists.
//
// An appropriate use of a Pool is to manage a group of temporary items
// silently shared among and potentially reused by concurrent independent
// clients of a package. Pool provides a way to amortize allocation overhead
// across many clients.
//
// An example of good use of a Pool is in the fmt package, which maintains a
// dynamically-sized store of temporary output buffers. The store scales under
// load (when many goroutines are actively printing) and shrinks when
// quiescent.
//
// On the other hand, a free list maintained as part of a short-lived object is
// not a suitable use for a Pool, since the overhead does not amortize well in
// that scenario. It is more efficient to have such objects implement their own
// free list.
//
// A Pool must not be copied after first use.
//
// In the terminology of the Go memory model, a call to Put(x) “synchronizes before”
// a call to Get returning that same value x.
// Similarly, a call to New returning x “synchronizes before”
// a call to Get returning that same value x.

Pool's underlying structure

type Pool struct {
	noCopy noCopy

	local     unsafe.Pointer // local fixed-size per-P pool, actual type is [P]poolLocal
	localSize uintptr        // size of the local array

	victim     unsafe.Pointer // local from previous cycle
	victimSize uintptr        // size of victims array

	// New optionally specifies a function to generate
	// a value when Get would otherwise return nil.
	// It may not be changed concurrently with calls to Get.
	New func() any
}

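noCopy: a marker that lets go vet's copylocks check report copies of a Pool; it has no effect at run time
local: pointer to the underlying per-P storage (the first element of a [P]poolLocal array)
localSize: length of the local array
victim: takes over local when a GC cycle starts
victimSize: takes over localSize when a GC cycle starts
New: optional function that Get calls to create an object when the pool has none; if New is nil, Get returns nil in that case.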

Underlying storage structures

// Local per-P Pool appendix.
type poolLocalInternal struct {
	private any       // Can be used only by the respective P, so no locking is needed.
	shared  poolChain // Local P can pushHead/popHead; any P can popTail.
}

type poolLocal struct {
	poolLocalInternal
	
	// Pad the struct out to a multiple of the cache line size.
	// Prevents false sharing on widespread platforms with
	// 128 mod (cache line size) = 0 .
	pad [128 - unsafe.Sizeof(poolLocalInternal{})%128]byte
}

Race detector support

Instrumentation code used by the race detector

// from runtime
func fastrandn(n uint32) uint32

var poolRaceHash [128]uint64

// poolRaceAddr returns an address to use as the synchronization point
// for race detector logic. We don't use the actual pointer stored in x
// directly, for fear of conflicting with other synchronization on that address.
// Instead, we hash the pointer to get an index into poolRaceHash.
// See discussion on golang.org/cl/31589.
func poolRaceAddr(x any) unsafe.Pointer {
	ptr := uintptr((*[2]unsafe.Pointer)(unsafe.Pointer(&x))[1])
	h := uint32((uint64(uint32(ptr)) * 0x85ebca6b) >> 16)
	return unsafe.Pointer(&poolRaceHash[h%uint32(len(poolRaceHash))])
}

Pool.Put adds object x to the pool

// Put adds x to the pool.
func (p *Pool) Put(x any) {
	if x == nil { // a nil object is ignored
		return
	}
	if race.Enabled { // race detector support
		if fastrandn(4) == 0 {
			// Randomly drop x on floor.
			return
		}
		race.ReleaseMerge(poolRaceAddr(x))
		race.Disable()
	}
	l, _ := p.pin()       // pin the current G to its P and disable preemption; returns the P's *poolLocal and the P's id
	if l.private == nil { // the private slot is empty: store x there
		l.private = x
	} else {
		l.shared.pushHead(x) // otherwise push x onto the head of this P's shared chain
	}
	runtime_procUnpin() // re-enable preemption
	if race.Enabled {
		race.Enable()
	}
}

Pool.pin returns the *poolLocal of the P that the current G is pinned to, together with that P's id

// pin pins the current goroutine to P, disables preemption and
// returns poolLocal pool for the P and the P's id.
// Caller must call runtime_procUnpin() when done with the pool.
func (p *Pool) pin() (*poolLocal, int) {
	pid := runtime_procPin() // pin the current G to its P and disable preemption
	// In pinSlow we store to local and then to localSize, here we load in opposite order.
	// Since we've disabled preemption, GC cannot happen in between.
	// Thus here we must observe local at least as large localSize.
	// We can observe a newer/larger local, it is fine (we must observe its zero-initialized-ness).
	s := runtime_LoadAcquintptr(&p.localSize) // load-acquire
	l := p.local                              // load-consume
	if uintptr(pid) < s {
		return indexLocal(l, pid), pid // fast path: this P's *poolLocal and the P's id
	}
	return p.pinSlow() // no poolLocal for this P yet (first use, after a GC, or GOMAXPROCS grew): take the slow path
}

func (p *Pool) pinSlow() (*poolLocal, int) {
	// Retry under the mutex.
	// Can not lock the mutex while pinned.
	runtime_procUnpin()       // re-enable preemption before taking the lock
	allPoolsMu.Lock()         // lock allPools
	defer allPoolsMu.Unlock() // unlock when pinSlow returns
	pid := runtime_procPin()  // pin again
	// poolCleanup won't be called while we are pinned.
	s := p.localSize
	l := p.local
	if uintptr(pid) < s { // pid is within the bounds of local: return its poolLocal
		return indexLocal(l, pid), pid
	}
	if p.local == nil { // local is empty (first use since the last cleanup): register the pool in allPools
		allPools = append(allPools, p)
	}
	// If GOMAXPROCS changes between GCs, we re-allocate the array and lose the old one.
	size := runtime.GOMAXPROCS(0)                            // current number of Ps, which is not necessarily the logical CPU count
	local := make([]poolLocal, size)                         // allocate one poolLocal per P
	atomic.StorePointer(&p.local, unsafe.Pointer(&local[0])) // publish p.local, store-release
	runtime_StoreReluintptr(&p.localSize, uintptr(size))     // publish p.localSize, store-release
	return &local[pid], pid
}

indexLocal returns a pointer to the i-th poolLocal element of the array that l points to

func indexLocal(l unsafe.Pointer, i int) *poolLocal {
	lp := unsafe.Pointer(uintptr(l) + uintptr(i)*unsafe.Sizeof(poolLocal{}))
	return (*poolLocal)(lp)
}

Pool.Get takes an arbitrary object out of the pool, removes it from the pool, and returns it

// Get selects an arbitrary item from the Pool, removes it from the
// Pool, and returns it to the caller.
// Get may choose to ignore the pool and treat it as empty.
// Callers should not assume any relation between values passed to Put and
// the values returned by Get.
//
// If Get would otherwise return nil and p.New is non-nil, Get returns
// the result of calling p.New.
func (p *Pool) Get() any {
	if race.Enabled { // race detector support
		race.Disable()
	}
	l, pid := p.pin() // get this P's *poolLocal and the P's id
	x := l.private    // take whatever is in the private slot
	l.private = nil   // and clear the slot
	if x == nil {     // the private slot was empty
		// Try to pop the head of the local shard. We prefer
		// the head over the tail for temporal locality of
		// reuse.
		x, _ = l.shared.popHead() // try the head of this P's shared chain
		if x == nil {             // still nothing: take the slow path
			x = p.getSlow(pid)
		}
	}
	runtime_procUnpin() // re-enable preemption
	if race.Enabled {   // race detector support
		race.Enable()
		if x != nil {
			race.Acquire(poolRaceAddr(x))
		}
	}
	if x == nil && p.New != nil { // x is still nil and New is set
		x = p.New() // create a new object
	}
	return x
}

func (p *Pool) getSlow(pid int) any {
	// See the comment in pin regarding ordering of the loads.
	size := runtime_LoadAcquintptr(&p.localSize) // load-acquire
	locals := p.local                            // load-consume
	// Try to steal one element from other procs.
	for i := 0; i < int(size); i++ { // visit every P's poolLocal in order, starting with the one after pid
		l := indexLocal(locals, (pid+i+1)%int(size))
		if x, _ := l.shared.popTail(); x != nil {
			return x // stolen from the tail (the consumer end) of that P's chain
		}
	}

	// Try the victim cache. We do this after attempting to steal
	// from all primary caches because we want objects in the
	// victim cache to age out if at all possible.
	
	// Fall back to the victim cache, which took over local at the last GC.
	size = atomic.LoadUintptr(&p.victimSize)
	if uintptr(pid) >= size {
		return nil
	}
	locals = p.victim
	l := indexLocal(locals, pid)
	if x := l.private; x != nil {
		l.private = nil
		return x
	}
	
	for i := 0; i < int(size); i++ {
		l := indexLocal(locals, (pid+i)%int(size))
		if x, _ := l.shared.popTail(); x != nil {
			return x
		}
	}

	// Mark the victim cache as empty for future gets don't bother
	// with it.
	atomic.StoreUintptr(&p.victimSize, 0) // zero the size so later Gets skip the victim cache

	return nil
}

External functions provided by the runtime

// Implemented in runtime.
func runtime_registerPoolCleanup(cleanup func()) // registers the function to run at the start of each GC
func runtime_procPin() int
func runtime_procUnpin()

// The below are implemented in runtime/internal/atomic and the
// compiler also knows to intrinsify the symbol we linkname into this
// package.

//go:linkname runtime_LoadAcquintptr runtime/internal/atomic.LoadAcquintptr
func runtime_LoadAcquintptr(ptr *uintptr) uintptr

//go:linkname runtime_StoreReluintptr runtime/internal/atomic.StoreReluintptr
func runtime_StoreReluintptr(ptr *uintptr, val uintptr) uintptr

Registering Pool's GC cleanup in init

func init() {
	runtime_registerPoolCleanup(poolCleanup)
}

Package-level variables

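allPoolsMu protects allPools
allPools is the set of pools whose primary caches may be non-empty
oldPools is the set of pools whose victim caches may be non-empty; it takes over allPools at each GC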

var (
	allPoolsMu Mutex

	// allPools is the set of pools that have non-empty primary
	// caches. Protected by either 1) allPoolsMu and pinning or 2)
	// STW.
	allPools []*Pool

	// oldPools is the set of pools that may have non-empty victim
	// caches. Protected by STW.
	oldPools []*Pool
)

poolCleanup: the GC cleanup policy

func poolCleanup() {
	// This function is called with the world stopped, at the beginning of a garbage collection.
	// It must not allocate and probably should not call any runtime functions.

	// Because the world is stopped, no pool user can be in a
	// pinned section (in effect, this has all Ps pinned).

	// Drop victim caches from all pools.
	for _, p := range oldPools { // clear the victim cache of every pool handled in the previous cycle
		p.victim = nil
		p.victimSize = 0
	}

	// Move primary cache to victim cache.
	for _, p := range allPools { // the victim cache takes over the current local and localSize
		p.victim = p.local
		p.victimSize = p.localSize
		p.local = nil // clear local and localSize; the next pin re-allocates them
		p.localSize = 0
	}

	// The pools with non-empty primary caches now have non-empty
	// victim caches and no pools have primary caches.
	oldPools, allPools = allPools, nil
}
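
The two-stage cleanup is observable from user code. The sketch below is an illustration under assumptions: it relies on each runtime.GC call starting a cycle that runs the pool cleanup, as in the gc toolchain version discussed here, and emptyAfterTwoGCs is a made-up helper name.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// emptyAfterTwoGCs puts one value into a pool that has no New function,
// forces two collections, and reports whether Get then returns nil.
func emptyAfterTwoGCs() bool {
	var p sync.Pool
	p.Put(new(int))
	runtime.GC() // first cycle: the primary cache becomes the victim cache
	runtime.GC() // second cycle: the victim cache is dropped
	return p.Get() == nil
}

func main() {
	fmt.Println(emptyAfterTwoGCs())
}
```

What happens after only one collection is deliberately not shown: whether Get still finds the object in the victim cache depends on which P the goroutine is running on at that moment.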

The code above is from src/sync/pool.go

poolqueue: the implementation of sync.Pool's underlying data structure

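This file implements the double-ended queue underneath sync.Pool and wraps those queues in a doubly linked list. Each element of sync.Pool's local array embeds a poolChain, whose head and tail fields point into that list.
So the data structure of sync.Pool is
array -> doubly linked list -> double-ended queue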

array (local; its length is GOMAXPROCS, and each P in the GMP model has an id)
-------------------------
|     |     |     |     |
-------------------------
   |
   | double linked list
   |
------                         --------------------------
|    | ----------------------->|         queue          |
------                         --------------------------
   |
   |
   |
------                         --------------------------
|    | ----------------------->|         queue          |
------                         --------------------------

The code below is from src/sync/poolqueue.go

// Copyright 2019 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package sync

import (
	"sync/atomic"
	"unsafe"
)

// poolDequeue is a lock-free fixed-size single-producer,
// multi-consumer queue. The single producer can both push and pop
// from the head, and consumers can pop from the tail.
//
// It has the added feature that it nils out unused slots to avoid
// unnecessary retention of objects. This is important for sync.Pool,
// but not typically a property considered in the literature.
type poolDequeue struct {
	// headTail packs together a 32-bit head index and a 32-bit
	// tail index. Both are indexes into vals modulo len(vals)-1.
	//
	// tail = index of oldest data in queue
	// head = index of next slot to fill
	//
	// Slots in the range [tail, head) are owned by consumers.
	// A consumer continues to own a slot outside this range until
	// it nils the slot, at which point ownership passes to the
	// producer.
	//
	// The head index is stored in the most-significant bits so
	// that we can atomically add to it and the overflow is
	// harmless.
	headTail uint64

	// vals is a ring buffer of interface{} values stored in this
	// dequeue. The size of this must be a power of 2.
	//
	// vals[i].typ is nil if the slot is empty and non-nil
	// otherwise. A slot is still in use until *both* the tail
	// index has moved beyond it and typ has been set to nil. This
	// is set to nil atomically by the consumer and read
	// atomically by the producer.
	vals []eface
}

type eface struct {
	typ, val unsafe.Pointer
}

const dequeueBits = 32

// dequeueLimit is the maximum size of a poolDequeue.
//
// This must be at most (1<<dequeueBits)/2 because detecting fullness
// depends on wrapping around the ring buffer without wrapping around
// the index. We divide by 4 so this fits in an int on 32-bit.
const dequeueLimit = (1 << dequeueBits) / 4

// dequeueNil is used in poolDequeue to represent interface{}(nil).
// Since we use nil to represent empty slots, we need a sentinel value
// to represent nil.
type dequeueNil *struct{}

func (d *poolDequeue) unpack(ptrs uint64) (head, tail uint32) {
	const mask = 1<<dequeueBits - 1
	head = uint32((ptrs >> dequeueBits) & mask)
	tail = uint32(ptrs & mask)
	return
}

func (d *poolDequeue) pack(head, tail uint32) uint64 {
	const mask = 1<<dequeueBits - 1
	return (uint64(head) << dequeueBits) |
		uint64(tail&mask)
}

// pushHead adds val at the head of the queue. It returns false if the
// queue is full. It must only be called by a single producer.
func (d *poolDequeue) pushHead(val any) bool {
	ptrs := atomic.LoadUint64(&d.headTail)
	head, tail := d.unpack(ptrs)
	if (tail+uint32(len(d.vals)))&(1<<dequeueBits-1) == head {
		// Queue is full.
		return false
	}
	slot := &d.vals[head&uint32(len(d.vals)-1)]

	// Check if the head slot has been released by popTail.
	typ := atomic.LoadPointer(&slot.typ)
	if typ != nil {
		// Another goroutine is still cleaning up the tail, so
		// the queue is actually still full.
		return false
	}

	// The head slot is free, so we own it.
	if val == nil {
		val = dequeueNil(nil)
	}
	*(*any)(unsafe.Pointer(slot)) = val

	// Increment head. This passes ownership of slot to popTail
	// and acts as a store barrier for writing the slot.
	atomic.AddUint64(&d.headTail, 1<<dequeueBits)
	return true
}

// popHead removes and returns the element at the head of the queue.
// It returns false if the queue is empty. It must only be called by a
// single producer.
func (d *poolDequeue) popHead() (any, bool) {
	var slot *eface
	for {
		ptrs := atomic.LoadUint64(&d.headTail)
		head, tail := d.unpack(ptrs)
		if tail == head {
			// Queue is empty.
			return nil, false
		}

		// Confirm tail and decrement head. We do this before
		// reading the value to take back ownership of this
		// slot.
		head--
		ptrs2 := d.pack(head, tail)
		if atomic.CompareAndSwapUint64(&d.headTail, ptrs, ptrs2) {
			// We successfully took back slot.
			slot = &d.vals[head&uint32(len(d.vals)-1)]
			break
		}
	}

	val := *(*any)(unsafe.Pointer(slot))
	if val == dequeueNil(nil) {
		val = nil
	}
	// Zero the slot. Unlike popTail, this isn't racing with
	// pushHead, so we don't need to be careful here.
	*slot = eface{}
	return val, true
}

// popTail removes and returns the element at the tail of the queue.
// It returns false if the queue is empty. It may be called by any
// number of consumers.
func (d *poolDequeue) popTail() (any, bool) {
	var slot *eface
	for {
		ptrs := atomic.LoadUint64(&d.headTail)
		head, tail := d.unpack(ptrs)
		if tail == head {
			// Queue is empty.
			return nil, false
		}

		// Confirm head and tail (for our speculative check
		// above) and increment tail. If this succeeds, then
		// we own the slot at tail.
		ptrs2 := d.pack(head, tail+1)
		if atomic.CompareAndSwapUint64(&d.headTail, ptrs, ptrs2) {
			// Success.
			slot = &d.vals[tail&uint32(len(d.vals)-1)]
			break
		}
	}

	// We now own slot.
	val := *(*any)(unsafe.Pointer(slot))
	if val == dequeueNil(nil) {
		val = nil
	}

	// Tell pushHead that we're done with this slot. Zeroing the
	// slot is also important so we don't leave behind references
	// that could keep this object live longer than necessary.
	//
	// We write to val first and then publish that we're done with
	// this slot by atomically writing to typ.
	slot.val = nil
	atomic.StorePointer(&slot.typ, nil)
	// At this point pushHead owns the slot.

	return val, true
}
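
// Illustration, not part of poolqueue.go: stripped of atomics, the index arithmetic of the three operations above reduces to the single-goroutine model below. The ring type is hypothetical and is not safe for concurrent use; it only shows the power-of-two masking, the fullness check, and which element each end returns.

```go
package main

import "fmt"

// ring is a hypothetical single-goroutine model of poolDequeue.
type ring struct {
	head, tail uint32
	vals       []any // length must be a power of 2
}

// pushHead adds v at the head; it reports false when the ring is full.
func (r *ring) pushHead(v any) bool {
	if r.tail+uint32(len(r.vals)) == r.head {
		return false // head is a full lap ahead of tail
	}
	r.vals[r.head&uint32(len(r.vals)-1)] = v
	r.head++
	return true
}

// popHead removes the most recently pushed element.
func (r *ring) popHead() (any, bool) {
	if r.tail == r.head {
		return nil, false
	}
	r.head--
	i := r.head & uint32(len(r.vals)-1)
	v := r.vals[i]
	r.vals[i] = nil // drop the reference so the slot does not retain v
	return v, true
}

// popTail removes the oldest element.
func (r *ring) popTail() (any, bool) {
	if r.tail == r.head {
		return nil, false
	}
	i := r.tail & uint32(len(r.vals)-1)
	v := r.vals[i]
	r.vals[i] = nil
	r.tail++
	return v, true
}

func main() {
	r := &ring{vals: make([]any, 4)}
	for i := 1; i <= 5; i++ {
		fmt.Println(r.pushHead(i)) // the fifth push reports false
	}
	fmt.Println(r.popHead()) // prints 4 true
	fmt.Println(r.popTail()) // prints 1 true
}
```

Because the length is a power of two, masking with len-1 is equivalent to taking the index modulo len, and head and tail can keep counting upward without ever being reset.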

// poolChain is a dynamically-sized version of poolDequeue.
//
// This is implemented as a doubly-linked list queue of poolDequeues
// where each dequeue is double the size of the previous one. Once a
// dequeue fills up, this allocates a new one and only ever pushes to
// the latest dequeue. Pops happen from the other end of the list and
// once a dequeue is exhausted, it gets removed from the list.
type poolChain struct {
	// head is the poolDequeue to push to. This is only accessed
	// by the producer, so doesn't need to be synchronized.
	head *poolChainElt

	// tail is the poolDequeue to popTail from. This is accessed
	// by consumers, so reads and writes must be atomic.
	tail *poolChainElt
}

type poolChainElt struct {
	poolDequeue

	// next and prev link to the adjacent poolChainElts in this
	// poolChain.
	//
	// next is written atomically by the producer and read
	// atomically by the consumer. It only transitions from nil to
	// non-nil.
	//
	// prev is written atomically by the consumer and read
	// atomically by the producer. It only transitions from
	// non-nil to nil.
	next, prev *poolChainElt
}

func storePoolChainElt(pp **poolChainElt, v *poolChainElt) {
	atomic.StorePointer((*unsafe.Pointer)(unsafe.Pointer(pp)), unsafe.Pointer(v))
}

func loadPoolChainElt(pp **poolChainElt) *poolChainElt {
	return (*poolChainElt)(atomic.LoadPointer((*unsafe.Pointer)(unsafe.Pointer(pp))))
}

func (c *poolChain) pushHead(val any) {
	d := c.head
	if d == nil {
		// Initialize the chain.
		const initSize = 8 // Must be a power of 2
		d = new(poolChainElt)
		d.vals = make([]eface, initSize)
		c.head = d
		storePoolChainElt(&c.tail, d)
	}

	if d.pushHead(val) {
		return
	}

	// The current dequeue is full. Allocate a new one of twice
	// the size.
	newSize := len(d.vals) * 2
	if newSize >= dequeueLimit {
		// Can't make it any bigger.
		newSize = dequeueLimit
	}

	d2 := &poolChainElt{prev: d}
	d2.vals = make([]eface, newSize)
	c.head = d2
	storePoolChainElt(&d.next, d2)
	d2.pushHead(val)
}

func (c *poolChain) popHead() (any, bool) {
	d := c.head
	for d != nil {
		if val, ok := d.popHead(); ok {
			return val, ok
		}
		// There may still be unconsumed elements in the
		// previous dequeue, so try backing up.
		d = loadPoolChainElt(&d.prev)
	}
	return nil, false
}

func (c *poolChain) popTail() (any, bool) {
	d := loadPoolChainElt(&c.tail)
	if d == nil {
		return nil, false
	}

	for {
		// It's important that we load the next pointer
		// *before* popping the tail. In general, d may be
		// transiently empty, but if next is non-nil before
		// the pop and the pop fails, then d is permanently
		// empty, which is the only condition under which it's
		// safe to drop d from the chain.
		d2 := loadPoolChainElt(&d.next)

		if val, ok := d.popTail(); ok {
			return val, ok
		}

		if d2 == nil {
			// This is the only dequeue. It's empty right
			// now, but could be pushed to in the future.
			return nil, false
		}

		// The tail of the chain has been drained, so move on
		// to the next dequeue. Try to drop it from the chain
		// so the next pop doesn't have to look at the empty
		// dequeue again.
		if atomic.CompareAndSwapPointer((*unsafe.Pointer)(unsafe.Pointer(&c.tail)), unsafe.Pointer(d), unsafe.Pointer(d2)) {
			// We won the race. Clear the prev pointer so
			// the garbage collector can collect the empty
			// dequeue and so popHead doesn't back up
			// further than necessary.
			storePoolChainElt(&d2.prev, nil)
		}
		d = d2
	}
}