Go 1.24: studying the map source code after the move to Swiss Tables, with the help of assembly

Article link: https://blog.csdn.net/weixin_38597669/article/details/147993577


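Overview

I analyzed the Go map source code a long time ago, but Go 1.24 introduced Swiss Tables to improve map performance. The new implementation is completely different from the old one, so this article records how it works.

Environment: CentOS Linux 7, AMD x86_64 CPU, Go 1.24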

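1. Prerequisites

1: Analyzing stack layout and function stack frames in Go through assembly
 2: A summary of x86 CPU registers and assembly instructions
 3: CPU multi-level caches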

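1.1 Traditional hash tables

A hash table addresses data by key: it maps a key to its value. Its two core parts are the hash function and the collision resolution method. Hash functions are out of scope here; let's look at the two collision resolution methods, separate chaining and open addressing, each of which has its own strengths and weaknesses.

Collision resolution	Advantages	Disadvantages
Separate chaining	1: Simple to implement, with few edge cases to handle 2: An overly long collision chain can be converted from a singly linked list into a search tree (red-black tree) to avoid severe degradation 3: The load factor can exceed 1, so resizes happen less often	1: Not cache friendly; every node stores an extra pointer to the next colliding node 2: List nodes are scattered in memory, so the CPU's multi-level caches cannot be used effectively
Linear probing (open addressing)	1: Cache friendly; no extra pointers need to be stored 2: Memory is contiguous and compact, so the CPU's multi-level caches are used effectively	1: More complex to implement; each slot has a state to track: occupied, empty, or deleted 2: Collisions cascade and slow down operations on later elements, which is the hardest drawback to tolerate 3: The load factor cannot exceed 1, so resizes happen more often

Under ideal conditions both methods are O(1). Under extreme collisions, chaining degrades to O(N) with plain linked lists (or O(log N) when long chains are converted to balanced trees), while open addressing degrades to O(N).

This comparison explains why most hash tables resolve collisions by chaining: the Redis hash type (singly linked lists) and the Go map before 1.24 (8-slot buckets chained through overflow buckets) both do.

PS: resizing sharply reduces hash table performance while it is in progress.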

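1.2 Swiss Table

The search for more efficient hash tables goes in two directions:

Improve the hash function so that its results produce fewer collisions, or none at all;
Improve the hash table's data structure so that collisions are resolved faster or in a more cache-friendly way, finding a better balance between resource consumption and performance.
The Swiss Table clearly belongs to the second category.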

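Google engineer Matt Kulukundis presented the Swiss Table at CppCon 2017: a hash table whose performance can far exceed std::unordered_map, distilled from Google's engineering practice. It is implemented in the well-known open-source C++ library abseil-cpp.

1: The Swiss Table design notes on the abseil-cpp site
2: The abseil-cpp Swiss Table source code

The Swiss Table is a data-structure-level improvement on linear probing. The idea is fairly simple; its layout is sketched in the figure:
[Figure: Swiss Table structure]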

enum class ctrl_t : int8_t { // states that a control byte can encode for its slot
  kEmpty = -128,   // 0b10000000: the slot is empty
  kDeleted = -2,   // 0b11111110: the slot has been deleted
  kSentinel = -1,  // 0b11111111: marks the end of the control bytes
  // Any control byte of the form 0b0??????? means the slot holds data
};
struct HeapPtrs { // pointers to the Swiss Table's backing storage
  explicit HeapPtrs(uninitialized_tag_t) {}
  explicit HeapPtrs(ctrl_t* c) : control(c) {}
  // The control bytes (and, also, a pointer near to the base of the backing
  // array).
  //
  // This contains `capacity + 1 + NumClonedBytes()` entries, even
  // when the table is empty (hence EmptyGroup).
  //
  // Note that growth_info is stored immediately before this pointer.
  // May be uninitialized for SOO tables.
  ctrl_t* control;

  // The beginning of the slots, located at `SlotOffset()` bytes after
  // `control`. May be uninitialized for empty tables.
  // Note: we can't use `slots` because Qt defines "slots" as a macro.
  MaybeInitializedPtr slot_array;
};

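Why does a control byte of the form 0b0??????? mean that the slot holds data? The answer lies in how the Swiss Table splits the hash value: the upper 57 bits form H1 and the lower 7 bits form H2. A full slot stores H2 in its control byte, so the top bit of that byte is always 0.
[Figure: how the Swiss Table splits the hash value]

That gives us a reasonably clear outline of the Swiss Table. One point worth stressing: H1 decides at which position in the control bytes the search starts, and the search then proceeds in parallel, one group-sized window at a time. In other words, a group in the Swiss Table is only a logical concept with no fixed starting position, and the groups of two keys A and B may well overlap at some moment, for example group A covering [2,17] and group B covering [5,20].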
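
To make the split concrete, here is a minimal sketch in Go. The function names h1 and h2 follow the names used later in this article; the sample hash value is made up for illustration.

```go
package main

import "fmt"

// h1 keeps the upper 57 bits of a 64-bit hash; it decides where probing starts.
func h1(hash uint64) uint64 { return hash >> 7 }

// h2 keeps the lower 7 bits; it is what a full slot stores in its control byte.
// The result is always below 0x80, so the top bit of the control byte stays 0.
func h2(hash uint64) uint8 { return uint8(hash & 0x7f) }

func main() {
	hash := uint64(0xdeadbeefcafef00d) // hypothetical hash value
	fmt.Printf("H1 = %#x\n", h1(hash))
	fmt.Printf("H2 = %#02x (control byte %08b)\n", h2(hash), h2(hash))
}
```

Because H2 can never have its top bit set, the values 0b10000000 (empty) and 0b11111110 (deleted) can never be confused with a stored H2.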

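So where does the Swiss Table gain its speed over a traditional hash table? (Speed here is really about how collisions are handled; without collisions everything is O(1) and there is nothing to compare.)

Compared with traditional linear probing:

Every operation is driven by the control bytes. One probe compares a whole group of control bytes at once, whereas traditional linear probing can only compare key/value pairs one after another;

This is the essence of the Swiss Table's efficiency. Suppose the group size is 16. The hash function yields H1 and H2 for the key; probing starts at byte H1 of the control bytes and examines 16 bytes at once to find which of them equal H2; only the keys of the corresponding slots are then compared, and if none matches the next group is examined. With this two-level filter (first collect every slot whose control byte equals H2, then compare the keys of those slots), probing is far more efficient than comparing entries one by one.

Each slot is represented by a single byte in the control bytes, which keeps memory compact and further improves the CPU cache hit rate;
On CPUs with SIMD support, a group of control bytes is compared with SIMD instructions, improving efficiency further;
Collision probing proceeds one group at a time, which greatly reduces the cascading collisions of traditional linear probing.

Spending a little extra memory on the control bytes in exchange for a large performance gain is well worth it.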
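
The two-level filter can be illustrated with a toy model. This is only a sketch under simplifying assumptions: one group of 8 slots, string keys, hand-picked H2 values, and a plain loop where a real implementation compares all control bytes at once. The type and function names are hypothetical.

```go
package main

import "fmt"

const groupSize = 8

// group is a toy model of one group: 8 control bytes plus 8 keys.
type group struct {
	ctrl [groupSize]uint8 // 0x80 = empty, 0x00..0x7f = H2 of the stored key
	keys [groupSize]string
}

// findLinear compares the key against every occupied slot in order.
func findLinear(g *group, key string) (slot, keyCompares int) {
	for i := 0; i < groupSize; i++ {
		if g.ctrl[i]&0x80 != 0 {
			continue // empty slot
		}
		keyCompares++
		if g.keys[i] == key {
			return i, keyCompares
		}
	}
	return -1, keyCompares
}

// findFiltered compares keys only in slots whose control byte equals h2.
func findFiltered(g *group, key string, h2 uint8) (slot, keyCompares int) {
	for i := 0; i < groupSize; i++ {
		if g.ctrl[i] != h2 {
			continue // cheap one-byte check filters the slot out
		}
		keyCompares++
		if g.keys[i] == key {
			return i, keyCompares
		}
	}
	return -1, keyCompares
}

// sampleGroup fills 7 slots with keys "a".."g" and made-up H2 values 0x11..0x77.
func sampleGroup() *group {
	g := &group{}
	for i, name := range []string{"a", "b", "c", "d", "e", "f", "g"} {
		g.ctrl[i] = uint8(0x11 * (i + 1))
		g.keys[i] = name
	}
	g.ctrl[7] = 0x80 // last slot is empty
	return g
}

func main() {
	g := sampleGroup()
	slot, n := findLinear(g, "g")
	fmt.Println("linear:  ", slot, "after", n, "key comparisons")
	slot, n = findFiltered(g, "g", 0x77)
	fmt.Println("filtered:", slot, "after", n, "key comparisons")
}
```

Key comparisons are the expensive part (think of long strings), so replacing most of them with one-byte checks is where the gain comes from.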

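Compared with traditional chaining:

The biggest advantage of chaining over linear probing is the absence of cascading collisions. The Swiss Table probes in parallel, one group at a time, through its control bytes, which greatly reduces cascading, and it is also very memory friendly, so outside of extreme cases it generally outperforms chaining. Google's C++ Swiss Table implementations (such as flat_hash_map) are reported to reach 3 to 5 times the collision-handling throughput of std::unordered_map (chaining) under multi-threaded, high-load workloads, which supports the effectiveness of this approach.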

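2. How the Go swiss map works

Source analysis of the map before Go 1.24.
Starting with Go 1.24 the map source files have moved:
map source before Go 1.24: map_noswiss.go;
map source for Go 1.24 and later: map_swiss.go.

In 2022, ByteDance engineer zhangyunhao proposed to the Go team that Go's map be reimplemented with Google's Swiss Table. Go 1.24, officially released in February 2025, upgraded the map to use Swiss Tables. zhangyunhao's gomapbench repo provides benchmark code for maps, which can be used to compare results on Go 1.23 and Go 1.24.
Compared with the map implementation before Go 1.24:

Large maps and lookups of absent keys are 20%-50% faster;
Insertion and deletion are 20%-50% faster;
Memory consumption drops by 0%-25%, and fixed-size maps no longer produce extra memory fragmentation.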

2.1 Map metadata

//https://github.com/golang/go/blob/go1.24.3/src/internal/abi/map_swiss.go
const (
	// Number of bits in the group.slot count.
	SwissMapGroupSlotsBits = 3
	// Number of slots in a group.
	SwissMapGroupSlots = 1 << SwissMapGroupSlotsBits // each group holds 8 slots
)
type SwissMapType struct {
	Type
	Key   *Type // key type of the map
	Elem  *Type // value type of the map
	Group *Type // group type
	Hasher    func(unsafe.Pointer, uintptr) uintptr // function for hashing keys (ptr to key, seed) -> hash
	GroupSize uintptr // size of a group: 8*SlotSize + 8 bytes of control bytes
	SlotSize  uintptr // size of a slot, i.e. of one key/elem pair
	ElemOff   uintptr // offset of elem in key/elem slot
	Flags     uint32
}

//https://github.com/golang/go/blob/go1.24.3/src/internal/runtime/maps/group.go
const (
	// Maximum load factor prior to growing.
	// 7/8 is the same load factor used by Abseil, but Abseil defaults to
	// 16 slots per group, so they get two empty slots vs our one empty
	// slot. We may want to reevaluate if this is best for us.
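	maxAvgGroupLoad = 7 // load factor 7/8: once groups hold 7 used slots on average, the table must grow

	ctrlEmpty   ctrl = 0b10000000 // the slot is empty
	ctrlDeleted ctrl = 0b11111110 // the slot has been deleted
	// The constants below exist to compute slot states quickly from the control bytes.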
	bitsetLSB     = 0x0101010101010101
	bitsetMSB     = 0x8080808080808080
	bitsetEmpty   = bitsetLSB * uint64(ctrlEmpty)
	bitsetDeleted = bitsetLSB * uint64(ctrlDeleted)
)
type ctrl uint8 // one control byte
type ctrlGroup uint64 // the control bytes of one group, 8 bytes. Unlike Google's C++ Swiss Table, each Go group carries its own control bytes
// A group holds abi.SwissMapGroupSlots slots (key/elem pairs) plus their
// control word.
type groupReference struct { // helper for working on one group at a time; the comments below give the group and slot layout
	// data points to the group, which is described by typ.Group and has layout:
	// type group struct { // the actual layout of a group
	// 	ctrls ctrlGroup
	// 	slots [abi.SwissMapGroupSlots]slot
	// }
	// type slot struct { // the actual layout of a key/elem pair
	// 	key  typ.Key
	// 	elem typ.Elem
	// }
	data unsafe.Pointer // data *typ.Group
}
const (
	ctrlGroupsSize   = unsafe.Sizeof(ctrlGroup(0)) // a group's control bytes take 8 bytes
	groupSlotsOffset = ctrlGroupsSize // offset used to locate the slots: skip the 8 control bytes
)
type groupsReference struct { // holds a table's array of groups
	// data points to an array of groups. See groupReference above for the
	// definition of group.
	data unsafe.Pointer // data *[length]typ.Group
	// lengthMask is the number of groups in data minus one (note that
	// length must be a power of two). This allows computing i%length quickly using bitwise AND.
	lengthMask uint64 // mask equal to len(data)-1; H1&lengthMask gives the index in data of the group where the key should live, and on collision probing continues through further groups
}

//https://github.com/golang/go/blob/go1.24.3/src/internal/runtime/maps/table.go
const maxTableCapacity = 1024 // each table holds at most 1024 slots
type table struct {
	// The number of filled slots (i.e. the number of elements in the table).
	used uint16 // number of slots (key/elem pairs) in use in this table
	// The total number of slots (always 2^N). Equal to `(groups.lengthMask+1)*abi.SwissMapGroupSlots`.
	capacity uint16 // capacity of the table; a Go table does not allocate all 1024 slots up front but grows on demand
	// The number of slots we can still fill without needing to rehash.
	// We rehash when used + tombstones > loadFactor*capacity, including
	// tombstones so the table doesn't overfill with tombstones. This field
	// counts down remaining empty slots before the next rehash.
	growthLeft uint16 // how many more slots can be filled before a rehash; the load factor is already applied, so with 8 empty slots growthLeft is 8*7/8 = 7
	// The number of bits used by directory lookups above this table. Note
	// that this may be less then globalDepth, if the directory has grown
	// but this table has not yet been split.
	localDepth uint8 // incremented when the table splits into two; compared with m.globalDepth to decide whether m.dirPtr must grow as well
	// Index of this table in the Map directory. This is the index of the
	// _first_ location in the directory. The table may occur in multiple sequential indicies.
	// index is -1 if the table is stale (no longer installed in the directory).
	index int // index of this table's first entry in the Map's directory; growing the directory changes it
	// groups is an array of slot groups. Each group holds abi.SwissMapGroupSlots
	// key/elem slots and their control bytes. A table has a fixed size
	// groups array. The table is replaced (in rehash) when more space is required.
	groups groupsReference 
}

//https://github.com/golang/go/blob/go1.24.3/src/internal/runtime/maps/map.go
type Map struct {
	// The number of filled slots (i.e. the number of elements in all tables). Excludes deleted slots.
	// Must be first (known by the compiler, for len() builtin).
	used uint64 // number of slots in use in the map, i.e. the result of len(map)
	// seed is the hash seed, computed as a unique random number per map.
	seed uintptr // random hash seed
	// The directory of tables.
	//
	// Normally dirPtr points to an array of table pointers
	//
	// dirPtr *[dirLen]*table
	//
	// The length (dirLen) of this array is `1 << globalDepth`. Multiple
	// entries may point to the same table. See top-level comment for more
	// details.
	//
	// Small map optimization: if the map always contained
	// abi.SwissMapGroupSlots or fewer entries, it fits entirely in a
	// single group. In that case dirPtr points directly to a single group.
	//
	// dirPtr *group
	//
	// In this case, dirLen is 0. used counts the number of used slots in
	// the group. Note that small maps never have deleted slots (as there
	// is no probe sequence to maintain).
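	dirPtr unsafe.Pointer // with more than 8 slots this is the table directory; with 8 or fewer it is a *typ.Group (pointing straight at one group) and dirLen is 0
	dirLen int // number of directory entries (several entries may point to the same table)
	// The number of bits to use in table directory lookups.
	globalDepth uint8 // number of times the directory has doubled; each growth makes it 2*dirLen long
	// The number of bits to shift out of the hash for directory lookups. 
	globalShift uint8 // 64-globalDepth on 64-bit platforms, 32-globalDepth on 32-bit; helps find which table a key belongs to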
	// writing is a flag that is toggled (XOR 1) while the map is being
	// written. Normally it is set to 1 when writing, but if there are
	// multiple concurrent writers, then toggling increases the probability
	// that both sides will detect the race.
	writing uint8 // 1 while a write is in progress
	// clearSeq is a sequence counter of calls to Clear. It is used to detect map clears during iteration.
	clearSeq uint64 // number of clear(map) calls; used by iterators
}

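With this metadata in mind, the structure of the Go swiss map looks like this:
[Figure: Go swiss map structure]
Comparing it with the structure of Google's C++ map shows some notable differences:

Google's C++ map is a single Swiss Table, whereas a Go map is an array of Swiss Tables, each holding at most 1024 slots. This limits the impact of growth on map performance: only the table being grown is affected.
In Google's C++ map a group is only a concept and H1 locates a slot, whereas in a Go map a group is a concrete object and H1 locates a group. A concrete group effectively reserves 8 slots for the keys that map to it, which further reduces the cascading collisions of linear probing, at the cost of possibly using a bit more memory.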
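
How a hash selects a table in the directory can be sketched as follows. This is an illustrative reading of the globalDepth and globalShift fields described above for a 64-bit platform, not the runtime's exact code; the function is a simplified stand-in.

```go
package main

import "fmt"

// directoryIndex uses the top globalDepth bits of the hash as the
// directory index, which is what shifting by globalShift achieves.
func directoryIndex(hash uint64, globalDepth uint8) uint64 {
	if globalDepth == 0 {
		return 0 // a directory with a single entry: nothing to select
	}
	globalShift := 64 - globalDepth
	return hash >> globalShift
}

func main() {
	// With globalDepth = 2 the directory has 1<<2 = 4 entries and the
	// top two bits of the hash pick one of them.
	for _, hash := range []uint64{
		0x0000000000000001, // top bits 00
		0x4000000000000000, // top bits 01
		0x8000000000000000, // top bits 10
		0xc000000000000000, // top bits 11
	} {
		fmt.Printf("hash %#016x -> directory entry %d\n", hash, directoryIndex(hash, 2))
	}
}
```

Using the top bits for the table and the bits just above H2 for the group keeps the two choices independent of each other.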

2.2 Initialization

//https://github.com/golang/go/blob/master/src/internal/runtime/maps/map.go

func NewMap(mt *abi.SwissMapType, hint uintptr, m *Map, maxAlloc uintptr) *Map {
	if m == nil {
		m = new(Map)
	}
	m.seed = uintptr(rand())
	if hint <= abi.SwissMapGroupSlots {
		return m // for an initial capacity <= 8, return the bare Map; m.dirPtr is initialized on the first insert
	}
	// Full size map.
	// Set initial capacity to hold hint entries without growing in the average case.
	targetCapacity := (hint * abi.SwissMapGroupSlots) / maxAvgGroupLoad // capacity needed to hold hint slots at a load factor of 7/8
	if targetCapacity < hint { // overflow
		return m // return an empty map. Not expected to happen in practice
	}
	dirSize := (uint64(targetCapacity) + maxTableCapacity - 1) / maxTableCapacity // each table holds up to 1024 slots: how many tables do targetCapacity slots need?
	dirSize, overflow := alignUpPow2(dirSize) // the number of tables must be a power of two
	if overflow || dirSize > uint64(math.MaxUintptr) {
		return m // return an empty map.
	}

	// If hint is too large, allocate nothing up front and return m; the map grows gradually later.
	groups, overflow := math.MulUintptr(uintptr(dirSize), maxTableCapacity)
	if overflow {
		return m // return an empty map.
	} else {
		mem, overflow := math.MulUintptr(groups, mt.GroupSize)
		if overflow || mem > maxAlloc {
			return m // return an empty map.
		}
	}
	m.globalDepth = uint8(sys.TrailingZeros64(dirSize)) // number of trailing zero bits of dirSize, i.e. log2(dirSize)
	m.globalShift = depthToShift(m.globalDepth)
	directory := make([]*table, dirSize)
	for i := range directory {
		directory[i] = newTable(mt, uint64(targetCapacity)/dirSize, i, m.globalDepth) // initialize each table
	}
	m.dirPtr = unsafe.Pointer(&directory[0])
	m.dirLen = len(directory)
	return m
}
//https://github.com/golang/go/blob/go1.24.3/src/internal/runtime/maps/table.go
func newTable(typ *abi.SwissMapType, capacity uint64, index int, localDepth uint8) *table {
	if capacity < abi.SwissMapGroupSlots {
		capacity = abi.SwissMapGroupSlots
	}
	t := &table{
		index:      index,
		localDepth: localDepth,
	}
	if capacity > maxTableCapacity {
		panic("initial table capacity too large")
	}
	// N.B. group count must be a power of two for probeSeq to visit every group.
	capacity, overflow := alignUpPow2(capacity)
	if overflow {
		panic("rounded-up capacity overflows uint64")
	}
	t.reset(typ, uint16(capacity))
	return t
}
// reset resets the table with new, empty groups with the specified new total capacity.
func (t *table) reset(typ *abi.SwissMapType, capacity uint16) {
	groupCount := uint64(capacity) / abi.SwissMapGroupSlots // number of groups needed
	t.groups = newGroups(typ, groupCount) // allocate the groups
	t.capacity = capacity
	t.resetGrowthLeft()
	for i := uint64(0); i <= t.groups.lengthMask; i++ {
		g := t.groups.group(typ, i)
		g.ctrls().setEmpty() // mark every control byte of the group as empty
	}
}
//https://github.com/golang/go/blob/go1.24.3/src/internal/runtime/maps/group.go
// newGroups allocates a new array of length groups.
func newGroups(typ *abi.SwissMapType, length uint64) groupsReference {
	return groupsReference{
		data:       newarray(typ.Group, int(length)), // allocate contiguous memory for length values of typ.Group
		lengthMask: length - 1,
	}
}

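Initialization is straightforward. From the preset constants (at most 1024 slots per table, at most 8 slots per group, a load factor of 7/8) it works out how many tables are needed to hold hint key/elem pairs and how many groups each table needs, then marks every control byte of every group as empty.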
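
The sizing arithmetic can be replayed outside the runtime. The sketch below copies the formulas from NewMap and newTable shown above; alignUpPow2 is reimplemented here with math/bits and ignores overflow, and plan is a hypothetical helper name. It only covers hints greater than 8.

```go
package main

import (
	"fmt"
	"math/bits"
)

const (
	groupSlots       = 8 // abi.SwissMapGroupSlots
	maxAvgGroupLoad  = 7
	maxTableCapacity = 1024
)

// alignUpPow2 rounds n up to the next power of two (overflow is ignored here).
func alignUpPow2(n uint64) uint64 {
	if n <= 1 {
		return 1
	}
	return 1 << bits.Len64(n-1)
}

// plan returns the directory size and the slot capacity of each table
// for a given hint, following the arithmetic in NewMap and newTable.
func plan(hint uint64) (dirSize, tableCapacity uint64) {
	targetCapacity := hint * groupSlots / maxAvgGroupLoad
	dirSize = alignUpPow2((targetCapacity + maxTableCapacity - 1) / maxTableCapacity)
	tableCapacity = targetCapacity / dirSize
	if tableCapacity < groupSlots {
		tableCapacity = groupSlots
	}
	return dirSize, alignUpPow2(tableCapacity)
}

func main() {
	for _, hint := range []uint64{100, 5000} {
		dirSize, capacity := plan(hint)
		fmt.Printf("hint=%d: %d table(s), %d slots (%d groups) each\n",
			hint, dirSize, capacity, capacity/groupSlots)
	}
}
```

For a hint of 100 this gives one table with 128 slots (16 groups); for a hint of 5000 it gives 8 tables of 1024 slots each.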

2.2 Lookup

//https://github.com/golang/go/blob/go1.24.3/src/internal/runtime/maps/runtime_swiss.go
func runtime_mapaccess1(typ *abi.SwissMapType, m *Map, key unsafe.Pointer) unsafe.Pointer {
	// ... omitted
	if m == nil || m.Used() == 0 { // a nil or empty map: return a pointer to the zero value
		if err := mapKeyError(typ, key); err != nil {
			panic(err) // see issue 23734
		}
		return unsafe.Pointer(&zeroVal[0])
	}

	if m.writing != 0 { // reading while the map is being written is a fatal error
		fatal("concurrent map read and map write")
	}
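	hash := typ.Hasher(key, m.seed) // compute the hash of the key
	if m.dirLen <= 0 { // small map: created with at most 8 slots (or with a hint too large to preallocate), so dirPtr points at a single group
		_, elem, ok := m.getWithKeySmall(typ, hash, key) // special-cased lookup for small maps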
		if !ok {
			return unsafe.Pointer(&zeroVal[0])
		}
		return elem
	}
	// Select table.
	idx := m.directoryIndex(hash) // use the hash to compute which table the key should be in
	t := m.directoryAt(idx) // fetch the table by its directory index
	// Probe table.
	// h1(hash) takes the upper 57 bits of the hash, giving H1
	seq := makeProbeSeq(h1(hash), t.groups.lengthMask) // H1 decides which group probing starts from
	for ; ; seq = seq.next() {
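		g := t.groups.group(typ, seq.offset) // the current group, as a groupReference
		// h2(hash) takes the lower 7 bits of the hash, giving H2
		match := g.ctrls().matchH2(h2(hash)) // [the essence] find which slots of this group have a control byte equal to H2; only those slots are visited
		for match != 0 { // visits only slots whose control byte equals H2; the others were filtered out when match was computed
			i := match.first() // index of the first slot whose control byte equals H2
			slotKey := g.key(typ, i) // address of that slot's key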
			slotKeyOrig := slotKey
			if typ.IndirectKey() {
				slotKey = *((*unsafe.Pointer)(slotKey))
			}
			if typ.Key.Equal(key, slotKey) { // compare the slot's key with the key being looked up
				slotElem := unsafe.Pointer(uintptr(slotKeyOrig) + typ.ElemOff)
				if typ.IndirectElem() {
					slotElem = *((*unsafe.Pointer)(slotElem))
				}
				return slotElem // equal: return a pointer to the slot's value
			}
			match = match.removeFirst() // not equal: drop this slot from match so that the next candidate becomes the first
		}
		match = g.ctrls().matchEmpty() // not found in this group: does the group contain any empty slot?
		if match != 0 { // yes: the key is not in the map
			// Finding an empty slot means we've reached the end of
			// the probe sequence.
			return unsafe.Pointer(&zeroVal[0])
		}
	}
}

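1: Compute the hash of the key;
2: Use the hash to locate the Swiss Table the key belongs to;
3: Use H1 to find the group in which the key should live, and start probing there;
4: Use H2 and the group's control bytes to quickly find the slots that may hold the key (this is the essence: only slots whose control byte equals H2 are candidates, the rest are never visited);
5: Visit the candidate slots and compare keys; on a match the lookup is done, otherwise go to step 6;
6: Check whether the group contains any empty slot. If it does, the key does not exist; if not, advance to the next group in the probe sequence and return to step 4.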
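
The order in which groups are visited is worth a closer look. The sketch below models makeProbeSeq and next as a triangular progression, which is how I read the runtime's probeSeq; treat the exact stepping rule as an assumption to check against table.go. visitOrder is a hypothetical helper.

```go
package main

import "fmt"

// probeSeq walks the groups of a table starting from the group chosen by H1.
type probeSeq struct {
	mask   uint64 // number of groups minus one (the count is a power of two)
	offset uint64 // index of the group to examine now
	index  uint64 // how many steps have been taken
}

func makeProbeSeq(h1 uint64, mask uint64) probeSeq {
	return probeSeq{mask: mask, offset: h1 & mask}
}

// next advances by 1, then 2, then 3, ... groups (triangular numbers).
func (s probeSeq) next() probeSeq {
	s.index++
	s.offset = (s.offset + s.index) & s.mask
	return s
}

// visitOrder lists the groups visited in the first groupCount steps.
func visitOrder(h1, groupCount uint64) []uint64 {
	order := make([]uint64, 0, groupCount)
	seq := makeProbeSeq(h1, groupCount-1)
	for i := uint64(0); i < groupCount; i++ {
		order = append(order, seq.offset)
		seq = seq.next()
	}
	return order
}

func main() {
	fmt.Println(visitOrder(5, 8)) // [5 6 0 3 7 4 2 1]
}
```

With a power-of-two group count, this progression visits every group exactly once, which is why newTable rounds the capacity up to a power of two.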

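For lookups in the map before Go 1.24, resolving a collision means walking the chain of buckets and, in each bmap node, going through its 8 slots one at a time. It cannot do what the swiss map does: test the control bytes of all 8 slots in one step to learn which slots, if any, may hold the key, visit only those, and skip the group outright when none can.
The structure of the map before Go 1.24 is shown in the figure:
[Figure: structure of the map before Go 1.24]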

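2.3 Insertion and deletion

Both insertion and deletion first perform a lookup, so their flow closely resembles a lookup.
This section only walks through the insertion source. Deletion is almost identical to lookup; the only difference is that once the key is found, the slot's key and value are cleared and the slot's control byte is marked deleted (or reset to empty when the group still has an empty slot, since no tombstone is needed then).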

//https://github.com/golang/go/blob/master/src/internal/runtime/maps/runtime_swiss.go
func runtime_mapassign(typ *abi.SwissMapType, m *Map, key unsafe.Pointer) unsafe.Pointer {
	if m == nil { // inserting into a nil map panics
		panic(errNilAssign)
	}
	if m.writing != 0 { // concurrent writes are not allowed
		fatal("concurrent map writes")
	}
	hash := typ.Hasher(key, m.seed) // compute the hash of the key
	// Set writing after calling Hasher, since Hasher may panic, in which case we have not actually done a write.
	m.writing ^= 1 // toggle, see comment on writing
	if m.dirPtr == nil { // the map was created with a hint <= 8 (or one too large to preallocate), so m.dirPtr is still nil
		m.growToSmall(typ) // initialize via growToSmall; afterwards m.dirPtr points at a single group
	}
	if m.dirLen == 0 {
		if m.used < abi.SwissMapGroupSlots { // fewer than 8 key/elem pairs
			elem := m.putSlotSmall(typ, hash, key) // the single group still has room
			if m.writing == 0 {
				fatal("concurrent map writes")
			}
			m.writing ^= 1
			return elem
		}
		m.growToTable(typ) // otherwise grow: upgrade m.dirPtr from a single group to a table directory
	}
	var slotElem unsafe.Pointer
outer:
	for {
		idx := m.directoryIndex(hash) // use the hash to compute which table the key should be in
		t := m.directoryAt(idx) // fetch the table by its directory index
		seq := makeProbeSeq(h1(hash), t.groups.lengthMask) // H1 decides which group probing starts from
		// As we look for a match, keep track of the first deleted slot we find,
		// which we'll use to insert the new entry if necessary.
		var firstDeletedGroup groupReference // first group seen during probing that contains a deleted slot
		var firstDeletedSlot uintptr // index of that deleted slot within its group; if the key is absent, the new pair goes here
		for ; ; seq = seq.next() { // same as the lookup path, so not annotated again
			g := t.groups.group(typ, seq.offset)
			match := g.ctrls().matchH2(h2(hash))
			for match != 0 {
				i := match.first()
				slotKey := g.key(typ, i)
				slotKeyOrig := slotKey
				if typ.IndirectKey() {
					slotKey = *((*unsafe.Pointer)(slotKey))
				}
				if typ.Key.Equal(key, slotKey) {
					if typ.NeedKeyUpdate() {
						typedmemmove(typ.Key, slotKey, key)
					}
					slotElem = unsafe.Pointer(uintptr(slotKeyOrig) + typ.ElemOff)
					if typ.IndirectElem() {
						slotElem = *((*unsafe.Pointer)(slotElem)) // key found: slotElem is returned so that the caller can overwrite the value
					}
					t.checkInvariants(typ, m)
					break outer
				}
				match = match.removeFirst()
			}
			// No existing slot for this key in this group. Is this the end of the probe sequence?
			match = g.ctrls().matchEmpty() // not found in this group: does the group contain any empty slot?
			if match != 0 { // yes: the key is not in the map
				// Finding an empty slot means we've reached the end of the probe sequence.
				var i uintptr
				// If we found a deleted slot along the way, we can replace it without consuming growthLeft.
				if firstDeletedGroup.data != nil { // a deleted slot was recorded earlier: reuse it for the new pair
					g = firstDeletedGroup
					i = firstDeletedSlot
					t.growthLeft++ // will be decremented below to become a no-op.
				} else { // none recorded: take the first empty slot of the current group
					i = match.first()
				}
				// If there is room left to grow, just insert the new entry.
				if t.growthLeft > 0 { // a slot is still available within the load factor, so store the pair here; otherwise grow
					slotKey := g.key(typ, i)
					slotKeyOrig := slotKey
					if typ.IndirectKey() {
						kmem := newobject(typ.Key)
						*(*unsafe.Pointer)(slotKey) = kmem
						slotKey = kmem
					}
					typedmemmove(typ.Key, slotKey, key)
					slotElem = unsafe.Pointer(uintptr(slotKeyOrig) + typ.ElemOff)
					if typ.IndirectElem() {
						emem := newobject(typ.Elem)
						*(*unsafe.Pointer)(slotElem) = emem
						slotElem = emem
					}
					g.ctrls().set(i, ctrl(h2(hash))) // set the slot's control byte to H2
					t.growthLeft--
					t.used++ // one more key/elem pair in the table
					m.used++ // one more key/elem pair in the map
					t.checkInvariants(typ, m)
					break outer
				}
				t.rehash(typ, m) // grow
				continue outer // after growing, restart from outer to find a slot for the key
			}
			// No empty slots in this group. Check for a deleted slot,
			// which we'll use if we don't find a match later in the probe sequence. 
			// We only need to remember a single deleted slot.
			if firstDeletedGroup.data == nil { // no deleted slot has been recorded yet
				// Since we already checked for empty slots above, matches here must be deleted slots.
				match = g.ctrls().matchEmptyOrDeleted() // does this group contain a deleted slot?
				if match != 0 { // yes: remember it in case the key turns out to be absent
					firstDeletedGroup = g
					firstDeletedSlot = match.first()
				}
			}
		}
	}
	if m.writing == 0 { // check once more for a concurrent write
		fatal("concurrent map writes")
	}
	m.writing ^= 1
	return slotElem // a slot has been chosen; return the address of the value for the caller to assign
}

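1: Compute the hash of the key;
2: Use the hash to locate the Swiss Table the key belongs to;
3: Use H1 to find the group in which the key should live, and start probing there;
4: Use H2 and the group's control bytes to quickly find the slots that may hold the key;
5: Visit those slots and compare keys; on a match the slot is found, otherwise go to step 6;
6: Check whether the group contains any empty slot: if not, go to step 7; if so, go to step 8;
7: If no deleted slot has been recorded yet, check whether the group has one and record its group and slot; then advance to the next group and return to step 4;
8: The key is absent. If step 7 recorded a group and slot, insert the pair there. Otherwise check t.growthLeft > 0: if it holds, insert the pair into the first empty slot of the current group; if not, grow the table and return to step 2.

The neat part of insertion is that the probe remembers the first deleted slot it passes. Once the key is known to be absent, the pair goes straight into that slot, with no second search for a suitable position.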
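
The growthLeft check in the last step drives all growth. The following sketch models that budget under two assumptions that should be checked against table.go: a single-group table may fill capacity-1 slots, and larger tables may fill 7/8 of their capacity. growthBudget and capacityAfter are hypothetical names, and deletions and table splits are left out.

```go
package main

import "fmt"

const (
	groupSlots      = 8
	maxAvgGroupLoad = 7
)

// growthBudget is the number of slots a fresh table of the given
// capacity may fill before it has to grow (assumed rule, see above).
func growthBudget(capacity uint16) uint16 {
	if capacity <= groupSlots {
		return capacity - 1
	}
	return capacity * maxAvgGroupLoad / groupSlots
}

// capacityAfter simulates n inserts of distinct keys into one table that
// starts with a single group, doubling whenever the budget is exhausted.
// Only meaningful while the capacity stays within 1024 slots.
func capacityAfter(n int) uint16 {
	capacity := uint16(groupSlots)
	used := uint16(0)
	left := growthBudget(capacity)
	for i := 0; i < n; i++ {
		if left == 0 {
			capacity *= 2                        // table.grow doubles the groups
			left = growthBudget(capacity) - used // existing pairs are reinserted
		}
		left--
		used++
	}
	return capacity
}

func main() {
	fmt.Println(growthBudget(8), growthBudget(16), growthBudget(1024)) // 7 14 896
	fmt.Println(capacityAfter(100))                                    // 128
}
```

Under this model, 100 inserts end in a 128-slot table, the same capacity NewMap would preallocate for a hint of 100, but reached through four rounds of copying.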

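2.4 Growth

Each table of a Go swiss map decides independently when to grow, and grows independently. Growth comes in two kinds, within a table and at table level:
1: growth within a table doubles the number of groups in the table;
2: table-level growth splits the table into two tables.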

//https://github.com/golang/go/blob/master/src/internal/runtime/maps/table.go
func (t *table) rehash(typ *abi.SwissMapType, m *Map) {
	newCapacity := 2 * t.capacity
	if newCapacity <= maxTableCapacity {
		t.grow(typ, m, newCapacity) // double the number of groups within the table
		return
	}
	t.split(typ, m) // split the current table into two tables
}

Doubling the number of groups within a table

func (t *table) grow(typ *abi.SwissMapType, m *Map, newCapacity uint16) {
	newTable := newTable(typ, uint64(newCapacity), t.index, t.localDepth) // allocate a new table
	if t.capacity > 0 {
		for i := uint64(0); i <= t.groups.lengthMask; i++ { // walk the groups of the old table
			g := t.groups.group(typ, i)
			for j := uintptr(0); j < abi.SwissMapGroupSlots; j++ { // walk the slots of the group
				if (g.ctrls().get(j) & ctrlEmpty) == ctrlEmpty { // skip empty or deleted slots
					// Empty or deleted
					continue
				}
				key := g.key(typ, j)
				if typ.IndirectKey() {
					key = *((*unsafe.Pointer)(key))
				}
				elem := g.elem(typ, j)
				if typ.IndirectElem() {
					elem = *((*unsafe.Pointer)(elem))
				}
				hash := typ.Hasher(key, m.seed)
				newTable.uncheckedPutSlot(typ, hash, key, elem) // reinsert each pair of the old table into the new table
			}
		}
	}
	newTable.checkInvariants(typ, m)
	m.replaceTable(newTable) // the new table replaces the old one
	t.index = -1
}

Splitting the current table into two tables

func (t *table) split(typ *abi.SwissMapType, m *Map) {
	localDepth := t.localDepth
	localDepth++ // one more split: one more hash bit now selects the table
	// Allocate two new tables.
	left := newTable(typ, maxTableCapacity, -1, localDepth)
	right := newTable(typ, maxTableCapacity, -1, localDepth)
	// Split in half at the localDepth bit from the top.
	mask := localDepthMask(localDepth)
	for i := uint64(0); i <= t.groups.lengthMask; i++ {
		g := t.groups.group(typ, i)
		for j := uintptr(0); j < abi.SwissMapGroupSlots; j++ {
			if (g.ctrls().get(j) & ctrlEmpty) == ctrlEmpty { // skip empty or deleted slots
				// Empty or deleted
				continue
			}
			key := g.key(typ, j)
			if typ.IndirectKey() {
				key = *((*unsafe.Pointer)(key))
			}
			elem := g.elem(typ, j)
			if typ.IndirectElem() {
				elem = *((*unsafe.Pointer)(elem))
			}
			hash := typ.Hasher(key, m.seed)
			var newTable *table
			// hash&mask decides which of the two new tables receives each pair of the old table
			if hash&mask == 0 {
				newTable = left 
			} else {
				newTable = right
			}
			newTable.uncheckedPutSlot(typ, hash, key, elem)
		}
	}
	m.installTableSplit(t, left, right)
	t.index = -1
}

func (m *Map) installTableSplit(old, left, right *table) {
	if old.localDepth == m.globalDepth { // when they are equal, the map's directory doubles in size
		// No room for another level in the directory. Grow the directory.
		newDir := make([]*table, m.dirLen*2)
		for i := range m.dirLen {
			t := m.directoryAt(uintptr(i))
			newDir[2*i] = t // two adjacent directory entries point to the same table
			newDir[2*i+1] = t
			if t.index == i { // fix up t.index
				t.index = 2 * i
			}
		}
		m.globalDepth++ // the directory has doubled one more time
		m.globalShift--
		m.dirPtr = unsafe.Pointer(&newDir[0])//m.directory = newDir
		m.dirLen = len(newDir)
	}
	// the two new tables replace the old one
	left.index = old.index
	m.replaceTable(left)
	entries := 1 << (m.globalDepth - left.localDepth)
	right.index = left.index + entries
	m.replaceTable(right)
}
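
The interplay of localDepth, globalDepth and index is easier to follow on a toy directory. The sketch below keeps only those three fields and mirrors installTableSplit as shown above; replaceTable is written from my reading of the runtime (a table fills 1 << (globalDepth - localDepth) consecutive entries), and the type and table names are made up.

```go
package main

import (
	"fmt"
	"strings"
)

type table struct {
	name       string
	localDepth uint8
	index      int
}

type directory struct {
	entries     []*table
	globalDepth uint8
}

// replaceTable points every directory entry owned by t at t.
func (d *directory) replaceTable(t *table) {
	n := 1 << (d.globalDepth - t.localDepth)
	for i := 0; i < n; i++ {
		d.entries[t.index+i] = t
	}
}

// installTableSplit installs left and right in place of old, doubling
// the directory first when old already uses every directory bit.
func (d *directory) installTableSplit(old, left, right *table) {
	if old.localDepth == d.globalDepth {
		grown := make([]*table, len(d.entries)*2)
		for i, t := range d.entries {
			grown[2*i] = t
			grown[2*i+1] = t
			if t.index == i {
				t.index = 2 * i
			}
		}
		d.globalDepth++
		d.entries = grown
	}
	left.index = old.index
	d.replaceTable(left)
	right.index = left.index + 1<<(d.globalDepth-left.localDepth)
	d.replaceTable(right)
}

func (d *directory) String() string {
	names := make([]string, len(d.entries))
	for i, t := range d.entries {
		names[i] = t.name
	}
	return strings.Join(names, " ")
}

// demo splits table A (which forces the directory to double) and then B.
func demo() (afterFirst, afterSecond string) {
	a := &table{name: "A", localDepth: 1, index: 0}
	b := &table{name: "B", localDepth: 1, index: 1}
	d := &directory{entries: []*table{a, b}, globalDepth: 1}
	d.installTableSplit(a, &table{name: "A0", localDepth: 2}, &table{name: "A1", localDepth: 2})
	afterFirst = d.String()
	d.installTableSplit(b, &table{name: "B0", localDepth: 2}, &table{name: "B1", localDepth: 2})
	return afterFirst, d.String()
}

func main() {
	first, second := demo()
	fmt.Println(first)  // A0 A1 B B
	fmt.Println(second) // A0 A1 B0 B1
}
```

After the first split, B is referenced by two directory entries even though it has not changed, so splitting B later does not need to grow the directory again.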

2.5 Iteration

//https://github.com/golang/go/blob/go1.24.3/src/runtime/map_swiss.go
func mapIterStart(t *abi.SwissMapType, m *maps.Map, it *maps.Iter) {
	it.Init(t, m)
	it.Next()
}
// mapIterNext performs the next step of iteration. Afterwards, the next key/elem are in it.Key()/it.Elem().
func mapIterNext(it *maps.Iter) {
	it.Next()
}
//https://github.com/golang/go/blob/master/src/internal/runtime/maps/table.go
type Iter struct {
	key  unsafe.Pointer // current key; nil means iteration has finished (see cmd/compile/internal/walk/range.go).
	elem unsafe.Pointer // current value (see cmd/compile/internal/walk/range.go).
	typ  *abi.SwissMapType // type information of the swiss map
	m    *Map // the map being iterated

	// Randomize iteration order by starting iteration at a random slot offset. 
	entryOffset uint64 // random offset applied to the group array and to the slots inside a group
	dirOffset   uint64 // random offset applied when walking the table directory
	// Snapshot of Map.clearSeq at iteration initialization time. Used to detect clear during iteration.
	clearSeq uint64 // value of Map.clearSeq when the Iter was initialized; a different value means Clear() was called during iteration, which ends the iteration
	// Value of Map.globalDepth during the last call to Next. Used to detect directory grow during iteration.
	globalDepth uint8 // value of Map.globalDepth at the last call to Next; detects directory growth, after which the current position must be adjusted
	// dirIdx is the current directory index, prior to adjustment by dirOffset.
	dirIdx int // directory index of the table being iterated; combined with dirOffset to select the table
	// tab is the table at dirIdx during the previous call to Next.
	tab *table // the table being iterated
	// group is the group at entryIdx during the previous call to Next.
	group groupReference // the group being iterated
	// entryIdx is the current entry index, prior to adjustment by entryOffset.
	// The lower 3 bits of the index are the slot index, and the upper bits
	// are the group index.
	entryIdx uint64 // position of the slot being iterated
}

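As the struct shows, the Go swiss map picks random offsets for the table, group and slot when iteration starts, so the order of the key/elem pairs differs from one traversal to the next.
For the same reason, a pair inserted while an iteration is in progress may or may not be visited by that iteration.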
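
Both effects can be observed from ordinary Go code. The sketch below uses only the built-in map; the helper names are made up, and the printed orders and the visited count vary from run to run.

```go
package main

import "fmt"

// keysInOrder returns the keys of m in the order one range loop yields them.
func keysInOrder(m map[int]int) []int {
	keys := make([]int, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	return keys
}

// insertWhileRanging adds key+100 for every original key it visits and
// reports how many entries the loop produced in total.
func insertWhileRanging(m map[int]int) (visited int) {
	for k := range m {
		visited++
		if k < 100 { // only original keys trigger an insert
			m[k+100] = k
		}
	}
	return visited
}

func main() {
	m := map[int]int{1: 1, 2: 2, 3: 3, 4: 4}
	for round := 0; round < 3; round++ {
		fmt.Println("round", round, keysInOrder(m)) // order may differ per round
	}
	visited := insertWhileRanging(m)
	// All 4 original keys are visited; each of the 4 new keys may or may not be.
	fmt.Println("visited:", visited, "final len:", len(m))
}
```

The final length is always 8, while the visited count can be anywhere from 4 to 8, so code must never rely on entries added during a range loop being seen by it.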

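3. Clever bit operations

The lookup section of Chapter 2 singled out the following code as the essence of the swiss map's performance. Let's look at it more closely.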

match := g.ctrls().matchH2(h2(hash))
for match != 0 {
	i := match.first()
	// ... omitted
	match = match.removeFirst()
}

Below, the matchH2 function is extracted into a standalone test:

package swisstable

import (
	"fmt"
	"testing"
)

const (
	bitsetLSB  = 0x0101010101010101
	bitsetMSB  = 0x8080808080808080
	deBruijn64 = 0x03f79d71b4ca8b09

	ctrlEmpty   ctrl = 0b10000000
	ctrlDeleted ctrl = 0b11111110

	bitsetEmpty   = bitsetLSB * uint64(ctrlEmpty)
	bitsetDeleted = bitsetLSB * uint64(ctrlDeleted)
)

var deBruijn64tab = [64]byte{
	0, 1, 56, 2, 57, 49, 28, 3, 61, 58, 42, 50, 38, 29, 17, 4,
	62, 47, 59, 36, 45, 43, 51, 22, 53, 39, 33, 30, 24, 18, 12, 5,
	63, 55, 48, 27, 60, 41, 37, 16, 46, 35, 44, 21, 52, 32, 23, 11,
	54, 26, 40, 15, 34, 20, 31, 10, 25, 14, 19, 9, 13, 8, 7, 6,
}

type ctrl uint8
type ctrlGroup uint64
type bitset uint64
// [g.ctrls().matchH2]
func ctrlGroupMatchH(g ctrlGroup, h uintptr) bitset {
	v := bitsetLSB * uint64(h)
	fmt.Printf("%x\n", v) //303030303030303
	v = uint64(g) ^ v
	oldV := v
	fmt.Printf("%x\n", v) //3030300030201
	v = v - bitsetLSB
	fmt.Printf("%x\n", v) //ff020201ff020100
	v = v &^ oldV
	fmt.Printf("%x\n", v) //ff000000ff000100
	v = v & bitsetMSB
	fmt.Printf("%x\n", v) //8000000080000000
	return bitset(v)
}
func trailingZeros64(x uint64) int { // number of consecutive zero bits at the low end of x, e.g. 31 for 0x8000000080000000
	if x == 0 {
		return 64
	}
	return int(deBruijn64tab[(x&-x)*deBruijn64>>(64-6)])
}
// [match.first()]
func bitsetFirst(b bitset) uintptr {
	return uintptr(trailingZeros64(uint64(b))) >> 3
}
// [match.removeFirst()]
func removeFirst(b bitset) bitset {
	return b & (b - 1)
}
func TestSwissMap(t *testing.T) {
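	// Suppose a key has H2 = 0x03 and the group located via H1 has control bytes 0x0300000003000102.
	// The lowest byte belongs to the first slot, so the first slot's control byte is 0x02, and so on.
	// The 4th and 8th slots are therefore the candidates.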
	v := ctrlGroupMatchH(ctrlGroup(0x0300000003000102), 0x03)
	t.Logf("%x", v)              // 8000000080000000
	t.Logf("%v", bitsetFirst(v)) // 3: the 4th slot
	v = removeFirst(v)
	t.Logf("%x", v)              // 8000000000000000
	t.Logf("%v", bitsetFirst(v)) // 7: the 8th slot
}

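The code above shows how the swiss map uses bit operations to directly pick out the slots of a group whose control byte equals H2; slots that do not match never need to be touched.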
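
Two further details of these bit tricks deserve a small experiment. First, the subtraction in matchH2 borrows across bytes, so a control byte that differs from H2 only in its lowest bit can be reported as a match when it sits right above a true match; this is harmless because the key comparison that follows rejects it. Second, empty slots can be found with a similar trick. The matchEmpty below is written from my reading of group.go (a slot is empty when bit 7 is set and bit 1 is clear); treat it as an assumption. The control word is made up.

```go
package main

import (
	"fmt"
	"math/bits"
)

const (
	bitsetLSB = 0x0101010101010101
	bitsetMSB = 0x8080808080808080
)

// matchH2 is the portable bit trick shown in the article.
func matchH2(g uint64, h2 uint64) uint64 {
	v := g ^ (bitsetLSB * h2)
	return ((v - bitsetLSB) &^ v) & bitsetMSB
}

// matchEmpty marks slots whose control byte is 0b10000000:
// bit 7 set (not full) and bit 1 clear (not deleted).
func matchEmpty(g uint64) uint64 {
	return (g &^ (g << 6)) & bitsetMSB
}

// slots converts a byte-per-slot bitset into a list of slot indexes.
func slots(b uint64) []int {
	var out []int
	for ; b != 0; b &= b - 1 {
		out = append(out, bits.TrailingZeros64(b)>>3)
	}
	return out
}

func main() {
	// Slot 0: H2=0x03, slot 1: H2=0x02, slot 2: empty, slot 3: deleted,
	// slots 4..7: empty.
	g := uint64(0x80808080fe800203)
	fmt.Println(slots(matchH2(g, 0x03))) // [0 1]: slot 1 is a false positive
	fmt.Println(slots(matchEmpty(g)))    // [2 4 5 6 7]
}
```

Slot 1 holds 0x02, not 0x03, yet it is reported: the XOR leaves 0x01 in that byte and the borrow from the zero byte below turns it into a match. Correctness is preserved because every candidate still goes through typ.Key.Equal.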

4. SIMD optimization

Environment: CentOS Linux 7, AMD x86_64 CPU, Go 1.24

package main
func main() {
	swissTable()
}
func swissTable() {
	m := make(map[int]int, 100) // a hint of 100 ensures m.dirPtr is a table directory from the start
	m[1] = 1 // this insert reaches match := g.ctrls().matchH2(h2(hash))
	m[2] = 2
}

With dlv we can trace down to the assembly instructions of g.ctrls().matchH2:

[root@kdzl gofunc]# dlv debug  map.go
Type 'help' for list of commands.
(dlv)  b runtime_fast64_swiss.go:252
Breakpoint 1 set at 0x4067d6 for runtime.mapassign_fast64() /usr/local/go/src/internal/runtime/maps/runtime_fast64_swiss.go:252
(dlv) c
> [Breakpoint 1] runtime.mapassign_fast64() /usr/local/go/src/internal/runtime/maps/runtime_fast64_swiss.go:252 (hits goroutine(1):1 total:1) (PC: 0x4067d6)
Warning: debugging optimized function
   247:                 var firstDeletedGroup groupReference
   248:                 var firstDeletedSlot uintptr
   249:
   250:                 for ; ; seq = seq.next() {
   251:                         g := t.groups.group(typ, seq.offset)
=> 252:                         match := g.ctrls().matchH2(h2(hash))
   253:
   254:                         // Look for an existing slot containing this key.
   255:                         for match != 0 {
   256:                                 i := match.first()
   257:
(dlv) si
> runtime.mapassign_fast64() /usr/local/go/src/internal/runtime/maps/runtime_fast64_swiss.go:252 (PC: 0x4067de)
Warning: debugging optimized function
        runtime_fast64_swiss.go:251     0x4067be        4889d9                  mov rcx, rbx
        runtime_fast64_swiss.go:251     0x4067c1        488b9c24b0000000        mov rbx, qword ptr [rsp+0xb0]
        runtime_fast64_swiss.go:251     0x4067c9        e892d2ffff              call $internal/runtime/maps.(*groupsReference).group
        runtime_fast64_swiss.go:251     0x4067ce        4889842480000000        mov qword ptr [rsp+0x80], rax
        runtime_fast64_swiss.go:252     0x4067d6*       488d842480000000        lea rax, ptr [rsp+0x80]
=>      runtime_fast64_swiss.go:252     0x4067de        6690                    data16 nop
        runtime_fast64_swiss.go:252     0x4067e0        e8bbd1ffff              call $internal/runtime/maps.(*groupReference).ctrls
        runtime_fast64_swiss.go:252     0x4067e5        4889842498000000        mov qword ptr [rsp+0x98], rax
        runtime_fast64_swiss.go:252     0x4067ed        488b442458              mov rax, qword ptr [rsp+0x58]
        runtime_fast64_swiss.go:252     0x4067f2        e8a9d2ffff              call $internal/runtime/maps.h2
        runtime_fast64_swiss.go:252     0x4067f7        488b942498000000        mov rdx, qword ptr [rsp+0x98]
(dlv) ni
 #... // omitted
 # keep pressing ni until runtime_fast64_swiss.go:252 0x406808 e893d0ffff call $internal/runtime/maps.ctrlGroup.matchH2
(dlv) si
> internal/runtime/maps.ctrlGroup.matchH2() /usr/local/go/src/internal/runtime/maps/group.go:148 (PC: 0x4038a0)
Warning: debugging optimized function
TEXT internal/runtime/maps.ctrlGroup.matchH2(SB) /usr/local/go/src/internal/runtime/maps/group.go
=>      group.go:148    0x4038a0        66480f6ec3      movq xmm0, rbx
        group.go:148    0x4038a5        660f60c0        punpcklbw xmm0, xmm0
        group.go:148    0x4038a9        f20f70c000      pshuflw xmm0, xmm0, 0x0
        group.go:148    0x4038ae        66480f6ec8      movq xmm1, rax
        group.go:148    0x4038b3        660f74c1        pcmpeqb xmm0, xmm1
        group.go:148    0x4038b7        660fd7c8        pmovmskb ecx, xmm0

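So on x86-64, the g.ctrls().matchH2 function is replaced at compile time by:

# RBX holds H2 of the current key's hash (say 0x03); RAX holds the group's control bytes (say 0x0300000003000102)
group.go:148    0x4038a0        66480f6ec3      movq xmm0, rbx # load H2 (only 7 bits) into the low 64 bits of xmm0 and zero the high 64 bits [0x00...0000000000000003]
group.go:148    0x4038a5        660f60c0        punpcklbw xmm0, xmm0 # interleave the low bytes of xmm0 with themselves; the low 64 bits become [0x0000000000000303]
group.go:148    0x4038a9        f20f70c000      pshuflw xmm0, xmm0, 0x0 # broadcast the lowest 16-bit word across the low 64 bits, which become [0x0303030303030303]
group.go:148    0x4038ae        66480f6ec8      movq xmm1, rax # load the control bytes into the low 64 bits of xmm1
group.go:148    0x4038b3        660f74c1        pcmpeqb xmm0, xmm1 # compare xmm0 and xmm1 byte by byte; equal bytes become 0xFF and the others 0x00, so the low 64 bits of xmm0 are [0xFF000000FF000000], marking the same slots as before
group.go:148    0x4038b7        660fd7c8        pmovmskb ecx, xmm0 # gather the top bit of every byte into an integer mask in ecx; for the 8 control bytes this is 0b10001000, one bit per slot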

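All of these are x86 SSE instructions, SSE being the concrete SIMD implementation on x86. The result identifies the same slots as the ctrlGroupMatchH function, although in a packed form with one bit per slot rather than one byte per slot.
A comment in the Go source also states that on AMD64 the function is replaced by SIMD instructions.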

func (g ctrlGroup) matchH2(h uintptr) bitset {
	return ctrlGroupMatchH2(g, h)
}
// Portable implementation of matchH2.
// Note: On AMD64, this is an intrinsic implemented with SIMD instructions. See
// note on bitset about the packed instrinsified return value.
func ctrlGroupMatchH2(g ctrlGroup, h uintptr) bitset {
	v := uint64(g) ^ (bitsetLSB * uint64(h))
	return bitset(((v - bitsetLSB) &^ v) & bitsetMSB)
}
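
The "packed" return value mentioned in that comment can be emulated in plain Go. The sketch below is not the compiler intrinsic; it is a scalar model of what pcmpeqb followed by pmovmskb computes for the 8 control bytes, with hypothetical function names.

```go
package main

import (
	"fmt"
	"math/bits"
)

// packedMatchH2 models PCMPEQB + PMOVMSKB on the 8 control bytes:
// bit i of the result is set when control byte i equals h2.
func packedMatchH2(g uint64, h2 uint8) uint8 {
	var mask uint8
	for i := 0; i < 8; i++ {
		ctrlByte := uint8(g >> (8 * i))
		if ctrlByte == h2 {
			mask |= 1 << i
		}
	}
	return mask
}

// packedFirst returns the index of the first matching slot. With one bit
// per slot there is no need for the >>3 used by the byte-per-slot form.
func packedFirst(mask uint8) int {
	return bits.TrailingZeros8(mask)
}

func main() {
	mask := packedMatchH2(0x0300000003000102, 0x03)
	fmt.Printf("%08b\n", mask)     // 10001000
	fmt.Println(packedFirst(mask)) // 3
}
```

Unlike the portable bit trick, an exact byte comparison produces no false positives, and the packed mask still supports the same first/removeFirst loop.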

Without the replacement, tracing the ctrlGroupMatchH2 function with dlv gives the following assembly:

(dlv) disass
TEXT main.ctrlGroupMatchH2(SB) /home/gofunc/map.go # compared with the 6 instructions of the SIMD version, the unoptimized version makes the CPU execute many more instructions, which is slower
        map.go:22       0x471360        55                      push rbp
        map.go:22       0x471361        4889e5                  mov rbp, rsp
=>      map.go:22       0x471364        4883ec10                sub rsp, 0x10
        map.go:22       0x471368        4889442420              mov qword ptr [rsp+0x20], rax
        map.go:22       0x47136d        48895c2428              mov qword ptr [rsp+0x28], rbx
        map.go:22       0x471372        48c7042400000000        mov qword ptr [rsp], 0x0
        map.go:23       0x47137a        48b90101010101010101    mov rcx, 0x101010101010101
        map.go:23       0x471384        480fafcb                imul rcx, rbx
        map.go:23       0x471388        4831c8                  xor rax, rcx
        map.go:23       0x47138b        4889442408              mov qword ptr [rsp+0x8], rax
        map.go:24       0x471390        48b9fffefefefefefefe    mov rcx, 0xfefefefefefefeff
        map.go:24       0x47139a        4801c1                  add rcx, rax
        map.go:24       0x47139d        48f7d0                  not rax
        map.go:24       0x4713a0        4821c8                  and rax, rcx
        map.go:24       0x4713a3        48b98080808080808080    mov rcx, 0x8080808080808080
        map.go:24       0x4713ad        4821c8                  and rax, rcx
        map.go:24       0x4713b0        48890424                mov qword ptr [rsp], rax
        map.go:24       0x4713b4        4883c410                add rsp, 0x10
        map.go:24       0x4713b8        5d                      pop rbp
        map.go:24       0x4713b9        c3                      ret