前言
map是语言中常用的一个数据结构,其在不同语言中有着不同的实现的,现在我们看下在go中的底层实现。
更多内容分享,欢迎关注公众号:Go开发笔记
map源码
map的底层实现的源码位于runtime/map.go中,其相关方法的调用在编译时确认。
map说明
简单来说,map就是一个哈希表。数据被排列成一个bucket数组。每个bucket最多包含8个key/elem对。哈希的低位用于选择一个bucket。每个bucket包含每个hash的一些高位,以区分单个bucket中的条目。
map的数据存储结构
map实际上就是hash表,其包含2B个bucket,每个bucket最多包含8个键值对。bucket内数据的存储格式为:存储bucket内key对应hash的topHash数组,然后是键值对。这些键值对的存储方式是:8个key然后才是key对应的值。key对的hash的低位表示对应的bucket,高位表示bucket内具体的位置。当前map的bucket数位2B,一个key对应的hash为hash,则其对应bucket的位置为hash&(2^B-1)。
- 每个key对应的位置为:bucket的位置+bmp的大小+keySize*key的位置index。
- 每个key的value对应的位置为:bucket的位置+bmp的大小+keySize*8+elemSize*elem的位置index
bmp中存储了长度为8的数组,用以存储key的hash的高位。因bmp是数组,查找需要依次遍历确认与指定的hash的高位是否一致,如果一致,会取出key进行比较,避免hash冲突。
扩容中,hash的seed并没有发生变化,因此key的hash也不会改变。但扩容可能会导致B变化,由于hash的低位与B才能确定所在的bucket,如果B变化,则对应bucket的位置也会发生变化。因此扩容后,需要复制数据到新的位置中。
其具体结构为hmap。
hmap
// A header for a Go map.
type hmap struct {
// Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
// Make sure this stays in sync with the compiler's definition.
count int // # live cells == size of map. Must be first (used by len() builtin)
flags uint8
B uint8 // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
hash0 uint32 // hash seed
buckets unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
nevacuate uintptr // progress counter for evacuation (buckets less than this have been evacuated)
extra *mapextra // optional fields
}
// mapextra holds fields that are not present on all maps.
type mapextra struct {
// If both key and elem do not contain pointers and are inline, then we mark bucket
// type as containing no pointers. This avoids scanning such maps.
// However, bmap.overflow is a pointer. In order to keep overflow buckets
// alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
// overflow and oldoverflow are only used if key and elem do not contain pointers.
// overflow contains overflow buckets for hmap.buckets.
// oldoverflow contains overflow buckets for hmap.oldbuckets.
// The indirection allows to store a pointer to the slice in hiter.
overflow *[]*bmap
oldoverflow *[]*bmap
// nextOverflow holds a pointer to a free overflow bucket.
nextOverflow *bmap
}
// A bucket for a Go map.
type bmap struct {
// tophash generally contains the top byte of the hash value
// for each key in this bucket. If tophash[0] < minTopHash,
// tophash[0] is a bucket evacuation state instead.
tophash [bucketCnt]uint8
// Followed by bucketCnt keys and then bucketCnt elems.
// NOTE: packing all the keys together and then all the elems together makes the
// code a bit more complicated than alternating key/elem/key/elem/... but it allows
// us to eliminate padding which would be needed for, e.g., map[int64]int8.
// Followed by an overflow pointer.
}
// A hash iteration structure.
// If you modify hiter, also change cmd/compile/internal/gc/reflect.go to indicate
// the layout of this structure.
type hiter struct {
key unsafe.Pointer // Must be in first position. Write nil to indicate iteration end (see cmd/internal/gc/range.go).
elem unsafe.Pointer // Must be in second position (see cmd/internal/gc/range.go).
t *maptype
h *hmap
buckets unsafe.Pointer // bucket ptr at hash_iter initialization time
bptr *bmap // current bucket
overflow *[]*bmap // keeps overflow buckets of hmap.buckets alive
oldoverflow *[]*bmap // keeps overflow buckets of hmap.oldbuckets alive
startBucket uintptr // bucket iteration started at
offset uint8 // intra-bucket offset to start from during iteration (should be big enough to hold bucketCnt-1)
wrapped bool // already wrapped around from end of bucket array to beginning
B uint8
i uint8
bucket uintptr
checkBucket uintptr
}
hmap中的fields代表的意思依次是:
type hmap struct {
count int // map的size
flags uint8 // 状态标志位
B uint8 // log2(buckets) (可以容纳最多loadFactor * 2^B项)
noverflow uint16 // overflow buckets的大约数
hash0 uint32 // hash seed
buckets unsafe.Pointer // 2^B Buckets的数组.如果count为0可能为nil
oldbuckets unsafe.Pointer // 前一个存储桶数组大小的一半,仅在增长时为非零
nevacuate uintptr // 排空进度计数器(小于此值的buckets已排空)
extra *mapextra // 可选的 fields
}
make
根据初始化map的size及平台,具体的初始化又分为以下几种:
makemap_small
适用于size小于等于8时使用。代码逻辑很简单,就是通过new一个hamp,然后初始化hash0.
// makemap_small implements Go map creation for make(map[k]v) and
// make(map[k]v, hint) when hint is known to be at most bucketCnt
// at compile time and the map needs to be allocated on the heap.
func makemap_small() *hmap {
h := new(hmap)
h.hash0 = fastrand()
return h
}
makemap64
默认使用,当32位使用时,size超出int32位范围时,会重置size为0,然后调用makemap。
func makemap64(t *maptype, hint int64, h *hmap) *hmap {
if int64(int(hint)) != hint {
hint = 0
}
return makemap(t, int(hint), h)
}
makemap
32位平台时用以代替makemap64,更快。
func makemap(t *maptype, hint int, h *hmap) *hmap {
mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
if overflow || mem > maxAlloc {
hint = 0
}
// initialize Hmap
if h == nil {
h = new(hmap)
}
h.hash0 = fastrand()
// Find the size parameter B which will hold the requested # of elements.
// For hint < 0 overLoadFactor returns false since hint < bucketCnt.
B := uint8(0)
for overLoadFactor(hint, B) {
B++
}
h.B = B
// allocate initial hash table
// if B == 0, the buckets field is allocated lazily later (in mapassign)
// If hint is large zeroing this memory could take a while.
if h.B != 0 {
var nextOverflow *bmap
h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
if nextOverflow != nil {
h.extra = new(mapextra)
h.extra.nextOverflow = nextOverflow
}
}
return h
}
makemap的处理步骤如下:
- 计算大小为size的map需要占用的内存及是否溢出
- 若溢出则重置大小为0
- 若h为nil,则new一个hmap
- 初始化hashseed
- 根据size及LoadFactor计算B,并赋值到h.B
- 若h.B不为0,构造bucket array,赋值到h.B上,若存在提前分配的overflow buckets,则存入h.Extra
- 返回h
需要注意的是:
当B>=4时,会预先创建部分overflow buckets,用以存储当buckets已满的数据,同时也减少了扩容的频率。
具体源码见makeBucketArray
。
func makeBucketArray(t *maptype, b uint8, dirtyalloc unsafe.Pointer) (buckets unsafe.Pointer, nextOverflow *bmap) {
base := bucketShift(b)
nbuckets := base
// For small b, overflow buckets are unlikely.
// Avoid the overhead of the calculation.
if b >= 4 {
// Add on the estimated number of overflow buckets
// required to insert the median number of elements
// used with this value of b.
nbuckets += bucketShift(b - 4)
sz := t.bucket.size * nbuckets
up := roundupsize(sz)
if up != sz {
nbuckets = up / t.bucket.size
}
}
if dirtyalloc == nil {
buckets = newarray(t.bucket, int(nbuckets))
} else {
// dirtyalloc was previously generated by
// the above newarray(t.bucket, int(nbuckets))
// but may not be empty.
buckets = dirtyalloc
size := t.bucket.size * nbuckets
if t.bucket.ptrdata != 0 {
memclrHasPointers(buckets, size)
} else {
memclrNoHeapPointers(buckets, size)
}
}
if base != nbuckets {
// We preallocated some overflow buckets.
// To keep the overhead of tracking these overflow buckets to a minimum,
// we use the convention that if a preallocated overflow bucket's overflow
// pointer is nil, then there are more available by bumping the pointer.
// We need a safe non-nil pointer for the last overflow bucket; just use buckets.
nextOverflow = (*bmap)(add(buckets, base*uintptr(t.bucketsize)))
last := (*bmap)(add(buckets, (nbuckets-1)*uintptr(t.bucketsize)))
last.setoverflow(t, (*bmap)(buckets))
}
return buckets, nextOverflow
}
assign
assign对应的使用为:
m[key] = elem
具体对应的实际是m[key]
部分,返回的是key的地址。
// Like mapaccess, but allocates a slot for the key if it is not present in the map.
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
if h == nil {
panic(plainError("assignment to entry in nil map"))
}
if raceenabled {
callerpc := getcallerpc()
pc := funcPC(mapassign)
racewritepc(unsafe.Pointer(h), callerpc, pc)
raceReadObjectPC(t.key, key, callerpc, pc)
}
if msanenabled {
msanread(key, t.key.size)
}
if h.flags&hashWriting != 0 {
throw("concurrent map writes")
}
hash := t.hasher(key, uintptr(h.hash0))
// Set hashWriting after calling t.hasher, since t.hasher may panic,
// in which case we have not actually done a write.
h.flags ^= hashWriting
if h.buckets == nil {
h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
}
again:
bucket := hash & bucketMask(h.B)
if h.growing() {
growWork(t, h, bucket)
}
b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + bucket*uintptr(t.bucketsize)))
top := tophash(hash)
var inserti *uint8
var insertk unsafe.Pointer
var elem unsafe.Pointer
bucketloop:
for {
for i := uintptr(0); i < bucketCnt; i++ {
if b.tophash[i] != top {
if isEmpty(b.tophash[i]) && inserti == nil {
inserti = &b.tophash[i]
insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
}
if b.tophash[i] == emptyRest {
break bucketloop
}
continue
}
k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
if t.indirectkey() {
k = *((*unsafe.Pointer)(k))
}
if !t.key.equal(key, k) {
continue
}
// already have a mapping for key. Update it.
if t.needkeyupdate() {
typedmemmove(t.key, k, key)
}
elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
goto done
}
ovf := b.overflow(t)
if ovf == nil {
break
}
b = ovf
}
// Did not find mapping for key. Allocate new cell & add entry.
// If we hit the max load factor or we have too many overflow buckets,
// and we're not already in the middle of growing, start growing.
if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
hashGrow(t, h)
goto again // Growing the table invalidates everything, so try again
}
if inserti == nil {
// all current buckets are full, allocate a new one.
newb := h.newoverflow(t, b)
inserti = &newb.tophash[0]
insertk = add(unsafe.Pointer(newb), dataOffset)
elem = add(insertk, bucketCnt*uintptr(t.keysize))
}
// store new key/elem at insert position
if t.indirectkey() {
kmem := newobject(t.key)
*(*unsafe.Pointer)(insertk) = kmem
insertk = kmem
}
if t.indirectelem() {
vmem := newobject(t.elem)
*(*unsafe.Pointer)(elem) = vmem
}
typedmemmove(t.key, insertk, key)
*inserti = top
h.count++
done:
if h.flags&hashWriting == 0 {
throw("concurrent map writes")
}
h.flags &^= hashWriting
if t.indirectelem() {
elem = *((*unsafe.Pointer)(elem))
}
return elem
}
mapassign的处理步骤如下:
- 若h为nil则panic
- 若有其他协程在写入map,则panic相关错误
- 对key进行hash
- 设置flags为Writing
- 若h.buckets为nil,则重新分配
- 进入again处理
- 根据hash获取对应的bucket
- 若h在扩容,则进行扩容工作
- 获取bucket对应的*bmap b及hash的高位top
- 进入bucketloop
- 遍历bucket
- 若b的当前topHash不为top
(1)若当前tophash的状态为empty且inserti为nil,则当前tophash的地址赋值给inserti,并获取key及element的地址
(2)若topHash的状态为emptyReset则跳出bucketloop
(3)继续遍历bucket - 若b的当前topHash不为top,说明已找到匹配的hash。获取key确认与存入的key是否一致(是否hash冲突),不一致则继续遍历bucket。一致,则确实是否更新,需要更新,则更新对应的key。
- 根据key的地址获取element的地址,跳转done
- 获取b的overflow buckets,若为nil,则跳出bucketloop;否则将overflow赋值给b,继续bucketloop
- 如果map没在扩容,新增数据后已超过负载因子或拥有太多的overflow buckets,则进行扩容处理;扩容后进入again
- 如果inseti为nil,说明所有的buckets已经满了,创建新的overflow,存入
- 如果key类型t并非指针类型,则获取其指针
- 如果存储的值类型非指针,获取其指针
- 将key移动至insertK
- 更新inserti指向的值为top
- 进入done
- 如果有其它goroutine在写入,则panic
- 如果存储的类型为值类型,则获取其指向的值地址
- 返回elem
简单总结
简单来说,存入数据需要经过以下几步:
- 计算hash
- 根据hash低位从buckets中确认存入bucket的位置
- 根据hash高位确认bucket内的位置
- 确认key是否存在,若存在则获取数据存入地址
- 否则获取overflow buckets,继续步骤1
- 若需要扩容,则进行扩容,扩容后,继续步骤1
- 如果buckets及overflow buckets均已满,则新建overflow bucket,获取key、elem的地址
- 存入数据
正常存入值的顺序:
- buckets
- overflow buckets
- 扩容后存入buckets/overflow buckets或者创建overflow buckets后存入
assign时的扩容
若map中的数据达到负载因子或者overflow的数量过多时,会发生扩容。
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
...
// If we hit the max load factor or we have too many overflow buckets,
// and we're not already in the middle of growing, start growing.
if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
hashGrow(t, h)
goto again // Growing the table invalidates everything, so try again
}
...
}
// overLoadFactor reports whether count items placed in 1<<B buckets is over loadFactor.
func overLoadFactor(count int, B uint8) bool {
return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}
// tooManyOverflowBuckets reports whether noverflow buckets is too many for a map with 1<<B buckets.
// Note that most of these overflow buckets must be in sparse use;
// if use was dense, then we'd have already triggered regular map growth.
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
// If the threshold is too low, we do extraneous work.
// If the threshold is too high, maps that grow and shrink can hold on to lots of unused memory.
// "too many" means (approximately) as many overflow buckets as regular buckets.
// See incrnoverflow for more details.
if B > 15 {
B = 15
}
// The compiler doesn't see here that B < 16; mask B to generate shorter shift code.
return noverflow >= uint16(1)<<(B&15)
}
具体来说需要满足以下两个条件中的一个:
- map中的数据数量count>8 且 count>13*(2^b/2)
- overflow的数量noverflow需要不小于2B,即对于一个map来说,noverflow最大为2B -1,当B足够大时,noverflow最大为2^15 -1
若不满足,且当前的buckets已经满了(buckets、overflow均已经满了)时,则创建新的overflow,用以存储数据。
mapaccess
mapaccess的过程与mapassign的过程类似,区别在于存入/读出。
mapaccess有对应的使用有以下两种形式,
v := m[key] // 1
v,ok := m[key] // 2
第二种相对第一种多了一个bool的参数,表示是否包含此key,此处主要看方式2的源码。
func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool) {
if raceenabled && h != nil {
callerpc := getcallerpc()
pc := funcPC(mapaccess2)
racereadpc(unsafe.Pointer(h), callerpc, pc)
raceReadObjectPC(t.key, key, callerpc, pc)
}
if msanenabled && h != nil {
msanread(key, t.key.size)
}
if h == nil || h.count == 0 {
if t.hashMightPanic() {
t.hasher(key, 0) // see issue 23734
}
return unsafe.Pointer(&zeroVal[0]), false
}
if h.flags&hashWriting != 0 {
throw("concurrent map read and map write")
}
hash := t.hasher(key, uintptr(h.hash0))
m := bucketMask(h.B)
b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + (hash&m)*uintptr(t.bucketsize)))
if c := h.oldbuckets; c != nil {
if !h.sameSizeGrow() {
// There used to be half as many buckets; mask down one more power of two.
m >>= 1
}
oldb := (*bmap)(unsafe.Pointer(uintptr(c) + (hash&m)*uintptr(t.bucketsize)))
if !evacuated(oldb) {
b = oldb
}
}
top := tophash(hash)
bucketloop:
for ; b != nil; b = b.overflow(t) {
for i := uintptr(0); i < bucketCnt; i++ {
if b.tophash[i] != top {
if b.tophash[i] == emptyRest {
break bucketloop
}
continue
}
k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
if t.indirectkey() {
k = *((*unsafe.Pointer)(k))
}
if t.key.equal(key, k) {
e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
if t.indirectelem() {
e = *((*unsafe.Pointer)(e))
}
return e, true
}
}
}
return unsafe.Pointer(&zeroVal[0]), false
}
mapaccess2的处理步骤如下:
- 当h为nil或map的大小为0时,说明当前map为空,返回零值及不存在。
- 如果当前map被其他goroutine写入,则panic
- 计算当前key的hash,获取bucket的数量,计算hash对应的buckets的bmp b
- 如果有旧的bucket(扩容中)
(1)如果是同样大小的扩容,则需要将m缩小一半(扩容时增长了一倍)
(2)查找对应位置的bmap为oldb
(3)如果没有oldb没有被清空,则赋值给b - 获取hash的高位top
- 如果b!=nil,进入bucketloop
- 依次每个bucke对应hash的高位
- 如果与top不等,且后续没有bucket,说明key不存在,退出循环
- 如果与top相等,说明key可能在此bucket中,获取对应key,比较是否相等。相等则返回对应的值及存在标识。
简单来说:
- 计算key的hash,查找到对应的b
- 如果在扩容中,重新获取oldb为b
- 获取hash的top,在b(包含overflow)中循环查找k是否存在,存在则返回对应值,否则返回零值
mapdelete
func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
if raceenabled && h != nil {
callerpc := getcallerpc()
pc := funcPC(mapdelete)
racewritepc(unsafe.Pointer(h), callerpc, pc)
raceReadObjectPC(t.key, key, callerpc, pc)
}
if msanenabled && h != nil {
msanread(key, t.key.size)
}
if h == nil || h.count == 0 {
if t.hashMightPanic() {
t.hasher(key, 0) // see issue 23734
}
return
}
if h.flags&hashWriting != 0 {
throw("concurrent map writes")
}
hash := t.hasher(key, uintptr(h.hash0))
// Set hashWriting after calling t.hasher, since t.hasher may panic,
// in which case we have not actually done a write (delete).
h.flags ^= hashWriting
bucket := hash & bucketMask(h.B)
if h.growing() {
growWork(t, h, bucket)
}
b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
bOrig := b
top := tophash(hash)
search:
for ; b != nil; b = b.overflow(t) {
for i := uintptr(0); i < bucketCnt; i++ {
if b.tophash[i] != top {
if b.tophash[i] == emptyRest {
break search
}
continue
}
k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
k2 := k
if t.indirectkey() {
k2 = *((*unsafe.Pointer)(k2))
}
if !t.key.equal(key, k2) {
continue
}
// Only clear key if there are pointers in it.
if t.indirectkey() {
*(*unsafe.Pointer)(k) = nil
} else if t.key.ptrdata != 0 {
memclrHasPointers(k, t.key.size)
}
e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
if t.indirectelem() {
*(*unsafe.Pointer)(e) = nil
} else if t.elem.ptrdata != 0 {
memclrHasPointers(e, t.elem.size)
} else {
memclrNoHeapPointers(e, t.elem.size)
}
b.tophash[i] = emptyOne
// If the bucket now ends in a bunch of emptyOne states,
// change those to emptyRest states.
// It would be nice to make this a separate function, but
// for loops are not currently inlineable.
if i == bucketCnt-1 {
if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {
goto notLast
}
} else {
if b.tophash[i+1] != emptyRest {
goto notLast
}
}
for {
b.tophash[i] = emptyRest
if i == 0 {
if b == bOrig {
break // beginning of initial bucket, we're done.
}
// Find previous bucket, continue at its last entry.
c := b
for b = bOrig; b.overflow(t) != c; b = b.overflow(t) {
}
i = bucketCnt - 1
} else {
i--
}
if b.tophash[i] != emptyOne {
break
}
}
notLast:
h.count--
break search
}
}
if h.flags&hashWriting == 0 {
throw("concurrent map writes")
}
h.flags &^= hashWriting
}
- 对于nil或者大小为0的map的进行delete操作,进行可能会panic的处理,然后返回
- 如果当前map被其他goroutine写入,则panic
- 计算当前key的hash,获取bucket的数量,计算hash对应的bucket
- 如果正在扩容,则继续进行扩容处理
- 获取bucket对应的bmp
- 获取hash的高位top
- 进入search
- 如果b!=nil,依次查找bucke内对应hash的高位
(1)若没有查找到,若没有后续bucket时,退出search,否则继续查找bucket内的下一个数据
(2)若查找到,获取对应的key,比较两者是否一致,若不一致,继续查找
(3)若一致,将key及elem指向nil或清空数据
(3)将当前tophash对应的位置的状态设置为emptyOne
(4)继续判断后续位置的状态,如果bucket现在以一堆emptyOne的状态结束,那么将这些状态更改为emptyReset的状态.
(5) 数据count减一,退出search - 如果当前map被其他goroutine写入,则panic
简单来说,就是在查找后进行k/e的清除处理,然后进行bucket的状态标记,最后count减一。
总结
本文从map的数据结构、make、assign、access、delete的对应源码分析了处理过程。
- map实际上就是hash表,底层存储结构采用的是数组。
- map会在创建时(满足条件)预创建部分overflow buckets,方便数据的存入,减少扩容的频率。
- map数据的读写顺序: buckets->overflow buckets,若需要扩容,则扩容后再读写
- map在size超过负载因子或者overflow buckets数量过多时会扩容
- 并发读写map,会发生panic(当一goroutine正在写入时,若其他goroutine的读写操作会发生panic)
本文大略提到了map扩容的条件,具体的扩容处理在稍后的文章中继续深入。
公众号
鄙人刚刚开通了公众号,专注于分享Go开发相关内容,望大家感兴趣的支持一下,在此特别感谢。