Understanding Go in Depth: Map Structure Internals and Source Code Analysis

Latest recommended article published on 2023-03-10 00:47:56

souy_c

Categories: go, data structures. Tags: golang, data structures

Article link: https://blog.csdn.net/cyq6239075/article/details/106047992

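Maps are one of the most frequently used data structures in Go projects. They are simple to use and well suited to cases where the amount of data is modest. But how is a map actually implemented? We start from a test program. Since we want to analyze creation, insertion, lookup, iteration and deletion, the test program exercises each of these operations: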

// test.go
package main

import (
	"fmt"
)

func main() {
	testmap()
}

func testmap() {
	fmt.Printf("testmap start \n")
	var test1 map[int64]int
	test1 = make(map[int64]int, 10)
	test1[3] = 3
	v, ok := test1[3]
	fmt.Printf("test1 kv: %v %v\n", v, ok)
	for k, v := range test1 {
		fmt.Printf("test1 kv: %v %v\n", k, v)
	}
	test2 := make(map[int64]int, 500)
	test2[400] = 400
	v2, ok2 := test2[400]
	fmt.Printf("test2 400: %v %v\n", v2, ok2)
	delete(test2, 400)
	fmt.Printf("testmap end \n")
}

Main data structures

const (
    // Maximum number of key/value pairs a bucket can hold.
    bucketCntBits = 3
    bucketCnt     = 1 << bucketCntBits

    // Maximum average load of a bucket that triggers growth is 6.5.
    // Represent as loadFactorNum/loadFactDen, to allow integer math.
    loadFactorNum = 13
    loadFactorDen = 2

    // Maximum key or value size to keep inline (instead of mallocing per element).
    // Must fit in a uint8.
    // Fast versions cannot handle big values - the cutoff size for
    // fast versions in ../../cmd/internal/gc/walk.go must be at most this value.
    maxKeySize   = 128
    maxValueSize = 128

    // data offset should be the size of the bmap struct, but needs to be
    // aligned correctly. For amd64p32 this means 64-bit alignment
    // even though pointers are 32 bit.
    dataOffset = unsafe.Offsetof(struct {
        b bmap
        v int64
    }{}.v)

    // Possible tophash values. We reserve a few possibilities for special marks.
    // Each bucket (including its overflow buckets, if any) will have either all or none of its
    // entries in the evacuated* states (except during the evacuate() method, which only happens
    // during map writes and thus no one else can observe the map during that time).
    empty          = 0 // cell is empty
    evacuatedEmpty = 1 // cell is empty, bucket is evacuated.
    evacuatedX     = 2 // key/value is valid.  Entry has been evacuated to first half of larger table.
    evacuatedY     = 3 // same as above, but evacuated to second half of larger table.
    minTopHash     = 4 // minimum tophash for a normal filled cell.

    // flags
    iterator     = 1 // there may be an iterator using buckets
    oldIterator  = 2 // there may be an iterator using oldbuckets
    hashWriting  = 4 // a goroutine is writing to the map
    sameSizeGrow = 8 // the current map growth is to a new map of the same size

    // sentinel bucket ID for iterator checks
    noCheck = 1<<(8*sys.PtrSize) - 1
)

// A header for a Go map.
type hmap struct {
    // Note: the format of the Hmap is encoded in ../../cmd/internal/gc/reflect.go and
    // ../reflect/type.go. Don't change this structure without also changing that code!
    count     int // # live cells == size of map.  Must be first (used by len() builtin)
    flags     uint8
    B         uint8  // log_2 of the number of buckets: the hash array has 2^B buckets
    noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
    hash0     uint32 // hash seed

    buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
    oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
    nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

    extra *mapextra // optional fields
}

// mapextra holds fields that are not present on all maps.
type mapextra struct {
    overflow    *[]*bmap
    oldoverflow *[]*bmap

    // nextOverflow holds a pointer to a free overflow bucket.
    nextOverflow *bmap
}

// A bucket for a Go map.
type bmap struct {
    tophash [bucketCnt]uint8
}

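Many articles explain the meaning of every field in these structures in detail, so only the key points are added here. Creating a map means creating an hmap; the key/value pairs themselves live in hmap.buckets, and bmap is the layout of a single bucket. Looking at the definition above, bmap contains only a tophash[8] array, so where are the keys and values stored? This works like the hidden-pointer idiom in C: when a bucket is created, the runtime allocates a raw buffer of 144 bytes (for map[int64]int on a 64-bit machine; the value depends on the key/value types and the platform) and casts that buffer to a bmap pointer. The bucket size is defined as follows: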

size := bucketSize*(1+ktyp.size+etyp.size) + overflowPad + ptrSize

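As the formula shows, the keys and values are included in the bucket: 8 tophash bytes, then 8 keys, then 8 values, then the overflow pointer.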

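A lookup first computes the hash of the key, then uses the low B bits of the hash to select a bucket. Within the bucket, the top 8 bits of the hash are compared against each tophash entry; where they match, the full keys are compared, and if those are equal the key has been found. If no slot in the bucket matches, the search follows the overflow pointer to the next bucket and continues there.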
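
The lookup steps can be sketched with a simplified model. This is not the runtime code: the bucket type below uses typed arrays instead of a raw memory area, and the hash value in main is a made-up number chosen for illustration.

```go
package main

import "fmt"

const bucketCnt = 8

// bucket is a simplified stand-in for the runtime's bmap.
type bucket struct {
	tophash  [bucketCnt]uint8
	keys     [bucketCnt]int64
	vals     [bucketCnt]int
	overflow *bucket
}

// topHash returns the top 8 bits of a 64-bit hash, bumped past the reserved
// marker values 0..3 (minTopHash = 4), as the runtime does.
func topHash(hash uint64) uint8 {
	top := uint8(hash >> 56)
	if top < 4 {
		top += 4
	}
	return top
}

// lookup picks the bucket from the low B bits of the hash, then scans the
// bucket and its overflow chain, comparing tophash first and the key second.
func lookup(buckets []bucket, B uint8, hash uint64, key int64) (int, bool) {
	idx := hash & (uint64(1)<<B - 1)
	top := topHash(hash)
	for b := &buckets[idx]; b != nil; b = b.overflow {
		for i := 0; i < bucketCnt; i++ {
			if b.tophash[i] == top && b.keys[i] == key {
				return b.vals[i], true
			}
		}
	}
	return 0, false
}

func main() {
	const B = 1
	buckets := make([]bucket, 1<<B)
	hash := uint64(0xAB00000000000001) // hypothetical hash of key 3
	b := &buckets[hash&(1<<B-1)]
	b.tophash[0], b.keys[0], b.vals[0] = topHash(hash), 3, 3
	fmt.Println(lookup(buckets, B, hash, 3)) // 3 true
}
```

Comparing the one-byte tophash first lets most non-matching slots be rejected without touching the key at all.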

The overall structure diagram makes the relationships among these fields clear:

Looking up a key using the high bits and low bits of the hash:

Back to the test code. Compiling it to assembly makes it easy to see which runtime functions these map operations call. The command is: go tool compile -N -l -S test.go. Part of the output is shown below:

[root@ceph-mon cyq]# go tool compile -N -l -S test.go
"".main STEXT size=48 args=0x0 locals=0x8
	0x0000 00000 (test.go:10)	TEXT	"".main(SB), $8-0
	0x0000 00000 (test.go:10)	MOVQ	(TLS), CX
	0x0009 00009 (test.go:10)	CMPQ	SP, 16(CX)
	0x000d 00013 (test.go:10)	JLS	41
	0x000f 00015 (test.go:10)	SUBQ	$8, SP
	0x0013 00019 (test.go:10)	MOVQ	BP, (SP)
	0x0017 00023 (test.go:10)	LEAQ	(SP), BP
	0x001b 00027 (test.go:10)	FUNCDATA	$0, gclocals•33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x001b 00027 (test.go:10)	FUNCDATA	$1, gclocals•33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x001b 00027 (test.go:11)	PCDATA	$0, $0
	0x001b 00027 (test.go:11)	CALL	"".testmap(SB)
	0x0020 00032 (test.go:12)	MOVQ	(SP), BP
	0x0024 00036 (test.go:12)	ADDQ	$8, SP
	0x0028 00040 (test.go:12)	RET
	0x0029 00041 (test.go:12)	NOP
	0x0029 00041 (test.go:10)	PCDATA	$0, $-1
	0x0029 00041 (test.go:10)	CALL	runtime.morestack_noctxt(SB)
	0x002e 00046 (test.go:10)	JMP	0
	0x0000 64 48 8b 0c 25 00 00 00 00 48 3b 61 10 76 1a 48  dH..%....H;a.v.H
	0x0010 83 ec 08 48 89 2c 24 48 8d 2c 24 e8 00 00 00 00  ...H.,$H.,$.....
	0x0020 48 8b 2c 24 48 83 c4 08 c3 e8 00 00 00 00 eb d0  H.,$H...........
	rel 5+4 t=16 TLS+0
	rel 28+4 t=8 "".testmap+0
	rel 42+4 t=8 runtime.morestack_noctxt+0

Map initialization

First, the initialization call generated for the line make(map[int64]int,10):

	0x0089 00137 (test.go:17)	LEAQ	type.map[int64]int(SB), AX
	0x0090 00144 (test.go:17)	MOVQ	AX, (SP)
	0x0094 00148 (test.go:17)	MOVQ	$10, 8(SP)
	0x009d 00157 (test.go:17)	LEAQ	""..autotmp_23+520(SP), AX
	0x00a5 00165 (test.go:17)	MOVQ	AX, 16(SP)
	0x00aa 00170 (test.go:17)	PCDATA	$0, $1
	0x00aa 00170 (test.go:17)	CALL 	runtime.makemap(SB)

The assembly shows that it calls runtime.makemap:

func makemap(t *maptype, hint int, h *hmap) *hmap {
    // The size of hmap should be 48 bytes on 64 bit
    // and 28 bytes on 32 bit platforms.
    if debug.gctrace > 0 {
        print("makemap: hint=", hint, "\n")
    }
    if sz := unsafe.Sizeof(hmap{}); sz != 8+5*sys.PtrSize {
        println("runtime: sizeof(hmap) =", sz, ", t.hmap.size =", t.hmap.size)
        throw("bad hmap size")
    }

    if hint < 0 || hint > int(maxSliceCap(t.bucket.size)) {
        hint = 0
    }

    // initialize Hmap
    if h == nil {
        h = (*hmap)(newobject(t.hmap))
    }
    // Generate the hash seed.
    h.hash0 = fastrand()
    if debug.gctrace > 0 {
        print("makemap: hash0=", h.hash0, "\n")
    }
    // find size parameter which will hold the requested # of elements
    B := uint8(0)
    // Pick a suitable B for the number of elements the caller asked for.
    for overLoadFactor(hint, B) {
        B++
    }
    h.B = B
    if debug.gctrace > 0 {
        print("makemap: B=", B, "\n")
    }
    
    if h.B != 0 {
        var nextOverflow *bmap
        // B != 0, so allocate the bucket array up front.
        h.buckets, nextOverflow = makeBucketArray(t, h.B)
        if nextOverflow != nil {
            h.extra = new(mapextra)
            h.extra.nextOverflow = nextOverflow
        }
    }

    return h
}

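Note how B is computed. overLoadFactor(hint, B) is a one-liner:
return hint > bucketCnt && uintptr(hint) > loadFactorNum*(bucketShift(B)/loadFactorDen)
That is, B is the smallest value satisfying hint <= (2^B) * 6.5; when hint <= 8 a single bucket is enough and B stays 0.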
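
As a sanity check, the loop can be reproduced outside the runtime. The sketch below copies overLoadFactor from the line above (with bucketShift(B) written as 1<<B); pickB is a helper name introduced here for illustration.

```go
package main

import "fmt"

const (
	bucketCnt     = 8
	loadFactorNum = 13
	loadFactorDen = 2
)

// overLoadFactor reports whether count items in 1<<B buckets exceed the load factor.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uintptr(count) > loadFactorNum*((uintptr(1)<<B)/loadFactorDen)
}

// pickB returns the smallest B for which hint does not exceed the load factor,
// mirroring the loop in makemap.
func pickB(hint int) uint8 {
	B := uint8(0)
	for overLoadFactor(hint, B) {
		B++
	}
	return B
}

func main() {
	for _, hint := range []int{8, 10, 500} {
		fmt.Printf("hint=%d -> B=%d\n", hint, pickB(hint))
	}
	// hint=8 -> B=0
	// hint=10 -> B=1
	// hint=500 -> B=7
}
```

For the test program this means make(map[int64]int, 10) starts with 2 buckets and make(map[int64]int, 500) with 128.
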
The bucket array itself is allocated in makeBucketArray.

func makeBucketArray(t *maptype, b uint8) (buckets unsafe.Pointer, nextOverflow *bmap) {
    // base is the number of regular buckets, i.e. the real size of the hash array: 2^B
    base := bucketShift(b)
    nbuckets := base
    // For small b, overflow buckets are unlikely.
    // Avoid the overhead of the calculation.
    if b >= 4 {
        // Add on the estimated number of overflow buckets
        // required to insert the median number of elements
        // used with this value of b.
        nbuckets += bucketShift(b - 4)
        sz := t.bucket.size * nbuckets
        up := roundupsize(sz)
        if up != sz {
            nbuckets = up / t.bucket.size
        }
        
    }
    
    buckets = newarray(t.bucket, int(nbuckets))
    if base != nbuckets {
        nextOverflow = (*bmap)(add(buckets, base*uintptr(t.bucketsize)))
        last := (*bmap)(add(buckets, (nbuckets-1)*uintptr(t.bucketsize)))
        last.setoverflow(t, (*bmap)(buckets))
    }
    return buckets, nextOverflow
}

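When B >= 4, some extra buckets are allocated up front and recorded in nextOverflow, so that when a bucket later needs an overflow bucket it can be taken directly from nextOverflow.
As shown in the figure below: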
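
The preallocation arithmetic can be sketched as follows. bucketCounts is a hypothetical helper, and it deliberately ignores the roundupsize adjustment, which can only increase the number of extra buckets.

```go
package main

import "fmt"

// bucketCounts returns the number of regular buckets and the estimated number
// of preallocated overflow buckets for a given b, following makeBucketArray
// but without the roundupsize step.
func bucketCounts(b uint8) (base, extra uintptr) {
	base = uintptr(1) << b
	if b >= 4 {
		extra = uintptr(1) << (b - 4)
	}
	return base, extra
}

func main() {
	for _, b := range []uint8{1, 4, 7} {
		base, extra := bucketCounts(b)
		fmt.Printf("b=%d: %d buckets + %d preallocated overflow buckets\n", b, base, extra)
	}
	// b=1: 2 buckets + 0 preallocated overflow buckets
	// b=4: 16 buckets + 1 preallocated overflow buckets
	// b=7: 128 buckets + 8 preallocated overflow buckets
}
```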

Map insertion and update

	0x00bc 00188 (test.go:18)	LEAQ	type.map[int64]int(SB), AX
	0x00c3 00195 (test.go:18)	MOVQ	AX, (SP)
	0x00c7 00199 (test.go:18)	MOVQ	"".test1+168(SP), AX
	0x00cf 00207 (test.go:18)	MOVQ	AX, 8(SP)
	0x00d4 00212 (test.go:18)	MOVQ	$3, 16(SP)
	0x00dd 00221 (test.go:18)	PCDATA	$0, $2
	0x00dd 00221 (test.go:18)	CALL	runtime.mapassign(SB)
	0x00e2 00226 (test.go:18)	MOVQ	24(SP), AX
	0x00e7 00231 (test.go:18)	MOVQ	AX, ""..autotmp_24+224(SP)
	0x00ef 00239 (test.go:18)	TESTB	AL, (AX)
	0x00f1 00241 (test.go:18)	MOVQ	$3, (AX)

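Insertion calls mapassign. Some readers may see mapassign_fast64 instead: these belong to the same family of functions, and the compiler decides which one to use.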

func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    if h == nil {
        panic(plainError("assignment to entry in nil map"))
    }
    if raceenabled {
        callerpc := getcallerpc()
        pc := funcPC(mapassign)
        racewritepc(unsafe.Pointer(h), callerpc, pc)
        raceReadObjectPC(t.key, key, callerpc, pc)
    }
    if msanenabled {
        msanread(key, t.key.size)
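    }
    // Concurrent-write check: if another writer is detected, throw.
    // Note: this check is best-effort. It does not guarantee that every concurrent
    // access is caught, and it only runs at run time. When a map is shared between
    // goroutines, guard it with a lock (sync.Mutex/sync.RWMutex) or use sync.Map.
    if h.flags&hashWriting != 0 {
        throw("concurrent map writes")
    }
    alg := t.key.alg
    // Compute the hash of the key.
    hash := alg.hash(key, uintptr(h.hash0))

    // Set the hashWriting flag; it is cleared only after the key has been written.
    h.flags |= hashWriting

    // The map header must not be nil, but the bucket array may start out nil;
    // it is allocated lazily here.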
    if h.buckets == nil {
        h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
    }

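again:
    // Use the low-order bits of the hash to pick the index into the bucket array.
    bucket := hash & bucketMask(h.B)
    if h.growing() {
        // Incremental evacuation while the map is growing; covered separately below.
        growWork(t, h, bucket)
    }
    // Turn the index into a pointer to the actual bucket.
    b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + bucket*uintptr(t.bucketsize)))
    top := tophash(hash)       // top 8 bits, used to locate the entry inside the bucket
    var inserti *uint8         // where to store the tophash
    var insertk unsafe.Pointer // where to store the key
    var val unsafe.Pointer     // where to store the value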

    for {
        for i := uintptr(0); i < bucketCnt; i++ {
            if b.tophash[i] != top {
                if b.tophash[i] == empty && inserti == nil {
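                    // Found a free slot: remember where tophash, key and value would go.
                    // The scan must still finish before the slot is used, because the
                    // same key may already exist further along.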
                    inserti = &b.tophash[i]
                    insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
                    val = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
                }
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            if t.indirectkey {
                k = *((*unsafe.Pointer)(k))
            }
            if !alg.equal(key, k) {
                continue
            }
            // The key already exists in the map: update it in place.
            if t.needkeyupdate {
                typedmemmove(t.key, k, key)
            }
            val = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
            goto done
        }
        ovf := b.overflow(t)
        if ovf == nil {
            break
        }
        b = ovf
    }
    // Decide whether to grow; covered separately below.
    if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
        hashGrow(t, h)
        goto again // Growing the table invalidates everything, so try again
    }
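    // inserti == nil means no free slot was found: the whole chain is full, so append a new overflow bucket.
    if inserti == nil {
        // Allocate the overflow bucket: use a preallocated one if available, otherwise allocate a new bucket.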
        newb := h.newoverflow(t, b)
        inserti = &newb.tophash[0]
        insertk = add(unsafe.Pointer(newb), dataOffset)
        val = add(insertk, bucketCnt*uintptr(t.keysize))
    }

    // When the key or value type is larger than 128 bytes, the bucket stores only a pointer to it. Allocate the storage and take the pointer here.
    if t.indirectkey {
        kmem := newobject(t.key)
        *(*unsafe.Pointer)(insertk) = kmem
        insertk = kmem
    }
    if t.indirectvalue {
        vmem := newobject(t.elem)
        *(*unsafe.Pointer)(val) = vmem
    }
    typedmemmove(t.key, insertk, key) // store the key in its slot
    *inserti = top                    // store the tophash (top 8 bits of the hash)
    h.count++                         // a new entry was added

done:
    if h.flags&hashWriting == 0 {
        throw("concurrent map writes")
    }
    h.flags &^= hashWriting // clear the hashWriting flag
    if t.indirectvalue {
        val = *((*unsafe.Pointer)(val))
    }
    // Return a pointer to the value slot. Note that the value itself has not been stored yet.
    return val
}

The function returns the address of the value slot. Back in the assembly, we can see the value being stored to that address (MOVQ $3, (AX)).
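
The nil check at the top of mapassign is visible from ordinary Go code. The following small demo (assignToNil is a name made up for this example) shows that reading from a nil map is fine while writing panics with the message seen in the source above.

```go
package main

import "fmt"

// assignToNil writes to a nil map and returns the recovered panic message.
func assignToNil() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	var m map[int64]int // nil map: no hmap has been allocated
	m[3] = 3            // panics inside the runtime's assignment path
	return "no panic"
}

func main() {
	var m map[int64]int
	v, ok := m[3] // reading a nil map is allowed
	fmt.Println(v, ok)         // 0 false
	fmt.Println(assignToNil()) // assignment to entry in nil map
}
```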

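Map growth and "shrinking"

Growth was mentioned above. The conditions that trigger growth or "shrinking" (which, as explained below, is really a same-size grow) are:

(1) The map is not already in the growing state.

(2) Growth is triggered when the element count exceeds the number of buckets (2^B) * 6.5. Here 2^B is the size of the hash array and does not include overflow buckets.

(3) "Shrinking" is triggered when the number of overflow buckets, noverflow, is >= 32768 (1<<15) or >= the size of the hash array.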
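
The three conditions can be put together in a small sketch. overLoadFactor is copied from the listing earlier; tooManyOverflowBuckets is written from the description in condition (3) and should be read as an approximation of the runtime function; growKind is a hypothetical helper that only names the outcome.

```go
package main

import "fmt"

const (
	bucketCnt     = 8
	loadFactorNum = 13
	loadFactorDen = 2
)

func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uintptr(count) > loadFactorNum*((uintptr(1)<<B)/loadFactorDen)
}

// tooManyOverflowBuckets reports whether noverflow >= min(2^B, 2^15).
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
	if B > 15 {
		B = 15
	}
	return noverflow >= uint16(1)<<B
}

// growKind names what an insert into a map with the given state would trigger.
func growKind(count int, noverflow uint16, B uint8, growing bool) string {
	switch {
	case growing:
		return "none (already growing)"
	case overLoadFactor(count+1, B):
		return "double"
	case tooManyOverflowBuckets(noverflow, B):
		return "same-size"
	default:
		return "none"
	}
}

func main() {
	fmt.Println(growKind(13, 0, 1, false)) // double: 14 items > 2 buckets * 6.5
	fmt.Println(growKind(5, 2, 1, false))  // same-size: 2 overflow buckets for 2 buckets
	fmt.Println(growKind(5, 0, 1, false))  // none
}
```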

Both cases are handled by the same function, hashGrow():

func hashGrow(t *maptype, h *hmap) {
    // If we've hit the load factor, get bigger.
    // Otherwise, there are too many overflow buckets,
    // so keep the same number of buckets and "grow" laterally.
    bigger := uint8(1)
    if !overLoadFactor(h.count+1, h.B) {
        bigger = 0
        h.flags |= sameSizeGrow
    }
    oldbuckets := h.buckets
    newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger)

    flags := h.flags &^ (iterator | oldIterator)
    if h.flags&iterator != 0 {
        flags |= oldIterator
    }
    // commit the grow (atomic wrt gc)
    h.B += bigger
    h.flags = flags
    h.oldbuckets = oldbuckets
    h.buckets = newbuckets
    h.nevacuate = 0
    h.noverflow = 0

    if h.extra != nil && h.extra.overflow != nil {
        // Promote current overflow buckets to the old generation.
        if h.extra.oldoverflow != nil {
            throw("oldoverflow is not nil")
        }
        h.extra.oldoverflow = h.extra.overflow
        h.extra.overflow = nil
    }
    if nextOverflow != nil {
        if h.extra == nil {
            h.extra = new(mapextra)
        }
        h.extra.nextOverflow = nextOverflow
    }

    // the actual copying of the hash table data is done incrementally
    // by growWork() and evacuate().
}

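At its start, hashGrow() checks whether the load-factor condition holds. If it does, this is a real growth; if not, it must have been the too-many-overflow-buckets condition that fired. The rest of the logic is shared, and the main difference is the capacity change, i.e. hmap.B: when growing, B is incremented and the hash table doubles in size; when "shrinking", the size of the hash table stays the same.

h.oldbuckets: points to the old hash array, i.e. the previous h.buckets
h.buckets: points to the newly created hash array

At this point the triggering work is done; what remains is moving the elements into the new hash table. Moving everything in one go would keep the map busy for a comparatively long time (and maps do not support concurrent access). The move is therefore done incrementally, piggybacking on later writes.
Both the insert and the delete functions contain the snippet if h.growing() { growWork(t, h, bucket) }, which performs one step of the move on every insert or delete.

Each insert or delete calls growWork, which evacuates 0 to 2 buckets (0 if both candidate buckets were already evacuated). Evacuation works one hash bucket at a time, covering the bucket itself and its overflow chain. Deleted slots (tophash empty in the source shown here) are dropped, which is the key to "shrinking", and each occupied slot of an evacuated bucket is marked evacuatedX or evacuatedY.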

func growWork(t *maptype, h *hmap, bucket uintptr) {
    // make sure we evacuate the oldbucket corresponding
    // to the bucket we're about to use
    evacuate(t, h, bucket&h.oldbucketmask())

    // evacuate one more oldbucket to make progress on growing
    if h.growing() {
        evacuate(t, h, h.nevacuate)
    }
}

evacuate is the function that actually moves a bucket:

func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
    b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
    newbit := h.noldbuckets() // When doubling, each old bucket is split across two new buckets; newbit (the number of old buckets) is the index distance to the second one.
    if !evacuated(b) {
     
        var xy [2]evacDst
        x := &xy[0]
        x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
        x.k = add(unsafe.Pointer(x.b), dataOffset)
        x.v = add(x.k, bucketCnt*uintptr(t.keysize))

        if !h.sameSizeGrow() {
            // Only calculate y pointers if we're growing bigger.
            // Otherwise GC can see bad pointers.
            y := &xy[1]
            y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
            y.k = add(unsafe.Pointer(y.b), dataOffset)
            y.v = add(y.k, bucketCnt*uintptr(t.keysize))
        }

        for ; b != nil; b = b.overflow(t) {
            k := add(unsafe.Pointer(b), dataOffset)
            v := add(k, bucketCnt*uintptr(t.keysize))
            for i := 0; i < bucketCnt; i, k, v = i+1, add(k, uintptr(t.keysize)), add(v, uintptr(t.valuesize)) {
                top := b.tophash[i]
                if top == empty {
                    b.tophash[i] = evacuatedEmpty
                    continue
                }
                if top < minTopHash {
                    throw("bad map state")
                }
                k2 := k
                if t.indirectkey {
                    k2 = *((*unsafe.Pointer)(k2))
                }
                var useY uint8
                if !h.sameSizeGrow() {
          
                    hash := t.key.alg.hash(k2, uintptr(h.hash0))
                    if h.flags&iterator != 0 && !t.reflexivekey && !t.key.alg.equal(k2, k2) {
                      
                        useY = top & 1
                        top = tophash(hash)
                    } else {
                        if hash&newbit != 0 {
                            useY = 1
                        }
                    }
                }

                if evacuatedX+1 != evacuatedY {
                    throw("bad evacuatedN")
                }

                b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY
                dst := &xy[useY]                 // evacuation destination

                if dst.i == bucketCnt {
                    dst.b = h.newoverflow(t, dst.b)
                    dst.i = 0
                    dst.k = add(unsafe.Pointer(dst.b), dataOffset)
                    dst.v = add(dst.k, bucketCnt*uintptr(t.keysize))
                }
                dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
                if t.indirectkey {
                    *(*unsafe.Pointer)(dst.k) = k2 // copy pointer
                } else {
                    typedmemmove(t.key, dst.k, k) // copy value
                }
                if t.indirectvalue {
                    *(*unsafe.Pointer)(dst.v) = *(*unsafe.Pointer)(v)
                } else {
                    typedmemmove(t.elem, dst.v, v)
                }
                dst.i++
                // These updates might push these pointers past the end of the
                // key or value arrays.  That's ok, as we have the overflow pointer
                // at the end of the bucket to protect against pointing past the
                // end of the bucket.
                dst.k = add(dst.k, uintptr(t.keysize))
                dst.v = add(dst.v, uintptr(t.valuesize))
            }
        }
        // Unlink the overflow buckets & clear key/value to help GC.
        if h.flags&oldIterator == 0 && t.bucket.kind&kindNoPointers == 0 {
            b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
            // Preserve b.tophash because the evacuation
            // state is maintained there.
            ptr := add(b, dataOffset)
            n := uintptr(t.bucketsize) - dataOffset
            memclrHasPointers(ptr, n)
        }
    }

    if oldbucket == h.nevacuate {
        advanceEvacuationMark(h, t, newbit)
    }
}

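evacuate serves two purposes:

1. When growing, the entries of an old bucket are spread over two new buckets: one at the same index as the old bucket, the other offset from it by the number of old buckets. See the figure below:

2. When "shrinking", the key/value pairs scattered across the overflow buckets are stored compactly.

"Shrinking" only addresses the case of too many overflow buckets. The size of the hash array stays the same when it is triggered, so the space taken by the hash array only ever grows. In other words, if a map has grown very large and we then delete its elements one by one, the memory held by the hash table is not released.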

Map deletion

	0x06f6 01782 (test.go:28)	LEAQ	type.map[int64]int(SB), AX
	0x06fd 01789 (test.go:28)	MOVQ	AX, (SP)
	0x0701 01793 (test.go:28)	MOVQ	"".test2+160(SP), AX
	0x0709 01801 (test.go:28)	MOVQ	AX, 8(SP)
	0x070e 01806 (test.go:28)	MOVQ	$400, 16(SP)
	0x0717 01815 (test.go:28)	PCDATA	$0, $8
	0x0717 01815 (test.go:28)	CALL	runtime.mapdelete(SB)

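Deletion is similar to insertion: the early steps check arguments and state and locate the key/value, after which the corresponding memory is cleared. We will not go through it in detail; the key points are:

(1) Deletion also sets the hashWriting flag.

(2) When a key or value is too large, the hash table stores a pointer to it. In that case deletion is a soft delete: the pointer is set to nil and the data is left to the GC. This is internal to the map and invisible to callers, who only ever receive copies of the values.

(3) Whether the key/value is a value type or a pointer type, deletion only affects the hash table; data the caller has already obtained is unaffected. In particular, a pointer the caller holds can still be used.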

func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
    if raceenabled && h != nil {
        callerpc := getcallerpc()
        pc := funcPC(mapdelete)
        racewritepc(unsafe.Pointer(h), callerpc, pc)
        raceReadObjectPC(t.key, key, callerpc, pc)
    }
    if msanenabled && h != nil {
        msanread(key, t.key.size)
    }
    if h == nil || h.count == 0 {
        return
    }
    if h.flags&hashWriting != 0 {
        throw("concurrent map writes")
    }

    alg := t.key.alg
    hash := alg.hash(key, uintptr(h.hash0))

    // Set hashWriting after calling alg.hash, since alg.hash may panic,
    // in which case we have not actually done a write (delete).
    h.flags |= hashWriting

    bucket := hash & bucketMask(h.B)
    if h.growing() {
        growWork(t, h, bucket)
    }
    b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
    top := tophash(hash)
search:
    for ; b != nil; b = b.overflow(t) {
        for i := uintptr(0); i < bucketCnt; i++ {
            if b.tophash[i] != top {
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            k2 := k
            if t.indirectkey {
                k2 = *((*unsafe.Pointer)(k2))
            }
            if !alg.equal(key, k2) {
                continue
            }
            // Only clear key if there are pointers in it.
            if t.indirectkey {
                *(*unsafe.Pointer)(k) = nil
            } else if t.key.kind&kindNoPointers == 0 {
                memclrHasPointers(k, t.key.size)
            }
            // Only clear value if there are pointers in it.
            if t.indirectvalue || t.elem.kind&kindNoPointers == 0 {
                v := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
                if t.indirectvalue {
                    *(*unsafe.Pointer)(v) = nil
                } else {
                    memclrHasPointers(v, t.elem.size)
                }
            }
            b.tophash[i] = empty
            h.count--
            break search
        }
    }

    if h.flags&hashWriting == 0 {
        throw("concurrent map writes")
    }
    h.flags &^= hashWriting
}

With lookup and insertion understood, deletion is straightforward: the tophash of the key's slot is set to empty, and the bucket memory is not released.
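
The early return for h == nil || h.count == 0 and the "not found" path are both observable from user code. A short demo (deleteDemo is an illustrative name):

```go
package main

import "fmt"

// deleteDemo exercises the no-op paths of delete and a real deletion,
// then reports the map length and the result of looking up the deleted key.
func deleteDemo() (int, int, bool) {
	var nilMap map[int64]int
	delete(nilMap, 1) // no-op: deleting from a nil map does not panic

	m := map[int64]int{400: 400}
	delete(m, 123) // no-op: key not present
	delete(m, 400) // removes the entry; len drops to 0
	v, ok := m[400]
	return len(m), v, ok
}

func main() {
	fmt.Println(deleteDemo()) // 0 0 false
}
```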

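Map iteration

First, a few flags:

// there may be an iterator using buckets
iterator = 1

// there may be an iterator using oldbuckets
oldIterator = 2

// a goroutine is writing to the map
hashWriting = 4

// same-size grow (trigger condition (3) above)
sameSizeGrow = 8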

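In principle, iterating over a map is simple: visit every bucket and the overflow buckets chained behind it, and within each bucket visit every key. Each bucket holds 8 keys; take out each key and value, and you are done.

In practice it is not that simple. Recall the growth process described earlier: it is not atomic, and each step moves at most 2 buckets. Once growth has been triggered, the map can stay in an intermediate state for a long time, with some buckets already moved to their new home and others still in the old place.

So if iteration happens while the map is growing, it has to walk both old and new buckets, and that is where the difficulty lies.

First, how the iterator shows up in the assembly: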

	0x0297 00663 (test.go:21)	MOVQ	"".test1+168(SP), AX
	0x029f 00671 (test.go:21)	MOVQ	AX, ""..autotmp_13+232(SP)
	0x02a7 00679 (test.go:21)	LEAQ	""..autotmp_14+568(SP), DI
	0x02af 00687 (test.go:21)	XORPS	X0, X0
	0x02b2 00690 (test.go:21)	LEAQ	-32(DI), DI
	0x02b6 00694 (test.go:21)	DUFFZERO	$273
	0x02c9 00713 (test.go:21)	LEAQ	type.map[int64]int(SB), AX
	0x02d0 00720 (test.go:21)	MOVQ	AX, (SP)
	0x02d4 00724 (test.go:21)	MOVQ	""..autotmp_13+232(SP), AX
	0x02dc 00732 (test.go:21)	MOVQ	AX, 8(SP)
	0x02e1 00737 (test.go:21)	LEAQ	""..autotmp_14+568(SP), AX
	0x02e9 00745 (test.go:21)	MOVQ	AX, 16(SP)
	0x02ee 00750 (test.go:21)	PCDATA	$0, $5
	0x02ee 00750 (test.go:21)	CALL	runtime.mapiterinit(SB)
	0x02f3 00755 (test.go:21)	JMP	757
	0x02f5 00757 (test.go:21)	MOVQ	""..autotmp_14+568(SP), AX
	0x02fd 00765 (test.go:21)	TESTQ	AX, AX
	0x0300 00768 (test.go:21)	JNE	775
	0x0302 00770 (test.go:21)	JMP	1226
	0x0307 00775 (test.go:21)	MOVQ	""..autotmp_14+576(SP), AX
	0x030f 00783 (test.go:21)	TESTB	AL, (AX)
	0x0311 00785 (test.go:21)	MOVQ	(AX), AX
	0x0314 00788 (test.go:21)	MOVQ	AX, ""..autotmp_29+112(SP)
	0x0319 00793 (test.go:21)	MOVQ	""..autotmp_14+568(SP), AX
	0x0321 00801 (test.go:21)	TESTB	AL, (AX)
	0x0323 00803 (test.go:21)	MOVQ	(AX), AX
	0x0326 00806 (test.go:21)	MOVQ	AX, "".k+96(SP)
	0x032b 00811 (test.go:21)	MOVQ	""..autotmp_29+112(SP), AX
	0x0330 00816 (test.go:21)	MOVQ	AX, "".v+88(SP)

Iterator usage flow chart:

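type hiter struct {
    // pointer to the key
    key unsafe.Pointer
    // pointer to the value
    value unsafe.Pointer
    // map type, including e.g. the key size
    t *maptype
    // map header
    h *hmap
    // bucket array the iterator was initialized with
    buckets unsafe.Pointer
    // bmap currently being iterated
    bptr *bmap
    overflow    *[]*bmap
    oldoverflow *[]*bmap
    // index of the bucket where iteration started
    startBucket uintptr
    // in-bucket offset at which iteration starts (each bucket has 8 cells)
    offset uint8
    // whether iteration has wrapped around from the end of the bucket array to the beginning
    wrapped bool
    // value of B when the iterator was initialized
    B uint8
    // index of the current key
    i uint8
    // index of the current bucket
    bucket uintptr
    // bucket that needs checking because of growth
    checkBucket uintptr
}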

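mapiterinit initializes the fields of the hiter struct. Because it picks a random starting point, even iterating over a map with fixed contents yields the entries in a different order each time.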

func mapiterinit(t *maptype, h *hmap, it *hiter) {
    if raceenabled && h != nil {
        callerpc := getcallerpc()
        racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiterinit))
    }

    if h == nil || h.count == 0 {
        return
    }

    if unsafe.Sizeof(hiter{})/sys.PtrSize != 12 {
        throw("hash_iter size incorrect") // see ../../cmd/internal/gc/reflect.go
    }
    it.t = t
    it.h = h

    // grab snapshot of bucket state
    it.B = h.B
    it.buckets = h.buckets
    if t.bucket.kind&kindNoPointers != 0 {
       
        h.createOverflow()
        it.overflow = h.extra.overflow
        it.oldoverflow = h.extra.oldoverflow
    }

    // Generate a random number r.
    r := uintptr(fastrand())
    if h.B > 31-bucketCntBits {
        r += uintptr(fastrand()) << 31
    }
    // Which bucket to start from.
    it.startBucket = r & bucketMask(h.B)
    // Which key within the bucket to start from.
    it.offset = uint8(r >> h.B & (bucketCnt - 1))

    // iterator state
    it.bucket = it.startBucket

    // Remember we have an iterator.
    // Can run concurrently with another mapiterinit().
    if old := h.flags; old&(iterator|oldIterator) != iterator|oldIterator {
        atomic.Or8(&h.flags, iterator|oldIterator)
    }

    mapiternext(it)
}

mapiternext starts at key number it.offset of bucket it.startBucket, yields keys and values one at a time, and finishes when it arrives back at the starting bucket.

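The source is fairly easy to follow, especially after the annotated code above. The walkthrough after the listing explains the whole iteration process with diagrams, which hopefully makes it clear.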

func mapiternext(it *hiter) {
    h := it.h
    if raceenabled {
        callerpc := getcallerpc()
        racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiternext))
    }
    if h.flags&hashWriting != 0 {
        throw("concurrent map iteration and map write")
    }
    t := it.t
    bucket := it.bucket
    b := it.bptr
    i := it.i
    checkBucket := it.checkBucket
    alg := t.key.alg

next:
    if b == nil {
        if bucket == it.startBucket && it.wrapped {
            // end of iteration
            it.key = nil
            it.value = nil
            return
        }
        if h.growing() && it.B == h.B {
           
            oldbucket := bucket & it.h.oldbucketmask()
            b = (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
            if !evacuated(b) {
                checkBucket = bucket
            } else {
                b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))
                checkBucket = noCheck
            }
        } else {
            b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))
            checkBucket = noCheck
        }
        bucket++
        if bucket == bucketShift(it.B) {
            bucket = 0
            it.wrapped = true
        }
        i = 0
    }
    for ; i < bucketCnt; i++ {
        offi := (i + it.offset) & (bucketCnt - 1)
        if b.tophash[offi] == empty || b.tophash[offi] == evacuatedEmpty {
            continue
        }
        k := add(unsafe.Pointer(b), dataOffset+uintptr(offi)*uintptr(t.keysize))
        if t.indirectkey {
            k = *((*unsafe.Pointer)(k))
        }
        v := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+uintptr(offi)*uintptr(t.valuesize))
        if checkBucket != noCheck && !h.sameSizeGrow() {
            
            if t.reflexivekey || alg.equal(k, k) {
              
                hash := alg.hash(k, uintptr(h.hash0))
                if hash&bucketMask(it.B) != checkBucket {
                    continue
                }
            } else {
               
                if checkBucket>>(it.B-1) != uintptr(b.tophash[offi]&1) {
                    continue
                }
            }
        }
        if (b.tophash[offi] != evacuatedX && b.tophash[offi] != evacuatedY) ||
            !(t.reflexivekey || alg.equal(k, k)) {
            
            it.key = k
            if t.indirectvalue {
                v = *((*unsafe.Pointer)(v))
            }
            it.value = v
        } else {
            
            rk, rv := mapaccessK(t, h, k)
            if rk == nil {
                continue // key has been deleted
            }
            it.key = rk
            it.value = rv
        }
        it.bucket = bucket
        if it.bptr != b { // avoid unnecessary write barrier; see issue 14921
            it.bptr = b
        }
        it.i = i + 1
        it.checkBucket = checkBucket
        return
    }
    b = b.overflow(t)
    i = 0
    goto next
}

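Suppose we have the map shown in the figure below. Initially B = 1, so there are two buckets. Growth was then triggered (never mind the exact condition; this is just a setup) and B became 2. The contents of old bucket 1 have already been moved: old bucket 1 split into new buckets 1 and 3. Old bucket 0 has not been moved yet. The old buckets hang off the oldbuckets pointer and the new ones off the buckets pointer.

Now we iterate over this map. Suppose that after initialization startBucket = 3 and offset = 0. Iteration then starts at key 0 of bucket 3, and the buckets are visited in the order 3 -> 0 -> 1 -> 2.

Bucket 3 corresponds to old bucket 1, so we first check whether old bucket 1 has been evacuated. The check is: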

func evacuated(b *bmap) bool {
	h := b.tophash[0]
	return h > empty && h < minTopHash
}

If b.tophash[0] is one of the marker values, i.e. in the open interval (0, 4), the bucket has already been evacuated.

empty = 0
evacuatedEmpty = 1
evacuatedX = 2
evacuatedY = 3
minTopHash = 4

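In this example old bucket 1 has already been evacuated, so its tophash[0] lies in (0, 4) and we only need to iterate over new bucket 3.

Walking through the keys of bucket 3 in order, we find the first non-empty key, key4:

After new bucket 3 we wrap around to new bucket 0. It corresponds to old bucket 0, which turns out not to have been evacuated, so instead of new bucket 0 we iterate over old bucket 0. Do we then take every key in old bucket 0?

Not quite. Recall that after evacuation old bucket 0 will split into two buckets: new 0 and new 2. At this moment we are iterating over new bucket 0 only (iteration always follows the buckets pointer, i.e. the new buckets). So we take from old bucket 0 only those keys that will land in new bucket 0 after the split.

Therefore key1, whose lowbits == 00, enters the result set:

key4 -> key1

As before, we continue with new bucket 1. Old bucket 1 has been evacuated, so we only need the elements currently in new bucket 1. The result set becomes:

key4 -> key1 -> key3

Next comes new bucket 2. It comes from old bucket 0, so we look in old bucket 0 for the keys that will split into new bucket 2, i.e. those with lowbits == 10, here key2.

The result set becomes:

key4 -> key1 -> key3 -> key2

Finally, when we get back to new bucket 3, every bucket has been visited and the iteration is complete.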

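Incidentally, keys such as NaN are handled in a similar way. What matters is still which bucket they fall into after the split, except that only the lowest bit of the top hash is examined: if it is 0 the key goes to the X part, if it is 1 to the Y part. This decides whether the key is yielded into the result set.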

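The core of map iteration is understanding that when the table doubles, each old bucket splits into 2 new buckets. Iteration proceeds in the order of the new bucket indices, and when it meets an old bucket that has not been evacuated yet, it must pick out of that old bucket the keys that will later move to the new bucket being visited.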
