A Deep Dive into Go: Goroutines and Scheduling

Anyone with some knowledge of operating systems knows that a Linux thread is really a task_struct. A thread is not the entity that actually runs; it merely represents an execution flow and its state. What really drives the flow forward is the CPU. Driven by the clock, the CPU fetches instructions and operands from the program according to the PC register, loads data from RAM, computes, and branches, pushing the execution flow along. The CPU does not care whether it is running a thread or a coroutine: set up the PC register, the stack pointer, and so on (collectively called the context), and the CPU will happily run that thread or coroutine.

A thread, then, does not run so much as it is run. When it blocks, it is really switched out of the scheduler's run queue so that its execution flow is no longer scheduled. When some other execution flow satisfies the condition it was waiting for, the flow that was moved off the run queue is put back on.

Coroutines work the same way. A coroutine is also just a data structure recording which function to run and how far execution has progressed. Go schedules in user space, so it needs a struct that represents this kind of execution flow, functions that save and restore context, and run queues.
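
Before looking at the real types, here is a toy sketch (illustrative only, not the runtime's actual definitions, which follow below) of the minimal state any user-space scheduler has to keep per execution flow:

type context struct {
    pc uintptr // where the flow stopped executing
    sp uintptr // its stack pointer at that moment
}

type coroutine struct {
    fn    func()  // the function this flow runs
    ctx   context // saved context, used to resume the flow
    state int     // e.g. runnable / running / waiting
}

// flows that are ready to run and waiting for a CPU (an M, in Go's case)
var runq []*coroutine

The runtime's g and gobuf structs below play exactly these roles, with more bookkeeping attached.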

The goroutine struct runtime.g

type g struct {
    // the stack used by this g; stack has two members, lo and hi
    stack       stack   // offset known to runtime/cgo
    // used to check whether the stack needs to grow; used by Go code
    stackguard0 uintptr // offset known to liblink
    // used to check whether the stack needs to grow; used by native code
    stackguard1 uintptr // offset known to liblink
    // the m this g is currently bound to
    m              *m      // current m; offset known to arm liblink
    // scheduling data for this g; when the goroutine is switched out,
    // its context is saved here and used later to resume it
    sched          gobuf
    // the g's current status
    atomicstatus   uint32
    // the g's id
    goid           int64
    // address of the next g; guintptr's ptr/set methods read and write it.
    // together with sched.gfreeStack and sched.gfreeNoStack this field
    // links free g's into a list
    schedlink      guintptr
    // whether this g may be preempted
    preempt        bool       // preemption signal, duplicates stackguard0 = stackpreempt
    // whether this g must go back to a particular M to run; a g that was
    // interrupted may require its original M when it resumes
    lockedm        muintptr
}
type gobuf struct {
    sp   uintptr        // stack pointer
    pc   uintptr        // how far execution has progressed
    g    guintptr
    ctxt unsafe.Pointer
    ret  sys.Uintreg
    lr   uintptr
    bp   uintptr // for GOEXPERIMENT=framepointer
}

Context-switch functions

1. gogo: restores a context saved in a gobuf and jumps into the goroutine, used when a g is scheduled to run

TEXT runtime·gogo(SB), NOSPLIT, $24-8
	MOVD	buf+0(FP), R5
	MOVD	gobuf_g(R5), g
	BL	runtime·save_g(SB)

	MOVD	0(g), R4	// make sure g is not nil
	MOVD	gobuf_sp(R5), R0
	MOVD	R0, RSP
	MOVD	gobuf_lr(R5), LR
	MOVD	gobuf_ret(R5), R0
	MOVD	gobuf_ctxt(R5), R26
	MOVD	$0, gobuf_sp(R5)
	MOVD	$0, gobuf_ret(R5)
	MOVD	$0, gobuf_lr(R5)
	MOVD	$0, gobuf_ctxt(R5)
	CMP	ZR, ZR // set condition codes for == test, needed by stack split
	MOVD	gobuf_pc(R5), R6
	B	(R6)

2. mcall: saves the current goroutine's context into g.sched and switches to g0 for rescheduling, used when switching away from a g

TEXT runtime·mcall(SB), NOSPLIT, $-8-8
	// Save caller state in g->sched
	MOVD	RSP, R0
	MOVD	R0, (g_sched+gobuf_sp)(g)
	MOVD	LR, (g_sched+gobuf_pc)(g)
	MOVD	$0, (g_sched+gobuf_lr)(g)
	MOVD	g, (g_sched+gobuf_g)(g)

	// Switch to m->g0 & its stack, call fn.
	MOVD	g, R3
	MOVD	g_m(g), R8
	MOVD	m_g0(R8), g
	BL	runtime·save_g(SB)
	CMP	g, R3
	BNE	2(PC)
	B	runtime·badmcall(SB)
	MOVD	fn+0(FP), R26			// context
	MOVD	0(R26), R4			// code pointer
	MOVD	(g_sched+gobuf_sp)(g), R0
	MOVD	R0, RSP	// sp = m->g0->sched.sp
	MOVD	R3, -8(RSP)
	MOVD	$0, -16(RSP)
	SUB	$16, RSP
	BL	(R4)
	B	runtime·badmcall2(SB)

Starting a goroutine is usually written as:

go func(arg1 type1, arg2 type2) { /* ... */ }(a1, a2)

A coroutine represents an execution flow: there is a function to execute (the function literal above), arguments to it (a1, a2), and the current state and progress of execution (corresponding to the CPU's PC and SP registers). There must also be somewhere to save that state so the flow can be resumed.

What really represents the coroutine is the runtime.g struct. Every go statement compiles into a call to runtime.newproc, and the end result is a runtime.g object placed on a run queue. The pointer to the function above is stored in the g's startpc field, the arguments are copied onto the g's stack inside newproc, and sched holds the pc and stack position saved whenever the coroutine is switched out.
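
To make the correspondence concrete, here is an ordinary program using that syntax, with comments mapping it to the mechanism just described:

package main

import "fmt"

func main() {
	done := make(chan struct{})
	// The compiler lowers this statement to a runtime.newproc call:
	// the function literal's address ends up in the new g's startpc,
	// and a1, a2 are copied onto the new g's stack.
	go func(a1, a2 int) {
		fmt.Println(a1 + a2)
		close(done)
	}(1, 2)
	<-done // keep main alive until the goroutine has run
}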

The GPM model

type g struct {
    stack       stack   // offset known to runtime/cgo
    stackguard0 uintptr // offset known to liblink
    stackguard1 uintptr // offset known to liblink

    _panic         *_panic // innermost panic - offset known to liblink
    _defer         *_defer // innermost defer
    m              *m      // current m; offset known to arm liblink
    sched          gobuf
    syscallsp      uintptr        // if status==Gsyscall, syscallsp = sched.sp to use during gc
    syscallpc      uintptr        // if status==Gsyscall, syscallpc = sched.pc to use during gc
    stktopsp       uintptr        // expected sp at top of stack, to check in traceback
    param          unsafe.Pointer // passed parameter on wakeup
    atomicstatus   uint32
    stackLock      uint32 // sigprof/scang lock; TODO: fold in to atomicstatus
    goid           int64
    waitsince      int64  // approx time when the g become blocked
    waitreason     string // if status==Gwaiting
    schedlink      guintptr
    preempt        bool     // preemption signal, duplicates stackguard0 = stackpreempt
    paniconfault   bool     // panic (instead of crash) on unexpected fault address
    preemptscan    bool     // preempted g does scan for gc
    gcscandone     bool     // g has scanned stack; protected by _Gscan bit in status
    gcscanvalid    bool     // false at start of gc cycle, true if G has not run since last scan; TODO: remove?
    throwsplit     bool     // must not split stack
    raceignore     int8     // ignore race detection events
    sysblocktraced bool     // StartTrace has emitted EvGoInSyscall about this goroutine
    sysexitticks   int64    // cputicks when syscall has returned (for tracing)
    traceseq       uint64   // trace event sequencer
    tracelastp     puintptr // last P emitted an event for this goroutine
    lockedm        muintptr
    sig            uint32
    writebuf       []byte
    sigcode0       uintptr
    sigcode1       uintptr
    sigpc          uintptr
    gopc           uintptr // pc of go statement that created this goroutine
    startpc        uintptr // pc of goroutine function
    racectx        uintptr
    waiting        *sudog         // sudog structures this g is waiting on (that have a valid elem ptr); in lock order
    cgoCtxt        []uintptr      // cgo traceback context
    labels         unsafe.Pointer // profiler labels
    timer          *timer         // cached timer for time.Sleep
    selectDone     uint32         // are we participating in a select and did someone win the race?
    gcAssistBytes int64
}

G: each G represents a goroutine started by user code. Its more important fields:

  • stack: the stack space this g uses, with lo and hi members
  • stackguard0: threshold for checking whether stack space is sufficient; going below it triggers stack growth. The 0 variant is used by Go code
  • stackguard1: the same check; the 1 variant is used by native code
  • m: the m this g currently belongs to
  • sched: the g's scheduling data; when the g is interrupted, values such as pc and rsp are saved here and used again when it resumes
  • atomicstatus: the g's current status
  • schedlink: the next g, used when the g sits on a linked list
  • preempt: whether the g is being preempted
  • lockedm: whether the g must go back to this M to run; a g that was interrupted may require its original M when it resumes
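
The goid field in the struct above has no public accessor, but it does appear in stack dumps; a quick (and deliberately fragile) way to see it:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	buf := make([]byte, 64)
	n := runtime.Stack(buf, false)
	// prints something like "goroutine 1 [running]: ..." -
	// the number after "goroutine" is this g's goid
	fmt.Println(string(buf[:n]))
}
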
type m struct {
    g0      *g     // goroutine with scheduling stack
    morebuf gobuf  // gobuf arg to morestack
    divmod  uint32 // div/mod denominator for arm - known to liblink

    // Fields not known to debuggers.
    procid        uint64       // for debuggers, but offset not hard-coded
    gsignal       *g           // signal-handling g
    goSigStack    gsignalStack // Go-allocated signal handling stack
    sigmask       sigset       // storage for saved signal mask
    tls           [6]uintptr   // thread-local storage (for x86 extern register)
    mstartfn      func()
    curg          *g       // current running goroutine
    caughtsig     guintptr // goroutine running during fatal signal
    p             puintptr // attached p for executing go code (nil if not executing go code)
    nextp         puintptr
    id            int64
    mallocing     int32
    throwing      int32
    preemptoff    string // if != "", keep curg running on this m
    locks         int32
    softfloat     int32
    dying         int32
    profilehz     int32
    helpgc        int32
    spinning      bool // m is out of work and is actively looking for work
    blocked       bool // m is blocked on a note
    inwb          bool // m is executing a write barrier
    newSigstack   bool // minit on C thread called sigaltstack
    printlock     int8
    incgo         bool   // m is executing a cgo call
    freeWait      uint32 // if == 0, safe to free g0 and delete m (atomic)
    fastrand      [2]uint32
    needextram    bool
    traceback     uint8
    ncgocall      uint64      // number of cgo calls in total
    ncgo          int32       // number of cgo calls currently in progress
    cgoCallersUse uint32      // if non-zero, cgoCallers in use temporarily
    cgoCallers    *cgoCallers // cgo traceback if crashing in cgo call
    park          note
    alllink       *m // on allm
    schedlink     muintptr
    mcache        *mcache
    lockedg       guintptr
    createstack   [32]uintptr    // stack that created this thread.
    freglo        [16]uint32     // d[i] lsb and f[i]
    freghi        [16]uint32     // d[i] msb and f[i+16]
    fflag         uint32         // floating point compare flags
    lockedExt     uint32         // tracking for external LockOSThread
    lockedInt     uint32         // tracking for internal lockOSThread
    nextwaitm     muintptr       // next m waiting for lock
    waitunlockf   unsafe.Pointer // todo go func(*g, unsafe.pointer) bool
    waitlock      unsafe.Pointer
    waittraceev   byte
    waittraceskip int
    startingtrace bool
    syscalltick   uint32
    thread        uintptr // thread handle
    freelink      *m      // on sched.freem

    // these are here because they are too large to be on the stack
    // of low-level NOSPLIT functions.
    libcall   libcall
    libcallpc uintptr // for cpu profiler
    libcallsp uintptr
    libcallg  guintptr
    syscall   libcall // stores syscall parameters on windows

    mOS
} 

M: the underlying thread that actually executes Gs. Its more important fields:

  • g0: a special g used for scheduling; the M switches to it when scheduling and when executing system calls
  • curg: the g currently running
  • p: the P this M currently owns
  • nextp: the P this M will own once it is woken
  • park: the semaphore the M sleeps on; waking the M goes through it
  • schedlink: the next m, used when the m sits on a linked list
  • mcache: the local allocator used when allocating memory; the same as p.mcache (copied over when the M acquires a P)
  • lockedg: the counterpart of g.lockedm (see the LockOSThread example below)
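
lockedg and lockedm are what back the public runtime.LockOSThread API; a minimal sketch of its use:

package main

import "runtime"

func main() {
	// After LockOSThread, this goroutine's lockedm and the thread's
	// lockedg point at each other: the g runs only on this M, and the
	// M runs no other g, until UnlockOSThread or goroutine exit.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()
	// ... call thread-affine code here (some C libraries, GUI loops) ...
}
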
type p struct {
    lock mutex

    id          int32
    status      uint32 // one of pidle/prunning/...
    link        puintptr
    schedtick   uint32     // incremented on every scheduler call
    syscalltick uint32     // incremented on every system call
    sysmontick  sysmontick // last tick observed by sysmon
    m           muintptr   // back-link to associated m (nil if idle)
    mcache      *mcache
    racectx     uintptr

    deferpool    [5][]*_defer // pool of available defer structs of different sizes (see panic.go)
    deferpoolbuf [5][32]*_defer

    // Cache of goroutine ids, amortizes accesses to runtime·sched.goidgen.
    goidcache    uint64
    goidcacheend uint64

    // Queue of runnable goroutines. Accessed without lock.
    runqhead uint32
    runqtail uint32
    runq     [256]guintptr
    runnext guintptr

    // Available G's (status == Gdead)
    gfree    *g
    gfreecnt int32

    sudogcache []*sudog
    sudogbuf   [128]*sudog

    tracebuf traceBufPtr
    traceSweep bool
    traceSwept, traceReclaimed uintptr

    palloc persistentAlloc // per-P to avoid mutex

    // Per-P GC state
    gcAssistTime         int64 // Nanoseconds in assistAlloc
    gcFractionalMarkTime int64 // Nanoseconds in fractional mark worker
    gcBgMarkWorker       guintptr
    gcMarkWorkerMode     gcMarkWorkerMode

    // gcMarkWorkerStartTime is the nanotime() at which this mark
    // worker started.
    gcMarkWorkerStartTime int64

    // gcw is this P's GC work buffer cache. The work buffer is
    // filled by write barriers, drained by mutator assists, and
    // disposed on certain GC state transitions.
    gcw gcWork

    // wbBuf is this P's GC write barrier buffer.
    //
    // TODO: Consider caching this in the running G.
    wbBuf wbBuf

    runSafePointFn uint32 // if 1, run sched.safePointFn at next safe point

    pad [sys.CacheLineSize]byte
} 

P: by default there are as many Ps as the machine has cores. A P represents the resources needed to execute Gs. Its more important fields:

  • status: the p's current status
  • link: the next p, used when the p sits on a linked list
  • m: the M that owns this P
  • mcache: the local allocator used when allocating memory
  • runqhead: dequeue index of the local run queue
  • runqtail: enqueue index of the local run queue
  • runq: the array backing the local run queue; it can hold 256 Gs
  • gfree: the P's free list of Gs: Gs that became _Gdead and can be reused
  • gcBgMarkWorker: the background GC mark worker; if it exists, the M runs it preferentially
  • gcw: the P's local GC work queue, analyzed in detail in the next post (on GC)
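
The size of the P set is exposed through runtime.GOMAXPROCS, which can both query and change it:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// GOMAXPROCS(0) only queries the current value; by default it
	// equals runtime.NumCPU(), i.e. one P per core.
	fmt.Println("Ps:", runtime.GOMAXPROCS(0), "cores:", runtime.NumCPU())
	prev := runtime.GOMAXPROCS(2) // resize the P set at runtime
	fmt.Println("previously:", prev)
}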

The relationship between G, P, and M:

(figure: diagram of the G/P/M relationship, omitted)

When memory allocation was covered earlier, we mentioned that an mcache is bound to a P; the P's mcache is assigned over from the m struct, and conversely an M that acquires a P copies p.mcache into m.mcache, as noted for the m fields above.

Each P has a local run queue of gs, runq. A new g created by newproc goes into the local runq first; once the local runq is full, gs are pushed to the global runq. Fetching works the same way: a g is taken from the local runq first, and only when nothing is there does the M go to the global runq or steal from other Ps' runqs.

When an M enters a system call, or a P has been running the same g for a while, sysmon is responsible for separating the g from the M; schedule is then called to pick what runs next.

Creating a goroutine

As described above, a go statement is really a call to the runtime.newproc function:

func newproc(siz int32, fn *funcval) {
    // compute the address of the extra arguments, argp
    argp := add(unsafe.Pointer(&fn), sys.PtrSize)
    gp := getg()
    // the caller's address (the return address), pc
    pc := getcallerpc()
    // run newproc1 on the system stack
    systemstack(func() {
        newproc1(fn, (*uint8)(argp), siz, gp, pc)
    })
}

newproc gathers the arguments and the caller's pc, then uses systemstack to switch from the current g to g0 and g0's stack, run the passed function there, and switch back to the original g and stack. The main work happens in newproc1:

func newproc1(fn *funcval, argp *uint8, narg int32, callergp *g, callerpc uintptr) {
    // get the current g; this compiles to a read of the FS register (TLS).
    // here it returns g0, since we are on the system stack
    _g_ := getg()

    if fn == nil {
        _g_.m.throwing = -1 // do not dump full stacks
        throw("go of nil func value")
    }
    // disable preemption
    _g_.m.locks++ // disable preemption because it can be holding p in a local var
    siz := narg
    siz = (siz + 7) &^ 7

    // if there are too many arguments, throw; the initial stack is 2K
    if siz >= _StackMin-4*sys.RegSize-sys.RegSize {
        throw("newproc: function arguments too large for new goroutine")
    }
    // the p owned by the current m
    _p_ := _g_.m.p.ptr()
    // try to grab a free g; if none is available, allocate a new one
    // and add it to allg.
    // gfget first tries the p's local free list; if that is empty, it
    // moves a batch from the global free list over to the local p
    newg := gfget(_p_)
    if newg == nil {
        newg = malg(_StackMin)
        casgstatus(newg, _Gidle, _Gdead)
        // add the newly allocated g to the global allg slice
        allgadd(newg) // publishes with a g->status of Gdead so GC scanner doesn't look at uninitialized stack.
    }
    // sanity-check the g's stack
    if newg.stack.hi == 0 {
        throw("newproc1: newg missing stack")
    }
    // sanity-check the g's status
    if readgstatus(newg) != _Gdead {
        throw("newproc1: new g is not Gdead")
    }
    // reserve a little extra space in case of reads slightly beyond the frame
    totalSize := 4*sys.RegSize + uintptr(siz) + sys.MinFrameSize // extra space in case of reads slightly beyond frame
    // align the frame size
    totalSize += -totalSize & (sys.SpAlign - 1) // align to spAlign
    sp := newg.stack.hi - totalSize
    spArg := sp
    // usesLR is false on x86, so this block is skipped there
    if usesLR {
        // caller's LR
        *(*uintptr)(unsafe.Pointer(sp)) = 0
        prepGoExitFrame(sp)
        spArg += sys.MinFrameSize
    }
    if narg > 0 {
        // copy the arguments onto the g's stack
        memmove(unsafe.Pointer(spArg), unsafe.Pointer(argp), uintptr(narg))
    }
    // zero the area used to save the context and initialize the basic state
    memclrNoHeapPointers(unsafe.Pointer(&newg.sched), unsafe.Sizeof(newg.sched))
    newg.sched.sp = sp
    newg.stktopsp = sp
    // store goexit's address here: when the user function returns,
    // execution continues at goexit
    newg.sched.pc = funcPC(goexit) + sys.PCQuantum // +PCQuantum so that previous instruction is in same function
    newg.sched.g = guintptr(unsafe.Pointer(newg))
    // adjust sched: pc becomes fn and goexit becomes the return address
    gostartcallfn(&newg.sched, fn)
    newg.gopc = callerpc
    newg.ancestors = saveAncestors(callergp)
    newg.startpc = fn.fn
    if _g_.m.curg != nil {
        newg.labels = _g_.m.curg.labels
    }
    if isSystemGoroutine(newg) {
        atomic.Xadd(&sched.ngsys, +1)
    }
    newg.gcscanvalid = false
    casgstatus(newg, _Gdead, _Grunnable)
    // if the p's cache of goids is used up, fetch another batch from sched
    if _p_.goidcache == _p_.goidcacheend {
        // Sched.goidgen is the last allocated id,
        // this batch must be [sched.goidgen+1, sched.goidgen+GoidCacheBatch].
        // At startup sched.goidgen=0, so main goroutine receives goid=1.
        _p_.goidcache = atomic.Xadd64(&sched.goidgen, _GoidCacheBatch)
        _p_.goidcache -= _GoidCacheBatch - 1
        _p_.goidcacheend = _p_.goidcache + _GoidCacheBatch
    }
    // assign the goid
    newg.goid = int64(_p_.goidcache)
    _p_.goidcache++
    // put the new g on p's runnable queue
    runqput(_p_, newg, true)
    // if there is an idle p and no spinning m, wake an m to run the g
    if atomic.Load(&sched.npidle) != 0 && atomic.Load(&sched.nmspinning) == 0 && mainStarted {
        wakep()
    }
    _g_.m.locks--
    if _g_.m.locks == 0 && _g_.preempt { // restore the preemption request in case we've cleared it in newstack
        _g_.stackguard0 = stackPreempt
    }
}

What runtime.newproc1 does:

(1) Call getg to get the current g; this compiles to a read of the FS register (TLS), and here it returns g0.

(2) Take the m from that g and increment m.locks to disable preemption, then take the current p from the m, because the new g should go into the local p's run queue first.

(3) Obtain a g: gfget first tries to take one from p.gfree, so a g that finished earlier can be reused here; if none is available, malg allocates a new one with an initial stack of 2K. A fresh g must first have its status set to dead (_Gdead) so the GC does not scan its uninitialized stack.

(4) Copy the arguments onto the g's stack, and place the return address on the stack as well; the return address is goexit, so when the target function returns, goexit is called.

(5) Set the g's scheduling data (sched): sched.sp is the rsp after the arguments and return address are in place, sched.pc is the target function's address (see gostartcallfn and gostartcall), and sched.g is the g itself. Then set the g's status to runnable (_Grunnable).

(6) Call runqput to enqueue the g: it first tries to place the g in p.runnext (enqueuing whatever g was in runnext before), then tries the P's local run queue; if the local queue is full, runqputslow pushes the g to the global run queue, taking half of the local queue with it so the fast local queue can keep being used. (A sketch of this policy follows.)
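
A minimal sketch of that enqueue policy, with simplified types and a plain slice standing in for the runtime's lock-free 256-entry ring buffer:

package sketch

import "sync"

type gNode struct{ id int64 }

type pLocal struct {
	runnext *gNode   // the slot the freshest g usually lands in
	runq    []*gNode // stands in for the fixed 256-entry ring
}

var global struct {
	sync.Mutex
	q []*gNode
}

// runqputSketch mirrors runqput(p, g, next=true) followed, when the
// local queue is full, by runqputslow.
func runqputSketch(p *pLocal, g *gNode) {
	g, p.runnext = p.runnext, g // the new g kicks out the old runnext
	if g == nil {
		return // the runnext slot was empty
	}
	if len(p.runq) < 256 {
		p.runq = append(p.runq, g)
		return
	}
	// local queue full: move half of it, plus g, to the global queue
	half := len(p.runq) / 2
	global.Lock()
	global.q = append(global.q, p.runq[:half]...)
	global.q = append(global.q, g)
	global.Unlock()
	p.runq = append([]*gNode(nil), p.runq[half:]...)
}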

(7) If there is an idle P but no spinning M (nmspinning == 0), and the main function has started, wake or create an M to run the g (an idle P should be put to work). Waking or creating an M goes through wakep, which mainly calls startm:

func startm(_p_ *p, spinning bool) {
    lock(&sched.lock)
    if _p_ == nil {
        // no p was specified: take an idle p from sched.pidle
        _p_ = pidleget()
        if _p_ == nil {
            unlock(&sched.lock)
            // no idle p either: undo the nmspinning increment and give up
            if spinning {
                // The caller incremented nmspinning, but there are no idle Ps,
                // so it's okay to just undo the increment and give up.
                if int32(atomic.Xadd(&sched.nmspinning, -1)) < 0 {
                    throw("startm: negative nmspinning")
                }
            }
            return
        }
    }
    // first try to take an idle m from sched.midle
    mp := mget()
    unlock(&sched.lock)
    if mp == nil {
        // no idle m: create one with m.spinning = true, bind the p to it, and return
        var fn func()
        if spinning {
            // The caller incremented nmspinning, so set m.spinning in the new M.
            fn = mspinning
        }
        newm(fn, _p_)
        return
    }
    // an m taken from the idle list must not be spinning
    if mp.spinning {
        throw("startm: m is spinning")
    }
    // and must not already have a p
    if mp.nextp != 0 {
        throw("startm: m has p")
    }
    if spinning && !runqempty(_p_) {
        throw("startm: p has runnable gs")
    }
    // The caller incremented nmspinning, so just set m.spinning here
    // and attach the p for the m to run
    mp.spinning = spinning
    mp.nextp.set(_p_)
    // wake the m
    notewakeup(&mp.park)
}

What startm does:

(1) Call pidleget to take a P from the idle-P list.

(2) Call mget to take an M from the idle-M list.

(3) If there is no idle M, call newm to create one. newm builds an m instance containing a g0, then calls newosproc to create an OS thread; newosproc issues the clone syscall, and the new thread sets up TLS, sets the current g in TLS to g0, and runs mstart.

(4) Call notewakeup(&mp.park) to wake the thread.

That completes the goroutine creation flow.

Goroutine scheduling

In newm1, where the m is created, we saw that the m's entry function is mstart, so an M that is woken up runs mstart.


func mstart() {
    _g_ := getg()

    osStack := _g_.stack.lo == 0
    if osStack {
        // Initialize stack bounds from system stack.
        // Cgo may have left stack size in stack.hi.
        // minit may update the stack bounds.
        // carve the needed range directly out of the system stack
        size := _g_.stack.hi
        if size == 0 {
            size = 8192 * sys.StackGuardMultiplier
        }
        _g_.stack.hi = uintptr(noescape(unsafe.Pointer(&size)))
        _g_.stack.lo = _g_.stack.hi - size + 1024
    }
    // Initialize stack guards so that we can start calling
    // both Go and C functions with stack growth prologues.
    _g_.stackguard0 = _g_.stack.lo + _StackGuard
    _g_.stackguard1 = _g_.stackguard0
    // hand the rest off to mstart1
    mstart1()

    // Exit this thread.
    if GOOS == "windows" || GOOS == "solaris" || GOOS == "plan9" || GOOS == "darwin" {
        // Window, Solaris, Darwin and Plan 9 always system-allocate
        // the stack, but put it in _g_.stack before mstart,
        // so the logic above hasn't set osStack yet.
        osStack = true
    }
    // exit this m. Normally mstart1's call to schedule() never returns,
    // so there is no worry about OS threads being created and torn down frequently
    mexit(osStack)
}
func mstart1() {
    _g_ := getg()

    if _g_ != _g_.m.g0 {
        throw("bad runtime•mstart")
    }

    // Record the caller for use as the top of stack in mcall and
    // for terminating the thread.
    // We're never coming back to mstart1 after we call schedule,
    // so other calls can reuse the current frame.
    // save the caller's pc, sp, and related state
    save(getcallerpc(), getcallersp())
    asminit()
    // initialize the m's signal stack and signal mask
    minit()

    // Install signal handlers; after minit so that minit can
    // prepare the thread to be able to handle the signals.
    // install signal handlers
    if _g_.m == &m0 {
        mstartm0()
    }
    // if mstartfn is set, run it first
    if fn := _g_.m.mstartfn; fn != nil {
        fn()
    }

    if _g_.m.helpgc != 0 {
        _g_.m.helpgc = 0
        stopm()
    } else if _g_.m != &m0 {
        // acquire nextp
        acquirep(_g_.m.nextp.ptr())
        _g_.m.nextp = 0
    }
    schedule()
}

What mstart does:

(1) Call getg to get the current g, which here is g0. The point is to check whether g0 still needs a stack: if the g has no stack yet, one is carved out of the current (system) stack, which is to say g0 uses the system stack.

(2) Call mstart1, which first calls save to store the caller's pc and sp into g0's scheduling data (this m may be newly created; note it is g0 being saved). Every later scheduling pass starts from this stack address. mstart1 then calls schedule.

func schedule() {
    _g_ := getg()

    if _g_.m.locks != 0 {
        throw("schedule: holding locks")
    }
    // if this m has a locked g, it may only run that g
    if _g_.m.lockedg != 0 {
        // park this m until its locked g is runnable again, then run it
        stoplockedm()
        execute(_g_.m.lockedg.ptr(), false) // Never returns.
    }

    // We should not schedule away from a g that is executing a cgo call,
    // since the cgo call is using the m's g0 stack.
    if _g_.m.incgo {
        throw("schedule: in cgo")
    }

top:
    // wait if the GC is stopping the world
    if sched.gcwaiting != 0 {
        gcstopm()
        goto top
    }

    var gp *g
    var inheritTime bool

    if gp == nil {
        // Check the global runnable queue once in a while to ensure fairness.
        // Otherwise two goroutines can completely occupy the local runqueue
        // by constantly respawning each other.
        // for fairness, poll the global queue once every 61 scheduler ticks
        if _g_.m.p.ptr().schedtick%61 == 0 && sched.runqsize > 0 {
            lock(&sched.lock)
            gp = globrunqget(_g_.m.p.ptr(), 1)
            unlock(&sched.lock)
        }
    }
    if gp == nil {
        // nothing due from the global queue: take a g from the p's local queue
        gp, inheritTime = runqget(_g_.m.p.ptr())
        if gp != nil && _g_.m.spinning {
            throw("schedule: spinning with local work")
        }
    }
    if gp == nil {
        // still nothing: findrunnable searches everywhere - the global queue,
        // netpoll, the local queue, and other Ps' queues (work stealing);
        // it is analyzed in detail later
        gp, inheritTime = findrunnable() // blocks until work is available
    }

    // This thread is going to run a goroutine and is not spinning anymore,
    // so if it was marked as spinning we need to reset it now and potentially
    // start a new spinning M.
    if _g_.m.spinning {
        // the m is about to run a g, so leave the spinning state
        resetspinning()
    }

    if gp.lockedm != 0 {
        // Hands off own p to the locked m,
        // then blocks waiting for a new p.
        // the g is locked to another m: hand our p to that m, park this m
        // until it gets a new p, and when woken resume here and jump to top
        startlockedm(gp)
        goto top
    }
    // run the chosen g
    execute(gp, inheritTime)
}

What schedule does:

schedule finds a g => [parks the M when necessary] => [resumes the search once woken] => execute runs the g => when the g's function returns it lands in goexit => schedule runs again

func execute(gp *g, inheritTime bool) {
    _g_ := getg()
    // flip gp's status to running; the assignments below also clear
    // any pending preemption request
    casgstatus(gp, _Grunnable, _Grunning)
    gp.waitsince = 0
    gp.preempt = false
    gp.stackguard0 = gp.stack.lo + _StackGuard
    if !inheritTime {
        // count this scheduling pass
        _g_.m.p.ptr().schedtick++
    }
    _g_.m.curg = gp
    gp.m = _g_.m
    // jump into the g's code
    gogo(&gp.sched)
}

What execute does:

(1) Call getg to get the current g0.

(2) Switch the G's status from runnable (_Grunnable) to running (_Grunning).

(3) Set the G's stackguard so its stack can grow when space runs short.

(4) Increment the P's schedtick (this is the counter behind the poll-the-global-queue-every-61-ticks rule above).

(5) Set g.m.curg = g and g.m = m.

(6) Call gogo, which restores the registers from g.sched and resumes the g.

gogo first invokes a write barrier for g.sched.ctxt (so the GC marks the pointer live); ctxt normally holds a pointer to the function-plus-arguments closure.

It then sets the g in TLS to g.sched.g, i.e. the g itself; sets rsp to g.sched.sp, rax to g.sched.ret, rdx to g.sched.ctxt (the context), and rbp to g.sched.bp; clears the saved fields in sched; and jumps to g.sched.pc. Because newproc1 set the return address to goexit, the goexit function is called when the goroutine's function returns.

After the target function finishes, goexit runs: goexit → mcall → goexit0.
mcall saves the exiting context as follows:

  • set g.sched.pc to the current return address
  • set g.sched.sp to the value of the rsp register
  • set g.sched.g to the current g
  • set g.sched.bp to the value of the rbp register
  • switch the current g in TLS to m.g0
  • set rsp to g0.sched.sp, i.e. switch to g0's stack
  • pass the original g as the first argument
  • set rdx to a pointer to the function address (the context)
  • call the given function, which never returns

In other words, mcall saves the current context, switches to g0 and g0's stack, and then calls the given function.
Getting back onto g0's stack is essential: at this point the g has been suspended, and if we kept using its stack while another M woke that same g, the results would be catastrophic.
Whether a G is suspended or has finished, it always returns to g0's stack through mcall to continue scheduling. The state saved by the mcall issued from goexit is actually redundant, since that G has already finished.

func goexit0(gp *g) {
    _g_ := getg()
    // flip the g's status to dead so it can go back on the free list
    casgstatus(gp, _Grunning, _Gdead)
    if isSystemGoroutine(gp) {
        atomic.Xadd(&sched.ngsys, -1)
    }
    // clear the g's fields
    gp.m = nil
    locked := gp.lockedm != 0
    gp.lockedm = 0
    _g_.m.lockedg = 0
    gp.paniconfault = false
    gp._defer = nil // should be true already but just in case.
    gp._panic = nil // non-nil for Goexit during panic. points at stack-allocated data.
    gp.writebuf = nil
    gp.waitreason = 0
    gp.param = nil
    gp.labels = nil
    gp.timer = nil

    // Note that gp's stack scan is now "valid" because it has no
    // stack.
    gp.gcscanvalid = true
    dropg()

    // put the g back on the free list so it can be reused
    gfput(_g_.m.p.ptr(), gp)
    // enter the scheduling loop again
    schedule()
}

goexit0 runs after the switch back to g0's stack; it does the following:

  • switch the G's status from running (_Grunning) to dead (_Gdead)
  • clear the G's fields
  • call dropg to detach the M from the G
  • call gfput to put the G on the P's free list, so the next goroutine creation can reuse it
  • call schedule to continue scheduling

Preemption

When preemption happens:

  1. the goroutine blocks on a channel, mutex, or another sync operation
  2. time.Sleep
  3. a network operation that is not yet ready
  4. GC
  5. the goroutine has been running, or has been in a system call, for too long (see the example after this list)
  6. others
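
Case 5 has a well-known limitation in the runtime analyzed here: preemption is delivered through the stack-growth check in function prologues, so a loop that makes no function calls offers no preemption point. A small demonstration (on Go releases before 1.14, which added signal-based preemption, this program can hang; on newer releases it prints and exits):

package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	runtime.GOMAXPROCS(1) // a single P makes the effect visible
	go func() {
		for {
			// no function calls: no stack-growth prologue runs, so the
			// stackguard0 poisoning described below is never noticed
		}
	}()
	time.Sleep(100 * time.Millisecond)
	fmt.Println("reached only if the runtime can preempt the loop")
}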

How preemption is implemented:

The sysmon function mentioned above is responsible for preemption. sysmon also handles netpoll (picking up fd events), retake (preemption), forcegc (forcing a GC after enough time has passed), and heap scavenging (releasing unused items from the free lists to reduce the memory footprint).

func retake(now int64) uint32 {
    n := 0
    // Prevent allp slice changes. This lock will be completely
    // uncontended unless we're already stopping the world.
    lock(&allpLock)
    // We can't use a range loop over allp because we may
    // temporarily drop the allpLock. Hence, we need to re-fetch
    // allp each time around the loop.
    for i := 0; i < len(allp); i++ {
        _p_ := allp[i]
        if _p_ == nil {
            // This can happen if procresize has grown
            // allp but not yet created new Ps.
            continue
        }
        pd := &_p_.sysmontick
        s := _p_.status
        if s == _Psyscall {
            // Retake P from syscall if it's there for more than 1 sysmon tick (at least 20us).
            // pd.syscalltick, i.e. _p_.sysmontick.syscalltick, is only updated here in sysmon,
            // while _p_.syscalltick is bumped on every system call. So the first sysmon pass
            // after a syscall begins never retakes the p; only from the second pass on can it
            // be retaken. The gap is at least 20us and at most about 10ms
            t := int64(_p_.syscalltick)
            if int64(pd.syscalltick) != t {
                pd.syscalltick = uint32(t)
                pd.syscallwhen = now
                continue
            }
            // On the one hand we don't want to retake Ps if there is no other work to do,
            // but on the other hand we want to retake them eventually
            // because they can prevent the sysmon thread from deep sleep.
            // skip the retake only if the p's run queue is empty, there are spinning Ms
            // or idle Ps to soak up work, and the syscall has lasted less than 10ms
            if runqempty(_p_) && atomic.Load(&sched.nmspinning)+atomic.Load(&sched.npidle) > 0 && pd.syscallwhen+10*1000*1000 > now {
                continue
            }
            // Drop allpLock so we can take sched.lock.
            unlock(&allpLock)
            // Need to decrement number of idle locked M's
            // (pretending that one more is running) before the CAS.
            // Otherwise the M from which we retake can exit the syscall,
            // increment nmidle and report deadlock.
            incidlelocked(-1)
            // retake the p by CASing its status to idle
            if atomic.Cas(&_p_.status, s, _Pidle) {
                if trace.enabled {
                    traceGoSysBlock(_p_)
                    traceProcStop(_p_)
                }
                n++
                _p_.syscalltick++
                // hand the p off to another m (handoffp, analyzed above)
                handoffp(_p_)
            }
            incidlelocked(1)
            lock(&allpLock)
        } else if s == _Prunning {
            // Preempt G if it's running for too long.
            // the p is running: preempt its g if it has been running too long
            t := int64(_p_.schedtick)
            if int64(pd.schedtick) != t {
                pd.schedtick = uint32(t)
                pd.schedwhen = now
                continue
            }
            // no preemption unless the g has run for more than forcePreemptNS (10ms)
            if pd.schedwhen+forcePreemptNS > now {
                continue
            }
            // request the preemption
            preemptone(_p_)
        }
    }
    unlock(&allpLock)
    return uint32(n)
}
func preemptone(_p_ *p) bool {
    mp := _p_.m.ptr()
    if mp == nil || mp == getg().m {
        return false
    }
    gp := mp.curg
    if gp == nil || gp == mp.g0 {
        return false
    }
    // set the preemption flag
    gp.preempt = true

    // Every call in a go routine checks for stack overflow by
    // comparing the current stack pointer to gp->stackguard0.
    // Setting gp->stackguard0 to StackPreempt folds
    // preemption into the normal stack overflow check.
    // poison stackguard0 so the next stack-overflow check is guaranteed to fire
    gp.stackguard0 = stackPreempt
    return true
}
retake iterates over all Ps. If a P is in a system call (_Psyscall) and a full sysmon pass has elapsed since the call began (20us~10ms), retake takes the P back and calls handoffp to detach the M from the P. If a P is running (_Prunning), a full pass has elapsed, and the G has been running longer than forcePreemptNS (10ms), retake preempts it through preemptone, which sets g.preempt = true and g.stackguard0 = stackPreempt.


The prologue of a Go function compares the stack pointer against stackguard0 to decide whether the stack needs to grow. stackPreempt is a special constant, larger than any real stack address, so the check is guaranteed to trigger.
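
The constant's value makes this easy to see. Here it is recomputed for illustration; in the runtime it is defined in stack.go as (1<<(8*sys.PtrSize) - 1) & -1314:

package main

import "fmt"

func main() {
	// On 64-bit this is 0xfffffffffffffade: far above any plausible
	// stack address, so "SP below stackguard0" always holds.
	const stackPreempt = ^uintptr(0) - 1313
	fmt.Printf("stackPreempt = %#x\n", stackPreempt)
}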

Stack growth enters through the morestack_noctxt function, which clears the context register (rdx on amd64; R26 in the arm64 assembly below) and falls through to morestack.

TEXT runtime·morestack(SB),NOSPLIT,$-8-0
	// Cannot grow scheduler stack (m->g0).
	MOVD	g_m(g), R8
	MOVD	m_g0(R8), R4
	CMP	g, R4
	BNE	3(PC)
	BL	runtime·badmorestackg0(SB)
	B	runtime·abort(SB)

	// Cannot grow signal stack (m->gsignal).
	MOVD	m_gsignal(R8), R4
	CMP	g, R4
	BNE	3(PC)
	BL	runtime·badmorestackgsignal(SB)
	B	runtime·abort(SB)

	// Called from f.
	// Set g->sched to context in f
	MOVD	RSP, R0
	MOVD	R0, (g_sched+gobuf_sp)(g)
	MOVD	LR, (g_sched+gobuf_pc)(g)
	MOVD	R3, (g_sched+gobuf_lr)(g)
	MOVD	R26, (g_sched+gobuf_ctxt)(g)

	// Called from f.
	// Set m->morebuf to f's callers.
	MOVD	R3, (m_morebuf+gobuf_pc)(R8)	// f's caller's PC
	MOVD	RSP, R0
	MOVD	R0, (m_morebuf+gobuf_sp)(R8)	// f's caller's RSP
	MOVD	g, (m_morebuf+gobuf_g)(R8)

	// Call newstack on m->g0's stack.
	MOVD	m_g0(R8), g
	BL	runtime·save_g(SB)
	MOVD	(g_sched+gobuf_sp)(g), R0
	MOVD	R0, RSP
	MOVD.W	$0, -16(RSP)	// create a call frame on g0 (saved LR; keep 16-aligned)
	BL	runtime·newstack(SB)
	// Not reached, but make sure the return PC from the call to newstack
	// is still in this function, and not the beginning of the next.
	UNDEF

TEXT runtime·morestack_noctxt(SB),NOSPLIT,$-4-0
	MOVW	$0, R26
	B runtime·morestack(SB)

morestack saves the G's state into g.sched, switches to g0's stack, and calls newstack.
newstack is a Go function; it checks whether this entry was caused by preemption, and if so goes through gopreempt_m → goschedImpl to complete the preemption.

func goschedImpl(gp *g) {
    status := readgstatus(gp)
    if status&^_Gscan != _Grunning {
        dumpgstatus(gp)
        throw("bad g status")
    }
    casgstatus(gp, _Grunning, _Grunnable)
    dropg()
    lock(&sched.lock)
    globrunqput(gp)
    unlock(&sched.lock)

    schedule()
}

goschedImpl does the following:

(1) Switch the G's status from running (_Grunning) to runnable (_Grunnable)

(2) Call dropg to detach the M from the G

(3) Call globrunqput to put the G on the global run queue

(4) Call schedule to continue scheduling

And so we are back in schedule: the loop goes round and round.
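
runtime.Gosched takes exactly this path (Gosched → mcall(gosched_m) → goschedImpl), which makes the round trip easy to observe; on a single P the yielded-to goroutine usually runs first:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	runtime.GOMAXPROCS(1)
	go fmt.Println("the other goroutine runs first")
	// Gosched parks the current g on the global run queue via
	// goschedImpl; schedule() then picks the next runnable g.
	runtime.Gosched()
	fmt.Println("main resumes after being rescheduled")
}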

 
