一、背景
本文基于client-go@v0.16.4
之前业务中有这么一段伪代码:
func batchCreateVmi(ds []UserData) {
wg := &sync.WaitGroup{}
// vmiNum会动态改变,开发测试一般只有十几
// 生产能到200(有逻辑限制了上限200)
const vmiNum = 200
for i := 0; i < vmiNum; i++ {
wg.Add(1)
createOneVmi(wg, ds[i])
}
wg.Wait()
}
func createOneVmi(wg *sync.WaitGroup, data UserData) {
defer wg.Done()
// 15秒超时的context
ctx, _ := context.WithTimeout(context.Backgroud, 15*time.Second)
// 从数据库中查询vmi模板数据
vmiTemplate, _ := getVmiTemplateFromDB(ctx, data.TemplateID)
// 用户数据和模板数据结合
target := combindTemplateAndData(vmiTemplate, data)
// client-go调api server接口创建vmi
// 和其它协程用的相同的client
createVmi(target)
// 更新数据库中的状态
// 注意这里用的还是最开始的context
if err := updateDBStatus(ctx, data); err != nil {
log.Errorf("updateDBStatus error: %s", err.Error())
return
}
}
开发测试功能验证并没有发现什么问题,然而到了生产之后,出现大量的updateDBStatus error: context deadline
错误,很明显是context超时了,正常来说整个函数走完都是在1秒之内的,到底是哪个环节导致的15秒context的超时呢?
在这段代码中,有数据库操作和调apiServer接口创建vmi的操作。首先怀疑是否是数据库性能问题——查看数据库性能监控数据,未发现异常;接着怀疑apiServer压力——同样没有发现异常信息;再查看服务到数据库和服务到apiServer之间的网络——仍未发现异常。
最后,怀疑客户端有问题。通过查阅一些资料,发现可能是client-go默认的QPS和Burst参数导致的:默认QPS为5,Burst为10
:
// k8s.io/client-go/rest/config.go
// Config holds the common attributes that can be passed to a Kubernetes client on
// initialization.
type Config struct {
/*...*/
// QPS indicates the maximum QPS to the master from this client.
// If it's zero, the created RESTClient will use DefaultQPS: 5
QPS float32
// Maximum burst for throttle.
// If it's zero, the created RESTClient will use DefaultBurst: 10.
Burst int
/*...*/
}
QPS和Burst是client-go令牌桶限流的两个参数,其中QPS=5表示每秒产生5个令牌放到令牌桶中,Burst=10表示令牌桶的容量是10
,client-go只有拿到了令牌才能对apiServer发请求,这意味着默认最多一秒钟发送15个请求(令牌桶中10个令牌+这一秒新产生的5个令牌),但是第二秒开始只有新产生的5个令牌,因此第二秒只能发送5个请求。开发测试流量小,最多也就是十几并发的样子,生产高峰时段能到上限200,如果每秒发送5个,200个就需要200/5=40秒,远远超过context的15秒超时时间。
在开发环境上量压测,复现问题;增大QPS和Burst,不再出现超时现象:
import (
"k8s.io/client-go/kubernetes"
"k8s.io/client-go/tools/clientcmd"
"myproject/myconfig" // 项目中的配置项
)
func initClientSe() error {
/*...*/
restConfig, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
if err != nil {
return err
}
// 如果配置文件中配置了QPS和Burst,以配置文件中的为准
// 很多开源项目把QPS和Burst都配置为100
if myconfig.QPS > 0 {
restConfig.QPS = myconfig.QPS
}
if myconfig.Burst > 0 {
restConfig.Burst = myconfig.Burst
}
// 修改QPS和Burst后初始化clientSet
clientSet, err := kubernetes.NewForConfig(restConfig)
/*..*/
}
二、client-go令牌桶限流源码分析
接着前文client-go初始化部分,在kubernetes.NewForConfig中有对应的限流初始化逻辑:
// k8s.io/client-go/kubernetes/lientset.go
// NewForConfig creates a new Clientset for the given config.
// If config's RateLimiter is not set and QPS and Burst are acceptable,
// NewForConfig will generate a rate-limiter in configShallowCopy.
func NewForConfig(c *rest.Config) (*Clientset, error) {
configShallowCopy := *c
if configShallowCopy.RateLimiter == nil && configShallowCopy.QPS > 0 {
if configShallowCopy.Burst <= 0 {
return nil, fmt.Errorf("Burst is required to be greater than 0 when RateLimiter is not set and QPS is set to greater than 0")
}
configShallowCopy.RateLimiter = flowcontrol.NewTokenBucketRateLimiter(configShallowCopy.QPS, configShallowCopy.Burst)
}
/*...*/
}
可以看到有调用flowcontrol(流控)包下的NewTokenBucketRateLimiter方法初始化令牌桶限速器,返回一个RateLimiter,RateLimiter定义如下:
type RateLimiter interface {
// TryAccept returns true if a token is taken immediately. Otherwise,
// it returns false.
TryAccept() bool
// Accept returns once a token becomes available.
Accept()
// Stop stops the rate limiter, subsequent calls to CanAccept will return false
Stop()
// QPS returns QPS of this rate limiter
QPS() float32
// Wait returns nil if a token is taken before the Context is done.
Wait(ctx context.Context) error
}
client-go请求apiServer的request函数有如下逻辑,每次请求apiServer都会先执行tryThrottle:
// k8s.io/client-go/rest/request.go
// Do formats and executes the request. Returns a Result object for easy response
// processing.
//
// Error type:
// * If the request can't be constructed, or an error happened earlier while building its
// arguments: *RequestConstructionError
// * If the server responds with a status: *errors.StatusError or *errors.UnexpectedObjectError
// * http.Client.Do errors are returned directly.
func (r *Request) Do() Result {
if err := r.tryThrottle(); err != nil {
return Result{err: err}
}
var result Result
err := r.request(func(req *http.Request, resp *http.Response) {
result = r.transformResponse(resp, req)
})
if err != nil {
return Result{err: err}
}
return result
}
tryThrottle
就是与令牌桶限速关系密切的函数,其实现如下:
// k8s.io/client-go/rest/request.go
func (r *Request) tryThrottle() error {
if r.throttle == nil {
return nil
}
now := time.Now()
var err error
if r.ctx != nil {
err = r.throttle.Wait(r.ctx)
} else {
r.throttle.Accept()
}
if latency := time.Since(now); latency > longThrottleLatency {
klog.V(4).Infof("Throttling request took %v, request: %s:%s", latency, r.verb, r.URL().String())
}
return err
}
tryThrottle函数主要涉及r.throttle.Wait
和r.throttle.Accept
两个函数,而r.throttle正是前面NewForConfig中初始化的RateLimiter。因此我们来看看RateLimiter的Wait和Accept实现:
// k8s.io/client-go/util/flowcontrol/throttle.go
func (t *tokenBucketRateLimiter) Wait(ctx context.Context) error {
return t.limiter.Wait(ctx)
}
func (t *tokenBucketRateLimiter) Accept() {
now := t.clock.Now()
t.clock.Sleep(t.limiter.ReserveN(now, 1).DelayFrom(now))
}
可以看到Wait和Accept基本没什么逻辑,主要还是调用limter的Wait方法和ReserveN方法。limiter初始化如下:
// k8s.io/client-go/util/flowcontrol/throttle.go
import (
"golang.org/x/time/rate"
)
// NewTokenBucketRateLimiter creates a rate limiter which implements a token bucket approach.
// The rate limiter allows bursts of up to 'burst' to exceed the QPS, while still maintaining a
// smoothed qps rate of 'qps'.
// The bucket is initially filled with 'burst' tokens, and refills at a rate of 'qps'.
// The maximum number of tokens in the bucket is capped at 'burst'.
func NewTokenBucketRateLimiter(qps float32, burst int) RateLimiter {
limiter := rate.NewLimiter(rate.Limit(qps), burst)
return newTokenBucketRateLimiter(limiter, realClock{}, qps)
}
func newTokenBucketRateLimiter(limiter *rate.Limiter, c Clock, qps float32) RateLimiter {
return &tokenBucketRateLimiter{
limiter: limiter,
clock: c,
qps: qps,
}
}
limiter是golang.org/x/time/rate
包中初始化的对象,我们来看看这个包下Wait和ReserveN两个方法的实现。
golang.org/x/time/rate包对应github上github.com/golang/time/rate
// Wait is shorthand for WaitN(ctx, 1).
func (lim *Limiter) Wait(ctx context.Context) (err error) {
return lim.WaitN(ctx, 1)
}
// WaitN blocks until lim permits n events to happen.
// It returns an error if n exceeds the Limiter's burst size, the Context is
// canceled, or the expected wait time exceeds the Context's Deadline.
// The burst limit is ignored if the rate limit is Inf.
func (lim *Limiter) WaitN(ctx context.Context, n int) (err error) {
// The test code calls lim.wait with a fake timer generator.
// This is the real timer generator.
newTimer := func(d time.Duration) (<-chan time.Time, func() bool, func()) {
timer := time.NewTimer(d)
return timer.C, timer.Stop, func() {}
}
return lim.wait(ctx, n, time.Now(), newTimer)
}
// wait is the internal implementation of WaitN.
func (lim *Limiter) wait(ctx context.Context, n int, t time.Time, newTimer func(d time.Duration) (<-chan time.Time, func() bool, func())) error {
lim.mu.Lock()
burst := lim.burst
limit := lim.limit
lim.mu.Unlock()
if n > burst && limit != Inf {
return fmt.Errorf("rate: Wait(n=%d) exceeds limiter's burst %d", n, burst)
}
// Check if ctx is already cancelled
select {
case <-ctx.Done():
return ctx.Err()
default:
}
// Determine wait limit
waitLimit := InfDuration
if deadline, ok := ctx.Deadline(); ok {
waitLimit = deadline.Sub(t)
}
// Reserve
r := lim.reserveN(t, n, waitLimit)
if !r.ok {
return fmt.Errorf("rate: Wait(n=%d) would exceed context deadline", n)
}
// Wait if necessary
delay := r.DelayFrom(t)
if delay == 0 {
return nil
}
ch, stop, advance := newTimer(delay)
defer stop()
advance() // only has an effect when testing
select {
case <-ch:
// We can proceed.
return nil
case <-ctx.Done():
// Context was canceled before we could proceed. Cancel the
// reservation, which may permit other events to proceed sooner.
r.Cancel()
return ctx.Err()
}
}
// reserveN is a helper method for AllowN, ReserveN, and WaitN.
// maxFutureReserve specifies the maximum reservation wait duration allowed.
// reserveN returns Reservation, not *Reservation, to avoid allocation in AllowN and WaitN.
func (lim *Limiter) reserveN(t time.Time, n int, maxFutureReserve time.Duration) Reservation {
lim.mu.Lock()
defer lim.mu.Unlock()
if lim.limit == Inf {
return Reservation{
ok: true,
lim: lim,
tokens: n,
timeToAct: t,
}
} else if lim.limit == 0 {
var ok bool
if lim.burst >= n {
ok = true
lim.burst -= n
}
return Reservation{
ok: ok,
lim: lim,
tokens: lim.burst,
timeToAct: t,
}
}
t, tokens := lim.advance(t)
// Calculate the remaining number of tokens resulting from the request.
tokens -= float64(n)
// Calculate the wait duration
var waitDuration time.Duration
if tokens < 0 {
waitDuration = lim.limit.durationFromTokens(-tokens)
}
// Decide result
ok := n <= lim.burst && waitDuration <= maxFutureReserve
// Prepare reservation
r := Reservation{
ok: ok,
lim: lim,
limit: lim.limit,
}
if ok {
r.tokens = n
r.timeToAct = t.Add(waitDuration)
// Update state
lim.last = t
lim.tokens = tokens
lim.lastEvent = r.timeToAct
}
return r
}
// advance calculates and returns an updated state for lim resulting from the passage of time.
// lim is not changed.
// advance requires that lim.mu is held.
func (lim *Limiter) advance(t time.Time) (newT time.Time, newTokens float64) {
last := lim.last
if t.Before(last) {
last = t
}
// Calculate the new number of tokens, due to time that passed.
elapsed := t.Sub(last)
delta := lim.limit.tokensFromDuration(elapsed)
tokens := lim.tokens + delta
if burst := float64(lim.burst); tokens > burst {
tokens = burst
}
return t, tokens
}
// tokensFromDuration is a unit conversion function from a time duration to the number of tokens
// which could be accumulated during that duration at a rate of limit tokens per second.
func (limit Limit) tokensFromDuration(d time.Duration) float64 {
if limit <= 0 {
return 0
}
return d.Seconds() * float64(limit)
}
三、总结
golang.org/x/time/rate
下的rate.go文件只有400多行,上个章节有省略部分代码,如果对省略部分有兴趣,可以自行研读官网上的代码。
从前面的代码来看,client-go的令牌桶限流实现有以下几个注意点:
- Burst表示令牌桶的大小,QPS表示1s可以产生多少令牌。在golang.org/x/time/rate包中,如果QPS设置为InfDuration(1<<63-1,表示无限大),则client每次都能
立刻
拿到令牌,不会受限于Burst; - 如果client-go Burst=10,QPS=5,则最多一秒内可以有15个请求可以拿到令牌(令牌桶10个+这1秒内新产生的5个),但不表示每秒都能有15个请求能拿到令牌;
- golang.org/x/time/rate产生令牌并
没有单独起个协程定时往令牌桶里放令牌,而是当有拿令牌的请求过来时,计算当前时间与上次生成令牌的时间差,再结合QPS参数,往令牌桶里生成对应数量的令牌
(如果时间间隔太长,计算出来的令牌数量大于Burst,则生成Burst个令牌),再从令牌桶中拿对应数量的令牌(client-go中1个请求1个令牌)。
因此,有如下示意图:
微信公众号卡巴斯同步发布,欢迎大家关注。