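Before discussing circuit breaking, rate limiting, and degradation, let's first review service avalanches.
I. Service Avalanche
- Causes of a service avalanche
- Strategies for handling a service avalanche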
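II. Overview of Circuit Breaking, Rate Limiting, and Degradation
Both rate limiting and circuit breaking ultimately degrade the user experience.
- Rate limiting: traffic is 2k, but the service can only handle 1k, so what happens to the excess traffic?
  - a. Reject it;
  - b. Queue it and wait;
  - User experience
    - Poor user experience: "Too many users right now, please try again later"
    - Degraded user experience: browsing and ordering used to be smooth -> "Too many users right now, please try again later"
- Circuit breaking:
  - Suppose service A calls service B, and B is slow (B is overloaded, so many requests fail). The caller easily runs into a problem: every call times out
  - If the database then has a problem and callers retry on timeout, the 2k of traffic suddenly becomes 3k. This makes things worse for service B, which was already at full load, and B goes down
  - A circuit-breaking mechanism (a fuse is an apt analogy) could cut off the calls at this point when it detects one of the following:
    - a. Most requests are slow, for example 50% of requests are slow
    - b. 50% of requests fail
    - c. The number of errors is high, for example 20 errors within 1s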
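The three trigger conditions (slow-request ratio, error ratio, error count) can be written as a single check. This is a minimal sketch using only the standard library. The `windowStats` type and `shouldTrip` function are hypothetical names made up for illustration, and the 50% and 20-error limits are just the example figures from the list, not defaults of any library.

```go
package main

import "fmt"

// windowStats is a hypothetical summary of one statistics window.
type windowStats struct {
	Total  int // requests observed in the window
	Slow   int // requests slower than the allowed response time
	Errors int // requests that returned an error
}

// shouldTrip reports whether any of the three example conditions holds:
// at least 50% slow requests, at least 50% failed requests, or at least 20 errors.
func shouldTrip(s windowStats) bool {
	if s.Total == 0 {
		return false // nothing observed, nothing to judge
	}
	slowRatio := float64(s.Slow) / float64(s.Total)
	errRatio := float64(s.Errors) / float64(s.Total)
	return slowRatio >= 0.5 || errRatio >= 0.5 || s.Errors >= 20
}

func main() {
	fmt.Println(shouldTrip(windowStats{Total: 100, Slow: 60}))    // true: 60% of requests are slow
	fmt.Println(shouldTrip(windowStats{Total: 100, Errors: 10}))  // false: no condition is met
	fmt.Println(shouldTrip(windowStats{Total: 1000, Errors: 25})) // true: 25 errors in the window
}
```

A real circuit breaker evaluates such a check per statistics window and usually applies only one of the strategies per rule.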
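III. Technology Selection for Circuit Breaking and Rate Limiting
- Hystrix: https://github.com/Netflix/Hystrix; open-sourced by Netflix, no longer maintained
- Sentinel: https://github.com/alibaba/Sentinel; open-sourced by Alibaba, actively maintained
- sentinel-golang: https://github.com/alibaba/sentinel-golang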
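Feature | Sentinel | Hystrix |
---|---|---|
Isolation strategy | Semaphore isolation | Thread pool isolation / semaphore isolation |
Circuit breaking and degradation strategy | Based on response time or failure ratio | Based on failure ratio |
Real-time metrics implementation | Sliding window | Sliding window (based on RxJava) |
Rule configuration | Supports multiple data sources | Supports multiple data sources |
Extensibility | Multiple extension points | Plugins |
Annotation support | Supported | Supported |
Rate limiting | Based on QPS; supports rate limiting based on call relationships | Limited support |
Traffic shaping | Supports slow start and uniform-rate modes | Not supported |
System load protection | Supported | Not supported |
Console | Works out of the box; configure rules, view per-second monitoring, machine discovery, etc. | Incomplete |
Adapters for common frameworks | Servlet, Spring Cloud, Dubbo, gRPC, etc. | Servlet, Spring Cloud Netflix |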
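IV. Rate Limiting with Sentinel
1 - QPS-based rate limiting
- QPS-based rate limiting:
  - Entry marks an entry point; base.Inbound marks inbound traffic
  - StatIntervalInMs: the statistics interval of the independent statistics structure used by the rule's traffic controller. When StatIntervalInMs is 1000, the statistic is QPS
  - Test result: the first 10 calls pass the check, and the 11th and 12th are rate-limited
- ControlBehavior: the control behavior of the traffic controller. Sentinel currently supports two control behaviors
  - Reject: if the number of requests counted in the current statistics interval exceeds the threshold, the request is rejected immediately
  - Throttling: uniform-rate queueing. The core idea is to let requests pass at a fixed time interval
- Throttling (uniform-rate queueing)
- Reject strategy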
package main

import (
	"fmt"
	"log"

	sentinel "github.com/alibaba/sentinel-golang/api"
	"github.com/alibaba/sentinel-golang/core/base"
	"github.com/alibaba/sentinel-golang/core/flow"
)

func main() {
	// Initialize Sentinel first
	err := sentinel.InitDefault()
	if err != nil {
		log.Fatalf("failed to initialize sentinel: %v", err)
	}

	// Configure the rate limiting rules
	_, err = flow.LoadRules([]*flow.Rule{
		{
			Resource:               "some-test",
			TokenCalculateStrategy: flow.Direct,
			ControlBehavior:        flow.Reject, // reject directly
			Threshold:              10,
			StatIntervalInMs:       1000,
		},
		{
			Resource:               "some-test2",
			TokenCalculateStrategy: flow.Direct,
			ControlBehavior:        flow.Reject, // reject directly
			Threshold:              10,
			StatIntervalInMs:       1000,
		},
	})
	if err != nil {
		log.Fatalf("failed to load rules: %v", err)
	}

	// 12 calls against a threshold of 10 per second
	for i := 0; i < 12; i++ {
		e, b := sentinel.Entry("some-test", sentinel.WithTrafficType(base.Inbound))
		if b != nil {
			fmt.Println("rate limited")
		} else {
			fmt.Println("check passed")
			e.Exit()
		}
	}
}
2 - Throttling strategy
- Throttling strategy: change the rule to
ControlBehavior: flow.Throttling, // pass at a uniform rate
package main

import (
	"fmt"
	"log"

	sentinel "github.com/alibaba/sentinel-golang/api"
	"github.com/alibaba/sentinel-golang/core/base"
	"github.com/alibaba/sentinel-golang/core/flow"
)

func main() {
	// Initialize Sentinel first
	err := sentinel.InitDefault()
	if err != nil {
		log.Fatalf("failed to initialize sentinel: %v", err)
	}

	// Configure the rate limiting rule
	_, err = flow.LoadRules([]*flow.Rule{
		{
			Resource:               "some-test",
			TokenCalculateStrategy: flow.Direct,
			ControlBehavior:        flow.Throttling, // pass at a uniform rate
			Threshold:              10,
			StatIntervalInMs:       1000,
		},
	})
	if err != nil {
		log.Fatalf("failed to load rules: %v", err)
	}

	for i := 0; i < 12; i++ {
		e, b := sentinel.Entry("some-test", sentinel.WithTrafficType(base.Inbound))
		if b != nil {
			fmt.Println("rate limited")
		} else {
			fmt.Println("check passed")
			e.Exit()
		}
	}
}
- Verifying the Throttling strategy: add a 101ms sleep after each Entry, and every request passes
package main

import (
	"fmt"
	"log"
	"time"

	sentinel "github.com/alibaba/sentinel-golang/api"
	"github.com/alibaba/sentinel-golang/core/base"
	"github.com/alibaba/sentinel-golang/core/flow"
)

func main() {
	// Initialize Sentinel first
	err := sentinel.InitDefault()
	if err != nil {
		log.Fatalf("failed to initialize sentinel: %v", err)
	}

	// Configure the rate limiting rule
	_, err = flow.LoadRules([]*flow.Rule{
		{
			Resource:               "some-test",
			TokenCalculateStrategy: flow.Direct,
			ControlBehavior:        flow.Throttling, // pass at a uniform rate
			Threshold:              10,
			StatIntervalInMs:       1000,
		},
	})
	if err != nil {
		log.Fatalf("failed to load rules: %v", err)
	}

	for i := 0; i < 12; i++ {
		e, b := sentinel.Entry("some-test", sentinel.WithTrafficType(base.Inbound))
		if b != nil {
			fmt.Println("rate limited")
		} else {
			fmt.Println("check passed")
			e.Exit()
		}
		// Slightly longer than the 100ms spacing implied by 10 requests per 1000ms
		time.Sleep(101 * time.Millisecond)
	}
}
3 - Sentinel warm-up / cold start
- WarmUp: the warm-up (cold start) mode. When a system has been under low load for a long time and traffic suddenly surges, pushing the system straight to high load may overwhelm it instantly. With a "cold start", the traffic allowed through increases slowly and reaches the threshold gradually within a set time, giving the cold system time to warm up and preventing it from being overwhelmed
- Testing warm-up
package main

import (
	"fmt"
	"log"
	"math/rand"
	"sync/atomic"
	"time"

	sentinel "github.com/alibaba/sentinel-golang/api"
	"github.com/alibaba/sentinel-golang/core/base"
	"github.com/alibaba/sentinel-golang/core/flow"
)

func main() {
	// Initialize Sentinel first
	err := sentinel.InitDefault()
	if err != nil {
		log.Fatalf("failed to initialize sentinel: %v", err)
	}

	// The counters are updated from many goroutines, so use atomic operations
	var globalTotal, passTotal, blockTotal int64
	ch := make(chan struct{})

	// Configure the rate limiting rule
	_, err = flow.LoadRules([]*flow.Rule{
		{
			Resource:               "some-test",
			TokenCalculateStrategy: flow.WarmUp, // cold start strategy
			ControlBehavior:        flow.Reject, // reject directly
			Threshold:              1000,
			WarmUpPeriodSec:        30,
		},
	})
	if err != nil {
		log.Fatalf("failed to load rules: %v", err)
	}

	// Once per second, report how many requests were made in that second,
	// how many passed, and how many were blocked. Expect many blocks per second.
	for i := 0; i < 100; i++ {
		go func() {
			for {
				atomic.AddInt64(&globalTotal, 1)
				e, b := sentinel.Entry("some-test", sentinel.WithTrafficType(base.Inbound))
				if b != nil {
					//fmt.Println("rate limited")
					atomic.AddInt64(&blockTotal, 1)
					time.Sleep(time.Duration(rand.Uint64()%10) * time.Millisecond)
				} else {
					atomic.AddInt64(&passTotal, 1)
					time.Sleep(time.Duration(rand.Uint64()%10) * time.Millisecond)
					e.Exit()
				}
			}
		}()
	}

	go func() {
		var oldTotal, oldPass, oldBlock int64 // totals seen at the previous tick
		for {
			time.Sleep(time.Second)
			curTotal := atomic.LoadInt64(&globalTotal)
			curPass := atomic.LoadInt64(&passTotal)
			curBlock := atomic.LoadInt64(&blockTotal)
			// Print the deltas for the past second
			fmt.Printf("total:%d, pass:%d, block:%d\n", curTotal-oldTotal, curPass-oldPass, curBlock-oldBlock)
			oldTotal, oldPass, oldBlock = curTotal, curPass, curBlock
		}
	}()
	<-ch
}
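When reading the per-second output of the warm-up test, it helps to have a rough picture of how the allowed rate grows. The sketch below is a deliberately simplified linear ramp and not Sentinel's actual warm-up algorithm. `warmUpThreshold` is a hypothetical function, and the cold factor of 3 (so the ramp starts at one third of the threshold) is an assumption chosen for illustration.

```go
package main

import "fmt"

// warmUpThreshold returns the allowed QPS after elapsedSec seconds of sustained
// traffic. It ramps linearly from threshold/coldFactor up to threshold over
// periodSec seconds. This is an illustrative model, not Sentinel's algorithm.
func warmUpThreshold(threshold, coldFactor, periodSec, elapsedSec float64) float64 {
	cold := threshold / coldFactor // allowed rate while the system is still cold
	if elapsedSec >= periodSec {
		return threshold // fully warmed up
	}
	if elapsedSec < 0 {
		elapsedSec = 0
	}
	return cold + (threshold-cold)*elapsedSec/periodSec
}

func main() {
	// Threshold: 1000 and WarmUpPeriodSec: 30 as in the rule above; cold factor 3 is assumed.
	for _, t := range []float64{0, 10, 20, 30} {
		fmt.Printf("t=%2.0fs allowed QPS=%.0f\n", t, warmUpThreshold(1000, 3, 30, t))
	}
}
```

Under these assumptions the allowed rate starts at roughly a third of the threshold and reaches 1000 after 30 seconds, which is the general shape to look for in the pass counts printed by the test.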
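V. Circuit Breaking with Sentinel
1 - Circuit breaker model
- Sentinel circuit breaker model: Sentinel's circuit breaking and degradation is based on the circuit breaker pattern. Each circuit breaker maintains an internal state machine
- A circuit breaker has three states
  - Closed: also the initial state. The circuit breaker stays closed, and access to the resource passes the breaker's check directly
  - Open: the circuit breaker is tripped, and access to the resource is cut off
  - Half-Open: apart from probe traffic, all other access to the resource is still cut off. Probe traffic means that while the breaker is half-open, it periodically allows a certain number of probe requests through. If a probe request returns normally, the probe succeeds and the breaker resets to Closed, ending the circuit break; if the probe fails, the breaker falls back to Open
- Quiet period:
  - All three of Sentinel's circuit breaking strategies support a quiet period (expressed by the MinRequestAmount field in the rule)
  - The quiet period is a minimum number of requests: within a statistics interval, if the number of requests to the resource is below this number, the circuit breaker will not change its state based on its statistics
  - The rationale is simple. Suppose a statistics interval has just begun and the very first request happens to be slow. The slow-call ratio would then be 100%, which is clearly unreasonable because it is a coincidence
  - The quiet period therefore improves the breaker's accuracy and reduces the chance of a false trip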
2 - Circuit breaking based on error count
package main

import (
	"errors"
	"fmt"
	"log"
	"math/rand"
	"sync/atomic"
	"time"

	sentinel "github.com/alibaba/sentinel-golang/api"
	"github.com/alibaba/sentinel-golang/core/circuitbreaker"
	"github.com/alibaba/sentinel-golang/core/config"
	"github.com/alibaba/sentinel-golang/logging"
	"github.com/alibaba/sentinel-golang/util"
)

type stateChangeTestListener struct{}

func (s *stateChangeTestListener) OnTransformToClosed(prev circuitbreaker.State, rule circuitbreaker.Rule) {
	fmt.Printf("rule.strategy: %+v, From %s to Closed, time: %d\n", rule.Strategy, prev.String(), util.CurrentTimeMillis())
}

func (s *stateChangeTestListener) OnTransformToOpen(prev circuitbreaker.State, rule circuitbreaker.Rule, snapshot interface{}) {
	fmt.Printf("rule.strategy: %+v, From %s to Open, snapshot: %d, time: %d\n", rule.Strategy, prev.String(), snapshot, util.CurrentTimeMillis())
}

func (s *stateChangeTestListener) OnTransformToHalfOpen(prev circuitbreaker.State, rule circuitbreaker.Rule) {
	fmt.Printf("rule.strategy: %+v, From %s to Half-Open, time: %d\n", rule.Strategy, prev.String(), util.CurrentTimeMillis())
}

func main() {
	// The counters are shared across goroutines, so update them atomically
	var total, totalPass, totalBlock, totalErr int64

	conf := config.NewDefaultConfig()
	// for testing, logging output to console
	conf.Sentinel.Log.Logger = logging.NewConsoleLogger()
	err := sentinel.InitWithConfig(conf)
	if err != nil {
		log.Fatal(err)
	}
	ch := make(chan struct{})

	// Register a state change listener so that we can observe the state change of the internal circuit breaker.
	circuitbreaker.RegisterStateChangeListeners(&stateChangeTestListener{})

	_, err = circuitbreaker.LoadRules([]*circuitbreaker.Rule{
		// Statistic time span=5s, recoveryTimeout=3s, maxErrorCount=50
		{
			Resource:         "abc",
			Strategy:         circuitbreaker.ErrorCount,
			RetryTimeoutMs:   3000, // try to recover after 3s
			MinRequestAmount: 10,   // quiet-period request count
			StatIntervalMs:   5000,
			Threshold:        50,
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	logging.Info("[CircuitBreaker ErrorCount] Sentinel Go circuit breaking demo is running. You may see the pass/block metric in the metric log.")
	go func() {
		for {
			atomic.AddInt64(&total, 1)
			e, b := sentinel.Entry("abc")
			if b != nil {
				// g1 blocked
				atomic.AddInt64(&totalBlock, 1)
				fmt.Println("goroutine blocked by the circuit breaker")
				time.Sleep(time.Duration(rand.Uint64()%20) * time.Millisecond)
			} else {
				atomic.AddInt64(&totalPass, 1)
				if rand.Uint64()%20 > 9 {
					atomic.AddInt64(&totalErr, 1)
					// Record current invocation as error.
					sentinel.TraceError(e, errors.New("biz error"))
				}
				// g1 passed
				time.Sleep(time.Duration(rand.Uint64()%20+10) * time.Millisecond)
				e.Exit()
			}
		}
	}()
	go func() {
		for {
			atomic.AddInt64(&total, 1)
			e, b := sentinel.Entry("abc")
			if b != nil {
				// g2 blocked
				atomic.AddInt64(&totalBlock, 1)
				time.Sleep(time.Duration(rand.Uint64()%20) * time.Millisecond)
			} else {
				// g2 passed
				atomic.AddInt64(&totalPass, 1)
				time.Sleep(time.Duration(rand.Uint64()%80) * time.Millisecond)
				e.Exit()
			}
		}
	}()
	go func() {
		for {
			// Print the cumulative error count once per second
			time.Sleep(time.Second)
			fmt.Println(atomic.LoadInt64(&totalErr))
		}
	}()
	<-ch
}
3 - Circuit breaking based on error ratio
package main

import (
	"errors"
	"fmt"
	"log"
	"math/rand"
	"sync/atomic"
	"time"

	sentinel "github.com/alibaba/sentinel-golang/api"
	"github.com/alibaba/sentinel-golang/core/circuitbreaker"
	"github.com/alibaba/sentinel-golang/core/config"
	"github.com/alibaba/sentinel-golang/logging"
	"github.com/alibaba/sentinel-golang/util"
)

type stateChangeTestListener struct{}

func (s *stateChangeTestListener) OnTransformToClosed(prev circuitbreaker.State, rule circuitbreaker.Rule) {
	fmt.Printf("rule.strategy: %+v, From %s to Closed, time: %d\n", rule.Strategy, prev.String(), util.CurrentTimeMillis())
}

func (s *stateChangeTestListener) OnTransformToOpen(prev circuitbreaker.State, rule circuitbreaker.Rule, snapshot interface{}) {
	fmt.Printf("rule.strategy: %+v, From %s to Open, snapshot: %.2f, time: %d\n", rule.Strategy, prev.String(), snapshot, util.CurrentTimeMillis())
}

func (s *stateChangeTestListener) OnTransformToHalfOpen(prev circuitbreaker.State, rule circuitbreaker.Rule) {
	fmt.Printf("rule.strategy: %+v, From %s to Half-Open, time: %d\n", rule.Strategy, prev.String(), util.CurrentTimeMillis())
}

func main() {
	// The counters are shared across goroutines, so update them atomically
	var total, totalPass, totalBlock, totalErr int64

	conf := config.NewDefaultConfig()
	// for testing, logging output to console
	conf.Sentinel.Log.Logger = logging.NewConsoleLogger()
	err := sentinel.InitWithConfig(conf)
	if err != nil {
		log.Fatal(err)
	}
	ch := make(chan struct{})

	// Register a state change listener so that we can observe the state change of the internal circuit breaker.
	circuitbreaker.RegisterStateChangeListeners(&stateChangeTestListener{})

	_, err = circuitbreaker.LoadRules([]*circuitbreaker.Rule{
		// Statistic time span=5s, recoveryTimeout=3s, maxErrorRatio=40%
		{
			Resource:         "abc",
			Strategy:         circuitbreaker.ErrorRatio,
			RetryTimeoutMs:   3000, // try to recover after 3s
			MinRequestAmount: 10,   // quiet-period request count
			StatIntervalMs:   5000,
			Threshold:        0.4,
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	logging.Info("[CircuitBreaker ErrorRatio] Sentinel Go circuit breaking demo is running. You may see the pass/block metric in the metric log.")
	go func() {
		for {
			atomic.AddInt64(&total, 1)
			e, b := sentinel.Entry("abc")
			if b != nil {
				// g1 blocked
				atomic.AddInt64(&totalBlock, 1)
				fmt.Println("goroutine blocked by the circuit breaker")
				time.Sleep(time.Duration(rand.Uint64()%20) * time.Millisecond)
			} else {
				atomic.AddInt64(&totalPass, 1)
				if rand.Uint64()%20 > 9 {
					atomic.AddInt64(&totalErr, 1)
					// Record current invocation as error.
					sentinel.TraceError(e, errors.New("biz error"))
				}
				// g1 passed
				time.Sleep(time.Duration(rand.Uint64()%40+10) * time.Millisecond)
				e.Exit()
			}
		}
	}()
	go func() {
		for {
			atomic.AddInt64(&total, 1)
			e, b := sentinel.Entry("abc")
			if b != nil {
				// g2 blocked
				atomic.AddInt64(&totalBlock, 1)
				time.Sleep(time.Duration(rand.Uint64()%20) * time.Millisecond)
			} else {
				// g2 passed
				atomic.AddInt64(&totalPass, 1)
				time.Sleep(time.Duration(rand.Uint64()%80) * time.Millisecond)
				e.Exit()
			}
		}
	}()
	go func() {
		for {
			// Print the cumulative error ratio (errors / all attempts) once per second
			time.Sleep(time.Second)
			fmt.Println(float64(atomic.LoadInt64(&totalErr)) / float64(atomic.LoadInt64(&total)))
		}
	}()
	<-ch
}
4 - Circuit breaking based on slow requests
package main

import (
	"errors"
	"fmt"
	"log"
	"math/rand"
	"time"

	sentinel "github.com/alibaba/sentinel-golang/api"
	"github.com/alibaba/sentinel-golang/core/circuitbreaker"
	"github.com/alibaba/sentinel-golang/core/config"
	"github.com/alibaba/sentinel-golang/logging"
	"github.com/alibaba/sentinel-golang/util"
)

type stateChangeTestListener struct{}

func (s *stateChangeTestListener) OnTransformToClosed(prev circuitbreaker.State, rule circuitbreaker.Rule) {
	fmt.Printf("rule.strategy: %+v, From %s to Closed, time: %d\n", rule.Strategy, prev.String(), util.CurrentTimeMillis())
}

func (s *stateChangeTestListener) OnTransformToOpen(prev circuitbreaker.State, rule circuitbreaker.Rule, snapshot interface{}) {
	fmt.Printf("rule.strategy: %+v, From %s to Open, snapshot: %.2f, time: %d\n", rule.Strategy, prev.String(), snapshot, util.CurrentTimeMillis())
}

func (s *stateChangeTestListener) OnTransformToHalfOpen(prev circuitbreaker.State, rule circuitbreaker.Rule) {
	fmt.Printf("rule.strategy: %+v, From %s to Half-Open, time: %d\n", rule.Strategy, prev.String(), util.CurrentTimeMillis())
}

func main() {
	conf := config.NewDefaultConfig()
	// for testing, logging output to console
	conf.Sentinel.Log.Logger = logging.NewConsoleLogger()
	err := sentinel.InitWithConfig(conf)
	if err != nil {
		log.Fatal(err)
	}
	ch := make(chan struct{})

	// Register a state change listener so that we can observe the state change of the internal circuit breaker.
	circuitbreaker.RegisterStateChangeListeners(&stateChangeTestListener{})

	_, err = circuitbreaker.LoadRules([]*circuitbreaker.Rule{
		// Statistic time span=5s, recoveryTimeout=3s, slowRtUpperBound=50ms, maxSlowRequestRatio=50%
		{
			Resource:         "abc",
			Strategy:         circuitbreaker.SlowRequestRatio,
			RetryTimeoutMs:   3000,
			MinRequestAmount: 10,
			StatIntervalMs:   5000,
			MaxAllowedRtMs:   50,
			Threshold:        0.5,
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	logging.Info("[CircuitBreaker SlowRtRatio] Sentinel Go circuit breaking demo is running. You may see the pass/block metric in the metric log.")
	go func() {
		for {
			e, b := sentinel.Entry("abc")
			if b != nil {
				// g1 blocked
				time.Sleep(time.Duration(rand.Uint64()%20) * time.Millisecond)
			} else {
				if rand.Uint64()%20 > 9 {
					// Record current invocation as error.
					sentinel.TraceError(e, errors.New("biz error"))
				}
				// g1 passed
				time.Sleep(time.Duration(rand.Uint64()%80+10) * time.Millisecond)
				e.Exit()
			}
		}
	}()
	go func() {
		for {
			e, b := sentinel.Entry("abc")
			if b != nil {
				// g2 blocked
				time.Sleep(time.Duration(rand.Uint64()%20) * time.Millisecond)
			} else {
				// g2 passed
				time.Sleep(time.Duration(rand.Uint64()%80) * time.Millisecond)
				e.Exit()
			}
		}
	}()
	<-ch
}
VI. Integrating Sentinel with gin
- In goods_web we rate-limit the gRPC GoodsList call. Circuit breaking works much the same way; only the configuration differs, so adjust it as needed
- goods_web/initialize/init_sentinel.go: initialize Sentinel
package initialize

import (
	sentinel "github.com/alibaba/sentinel-golang/api"
	"github.com/alibaba/sentinel-golang/core/flow"
	"go.uber.org/zap"
)

func InitSentinel() {
	err := sentinel.InitDefault()
	if err != nil {
		zap.S().Fatalf("failed to initialize sentinel: %v", err)
	}

	// Configure the rate limiting rule
	// In practice this configuration should be read from Nacos
	_, err = flow.LoadRules([]*flow.Rule{
		{
			Resource:               "goods-list",
			TokenCalculateStrategy: flow.Direct,
			ControlBehavior:        flow.Reject,
			//Threshold: 20,
			Threshold:        3, // for testing: allow only 3 requests every 6 seconds
			StatIntervalInMs: 6000,
		},
	})
	if err != nil {
		zap.S().Fatalf("failed to load rules: %v", err)
	}
}
- goods_web/main.go: add the initialization logic to main
func main() {
	// 1. Initialize the logger
	initialize.InitLogger()
	// 2. Initialize the configuration file
	initialize.InitConfig()
	// 3. Initialize the routers
	Router := initialize.Routers()
	// 4. Initialize the translator
	if err := initialize.InitTrans("zh"); err != nil {
		panic(err)
	}
	// 5. Initialize the connections to the srv services
	initialize.InitSrvConn()
	// 6. Initialize Sentinel
	initialize.InitSentinel()
	// omitted...
- goods_web/api/goods/api_goods.go
func List(ctx *gin.Context) {
	// omitted...
	e, b := sentinel.Entry("goods-list", sentinel.WithTrafficType(base.Inbound))
	if b != nil {
		ctx.JSON(http.StatusTooManyRequests, gin.H{
			"msg": "Too many requests, please try again later",
		})
		return
	}
	// Call the goods service
	r, err := global.GoodsSrvClient.GoodsList(context.WithValue(context.Background(), "ginContext", ctx), request)
	// Release the entry whether or not the call failed, so the error path does not leak it
	e.Exit()
	if err != nil {
		zap.S().Errorw("[List] failed to query the goods list")
		api.HandleGrpcErrorToHttp(err, ctx)
		return
	}
	reMap := map[string]interface{}{
		"total": r.Total,
	}