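Preface
Service avalanche:
A service avalanche is the butterfly effect at work inside a system. Its triggers vary: poor capacity planning, a method that slows down under high concurrency, or a machine that runs out of resources. We cannot completely prevent these triggers from occurring, but the root cause of an avalanche is the strong dependency between services, and that is something we can assess in advance. When one node in a microservice system misbehaves under high concurrency, the upstream services that call it start to respond slowly. The slow responses exhaust the Tomcat connection pool of those upstream services, so those nodes can no longer accept normal requests. This chain reaction is a service avalanche.
Service isolation:
If the whole system avalanches because a single endpoint responds too slowly, that endpoint needs to be isolated, which means capping the number of concurrent requests it may accept. With that limit in place, the host keeps spare threads free to serve other requests instead of having all of them occupied by the faulty endpoint.
Hystrix service isolation
Importing the dependency:
<!-- Hystrix service isolation dependency -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>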
Enabling Hystrix in the startup class:
@SpringBootApplication
// Enable the circuit breaker
@EnableCircuitBreaker
// Enable the Eureka client
@EnableEurekaClient
public class EurekaClientConsumerApplication {

    @Bean
    // Client-side load balancing
    @LoadBalanced
    RestTemplate restTemplate() {
        return new RestTemplate();
    }

    public static void main(String[] args) {
        SpringApplication.run(EurekaClientConsumerApplication.class, args);
    }
}
Usage in code:
@HystrixCommand(fallbackMethod = "testFallBack")
@Override
public String testHystrix() {
    // Call the remote service
    String result = restTemplate.getForObject("http://" + SERVIER_NAME + "/queryUser", String.class);
    return result;
}

@Override
public String testFallBack() {
    log.info(Thread.currentThread().getName() + "==> ------ FallBack ------");
    return "[FallBack]";
}
Hystrix service isolation strategies
Code configuration (isolation strategy: thread pool)
/**
 * Service isolation strategy: thread pool mode
 */
@HystrixCommand(fallbackMethod = "testFallBack",
        commandKey = "testHystrixThread",
        groupKey = "LIC_GroupKey_XY",
        commandProperties = {
                @HystrixProperty(name = "execution.isolation.strategy", value = "THREAD"),
                // Very large timeout, so the command is effectively never cut off by a timeout
                @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "1000000000")
        },
        threadPoolKey = "LIC_ThreadPollKey_XY",
        threadPoolProperties = {
                // Core size of the command's thread pool
                @HystrixProperty(name = "coreSize", value = "10")
        })
@Override
public String testHystrixThread() {
    log.info(Thread.currentThread().getName() + "==> ****** HystrixThread ******");
    // Call the remote service
    String result = restTemplate.getForObject("http://" + SERVIER_NAME + "/queryUser", String.class);
    return result;
}
Code configuration (isolation strategy: semaphore)
/**
 * Service isolation strategy: semaphore mode
 */
@HystrixCommand(fallbackMethod = "testFallBack",
        commandKey = "testHystrixSemaphore",
        groupKey = "LIC_GroupKey_XY1",
        commandProperties = {
                // Maximum number of concurrent requests allowed by the semaphore
                @HystrixProperty(name = "execution.isolation.semaphore.maxConcurrentRequests", value = "10"),
                @HystrixProperty(name = "execution.isolation.strategy", value = "SEMAPHORE"),
                // Very large timeout, so the command is effectively never cut off by a timeout
                @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "1000000000")
        })
@Override
public String testHystrixSemaphore() {
    log.info(Thread.currentThread().getName() + "==> ****** HystrixSemaphore ******");
    // Call the remote service
    String result = restTemplate.getForObject("http://" + SERVIER_NAME + "/queryUser", String.class);
    return result;
}
Fallback method:
/**
 * Service fallback (degradation) method
 */
@Override
public String testFallBack() {
    log.info(Thread.currentThread().getName() + "==> ------ FallBack ------");
    return "[FallBack]";
}
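Hystrix service isolation strategies - testing
1. Isolation strategy: thread pool
With the THREAD isolation strategy, requests are handled on separate threads taken from a dedicated pool. Thread pool isolation is the default strategy.
Test method (the pool's core size is configured as 10 and, by default, Hystrix does not queue requests, so at most 10 concurrent calls are supported; the test therefore issues 11 calls at the same time):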
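private final int count = 11;
// Start gate: reaches zero once every thread has been started, releasing them together
private final CountDownLatch cdl = new CountDownLatch(count);
// Completion gate: lets the test method wait until every worker has finished
private final CountDownLatch done = new CountDownLatch(count);

/**
 * Service isolation strategy test: thread pool mode
 */
@Test
public void testHystrixThread() throws InterruptedException {
    for (int i = 0; i < count; i++) {
        new Thread(() -> {
            try {
                cdl.await();
                logger.info(Thread.currentThread().getName() + "==>" + userService.testHystrixThread());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                done.countDown();
            }
        }).start();
        cdl.countDown();
    }
    // Keep the test alive until all workers have logged their result
    done.await();
}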
Threads whose calls succeeded:
[Figure: test class log]
[Figure: service class log]
[Figure: fallback log]
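Summary:
① When a call is not degraded, Hystrix runs the target service method on a separate thread of its own, so the thread seen in the test method differs from the thread seen inside the service method.
② When a call is degraded, Hystrix does not run the target service method on a separate thread. It executes the fallback method instead and returns that method's result, and the fallback runs on the same thread as the test method.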
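The two observations above can be reproduced with plain `java.util.concurrent` classes. The following is only a rough sketch of the idea and not Hystrix's implementation: the class `ThreadPoolBulkheadSketch`, its methods, and the pool size of 1 used in `demo()` are all made up for illustration. An accepted task runs on a pool thread, while a rejected task is answered by the fallback on the caller's thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper, for illustration only
class ThreadPoolBulkheadSketch {
    private final ThreadPoolExecutor pool;

    ThreadPoolBulkheadSketch(int coreSize) {
        // A SynchronousQueue stores nothing: when every worker is busy, submit() is rejected
        pool = new ThreadPoolExecutor(coreSize, coreSize, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>());
    }

    /** Runs the task on a pool thread, or the fallback on the calling thread. */
    String execute(Supplier<String> task, Supplier<String> fallback) {
        Callable<String> call = task::get;
        Future<String> future;
        try {
            future = pool.submit(call);
        } catch (RejectedExecutionException e) {
            return fallback.get(); // pool is full: degrade on the caller's thread
        }
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        } catch (ExecutionException e) {
            return fallback.get(); // the task itself failed
        }
    }

    /** Returns {thread that ran the accepted task, thread that ran the fallback}. */
    static String[] demo() {
        ThreadPoolBulkheadSketch bulkhead = new ThreadPoolBulkheadSketch(1);
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        String[] threads = new String[2];
        // This thread occupies the only worker until it is released
        Thread holder = new Thread(() -> threads[0] = bulkhead.execute(() -> {
            started.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return Thread.currentThread().getName();
        }, () -> "unexpected fallback"));
        try {
            holder.start();
            started.await();
            // The single worker is busy, so this call is rejected and degraded
            threads[1] = bulkhead.execute(() -> "unexpected success",
                    () -> Thread.currentThread().getName());
            release.countDown();
            holder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            bulkhead.pool.shutdown();
        }
        return threads;
    }
}
```

Calling `ThreadPoolBulkheadSketch.demo()` returns a pool thread name for the accepted task and the caller's own thread name for the fallback, which mirrors points ① and ② of the summary.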
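2. Isolation strategy: semaphore
Semaphore isolation controls concurrency with a shared counter. Each incoming request increments the counter and each finished request decrements it; once the counter equals the configured semaphore size, further requests are no longer accepted.
Test method (the semaphore is configured as 10, so at most 10 concurrent calls are supported; the test therefore issues 11 calls at the same time):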
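private final int count = 11;
// Start gate: reaches zero once every thread has been started, releasing them together
private final CountDownLatch cdl = new CountDownLatch(count);
// Completion gate: lets the test method wait until every worker has finished
private final CountDownLatch done = new CountDownLatch(count);

/**
 * Service isolation strategy test: semaphore mode
 */
@Test
public void testHystrixSemaphore() throws InterruptedException {
    for (int i = 0; i < count; i++) {
        new Thread(() -> {
            try {
                cdl.await();
                logger.info(Thread.currentThread().getName() + "==>" + userService.testHystrixSemaphore());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                done.countDown();
            }
        }).start();
        cdl.countDown();
    }
    // Keep the test alive until all workers have logged their result
    done.await();
}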
[Figure: test class log]
[Figure: service class log]
[Figure: fallback log]
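Summary:
With semaphore isolation, Hystrix maintains the semaphore as each request arrives. Below the threshold, the target method is executed; once the threshold is reached, the fallback method is executed. Because Hystrix creates no new thread, the test class and the service class run on the same thread.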
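The semaphore behaviour summarized above can be sketched with `java.util.concurrent.Semaphore`. This is only an illustration under assumptions: `SemaphoreBulkheadSketch` and its `execute` method are hypothetical names, and Hystrix's internal counter is not necessarily built on this class.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper, for illustration only
class SemaphoreBulkheadSketch {
    private final Semaphore permits;

    SemaphoreBulkheadSketch(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
    }

    /** Runs the task on the caller's thread if a permit is free, else the fallback. */
    String execute(Supplier<String> task, Supplier<String> fallback) {
        // tryAcquire never blocks: with no free permit the call is degraded at once
        if (!permits.tryAcquire()) {
            return fallback.get();
        }
        try {
            return task.get(); // no thread hand-off: still the caller's thread
        } catch (RuntimeException e) {
            return fallback.get();
        } finally {
            permits.release(); // give the permit back once the call is finished
        }
    }
}
```

With a limit of 1, a call made while another call still holds the permit is degraded. For example, `b.execute(() -> "outer+" + b.execute(() -> "inner", () -> "[FallBack]"), () -> "[FallBack]")` yields `outer+[FallBack]`, because the inner call finds the single permit taken.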
Hystrix data monitoring
When Hystrix performs circuit breaking it collects statistics on call results, such as the number of timeouts, bad requests, fallbacks, and exceptions. Those statistics need a UI to display them, and hystrix-dashboard is the service that displays Hystrix's statistics.
1. Setting up the Dashboard project
pom file configuration:
<!-- 1. Import the Spring Boot parent project -->
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.2.2.RELEASE</version>
    <relativePath/>
</parent>
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <!-- Spring Boot 2.x requires Java 8, so the compiler level matches java.version -->
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <java.version>1.8</java.version>
    <!-- 2. Spring Cloud version -->
    <spring-cloud.version>Hoxton.SR1</spring-cloud.version>
</properties>
<dependencies>
    <!-- 3. Import the hystrix-dashboard dependency -->
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-netflix-hystrix-dashboard</artifactId>
    </dependency>
    <!-- 4. Import the actuator (monitoring) -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
</dependencies>
<!-- 5. Import Spring Cloud's dependency management (BOM) -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>${spring-cloud.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
application.properties configuration:
server.port=9990
# Expose the monitoring endpoints
management.endpoints.web.exposure.include=*
Startup class configuration:
/**
 * Dashboard UI: http://localhost:9990/hystrix
 * Endpoint to monitor (a service that uses Hystrix): http://localhost:8085/actuator/hystrix.stream
 */
@SpringBootApplication
@EnableHystrixDashboard
public class HystrixDashboardApplication {
    public static void main(String[] args) {
        SpringApplication.run(HystrixDashboardApplication.class, args);
    }
}
2. What the monitored project needs to configure
application.properties:
management.endpoint.health.show-details=always
management.endpoint.shutdown.enabled=true
# Expose all monitoring endpoints, including hystrix.stream
management.endpoints.web.exposure.include=*
pom configuration:
<!-- Hystrix service isolation dependency -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>
<!-- Actuator jar for health monitoring -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
Enable the circuit breaker in the startup class (the @EnableCircuitBreaker annotation used in the startup class earlier).
Code configuration:
// Controller layer
@RestController
@RequestMapping("/user")
public class UserController {

    @Autowired
    private UserService userService;

    @RequestMapping("testHystrix")
    public String testHystrix() {
        return userService.testHystrix();
    }
}

// Service layer
@HystrixCommand(fallbackMethod = "testFallBack")
@Override
public String testHystrix() {
    // Call the remote service
    String result = restTemplate.getForObject("http://" + SERVIER_NAME + "/queryUser", String.class);
    return result;
}

@HystrixCommand(fallbackMethod = "testFallBack")
@Override
public String testError() {
    log.info("====== testError ======");
    // Deliberately throws ArithmeticException to trigger the fallback
    int res = 10 / 0;
    return "[SUCCESS]";
}
3. Testing
Dashboard UI: http://localhost:9990/hystrix
Endpoint to monitor (a service that uses Hystrix): http://localhost:8085/actuator/hystrix.stream
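Hystrix circuit breaking
Circuit breaking: if, within a period of time, an endpoint receives a certain number of requests and the request failure rate reaches a threshold, the circuit breaker trips. While the circuit is open, every request to that endpoint is degraded and the fallback method is executed.
Three necessary conditions for the circuit to trip:
1. A statistics period (rolling window)
Configuration property: metrics.rollingStats.timeInMilliseconds (default 10000 ms)
2. The number of requests must reach a minimum
Configuration property: circuitBreaker.requestVolumeThreshold (default 20 requests)
3. The failure rate must reach the threshold
Configuration property: circuitBreaker.errorThresholdPercentage (default 50%)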
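How conditions 2 and 3 combine within one rolling window can be written as a small pure function. This is a minimal sketch under assumptions: `TripConditionSketch` and `shouldOpen` are hypothetical names, and the request counts are presumed to have been collected over the window from condition 1.

```java
// Hypothetical helper, for illustration only
class TripConditionSketch {
    /**
     * @param totalRequests            requests seen in the current rolling window
     * @param failedRequests           how many of those requests failed
     * @param requestVolumeThreshold   minimum number of requests before judging
     * @param errorThresholdPercentage failure percentage at which the circuit opens
     */
    static boolean shouldOpen(int totalRequests, int failedRequests,
                              int requestVolumeThreshold, int errorThresholdPercentage) {
        if (totalRequests == 0 || totalRequests < requestVolumeThreshold) {
            return false; // too little traffic in the window to judge
        }
        int errorPercentage = failedRequests * 100 / totalRequests;
        return errorPercentage >= errorThresholdPercentage;
    }
}
```

With the default values, 19 failures out of 19 requests do not open the circuit because the volume is below 20, whereas 10 failures out of 20 requests do.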
Simulating a Hystrix circuit break:
Add a statement that throws an exception to the Service-layer method: int res = 10/0;
========== Controller layer ==========
@RequestMapping("testError")
public String testError() {
    return userService.testError();
}
========== Service layer ==========
@HystrixCommand(fallbackMethod = "testFallBack")
@Override
public String testError() {
    log.info("====== testError ======");
    // Deliberately throws ArithmeticException so that every call fails
    int res = 10 / 0;
    return "[SUCCESS]";
}

@Override
public String testFallBack() {
    log.info(Thread.currentThread().getName() + "==> ------ FallBack ------");
    return "[FallBack]";
}
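1. Testing circuit breaker states (closed -> open -> half-open -> open -> ···)
Service monitoring:
Log analysis:
Phase 1 (circuit breaker state: closed): the testError endpoint receives and processes requests, but because an exception is thrown, each call is degraded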
... ...
2020-12-31 16:40:13.116 INFO 2484 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : ====== testError ======
2020-12-31 16:40:13.118 INFO 2484 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : hystrix-UserServiceImpl-10==> ------ FallBack ------
2020-12-31 16:40:13.321 INFO 2484 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : ====== testError ======
2020-12-31 16:40:13.322 INFO 2484 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : hystrix-UserServiceImpl-10==> ------ FallBack ------
... ...
2020-12-31 16:40:13.509 INFO 2484 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : ====== testError ======
2020-12-31 16:40:13.898 INFO 2484 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : hystrix-UserServiceImpl-10==> ------ FallBack ------
2020-12-31 16:40:14.101 INFO 2484 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : ====== testError ======
2020-12-31 16:40:14.102 INFO 2484 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : hystrix-UserServiceImpl-10==> ------ FallBack ------
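Phase 2 (circuit breaker state: open): Hystrix has tripped the circuit; testError is no longer invoked and the fallback runs directly on the Tomcat request threads (http-nio-8085-exec-*)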
2020-12-31 16:40:14.532 INFO 2484 --- [nio-8085-exec-1] com.lic.service.UserServiceImpl : http-nio-8085-exec-1==> ------ FallBack ------
2020-12-31 16:40:14.743 INFO 2484 --- [nio-8085-exec-3] com.lic.service.UserServiceImpl : http-nio-8085-exec-3==> ------ FallBack ------
2020-12-31 16:40:14.931 INFO 2484 --- [nio-8085-exec-6] com.lic.service.UserServiceImpl : http-nio-8085-exec-6==> ------ FallBack ------
... ...
2020-12-31 16:40:15.178 INFO 2484 --- [nio-8085-exec-7] com.lic.service.UserServiceImpl : http-nio-8085-exec-7==> ------ FallBack ------
2020-12-31 16:40:15.452 INFO 2484 --- [nio-8085-exec-8] com.lic.service.UserServiceImpl : http-nio-8085-exec-8==> ------ FallBack ------
2020-12-31 16:40:17.708 INFO 2484 --- [nio-8085-exec-9] com.lic.service.UserServiceImpl : http-nio-8085-exec-9==> ------ FallBack ------
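Phase 3 (circuit breaker state: half-open): some time after the circuit breaker opens (the sleep window, circuitBreaker.sleepWindowInMilliseconds, 5000 ms by default), it moves from open to half-open. A half-open circuit breaker accepts a request and passes it on to the service provider. If that remote call succeeds, the circuit breaker moves from half-open to closed; if the call fails, it moves from half-open back to open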
2020-12-31 16:40:24.962 INFO 2484 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : ====== testError ======
2020-12-31 16:40:24.964 INFO 2484 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : hystrix-UserServiceImpl-10==> ------ FallBack ------
Phase 4 (circuit breaker state: open): the call failed, so the circuit breaker moves from half-open back to open
2020-12-31 16:40:25.200 INFO 2484 --- [nio-8085-exec-6] com.lic.service.UserServiceImpl : http-nio-8085-exec-6==> ------ FallBack ------
2020-12-31 16:40:25.591 INFO 2484 --- [nio-8085-exec-7] com.lic.service.UserServiceImpl : http-nio-8085-exec-7==> ------ FallBack ------
2020-12-31 16:40:25.903 INFO 2484 --- [nio-8085-exec-8] com.lic.service.UserServiceImpl : http-nio-8085-exec-8==> ------ FallBack ------
2020-12-31 16:40:26.089 INFO 2484 --- [nio-8085-exec-9] com.lic.service.UserServiceImpl : http-nio-8085-exec-9==> ------ FallBack ------
2020-12-31 16:40:26.254 INFO 2484 --- [io-8085-exec-10] com.lic.service.UserServiceImpl : http-nio-8085-exec-10==> ------ FallBack ------
2020-12-31 16:40:26.420 INFO 2484 --- [nio-8085-exec-2] com.lic.service.UserServiceImpl : http-nio-8085-exec-2==> ------ FallBack ------
2020-12-31 16:40:26.611 INFO 2484 --- [nio-8085-exec-1] com.lic.service.UserServiceImpl : http-nio-8085-exec-1==> ------ FallBack ------
2020-12-31 16:40:26.832 INFO 2484 --- [nio-8085-exec-3] com.lic.service.UserServiceImpl : http-nio-8085-exec-3==> ------ FallBack ------
2020-12-31 16:40:27.230 INFO 2484 --- [nio-8085-exec-6] com.lic.service.UserServiceImpl : http-nio-8085-exec-6==> ------ FallBack ------
... ...
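2. Testing circuit breaker states (closed -> open -> half-open -> closed)
Service monitoring:
Log analysis:
Phase 1 (circuit breaker state: closed): the testError endpoint receives and processes requests, but because an exception is thrown, each call is degraded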
... ...
2020-12-31 17:12:23.012 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : ====== testError ======
2020-12-31 17:12:23.013 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : hystrix-UserServiceImpl-10==> ------ FallBack ------
2020-12-31 17:12:23.199 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : ====== testError ======
2020-12-31 17:12:23.199 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : hystrix-UserServiceImpl-10==> ------ FallBack ------
2020-12-31 17:12:23.394 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : ====== testError ======
2020-12-31 17:12:23.394 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : hystrix-UserServiceImpl-10==> ------ FallBack ------
... ...
2020-12-31 17:12:23.584 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : ====== testError ======
2020-12-31 17:12:23.584 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : hystrix-UserServiceImpl-10==> ------ FallBack ------
2020-12-31 17:12:23.778 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : ====== testError ======
2020-12-31 17:12:23.779 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : hystrix-UserServiceImpl-10==> ------ FallBack ------
2020-12-31 17:12:23.971 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : ====== testError ======
2020-12-31 17:12:23.971 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : hystrix-UserServiceImpl-10==> ------ FallBack ------
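Phase 2 (circuit breaker state: open): Hystrix has tripped the circuit; testError is no longer invoked and the fallback runs directly on the Tomcat request threads (http-nio-8085-exec-*)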
2020-12-31 17:12:24.167 INFO 6132 --- [nio-8085-exec-8] com.lic.service.UserServiceImpl : http-nio-8085-exec-8==> ------ FallBack ------
2020-12-31 17:12:24.367 INFO 6132 --- [nio-8085-exec-9] com.lic.service.UserServiceImpl : http-nio-8085-exec-9==> ------ FallBack ------
2020-12-31 17:12:24.546 INFO 6132 --- [nio-8085-exec-2] com.lic.service.UserServiceImpl : http-nio-8085-exec-2==> ------ FallBack ------
2020-12-31 17:12:24.928 INFO 6132 --- [io-8085-exec-10] com.lic.service.UserServiceImpl : http-nio-8085-exec-10==> ------ FallBack ------
... ...
2020-12-31 17:12:26.523 INFO 6132 --- [nio-8085-exec-9] com.lic.service.UserServiceImpl : http-nio-8085-exec-9==> ------ FallBack ------
2020-12-31 17:12:26.742 INFO 6132 --- [nio-8085-exec-2] com.lic.service.UserServiceImpl : http-nio-8085-exec-2==> ------ FallBack ------
2020-12-31 17:12:27.010 INFO 6132 --- [io-8085-exec-10] com.lic.service.UserServiceImpl : http-nio-8085-exec-10==> ------ FallBack ------
2020-12-31 17:12:27.448 INFO 6132 --- [nio-8085-exec-1] com.lic.service.UserServiceImpl : http-nio-8085-exec-1==> ------ FallBack ------
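Phase 3 (circuit breaker state: half-open): some time after the circuit breaker opens (the sleep window, circuitBreaker.sleepWindowInMilliseconds, 5000 ms by default), it moves from open to half-open. A half-open circuit breaker accepts a request and passes it on to the service provider. If that remote call succeeds, the circuit breaker moves from half-open to closed; if the call fails, it moves from half-open back to open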
2020-12-31 17:12:42.175 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : ====== testError ======
Phase 4 (circuit breaker state: closed): the call succeeded, so the circuit breaker moves from half-open to closed
2020-12-31 17:12:43.494 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : ====== testError ======
2020-12-31 17:12:44.211 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : ====== testError ======
2020-12-31 17:12:44.450 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : ====== testError ======
2020-12-31 17:12:44.717 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : ====== testError ======
2020-12-31 17:12:44.908 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : ====== testError ======
2020-12-31 17:12:45.098 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : ====== testError ======
2020-12-31 17:12:45.283 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : ====== testError ======
2020-12-31 17:12:45.478 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : ====== testError ======
2020-12-31 17:12:45.674 INFO 6132 --- [rServiceImpl-10] com.lic.service.UserServiceImpl : ====== testError ======
... ...
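The state transitions traced in the two log analyses can be condensed into a small state machine. The following is a simplified sketch, not Hystrix's code: `CircuitBreakerSketch` and its method names are hypothetical, the decision to trip is left to the caller, and the current time is passed in as a parameter so that the behaviour is easy to follow.

```java
// Hypothetical helper, for illustration only
class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final long sleepWindowMillis;
    private State state = State.CLOSED;
    private long openedAt;

    CircuitBreakerSketch(long sleepWindowMillis) {
        this.sleepWindowMillis = sleepWindowMillis;
    }

    /** Called when the failure statistics say the circuit should trip. */
    synchronized void trip(long nowMillis) {
        state = State.OPEN;
        openedAt = nowMillis;
    }

    /** Decides whether a request may reach the real method. */
    synchronized boolean allowRequest(long nowMillis) {
        if (state == State.CLOSED) {
            return true;
        }
        if (state == State.OPEN && nowMillis - openedAt >= sleepWindowMillis) {
            state = State.HALF_OPEN; // let a single trial request through
            return true;
        }
        return false; // still inside the sleep window, or a trial is in flight
    }

    /** A successful trial request closes the circuit again. */
    synchronized void onSuccess() {
        if (state == State.HALF_OPEN) {
            state = State.CLOSED;
        }
    }

    /** A failed trial request re-opens the circuit and restarts the sleep window. */
    synchronized void onFailure(long nowMillis) {
        if (state == State.HALF_OPEN) {
            state = State.OPEN;
            openedAt = nowMillis;
        }
    }

    synchronized State state() {
        return state;
    }
}
```

In the first test the trial request fails, so `onFailure` returns the breaker to OPEN; in the second test the trial request succeeds, so `onSuccess` returns it to CLOSED and every later request reaches testError again.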