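Overview
A complete microservice system consists of many microservice units, and these subsystems call one another, forming a call chain. For each client request, from the moment it is sent until it is answered, we need to know and collect which components and microservices it passed through, how long the request took in total, and how long each component took. This information helps us locate performance bottlenecks and tune performance, which makes monitoring the call chain of the whole microservice architecture essential. This article explains how to use Zipkin to build a call-chain tracing center for microservices.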
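A First Look at Zipkin
As the Zipkin website describes, Zipkin is a distributed tracing system. It helps gather the timing data needed to troubleshoot latency problems in microservice architectures; put more plainly, it traces the path a call takes.
Zipkin's architecture is shown in the figure below:
To understand this diagram, you need to know a few of Zipkin's core concepts: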
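- Reporter
The component embedded in an application to send data to Zipkin is called a Reporter. Its purpose is to collect trace data.
- Span
When one component in a microservice system is called, some time passes between sending the request and receiving the response. That interval is called a Span.
- Trace
From the moment the client sends a request until the request has been fully handled, the request passes through a call chain. The whole process is called a Trace. A Trace may contain many Spans, and every Span belongs to exactly one Trace.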
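- Annotation
Marks a significant moment in a request, such as its start or end. The four core annotations, cs/sr/ss/cr, each carry a timestamp (newer versions of the Zipkin UI label them Client Start, Server Start, Server Finish, and Client Finish):
cs: Client Send, the client issues the request; this starts the span
sr: Server Receive, the server receives the request
ss: Server Send, the server finishes processing and sends the result back to the client
cr: Client Receive, the client receives the server's response; this ends the span, and once it is recorded the RPC is considered complete
sr-cs: network latency
ss-sr: server-side processing time
cr-cs: total time for the whole call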
- Transport
The way trace data is transmitted. The simplest option is HTTP; under high concurrency you can switch to a message queue such as Kafka.
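The three differences listed under Annotation are easy to get backwards, so here is a minimal sketch of the arithmetic. The class name `SpanTimings` and the sample timestamps are invented for illustration, and the sketch assumes the client and server clocks are in sync, which is not guaranteed in practice.

```java
/** Derives the durations described above from the four core annotation timestamps. */
final class SpanTimings {
    // All values are epoch microseconds, the unit Zipkin uses for span timestamps.
    final long cs; // Client Send
    final long sr; // Server Receive
    final long ss; // Server Send
    final long cr; // Client Receive

    SpanTimings(long cs, long sr, long ss, long cr) {
        this.cs = cs;
        this.sr = sr;
        this.ss = ss;
        this.cr = cr;
    }

    /** sr - cs: time the request spent travelling to the server. */
    long requestLatency() { return sr - cs; }

    /** ss - sr: time the server spent handling the request. */
    long processingTime() { return ss - sr; }

    /** cr - cs: total time as seen by the client. */
    long totalTime() { return cr - cs; }

    /** Rough hint: whatever is not processing time is attributed to the network. */
    String dominantCost() {
        long network = totalTime() - processingTime();
        return network > processingTime() ? "network" : "processing";
    }

    public static void main(String[] args) {
        // Hypothetical timestamps: 20 ms out, 3 s of processing, 30 ms back.
        SpanTimings t = new SpanTimings(1_000_000L, 1_020_000L, 4_020_000L, 4_050_000L);
        System.out.println("request latency (us): " + t.requestLatency());
        System.out.println("processing time (us): " + t.processingTime());
        System.out.println("total time (us): " + t.totalTime());
        System.out.println("dominant cost: " + t.dominantCost());
    }
}
```

Subtracting the processing time from the total covers both directions of the network trip, which is why `dominantCost` uses it rather than `requestLatency` alone.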
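With the core concepts in hand, look at the architecture diagram again: only a client equipped with a Reporter component can send trace data to Zipkin over a Transport. The Collector gathers that data and persists it to Storage. Finally, whoever needs the data can use the UI, which calls the API, to read it back from Storage. The overall flow is not complicated.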
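To make the Reporter, Transport, and Collector roles concrete, the sketch below plays the part of a tiny hand-written reporter that uses HTTP as its transport and posts one span to the collector's `/api/v2/spans` endpoint. It is only an illustration under assumptions: the class `MiniReporter`, the IDs, and the service name are made up, the JSON is built by hand without escaping, and it needs Java 11 or later for `java.net.http`. A real application would rely on a tracing library instead.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class MiniReporter {
    /**
     * Builds a one-element JSON array in the Zipkin v2 span format.
     * Assumes the arguments contain no characters that need JSON escaping.
     */
    static String spanJson(String traceId, String spanId, String name,
                           String serviceName, long timestampMicros, long durationMicros) {
        return "[{"
                + "\"traceId\":\"" + traceId + "\","
                + "\"id\":\"" + spanId + "\","
                + "\"name\":\"" + name + "\","
                + "\"timestamp\":" + timestampMicros + ","
                + "\"duration\":" + durationMicros + ","
                + "\"localEndpoint\":{\"serviceName\":\"" + serviceName + "\"}"
                + "}]";
    }

    /** POSTs the payload to the collector over HTTP and returns the status code. */
    static int report(String zipkinBaseUrl, String json) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(zipkinBaseUrl + "/api/v2/spans"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
        HttpResponse<Void> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.discarding());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        long nowMicros = System.currentTimeMillis() * 1000;
        // Hypothetical IDs: lower-case hex, 16 characters each.
        String json = spanJson("463ac35c9f6413ad", "a2fb4a1d1a96d312",
                "get /demo", "demo-service", nowMicros, 250_000L);
        // Assumes a Zipkin server is listening on localhost:9411.
        System.out.println("HTTP status: " + report("http://localhost:9411", json));
    }
}
```

If the server accepts the span, searching for `demo-service` in the UI should show it, which is the Storage-to-UI half of the flow.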
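The Zipkin website lists the OpenZipkin libraries available for common languages:
Next, this article builds an experimental microservice tracing scenario and uses Brave, the Java tracer that Spring Cloud Sleuth builds on, to help set up the call-chain tracing center!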
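Deploying the Zipkin Service
Deploying the Zipkin service with Docker could hardly be simpler:
Persistent storage
Create the database that Zipkin will persist to. You can of course skip persistence and keep everything in memory, but unless you want your boss to offer you up as a sacrifice, I trust you won't!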
SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
-- ----------------------------
-- Table structure for zipkin_annotations
-- ----------------------------
DROP TABLE IF EXISTS `zipkin_annotations`;
CREATE TABLE `zipkin_annotations` (
`trace_id_high` bigint(20) NOT NULL DEFAULT 0 COMMENT 'If non zero, this means the\r\ntrace uses 128 bit traceIds instead of 64 bit',
`trace_id` bigint(20) NOT NULL COMMENT 'coincides with zipkin_spans.trace_id',
`span_id` bigint(20) NOT NULL COMMENT 'coincides with zipkin_spans.id',
`a_key` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL COMMENT 'BinaryAnnotation.key or\r\nAnnotation.value if type == -1',
`a_value` blob NULL COMMENT 'BinaryAnnotation.value(), which must be smaller than\r\n64KB',
`a_type` int(11) NOT NULL COMMENT 'BinaryAnnotation.type() or -1 if Annotation',
`a_timestamp` bigint(20) NULL DEFAULT NULL COMMENT 'Used to implement TTL; Annotation.timestamp or\r\nzipkin_spans.timestamp',
`endpoint_ipv4` int(11) NULL DEFAULT NULL COMMENT 'Null when Binary/Annotation.endpoint is null',
`endpoint_ipv6` binary(16) NULL DEFAULT NULL COMMENT 'Null when Binary/Annotation.endpoint is\r\nnull, or no IPv6 address',
`endpoint_port` smallint(6) NULL DEFAULT NULL COMMENT 'Null when Binary/Annotation.endpoint is\r\nnull',
`endpoint_service_name` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'Null when\r\nBinary/Annotation.endpoint is null',
UNIQUE INDEX `trace_id_high`(`trace_id_high`, `trace_id`, `span_id`, `a_key`, `a_timestamp`) USING BTREE COMMENT 'Ignore insert on duplicate',
INDEX `trace_id_high_2`(`trace_id_high`, `trace_id`, `span_id`) USING BTREE COMMENT 'for joining with zipkin_spans',
INDEX `trace_id_high_3`(`trace_id_high`, `trace_id`) USING BTREE COMMENT 'for getTraces/ByIds',
INDEX `endpoint_service_name`(`endpoint_service_name`) USING BTREE COMMENT 'for\r\ngetTraces and getServiceNames',
INDEX `a_type`(`a_type`) USING BTREE COMMENT 'for getTraces and\r\nautocomplete values',
INDEX `a_key`(`a_key`) USING BTREE COMMENT 'for getTraces and\r\nautocomplete values',
INDEX `trace_id`(`trace_id`, `span_id`, `a_key`) USING BTREE COMMENT 'for dependencies job'
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = COMPRESSED;
-- ----------------------------
-- Table structure for zipkin_spans
-- ----------------------------
DROP TABLE IF EXISTS `zipkin_spans`;
CREATE TABLE `zipkin_spans` (
`trace_id_high` bigint(20) NOT NULL DEFAULT 0 COMMENT 'If non zero, this means the\r\ntrace uses 128 bit traceIds instead of 64 bit',
`trace_id` bigint(20) NOT NULL,
`id` bigint(20) NOT NULL,
`name` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
`remote_service_name` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
`parent_id` bigint(20) NULL DEFAULT NULL,
`debug` bit(1) NULL DEFAULT NULL,
`start_ts` bigint(20) NULL DEFAULT NULL COMMENT 'Span.timestamp(): epoch micros used for endTs query\r\nand to implement TTL',
`duration` bigint(20) NULL DEFAULT NULL COMMENT 'Span.duration(): micros used for minDuration and\r\nmaxDuration query',
PRIMARY KEY (`trace_id_high`, `trace_id`, `id`) USING BTREE,
INDEX `trace_id_high`(`trace_id_high`, `trace_id`) USING BTREE COMMENT 'for\r\ngetTracesByIds',
INDEX `name`(`name`) USING BTREE COMMENT 'for getTraces and\r\ngetSpanNames',
INDEX `remote_service_name`(`remote_service_name`) USING BTREE COMMENT 'for getTraces\r\nand getRemoteServiceNames',
INDEX `start_ts`(`start_ts`) USING BTREE COMMENT 'for getTraces ordering\r\nand range'
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = COMPRESSED;
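-- ----------------------------
-- InnoDB settings for ROW_FORMAT = COMPRESSED
-- Only needed on MySQL 5.6 and earlier, where they must run before the
-- CREATE TABLE statements; both variables were removed in MySQL 8.0.
-- ----------------------------
set global innodb_large_prefix=1;
set global innodb_file_format=BARRACUDA;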
-- ----------------------------
-- Table structure for zipkin_dependencies
-- ----------------------------
DROP TABLE IF EXISTS `zipkin_dependencies`;
CREATE TABLE `zipkin_dependencies` (
`day` date NOT NULL,
`parent` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
`child` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
`call_count` bigint(20) NULL DEFAULT NULL,
`error_count` bigint(20) NULL DEFAULT NULL,
PRIMARY KEY (`day`, `parent`, `child`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = COMPRESSED;
SET FOREIGN_KEY_CHECKS = 1;
Deploying the Zipkin server
services:
  zipkin:
    image: openzipkin/zipkin
    container_name: zipkin
    environment:
      - STORAGE_TYPE=mysql
      # Point Zipkin at the storage backend
      - MYSQL_DB=zipkin
      - MYSQL_USER=root
      - MYSQL_PASS=root
      - MYSQL_HOST=192.168.137.129
      - MYSQL_TCP_PORT=3306
    # Bridge networking with a published port; "network_mode: host" is dropped
    # because port mappings do not apply under host networking
    ports:
      # Port used for the Zipkin UI and HTTP API
      - 9411:9411
Non-persistent (in-memory) storage
docker run -d -p 9411:9411 \
--name zipkin \
openzipkin/zipkin
Once it is running, open localhost:9411 in a browser
and you will see Zipkin's web UI:
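Simulating a Microservice Call Chain
Let's build the call chain shown in the figure below:
The figure contains one client plus three microservices:
- Client: consumes the service provided by ServiceA through the /servicea endpoint
- ServiceA: consumes the service provided by ServiceB through the /serviceb endpoint; listens on port 8881
- ServiceB: consumes the service provided by ServiceC through the /servicec endpoint; listens on port 8882
- ServiceC: provides the final service; listens on port 8883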
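Before wiring up real services, it can help to see what a tracer records for this chain. The sketch below is a simplified model rather than what Sleuth or Brave actually does internally: it records a single span per service (real tracers usually record both a client and a server side), and the class `CallChainDemo` and its sequential IDs are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class CallChainDemo {
    /** One hop in the chain: which service handled it and how it links to its caller. */
    static final class Span {
        final String traceId;
        final String spanId;
        final String parentId; // null for the root span
        final String service;

        Span(String traceId, String spanId, String parentId, String service) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentId = parentId;
            this.service = service;
        }
    }

    private final List<Span> recorded = new ArrayList<>();
    private int nextId = 1;

    /** Starts a span; a null parent means this is the first hop, so a new trace begins. */
    Span start(String service, Span parent) {
        String spanId = String.format("%016x", nextId++);
        String traceId = (parent == null) ? spanId : parent.traceId;
        String parentId = (parent == null) ? null : parent.spanId;
        Span span = new Span(traceId, spanId, parentId, service);
        recorded.add(span);
        return span;
    }

    List<Span> recorded() {
        return recorded;
    }

    public static void main(String[] args) {
        CallChainDemo demo = new CallChainDemo();
        Span a = demo.start("ServiceA", null); // Client -> ServiceA
        Span b = demo.start("ServiceB", a);    // ServiceA -> ServiceB
        demo.start("ServiceC", b);             // ServiceB -> ServiceC
        for (Span s : demo.recorded()) {
            System.out.println(s.service + " trace=" + s.traceId
                    + " span=" + s.spanId + " parent=" + s.parentId);
        }
    }
}
```

All three spans share one trace ID, and each parent ID points at the caller's span; that is the information Zipkin needs to draw the chain as a tree.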
Adding Dependencies
Every service that reports to Zipkin needs the following dependencies:
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-sleuth-zipkin</artifactId>
        <version>2.1.0.RELEASE</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-sleuth</artifactId>
        <version>2.1.0.RELEASE</version>
    </dependency>
</dependencies>
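Service Call Code
To make the latency clearly visible, each endpoint sleeps for a random 0 to 4 seconds before responding.
To keep things simple, we implement the three microservices with Spring Boot.
The controller for ServiceA looks like this: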
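import java.util.Random;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@RestController
public class ServiceAController {
    @Autowired
    private RestTemplate restTemplate;

    @GetMapping("/servicea")
    public String servicea() {
        try {
            // Simulate work with a random delay of 0 to 4 seconds
            Thread.sleep(new Random().nextInt(5) * 1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        }
        // Call ServiceB, which listens on port 8882
        return restTemplate.getForObject("http://localhost:8882/serviceb", String.class);
    }
}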
The code for ServiceB:
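import java.util.Random;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@RestController
public class ServiceBController {
    @Autowired
    private RestTemplate restTemplate;

    @GetMapping("/serviceb")
    public String serviceb() {
        try {
            // Simulate work with a random delay of 0 to 4 seconds
            Thread.sleep(new Random().nextInt(5) * 1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        }
        // Call ServiceC, which listens on port 8883
        return restTemplate.getForObject("http://localhost:8883/servicec", String.class);
    }
}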
The code for ServiceC:
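import java.util.Random;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ServiceCController {
    @GetMapping("/servicec")
    public String servicec() {
        try {
            // Simulate work with a random delay of 0 to 4 seconds
            Thread.sleep(new Random().nextInt(5) * 1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        }
        // End of the chain: no further downstream call
        return "Now, we reach the terminal call:servicec !";
    }
}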
application.yml configuration
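spring:
  application:
    name: service1
  zipkin:
    base-url: http://192.168.137.129:9411 # address of the Zipkin server
    sender:
      type: web # the default when neither Kafka nor ActiveMQ is on the classpath
  sleuth:
    sampler:
      probability: 1.0 # sample 100% of requests; use a lower value in production
server:
  port: 8881 # 8881 for ServiceA, 8882 for ServiceB, 8883 for ServiceC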
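The three service projects are identical apart from the port. The service name reported to Zipkin is not configured separately; by default it is taken from the Spring application name.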
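The `probability` setting in the configuration above decides what fraction of traces gets reported. As a rough illustration of the idea, and explicitly not Sleuth's actual implementation, the sketch below makes the decision from the trace ID so that repeated checks for the same trace always agree. The class name `ProbabilitySamplerSketch` and the bucket count of 10,000 are assumptions chosen for readability.

```java
/** Illustrative probability sampler: keeps roughly the given fraction of traces. */
final class ProbabilitySamplerSketch {
    private static final long BUCKETS = 10_000L;

    // Number of buckets (out of BUCKETS) whose traces are kept.
    private final long threshold;

    ProbabilitySamplerSketch(double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be between 0.0 and 1.0");
        }
        this.threshold = (long) (probability * BUCKETS);
    }

    /** Deterministic in the trace ID: the same trace always gets the same answer. */
    boolean isSampled(long traceId) {
        // floorMod keeps the bucket non-negative even for negative trace IDs.
        return Math.floorMod(traceId, BUCKETS) < threshold;
    }

    public static void main(String[] args) {
        ProbabilitySamplerSketch everything = new ProbabilitySamplerSketch(1.0);
        ProbabilitySamplerSketch tenPercent = new ProbabilitySamplerSketch(0.1);
        long traceId = 4_242L; // hypothetical trace ID
        System.out.println("probability 1.0 keeps it: " + everything.isSampled(traceId));
        System.out.println("probability 0.1 keeps it: " + tenPercent.isSampled(traceId));
    }
}
```

With `probability: 1.0`, as in this article, every request is traced, which suits a demo but adds reporting overhead under production load.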
Appendix: RestTemplate bean configuration
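import java.util.List;
import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.autoconfigure.condition.ConditionalOnMissingBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.ClientHttpRequestFactory;
import org.springframework.http.client.HttpComponentsClientHttpRequestFactory;
import org.springframework.http.converter.BufferedImageHttpMessageConverter;
import org.springframework.http.converter.HttpMessageConverter;
import org.springframework.web.client.RestTemplate;

@Configuration
public class RestTemplateConfig {
    @Value("${remote.maxTotalConnect:0}")
    private int maxTotalConnect;
    @Value("${remote.maxConnectPerRoute:200}")
    private int maxConnectPerRoute;
    @Value("${remote.connectTimeout:2000}")
    private int connectTimeout; // connection timeout, 2 s by default
    @Value("${remote.readTimeout:30000}")
    private int readTimeout; // read timeout, 30 s by default

    // Builds a request factory backed by a pooled Apache HttpClient
    private ClientHttpRequestFactory createFactory2() {
        HttpClient httpClient = HttpClientBuilder.create()
                .setMaxConnTotal(this.maxTotalConnect)
                .setMaxConnPerRoute(this.maxConnectPerRoute)
                .build();
        HttpComponentsClientHttpRequestFactory factory =
                new HttpComponentsClientHttpRequestFactory(httpClient);
        factory.setConnectTimeout(this.connectTimeout);
        factory.setReadTimeout(this.readTimeout);
        return factory;
    }

    // Registered as a bean so that Sleuth can instrument outgoing calls
    @Bean
    @ConditionalOnMissingBean(RestTemplate.class)
    public RestTemplate getRestTemplate() {
        RestTemplate restTemplate = new RestTemplate(this.createFactory2());
        List<HttpMessageConverter<?>> converterList = restTemplate.getMessageConverters();
        converterList.add(new BufferedImageHttpMessageConverter());
        return restTemplate;
    }
}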
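Start all three microservices, then enter localhost:8881/servicea in a browser to send a request. Because each service sleeps for a random 0 to 4 seconds, the response can take up to roughly 12 seconds to arrive. It contains the content returned by the ServiceC endpoint, as shown below: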
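Test Results
When we visit http://localhost:8881/servicea, the complete call chain is recorded as JSON data, and we can inspect it in the UI of the Zipkin server we set up earlier:
Click show to see the request details for each service:
Open the details for SERVICE8882: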
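From the cs, sr, ss, and cr timestamps (shown in the UI as Client Start, Server Start, Server Finish, and Client Finish) you can work out for yourself whether a performance problem comes from network latency or from the processing logic!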
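The same trace data that the UI displays can also be pulled straight from Zipkin's HTTP API, which is handy if you want to do the timing arithmetic in code. The sketch below builds a query against `/api/v2/traces` and prints the raw JSON. The class `TraceQuery` is made up for illustration, the service name `service1` is taken from the sample application.yml, and the code assumes Java 11 or later and a Zipkin server on localhost:9411.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class TraceQuery {
    /** Builds the URI that asks Zipkin for the most recent traces of one service. */
    static URI tracesUri(String zipkinBaseUrl, String serviceName, int limit) {
        String query = "serviceName=" + URLEncoder.encode(serviceName, StandardCharsets.UTF_8)
                + "&limit=" + limit;
        return URI.create(zipkinBaseUrl + "/api/v2/traces?" + query);
    }

    /** Performs the GET request and returns the response body (a JSON array of traces). */
    static String fetch(URI uri) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) throws Exception {
        URI uri = tracesUri("http://localhost:9411", "service1", 10);
        System.out.println("GET " + uri);
        System.out.println(fetch(uri));
    }
}
```

Each element of the returned array is one trace, itself an array of spans carrying the timestamps and durations discussed above.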