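Recently I needed end-to-end monitoring for some business flows. Each flow is made up of several microservices, all written in Java with Spring, and we want traffic statistics and performance data for every module involved: for example, how many business requests were made in total, how many responses succeeded or failed, and how long each step took. So I looked into how to emit metrics from a Java Spring application, collect them centrally with Prometheus, and present them in Grafana dashboards.
First, let's define a simple business flow. Assume we have two Spring applications. The first exposes an HTTP endpoint for business requests and, on receiving a request, sends the information it carries to Kafka. The second subscribes to the Kafka topic, receives the business data sent by application 1, and processes it.
Application 1
Create a new application on start.spring.io with the artifact name kafka-sender-example, and under Dependencies select Spring for Apache Kafka, Spring Boot Actuator, and Spring Web. The controller below also imports com.alibaba.fastjson.JSONObject, so the fastjson dependency has to be added to pom.xml by hand. Open the generated project and add a class named RemoteCommandController that implements an HTTP endpoint: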
package cn.roygao.kafkasenderexample;

import java.util.Collections;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.logging.Logger;

import org.apache.kafka.clients.producer.ProducerRecord;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

import com.alibaba.fastjson.JSONObject;

@RestController
public class RemoteCommandController {

    @Autowired
    private KafkaTemplate<Integer, String> template;

    private final static Logger LOGGER = Logger.getLogger(RemoteCommandController.class.getName());

    @PostMapping("/sendcommand")
    public ResponseEntity<Map<String, Object>> sendCommand(@RequestBody JSONObject commandMsg) {
        String requestId = UUID.randomUUID().toString();
        String vin = commandMsg.getString("vin");
        String command = commandMsg.getString("command");
        LOGGER.info("Send command to vehicle:" + vin + ", command:" + command);
        Map<String, Object> requestIdObj = Collections.singletonMap("requestId", requestId);
        // Only the command string is published, with a fixed record key of 1
        ProducerRecord<Integer, String> record = new ProducerRecord<>("remotecommand", 1, command);
        try {
            System.out.println(System.currentTimeMillis());
            // Block for up to 10 seconds until the send completes
            template.send(record).get(10, TimeUnit.SECONDS);
        }
        catch (ExecutionException e) {
            LOGGER.info("Error");
            LOGGER.info(e.getMessage());
        }
        catch (TimeoutException | InterruptedException e) {
            LOGGER.info("Timeout");
            LOGGER.info(e.getMessage());
        }
        // Note: 202 Accepted is returned even when the send failed or timed out
        return ResponseEntity.accepted().body(requestIdObj);
    }
}
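The code is straightforward: it exposes a POST /sendcommand endpoint. The caller supplies the vehicle's VIN and the command to send, and on receiving the request the controller forwards the command to a Kafka topic. KafkaTemplate is used to send the message, so we also define a configuration class named KafkaSender: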
package cn.roygao.kafkasenderexample;

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.IntegerSerializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.TopicBuilder;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;

@Configuration
public class KafkaSender {

    @Bean
    public NewTopic topic() {
        return TopicBuilder.name("remotecommand")
                .build();
    }

    @Bean
    public ProducerFactory<Integer, String> producerFactory() {
        return new DefaultKafkaProducerFactory<>(producerConfigs());
    }

    @Bean
    public Map<String, Object> producerConfigs() {
        Map<String, Object> props = new HashMap<>();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, IntegerSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // See https://kafka.apache.org/documentation/#producerconfigs for more properties
        return props;
    }

    @Bean
    public KafkaTemplate<Integer, String> kafkaTemplate() {
        return new KafkaTemplate<Integer, String>(producerFactory());
    }
}
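The code defines the Kafka server address, the topic, and related settings.
Run ./mvnw clean package to compile and package the application.
Application 2
Create another application on start.spring.io with the artifact name kafka-receiver-example, and under Dependencies select Spring for Apache Kafka and Spring Boot Actuator. The /actuator/prometheus endpoint used later is only available when micrometer-registry-prometheus is on the classpath, so select the Prometheus dependency as well. Open the generated project and create a class named RemoteCommandHandler that receives the Kafka messages: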
package cn.roygao.kafkareceiverexample;

import java.util.concurrent.TimeUnit;

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.listener.adapter.ConsumerRecordMetadata;
import org.springframework.stereotype.Component;

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;

@Component
public class RemoteCommandHandler {

    private Timer timer;

    public RemoteCommandHandler(MeterRegistry registry) {
        // Timer that publishes the 15th, 50th and 95th percentiles plus histogram buckets
        this.timer = Timer
                .builder("kafka.process.latency")
                .publishPercentiles(0.15, 0.5, 0.95)
                .publishPercentileHistogram()
                .register(registry);
    }

    @KafkaListener(id = "myId", topics = "remotecommand")
    public void listen(String in, ConsumerRecordMetadata meta) {
        // Latency = time of consumption minus the record's timestamp
        long latency = System.currentTimeMillis() - meta.timestamp();
        timer.record(latency, TimeUnit.MILLISECONDS);
    }
}
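The constructor takes a MeterRegistry and creates a Timer, one of the meter types Micrometer provides, which records durations. The Timer is registered with the MeterRegistry.
The listen method subscribes to the Kafka topic, reads the creation timestamp from each message's metadata, and compares it with the current time to work out how long the message took from production to consumption. That duration is then recorded with the timer, which aggregates it into the percentiles configured above.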
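To make the percentile settings concrete, here is a minimal plain-Java sketch of what a latency percentile means, using the nearest-rank method on a handful of made-up latency values. It is only an illustration: Micrometer computes its published percentiles with its own approximating data structures, and the names LatencyPercentiles, latencyMillis and percentile are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Micrometer: illustrates what the
// 0.15 / 0.5 / 0.95 percentiles of a set of latencies mean.
final class LatencyPercentiles {

    // Latency as computed in listen(): consumption time minus record timestamp.
    static long latencyMillis(long consumedAtMillis, long recordTimestampMillis) {
        return consumedAtMillis - recordTimestampMillis;
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p * 100 percent of all samples are less than or equal to it.
    static long percentile(long[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in (0, 1]");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] latencies = {12, 5, 40, 8, 15, 7, 9, 30, 11, 6}; // made-up values, in ms
        for (double p : new double[] {0.15, 0.5, 0.95}) {
            System.out.println("percentile " + p + " = " + percentile(latencies, p) + " ms");
        }
    }
}
```

With these ten samples, half of the messages were consumed within 9 ms, while the 0.95 percentile is dominated by the slowest message, which is why high percentiles are useful for spotting outliers.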
We also need a Kafka configuration class here:
package cn.roygao.kafkareceiverexample;

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.EnableKafka;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.config.KafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.listener.ConcurrentMessageListenerContainer;

@Configuration
@EnableKafka
public class KafkaConfig {

    @Bean
    KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<Integer, String>>
            kafkaListenerContainerFactory() {
        ConcurrentKafkaListenerContainerFactory<Integer, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        factory.setConcurrency(3);
        factory.getContainerProperties().setPollTimeout(3000);
        return factory;
    }

    @Bean
    public ConsumerFactory<Integer, String> consumerFactory() {
        return new DefaultKafkaConsumerFactory<>(consumerConfigs());
    }

    @Bean
    public Map<String, Object> consumerConfigs() {
        Map<String, Object> props = new HashMap<>();
        // Consumer settings use ConsumerConfig (the original used the ProducerConfig constant, which holds the same key)
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.IntegerDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
Add the following settings to application.properties:
spring.kafka.consumer.auto-offset-reset=earliest
server.port=7777
management.endpoints.web.exposure.include=health,info,prometheus
management.endpoints.enabled-by-default=true
management.endpoint.health.show-details=always
Then run ./mvnw clean package to compile and package the application.
Starting Kafka
I start Kafka with Docker here. The compose file is:
---
version: '2'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:6.1.0
    hostname: zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000

  broker:
    image: confluentinc/cp-server:6.1.0
    hostname: broker
    container_name: broker
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
      - "9101:9101"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_METRIC_REPORTERS: io.confluent.metrics.reporter.ConfluentMetricsReporter
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_CONFLUENT_LICENSE_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_CONFLUENT_BALANCER_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_JMX_PORT: 9101
      KAFKA_JMX_HOSTNAME: localhost
      KAFKA_CONFLUENT_SCHEMA_REGISTRY_URL: http://schema-registry:8081
      CONFLUENT_METRICS_REPORTER_BOOTSTRAP_SERVERS: broker:29092
      CONFLUENT_METRICS_REPORTER_TOPIC_REPLICAS: 1
      CONFLUENT_METRICS_ENABLE: 'true'
      CONFLUENT_SUPPORT_CUSTOMER_ID: 'anonymous'

  schema-registry:
    image: confluentinc/cp-schema-registry:6.1.0
    hostname: schema-registry
    container_name: schema-registry
    depends_on:
      - broker
    ports:
      - "8081:8081"
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: 'broker:29092'
      SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081

  connect:
    image: cnfldemos/cp-server-connect-datagen:0.4.0-6.1.0
    hostname: connect
    container_name: connect
    depends_on:
      - broker
      - schema-registry
    ports:
      - "8083:8083"
    environment:
      CONNECT_BOOTSTRAP_SERVERS: 'broker:29092'
      CONNECT_REST_ADVERTISED_HOST_NAME: connect
      CONNECT_REST_PORT: 8083
      CONNECT_GROUP_ID: compose-connect-group
      CONNECT_CONFIG_STORAGE_TOPIC: docker-connect-configs
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_OFFSET_FLUSH_INTERVAL_MS: 10000
      CONNECT_OFFSET_STORAGE_TOPIC: docker-connect-offsets
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_STATUS_STORAGE_TOPIC: docker-connect-status
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_KEY_CONVERTER: org.apache.kafka.connect.storage.StringConverter
      CONNECT_VALUE_CONVERTER: io.confluent.connect.avro.AvroConverter
      CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: http://schema-registry:8081
      # CLASSPATH required due to CC-2422
      CLASSPATH: /usr/share/java/monitoring-interceptors/monitoring-interceptors-6.1.0.jar
      CONNECT_PRODUCER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor"
      CONNECT_CONSUMER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor"
      CONNECT_PLUGIN_PATH: "/usr/share/java,/usr/share/confluent-hub-components"
      CONNECT_LOG4J_LOGGERS: org.apache.zookeeper=ERROR,org.I0Itec.zkclient=ERROR,org.reflections=ERROR

  control-center:
    image: confluentinc/cp-enterprise-control-center:6.1.0
    hostname: control-center
    container_name: control-center
    depends_on:
      - broker
      - schema-registry
      - connect
      - ksqldb-server
    ports:
      - "9021:9021"
    environment:
      CONTROL_CENTER_BOOTSTRAP_SERVERS: 'broker:29092'
      CONTROL_CENTER_CONNECT_CLUSTER: 'connect:8083'
      CONTROL_CENTER_KSQL_KSQLDB1_URL: "http://ksqldb-server:8088"
      CONTROL_CENTER_KSQL_KSQLDB1_ADVERTISED_URL: "http://localhost:8088"
      CONTROL_CENTER_SCHEMA_REGISTRY_URL: "http://schema-registry:8081"
      CONTROL_CENTER_REPLICATION_FACTOR: 1
      CONTROL_CENTER_INTERNAL_TOPICS_PARTITIONS: 1
      CONTROL_CENTER_MONITORING_INTERCEPTOR_TOPIC_PARTITIONS: 1
      CONFLUENT_METRICS_TOPIC_REPLICATION: 1
      PORT: 9021

  ksqldb-server:
    image: confluentinc/cp-ksqldb-server:6.1.0
    hostname: ksqldb-server
    container_name: ksqldb-server
    depends_on:
      - broker
      - connect
    ports:
      - "8088:8088"
    environment:
      KSQL_CONFIG_DIR: "/etc/ksql"
      KSQL_BOOTSTRAP_SERVERS: "broker:29092"
      KSQL_HOST_NAME: ksqldb-server
      KSQL_LISTENERS: "http://0.0.0.0:8088"
      KSQL_CACHE_MAX_BYTES_BUFFERING: 0
      KSQL_KSQL_SCHEMA_REGISTRY_URL: "http://schema-registry:8081"
      KSQL_PRODUCER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor"
      KSQL_CONSUMER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor"
      KSQL_KSQL_CONNECT_URL: "http://connect:8083"
      KSQL_KSQL_LOGGING_PROCESSING_TOPIC_REPLICATION_FACTOR: 1
      KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: 'true'
      KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: 'true'

  ksqldb-cli:
    image: confluentinc/cp-ksqldb-cli:6.1.0
    container_name: ksqldb-cli
    depends_on:
      - broker
      - connect
      - ksqldb-server
    entrypoint: /bin/sh
    tty: true

  ksql-datagen:
    image: confluentinc/ksqldb-examples:6.1.0
    hostname: ksql-datagen
    container_name: ksql-datagen
    depends_on:
      - ksqldb-server
      - broker
      - schema-registry
      - connect
    command: "bash -c 'echo Waiting for Kafka to be ready... && \
                       cub kafka-ready -b broker:29092 1 40 && \
                       echo Waiting for Confluent Schema Registry to be ready... && \
                       cub sr-ready schema-registry 8081 40 && \
                       echo Waiting a few seconds for topic creation to finish... && \
                       sleep 11 && \
                       tail -f /dev/null'"
    environment:
      KSQL_CONFIG_DIR: "/etc/ksql"
      STREAMS_BOOTSTRAP_SERVERS: broker:29092
      STREAMS_SCHEMA_REGISTRY_HOST: schema-registry
      STREAMS_SCHEMA_REGISTRY_PORT: 8081

  rest-proxy:
    image: confluentinc/cp-kafka-rest:6.1.0
    depends_on:
      - broker
      - schema-registry
    ports:
      - 8082:8082
    hostname: rest-proxy
    container_name: rest-proxy
    environment:
      KAFKA_REST_HOST_NAME: rest-proxy
      KAFKA_REST_BOOTSTRAP_SERVERS: 'broker:29092'
      KAFKA_REST_LISTENERS: "http://0.0.0.0:8082"
      KAFKA_REST_SCHEMA_REGISTRY_URL: 'http://schema-registry:8081'
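Run nohup docker compose up > ./kafka.log 2>&1 & to start everything. Then open localhost:9021 in a browser to view information about Kafka in the Control Center UI.
Start application 1 and application 2, then call POST http://localhost:8080/sendcommand to send a business request, for example: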
curl --location --request POST 'http://localhost:8080/sendcommand' \
--header 'Content-Type: application/json' \
--data-raw '{
"vin": "ABC123",
"command": "engine-start"
}'
In the Kafka Control Center you can now see a topic named remotecommand, with one message produced and consumed.
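The same request can also be sent from Java, which is handy when generating test traffic in a loop. The following is a minimal sketch using the JDK's built-in java.net.http.HttpClient (Java 11 or later). It assumes application 1 is running on localhost:8080, and the class name SendCommandClient and method buildRequest are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client for the /sendcommand endpoint (requires Java 11+).
final class SendCommandClient {

    // Builds the same POST request as the curl command above.
    static HttpRequest buildRequest(String baseUrl, String vin, String command) {
        // Assumes vin and command contain no characters that need JSON escaping.
        String json = "{\"vin\": \"" + vin + "\", \"command\": \"" + command + "\"}";
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/sendcommand"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildRequest("http://localhost:8080", "ABC123", "engine-start");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // The controller replies with 202 Accepted and a JSON body holding requestId.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```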
Starting Prometheus and Grafana
These are started with docker compose as well. The compose file is:
services:
  prometheus:
    image: prom/prometheus-linux-amd64
    #network_mode: host
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./config:/etc/prometheus/
    command:
      - '--config.file=/etc/prometheus/prometheus.yaml'
    ports:
      - 9090:9090
  grafana:
    image: grafana/grafana
    user: '472'
    #network_mode: host
    container_name: grafana
    restart: unless-stopped
    links:
      - prometheus:prometheus
    volumes:
      - ./data/grafana:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    ports:
      - 3000:3000
    depends_on:
      - prometheus
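In the same directory as this compose file, create a config directory to hold the Prometheus configuration file. The compose file passes --config.file=/etc/prometheus/prometheus.yaml, so the file must be named prometheus.yaml. Its content is: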
scrape_configs:
  - job_name: 'Spring Boot Application input'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 2s
    static_configs:
      - targets: ['172.17.0.1:7777']
        labels:
          application: 'My Spring Boot Application'
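Here targets is the address exposed by application 2: 7777 is the server.port configured earlier, and 172.17.0.1 is typically the host's address on Docker's default bridge network on Linux, so adjust it if your setup differs. metrics_path is the path the metrics are scraped from.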
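Before pointing Prometheus at application 2, it can help to confirm that the endpoint really serves the timer. The sketch below fetches the metrics path from the host machine and prints only the samples whose names start with kafka_process_latency. It assumes application 2 is running on localhost:7777 and Java 11 or later; the class name ScrapeCheck and method samplesWithPrefix are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: prints the kafka_process_latency samples that
// application 2 exposes on its Prometheus endpoint (requires Java 11+).
final class ScrapeCheck {

    // Keeps sample lines whose metric name starts with the prefix;
    // lines starting with '#' (HELP/TYPE comments) are dropped.
    static List<String> samplesWithPrefix(String body, String prefix) {
        return body.lines()
                .filter(line -> !line.startsWith("#"))
                .filter(line -> line.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws Exception {
        // Same port and path as in the scrape config, as seen from the host.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:7777/actuator/prometheus"))
                .GET()
                .build();
        String body = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
        samplesWithPrefix(body, "kafka_process_latency").forEach(System.out::println);
    }
}
```

If nothing is printed, the usual suspects are that no message has been consumed yet or that the prometheus endpoint is not exposed.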
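Also in the compose file's directory, create a data/grafana directory, which is mounted as Grafana's data directory. Note that its permissions need to be changed with chmod 777, otherwise Grafana reports a permission error.
Run nohup docker compose up > ./prometheus.log 2>&1 & to start both containers.
Open localhost:9090 to reach the Prometheus UI and search for kafka. You should see the kafka_process_latency metrics reported by application 2, with statistics for the three percentiles we configured: 0.15, 0.5, and 0.95.
Open localhost:3000 to reach Grafana and configure a data source: choose Prometheus, enter the address of the Prometheus container (http://prometheus:9090 with the container name and port from the compose file above), then click Save & test. After that, create a new dashboard and add a panel that graphs the kafka_process_latency metrics.
[To be continued] Still to add: a Counter metric for calls to the HTTP endpoint, and more Grafana dashboards, including other service metrics.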
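As a starting point for the HTTP call counter mentioned above, here is a plain-Java sketch of the bookkeeping such counters do: one monotonically increasing count per outcome. It is a stand-in for illustration only. In the Spring application a Micrometer Counter registered with the MeterRegistry would play this role, and the class name CommandCounters and the outcome labels are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for per-outcome request counters; in the Spring
// application a Micrometer Counter per outcome would be used instead.
final class CommandCounters {

    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    // Increments the counter for one outcome, e.g. "success" or "failure".
    void increment(String outcome) {
        counts.computeIfAbsent(outcome, key -> new LongAdder()).increment();
    }

    // Current value for an outcome; 0 if it has never been incremented.
    long count(String outcome) {
        LongAdder adder = counts.get(outcome);
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        CommandCounters counters = new CommandCounters();
        counters.increment("success");
        counters.increment("success");
        counters.increment("failure");
        System.out.println("success=" + counters.count("success")
                + " failure=" + counters.count("failure"));
    }
}
```

In the controller, the natural places to increment would be after a successful send and inside the two catch blocks, so that failed and timed-out sends become visible even though the endpoint still answers 202.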