Eureka Server prometheus监控服务健康状态

背景

  服务进程监控一般都有相关组件处理了,早期业务出现特定服务使用的DB资源超过额配量,导致健康检测失败,服务陆续从Eureka下线了,业务监控在没路由到特定节点时候,或者路由到特定节点但没有碰到阈值场景不会触发告警,意味着业务短暂性正常,服务陆续下线;Eureka server 作为注册中心可以较早感知到服务注册状态,实例节点挂了(注册上的实例少了)、节点状态非UP 场景

监控方案

  • Eureka定时采集注册信息,实例节点数、实例节点状态信息
  • prometheus 定时采集Eureka server 采集到的数据
  • grafana 查询及对数据告警

Eureka注册信息数据采集

metric 数据结构定义
  • 统计节点状态

    type:Gauge

eureka_instance_status{client="{client}",status="{status}"}

client: eureka client application name

status 枚举

状态枚举值
UP1
DOWN5
STARTING2
OUT_OF_SERVICE3
UNKNOW4

最近n时间内平均值大于1,表示异常,执行告警

  • 统计节点数量

    type:Gauge

eureka_instance_count{client="{client}",count="{count}"}

client: eureka client application name

count: client count

java pom 依赖
<!-- boot2.x 兼容-->
<!-- The client -->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient</artifactId>
    <version>0.6.0</version>
</dependency>
<!-- Hotspot JVM metrics-->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_hotspot</artifactId>
    <version>0.6.0</version>
</dependency>
<!-- Exposition HTTPServer-->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_httpserver</artifactId>
    <version>0.6.0</version>
</dependency>
<!-- Pushgateway exposition-->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_pushgateway</artifactId>
    <version>0.6.0</version>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>1.1.4</version>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
    <version>1.1.4</version>
</dependency>
java代码
@Component
public class InstanceStateCollector {

    @Autowired
    PeerAwareInstanceRegistry registry;

    private static final Logger log = LoggerFactory.getLogger(InstanceStateCollector.class);

    @Scheduled(cron = "*/5 * * * * ?")
    public void collect() {

        Applications applications = registry.getApplications();

        applications.getRegisteredApplications().forEach((registeredApplication) -> {
            Integer count = registeredApplication.size();
            String client = registeredApplication.getName();

            log.debug("client :{}, count :{}", client, count);
            PrometheusMetricsUtils.metricInstanceCount(client, count);

            registeredApplication.getInstances().forEach((instance) -> {
                String instanceId = instance.getInstanceId();
                log.debug("client :{}, instance :{}, status :{}", client, instanceId, instance.getStatus());
                PrometheusMetricsUtils.metricInstanceStatus(client, instanceId, instance.getStatus());

            });
        });
    }
}
@Service
public class PrometheusMetricsService {

    /**
     * 实例状态统计
     * eureka_instance_status{client="{client}",status="{status}"}
     */
    private static final String EUREKA_INSTANCE_STATUS = "mall_eureka_instance_status";

    /**
     * 实例数量统计
     * eureka_instance_count{client="{client}",count="{count}"}
     */
    private static final String EUREKA_INSTANCE_COUNT = "mall_eureka_instance_count";

    private static final String LABEL_CLIENT = "client";

    private final Gauge instanceStatusGauge;
    private final Gauge instanceCountGauge;


    public PrometheusMetricsService(CollectorRegistry registry) {
        instanceStatusGauge = Gauge
                .build(EUREKA_INSTANCE_STATUS, "instance status")
                .labelNames(LABEL_CLIENT)
                .register(registry);

        instanceCountGauge = Gauge
                .build(EUREKA_INSTANCE_COUNT, "instance count")
                .labelNames(LABEL_CLIENT)
                .register(registry);
    }

    /**
     * 实例状态埋点
     *
     * @param client   client name || application name
     * @param statusValue   status
     */
    void metricInstanceStatus(String client, Integer statusValue) {
        instanceStatusGauge.labels(client).set(statusValue);
    }

    /**
     * 实例数量埋点
     *
     * @param client client name || application name
     * @param count  count
     */
    void metricInstanceCount(String client, Integer count) {
        instanceCountGauge.labels(client).set(count);
    }



}

Prometheus采集Eureka server数据

prometheus.yml
  - job_name: 'mgmall-eureka'
    scrape_interval: 10s 
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['10.124.129.42:19110']

Grafana报表维护

报表
mall_eureka_instance_count{client="MGMALL-CONFIG"}
.....

![image-20190531140258528](/Users/yugj/Library/Application Support/typora-user-images/image-20190531140258528.png)

监控

avg() query(A,10s,now) is below 1

![image-20190531140319350](/Users/yugj/Library/Application Support/typora-user-images/image-20190531140319350.png)

转载于:https://my.oschina.net/yugj/blog/3056695

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值