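This post is part of a series that takes a deeper look at Envoy Proxy and http://Istio.io and at how they enable a more elegant way to connect and manage microservices.
Here is the plan for the next few parts (links will be updated as they are published):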
- Circuit breakers (Part 1)
- Retries / timeouts (Part 2)
- Distributed tracing (Part 3)
- Metrics collection with Prometheus (Part 4)
- Rate limiter (Part 5)
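Part 4 - Collecting Envoy metrics with Prometheus
Envoy configuration and how it exposes metrics
Envoy 1.9 already exposes metrics directly in Prometheus format, so Prometheus can scrape them as they are and there is no need to configure statsd to collect them. The statsd flow is roughly: Envoy first pushes metrics to statsd, and then Prometheus (a time series database) pulls the metrics from statsd, in practice usually through a statsd exporter.
The Envoy configuration file contains the following settings: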
admin:
  access_log_path: "/dev/null"
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9000
Once Envoy has started successfully, a request to localhost:9000/stats/prometheus returns output similar to the following:
# TYPE envoy_listener_admin_http_downstream_rq_completed counter
envoy_listener_admin_http_downstream_rq_completed{envoy_http_conn_manager_prefix="admin"} 3154
# TYPE envoy_listener_admin_http_downstream_rq_xx counter
envoy_listener_admin_http_downstream_rq_xx{envoy_response_code_class="1",envoy_http_conn_manager_prefix="admin"} 0
# TYPE envoy_listener_admin_downstream_cx_total counter
envoy_listener_admin_downstream_cx_total{} 146346
envoy_listener_admin_http_downstream_rq_xx{envoy_response_code_class="5",envoy_http_conn_manager_prefix="admin"} 0
envoy_listener_admin_http_downstream_rq_xx{envoy_response_code_class="3",envoy_http_conn_manager_prefix="admin"} 0
# TYPE envoy_listener_admin_downstream_pre_cx_timeout counter
envoy_listener_admin_downstream_pre_cx_timeout{} 0
# TYPE envoy_listener_admin_no_filter_chain_match counter
envoy_listener_admin_no_filter_chain_match{} 0
# TYPE envoy_listener_admin_downstream_cx_destroy counter
envoy_listener_admin_downstream_cx_destroy{} 146344
envoy_listener_admin_http_downstream_rq_xx{envoy_response_code_class="2",envoy_http_conn_manager_prefix="admin"} 3154
envoy_listener_admin_http_downstream_rq_xx{envoy_response_code_class="4",envoy_http_conn_manager_prefix="admin"} 0
# TYPE envoy_cluster_upstream_flow_control_paused_reading_total counter
envoy_cluster_upstream_flow_control_paused_reading_total{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_membership_change counter
envoy_cluster_membership_change{envoy_cluster_name="hawkeye"} 1
# TYPE envoy_cluster_ext_authz_denied counter
envoy_cluster_ext_authz_denied{envoy_cluster_name="hawkeye"} 15
# TYPE envoy_cluster_upstream_rq_completed counter
envoy_cluster_upstream_rq_completed{envoy_cluster_name="hawkeye"} 255
# TYPE envoy_cluster_upstream_rq_pending_failure_eject counter
envoy_cluster_upstream_rq_pending_failure_eject{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_upstream_cx_connect_attempts_exceeded counter
envoy_cluster_upstream_cx_connect_attempts_exceeded{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_upstream_rq_timeout counter
envoy_cluster_upstream_rq_timeout{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_upstream_rq counter
envoy_cluster_upstream_rq{envoy_response_code="204",envoy_cluster_name="hawkeye"} 2
# TYPE envoy_cluster_upstream_cx_connect_timeout counter
envoy_cluster_upstream_cx_connect_timeout{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_upstream_cx_none_healthy counter
envoy_cluster_upstream_cx_none_healthy{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_upstream_rq_pending_total counter
envoy_cluster_upstream_rq_pending_total{envoy_cluster_name="hawkeye"} 6
# TYPE envoy_cluster_internal_upstream_rq_completed counter
envoy_cluster_internal_upstream_rq_completed{envoy_cluster_name="hawkeye"} 15
envoy_cluster_upstream_rq{envoy_response_code="200",envoy_cluster_name="hawkeye"} 209
envoy_cluster_upstream_rq{envoy_response_code="503",envoy_cluster_name="hawkeye"} 29
# TYPE envoy_cluster_lb_local_cluster_not_ok counter
envoy_cluster_lb_local_cluster_not_ok{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_lb_zone_routing_sampled counter
envoy_cluster_lb_zone_routing_sampled{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_upstream_cx_connect_fail counter
envoy_cluster_upstream_cx_connect_fail{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_upstream_rq_retry_success counter
envoy_cluster_upstream_rq_retry_success{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_bind_errors counter
envoy_cluster_bind_errors{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_upstream_cx_total counter
envoy_cluster_upstream_cx_total{envoy_cluster_name="hawkeye"} 6
# TYPE envoy_cluster_lb_zone_number_differs counter
envoy_cluster_lb_zone_number_differs{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_original_dst_host_invalid counter
envoy_cluster_original_dst_host_invalid{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_lb_zone_no_capacity_left counter
envoy_cluster_lb_zone_no_capacity_left{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_upstream_cx_max_requests counter
envoy_cluster_upstream_cx_max_requests{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_upstream_rq_per_try_timeout counter
envoy_cluster_upstream_rq_per_try_timeout{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_upstream_rq_retry_overflow counter
envoy_cluster_upstream_rq_retry_overflow{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_external_upstream_rq counter
envoy_cluster_external_upstream_rq{envoy_response_code="204",envoy_cluster_name="hawkeye"} 2
envoy_cluster_external_upstream_rq{envoy_response_code="503",envoy_cluster_name="hawkeye"} 29
# TYPE envoy_cluster_upstream_cx_rx_bytes_total counter
envoy_cluster_upstream_cx_rx_bytes_total{envoy_cluster_name="hawkeye"} 122326
# TYPE envoy_cluster_upstream_cx_http1_total counter
envoy_cluster_upstream_cx_http1_total{envoy_cluster_name="hawkeye"} 6
# TYPE envoy_cluster_upstream_rq_pending_overflow counter
envoy_cluster_upstream_rq_pending_overflow{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_lb_zone_routing_cross_zone counter
envoy_cluster_lb_zone_routing_cross_zone{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_lb_subsets_created counter
envoy_cluster_lb_subsets_created{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_lb_subsets_active gauge
envoy_cluster_lb_subsets_active{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_circuit_breakers_default_cx_open gauge
envoy_cluster_circuit_breakers_default_cx_open{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_max_host_weight gauge
envoy_cluster_max_host_weight{envoy_cluster_name="hawkeye"} 1
# TYPE envoy_cluster_circuit_breakers_default_rq_retry_open gauge
envoy_cluster_circuit_breakers_default_rq_retry_open{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_version gauge
envoy_cluster_version{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_membership_total gauge
envoy_cluster_membership_total{envoy_cluster_name="hawkeye"} 1
# TYPE envoy_cluster_circuit_breakers_high_rq_pending_open gauge
envoy_cluster_circuit_breakers_high_rq_pending_open{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_circuit_breakers_default_rq_open gauge
envoy_cluster_circuit_breakers_default_rq_open{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_circuit_breakers_high_rq_open gauge
envoy_cluster_circuit_breakers_high_rq_open{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_upstream_cx_active gauge
envoy_cluster_upstream_cx_active{envoy_cluster_name="hawkeye"} 3
# TYPE envoy_cluster_upstream_rq_pending_active gauge
envoy_cluster_upstream_rq_pending_active{envoy_cluster_name="hawkeye"} 0
# TYPE envoy_cluster_upstream_rq_active gauge
envoy_cluster_upstream_rq_active{envoy_cluster_name="hawkeye"} 0
envoy_cluster_membership_healthy{envoy_cluster_name="ext-authz"} 1
envoy_cluster_circuit_breakers_default_rq_pending_open{envoy_cluster_name="ext-authz"} 0
envoy_cluster_circuit_breakers_default_rq_open{envoy_cluster_name="ext-authz"} 0
envoy_cluster_membership_total{envoy_cluster_name="ext-authz"} 1
envoy_cluster_circuit_breakers_default_rq_retry_open{envoy_cluster_name="ext-authz"} 0
envoy_cluster_upstream_cx_active{envoy_cluster_name="ext-authz"} 4
envoy_cluster_version{envoy_cluster_name="ext-authz"} 0
envoy_cluster_circuit_breakers_high_rq_pending_open{envoy_cluster_name="ext-authz"} 0
envoy_cluster_upstream_cx_rx_bytes_buffered{envoy_cluster_name="ext-authz"} 714
envoy_cluster_upstream_rq_pending_active{envoy_cluster_name="ext-authz"} 0
envoy_cluster_circuit_breakers_high_cx_open{envoy_cluster_name="ext-authz"} 0
envoy_cluster_circuit_breakers_high_rq_retry_open{envoy_cluster_name="ext-authz"} 0
envoy_cluster_max_host_weight{envoy_cluster_name="ext-authz"} 1
envoy_cluster_upstream_rq_active{envoy_cluster_name="ext-authz"} 0
envoy_cluster_circuit_breakers_high_rq_open{envoy_cluster_name="ext-authz"} 0
envoy_cluster_upstream_cx_tx_bytes_buffered{envoy_cluster_name="ext-authz"} 0
envoy_cluster_circuit_breakers_default_cx_open{envoy_cluster_name="ext-authz"} 0
envoy_cluster_lb_subsets_active{envoy_cluster_name="ext-authz"} 0
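The output above is in the Prometheus text exposition format: lines starting with `#` are comments such as `# TYPE`, and every other line is a sample of the form `name{labels} value`. As a rough illustration of how that format can be read, here is a minimal Python sketch that parses a few of the lines shown above and computes the share of 5xx responses for one cluster. The helper names `parse_samples` and `upstream_5xx_ratio` are made up for this example, and the parser covers only the simple cases that appear in this output, not the full format.

```python
import re

# One sample line of the text format: name{labels} value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# One label pair inside the braces: key="value"
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# TYPE" / "# HELP" comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything that is not a well-formed sample
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def upstream_5xx_ratio(samples, cluster):
    """Share of envoy_cluster_upstream_rq responses with a 5xx code."""
    total = 0.0
    errors = 0.0
    for name, labels, value in samples:
        if name != "envoy_cluster_upstream_rq":
            continue
        if labels.get("envoy_cluster_name") != cluster:
            continue
        total += value
        if labels.get("envoy_response_code", "").startswith("5"):
            errors += value
    return errors / total if total else 0.0


# A few lines copied from the output above.
EXAMPLE = '''
# TYPE envoy_cluster_upstream_rq counter
envoy_cluster_upstream_rq{envoy_response_code="204",envoy_cluster_name="hawkeye"} 2
envoy_cluster_upstream_rq{envoy_response_code="200",envoy_cluster_name="hawkeye"} 209
envoy_cluster_upstream_rq{envoy_response_code="503",envoy_cluster_name="hawkeye"} 29
envoy_listener_admin_downstream_cx_total{} 146346
'''

if __name__ == "__main__":
    parsed = parse_samples(EXAMPLE)
    print(len(parsed))
    print(round(upstream_5xx_ratio(parsed, "hawkeye"), 4))
```

With the three `envoy_cluster_upstream_rq` lines above, 29 of 240 responses are 503s, so roughly 12% of the requests to `hawkeye` failed. In a real setup Prometheus does this parsing itself; the sketch is only meant to make the format easier to read.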
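In my actual tests Envoy was deployed in k8s, so I relied on Prometheus's ability to discover pods automatically and did not add any static configuration to Prometheus. The advantage of automatic discovery is that when the pods are scaled by the HPA, the configuration file does not need to change.
To have the Envoy metrics scraped, add the following annotations to the Envoy Deployment: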
kind: Deployment
apiVersion: apps/v1
metadata:
  name: gateway
  labels:
    app: gateway
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gateway
  template:
    metadata:
      labels:
        app: gateway
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/path: '/stats/prometheus'
        prometheus.io/port: '9000'
    spec:
      volumes:
        - name: config
          configMap:
            name: gateway-cm
      containers:
        - name: gateway
          image: 'envoyproxy/envoy:v1.9.0'
          env:
            - name: 'CPUS'
              value: '1'
            - name: 'SERVICE_NAME'
              value: '-gateway'
          resources:
            limits:
              cpu: '1'
              memory: 1Gi
            requests:
              cpu: '1'
              memory: 1Gi
          volumeMounts:
            - name: config
              mountPath: /etc/envoy/
          imagePullPolicy: Always
Pay attention to the following annotations:
annotations:
  prometheus.io/scrape: 'true'
  prometheus.io/path: '/stats/prometheus'
  prometheus.io/port: '9000'
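These `prometheus.io/*` annotations are a widely used convention rather than something Prometheus understands on its own: a pod-discovery scrape job is usually configured with relabel rules that read them. To illustrate what such rules do with the three values, here is a small Python sketch that builds the scrape URL for a pod. The function name `scrape_url`, the pod IP `10.0.0.12`, and the `/metrics` default path are assumptions made for this example; the real behavior depends on how the Prometheus scrape job is configured in your cluster.

```python
def scrape_url(pod_ip, annotations, default_path="/metrics"):
    """Return the URL a pod-discovery job would scrape, or None if the pod opted out.

    Mirrors the common annotation convention only; the actual logic lives in
    the relabel rules of the Prometheus scrape job.
    """
    if annotations.get("prometheus.io/scrape") != "true":
        return None  # the pod has not opted in to scraping
    port = annotations.get("prometheus.io/port")
    if port is None:
        return None  # this sketch requires an explicit port annotation
    path = annotations.get("prometheus.io/path", default_path)
    return f"http://{pod_ip}:{port}{path}"


# Annotations taken from the Deployment above.
GATEWAY_ANNOTATIONS = {
    "prometheus.io/scrape": "true",
    "prometheus.io/path": "/stats/prometheus",
    "prometheus.io/port": "9000",
}

if __name__ == "__main__":
    # 10.0.0.12 is a hypothetical pod IP used only for illustration.
    print(scrape_url("10.0.0.12", GATEWAY_ANNOTATIONS))
```

The port and path line up with the admin listener configured earlier (port 9000) and with the `/stats/prometheus` endpoint, which is why the annotations must match the Envoy admin settings.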
Then open the Prometheus admin UI and run a query against the metrics:
![344a6a1e17d38298f73b6b5c554b83b3.png](https://i-blog.csdnimg.cn/blog_migrate/27fd295e61ef1a8b46eb733453b79ab5.png)
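Prometheus is now scraping the metrics successfully.
Displaying the metrics in Grafana
Two ready-made templates are already available among the Grafana dashboards, so you only need to import them.
They are the Envoy Global dashboard (7253) and the Envoy Service Dashboard (7250).
The final result looks like this: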
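The same check can also be made without the UI, through the Prometheus HTTP API endpoint `/api/v1/query`. The following Python sketch only builds the PromQL expression and the request URL; it does not send the request. The address `http://localhost:9090` and the helper names are assumptions for this example, and the query is just one possible way to look at the `envoy_cluster_upstream_rq` counter shown earlier.

```python
from urllib.parse import urlencode


def rate_by_response_code(cluster, window="5m"):
    """PromQL: per-second upstream request rate for one cluster, split by response code."""
    return (
        f'sum by (envoy_response_code) '
        f'(rate(envoy_cluster_upstream_rq{{envoy_cluster_name="{cluster}"}}[{window}]))'
    )


def instant_query_url(base_url, promql):
    """URL of an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    query = rate_by_response_code("hawkeye")
    print(query)
    # http://localhost:9090 is an assumed Prometheus address.
    print(instant_query_url("http://localhost:9090", query))
```

Pasting the printed PromQL expression into the query box of the Prometheus UI should give the same series as calling the URL.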
![52c58852294c30e3f5250fb01a27a015.png](https://i-blog.csdnimg.cn/blog_migrate/8f3e05eb9e18b715f935923a2d6f714e.jpeg)
![916a794d848928ffdefa0982a7333291.png](https://i-blog.csdnimg.cn/blog_migrate/c0bf1ec73de839e6926c0fb128e5d9ad.jpeg)