Building a Real-Time Monitoring Platform with Atlas + Spectator + Grafana

chengque7086

Original link: https://my.oschina.net/u/2408085/blog/733900

When reposting, please credit the original source: http://my.oschina.net/u/2408085/blog/733900

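Platform components

Atlas

A metrics backend open-sourced by Netflix for managing multi-dimensional time-series data. It aggregates and stores the metrics collected through the Spectator library, offers a powerful query syntax, and can return results as charts, JSON, images, and other formats.

Spectator

A library open-sourced by Netflix for collecting metrics. It was created mainly to support JDK 8 and replaces Servo, the older project of the same kind.

For usage details, see http://netflix.github.io/spectator/en/latest/intro/counter/

Grafana

Grafana is a flexible open-source dashboard project that can draw graphs for a configured data source with only simple configuration. It is commonly used as the display layer of real-time monitoring systems.

How the pieces fit together

The application collects metrics through Spectator and sends them to Atlas through Atlas-client. Grafana is configured with Atlas as a data source and with the metrics to monitor, and it fetches the time-series data from Atlas in real time.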

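Setting up the environment

Atlas

Deploy the open-source Netflix Atlas directly; it ships as a Java package that runs standalone. Because grafana3-atlas-datasource support is required, and for better handling of Atlas JSON data, Atlas 1.5.0 or later is needed. At the time of writing, the official release page https://github.com/Netflix/atlas/releases had not yet published a 1.5.0+ version, so you need to build and package it yourself from the latest master code.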


$ curl -Lo memory.conf https://raw.githubusercontent.com/Netflix/atlas/master/conf/memory.conf

$ java -jar standalone-master.jar memory.conf

Grafana

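Open http://grafana.org/download/ and download and install the latest Grafana v3.X release.

Install the Atlas plugin

Thanks to briangann for providing the 3.X plugin (my earlier tests used the 2.6 version of the plugin). On top of the 3.X plugin I added support for Grafana's templating feature. It has not been merged into the original yet, so choose whichever version suits you.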


git clone https://github.com/jewelknife/grafana3-atlas-datasource.git

mv grafana3-atlas-datasource /var/lib/grafana/plugins

# Restart Grafana

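Add dependencies to the application

This assumes the application already integrates Spring Boot and Spring Cloud.


<!-- Extended features such as GC and JVM metrics need additional jars; see the later sections -->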

<dependency>
   <groupId>org.springframework.cloud</groupId>
   <artifactId>spring-cloud-starter-spectator</artifactId>
</dependency>

<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

<dependency>
   <groupId>org.springframework.cloud</groupId>
   <artifactId>spring-cloud-starter-atlas</artifactId>
</dependency>

Add a global tag configuration

// Adds spring.application.name to every metric as the "app" tag

@Configuration
public class AtlasTagProviderConfigration {

   @Bean
   AtlasTagProvider atlasCommonTags(@Value("${spring.application.name}") String appName) {
       return () -> Collections.singletonMap("app", appName);
   }

}

Enable the Atlas-client push feature

Add the annotation:
@EnableAtlas

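Implementing specific features

TPS

spring-cloud-netflix-core ships with built-in metrics recording for incoming MVC requests, RESTful requests, HttpClient requests, and so on.

The interceptor that records incoming MVC requests is: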


MetricsHandlerInterceptor.java

public class MetricsHandlerInterceptor extends HandlerInterceptorAdapter {
  @Value("${netflix.metrics.rest.metricName:rest}")
  String metricName;

  @Value("${netflix.metrics.rest.callerHeader:#{null}}")
  String callerHeader;

  @Autowired
  MonitorRegistry registry;

  @Autowired
  ServoMonitorCache servoMonitorCache;

  @Autowired
  Collection<MetricsTagProvider> tagProviders;

  @Override
  public boolean preHandle(HttpServletRequest request, HttpServletResponse response,
        Object handler) throws Exception {
     RequestContextHolder.getRequestAttributes().setAttribute("requestStartTime",
           System.nanoTime(), SCOPE_REQUEST);
     return super.preHandle(request, response, handler);
  }

  @Override
  public void afterCompletion(HttpServletRequest request, HttpServletResponse response,
        Object handler, Exception ex) throws Exception {
     RequestContextHolder.getRequestAttributes().setAttribute("exception", ex,
           SCOPE_REQUEST);
     Long startTime = (Long) RequestContextHolder.getRequestAttributes().getAttribute(
           "requestStartTime", SCOPE_REQUEST);
     if (startTime != null)
        recordMetric(request, response, handler, startTime);
     super.afterCompletion(request, response, handler, ex);
  }

  protected void recordMetric(HttpServletRequest request, HttpServletResponse response,
        Object handler, Long startTime) {
     String caller = null;
     if (callerHeader != null) {
        caller = request.getHeader(callerHeader);
     }

     SmallTagMap.Builder builder = SmallTagMap.builder();
     for (MetricsTagProvider tagProvider : tagProviders) {
        Map<String, String> tags = tagProvider.httpRequestTags(request, response,
              handler, caller);
        for (Map.Entry<String, String> tag : tags.entrySet()) {
           builder.add(Tags.newTag(tag.getKey(), tag.getValue()));
        }
     }

     MonitorConfig.Builder monitorConfigBuilder = MonitorConfig.builder(metricName);
     monitorConfigBuilder.withTags(builder);

     servoMonitorCache.getTimer(monitorConfigBuilder.build()).record(
           System.nanoTime() - startTime, TimeUnit.NANOSECONDS);
  }
}

DefaultMetricsTagProvider.java

@Override
public Map<String, String> httpRequestTags(HttpServletRequest request,
     HttpServletResponse response, Object handler, String caller) {
  Map<String, String> tags = new HashMap<>();

  tags.put("method", request.getMethod());
  tags.put("status", ((Integer) response.getStatus()).toString());

  String uri = (String) request
        .getAttribute(HandlerMapping.BEST_MATCHING_PATTERN_ATTRIBUTE);
  if (uri == null) {
     uri = request.getPathInfo();
  }
  if (!StringUtils.hasText(uri)) {
     uri = "/";
  }
  uri = sanitizeUrlTemplate(uri.substring(1));
  tags.put("uri", uri.isEmpty() ? "root" : uri);

  Object exception = request.getAttribute("exception");
  if (exception != null) {
     tags.put("exception", exception.getClass().getSimpleName());
  }

  if (caller != null) {
     tags.put("caller", caller);
  }

  return tags;
}

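As the code above shows, Spring Cloud mainly uses a Timer to record HTTP requests. The default metric name is rest, and it can be changed through the "netflix.metrics.rest.metricName" property. The tags include method, uri, and status, plus exception and caller when they are present. The query conditions can therefore be configured as follows:

[Image: query condition configuration]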

The graph rendered by Atlas looks like this:

[Image: resulting graph]
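As a rough illustration of the kind of query condition involved, the sketch below assembles an Atlas stack-language expression that filters on the timer name and tags and groups by uri. Several things here are assumptions rather than details from the article: the class RestTpsQuery and its build method are hypothetical helpers, "demo-service" is a made-up application name, the statistic=count series is assumed to be the one that carries the request rate, and localhost:7101 is assumed to be where the standalone Atlas listens.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch only: builds an Atlas stack-language query for the request timer.
final class RestTpsQuery {

    // Produces: name,<metric>,:eq,<key>,<value>,:eq,:and,...,:sum,(,uri,),:by
    static String build(String metricName, Map<String, String> tags) {
        StringJoiner q = new StringJoiner(",");
        q.add("name").add(metricName).add(":eq");
        for (Map.Entry<String, String> e : tags.entrySet()) {
            // Each extra tag narrows the query with an :eq clause joined by :and
            q.add(e.getKey()).add(e.getValue()).add(":eq").add(":and");
        }
        // Sum the matching series, then split the result per uri tag value
        q.add(":sum").add("(").add("uri").add(")").add(":by");
        return q.toString();
    }

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("app", "demo-service"); // hypothetical application name
        tags.put("statistic", "count");  // assumption: count series = requests per second
        String query = build("rest", tags);
        // Assumed host and port of a local standalone Atlas
        System.out.println("http://localhost:7101/api/v1/graph?q=" + query + "&s=e-1h&format=json");
    }
}
```

The same expression, without the URL prefix, is the sort of text one would enter as the query of a Grafana panel backed by the Atlas data source; adding a status or method clause narrows it further.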

JVM memory

Integration steps

Add the dependency

com.netflix.spectator:spectator-ext-jvm:0.40.0

Initialization

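import com.netflix.spectator.api.Registry;
import com.netflix.spectator.jvm.Jmx;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Configuration;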

@Configuration
public class JvmMonitorConfigration {

   @Autowired
   public void setRegistry(Registry registry) {
       Jmx.registerStandardMXBeans(registry);
   }

}

Metrics

jvm.memory.used

Amount of memory currently in use, in bytes. The tags distinguish six series. (One caveat: these tag values contain spaces, which Atlas and Spectator do not accept by default.)

atlas.dstype=rate,id=Code Cache,memtype=NON_HEAP,policy=DefaultPublishingPolicy
atlas.dstype=rate,id=Compressed Class Space,memtype=NON_HEAP,policy=DefaultPublishingPolicy
atlas.dstype=rate,id=Metaspace,memtype=NON_HEAP,policy=DefaultPublishingPolicy
atlas.dstype=rate,id=PS Eden Space,memtype=HEAP,policy=DefaultPublishingPolicy
atlas.dstype=rate,id=PS Old Gen,memtype=HEAP,policy=DefaultPublishingPolicy
atlas.dstype=rate,id=PS Survivor Space,memtype=HEAP,policy=DefaultPublishingPolicy

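As shown above, "jvm.memory.used" can be filtered by id and memtype. memtype is the memory type and takes only two values, HEAP and NON_HEAP.

jvm.memory.committed

Amount of memory currently available to the JVM, including what is already in use, in bytes. It is broken down the same way as above (committed >= used). When committed memory runs short, the JVM requests more from the operating system; if that would exceed max, an OutOfMemoryError occurs.

jvm.memory.max

Maximum amount of memory that can be used, in bytes. It is broken down the same way as above (max >= committed).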
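To see where the id and memtype values and the three sizes come from, the following sketch reads the same information straight from the JDK's standard memory pool MXBeans, without Spectator. The class MemoryPoolDump is a hypothetical name used only for illustration. Pool names depend on the collector in use, so the ids printed on a given JVM may differ from the "PS ..." names listed above, and max is reported as -1 when a pool has no defined limit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Sketch only: prints used/committed/max for every JVM memory pool.
final class MemoryPoolDump {

    // One line per pool: id (pool name), memtype (HEAP or NON_HEAP), then sizes in bytes.
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            lines.add(String.format("id=%s,memtype=%s used=%d committed=%d max=%d",
                    pool.getName(),
                    pool.getType().name(), // name() yields HEAP / NON_HEAP
                    usage.getUsed(),
                    usage.getCommitted(),
                    usage.getMax()));
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```

Running it on the monitored JVM is a quick way to check which id values to expect in the Atlas tags before building dashboards.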

GC

Integration steps

Add the dependency

com.netflix.spectator:spectator-ext-gc:0.40.0

Initialize the GcLogger

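import com.netflix.spectator.api.Registry;
import com.netflix.spectator.api.Spectator;
import com.netflix.spectator.gc.GcLogger;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Configuration;

@Configuration
public class JvmMonitorConfigration {
   // Keep a single instance of the logger
   private GcLogger gc;

   @Autowired
   public void setRegistry(Registry registry) {
       // Attach the injected registry to the global one so the GC metrics reach it
       Spectator.globalRegistry().add(registry);
       gc = new GcLogger();
       gc.start(null); // null: no custom GC event listener
   }

}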

Metrics

jvm.gc.allocationRate

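Rate at which young-generation GC reclaims memory, which approximates how fast the application allocates, in bytes/second. The amount counted for each collection is youngGen.sizeBeforeGC - youngGen.sizeAfterGC.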

jvm.gc.promotionRate

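Rate at which data is promoted from the young generation to the old generation, in bytes/second. The amount counted for each collection is abs(oldGen.sizeAfterGC - oldGen.sizeBeforeGC).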
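The two formulas above can be worked through with a small example. The sketch below is plain arithmetic and not Spectator code: the class GcDeltas, its method names, the pool sizes, and the 60-second reporting interval are all hypothetical values chosen for illustration.

```java
// Sketch only: per-event byte deltas behind allocationRate and promotionRate.
final class GcDeltas {

    // Bytes counted toward jvm.gc.allocationRate for one GC event
    static long allocatedBytes(long youngBefore, long youngAfter) {
        return youngBefore - youngAfter;
    }

    // Bytes counted toward jvm.gc.promotionRate for one GC event
    static long promotedBytes(long oldBefore, long oldAfter) {
        return Math.abs(oldAfter - oldBefore);
    }

    // Average bytes/second when the given bytes are spread over a reporting interval
    static double ratePerSecond(long bytes, long intervalSeconds) {
        return (double) bytes / intervalSeconds;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // Hypothetical young GC: young gen 512 MiB -> 32 MiB, old gen 100 MiB -> 110 MiB
        long allocated = allocatedBytes(512 * mib, 32 * mib);
        long promoted = promotedBytes(100 * mib, 110 * mib);
        System.out.println("allocated bytes: " + allocated);
        System.out.println("promoted bytes:  " + promoted);
        // Assumed 60 s reporting interval with this single GC event in it
        System.out.println("allocation rate: " + ratePerSecond(allocated, 60) + " bytes/s");
        System.out.println("promotion rate:  " + ratePerSecond(promoted, 60) + " bytes/s");
    }
}
```

With several collections in one interval, the per-event deltas are summed before dividing by the interval length.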

jvm.gc.liveDataSize

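Size of the objects still alive in the old generation after a Full GC, in bytes.

jvm.gc.maxDataSize

Maximum size of the old generation, in bytes.

jvm.gc.pause

Pause time of GC events. The unit depends on the statistic: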

statistic=max: seconds
statistic=count: events/second
statistic=totalTime: seconds/second
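To make the three units concrete, the following sketch derives them from a list of pause durations observed in one reporting interval. It is an illustration of the arithmetic only, not how Spectator computes the values internally; the class PauseStats, the pause durations, and the 10-second interval are hypothetical.

```java
// Sketch only: the three jvm.gc.pause statistics for one reporting interval.
final class PauseStats {
    final double maxSeconds;      // statistic=max: longest single pause, in seconds
    final double eventsPerSecond; // statistic=count: GC events per second
    final double pauseFraction;   // statistic=totalTime: seconds paused per second

    PauseStats(double[] pausesSeconds, double stepSeconds) {
        double max = 0.0;
        double total = 0.0;
        for (double pause : pausesSeconds) {
            max = Math.max(max, pause);
            total += pause;
        }
        this.maxSeconds = max;
        this.eventsPerSecond = pausesSeconds.length / stepSeconds;
        this.pauseFraction = total / stepSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical: three pauses within an assumed 10 s interval
        PauseStats s = new PauseStats(new double[] {0.25, 0.5, 0.25}, 10.0);
        System.out.println("max       = " + s.maxSeconds + " s");
        System.out.println("count     = " + s.eventsPerSecond + " events/s");
        System.out.println("totalTime = " + s.pauseFraction + " s/s");
        // totalTime / count gives the average pause length in seconds
        System.out.println("avg pause = " + (s.pauseFraction / s.eventsPerSecond) + " s");
    }
}
```

Read this way, totalTime is the fraction of wall-clock time spent paused, and dividing totalTime by count gives the average pause per event.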

Monitoring reference

[Image: reference monitoring dashboard]

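Problems encountered

Metrics rejected for being too long

Hystrix-related metrics, for example, are all very long. See https://github.com/spring-cloud/spring-cloud-netflix/issues/798; changing the configuration as described in the answer on that issue resolves the problem.

Metric tag values fail validation

For example, tag values in the JVM metrics contain spaces, while the default validation pattern is [\.\-\w]+, which does not allow spaces. Both the client and the server validate. On the client side, the validation regular expression has to be changed through reflection; on the server side, validation can be turned off through configuration.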
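The sketch below shows why the JVM pool names are rejected by the pattern quoted above. It also includes a sanitize helper that replaces disallowed characters with underscores; that helper is a different technique from the one described in this article (which changes the client's pattern through reflection and disables server-side validation), and both it and the class name TagValueCheck are hypothetical.

```java
import java.util.regex.Pattern;

// Sketch only: checks tag values against the default validation pattern.
final class TagValueCheck {

    // The default pattern quoted above: dots, hyphens, and word characters only
    static final Pattern VALID = Pattern.compile("[\\.\\-\\w]+");

    static boolean isValid(String value) {
        return VALID.matcher(value).matches();
    }

    // Hypothetical alternative: rewrite the value instead of relaxing validation
    static String sanitize(String value) {
        return value.replaceAll("[^.\\-\\w]", "_");
    }

    public static void main(String[] args) {
        String[] ids = {"Metaspace", "PS Eden Space", "Compressed Class Space"};
        for (String id : ids) {
            System.out.println(id + " valid=" + isValid(id) + " sanitized=" + sanitize(id));
        }
    }
}
```

If values are rewritten this way, queries and dashboards have to use the rewritten form (for example PS_Eden_Space) rather than the original pool name.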

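Log warning "timerCache is above the warning threshold of 1000 with size XXX"

This warning means that the number of timers created has exceeded the default threshold of 1000. It can be resolved by raising the netflix.metrics.servo.cacheWarningThreshold setting.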
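As a simplified model of how such a cache can grow past its threshold, the sketch below keeps one entry per distinct combination of metric name and tags. It is not the actual ServoMonitorCache implementation; the class TimerCacheSketch and its methods are hypothetical. It illustrates one plausible cause, namely tag values such as raw URIs that differ on every request.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Sketch only: a timer cache with one entry per distinct name + tag combination.
final class TimerCacheSketch {
    private final Map<String, LongAdder> timers = new ConcurrentHashMap<>();
    private final int warningThreshold;

    TimerCacheSketch(int warningThreshold) {
        this.warningThreshold = warningThreshold;
    }

    // Records a duration; returns true when the cache size is above the threshold.
    boolean record(String name, String tags, long nanos) {
        timers.computeIfAbsent(name + "|" + tags, key -> new LongAdder()).add(nanos);
        return timers.size() > warningThreshold;
    }

    int size() {
        return timers.size();
    }

    public static void main(String[] args) {
        TimerCacheSketch cache = new TimerCacheSketch(2); // tiny threshold for the demo
        // Raw URIs create a new timer per id; a templated URI would reuse a single one
        for (int id = 1; id <= 3; id++) {
            boolean warn = cache.record("rest", "uri=orders/" + id, 1_000L);
            System.out.println("size=" + cache.size() + " aboveThreshold=" + warn);
        }
    }
}
```

If the count of distinct tag combinations is expected and bounded, raising the threshold as described above is sufficient; if it keeps growing, the tag values themselves are worth a look.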
