Understanding the Kafka binder initialization flow by fixing a Spring Cloud Stream Kafka metrics bug.

First, here are two passages excerpted from the official documentation:


34. Metrics Emitter
Spring Cloud Stream provides a module called spring-cloud-stream-metrics that can be used to emit any available metric from Spring Boot metrics endpoint to a named channel. This module allow operators to collect metrics from stream applications without relying on polling their endpoints.


The module is activated when you set the destination name for metrics binding, e.g. spring.cloud.stream.bindings.applicationMetrics.destination=<DESTINATION_NAME>. applicationMetrics can be configured in a similar fashion to any other producer binding. The default contentType setting of applicationMetrics is application/json.




37.6 Kafka Metrics
Kafka binder module exposes the following metrics:


spring.cloud.stream.binder.kafka.someGroup.someTopic.lag - this metric indicates how many messages have not been yet consumed from given binder’s topic by given consumer group. 
For example if the value of the metric spring.cloud.stream.binder.kafka.myGroup.myTopic.lag is 1000, then consumer group myGroup has 1000 messages to waiting to be consumed from topic myTopic. 
This metric is particularly useful to provide auto-scaling feedback to PaaS platform of your choice.


The gist is that we can add this dependency:
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-stream-metrics</artifactId>
</dependency>
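Then set spring.cloud.stream.bindings.applicationMetrics.destination=<DESTINATION_NAME> in the configuration file, and we can happily see the Stream application's runtime metrics on /metrics.
Those metrics can also be sent to the specified <DESTINATION_NAME> topic as JSON strings, which makes them easy for other monitoring software to consume and process.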


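The documentation then says that the Kafka binder module exposes a key monitoring metric: spring.cloud.stream.binder.kafka.<group>.<topic>.lag, the consumer's lag. This metric really is important, because it reflects the processing capacity of the stream application.
If the lag keeps growing, consumption is not keeping up with production, which leads to a serious backlog of unprocessed messages, so changes in this metric need to be monitored.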
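
To make the metric concrete, here is a minimal sketch of how such a lag figure can be computed, assuming lag is the end offset minus the committed offset, summed over all partitions of a topic. The class name LagSketch and the plain maps are hypothetical stand-ins for what a real Kafka consumer would report; nothing here talks to a broker.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: plain maps stand in for the offsets a Kafka consumer would report. */
final class LagSketch {

    /**
     * Sums (end offset - committed offset) over all partitions.
     * A partition with no committed offset counts its whole end offset as lag.
     */
    static long totalLag(Map<Integer, Long> endOffsets, Map<Integer, Long> committedOffsets) {
        long lag = 0;
        for (Map.Entry<Integer, Long> end : endOffsets.entrySet()) {
            Long committed = committedOffsets.get(end.getKey());
            lag += (committed != null) ? end.getValue() - committed : end.getValue();
        }
        return lag;
    }

    public static void main(String[] args) {
        Map<Integer, Long> endOffsets = new HashMap<>();
        endOffsets.put(0, 1_000L);
        endOffsets.put(1, 800L);
        endOffsets.put(2, 200L);

        Map<Integer, Long> committed = new HashMap<>();
        committed.put(0, 400L); // partition 0 is 600 messages behind
        committed.put(1, 800L); // partition 1 is fully caught up
        // partition 2 has no committed offset, so all 200 messages count as lag

        System.out.println(totalLag(endOffsets, committed)); // prints 800
    }
}
```

Sampling this number periodically and checking whether it keeps rising is one simple way to notice that consumers are falling behind.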


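So I happily started the application and opened the /metrics endpoint. The lag metric was not there, although the other applicationMetrics all were. Are you kidding me?
Thus began a long troubleshooting session...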

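First I opened the source directory of the spring-cloud-stream-metrics module (luckily it has only a few classes) and found a KafkaBinderMetrics class (the name alone suggests it is related to this problem). Here is the main part of its source: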

@Override
	public Collection<Metric<?>> metrics() {
		List<Metric<?>> metrics = new LinkedList<>();
		for (Map.Entry<String, KafkaMessageChannelBinder.TopicInformation> topicInfo : this.binder.getTopicsInUse()
				.entrySet()) {
			if (!topicInfo.getValue().isConsumerTopic()) {
				continue;
			}

			String topic = topicInfo.getKey();
			String group = topicInfo.getValue().getConsumerGroup();

			try (Consumer<?, ?> metadataConsumer = createConsumerFactory(group).createConsumer()) {
				List<PartitionInfo> partitionInfos = metadataConsumer.partitionsFor(topic);
				List<TopicPartition> topicPartitions = new LinkedList<>();
				for (PartitionInfo partitionInfo : partitionInfos) {
					topicPartitions.add(new TopicPartition(partitionInfo.topic(), partitionInfo.partition()));
				}
				Map<TopicPartition, Long> endOffsets = metadataConsumer.endOffsets(topicPartitions);
				long lag = 0;
				for (Map.Entry<TopicPartition, Long> endOffset : endOffsets.entrySet()) {
					OffsetAndMetadata current = metadataConsumer.committed(endOffset.getKey());
					if (current != null) {
						lag += endOffset.getValue() - current.offset();
					}
					else {
						lag += endOffset.getValue();
					}
				}
				metrics.add(new Metric<>(String.format("%s.%s.%s.lag", METRIC_PREFIX, group, topic), lag));
			}
			catch (Exception e) {
				LOG.debug("Cannot generate metric for topic: " + topic, e);
			}
		}
		return metrics;
	}

Sure enough, this is the class that produces the metric. Using find usage, I found that it is defined as a bean in KafkaBinderConfiguration:

	@Bean
	public PublicMetrics kafkaBinderMetrics(KafkaMessageChannelBinder kafkaMessageChannelBinder) {
		return new KafkaBinderMetrics(kafkaMessageChannelBinder, configurationProperties);
	}

Following the trail, find usage on KafkaBinderConfiguration shows that it is loaded through Spring's binder SPI, configured in META-INF/spring.binders of the spring-cloud-stream-binder-kafka module:

kafka:\
org.springframework.cloud.stream.binder.kafka.config.KafkaBinderConfiguration
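
The entry above reads like a Java properties entry: a binder name as the key, a `:` separator, and a `\` line continuation before the configuration class. A minimal sketch of parsing it, assuming the file really is in standard properties format and that multiple configuration classes would be comma-separated (both are assumptions; the class BinderFileSketch is hypothetical and is not Spring's actual parser):

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper, not Spring's parser: reads spring.binders-style text. */
final class BinderFileSketch {

    /** Maps each binder name to the configuration class names listed for it. */
    static Map<String, String[]> parse(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String[]> result = new LinkedHashMap<>();
        for (String name : props.stringPropertyNames()) {
            // Assumption: several configuration classes would be comma-separated.
            result.put(name, props.getProperty(name).trim().split("\\s*,\\s*"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Same content as the spring.binders entry shown above.
        String content = "kafka:\\\n"
                + "org.springframework.cloud.stream.binder.kafka.config.KafkaBinderConfiguration\n";
        for (Map.Entry<String, String[]> entry : parse(content).entrySet()) {
            System.out.println(entry.getKey() + " -> " + String.join(", ", entry.getValue()));
        }
    }
}
```

The point is only that the binder name (kafka) becomes a key that is later resolved to a configuration class; how the framework actually reads the file is covered next.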

So how is spring.binders loaded? Searching for the keyword spring.binders shows that it is loaded in BinderFactoryConfiguration:

@Bean
	@ConditionalOnMissingBean(BinderTypeRegistry.class)
	public BinderTypeRegistry binderTypeRegistry(ConfigurableApplicationContext configurableApplicationContext) {
		Map<String, BinderType> binderTypes = new HashMap<>();
		ClassLoader classLoader = configurableApplicationContext.getClassLoader();
		if (classLoader == null) {
			classLoader = BinderFactoryConfiguration.class.getClassLoader();
		}
		try {
			Enumeration<URL> resources = classLoader.getResources("META-INF/spring.binders");
			if (!Boolean.valueOf(this.selfContained) && (resources == null || !resources.hasMoreElements())) {
				throw new BeanCreationException("Cannot create binder factory, no `META-INF/spring.binders` " +
						"resources found on the classpath");
			}
			while (resources.hasMoreElements()) {
				URL url = resources.nextElement();
				UrlResource resource = new UrlResource(url);
				for (BinderType binderType : parseBinderConfigurations(classLoader, resource)) {
					binderTypes.put(binderType.getDefaultName(), binderType);
				}
			}
		}
		catch (IOException | ClassNotFoundException e) {
			throw new BeanCreationException("Cannot create binder factory:", e);
		}
		return new DefaultBinderTypeRegistry(binderTypes);
	}

Another find usage, this time on BinderFactoryConfiguration, shows that it is a configuration imported via @Import in the @EnableBinding annotation:

@Configuration
@Import({ BindingServiceConfiguration.class, BindingBeansRegistrar.class, BinderFactoryConfiguration.class,
		SpelExpressionConverterConfiguration.class })
@EnableIntegration
public @interface EnableBinding {

	/**
	 * A list of interfaces having methods annotated with {@link Input} and/or
	 * {@link Output} to indicate binding targets.
	 */
	Class<?>[] value() default {};

}

Ha, @EnableBinding is exactly the annotation we add when using Spring Cloud Stream to inject the binding channels:

@EnableBinding(Processor.class)
public class AProcessor {

    @Value("${spring.cloud.stream.bindings.input.consumer.instanceIndex}")
    private int instanceIndex;
    @Value("${spring.cloud.stream.bindings.input.consumer.instanceCount}")
    private int instanceCount;

    @StreamListener(Processor.INPUT)
//    @SendTo(Processor.OUTPUT)
    public void process(MyUser myUser) {
//        externalService();
//        return myUser.toString();

        System.out.println("instance" + instanceIndex + " received message:" + myUser.toString());
    }

    private void externalService() {
        throw new RuntimeException("An external call failed");
    }

}

So I went through BinderFactoryConfiguration line by line, looking at every bean it declares. Among them is a DefaultBinderFactory bean:

@Bean
	@ConditionalOnMissingBean(BinderFactory.class)
	public DefaultBinderFactory binderFactory(BinderTypeRegistry binderTypeRegistry,
			BindingServiceProperties bindingServiceProperties) {
		DefaultBinderFactory binderFactory = new DefaultBinderFactory(
				getBinderConfigurations(binderTypeRegistry, bindingServiceProperties), binderTypeRegistry);
		binderFactory.setDefaultBinder(bindingServiceProperties.getDefaultBinder());
		binderFactory.setListeners(binderFactoryListeners);
		return binderFactory;
	}

I clicked through to the source of DefaultBinderFactory (core part):

private <T> Binder<T, ?, ?> getBinderInstance(String configurationName) {
		if (!this.binderInstanceCache.containsKey(configurationName)) {
			BinderConfiguration binderConfiguration = this.binderConfigurations.get(configurationName);
			if (binderConfiguration == null) {
				throw new IllegalStateException("Unknown binder configuration: " + configurationName);
			}
			BinderType binderType = this.binderTypeRegistry.get(binderConfiguration.getBinderType());
			Assert.notNull(binderType, "Binder type " + binderConfiguration.getBinderType() + " is not defined");
			Properties binderProperties = binderConfiguration.getProperties();
			// Convert all properties to arguments, so that they receive maximum
			// precedence
			ArrayList<String> args = new ArrayList<>();
			for (Map.Entry<Object, Object> property : binderProperties.entrySet()) {
				args.add(String.format("--%s=%s", property.getKey(), property.getValue()));
			}
			// Initialize the domain with a unique name based on the bootstrapping context
			// setting
			ConfigurableEnvironment environment = this.context != null ? this.context.getEnvironment() : null;
			String defaultDomain = environment != null ? environment.getProperty("spring.jmx.default-domain") : null;
			if (defaultDomain == null) {
				defaultDomain = "";
			}
			else {
				defaultDomain += ".";
			}
			args.add("--spring.jmx.default-domain=" + defaultDomain + "binder." + configurationName);
			args.add("--spring.main.applicationContextClass=" + AnnotationConfigApplicationContext.class.getName());
			List<Class<?>> configurationClasses = new ArrayList<Class<?>>(
					Arrays.asList(binderType.getConfigurationClasses()));
			SpringApplicationBuilder springApplicationBuilder = new SpringApplicationBuilder()
					.sources(configurationClasses.toArray(new Class<?>[] {})).bannerMode(Mode.OFF).web(false);
			// If the environment is not customized and a main context is available, we
			// will set the latter as parent.
			// This ensures that the defaults and user-defined customizations (e.g. custom
			// connection factory beans)
			// are propagated to the binder context. If the environment is customized,
			// then the binder context should
			// not inherit any beans from the parent
			boolean useApplicationContextAsParent = binderProperties.isEmpty() && this.context != null;
			if (useApplicationContextAsParent) {
				springApplicationBuilder.parent(this.context);
			}
			if (useApplicationContextAsParent || (environment != null && binderConfiguration.isInheritEnvironment())) {
				if (environment != null) {
					StandardEnvironment binderEnvironment = new StandardEnvironment();
					binderEnvironment.merge(environment);
					springApplicationBuilder.environment(binderEnvironment);
				}
			}
			ConfigurableApplicationContext binderProducingContext = springApplicationBuilder
					.run(args.toArray(new String[args.size()]));
			@SuppressWarnings("unchecked")
			Binder<T, ?, ?> binder = binderProducingContext.getBean(Binder.class);
			if (this.listeners != null) {
				for (Listener binderFactoryListener : listeners) {
					binderFactoryListener.afterBinderContextInitialized(configurationName, binderProducingContext);
				}
			}
			this.binderInstanceCache.put(configurationName, new BinderInstanceHolder(binder, binderProducingContext));
		}
		return (Binder<T, ?, ?>) this.binderInstanceCache.get(configurationName).getBinderInstance();
	}

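In this getBinderInstance method, I was surprised to find that it starts another ApplicationContext as a child context of the current root context (AnnotationConfigEmbeddedWebApplicationContext). The kafkaBinderMetrics bean defined in KafkaBinderConfiguration therefore lives in this child context, while the MetricsEndpoint bean behind the /metrics endpoint is defined in the root context by the EndpointAutoConfiguration auto-configuration class. Beans in a child context are not visible from its parent, so consider what /metrics does (the invoke method of MetricsEndpoint):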

@Override
public Map<String, Object> invoke() {
   Map<String, Object> result = new LinkedHashMap<String, Object>();
   List<PublicMetrics> metrics = new ArrayList<PublicMetrics>(this.publicMetrics);
   for (PublicMetrics publicMetric : metrics) {
      try {
         for (Metric<?> metric : publicMetric.metrics()) {
            result.put(metric.getName(), metric.getValue());
         }
      }
      catch (Exception ex) {
         // Could not evaluate metrics
      }
   }
   return result;
}

It only iterates over the PublicMetrics collected in the root context, so it cannot pick up the lag metric exposed by kafkaBinderMetrics.
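
The visibility rule behind this can be shown with a toy model. The sketch below is not the Spring API: ToyContext is a hypothetical class that only imitates the one relevant behavior, namely that a lookup searches the current context and then its parent, never its children.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a hierarchical application context; NOT the Spring API. */
final class ToyContext {
    private final ToyContext parent;
    private final Map<String, Object> beans = new HashMap<>();

    ToyContext(ToyContext parent) {
        this.parent = parent;
    }

    void register(String name, Object bean) {
        beans.put(name, bean);
    }

    /** Looks in this context first, then walks up to the parent. It never looks down. */
    Object find(String name) {
        Object bean = beans.get(name);
        if (bean == null && parent != null) {
            return parent.find(name);
        }
        return bean;
    }
}

final class ContextVisibilityDemo {
    public static void main(String[] args) {
        ToyContext root = new ToyContext(null);
        ToyContext binderChild = new ToyContext(root);

        root.register("metricsEndpoint", new Object());
        binderChild.register("kafkaBinderMetrics", new Object());

        // The child can see beans of its parent...
        System.out.println(binderChild.find("metricsEndpoint") != null); // prints true
        // ...but the parent cannot see beans of its child.
        System.out.println(root.find("kafkaBinderMetrics") != null);     // prints false
    }
}
```

This is why anything collected in the root context at startup misses a bean that exists only in the binder's context.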

At this point we have found the cause of the problem. So how do we fix it?

Going back to org.springframework.cloud.stream.binder.DefaultBinderFactory#getBinderInstance, we can see that after the binder's child context has started, it calls afterBinderContextInitialized on org.springframework.cloud.stream.binder.DefaultBinderFactory.Listener as a post-processing hook, which lets us plug in our own logic:

if (this.listeners != null) {
				for (Listener binderFactoryListener : listeners) {
					binderFactoryListener.afterBinderContextInitialized(configurationName, binderProducingContext);
				}
			}

That is not even the key part. The key part is the second argument of afterBinderContextInitialized: it is a reference to the binder's context!

Through this binderProducingContext, we can get hold of the kafkaBinderMetrics bean defined there!
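
The idea can be modeled in plain Java before touching the real classes. This is a toy sketch under stated assumptions: Endpoint, BinderListener, and createBinder are hypothetical names that only imitate the roles of MetricsEndpoint, DefaultBinderFactory.Listener, and getBinderInstance, and the metric name and value are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Toy model of the callback idea; every type here is hypothetical, not a Spring class. */
final class ListenerSketch {

    /** Stand-in for the metrics endpoint living in the root context. */
    static final class Endpoint {
        private final List<Supplier<Map<String, Number>>> sources = new ArrayList<>();

        void register(Supplier<Map<String, Number>> source) {
            sources.add(source);
        }

        Map<String, Number> invoke() {
            Map<String, Number> result = new LinkedHashMap<>();
            for (Supplier<Map<String, Number>> source : sources) {
                result.putAll(source.get());
            }
            return result;
        }
    }

    /** Stand-in for the callback fired once the binder's own context is ready. */
    interface BinderListener {
        void afterBinderContextInitialized(String name,
                Map<String, Supplier<Map<String, Number>>> binderBeans);
    }

    /** Stand-in for the factory: builds the binder's beans, then notifies listeners. */
    static void createBinder(String name, List<BinderListener> listeners) {
        Map<String, Supplier<Map<String, Number>>> binderBeans = new LinkedHashMap<>();
        binderBeans.put("kafkaBinderMetrics", () -> {
            Map<String, Number> metrics = new LinkedHashMap<>();
            metrics.put("binder." + name + ".myGroup.myTopic.lag", 42L); // made-up value
            return metrics;
        });
        for (BinderListener listener : listeners) {
            listener.afterBinderContextInitialized(name, binderBeans);
        }
    }

    public static void main(String[] args) {
        Endpoint endpoint = new Endpoint();
        List<BinderListener> listeners = new ArrayList<>();
        // The listener copies the binder-side metrics source into the root-side endpoint.
        listeners.add((name, beans) -> endpoint.register(beans.get("kafkaBinderMetrics")));

        System.out.println(endpoint.invoke()); // prints {}
        createBinder("kafka", listeners);
        System.out.println(endpoint.invoke()); // prints {binder.kafka.myGroup.myTopic.lag=42}
    }
}
```

The callback is the only place where code owned by the root side is handed a reference to the binder side, which is what makes the bridge possible.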

The main question now is how this.listeners gets passed in...

Running find usage on org.springframework.cloud.stream.binder.DefaultBinderFactory#setListeners:

	public void setListeners(Collection<Listener> listeners) {
		this.listeners = listeners;
	}

This method turns out to be called in BinderFactoryConfiguration, which instantiates DefaultBinderFactory and calls setListeners:

@Bean
	@ConditionalOnMissingBean(BinderFactory.class)
	public DefaultBinderFactory binderFactory(BinderTypeRegistry binderTypeRegistry,
			BindingServiceProperties bindingServiceProperties) {
		DefaultBinderFactory binderFactory = new DefaultBinderFactory(
				getBinderConfigurations(binderTypeRegistry, bindingServiceProperties), binderTypeRegistry);
		binderFactory.setDefaultBinder(bindingServiceProperties.getDefaultBinder());
		binderFactory.setListeners(binderFactoryListeners);
		return binderFactory;
	}
	@Autowired(required = false)
	private Collection<DefaultBinderFactory.Listener> binderFactoryListeners;

At this point the solution is obvious: just define a DefaultBinderFactory.Listener bean:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.actuate.endpoint.MetricsEndpoint;
import org.springframework.cloud.stream.binder.DefaultBinderFactory;
import org.springframework.cloud.stream.binder.kafka.KafkaBinderMetrics;
import org.springframework.context.ConfigurableApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class KafkaBinderMetricsConfig {

    @Autowired
    private MetricsEndpoint metricsEndpoint;


    /**
     * Once the binder application context is ready, this Listener's afterBinderContextInitialized is called.
     *
     * @return
     */
    @Bean
    public DefaultBinderFactory.Listener listenerForMeterBinders() {

        return new DefaultBinderFactory.Listener() {
            @Override
            public void afterBinderContextInitialized(String configurationName, ConfigurableApplicationContext binderContext) {
                // This binderContext is a child context of the current context (AnnotationConfigEmbeddedWebApplicationContext).
                // metricsEndpoint is defined in AnnotationConfigEmbeddedWebApplicationContext,
                // while kafkaBinderMetrics is defined in the child context, so the /metrics endpoint does not expose the binder's metrics;
                // that is, the key metric spring.cloud.stream.binder.kafka.<group>.<destination>.lag cannot be seen.
                // This is a bug. Fortunately a callback Listener gives us a chance to get hold of this binderContext,
                // pull the KafkaBinderMetrics bean out of it, and register it with metricsEndpoint by hand.
                metricsEndpoint.registerPublicMetrics(binderContext.getBean(KafkaBinderMetrics.class));
            }
        };

    }


}

Restart the application and visit the /metrics endpoint again.

Surprise, surprise: the lag metric now shows up.

Note: the Spring Cloud version used above is Edgware.SR3 and the Spring Boot version is 1.5.12.RELEASE.

Version 2.1 seems to have fixed this problem; the Kafka binder configuration is no longer loaded in a separate context.

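To sum up: first, you need a clear grasp of how Spring Boot works; second, you need to be fluent with IDEA's features (this article relies heavily on find usage and the Ctrl+Shift+F global search to trace the call chain).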











