Spring container fails to start, but RocketMQ keeps consuming messages

Problem description

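A production incident, the first in a long while: a large number of couriers in the field reported that dish images were no longer showing in the app.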

Locating the problem

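Note: the right way to start is to check the release platform for recently deployed projects and narrow the scope from there -_-!!!
First, identify the business line from the order, then look at the business flow (simplified here; the real call chain is long, messy, and ripe for optimization):
[Figure: simplified business flow]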

Problem with the data pushed by the business team

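First I checked the data pushed by our dear business partner and found that most image URLs had no protocol prefix (http or https); only some did. I meekly asked them why the dish image data they pushed had no protocol prefix.
Business partner: a missing protocol prefix is normal.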

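Further digging showed that historical data from long ago also lacked the protocol prefix, so this hypothesis was wrong.
This kind of ambiguous convention easily leads to unnecessary back-and-forth during troubleshooting. Either every URL carries a protocol prefix or none does, not some with and some without. Future integrations should avoid this kind of ambiguity.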
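One way to make such a contract explicit at the integration boundary is a strict check that flags any image URL without an http or https prefix, instead of quietly accepting a mix. The sketch below is only an illustration: the class `ImageUrlCheck`, the method `hasHttpScheme`, and the example.com URLs are hypothetical and do not come from the original system.

```java
import java.net.URI;

class ImageUrlCheck {
    /** Returns true only when the URL parses and its scheme is http or https. */
    static boolean hasHttpScheme(String url) {
        String scheme;
        try {
            scheme = URI.create(url).getScheme();
        } catch (IllegalArgumentException ex) {
            return false; // not parseable as a URI at all
        }
        return "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
    }

    public static void main(String[] args) {
        // Hypothetical sample values.
        System.out.println(hasHttpScheme("https://img.example.com/dish.jpg")); // true
        System.out.println(hasHttpScheme("//img.example.com/dish.jpg"));       // false
        System.out.println(hasHttpScheme("img.example.com/dish.jpg"));         // false
    }
}
```

A check like this would surface the inconsistency on the first push rather than during an incident.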

Problem replacing images with internal images

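Next I looked at how the app's dish-detail API obtains image links. It queries only internal image links. If replacing an image link fails, the database holds only the external image URL, so the API returns no image. The logs confirmed that the returned image links were indeed empty.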

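I passed this on to the integration platform team to check whether message consumption was failing. It turned out that a recent project merge was in gray release, and the Spring context of the gray-release service had failed to start. The service process was still alive and still consuming messages, but because the Spring application context had failed to start, none of the injected beans were usable, so every message failed to be consumed until it had been retried 16 times and was finally given up on.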
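The failure pattern described above can be modelled in plain Java. This is a deliberately simplified sketch, not RocketMQ's actual redelivery logic (there is no backoff and no broker): `ImageReplacer`, `ImageListener`, and `deliver` are hypothetical names, and the only figure taken from the text is the limit of 16 retries.

```java
class RetryExhaustionDemo {
    /** Retry limit quoted in the text above. */
    static final int MAX_RETRIES = 16;

    /** Hypothetical collaborator that the container would normally inject. */
    static class ImageReplacer {
        String toInternal(String externalUrl) {
            return "internal:" + externalUrl;
        }
    }

    /** Hypothetical listener; replacer stays null when wiring never completed. */
    static class ImageListener {
        ImageReplacer replacer;

        void onMessage(String externalUrl) {
            replacer.toInternal(externalUrl); // NullPointerException if replacer is null
        }
    }

    /** Delivers one message; returns the attempts used, or -1 if every attempt failed. */
    static int deliver(ImageListener listener, String body) {
        for (int attempt = 1; attempt <= 1 + MAX_RETRIES; attempt++) {
            try {
                listener.onMessage(body);
                return attempt;
            } catch (RuntimeException ex) {
                // Treated as a failed consumption; the message is redelivered.
            }
        }
        return -1; // retries exhausted, the message is given up on
    }

    public static void main(String[] args) {
        ImageListener broken = new ImageListener();       // replacer never injected
        System.out.println(deliver(broken, "dish.jpg"));  // -1

        ImageListener healthy = new ImageListener();
        healthy.replacer = new ImageReplacer();
        System.out.println(deliver(healthy, "dish.jpg")); // 1
    }
}
```

Because the missing dependency never recovers, retrying cannot help: every attempt fails the same way and the message is eventually dropped.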

Incident summary

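The context failed to start, but the RocketMQ consumer bean was not destroyed and kept consuming messages. Those messages kept failing, so some images were never replaced with internal ones, which caused the production incident. Recovery is simple: turn off gray-release traffic, or just bluntly kill the JVM process.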

Why does RocketMQ keep consuming messages after the Spring container fails to start?

First, look at the definition of the bean that consumes MQ messages

// The container calls start() once the bean is initialized and shutdown() when the bean is destroyed.
// orderServiceMessageListener is presumably a field of the enclosing configuration class (not shown in the original).
@Bean(initMethod = "start", destroyMethod = "shutdown")
@ConditionalOnExpression("'${spring.profiles.active}'!='pre'")
public DefaultMQPushConsumer orderServiceMessageConsumer(@Value("${rocketmq.ns.unit}") String nameserver) throws MQClientException {
    DefaultMQPushConsumer defaultMQPushConsumer = new DefaultMQPushConsumer();
    defaultMQPushConsumer.setNamesrvAddr(nameserver);
    defaultMQPushConsumer.setConsumerGroup("consume_hema_unit_orderService_group");
    // Subscribe to topic "myorder", filtered by tag "order-place".
    defaultMQPushConsumer.subscribe("myorder", "order-place");
    defaultMQPushConsumer.setMessageListener(orderServiceMessageListener);
    return defaultMQPushConsumer;
}

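If the Spring application context fails to start, it is supposed to call back and destroy all singleton beans. So why was the consumer not shut down? Here is the startup failure log of the Spring application context: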

content:  [] [main] org.springframework.boot.SpringApplication:reportFailure:771 Application startup failed
java.lang.NullPointerException: null
	at com.....hema.gw.tmc.TmcMessagePuller.onApplicationEvent(TmcMessagePuller.java:53) ~[docking-hema-gw-1.0.1-SNAPSHOT.jar!/:?]
	at org.springframework.context.event.SimpleApplicationEventMulticaster.doInvokeListener(SimpleApplicationEventMulticaster.java:172) ~[spring-context-4.3.19.RELEASE.jar!/:4.3.19.RELEASE]
	at org.springframework.context.event.SimpleApplicationEventMulticaster.invokeListener(SimpleApplicationEventMulticaster.java:165) ~[spring-context-4.3.19.RELEASE.jar!/:4.3.19.RELEASE]
	at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:139) ~[spring-context-4.3.19.RELEASE.jar!/:4.3.19.RELEASE]
	at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:393) ~[spring-context-4.3.19.RELEASE.jar!/:4.3.19.RELEASE]
	at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:399) ~[spring-context-4.3.19.RELEASE.jar!/:4.3.19.RELEASE]
	at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:347) ~[spring-context-4.3.19.RELEASE.jar!/:4.3.19.RELEASE]
	at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:883) ~[spring-context-4.3.19.RELEASE.jar!/:4.3.19.RELEASE]
	at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:546) ~[spring-context-4.3.19.RELEASE.jar!/:4.3.19.RELEASE]
	at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:693) ~[spring-boot-1.5.16.RELEASE.jar!/:1.5.16.RELEASE]
	at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:360) ~[spring-boot-1.5.16.RELEASE.jar!/:1.5.16.RELEASE]
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:303) ~[spring-boot-1.5.16.RELEASE.jar!/:1.5.16.RELEASE]
	at org.springframework.boot.builder.SpringApplicationBuilder.run(SpringApplicationBuilder.java:134) ~[spring-boot-1.5.16.RELEASE.jar!/:1.5.16.RELEASE]
	at org.springframework.cloud.stream.binder.DefaultBinderFactory.getBinderInstance(DefaultBinderFactory.java:214) ~[spring-cloud-stream-1.3.4.RELEASE.jar!/:1.3.4.RELEASE]
	at org.springframework.cloud.stream.binder.DefaultBinderFactory.getBinder(DefaultBinderFactory.java:155) ~[spring-cloud-stream-1.3.4.RELEASE.jar!/:1.3.4.RELEASE]
	at org.springframework.cloud.stream.binding.BindingService.getBinder(BindingService.java:155) ~[spring-cloud-stream-1.3.4.RELEASE.jar!/:1.3.4.RELEASE]
	at org.springframework.cloud.stream.binding.BindingService.bindProducer(BindingService.java:111) ~[spring-cloud-stream-1.3.4.RELEASE.jar!/:1.3.4.RELEASE]
	at org.springframework.cloud.stream.binding.BindableProxyFactory.bindOutputs(BindableProxyFactory.java:238) ~[spring-cloud-stream-1.3.4.RELEASE.jar!/:1.3.4.RELEASE]
	at org.springframework.cloud.stream.binding.OutputBindingLifecycle.start(OutputBindingLifecycle.java:57) ~[spring-cloud-stream-1.3.4.RELEASE.jar!/:1.3.4.RELEASE]
	at org.springframework.context.support.DefaultLifecycleProcessor.doStart(DefaultLifecycleProcessor.java:173) ~[spring-context-4.3.19.RELEASE.jar!/:4.3.19.RELEASE]
	at org.springframework.context.support.DefaultLifecycleProcessor.access$200(DefaultLifecycleProcessor.java:50) ~[spring-context-4.3.19.RELEASE.jar!/:4.3.19.RELEASE]
	at org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.start(DefaultLifecycleProcessor.java:350) ~[spring-context-4.3.19.RELEASE.jar!/:4.3.19.RELEASE]
	at org.springframework.context.support.DefaultLifecycleProcessor.startBeans(DefaultLifecycleProcessor.java:149) ~[spring-context-4.3.19.RELEASE.jar!/:4.3.19.RELEASE]
	at org.springframework.context.support.DefaultLifecycleProcessor.onRefresh(DefaultLifecycleProcessor.java:112) ~[spring-context-4.3.19.RELEASE.jar!/:4.3.19.RELEASE]
	at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:880) ~[spring-context-4.3.19.RELEASE.jar!/:4.3.19.RELEASE]
	at org.springframework.boot.context.embedded.EmbeddedWebApplicationContext.finishRefresh(EmbeddedWebApplicationContext.java:144) ~[spring-boot-1.5.16.RELEASE.jar!/:1.5.16.RELEASE]
	at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:546) ~[spring-context-4.3.19.RELEASE.jar!/:4.3.19.RELEASE]
	at org.springframework.boot.context.embedded.EmbeddedWebApplicationContext.refresh(EmbeddedWebApplicationContext.java:122) ~[spring-boot-1.5.16.RELEASE.jar!/:1.5.16.RELEASE]
	at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:693) ~[spring-boot-1.5.16.RELEASE.jar!/:1.5.16.RELEASE]
	at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:360) ~[spring-boot-1.5.16.RELEASE.jar!/:1.5.16.RELEASE]
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:303) ~[spring-boot-1.5.16.RELEASE.jar!/:1.5.16.RELEASE]
	at org.springframework.boot.builder.SpringApplicationBuilder.run(SpringApplicationBuilder.java:134) ~[spring-boot-1.5.16.RELEASE.jar!/:1.5.16.RELEASE]
	at com.....hema.Application.main(Application.java:28) ~[classes!/:?]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_201]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_201]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_201]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_201]
	at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48) ~[docking-mix-hema.jar:?]
	at org.springframework.boot.loader.Launcher.launch(Launcher.java:87) ~[docking-mix-hema.jar:?]
	at org.springframework.boot.loader.Launcher.launch(Launcher.java:50) ~[docking-mix-hema.jar:?]
	at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:51) ~[docking-mix-hema.jar:?]

The TmcMessagePuller bean threw an NPE while handling a Spring event, which caused the context to fail. So why did the failed context not destroy its beans?

When does Spring destroy beans?

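After the context finishes refreshing, a shutdown hook is registered. When a shutdown signal arrives, the hook performs the actions needed to close the Spring application context.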

// org.springframework.boot.SpringApplication#refreshContext
private void refreshContext(ConfigurableApplicationContext context) {
	refresh(context);
	if (this.registerShutdownHook) {
		try {
			context.registerShutdownHook();
		}
		catch (AccessControlException ex) {
			// Not allowed in some environments.
		}
	}
}
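The ordering in refreshContext matters: the hook is registered only after refresh returns normally. The plain-Java sketch below mimics that ordering with stand-ins. `registeredHooks` is a hypothetical list playing the role of the JVM's shutdown hooks, and the `fail` flag simulates a listener throwing during refresh; none of this is Spring code.

```java
import java.util.ArrayList;
import java.util.List;

class ShutdownHookOrderDemo {
    /** Hypothetical stand-in for the shutdown hooks registered with the JVM. */
    static final List<String> registeredHooks = new ArrayList<>();

    /** Stand-in for refresh(context); fails on demand like the NPE in the log. */
    static void refresh(boolean fail) {
        if (fail) {
            throw new NullPointerException("listener failed during refresh");
        }
    }

    /** Same shape as refreshContext: refresh first, register the hook afterwards. */
    static void refreshContext(boolean fail) {
        refresh(fail);
        registeredHooks.add("close-context"); // never reached if refresh throws
    }

    public static void main(String[] args) {
        try {
            refreshContext(true);
        } catch (RuntimeException ex) {
            System.out.println("startup failed: " + ex.getMessage());
        }
        System.out.println("hooks after failed refresh: " + registeredHooks.size());     // 0

        refreshContext(false);
        System.out.println("hooks after successful refresh: " + registeredHooks.size()); // 1
    }
}
```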

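When initialization of the Spring application context fails, the beans already created in that context are destroyed, but only if the exception is a BeansException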

// org.springframework.context.support.AbstractApplicationContext#refresh
public void refresh() throws BeansException, IllegalStateException {
	synchronized (this.startupShutdownMonitor) {
		...
		try {
			...
			// Last step: publish corresponding event.
			finishRefresh();
		}
		catch (BeansException ex) {
			if (logger.isWarnEnabled()) {
				logger.warn("Exception encountered during context initialization - " +
						"cancelling refresh attempt: " + ex);
			}
			// Destroy already created singletons to avoid dangling resources.
			destroyBeans();
			// Reset 'active' flag.
			cancelRefresh(ex);
			// Propagate exception to caller.
			throw ex;
		}
		...
	}
}
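The effect of catching only one exception type can be reproduced without Spring. In the sketch below, `FakeBeansException` is a hypothetical stand-in for BeansException, and the `beansDestroyed` flag stands in for the call to destroyBeans(); it shows that a NullPointerException bypasses the cleanup branch entirely.

```java
class CatchMismatchDemo {
    /** Hypothetical stand-in for Spring's BeansException. */
    static class FakeBeansException extends RuntimeException {
        FakeBeansException(String message) {
            super(message);
        }
    }

    /** Records whether the cleanup branch ran during the last refresh. */
    static boolean beansDestroyed;

    /** Same shape as refresh(): cleanup happens only for the caught type. */
    static void refresh(RuntimeException failure) {
        beansDestroyed = false;
        try {
            throw failure; // stands in for finishRefresh() failing
        } catch (FakeBeansException ex) {
            beansDestroyed = true; // stands in for destroyBeans()
            throw ex;
        }
    }

    /** Runs refresh with the given failure and reports whether cleanup ran. */
    static boolean cleanedUpAfter(RuntimeException failure) {
        try {
            refresh(failure);
        } catch (RuntimeException expected) {
            // Startup failed either way; only the cleanup differs.
        }
        return beansDestroyed;
    }

    public static void main(String[] args) {
        System.out.println(cleanedUpAfter(new FakeBeansException("bad bean"))); // true
        System.out.println(cleanedUpAfter(new NullPointerException()));         // false
    }
}
```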

Interim summary

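  1. Our context threw an uncaught exception before the refresh completed, so the shutdown hook was never registered. The JVM also received no shutdown signal, because nobody ran kill. So if the RocketMQ bean had already been initialized before the exception was thrown, it keeps consuming messages and is never shut down or destroyed.
  2. The exception thrown during the refresh was an NPE, whereas the Spring framework catches BeansException. The NPE is not caught, so the beans in the current context are not destroyed when the refresh fails.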

Summary

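  1. A blind spot in service monitoring and alerting. Had the consumption failures been detected and alerted on promptly, this incident would not have happened.
  2. When integrating with a business partner, avoid ambiguous definitions wherever possible, and do not paper over abnormal cases with fallbacks. An abnormal case should stay abnormal; otherwise troubleshooting becomes needlessly harder.
  3. Driving RocketMQ's start and shutdown methods through Spring bean init and destroy callbacks is very risky. I am sure everyone has their own ideas about the right way to do this, so leave a comment and let's discuss.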
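As one possible starting point for that discussion, here is a plain-Java sketch of a safer ordering: run the rest of startup first, start the consumer last, and shut it down on any failure rather than on one exception type. `FakeConsumer` and `startApplication` are hypothetical names. In a Spring Boot application the equivalent might be starting the consumer from a listener for a late startup event such as ApplicationReadyEvent instead of from initMethod; that mapping is an assumption, not something the original prescribes.

```java
class GuardedStartupDemo {
    /** Hypothetical stand-in for a consumer with start() and shutdown(). */
    static class FakeConsumer {
        boolean running;

        void start() {
            running = true;
        }

        void shutdown() {
            running = false;
        }
    }

    /**
     * Runs the rest of startup first and starts the consumer last.
     * Any failure, not just one exception type, shuts the consumer down.
     */
    static void startApplication(FakeConsumer consumer, Runnable restOfStartup) {
        try {
            restOfStartup.run();
            consumer.start();
        } catch (Throwable t) {
            consumer.shutdown(); // harmless if the consumer never started
            throw t;
        }
    }

    public static void main(String[] args) {
        FakeConsumer consumer = new FakeConsumer();
        try {
            startApplication(consumer, () -> {
                throw new NullPointerException("listener failed");
            });
        } catch (NullPointerException ex) {
            System.out.println("startup failed, consumer running: " + consumer.running); // false
        }

        startApplication(consumer, () -> { });
        System.out.println("startup ok, consumer running: " + consumer.running); // true
    }
}
```

With this ordering a failed startup can never leave a live consumer behind, whichever exception caused the failure.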