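Environment
Server: Tencent Cloud RocketMQ
Client: Spring Cloud Alibaba 2021.1
The error
Someone reported errors in the production system. Checking the server turned up a large number of error log entries like this: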
The producer group[] has been created before, specify another name please
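Investigation
Judging by the error message, the producer group was created twice. The fixes I found online almost all come down to the same thing. When calling new DefaultMQProducer, supply an instance name and make it unique, so that every instance is created with a different name: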
producer.setInstanceName(RunTimeUtil.getRocketMqUniqeInstanceName());
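However, this only treats the symptom, not the root cause. It also does not fit our setup, because we send messages through the framework and the producer object is created automatically rather than by our own code.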
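The stop-gap works because the RocketMQ client keys its client instances by a client ID that includes the instance name. A unique name therefore gives every producer its own client instance, and the group is never registered twice on the same one. Below is a minimal sketch of such a name generator. It assumes Java 9+ (for `ProcessHandle`), and `UniqueInstanceName` is a hypothetical stand-in for the `RunTimeUtil.getRocketMqUniqeInstanceName()` helper quoted above. It is not a library API.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for RunTimeUtil.getRocketMqUniqeInstanceName(); not a RocketMQ API.
public class UniqueInstanceName {
    private static final AtomicLong COUNTER = new AtomicLong();

    // PID separates processes on one host; the counter separates producers within one JVM.
    public static String next() {
        long pid = ProcessHandle.current().pid();
        return pid + "#" + COUNTER.incrementAndGet();
    }

    public static void main(String[] args) {
        String first = next();
        String second = next();
        System.out.println(first + " / " + second);
    }
}
```

This also shows why it is only a workaround. In the scenario analysed later in this post, each re-bind would quietly create one more client instance instead of failing. That hides the underlying problem rather than removing it.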
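My approach was to apply a temporary fix first and then hunt for the root cause. So I started by looking at where DefaultMQProducer is created: RocketMQComponent4BinderAutoConfiguration and RocketMQAutoConfiguration.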
// Bean definition for DefaultMQProducer
@Bean
@ConditionalOnMissingBean(DefaultMQProducer.class)
public DefaultMQProducer defaultMQProducer() {
    DefaultMQProducer producer;
    String configNameServer = environment.resolveRequiredPlaceholders(
            "${spring.cloud.stream.rocketmq.binder.name-server:${rocketmq.producer.name-server:}}");
    String ak = environment.resolveRequiredPlaceholders(
            "${spring.cloud.stream.rocketmq.binder.access-key:${rocketmq.producer.access-key:}}");
    String sk = environment.resolveRequiredPlaceholders(
            "${spring.cloud.stream.rocketmq.binder.secret-key:${rocketmq.producer.secret-key:}}");
    if (!StringUtils.isEmpty(ak) && !StringUtils.isEmpty(sk)) {
        producer = new DefaultMQProducer(RocketMQBinderConstants.DEFAULT_GROUP,
                new AclClientRPCHook(new SessionCredentials(ak, sk)));
        producer.setVipChannelEnabled(false);
    }
    else {
        producer = new DefaultMQProducer(RocketMQBinderConstants.DEFAULT_GROUP);
    }
    if (StringUtils.isEmpty(configNameServer)) {
        configNameServer = RocketMQBinderConstants.DEFAULT_NAME_SERVER;
    }
    producer.setNamesrvAddr(configNameServer);
    return producer;
}
// Bean definition for RocketMQTemplate
@Bean(destroyMethod = "destroy")
@ConditionalOnMissingBean
public RocketMQTemplate rocketMQTemplate(DefaultMQProducer mqProducer,
        ObjectMapper objectMapper) {
    RocketMQTemplate rocketMQTemplate = new RocketMQTemplate();
    rocketMQTemplate.setProducer(mqProducer);
    rocketMQTemplate.setObjectMapper(objectMapper);
    return rocketMQTemplate;
}
RocketMQTemplate implements the InitializingBean interface, and its afterPropertiesSet() method calls producer.start(). So next I looked at DefaultMQProducer.start():
/**
 * Start this producer instance.
 */
@Override
public void start() throws MQClientException {
    this.setProducerGroup(withNamespace(this.producerGroup));
    this.defaultMQProducerImpl.start();
    if (null != traceDispatcher) {
        try {
            traceDispatcher.start(this.getNamesrvAddr(), this.getAccessChannel());
        } catch (MQClientException e) {
            log.warn("trace dispatcher start failed ", e);
        }
    }
}
The actual startup logic lives in defaultMQProducerImpl.start(), so I kept digging:
// DefaultMQProducerImpl.start()
public void start() throws MQClientException {
    this.start(true);
}

public void start(final boolean startFactory) throws MQClientException {
    switch (this.serviceState) {
        case CREATE_JUST:
            this.serviceState = ServiceState.START_FAILED;
            // Validate the configuration
            this.checkConfig();
            // Set the instance name (a default instance name is replaced with one based on the PID)
            if (!this.defaultMQProducer.getProducerGroup().equals(MixAll.CLIENT_INNER_PRODUCER_GROUP)) {
                this.defaultMQProducer.changeInstanceNameToPID();
            }
            // Get (or create) the client factory for this client ID
            this.mQClientFactory = MQClientManager.getInstance().getOrCreateMQClientInstance(this.defaultMQProducer, rpcHook);
            // Register this producer under its group; fails if the group is already registered on this factory
            boolean registerOK = mQClientFactory.registerProducer(this.defaultMQProducer.getProducerGroup(), this);
            if (!registerOK) {
                this.serviceState = ServiceState.CREATE_JUST;
                throw new MQClientException("The producer group[" + this.defaultMQProducer.getProducerGroup()
                        + "] has been created before, specify another name please." + FAQUrl.suggestTodo(FAQUrl.GROUP_NAME_DUPLICATE_URL),
                        null);
            }
            this.topicPublishInfoTable.put(this.defaultMQProducer.getCreateTopicKey(), new TopicPublishInfo());
            if (startFactory) {
                mQClientFactory.start();
            }
            log.info("the producer [{}] start OK. sendMessageWithVIPChannel={}", this.defaultMQProducer.getProducerGroup(),
                    this.defaultMQProducer.isSendMessageWithVIPChannel());
            this.serviceState = ServiceState.RUNNING;
            break;
        case RUNNING:
        case START_FAILED:
        case SHUTDOWN_ALREADY:
            throw new MQClientException("The producer service state not OK, maybe started once, "
                    + this.serviceState
                    + FAQUrl.suggestTodo(FAQUrl.CLIENT_SERVICE_NOT_OK),
                    null);
        default:
            break;
    }
}
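So the instance name is effectively fixed at this point. Unless it was set explicitly, the default instance name is replaced with one derived from the process PID. By default, then, all producers in one application share the same PID-based instance name.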
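The next sketch makes the failure condition concrete with a minimal model of the bookkeeping that start() relies on. It assumes, as in the RocketMQ 4.x client, that the client ID is built as `ip@instanceName`, that one client factory exists per client ID, and that registerProducer() does a putIfAbsent on the group name. The class and method names are hypothetical simplifications and are not the real MQClientManager/MQClientInstance API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the roles of MQClientManager and MQClientInstance; not RocketMQ code.
public class ProducerRegistryModel {

    // One "client instance" per client ID, holding the producer groups registered on it.
    static final class ClientInstance {
        private final Map<String, Object> producerTable = new ConcurrentHashMap<>();

        // Returns false if the group is already registered on this instance.
        boolean registerProducer(String group, Object producer) {
            return producerTable.putIfAbsent(group, producer) == null;
        }
    }

    // clientId -> client instance, like the factory table kept by MQClientManager.
    private static final Map<String, ClientInstance> FACTORY_TABLE = new ConcurrentHashMap<>();

    static ClientInstance getOrCreate(String ip, String instanceName) {
        String clientId = ip + "@" + instanceName;
        return FACTORY_TABLE.computeIfAbsent(clientId, id -> new ClientInstance());
    }

    // Models DefaultMQProducerImpl.start(): resolve the factory, then register the group.
    public static void start(String ip, String instanceName, String group) {
        ClientInstance factory = getOrCreate(ip, instanceName);
        if (!factory.registerProducer(group, new Object())) {
            throw new IllegalStateException("The producer group[" + group
                    + "] has been created before, specify another name please.");
        }
    }

    public static void main(String[] args) {
        start("10.0.0.1", "orders|1234", "demo-group");
        try {
            start("10.0.0.1", "orders|1234", "demo-group"); // same client ID and group: rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        start("10.0.0.1", "orders|5678", "demo-group");     // different instance name: new factory, OK
    }
}
```

A second producer with the same group fails only when it resolves to the same client ID. That is why a fixed instance name combined with repeated producer creation is the combination to look for.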
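Finding the cause
Given the flow above, this error should not occur during normal operation. The failure happened while sending messages, so I traced it back from where messages are sent. We send with StreamBridge.send(), which has several overloads. The one they all end up calling looks like this: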
public boolean send(String bindingName, @Nullable String binderName, Object data, MimeType outputContentType) {
    if (!(data instanceof Message)) {
        data = MessageBuilder.withPayload(data).build();
    }
    // Look up the producer properties for this binding
    ProducerProperties producerProperties = this.bindingServiceProperties.getProducerProperties(bindingName);
    // Resolve the producer channel (binding it if necessary)
    SubscribableChannel messageChannel = this.resolveDestination(bindingName, producerProperties, binderName);
    ...
    // Send the message
    return messageChannel.send(resultMessage);
}

synchronized SubscribableChannel resolveDestination(String destinationName, ProducerProperties producerProperties, String binderName) {
    // 1. Try the cache first
    SubscribableChannel messageChannel = this.channelCache.get(destinationName);
    // 2. Not cached: look for a bean with this name in the Spring context
    if (messageChannel == null && this.applicationContext.containsBean(destinationName)) {
        messageChannel = this.applicationContext.getBean(destinationName, SubscribableChannel.class);
        this.addInterceptors((AbstractMessageChannel) messageChannel);
    }
    // 3. Still not found: create a new channel
    if (messageChannel == null) {
        messageChannel = new DirectWithAttributesChannel();
        if (this.destinationBindingCallback != null) {
            Object extendedProducerProperties = this.bindingService
                    .getExtendedProducerProperties(messageChannel, destinationName);
            this.destinationBindingCallback.configure(destinationName, messageChannel,
                    producerProperties, extendedProducerProperties);
        }
        Binder binder = null;
        if (StringUtils.hasText(binderName)) {
            BinderFactory binderFactory = this.applicationContext.getBean(BinderFactory.class);
            binder = binderFactory.getBinder(binderName, messageChannel.getClass());
        }
        // Bind the channel to an actual producer
        this.bindingService.bindProducer(messageChannel, destinationName, false, binder);
        // Put it into the cache
        this.channelCache.put(destinationName, messageChannel);
        this.addInterceptors((AbstractMessageChannel) messageChannel);
    }
    return messageChannel;
}
Next, let's see how bindingService.bindProducer() binds the channel to an actual producer:
public <T> Binding<T> bindProducer(T output, String outputName, boolean cache, @Nullable Binder<T, ?, ProducerProperties> binder) {
    ...
    // Perform the binding
    Binding<T> binding = doBindProducer(output, bindingTarget, binder,
            producerProperties);
    ...
    return binding;
}

public <T> Binding<T> doBindProducer(T output, String bindingTarget,
        Binder<T, ?, ProducerProperties> binder,
        ProducerProperties producerProperties) {
    // No task scheduler, or binding retry disabled: bind directly (any exception propagates)
    if (this.taskScheduler == null
            || this.bindingServiceProperties.getBindingRetryInterval() <= 0) {
        return binder.bindProducer(bindingTarget, output, producerProperties);
    }
    // Otherwise try to bind; on failure return a late binding and let the scheduler retry
    else {
        try {
            return binder.bindProducer(bindingTarget, output, producerProperties);
        }
        catch (RuntimeException e) {
            LateBinding<T> late = new LateBinding<T>(bindingTarget,
                    e.getCause() == null ? e.toString() : e.getCause().getMessage(), producerProperties, false);
            rescheduleProducerBinding(output, bindingTarget, binder,
                    producerProperties, late, e);
            return late;
        }
    }
}
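In the code above, binder.bindProducer() eventually calls AbstractMessageChannelBinder.createProducerMessageHandler(), which RocketMQMessageChannelBinder overrides. That override creates a new producer, that is, a new RocketMQTemplate and a new DefaultMQProducer. When it creates the DefaultMQProducer, it derives the instance name from the topic name: producer.setInstanceName(RocketMQUtil.getInstanceName(rpcHook, destination.getName() + "|" + UtilAll.getPid())). As a result, the same topic always gets the same instance name.
So the error can only appear if bindingService.bindProducer() runs more than once for the same topic. Each run creates another producer with the same instance name, and the second one fails to register. bindProducer() runs again only when the channel is found neither in channelCache nor in the Spring context.
We pass the topic name to StreamBridge.send(), so there is certainly no bean with that name in the Spring context. That left channelCache, so I looked at how it is implemented: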
StreamBridge(FunctionCatalog functionCatalog, FunctionRegistry functionRegistry,
        BindingServiceProperties bindingServiceProperties, ConfigurableApplicationContext applicationContext,
        @Nullable NewDestinationBindingCallback destinationBindingCallback) {
    this.bindingService = applicationContext.getBean(BindingService.class);
    this.functionCatalog = functionCatalog;
    this.functionRegistry = functionRegistry;
    this.applicationContext = applicationContext;
    this.bindingServiceProperties = bindingServiceProperties;
    this.destinationBindingCallback = destinationBindingCallback;
    // channelCache is a LinkedHashMap that overrides removeEldestEntry()
    this.channelCache = new LinkedHashMap<String, SubscribableChannel>() {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, SubscribableChannel> eldest) {
            // bindingServiceProperties.getDynamicDestinationCacheSize() defaults to 10
            boolean remove = size() > bindingServiceProperties.getDynamicDestinationCacheSize();
            if (remove && logger.isDebugEnabled()) {
                logger.debug("Removing message channel from cache " + eldest.getKey());
            }
            return remove;
        }
    };
}
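So channelCache is a LinkedHashMap that overrides removeEldestEntry() (the LinkedHashMap Javadoc has the details). Once the number of entries exceeds bindingServiceProperties.getDynamicDestinationCacheSize() (10 by default), the eldest entry is removed. The map is created with the default constructor, so it keeps insertion order rather than access order. The eldest entry is therefore the topic that was bound first, however recently it was used.
With the cause roughly identified, I logged on to the server and checked the RocketMQ client logs, which are under {user.home}/logs/rocketmqlogs by default. I searched for clients whose instance name matches the pattern RocketMQUtil.getInstanceName(rpcHook, destination.getName() + "|" + UtilAll.getPid()). Sure enough, there were more than 10 of them, which confirmed the diagnosis.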
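The eviction behaviour can be reproduced with plain JDK types. The sketch below mirrors the anonymous `LinkedHashMap` above with a capacity of 10, the default cache size. The string values are placeholders for `SubscribableChannel` objects, and `ChannelCacheDemo` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Reproduces the eviction policy of StreamBridge's channelCache using only JDK types.
public class ChannelCacheDemo {

    public static Map<String, String> newCache(int maxSize) {
        // Default constructor: insertion order, so "eldest" means "inserted first".
        return new LinkedHashMap<String, String>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxSize;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, String> cache = newCache(10);
        for (int i = 0; i < 10; i++) {
            cache.put("topic-" + i, "channel-" + i);
        }
        cache.get("topic-0");                 // a read does not change insertion order
        cache.put("topic-10", "channel-10");  // the 11th topic triggers eviction
        System.out.println(cache.containsKey("topic-0")); // false
        System.out.println(cache.size());                 // 10
    }
}
```

Constructing the map with `new LinkedHashMap<>(16, 0.75f, true)` would switch it to access order and keep recently used topics. StreamBridge does not do that here, so the practical knob is the cache size.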
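Root cause analysis
A RocketMQ client instance for a topic is created only the first time StreamBridge.send() sends a message to that topic. After that, the topic and its SubscribableChannel are stored in channelCache. Once the application sends to more than 10 topics, the entry for the earliest topic is evicted from channelCache. Eviction only removes the map entry. The old binding and its producer are not shut down, so that producer stays registered. The next send to that topic misses the cache, so a new SubscribableChannel is created and bound. Binding it creates a new RocketMQ producer with the same group and the same instance name. That producer resolves to the same mQClientFactory, where the group is already registered, so registerProducer() fails with the error above.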
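The whole chain fits in a small model. This is a simplified, hypothetical model, not Spring Cloud Stream or RocketMQ code. A size-capped cache stands in for channelCache, and `bindProducer` stands in for the binder creating a producer whose instance name is `topic|pid`. A map of producer tables stands in for the client instances. The PID and group name are made-up placeholders.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical end-to-end model: size-capped channel cache + producer registry that never unregisters.
public class DuplicateGroupRepro {
    private static final String PID = "1234";        // placeholder for UtilAll.getPid()
    private static final String GROUP = "demo-group"; // placeholder producer group

    private final Map<String, String> channelCache;
    // instanceName -> producer groups registered on that client instance
    private final Map<String, Map<String, Object>> clientInstances = new HashMap<>();
    private int bindCount;

    public DuplicateGroupRepro(int cacheSize) {
        this.channelCache = new LinkedHashMap<String, String>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Eviction drops only the map entry; the producer bound for it stays registered.
                return size() > cacheSize;
            }
        };
    }

    // Models StreamBridge.send() -> resolveDestination() with a topic name as destination.
    public void send(String topic) {
        if (channelCache.containsKey(topic)) {
            return; // cached channel: no new producer
        }
        bindProducer(topic);
        channelCache.put(topic, "channel-for-" + topic);
    }

    // Models the binder creating a producer whose instance name is derived from topic + PID.
    private void bindProducer(String topic) {
        bindCount++;
        String instanceName = topic + "|" + PID;
        Map<String, Object> producerTable =
                clientInstances.computeIfAbsent(instanceName, k -> new HashMap<>());
        if (producerTable.putIfAbsent(GROUP, new Object()) != null) {
            throw new IllegalStateException("The producer group[" + GROUP
                    + "] has been created before, specify another name please.");
        }
    }

    public int getBindCount() {
        return bindCount;
    }

    public static void main(String[] args) {
        DuplicateGroupRepro app = new DuplicateGroupRepro(10);
        for (int i = 0; i <= 10; i++) {
            app.send("topic-" + i); // the 11th topic evicts topic-0
        }
        try {
            app.send("topic-0");    // cache miss -> re-bind -> duplicate group
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Try `new DuplicateGroupRepro(20)` with the same 11 topics. Nothing is evicted, so `topic-0` is served from the cache and no second producer is created. This corresponds to the first fix below.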
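Solutions
- Increase the capacity of channelCache by raising the value returned by bindingServiceProperties.getDynamicDestinationCacheSize(). It should be no smaller than the number of topics the application sends to:
  # Set according to your actual needs
  spring.cloud.stream.dynamicDestinationCacheSize=20
- When calling StreamBridge.send(), pass a Spring bean name (the output binding's channel) as the first argument instead of the topic name. A channel found in the Spring context is returned directly and never re-bound, so cache eviction no longer matters. This option requires declaring the producer's SubscribableChannel in the configuration file.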
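The second option works because of the lookup order in resolveDestination() shown earlier. A channel found in the Spring context is returned as-is. It is never put into channelCache, so it can never be evicted, and it never triggers bindProducer(). Below is a hypothetical model of that lookup order: plain maps stand in for the cache and the ApplicationContext, and `orders-out-0` is an illustrative bean name, not one from the original project.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of StreamBridge.resolveDestination()'s lookup order:
// 1) channel cache, 2) beans in the Spring context, 3) create a channel and bind a producer.
public class ResolveOrderModel {
    private final Map<String, String> channelCache = new LinkedHashMap<>();
    private final Map<String, String> contextBeans = new HashMap<>(); // stand-in for the ApplicationContext
    private int bindCount;

    public void registerBean(String name) {
        contextBeans.put(name, "bean-channel-" + name);
    }

    public synchronized String resolveDestination(String destinationName) {
        String channel = channelCache.get(destinationName);
        if (channel == null && contextBeans.containsKey(destinationName)) {
            // Bean channels are returned as-is: not cached, not re-bound.
            channel = contextBeans.get(destinationName);
        }
        if (channel == null) {
            channel = "dynamic-channel-" + destinationName;
            bindCount++; // stands in for bindingService.bindProducer(...)
            channelCache.put(destinationName, channel);
        }
        return channel;
    }

    public int getBindCount() {
        return bindCount;
    }

    public int getCacheSize() {
        return channelCache.size();
    }

    public static void main(String[] args) {
        ResolveOrderModel bridge = new ResolveOrderModel();
        bridge.registerBean("orders-out-0");      // hypothetical output binding bean
        for (int i = 0; i < 3; i++) {
            bridge.resolveDestination("orders-out-0");
        }
        bridge.resolveDestination("some-topic");  // dynamic destination: bound once, then cached
        bridge.resolveDestination("some-topic");
        System.out.println(bridge.getBindCount() + " " + bridge.getCacheSize()); // 1 1
    }
}
```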
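Summary
We cut corners and did not want to configure the producer's SubscribableChannel in every project, and that left a hidden bug in the projects. Fortunately the problem was eventually tracked down and fixed. The data issues caused by the messages lost in the meantime still need to be repaired.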