OOM exception after continuously sending data to Microsoft Event Hub
-
1. Discovering the problem:
After running for two or three days, the service threw an OOM exception and left behind a JVM heap dump file from that moment.
Service startup parameters: java -Djava.net.preferIPv4Stack=true -Xms512m -Xmx512m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/platform/dataextractor/hbasesearch01/hbasesearch01.hprof -jar /home/platform/dataextractor/hbasesearch01/hbasesearch-V-Alpha-1.0.1-jar-with-dependencies.jar
-
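2. Dump file analysis
The dump file was opened with MAT (Eclipse Memory Analyzer), as shown in the figure below. Selecting "group by package" in the Histogram view displays the objects grouped by package. There were 45958 instances of com.microsoft.azure.eventhubs.impl.MessageSender, which is clearly far too many. The root cause therefore appears to be that too many MessageSender objects were created and never released, i.e. a memory leak.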
-
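3. Code analysis
The sending code is shown below. It sends data to a specified partition. The dump analysis indicates that the out-of-memory condition was caused by MessageSender objects not being released. A likely explanation is that every partitionSender created through ehClient.createPartitionSenderSync(partitionId) leaves a reference inside ehClient, so even though partitionSender is closed, the garbage collector does not reclaim the object.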
private boolean writePartitionQueue(String message, String taskId, String partitionId) {
    try {
        // Wrap the message bytes in an EventData and tag it with the task id
        String payload = message;
        byte[] payloadBytes = payload.getBytes(Charset.forName("UTF8"));
        EventData sendEvent = EventData.create(payloadBytes);
        Map<String, Object> properties = sendEvent.getProperties();
        properties.put("taskId", taskId);
        // Problem: a new PartitionSender is created (and closed) for every single message
        PartitionSender partitionSender = ehClient.createPartitionSenderSync(partitionId);
        partitionSender.sendSync(sendEvent);
        partitionSender.closeSync();
        logger.debug("message: {}", message);
        logger.debug("time: {} Send Complete...", Instant.now());
    } catch (EventHubException e) {
        // Pass the exception as the last argument so the stack trace is logged
        logger.error("failed to send event to partition {}", partitionId, e);
        return false;
    }
    return true;
}
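The hypothesis above (a long-lived client keeping a reference to every sender it creates, so that closing a sender does not make it collectable) can be illustrated with a toy model. This is only a sketch of the suspected mechanism: ToyClient and ToySender are hypothetical stand-ins invented for illustration and do not reflect the actual internals of the Event Hubs SDK.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the suspected leak; NOT the Event Hubs SDK's real internals. */
public class SenderLeakModel {

    /** Hypothetical stand-in for a sender; close() only flips a flag. */
    static final class ToySender {
        private boolean closed;
        void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    /** Hypothetical stand-in for a client that remembers every sender it creates. */
    static final class ToyClient {
        private final List<ToySender> created = new ArrayList<>();

        ToySender createSender() {
            ToySender sender = new ToySender();
            created.add(sender); // strong reference held by the long-lived client
            return sender;
        }

        int retainedSenders() { return created.size(); }
    }

    /** Mimics the create / send / close pattern, once per message. */
    static int simulate(int messages) {
        ToyClient client = new ToyClient();
        for (int i = 0; i < messages; i++) {
            ToySender sender = client.createSender();
            sender.close(); // closed, yet still reachable through the client
        }
        return client.retainedSenders();
    }

    public static void main(String[] args) {
        System.out.println("retained senders: " + simulate(1000));
    }
}
```

In this model the number of retained objects grows linearly with the number of messages, which matches the shape of the problem seen in the heap dump: the object count keeps climbing until the 512 MB heap is exhausted.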
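-
4. Solution
The code was modified slightly so that a shared partitionSender (looked up by partition from partitionSenderMap) is reused for sending. This avoids creating new objects over and over, so unreclaimed objects can no longer pile up and cause an OOM.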
private boolean writePartitionQueue(String message, String taskId, String partitionId) {
    try {
        // Wrap the message bytes in an EventData and tag it with the task id
        String payload = message;
        byte[] payloadBytes = payload.getBytes(Charset.forName("UTF8"));
        EventData sendEvent = EventData.create(payloadBytes);
        Map<String, Object> properties = sendEvent.getProperties();
        properties.put("taskId", taskId);
        // Reuse the long-lived sender for this partition instead of creating a new one
        PartitionSender partitionSender = this.partitionSenderMap.get(partitionId);
        if (partitionSender != null) {
            partitionSender.sendSync(sendEvent);
            logger.debug("message: {}", message);
            logger.debug("time: {} Send Complete...", Instant.now());
        } else {
            logger.error("partitionId: {} sender is null", partitionId);
            // Nothing was sent, so report failure rather than falling through to success
            return false;
        }
    } catch (EventHubException e) {
        // Pass the exception as the last argument so the stack trace is logged
        logger.error("failed to send event to partition {}", partitionId, e);
        return false;
    }
    return true;
}
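The post does not show how partitionSenderMap is populated. One possible approach, sketched below under assumptions, is a small cache that creates a sender the first time a partition is used and returns the same instance afterwards. SenderCache and its factory parameter are hypothetical names introduced for illustration; the type parameter S stands in for PartitionSender so that the example runs without the SDK on the classpath.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Sketch of a per-partition sender cache; S stands in for PartitionSender. */
public class SenderCache<S> {

    private final Map<String, S> senders = new ConcurrentHashMap<>();
    private final Function<String, S> factory;

    public SenderCache(Function<String, S> factory) {
        this.factory = factory;
    }

    /** Returns the cached sender for the partition, creating it on first use only. */
    public S get(String partitionId) {
        return senders.computeIfAbsent(partitionId, factory);
    }

    public int size() {
        return senders.size();
    }

    public static void main(String[] args) {
        int[] created = {0};
        // The lambda is a placeholder for the real sender-creation call
        SenderCache<String> cache = new SenderCache<>(id -> {
            created[0]++;
            return "sender-" + id;
        });
        for (int i = 0; i < 10_000; i++) {
            cache.get(String.valueOf(i % 8)); // 8 partitions, many messages
        }
        System.out.println(created[0] + " senders created, " + cache.size() + " cached");
    }
}
```

With the real SDK the factory would presumably wrap ehClient.createPartitionSenderSync(partitionId). The catch block in the method above suggests that call throws the checked EventHubException, so the lambda would need to catch and rethrow it as an unchecked exception. The cached senders would also need to be closed once, when the service shuts down.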
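After the modified code was redeployed, the number of MessageSender instances in memory was checked with the jcmd command. The count stayed stable at 9 and did not grow significantly, so the problem is resolved.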
[platform@prodplatformlayerweixinvm1 ~]$ sudo jcmd 76063 GC.class_histogram|grep Send
143: 38 6384 org.apache.qpid.proton.engine.impl.SenderImpl
319: 38 2128 org.apache.qpid.proton.engine.impl.TransportSender
492: 38 1216 com.microsoft.azure.eventhubs.impl.SendLinkHandler
510: 9 1152 com.microsoft.azure.eventhubs.impl.MessageSender
600: 38 912 com.microsoft.azure.eventhubs.impl.MessageSender$SendTimeout
815: 8 448 com.microsoft.azure.eventhubs.impl.PartitionSenderImpl
834: 1 408 org.apache.zookeeper.ClientCnxn$SendThread
1023: 9 216 com.microsoft.azure.eventhubs.impl.MessageSender$1
1142: 9 144 com.microsoft.azure.eventhubs.impl.MessageSender$2
1143: 9 144 com.microsoft.azure.eventhubs.impl.MessageSender$7
1144: 9 144 com.microsoft.azure.eventhubs.impl.MessageSender$8
1145: 9 144 com.microsoft.azure.eventhubs.impl.MessageSender$DeliveryTagComparator
1264: 4 96 com.microsoft.azure.eventhubs.impl.MessageSender$9
1426: 3 72 org.apache.qpid.proton.amqp.transport.SenderSettleMode
1613: 3 48 com.microsoft.azure.eventhubs.impl.MessageSender$2$1
1846: 1 32 [Lorg.apache.qpid.proton.amqp.transport.SenderSettleMode;
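To watch this number over time rather than reading it by eye, the instance count can be pulled out of the histogram text programmatically. The following is a minimal sketch that assumes the four-column layout shown above (rank, instance count, bytes, class name). HistogramCount and instancesOf are hypothetical names, and the sample lines are copied from the output above.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: extract an instance count from GC.class_histogram output lines. */
public class HistogramCount {

    /** Returns the instance count for an exact class name, or -1 if absent. */
    static long instancesOf(List<String> histogramLines, String className) {
        for (String line : histogramLines) {
            String[] cols = line.trim().split("\\s+");
            // Assumed columns: "num:", instance count, bytes, class name
            if (cols.length >= 4 && cols[3].equals(className)) {
                return Long.parseLong(cols[1]);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "600: 38 912 com.microsoft.azure.eventhubs.impl.MessageSender$SendTimeout",
            "510: 9 1152 com.microsoft.azure.eventhubs.impl.MessageSender");
        System.out.println(instancesOf(lines,
            "com.microsoft.azure.eventhubs.impl.MessageSender"));
    }
}
```

Matching on the exact class name avoids counting the inner classes such as MessageSender$SendTimeout, which a plain grep for "Send" also returns.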