I recently noticed that after one of our projects was deployed, memory usage on the host machine stayed very high.
So I set out to find out why.
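1. Find the processes using the most memory on the machine
Run ps aux | sort -k4nr | head -20 to list the 20 processes with the highest memory usage (column 4 of the ps aux output is %MEM).
The top 20 processes averaged about 3 GB each, roughly 60 GB in total. Every one of them used a lot of memory, which is why overall memory usage on the machine was so high.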
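For readers less familiar with the shell pipeline, here is a minimal Java sketch of what sort -k4nr | head -20 does with the ps aux output: sort the lines by the fourth whitespace-separated column (%MEM) in descending numeric order and keep the first n. The class and method names (TopMemorySketch, topByMemPercent, memPercent) are made up for illustration and are not part of the original investigation. Unlike the shell pipeline, the sketch simply drops the header line.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that mirrors `ps aux | sort -k4nr | head -n` on already captured lines.
final class TopMemorySketch {

    // Returns the n lines with the largest value in column 4 (%MEM), largest first.
    static List<String> topByMemPercent(List<String> psLines, int n) {
        return psLines.stream()
                .filter(line -> !line.trim().startsWith("USER")) // skip the ps header line
                .sorted(Comparator.comparingDouble(TopMemorySketch::memPercent).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    // Parses the fourth whitespace-separated field of a ps aux line as a number.
    static double memPercent(String line) {
        return Double.parseDouble(line.trim().split("\\s+")[3]);
    }
}
```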
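2. Inspect memory usage inside a process
Taking PID 13588 as an example, run jmap -histo:live 13588 | head -30 to see the classes whose instances take up the most heap in that Java process. The :live option counts only reachable objects (it forces a full GC first), so whatever shows up here is still being referenced.
The ByteBuffer objects in Netty's memory pool were never being garbage-collected.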
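3. Analyze the heap with MAT
Dump the process's heap with jmap -dump:format=b,file=heapdump.hprof <pid> and load the dump into MAT (Eclipse Memory Analyzer) for analysis.
The object reference chains showed that the connections to ES (Elasticsearch) were never released. ES uses Netty for its underlying communication, and Netty in turn caches ByteBuffer instances in a buffer pool.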
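To illustrate why pooled buffers survive garbage collection, here is a minimal sketch assuming a deliberately simplified pool. SimpleBufferPool and PooledClientSketch are hypothetical names; this is not Netty's or ES's actual implementation, only the general shape of "a client owns a pool, and the pool caches its buffers".

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified buffer pool: released buffers are cached for reuse, never freed.
final class SimpleBufferPool {
    private final int bufferSize;
    private final Deque<ByteBuffer> free = new ArrayDeque<>();

    SimpleBufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    // Reuses a cached buffer when one is available, otherwise allocates a new one.
    ByteBuffer acquire() {
        ByteBuffer cached = free.pollFirst();
        if (cached != null) {
            cached.clear();
            return cached;
        }
        return ByteBuffer.allocate(bufferSize);
    }

    // Returns a buffer to the cache; the pool now holds a strong reference to it.
    void release(ByteBuffer buffer) {
        free.addFirst(buffer);
    }

    // Bytes currently held by the cache, i.e. memory the GC cannot reclaim while the pool is reachable.
    long retainedBytes() {
        return (long) free.size() * bufferSize;
    }
}

// Hypothetical stand-in for a client that owns a pool for its whole lifetime.
final class PooledClientSketch {
    final SimpleBufferPool pool = new SimpleBufferPool(1024);

    // Simulates several requests in flight at once, then completes them all.
    void handleBurst(int concurrentRequests) {
        List<ByteBuffer> inFlight = new ArrayList<>();
        for (int i = 0; i < concurrentRequests; i++) {
            inFlight.add(pool.acquire());
        }
        for (ByteBuffer buffer : inFlight) {
            pool.release(buffer);
        }
    }
}
```

In this sketch the cache grows to the peak number of concurrent buffers and stays there. As long as the client object is reachable, so are the pool and every buffer in it, which matches what the histogram and the MAT reference chains suggested.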
4. Read the ES source code
NettyTransport starts the communication threads:
private ClientBootstrap createClientBootstrap() {
    if (blockingClient) {
        // Blocking (OIO) client: worker threads come from a cached thread pool.
        clientBootstrap = new ClientBootstrap(new OioClientSocketChannelFactory(
                Executors.newCachedThreadPool(daemonThreadFactory(settings, TRANSPORT_CLIENT_WORKER_THREAD_NAME_PREFIX))));
    } else {
        // Non-blocking (NIO) client: boss threads, a worker pool and a timer are all created here.
        int bossCount = settings.getAsInt("transport.netty.boss_count", 1);
        clientBootstrap = new ClientBootstrap(new NioClientSocketChannelFactory(
                Executors.newCachedThreadPool(daemonThreadFactory(settings, TRANSPORT_CLIENT_BOSS_THREAD_NAME_PREFIX)),
                bossCount,
                new NioWorkerPool(Executors.newCachedThreadPool(daemonThreadFactory(settings, TRANSPORT_CLIENT_WORKER_THREAD_NAME_PREFIX)), workerCount),
                new HashedWheelTimer(daemonThreadFactory(settings, "transport_client_timer"))));
    }
    clientBootstrap.setPipelineFactory(configureClientChannelPipelineFactory());
    clientBootstrap.setOption("connectTimeoutMillis", connectTimeout.millis());

    // TCP options; the value "default" leaves the OS default untouched.
    String tcpNoDelay = settings.get("transport.netty.tcp_no_delay", settings.get(TCP_NO_DELAY, "true"));
    if (!"default".equals(tcpNoDelay)) {
        clientBootstrap.setOption("tcpNoDelay", Booleans.parseBoolean(tcpNoDelay, null));
    }
    String tcpKeepAlive = settings.get("transport.netty.tcp_keep_alive", settings.get(TCP_KEEP_ALIVE, "true"));
    if (!"default".equals(tcpKeepAlive)) {
        clientBootstrap.setOption("keepAlive", Booleans.parseBoolean(tcpKeepAlive, null));
    }

    // Socket buffer sizes are applied only when explicitly configured to a positive value.
    ByteSizeValue tcpSendBufferSize = settings.getAsBytesSize("transport.netty.tcp_send_buffer_size", settings.getAsBytesSize(TCP_SEND_BUFFER_SIZE, TCP_DEFAULT_SEND_BUFFER_SIZE));
    if (tcpSendBufferSize != null && tcpSendBufferSize.bytes() > 0) {
        clientBootstrap.setOption("sendBufferSize", tcpSendBufferSize.bytes());
    }
    ByteSizeValue tcpReceiveBufferSize = settings.getAsBytesSize("transport.netty.tcp_receive_buffer_size", settings.getAsBytesSize(TCP_RECEIVE_BUFFER_SIZE, TCP_DEFAULT_RECEIVE_BUFFER_SIZE));
    if (tcpReceiveBufferSize != null && tcpReceiveBufferSize.bytes() > 0) {
        clientBootstrap.setOption("receiveBufferSize", tcpReceiveBufferSize.bytes());
    }
    clientBootstrap.setOption("receiveBufferSizePredictorFactory", receiveBufferSizePredictorFactory);

    boolean reuseAddress = settings.getAsBoolean("transport.netty.reuse_address", settings.getAsBoolean(TCP_REUSE_ADDRESS, NetworkUtils.defaultReuseAddress()));
    clientBootstrap.setOption("reuseAddress", reuseAddress);
    return clientBootstrap;
}
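Tracing upward through the callers of createClientBootstrap()
shows that it is ultimately invoked from ES's TransportClient. Creating a TransportClient starts the underlying communication threads.
So there must be a TransportClient object that stays resident in memory and never closes its connections.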
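The difference between a client that is closed after use and one that stays resident can be sketched as follows. FakeClient is a hypothetical stand-in that starts a worker thread when it is constructed; it is not the ES TransportClient API, and the static field is only an assumption about what a leak-prone holder might look like.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a client that starts worker threads as soon as it is created.
final class FakeClient implements AutoCloseable {
    private final ExecutorService workers = Executors.newFixedThreadPool(2, runnable -> {
        Thread thread = new Thread(runnable, "fake-client-worker");
        thread.setDaemon(true); // daemon threads do not keep the JVM alive
        return thread;
    });

    FakeClient() {
        workers.submit(() -> { }); // forces at least one worker thread to start
    }

    boolean isRunning() {
        return !workers.isShutdown();
    }

    @Override
    public void close() {
        workers.shutdown(); // releases the worker threads
    }
}

final class ClientLifecycleSketch {
    // Leak-prone pattern (assumed for illustration): a static field keeps the client
    // reachable for as long as the class is loaded, and nothing ever calls close().
    static final FakeClient SHARED = new FakeClient();

    // Safer pattern: the client is scoped to the work and closed when the block exits.
    static boolean useAndClose() {
        FakeClient client = new FakeClient();
        try (FakeClient inUse = client) {
            // ... issue requests with inUse ...
        }
        return client.isRunning(); // false once close() has run
    }
}
```

Here SHARED keeps running indefinitely, while the client created in useAndClose() is shut down as soon as the try block ends.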
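5. Search for TransportClient in MAT using a fuzzy (pattern) match on the class name
6. Inspect what references the TransportClient objects in MAT
This pinpoints the problem code in EsUtil.java.
7. Find where EsUtil is referenced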
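The next thing to determine is when EsUtil first gets used. The JVM (not a compiler) loads and initializes classes lazily: only when EsUtil is first actively used, for example through a static method call or a static field access, does the class loader (here AppClassLoader) load the class and run its static initializers. As long as EsUtil is never used at any point during the Java process's lifetime, the JVM never initializes EsUtil's static variables.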
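The lazy initialization rule is easy to check with a small experiment. In this sketch UtilHolder is a hypothetical stand-in for EsUtil, and a counter records how many times its static initializer has created a "client".

```java
import java.util.concurrent.atomic.AtomicInteger;

final class LazyInitSketch {
    // Counts how many times the hypothetical client factory has run.
    static final AtomicInteger clientsCreated = new AtomicInteger();

    // Hypothetical stand-in for EsUtil: the static field is initialized on first active use only.
    static final class UtilHolder {
        static final Object CLIENT = createClient();

        private static Object createClient() {
            clientsCreated.incrementAndGet();
            return new Object();
        }
    }

    // Returns the counter before and after the first access to UtilHolder.CLIENT.
    static int[] observe() {
        int before = clientsCreated.get();
        Object client = UtilHolder.CLIENT; // first active use triggers class initialization
        int after = clientsCreated.get();
        return new int[] {before, after};
    }
}
```

On the first call to observe(), the counter is 0 before the field access and 1 after it, which shows that merely having the class on the classpath creates nothing until some code actually touches it.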