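Background
The market data service's off-heap memory grows during every trading session, between market open and close, and the service has to be restarted manually at regular intervals, which hurts its stability.
Investigation
1. How the off-heap memory metric is computed
The metric (used_direct_memory) is taken from Netty's own accounting. A reasonable first guess is therefore that the growth comes from off-heap memory allocated by Netty, not from business code calling Unsafe directly.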
import io.netty.util.internal.PlatformDependent;

public DirectMemoryMonitor() {
    // Off-heap memory in use: read Netty's private counter and limit via reflection
    Field usedMemory = ReflectionUtils.findField(PlatformDependent.class, "DIRECT_MEMORY_COUNTER");
    usedMemory.setAccessible(true);
    Field limitMemory = ReflectionUtils.findField(PlatformDependent.class, "DIRECT_MEMORY_LIMIT");
    limitMemory.setAccessible(true);
    try {
        DIRECT_MEMORY_COUNTER = (AtomicLong) usedMemory.get(PlatformDependent.class);
        DIRECT_MEMORY_LIMIT = (Long) limitMemory.get(PlatformDependent.class);
    } catch (IllegalAccessException e) {
        // Fail fast instead of swallowing: the gauge below would otherwise hit a null counter
        throw new IllegalStateException("cannot read Netty direct memory counters", e);
    }
    XueqiuMetrics.getInstance().register("used_direct_memory", (Gauge<Long>) () -> DIRECT_MEMORY_COUNTER.get());
}
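For comparison, the JDK keeps its own accounting for buffers created through ByteBuffer.allocateDirect and exposes it via BufferPoolMXBean. The sketch below reads that figure; the class and method names are made up for illustration. To my understanding, memory that Netty obtains through its no-cleaner path is tracked only by Netty's own counter, so the two numbers can diverge, which is why a Netty-side gauge like the one above is worth having.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// Illustrative helper (hypothetical name): reads the JDK's own view of direct buffers.
final class JdkDirectPoolProbe {
    /** Returns the bytes used by the JDK "direct" buffer pool, or -1 if that pool is not exposed. */
    static long jdkDirectMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }
}
```

Plotting this value next to used_direct_memory makes it easier to tell which allocator is responsible for a given increase.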
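2. Tracing the allocation logic and the holders of the off-heap memory
Every time Netty allocates off-heap memory it has to increase this counter by the corresponding amount, so the next step is to look at where the incrementing method is called.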
public final class PlatformDependent {
    private static final AtomicLong DIRECT_MEMORY_COUNTER;

    // Add the requested capacity to the off-heap memory counter
    private static void incrementMemoryCounter(int capacity) {
        if (DIRECT_MEMORY_COUNTER != null) {
            for (;;) {
                long usedMemory = DIRECT_MEMORY_COUNTER.get();
                long newUsedMemory = usedMemory + capacity;
                if (newUsedMemory > DIRECT_MEMORY_LIMIT) {
                    throw new OutOfDirectMemoryError("failed to allocate " + capacity
                            + " byte(s) of direct memory (used: " + usedMemory + ", max: " + DIRECT_MEMORY_LIMIT + ')');
                }
                if (DIRECT_MEMORY_COUNTER.compareAndSet(usedMemory, newUsedMemory)) {
                    break;
                }
            }
        }
    }
}
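The counting logic above is a compare-and-set loop guarded by a limit. Here is a minimal standalone sketch of the same idea, using only the JDK; DirectMemoryBudget and its method names are hypothetical and it throws a plain IllegalStateException instead of Netty's error type. It also makes one point explicit that matters for this investigation: the counter only goes down when someone calls release.

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified stand-in for Netty's counter; class and method names are hypothetical.
final class DirectMemoryBudget {
    private final AtomicLong used = new AtomicLong();
    private final long limit;

    DirectMemoryBudget(long limit) {
        this.limit = limit;
    }

    /** Reserves capacity bytes, or throws if the limit would be exceeded. */
    void reserve(int capacity) {
        for (;;) {
            long current = used.get();
            long next = current + capacity;
            if (next > limit) {
                throw new IllegalStateException("failed to reserve " + capacity
                        + " byte(s) (used: " + current + ", max: " + limit + ')');
            }
            // Retry if another thread changed the counter between get() and this point.
            if (used.compareAndSet(current, next)) {
                return;
            }
        }
    }

    /** Gives capacity bytes back; must be paired with an earlier successful reserve(). */
    void release(int capacity) {
        used.addAndGet(-capacity);
    }

    long used() {
        return used.get();
    }
}
```

If a buffer is never released, its bytes stay in the counter forever, which is the shape of the curve seen on the used_direct_memory gauge.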
Next, the method that actually requests the ByteBuffer:
public static ByteBuffer allocateDirectNoCleaner(int capacity) {
    assert USE_DIRECT_BUFFER_NO_CLEANER;

    incrementMemoryCounter(capacity);
    try {
        return PlatformDependent0.allocateDirectNoCleaner(capacity);
    } catch (Throwable e) {
        decrementMemoryCounter(capacity);
        throwException(e);
        return null;
    }
}
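Through reflection, this ends up calling the DirectByteBuffer(long addr, int cap) constructor. The resulting buffer has no cleaner, so its memory has to be released manually.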
// io.netty.util.internal.PlatformDependent0
static ByteBuffer allocateDirectNoCleaner(int capacity) {
    return newDirectBuffer(UNSAFE.allocateMemory(capacity), capacity);
}

// java.nio.DirectByteBuffer
// Invoked only by JNI: NewDirectByteBuffer(void*, long)
//
private DirectByteBuffer(long addr, int cap) {
    super(-1, 0, cap, cap);
    address = addr;
    cleaner = null;
    att = null;
}
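Looking at who references the DirectByteBuffer: it is held by a PoolChunk created by DirectArena. Anyone familiar with Netty's memory model will know that the underlying byte storage is managed in chunks, and that each chunk is in turn added to a DirectArena.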
static final class DirectArena extends PoolArena<ByteBuffer> {
    private static ByteBuffer allocateDirect(int capacity) {
        return PlatformDependent.useDirectBufferNoCleaner() ?
                PlatformDependent.allocateDirectNoCleaner(capacity) : ByteBuffer.allocateDirect(capacity);
    }

    protected PoolChunk<ByteBuffer> newChunk(int pageSize, int maxOrder,
            int pageShifts, int chunkSize) {
        if (directMemoryCacheAlignment == 0) {
            return new PoolChunk<ByteBuffer>(this,
                    allocateDirect(chunkSize), pageSize, maxOrder,
                    pageShifts, chunkSize, 0);
        }
        final ByteBuffer memory = allocateDirect(chunkSize
                + directMemoryCacheAlignment);
        return new PoolChunk<ByteBuffer>(this, memory, pageSize,
                maxOrder, pageShifts, chunkSize,
                offsetCacheLine(memory));
    }
}

// Defined in the parent class PoolArena<T>; this is where newChunk() gets called
private void allocateNormal(PooledByteBuf<T> buf, int reqCapacity, int normCapacity) {
    if (q050.allocate(buf, reqCapacity, normCapacity) || q025.allocate(buf, reqCapacity, normCapacity) ||
        q000.allocate(buf, reqCapacity, normCapacity) || qInit.allocate(buf, reqCapacity, normCapacity) ||
        q075.allocate(buf, reqCapacity, normCapacity)) {
        return;
    }

    // Add a new chunk.
    PoolChunk<T> c = newChunk(pageSize, maxOrder, pageShifts, chunkSize);
    long handle = c.allocate(normCapacity);
    assert handle > 0;
    c.initBuf(buf, handle, reqCapacity);
    qInit.add(c);
}
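The key behaviour in allocateNormal is "try the existing chunks first, add a new chunk only when none of them has room". The toy model below sketches just that rule to show why buffers that are never released translate into a growing number of chunks. ToyArena is hypothetical and deliberately crude: real Netty arenas use chunk lists sorted by usage, subpages and thread caches, none of which is modelled here, and this toy never destroys a chunk.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; not how Netty's PoolArena is actually implemented.
final class ToyArena {
    private final int chunkSize;
    // One entry per chunk, holding that chunk's free byte count.
    private final List<int[]> chunks = new ArrayList<>();

    ToyArena(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    /** Serves the request from an existing chunk if possible and returns the chunk index. */
    int allocate(int size) {
        if (size <= 0 || size > chunkSize) {
            throw new IllegalArgumentException("bad size: " + size);
        }
        for (int i = 0; i < chunks.size(); i++) {
            if (chunks.get(i)[0] >= size) {
                chunks.get(i)[0] -= size;
                return i;
            }
        }
        // No existing chunk has room: add a new chunk.
        chunks.add(new int[] {chunkSize - size});
        return chunks.size() - 1;
    }

    /** Returns size bytes to the given chunk. */
    void free(int chunkIndex, int size) {
        chunks.get(chunkIndex)[0] += size;
    }

    int chunkCount() {
        return chunks.size();
    }
}
```

With a chunk size of 16, four allocations of 8 that are never freed need two chunks, while the same four allocations each followed by free stay within one.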
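3. Analyzing DirectByteBuffer in the service's heap dump
Because the problem is complex, the heap dump has to be analyzed with OQL. For an introduction to the syntax, see the CSDN blog post "JVM 对象查询语言(OQL)" by 潘建南.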
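This write-up does not say how the dump was taken; a command-line tool such as jmap is the usual route. As one alternative, here is a small sketch that triggers a dump from inside the process on a HotSpot JVM. HeapDumper is a hypothetical helper name; the target file must not exist yet and should end in .hprof.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;

// Hypothetical helper: asks the HotSpot diagnostic bean to write a heap dump.
final class HeapDumper {
    /** Writes an .hprof dump to target; liveOnly=true dumps only reachable objects. */
    static void dump(Path target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), liveOnly);
    }
}
```

Taking dumps on different days, as done in 3.2, is what makes the growth of individual object types visible.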
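3.1. Verify the off-heap memory value shown in monitoring. Because the time span is long, the values are somewhat distorted.
3.2. Check whether the Netty chunks match the off-heap memory allocation
OQL note: list the Netty chunk objects that hold a reference to a java.nio.DirectByteBuffer whose cleaner is null.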
select map(filter(referrers(s), "/io.netty.buffer.PoolC/.test(classof(it).name)"),
    "toHtml(it) + ' mem:' + toHtml(it.memory) + ' chunksize:' + toHtml(it.chunkSize) + ' unusable:' + toHtml(it.unusable) + ' free:' + toHtml(it.freeBytes)")
from java.nio.DirectByteBuffer s
where s.cleaner == null && count(referrers(s)) > 0
2021-07-29: 12 chunks allocated, each 16 MB, for a total of 12 * 16 = 192 MB
io.netty.buffer.PoolChunk#2 mem:java.nio.DirectByteBuffer#10 chunksize:16777216 unusable:12 free:2424832
io.netty.buffer.PoolChunk#3 mem:java.nio.DirectByteBuffer#11 chunksize:16777216 unusable:12 free:16769024
io.netty.buffer.PoolChunk#1 mem:java.nio.DirectByteBuffer#12 chunksize:16777216 unusable:12 free:16769024
io.netty.buffer.PoolChunk#4 mem:java.nio.DirectByteBuffer#17 chunksize:16777216 unusable:12 free:16515072
io.netty.buffer.PoolChunk#5 mem:java.nio.DirectByteBuffer#18 chunksize:16777216 unusable:12 free:16515072
io.netty.buffer.PoolChunk#6 mem:java.nio.DirectByteBuffer#19 chunksize:16777216 unusable:12 free:16515072
io.netty.buffer.PoolChunk#7 mem:java.nio.DirectByteBuffer#20 chunksize:16777216 unusable:12 free:16515072
io.netty.buffer.PoolChunk#8 mem:java.nio.DirectByteBuffer#21 chunksize:16777216 unusable:12 free:16515072
io.netty.buffer.PoolChunk#9 mem:java.nio.DirectByteBuffer#22 chunksize:16777216 unusable:12 free:16515072
io.netty.buffer.PoolChunk#10 mem:java.nio.DirectByteBuffer#23 chunksize:16777216 unusable:12 free:16515072
io.netty.buffer.PoolChunk#12 mem:java.nio.DirectByteBuffer#24 chunksize:16777216 unusable:12 free:16515072
io.netty.buffer.PoolChunk#13 mem:java.nio.DirectByteBuffer#25 chunksize:16777216 unusable:12 free:11436032
2021-08-10: 45 chunks allocated, each 16 MB, for a total of 45 * 16 = 720 MB
io.netty.buffer.PoolChunk#1 mem:java.nio.DirectByteBuffer#11 chunksize:16777216 unusable:12 free:1900544
io.netty.buffer.PoolChunk#2 mem:java.nio.DirectByteBuffer#13 chunksize:16777216 unusable:12 free:466944
io.netty.buffer.PoolChunk#3 mem:java.nio.DirectByteBuffer#14 chunksize:16777216 unusable:12 free:16515072
io.netty.buffer.PoolChunk#4 mem:java.nio.DirectByteBuffer#16 chunksize:16777216 unusable:12 free:16515072
io.netty.buffer.PoolChunk#5 mem:java.nio.DirectByteBuffer#17 chunksize:16777216 unusable:12 free:16515072
io.netty.buffer.PoolChunk#6 mem:java.nio.DirectByteBuffer#18 chunksize:16777216 unusable:12 free:16506880
io.netty.buffer.PoolChunk#7 mem:java.nio.DirectByteBuffer#19 chunksize:16777216 unusable:12 free:16515072
io.netty.buffer.PoolChunk#8 mem:java.nio.DirectByteBuffer#20 chunksize:16777216 unusable:12 free:16515072
io.netty.buffer.PoolChunk#10 mem:java.nio.DirectByteBuffer#21 chunksize:16777216 unusable:12 free:16515072
io.netty.buffer.PoolChunk#11 mem:java.nio.DirectByteBuffer#22 chunksize:16777216 unusable:12 free:16515072
io.netty.buffer.PoolChunk#12 mem:java.nio.DirectByteBuffer#23 chunksize:16777216 unusable:12 free:16769024
io.netty.buffer.PoolChunk#14 mem:java.nio.DirectByteBuffer#25 chunksize:16777216 unusable:12 free:0
io.netty.buffer.PoolChunk#15 mem:java.nio.DirectByteBuffer#26 chunksize:16777216 unusable:12 free:458752
io.netty.buffer.PoolChunk#16 mem:java.nio.DirectByteBuffer#27 chunksize:16777216 unusable:12 free:0
io.netty.buffer.PoolChunk#17 mem:java.nio.DirectByteBuffer#28 chunksize:16777216 unusable:12 free:0
io.netty.buffer.PoolChunk#18 mem:java.nio.DirectByteBuffer#29 chunksize:16777216 unusable:12 free:1032192
io.netty.buffer.PoolChunk#20 mem:java.nio.DirectByteBuffer#30 chunksize:16777216 unusable:12 free:262144
io.netty.buffer.PoolChunk#21 mem:java.nio.DirectByteBuffer#31 chunksize:16777216 unusable:12 free:122880
io.netty.buffer.PoolChunk#22 mem:java.nio.DirectByteBuffer#32 chunksize:16777216 unusable:12 free:1024000
io.netty.buffer.PoolChunk#23 mem:java.nio.DirectByteBuffer#33 chunksize:16777216 unusable:12 free:851968
io.netty.buffer.PoolChunk#25 mem:java.nio.DirectByteBuffer#34 chunksize:16777216 unusable:12 free:65536
io.netty.buffer.PoolChunk#26 mem:java.nio.DirectByteBuffer#35 chunksize:16777216 unusable:12 free:0
io.netty.buffer.PoolChunk#27 mem:java.nio.DirectByteBuffer#36 chunksize:16777216 unusable:12 free:327680
io.netty.buffer.PoolChunk#28 mem:java.nio.DirectByteBuffer#37 chunksize:16777216 unusable:12 free:131072
io.netty.buffer.PoolChunk#30 mem:java.nio.DirectByteBuffer#38 chunksize:16777216 unusable:12 free:663552
io.netty.buffer.PoolChunk#31 mem:java.nio.DirectByteBuffer#39 chunksize:16777216 unusable:12 free:65536
io.netty.buffer.PoolChunk#32 mem:java.nio.DirectByteBuffer#40 chunksize:16777216 unusable:12 free:294912
io.netty.buffer.PoolChunk#33 mem:java.nio.DirectByteBuffer#41 chunksize:16777216 unusable:12 free:196608
io.netty.buffer.PoolChunk#34 mem:java.nio.DirectByteBuffer#42 chunksize:16777216 unusable:12 free:3588096
io.netty.buffer.PoolChunk#36 mem:java.nio.DirectByteBuffer#43 chunksize:16777216 unusable:12 free:196608
io.netty.buffer.PoolChunk#37 mem:java.nio.DirectByteBuffer#44 chunksize:16777216 unusable:12 free:65536
io.netty.buffer.PoolChunk#38 mem:java.nio.DirectByteBuffer#45 chunksize:16777216 unusable:12 free:327680
io.netty.buffer.PoolChunk#39 mem:java.nio.DirectByteBuffer#46 chunksize:16777216 unusable:12 free:450560
io.netty.buffer.PoolChunk#40 mem:java.nio.DirectByteBuffer#47 chunksize:16777216 unusable:12 free:1941504
io.netty.buffer.PoolChunk#43 mem:java.nio.DirectByteBuffer#49 chunksize:16777216 unusable:12 free:0
io.netty.buffer.PoolChunk#44 mem:java.nio.DirectByteBuffer#50 chunksize:16777216 unusable:12 free:1376256
io.netty.buffer.PoolChunk#45 mem:java.nio.DirectByteBuffer#51 chunksize:16777216 unusable:12 free:917504
io.netty.buffer.PoolChunk#46 mem:java.nio.DirectByteBuffer#52 chunksize:16777216 unusable:12 free:8650752
io.netty.buffer.PoolChunk#48 mem:java.nio.DirectByteBuffer#53 chunksize:16777216 unusable:12 free:0
io.netty.buffer.PoolChunk#49 mem:java.nio.DirectByteBuffer#54 chunksize:16777216 unusable:12 free:0
io.netty.buffer.PoolChunk#50 mem:java.nio.DirectByteBuffer#55 chunksize:16777216 unusable:12 free:589824
io.netty.buffer.PoolChunk#51 mem:java.nio.DirectByteBuffer#56 chunksize:16777216 unusable:12 free:1736704
io.netty.buffer.PoolChunk#52 mem:java.nio.DirectByteBuffer#57 chunksize:16777216 unusable:12 free:892928
io.netty.buffer.PoolChunk#53 mem:java.nio.DirectByteBuffer#58 chunksize:16777216 unusable:12 free:2490368
io.netty.buffer.PoolChunk#54 mem:java.nio.DirectByteBuffer#59 chunksize:16777216 unusable:12 free:6160384
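Conclusion: the memory allocated to Netty chunks roughly matches the off-heap memory shown in monitoring, so the next question to answer is what the memory inside those chunks is being used for.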
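The totals above were obtained by counting result lines. For larger dumps the same tally can be scripted; the sketch below parses lines in the format printed by the 3.2 query and sums the chunksize and free fields. ChunkReport is a hypothetical helper written for this post, not part of any tool.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for tallying the OQL output of section 3.2.
final class ChunkReport {
    private static final Pattern LINE = Pattern.compile("chunksize:(\\d+).*?free:(\\d+)");

    /** Returns {chunkCount, totalBytes, freeBytes} for the given OQL result lines. */
    static long[] summarize(List<String> lines) {
        long count = 0;
        long total = 0;
        long free = 0;
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                count++;
                total += Long.parseLong(m.group(1));
                free += Long.parseLong(m.group(2));
            }
        }
        return new long[] {count, total, free};
    }
}
```

Comparing totalBytes with freeBytes also shows how much of each chunk is actually occupied, which is not obvious from the chunk count alone.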
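3.3. Inspect the references to DirectByteBuffer and the data sliced out of the chunks
OQL note: list the Netty objects that hold a reference to a java.nio.DirectByteBuffer whose cleaner is null, together with their memory, arena and chunk fields.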
select map(filter(referrers(s), "/io.netty.buffer.Pool/.test(classof(it).name)"),
    "toHtml(it) + ' mem:' + toHtml(it.memory) + ' arena:' + toHtml(it.arena) + ' chunk:' + toHtml(it.chunk)")
from java.nio.DirectByteBuffer s
where s.cleaner == null && count(referrers(s)) > 0
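Summary: the number of unreferenced chunks keeps growing, and these chunks are very likely where the memory leak is.
First, the references to PooledUnsafeDirectByteBuf: these buffers are mainly held by static objects in business code and by protocol delimiters. Both use little memory, occupy only 2~3 chunks, and show no growth trend.
Next, the unreferenced chunks need a proper analysis. Since a chunk is the lowest-level allocation unit, the analysis has to move one level up to its owner, PoolArena.
The referrers of DirectArena fall into three groups: PoolChunk, PoolChunkList and PoolThreadCache. PoolChunk and PoolChunkList are in a parent-child relationship with PoolArena, so they can be set aside for now. Only the PooledUnsafeDirectByteBuf instances associated with PoolThreadCache need attention.
Following the references of those PooledUnsafeDirectByteBuf instances shows that they ultimately lead to a stock business object.
SzL2FrameMap is clearly growing all the time, in proportion to both the old-generation heap usage and the off-heap memory.
Summary: the unreferenced chunks are tied directly to the business object SzL2FrameMap, whose growth is positively correlated with both on-heap and off-heap memory, and SzL2FrameMap also indirectly references DirectByteBuf objects. The preliminary conclusion is that a consumption problem in the ringbuffer keeps the objects it holds (SzL2FrameMap) from being released, which in turn keeps the off-heap memory from being released and makes it grow.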
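To make that preliminary conclusion concrete, here is a purely illustrative model of how a ring buffer can pin memory. ToyRing and Frame are hypothetical stand-ins, not the service's real ringbuffer or SzL2FrameMap: a slot keeps its entry reachable until the consumer processes it, so a consumer that stalls or falls behind leaves every pending entry, and whatever off-heap buffer it owns, unreleased.

```java
// Hypothetical model; it only illustrates the suspected mechanism.
final class ToyRing {
    /** Stand-in for an entry that owns off-heap memory and must be released explicitly. */
    static final class Frame {
        private boolean released;

        void release() {
            released = true;
        }

        boolean isReleased() {
            return released;
        }
    }

    private final Frame[] slots;
    private long published; // next sequence to write
    private long consumed;  // next sequence to read

    ToyRing(int size) {
        slots = new Frame[size];
    }

    /** Returns false when the ring is full because the consumer has fallen behind. */
    boolean publish(Frame frame) {
        if (published - consumed == slots.length) {
            return false;
        }
        slots[(int) (published++ % slots.length)] = frame;
        return true;
    }

    /** Consumes one entry, releases it and clears the slot so nothing keeps it reachable. */
    boolean consumeOne() {
        if (consumed == published) {
            return false;
        }
        int index = (int) (consumed++ % slots.length);
        slots[index].release();
        slots[index] = null;
        return true;
    }

    /** Entries published but not yet consumed; each one still holds its memory. */
    long pending() {
        return published - consumed;
    }
}
```

In this model the amount of retained memory is proportional to pending(), so watching the gap between producer and consumer would be one way to confirm or rule out the hypothesis.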