A Brief Look at Netty Source Code: Pooled Memory Allocation with PoolThreadCache

Article link: https://blog.csdn.net/GeekerJava/article/details/132007752

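PoolThreadCache Allocation

Earlier we analyzed the allocation path taken on a cache miss, where a contiguous region of memory is allocated. Now let's look at the path taken on a cache hit, which is implemented in PoolThreadCache. First, a few of the class's fields: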

private final MemoryRegionCache<byte[]>[] tinySubPageHeapCaches;
    private final MemoryRegionCache<byte[]>[] smallSubPageHeapCaches;
    private final MemoryRegionCache<ByteBuffer>[] tinySubPageDirectCaches;
    private final MemoryRegionCache<ByteBuffer>[] smallSubPageDirectCaches;
    private final MemoryRegionCache<byte[]>[] normalHeapCaches;
    private final MemoryRegionCache<ByteBuffer>[] normalDirectCaches;

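These are the cache arrays. They fall into two groups: caches for direct memory and caches for heap memory. We focus on direct memory here, which is further split into three kinds: tinySubPageDirectCaches, smallSubPageDirectCaches, and normalDirectCaches. Each kind caches buffers of a different size range. Let's see how they are created: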

if (directArena != null) {
            tinySubPageDirectCaches = createSubPageCaches(
                    tinyCacheSize, PoolArena.numTinySubpagePools, SizeClass.Tiny);// cacheSize=512, numCaches=32 -> SubPageMemoryRegionCache(512, tiny)[32]
            smallSubPageDirectCaches = createSubPageCaches(
                    smallCacheSize, directArena.numSmallSubpagePools, SizeClass.Small);// cacheSize=256, numCaches=4 -> SubPageMemoryRegionCache(256, small)[4]

            numShiftsNormalDirect = log2(directArena.pageSize);
            normalDirectCaches = createNormalCaches(
                    normalCacheSize, maxCachedBufferCapacity, directArena);// cacheSize=64, maxCachedBufferCapacity=32K -> NormalMemoryRegionCache(64)[3]

            directArena.numThreadCaches.getAndIncrement();// number of thread caches bound to this arena
        }

Let's step into createSubPageCaches:

private static <T> MemoryRegionCache<T>[] createSubPageCaches(
            int cacheSize, int numCaches, SizeClass sizeClass) {
        if (cacheSize > 0 && numCaches > 0) {
            @SuppressWarnings("unchecked")
            MemoryRegionCache<T>[] cache = new MemoryRegionCache[numCaches];// 32 for the tiny type
            for (int i = 0; i < cache.length; i++) {
                // TODO: maybe use cacheSize / cache.length
                cache[i] = new SubPageMemoryRegionCache<T>(cacheSize, sizeClass);// one SubPageMemoryRegionCache per slot
            }
            return cache;
        } else {
            return null;
        }
    }

For the tiny type this creates an array of 32 MemoryRegionCache instances and initializes each one. Let's follow the construction of SubPageMemoryRegionCache:

SubPageMemoryRegionCache(int size, SizeClass sizeClass) {
            super(size, sizeClass);
        }
MemoryRegionCache(int size, SizeClass sizeClass) {
            this.size = MathUtil.safeFindNextPositivePowerOfTwo(size);// smallest power of two >= size; 512 is passed in here, so it stays 512
            queue = PlatformDependent.newFixedMpscQueue(this.size);
            this.sizeClass = sizeClass;
        }

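As we can see, for the tiny type each cache holds an MpscArrayQueue of capacity 512 and is tagged with the Tiny size class. The small caches are created the same way as the tiny ones, except that the array has 4 entries and each queue has a capacity of 256. Next, let's look at how the normal caches are created in createNormalCaches: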
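As a side note on the queue sizing, here is a minimal sketch of rounding a requested size up to a power of two. The class and method names are hypothetical and this is not Netty's MathUtil itself; it only assumes the rule that the result is the smallest power of two greater than or equal to the input, clamped to a positive int range.

```java
final class PowerOfTwoSketch {
    // Hypothetical helper: smallest power of two >= value, clamped to [1, 2^30].
    static int roundUpToPowerOfTwo(int value) {
        if (value <= 1) {
            return 1;
        }
        if (value >= 0x40000000) {
            return 0x40000000;
        }
        // value - 1 keeps exact powers of two unchanged (512 stays 512).
        return 1 << (32 - Integer.numberOfLeadingZeros(value - 1));
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(512)); // tiny cache size
        System.out.println(roundUpToPowerOfTwo(256)); // small cache size
        System.out.println(roundUpToPowerOfTwo(64));  // normal cache size
    }
}
```

With the default cache sizes of 512, 256 and 64 the rounding is a no-op, because all three are already powers of two; it only matters when a custom cache size is configured.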

private static <T> MemoryRegionCache<T>[] createNormalCaches(
            int cacheSize, int maxCachedBufferCapacity, PoolArena<T> area) {
        if (cacheSize > 0 && maxCachedBufferCapacity > 0) {
            int max = Math.min(area.chunkSize, maxCachedBufferCapacity);
            int arraySize = Math.max(1, log2(max / area.pageSize) + 1);//log2(32K/8192) + 1 = 3

            @SuppressWarnings("unchecked")
            MemoryRegionCache<T>[] cache = new MemoryRegionCache[arraySize];
            for (int i = 0; i < cache.length; i++) {
                cache[i] = new NormalMemoryRegionCache<T>(cacheSize);
            }
            return cache;
        } else {
            return null;
        }
    }

This creates a NormalMemoryRegionCache array of length 3. Each MemoryRegionCache in it holds a queue of capacity 64 and has the Normal size class. The resulting size classes are:

tiny: 32 size classes, all multiples of 16: 0B, 16B, 32B … 496B
small: 4 size classes: 512B, 1024B, 2048B, 4096B
normal: 3 size classes: 8K, 16K, 32K

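In other words, when a ByteBuf of a given size class is released, it is put into the queue of the MemoryRegionCache at the matching array index, and the next allocation of that size takes it straight from there.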
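The mapping from a normalized capacity to an array index can be sketched as follows. This is an illustration only: the class name is hypothetical, the 8K page size is the default assumed throughout this article, and the index arithmetic is written to match the size classes listed above rather than copied from Netty.

```java
final class SizeClassIndexSketch {
    static final int PAGE_SHIFTS = 13; // log2(8192), assuming the default 8K page size

    // Tiny sizes are multiples of 16, so the slot is simply size / 16.
    static int tinyIdx(int normCapacity) {
        return normCapacity >>> 4;
    }

    // Small sizes are 512, 1024, 2048, 4096 and map to slots 0..3.
    static int smallIdx(int normCapacity) {
        int idx = 0;
        for (int i = normCapacity >>> 10; i != 0; i >>>= 1) {
            idx++;
        }
        return idx;
    }

    // Normal sizes are 8K, 16K, 32K and map to slots 0..2.
    static int normalIdx(int normCapacity) {
        return log2(normCapacity >> PAGE_SHIFTS);
    }

    // Integer log2 for positive values.
    static int log2(int val) {
        int res = 0;
        while (val > 1) {
            val >>= 1;
            res++;
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println("tiny 272 -> slot " + tinyIdx(272));
        System.out.println("small 2048 -> slot " + smallIdx(2048));
        System.out.println("normal 32K -> slot " + normalIdx(32 * 1024));
    }
}
```

Because every slot holds buffers of exactly one normalized size, a cache hit needs no search: the index computation alone identifies the right queue.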

Let's return to the allocation flow analyzed earlier, starting from io.netty.buffer.PooledByteBufAllocator#newDirectBuffer:

protected ByteBuf newDirectBuffer(int initialCapacity, int maxCapacity) {
        // each thread has its own PoolThreadCache; it is initialized here on first use
        PoolThreadCache cache = threadCache.get();
        PoolArena<ByteBuffer> directArena = cache.directArena;

        final ByteBuf buf;
        if (directArena != null) {
            buf = directArena.allocate(cache, initialCapacity, maxCapacity);
        } else {
            buf = PlatformDependent.hasUnsafe() ?
                    UnsafeByteBufUtil.newUnsafeDirectByteBuf(this, initialCapacity, maxCapacity) :
                    new UnpooledDirectByteBuf(this, initialCapacity, maxCapacity);
        }

        return toLeakAwareBuffer(buf);
    }

This calls threadCache.get() to obtain the PoolThreadCache. threadCache is a PoolThreadLocalCache, so let's look at that class first:

final class PoolThreadLocalCache extends FastThreadLocal<PoolThreadCache>

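This class extends FastThreadLocal, Netty's reimplementation of ThreadLocal. Calling get() looks up the value for the current thread; if there is none yet, it calls initialValue() to create one: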

protected synchronized PoolThreadCache initialValue() {
            final PoolArena<byte[]> heapArena = leastUsedArena(heapArenas);// the arena shared by the fewest threads
            final PoolArena<ByteBuffer> directArena = leastUsedArena(directArenas);

            final Thread current = Thread.currentThread();
            if (useCacheForAllThreads || current instanceof FastThreadLocalThread) {
                final PoolThreadCache cache = new PoolThreadCache(
                        heapArena, directArena, tinyCacheSize, smallCacheSize, normalCacheSize,
                        DEFAULT_MAX_CACHED_BUFFER_CAPACITY, DEFAULT_CACHE_TRIM_INTERVAL);// defaults: 512, 256, 64, 32K, 8192

                if (DEFAULT_CACHE_TRIM_INTERVAL_MILLIS > 0) {
                    final EventExecutor executor = ThreadExecutorMap.currentExecutor();
                    if (executor != null) {
                        executor.scheduleAtFixedRate(trimTask, DEFAULT_CACHE_TRIM_INTERVAL_MILLIS,
                                DEFAULT_CACHE_TRIM_INTERVAL_MILLIS, TimeUnit.MILLISECONDS);
                    }
                }
                return cache;
            }
            // No caching so just use 0 as sizes.
            return new PoolThreadCache(heapArena, directArena, 0, 0, 0, 0, 0);
        }

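It first picks the least-used PoolArena. Then, depending on whether caching is enabled for all threads (useCacheForAllThreads) or the current thread is a FastThreadLocalThread, it creates a PoolThreadCache either with real cache sizes or with all sizes set to 0, which disables caching. Back in newDirectBuffer, as analyzed earlier, the call ends up in io.netty.buffer.PoolArena#allocate():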
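As an aside, the "bind each thread to the least-used arena on first access" idea can be sketched with the JDK's plain ThreadLocal. Everything here is a stand-in: the Arena class, the field names and the use of java.lang.ThreadLocal instead of FastThreadLocal are assumptions made for illustration, and the counter is incremented inside the binding method for brevity, whereas the listing earlier increments numThreadCaches while the caches are being created.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ArenaBindingSketch {
    // Hypothetical arena: only tracks how many threads are bound to it.
    static final class Arena {
        final int id;
        final AtomicInteger numThreadCaches = new AtomicInteger();

        Arena(int id) {
            this.id = id;
        }
    }

    private final Arena[] arenas;
    private final ThreadLocal<Arena> bound;

    ArenaBindingSketch(int arenaCount) {
        arenas = new Arena[arenaCount];
        for (int i = 0; i < arenaCount; i++) {
            arenas[i] = new Arena(i);
        }
        // The initializer runs once per thread, on that thread's first get().
        bound = ThreadLocal.withInitial(this::bindLeastUsed);
    }

    // Synchronized so two threads initializing at once do not pick the same arena by accident.
    private synchronized Arena bindLeastUsed() {
        Arena min = arenas[0];
        for (int i = 1; i < arenas.length; i++) {
            if (arenas[i].numThreadCaches.get() < min.numThreadCaches.get()) {
                min = arenas[i];
            }
        }
        min.numThreadCaches.incrementAndGet();
        return min;
    }

    Arena arenaForCurrentThread() {
        return bound.get();
    }

    int threadsOn(int arenaId) {
        return arenas[arenaId].numThreadCaches.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ArenaBindingSketch sketch = new ArenaBindingSketch(2);
        sketch.arenaForCurrentThread();
        Thread other = new Thread(sketch::arenaForCurrentThread);
        other.start();
        other.join();
        System.out.println(sketch.threadsOn(0) + " / " + sketch.threadsOn(1));
    }
}
```

Spreading threads across arenas this way reduces lock contention, since threads bound to different arenas never compete for the same arena lock.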

private void allocate(PoolThreadCache cache, PooledByteBuf<T> buf, final int reqCapacity) {
        // normalized capacity
        final int normCapacity = normalizeCapacity(reqCapacity);//272
        if (isTinyOrSmall(normCapacity)) { // is the requested size smaller than one page (8K)?
            int tableIdx;
            PoolSubpage<T>[] table;
            // is this a tiny request?
            boolean tiny = isTiny(normCapacity);
            if (tiny) { // < 512
                // try the thread cache first; return immediately on success
                if (cache.allocateTiny(this, buf, reqCapacity, normCapacity)) {// 256 272
                    // was able to allocate out of the cache so move on
                    return;
                }
                tableIdx = tinyIdx(normCapacity);// 272 >>> 4 = 17
                table = tinySubpagePools;
            } else {
                if (cache.allocateSmall(this, buf, reqCapacity, normCapacity)) {
                    // was able to allocate out of the cache so move on
                    return;
                }
                tableIdx = smallIdx(normCapacity);
                table = smallSubpagePools;
            }
            // head of the subpage list for normCapacity in the selected table
            final PoolSubpage<T> head = table[tableIdx];//table[17]

            /**
             * Synchronize on the head. This is needed as {@link PoolChunk#allocateSubpage(int)} and
             * {@link PoolChunk#free(long)} may modify the doubly linked list as well.
             */
            synchronized (head) {
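                // on the first allocation the doubly linked list holds only the head, so head.next == head
                final PoolSubpage<T> s = head.next;
                // a subpage of this element size has already been allocated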
                if (s != head) {
                    assert s.doNotDestroy && s.elemSize == normCapacity;
                    long handle = s.allocate();
                    assert handle >= 0;
                    s.chunk.initBufWithSubpage(buf, null, handle, reqCapacity);
                    incTinySmallAllocation(tiny);
                    return;
                }
            }
            // no usable subpage: allocate from the chunk lists, creating a new PoolChunk if necessary
            synchronized (this) {
                allocateNormal(buf, reqCapacity, normCapacity);
            }

            incTinySmallAllocation(tiny);
            return;
        }
        // normal size class: 8K-16M
        if (normCapacity <= chunkSize) {
            if (cache.allocateNormal(this, buf, reqCapacity, normCapacity)) {
                // was able to allocate out of the cache so move on
                return;
            }
            synchronized (this) {
                allocateNormal(buf, reqCapacity, normCapacity);
                ++allocationsNormal;
            }
        } else {
            // Huge allocations are never served via the cache so just call allocateHuge
            // huge allocation, larger than 16M
            allocateHuge(buf, reqCapacity);
        }
    }

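We covered this method earlier but skipped the cache path, so let's continue with it now. Every allocation first tries the thread cache; if the cache cannot satisfy the request, memory is allocated from a PoolChunkList; if that fails as well, a new PoolChunk is created to serve it. Let's step into the cache allocation for the tiny type: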

if (cache.allocateTiny(this, buf, reqCapacity, normCapacity)) {// 256 272      
        return;
   }

boolean allocateTiny(PoolArena<?> area, PooledByteBuf<?> buf, int reqCapacity, int normCapacity) {
        return allocate(cacheForTiny(area, normCapacity), buf, reqCapacity);
    }

Let's continue and look at cacheForTiny first:

 private MemoryRegionCache<?> cacheForTiny(PoolArena<?> area, int normCapacity) {
        int idx = PoolArena.tinyIdx(normCapacity);// divide by 16 to find the slot to allocate from
        // direct memory
        if (area.isDirect()) {
            return cache(tinySubPageDirectCaches, idx);
        }
        return cache(tinySubPageHeapCaches, idx);
    }

private static <T> MemoryRegionCache<T> cache(MemoryRegionCache<T>[] cache, int idx) {
        if (cache == null || idx > cache.length - 1) {
            return null;
        }
        // return the MemoryRegionCache at the given index
        return cache[idx];
    }

This returns the MemoryRegionCache at the matching position in the cache array. Now back to allocateTiny, which passes it on to allocate:

private boolean allocate(MemoryRegionCache<?> cache, PooledByteBuf buf, int reqCapacity) {
        // null check
        if (cache == null) {
            // no cache found so just return false here
            return false;
        }
        // perform the allocation
        boolean allocated = cache.allocate(buf, reqCapacity);
        // once the threshold is reached, try to trim the caches
        if (++ allocations >= freeSweepAllocationThreshold) {
            allocations = 0;
            trim();
        }
        return allocated;
    }

It first checks that cache is not null and then performs the actual allocation:

public final boolean allocate(PooledByteBuf<T> buf, int reqCapacity) {
            // right after initialization the queue is empty, so nothing is returned here
            Entry<T> entry = queue.poll();
            if (entry == null) {
                return false;
            }
            // initialize the buffer
            initBuf(entry.chunk, entry.nioBuffer, entry.handle, buf, reqCapacity);
            entry.recycle();

            // allocations is not thread-safe which is fine as this is only called from the same thread all time.
            ++ allocations;
            return true;
        }

This polls an Entry from the MemoryRegionCache queue. Here is the internal structure of Entry:

static final class Entry<T> {
            final Handle<Entry<?>> recyclerHandle;
            PoolChunk<T> chunk;// the chunk this memory belongs to
            ByteBuffer nioBuffer;// the memory (NIO buffer)
            long handle = -1;// uniquely identifies a region of memory within the chunk
            ......
}

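Back in allocate: if the entry is null it returns false right away, meaning the cache allocation failed. So when does the queue get filled? We touched on this when analyzing memory release; let's revisit io.netty.buffer.PoolArena#free: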

void free(PoolChunk<T> chunk, ByteBuffer nioBuffer, long handle, int normCapacity, PoolThreadCache cache) {
        // unpooled chunk
        if (chunk.unpooled) {
            int size = chunk.chunkSize();
            // release the memory
            destroyChunk(chunk);
            activeBytesHuge.add(-size);
            deallocationsHuge.increment();
        } else {
            // pooled chunk: determine the size class
            SizeClass sizeClass = sizeClass(normCapacity);
            // put it into the cache queue so it can be reused
            if (cache != null && cache.add(this, chunk, nioBuffer, handle, normCapacity, sizeClass)) {
                // cached so not free it.
                return;
            }
            // the queue is full or the add failed for another reason: mark the memory as unused in the chunk
            freeChunk(chunk, handle, sizeClass, nioBuffer, false);
        }
    }

 boolean add(PoolArena<?> area, PoolChunk chunk, ByteBuffer nioBuffer,
                long handle, int normCapacity, SizeClass sizeClass) {
        // find the cache slot for this size
        MemoryRegionCache<?> cache = cache(area, normCapacity, sizeClass);
        if (cache == null) {
            return false;
        }
        // add to the queue so the next request of the same size can reuse it directly
        return cache.add(chunk, nioBuffer, handle);
    }

public final boolean add(PoolChunk<T> chunk, ByteBuffer nioBuffer, long handle) {
            // take an Entry from the object pool and fill in the chunk, NIO buffer and handle
            Entry<T> entry = newEntry(chunk, nioBuffer, handle);
            // add it to the queue
            boolean queued = queue.offer(entry);
            // if the offer failed, recycle the entry
            if (!queued) {
                // If it was not possible to cache the chunk, immediately recycle the entry
                entry.recycle();
            }

            return queued;
        }

The key point is that, when memory is released, it is first offered to the queue of the matching MemoryRegionCache in the cache.
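The interplay of release and allocation on one cache slot can be sketched with a bounded queue. This is a simplified model under stated assumptions: RegionCacheSketch is a hypothetical class, a bare long handle stands in for Entry, and the JDK's ArrayBlockingQueue stands in for the fixed-size MPSC queue used in the listings.

```java
import java.util.concurrent.ArrayBlockingQueue;

final class RegionCacheSketch {
    // Bounded queue of cached handles; offer() fails instead of blocking when full.
    private final ArrayBlockingQueue<Long> queue;

    RegionCacheSketch(int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
    }

    // Release path: returns false when the queue is full,
    // in which case the caller must hand the memory back to the arena.
    boolean add(long handle) {
        return queue.offer(handle);
    }

    // Allocation path: returns a cached handle, or -1 on a cache miss.
    long allocate() {
        Long handle = queue.poll();
        return handle == null ? -1L : handle;
    }

    public static void main(String[] args) {
        RegionCacheSketch cache = new RegionCacheSketch(2);
        System.out.println(cache.allocate()); // miss: nothing has been released yet
        cache.add(100L);
        System.out.println(cache.allocate()); // hit: reuses the released handle
    }
}
```

A freshly created cache always misses, which is why the first allocations of any size go through the arena; hits only start once buffers of that size have been released by the same thread's cache.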

Back in io.netty.buffer.PoolThreadCache.MemoryRegionCache#allocate: this time poll returns a non-null entry, and initialization begins. For the tiny and small types:

protected void initBuf(
                PoolChunk<T> chunk, ByteBuffer nioBuffer, long handle, PooledByteBuf<T> buf, int reqCapacity) {
            chunk.initBufWithSubpage(buf, nioBuffer, handle, reqCapacity);
        }

 void initBufWithSubpage(PooledByteBuf<T> buf, ByteBuffer nioBuffer, long handle, int reqCapacity) {
        initBufWithSubpage(buf, nioBuffer, handle, bitmapIdx(handle), reqCapacity);
    }

 private void initBufWithSubpage(PooledByteBuf<T> buf, ByteBuffer nioBuffer,
                                    long handle, int bitmapIdx, int reqCapacity) {
        assert bitmapIdx != 0;

        int memoryMapIdx = memoryMapIdx(handle);// which page (node in the binary tree) was allocated

        PoolSubpage<T> subpage = subpages[subpageIdx(memoryMapIdx)];// get the subpage
        assert subpage.doNotDestroy;
        assert reqCapacity <= subpage.elemSize;

        buf.init(
            this, nioBuffer, handle,
            runOffset(memoryMapIdx) + (bitmapIdx & 0x3FFFFFFF) * subpage.elemSize + offset,
                //runOffset(memoryMapIdx): offset of the page within the chunk
                //(bitmapIdx & 0x3FFFFFFF): index of the element within the subpage
                //(bitmapIdx & 0x3FFFFFFF) * subpage.elemSize + offset: offset of the element within the page
                //offset: 0 here
                reqCapacity, subpage.elemSize, arena.parent.threadCache());
    }

For the normal type:

void initBuf(PooledByteBuf<T> buf, ByteBuffer nioBuffer, long handle, int reqCapacity) {
        int memoryMapIdx = memoryMapIdx(handle);// low 32 bits
        int bitmapIdx = bitmapIdx(handle);// high 32 bits
        if (bitmapIdx == 0) {// not a subpage: the allocation covers at least one page
            byte val = value(memoryMapIdx);
            assert val == unusable : String.valueOf(val);
            buf.init(this, nioBuffer, handle, runOffset(memoryMapIdx) + offset,
                    reqCapacity, runLength(memoryMapIdx), arena.parent.threadCache());
        } else {
            initBufWithSubpage(buf, nioBuffer, handle, bitmapIdx, reqCapacity);
        }
    }

We have already analyzed these initialization methods, so we won't go through them again.
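For reference, the way a 64-bit handle splits into the two indices used by both init paths can be sketched as below. This is an illustration under assumptions, not the library code: the class name, the 8K page size, the tree depth of 11 and the marker bit set in the upper half for subpage handles are all assumed here, the last one because it would explain both the bitmapIdx != 0 assertion and the 0x3FFFFFFF mask in the listing above.

```java
final class HandleSketch {
    static final int PAGE_SIZE = 8192; // assumed default page size
    static final int MAX_ORDER = 11;   // assumed tree depth: leaf pages are nodes 2048..4095

    // Pack a subpage handle: marker bit plus bitmapIdx in the high 32 bits, memoryMapIdx in the low 32 bits.
    static long subpageHandle(int bitmapIdx, int memoryMapIdx) {
        return 0x4000000000000000L | (long) bitmapIdx << 32 | memoryMapIdx;
    }

    // Low 32 bits: the node in the chunk's binary tree.
    static int memoryMapIdx(long handle) {
        return (int) handle;
    }

    // High 32 bits: non-zero for subpage handles because of the marker bit.
    static int bitmapIdx(long handle) {
        return (int) (handle >>> Integer.SIZE);
    }

    // Byte offset of a leaf page inside the chunk (only valid for leaf nodes).
    static int leafPageOffset(int memoryMapIdx) {
        return (memoryMapIdx ^ (1 << MAX_ORDER)) * PAGE_SIZE;
    }

    // Byte offset of one subpage element inside the chunk.
    static int elementOffset(long handle, int elemSize) {
        int elementIndex = bitmapIdx(handle) & 0x3FFFFFFF; // strip the marker bit
        return leafPageOffset(memoryMapIdx(handle)) + elementIndex * elemSize;
    }

    public static void main(String[] args) {
        long handle = subpageHandle(3, 2049); // 4th element of the 2nd leaf page
        System.out.println(elementOffset(handle, 32));
    }
}
```

Under these assumptions, a subpage handle never has a zero upper half, even for element 0, so bitmapIdx == 0 is enough to tell a whole-page allocation from a subpage one.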

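Back in io.netty.buffer.PoolThreadCache.MemoryRegionCache#allocate: once initialization is done, the Entry is recycled into the object pool. If it were not recycled, the polled Entry would simply become garbage and be collected by the GC. PoolThreadCache#allocate then checks whether the allocation count has reached freeSweepAllocationThreshold (8192 allocations by default); if so, it calls trim() to try to release cached memory. Finally it returns, and the allocation has succeeded.

Some readers may wonder why PoolArena has a tinySubpagePools of length 32 and a smallSubpagePools of length 4 when the cache has arrays of the same shape. Each element of the PoolArena arrays, for example in tinySubpagePools, is the head of a doubly linked list. The PoolSubpage created by the first allocation of a given size is linked after that head, so the next request of the same size can quickly find a page that has already been split into elements of that size and allocate directly from that subpage. The arrays in the cache, by contrast, are MemoryRegionCache arrays holding memory that has already been released and can be reused.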
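The threshold-triggered trim can be sketched as a counter that fires every N allocation attempts. This is a simplified model with hypothetical names, and the trim policy is an assumption: the source only says that trim() tries to release memory, so the rule used here (release up to "capacity minus hits since the last trim" cached entries) is one plausible reading of how rarely used slots shrink.

```java
import java.util.ArrayDeque;

final class TrimSketch {
    private final ArrayDeque<Long> queue = new ArrayDeque<>();
    private final int capacity;       // maximum number of cached entries
    private final int sweepThreshold; // attempts between two trims
    private int hitsSinceTrim;        // successful cache allocations since the last trim
    private int attempts;             // all allocation attempts since the last trim
    int released;                     // entries handed back to the arena by trim()

    TrimSketch(int capacity, int sweepThreshold) {
        this.capacity = capacity;
        this.sweepThreshold = sweepThreshold;
    }

    boolean add(long handle) {
        if (queue.size() >= capacity) {
            return false;
        }
        queue.addLast(handle);
        return true;
    }

    boolean allocate() {
        boolean hit = queue.pollFirst() != null;
        if (hit) {
            hitsSinceTrim++;
        }
        // Count hits and misses alike, and trim once the threshold is reached.
        if (++attempts >= sweepThreshold) {
            attempts = 0;
            trim();
        }
        return hit;
    }

    // Assumed policy: the fewer hits since the last trim, the more entries are released.
    void trim() {
        int toFree = capacity - hitsSinceTrim;
        hitsSinceTrim = 0;
        for (; toFree > 0 && !queue.isEmpty(); toFree--) {
            queue.pollFirst();
            released++;
        }
    }

    int cached() {
        return queue.size();
    }

    public static void main(String[] args) {
        TrimSketch cache = new TrimSketch(4, 2);
        cache.add(1L);
        cache.add(2L);
        cache.add(3L);
        cache.allocate();
        cache.allocate(); // second attempt reaches the threshold and triggers trim()
        System.out.println("cached=" + cache.cached() + ", released=" + cache.released);
    }
}
```

The point of sweeping on an allocation count rather than on a timer is that it needs no extra thread: the owning thread pays for the cleanup as part of its own allocations.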

That concludes our analysis of pooled allocation.