Netty Server Source Code Reading Notes (7): Zero-Copy and ByteBuf

Latest recommended article published on 2024-08-15 03:43:27

李有乾

Categories: Notes, Netty

Article link: https://blog.csdn.net/xyjy11/article/details/115126352

A first pass of notes. They may be a little rough and will be refined as I learn more.

I. DMA (Direct Memory Access)

See the I/O systems chapter of Computer Organization for details.

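II. Zero-Copy

Zero-copy is defined from the CPU's point of view: it covers techniques that keep the CPU from copying data during I/O as far as possible.

At the operating-system level, consider reading a file from disk and sending it over the network. With traditional I/O (read/write) the steps are:

1. The user program issues a read request. The CPU switches from user mode to kernel mode and starts a DMA transfer, and the DMA engine copies the data into a kernel buffer.
2. When the DMA transfer finishes it raises an interrupt. The CPU copies the data from the kernel buffer into the application buffer and switches back to user mode.
3. The user program then calls socket.send(). The CPU switches to kernel mode, copies the application buffer into the kernel socket buffer, starts DMA, returns the result, and switches back to user mode.
4. The DMA engine copies the data from the kernel socket buffer to the NIC buffer, and the data goes out through the socket.

This path costs 4 context switches and 4 copies: 2 by the CPU and 2 by DMA.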
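In user code, the traditional path above is the familiar read/write loop. The following is a minimal sketch rather than anything taken from Netty or the JDK sources: the class name `TraditionalCopy` and the helper `demo()` are made up for illustration, and a plain `OutputStream` stands in for the socket.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TraditionalCopy {

    // Copies a file to any OutputStream (for example socket.getOutputStream()).
    // Each iteration moves data kernel -> user buffer (read) and then
    // user buffer -> kernel (write): the two CPU copies counted above.
    static long copy(Path source, OutputStream out) {
        byte[] userBuffer = new byte[8192]; // plays the role of the application buffer
        long total = 0;
        try (InputStream in = Files.newInputStream(source)) {
            int n;
            while ((n = in.read(userBuffer)) != -1) {
                out.write(userBuffer, 0, n);
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    // Hypothetical helper: writes a small temp file and copies it to an in-memory sink.
    static long demo() {
        try {
            Path tmp = Files.createTempFile("traditional-copy", ".txt");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, "hello zero-copy".getBytes(StandardCharsets.US_ASCII));
            return copy(tmp, new ByteArrayOutputStream());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo() + " bytes copied through a user-space buffer");
    }
}
```

The `userBuffer` array is exactly what the optimizations in the rest of this section try to take out of the path.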

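Linux offers the following optimizations:

1. mmap

The kernel buffer is mapped into the application's address space, so the data no longer has to be copied into an application buffer. User code holds a reference to the kernel buffer and works on it directly. Steps 2 and 3 of the example then collapse into a single CPU copy between the two kernel buffers.

This path costs 4 context switches and 3 copies: 1 by the CPU and 2 by DMA.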

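2. sendfile

The example scenario sends the data unmodified, so the file content never needs to reach the program at all; the program just asks the kernel to transfer it. As of Linux 2.1 the flow becomes:

1. The user program issues a file-transfer request. The CPU switches from user mode to kernel mode and starts DMA, which copies the data into a kernel buffer.
2. When DMA finishes it raises an interrupt. The CPU copies the data into the socket buffer, starts DMA again, switches back to user mode, and returns the result.
3. DMA copies the socket buffer to the NIC buffer.

This path costs 2 context switches and 3 copies: 1 by the CPU and 2 by DMA.

The one remaining in-kernel copy is still unnecessary, and Linux 2.4 removes it. In step 2 the data is no longer copied; only a descriptor holding its location and length is appended to the socket buffer. In step 3 DMA uses that descriptor to copy straight from the kernel buffer to the NIC buffer, which requires a NIC that supports gather DMA. At this point the CPU does no copying at all, which is "zero-copy" in the strict sense. This approach cannot be used when the data has to be modified before it is sent.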

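3. Zero-copy in the broad sense

The zero-copy achieved in item 2 above (sendfile) is zero-copy in the strict, operating-system sense. In the broad sense, zero-copy means avoiding unnecessary copies of the data wherever possible: any technique that removes a copy counts.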

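III. Zero-Copy in Java

1. MappedByteBuffer

Java's zero-copy facility built on memory mapping (mmap). The NIO off-heap buffer DirectByteBuffer is its concrete subclass.

2. FileChannel

transferFrom() and transferTo() are abstract methods whose implementation uses sendfile where the platform supports it. map() is built on mmap and returns a MappedByteBuffer.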
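Both FileChannel entry points can be tried with a few lines of plain NIO. This is a sketch under stated assumptions: the class `FileChannelZeroCopy` and its helpers are hypothetical names, and a second file channel stands in for the SocketChannel a server would use as the target. Whether the JDK actually takes the sendfile path depends on the operating system and the target channel type.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FileChannelZeroCopy {

    // Moves the whole source file into the target channel without
    // passing the bytes through a user-space byte[].
    static long transfer(Path source, Path target) {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.WRITE, StandardOpenOption.CREATE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = 0;
            long size = in.size();
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
            return position;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps the file read-only and decodes its content straight from the mapping.
    static String readMapped(Path source) {
        try (FileChannel ch = FileChannel.open(source, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return StandardCharsets.UTF_8.decode(mapped).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: round-trips a small temp file through both methods.
    static String demo() {
        try {
            Path source = Files.createTempFile("zc-src", ".txt");
            Path target = Files.createTempFile("zc-dst", ".txt");
            source.toFile().deleteOnExit();
            target.toFile().deleteOnExit();
            Files.write(source, "netty".getBytes(StandardCharsets.UTF_8));
            transfer(source, target);
            return readMapped(target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```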

IV. ByteBuffer (Java)

public abstract class ByteBuffer extends Buffer implements Comparable<ByteBuffer>

First, the fields of the parent class Buffer:

1. Buffer

public abstract class Buffer {
    
// Invariants: mark <= position <= limit <= capacity

    // Scratch field for saving position: mark() stores the current position, reset() restores it
    private int mark = -1;

    // Index of the next element to be read or written
    private int position = 0;

    // First index that must not be read or written; with capacity 512 and limit 10, only indices 0-9 are accessible
    private int limit;

    // Capacity: the maximum number of elements the buffer can hold; fixed after construction
    private int capacity;

    // Used only by direct buffers (off-heap memory, including mmap)
    // NOTE: hoisted here for speed in JNI GetDirectBufferAddress
    long address;

    Buffer(int mark, int pos, int lim, int cap) { // package-private
        if (cap < 0)
            throw new IllegalArgumentException("Negative capacity: " + cap);
        this.capacity = cap;
        limit(lim);
        position(pos);
        if (mark >= 0) {
            if (mark > pos)
                throw new IllegalArgumentException("mark > position: ("
                                                   + mark + " > " + pos + ")");
            this.mark = mark;
        }
    }

    // Switch from write mode to read mode
    public final Buffer flip() {
        limit = position;
        position = 0;
        mark = -1;
        return this;
    }
}
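The interplay of mark, position, limit and capacity is easiest to see by stepping through a few calls. A small sketch using only the public `java.nio.ByteBuffer` API; the class name `BufferIndexDemo` is made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BufferIndexDemo {

    // Returns {limit after flip, position after reset, remaining, capacity}.
    static int[] run() {
        ByteBuffer buf = ByteBuffer.allocate(8);        // position=0, limit=8, capacity=8
        buf.put((byte) 1).put((byte) 2).put((byte) 3);  // position=3
        buf.flip();                                     // limit=3, position=0, mark discarded
        int limitAfterFlip = buf.limit();
        buf.get();                                      // position=1
        buf.mark();                                     // mark=1
        buf.get();                                      // position=2
        buf.reset();                                    // position restored to the mark (1)
        return new int[] { limitAfterFlip, buf.position(), buf.remaining(), buf.capacity() };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // prints [3, 1, 2, 8]
    }
}
```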

Buffer's subclasses are ByteBuffer, CharBuffer, DoubleBuffer, FloatBuffer, IntBuffer, LongBuffer and ShortBuffer: every primitive type except boolean has a buffer implementation. These notes look only at ByteBuffer.

2. ByteBuffer

public abstract class ByteBuffer extends Buffer implements Comparable<ByteBuffer>
{
    // Non-null only for heap buffers
    final byte[] hb;                 
    
    final int offset;
    boolean isReadOnly;                 // Valid only for heap buffers

    ByteBuffer(int mark, int pos, int lim, int cap) { // package-private
        this(mark, pos, lim, cap, null, 0);
    }

     ByteBuffer(int mark, int pos, int lim, int cap,   // package-private
                 byte[] hb, int offset)
    {
        super(mark, pos, lim, cap);
        this.hb = hb;
        this.offset = offset;
    }

    // Allocate an off-heap (direct) buffer
    public static ByteBuffer allocateDirect(int capacity) {
        return new DirectByteBuffer(capacity);
    }

    // Allocate a heap buffer
    public static ByteBuffer allocate(int capacity) {
        if (capacity < 0)
            throw new IllegalArgumentException();
        return new HeapByteBuffer(capacity, capacity);
    }

    // Wrap an array in a buffer without copying it
   public static ByteBuffer wrap(byte[] array) {
        return wrap(array, 0, array.length);
    }

   public static ByteBuffer wrap(byte[] array, int offset, int length)
    {
        try {
            return new HeapByteBuffer(array, offset, length);
        } catch (IllegalArgumentException x) {
            throw new IndexOutOfBoundsException();
        }
    }

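    // Creates a new buffer whose capacity is this buffer's remaining size; it is direct if this one is direct and read-only if this one is read-only
    // Javadoc: Creates a new byte buffer whose content is a shared subsequence of this buffer's content.
    // The new buffer is a shared subsequence of this one: for HeapByteBuffer the backing byte[] is the same object, but position, limit, etc. differ
    public abstract ByteBuffer slice();

    // Javadoc: Creates a new byte buffer that shares this buffer's content.
    // Creates a new buffer whose position, limit, etc. are independent but which shares the same byte[] object
    public abstract ByteBuffer duplicate();

    // Reads the byte at the current position, the most basic get; the heap implementation reads the backing array, the direct implementation reads the memory address through Unsafe
    public abstract byte get();

    // Reads the byte at the given index
    public abstract byte get(int index);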

    public ByteBuffer get(byte[] dst) {
        return get(dst, 0, dst.length);
    }

    // Bulk read: copies length bytes from this buffer into dst, starting at dst[offset]
    public ByteBuffer get(byte[] dst, int offset, int length) {
        checkBounds(offset, length, dst.length);
        if (length > remaining())
            throw new BufferUnderflowException();
        int end = offset + length;
        for (int i = offset; i < end; i++)
            dst[i] = get();
        return this;
    }

    // Writes a byte at the current position
    public abstract ByteBuffer put(byte b);

    // Writes a byte at the given index
    public abstract ByteBuffer put(int index, byte b);

    public final ByteBuffer put(byte[] src) {
        return put(src, 0, src.length);
    }

    // Bulk write: copies length bytes from src, starting at src[offset], into this buffer
    public ByteBuffer put(byte[] src, int offset, int length) {
        checkBounds(offset, length, src.length);
        if (length > remaining())
            throw new BufferOverflowException();
        int end = offset + length;
        for (int i = offset; i < end; i++)
            this.put(src[i]);
        return this;
    }
}
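wrap(), slice() and duplicate() are the JDK's own examples of zero-copy in the broad sense: each one creates a new view with its own indices over the same storage. A short sketch with the public API; `SharedContentDemo` is a made-up class name.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class SharedContentDemo {

    // Returns {backing[1], backing[3], original.position(), slice.capacity()}.
    static int[] run() {
        byte[] backing = {10, 20, 30, 40};
        ByteBuffer original = ByteBuffer.wrap(backing); // no copy: the buffer uses this array
        original.position(1);

        ByteBuffer slice = original.slice();     // covers backing[1..3]; position 0, capacity 3
        ByteBuffer dup = original.duplicate();   // same range as original; position 1, capacity 4

        slice.put(0, (byte) 99);  // slice index 0 is backing[1]
        dup.put(3, (byte) 77);    // absolute index 3 is backing[3]
        dup.position(0);          // moves only dup's position, not original's

        return new int[] { backing[1], backing[3], original.position(), slice.capacity() };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // prints [99, 77, 1, 3]
    }
}
```

Writes through either view land in the shared array, while moving one view's position leaves the others alone.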

3. HeapByteBuffer

The data lives in the byte[] array (hb) declared in ByteBuffer.

class HeapByteBuffer extends ByteBuffer
{ 
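    // Initialize the backing byte[] array
    HeapByteBuffer(int cap, int lim) {            // package-private
        super(-1, 0, lim, cap, new byte[cap], 0);
    }

    // Implementations of the two methods discussed above: the array object is shared, while position, limit and mark are set afresh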
    public ByteBuffer slice() {
        return new HeapByteBuffer(hb,
                                        -1,
                                        0,
                                        this.remaining(),
                                        this.remaining(),
                                        this.position() + offset);
    }

    public ByteBuffer duplicate() {
        return new HeapByteBuffer(hb,
                                        this.markValue(),
                                        this.position(),
                                        this.limit(),
                                        this.capacity(),
                                        offset);
    }
}

4. MappedByteBuffer

public abstract class MappedByteBuffer extends ByteBuffer
{

    // File descriptor: roughly a non-negative integer index into the per-process table of open files maintained by the kernel
    private final FileDescriptor fd;

    // This should only be invoked by the DirectByteBuffer constructors
    //
    MappedByteBuffer(int mark, int pos, int lim, int cap, // package-private
                     FileDescriptor fd)
    {
        super(mark, pos, lim, cap);
        this.fd = fd;
    }
}

5. DirectByteBuffer

public interface DirectBuffer {
    long address();

    Object attachment();

    // Used to release the off-heap memory
    Cleaner cleaner();
}

class DirectByteBuffer extends MappedByteBuffer implements DirectBuffer
{
   DirectByteBuffer(int cap) {                   // package-private

        super(-1, 0, cap, cap);
        boolean pa = VM.isDirectMemoryPageAligned();
        int ps = Bits.pageSize();
        long size = Math.max(1L, (long)cap + (pa ? ps : 0));
        Bits.reserveMemory(size, cap);

        long base = 0;
        try {
            // Allocate native memory through Unsafe (a C-level malloc underneath)
            base = unsafe.allocateMemory(size);
        } catch (OutOfMemoryError x) {
            Bits.unreserveMemory(size, cap);
            throw x;
        }
        unsafe.setMemory(base, size, (byte) 0);
        if (pa && (base % ps != 0)) {
            // Round up to page boundary
            address = base + ps - (base & (ps - 1));
        } else {
            address = base;
        }

        // A direct buffer relies on a Cleaner to free its memory
        cleaner = Cleaner.create(this, new Deallocator(base, size, cap));
        att = null;
    }


   // Reads the byte at the current position; unsafe is Unsafe.getUnsafe(), which reads through a native method
    public byte get() {
        return ((unsafe.getByte(ix(nextGetIndex()))));
    }

    public byte get(int i) {
        return ((unsafe.getByte(ix(checkIndex(i)))));
    }
}

Cleaner.create(this, new Deallocator(base, size, cap));

1) new Deallocator(base, size, cap)

Deallocator is a Runnable that frees the memory through Unsafe.

    private static class Deallocator
        implements Runnable
    {

        private static Unsafe unsafe = Unsafe.getUnsafe();

        private long address;
        private long size;
        private int capacity;

        private Deallocator(long address, long size, int capacity) {
            assert (address != 0);
            this.address = address;
            this.size = size;
            this.capacity = capacity;
        }

        public void run() {
            if (address == 0) {
                // Paranoia
                return;
            }
            unsafe.freeMemory(address);
            address = 0;
            Bits.unreserveMemory(size, capacity);
        }

    }

2) Cleaner.create

public class Cleaner extends PhantomReference<Object> {
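    // Dummy queue, needed only because the PhantomReference constructor requires one
    private static final ReferenceQueue<Object> dummyQueue = new ReferenceQueue();

    // Head of a static linked list; every new Cleaner is inserted at the head
    private static Cleaner first = null;

    // Doubly linked list pointers
    private Cleaner next = null;
    private Cleaner prev = null;

    // The cleanup task, i.e. the Deallocator object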
    private final Runnable thunk;    

    public static Cleaner create(Object var0, Runnable var1) {
        // Create a Cleaner and add it to the head of the list
        return var1 == null ? null : add(new Cleaner(var0, var1));
    }

    private Cleaner(Object var1, Runnable var2) {
        super(var1, dummyQueue);
        this.thunk = var2;
    }

    // Insert at the head of the Cleaner list
    private static synchronized Cleaner add(Cleaner var0) {
        if (first != null) {
            var0.next = first;
            first.prev = var0;
        }

        first = var0;
        return var0;
    }

   // Called to perform the cleanup
   public void clean() {
        if (remove(this)) {
            try {
                this.thunk.run();
            } catch (final Throwable var2) {
               // ...
            }
        }
    }
}
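The same owner-plus-cleanup-task pattern is available publicly as `java.lang.ref.Cleaner` since Java 9. The sketch below uses that public class, not the internal `sun.misc.Cleaner` quoted above, and the names `CleanerSketch` and `FreeAction` are hypothetical. A counter stands in for `unsafe.freeMemory(address)`.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

public class CleanerSketch {

    private static final Cleaner CLEANER = Cleaner.create();

    // Plays the role of Deallocator. It must not hold a reference to the owner,
    // otherwise the owner could never become phantom reachable.
    static final class FreeAction implements Runnable {
        private final AtomicInteger freed;

        FreeAction(AtomicInteger freed) {
            this.freed = freed;
        }

        @Override
        public void run() {
            freed.incrementAndGet(); // stands in for unsafe.freeMemory(address)
        }
    }

    // Returns how many times the cleanup action actually ran.
    static int run() {
        AtomicInteger freed = new AtomicInteger();
        Object owner = new Object(); // stands in for the DirectByteBuffer
        Cleaner.Cleanable cleanable = CLEANER.register(owner, new FreeAction(freed));
        cleanable.clean(); // explicit release, like calling clean() on the buffer's cleaner
        cleanable.clean(); // no effect: the action runs at most once
        return freed.get();
    }

    public static void main(String[] args) {
        System.out.println("cleanup ran " + run() + " time(s)"); // prints 1
    }
}
```

If `clean()` is never called explicitly, the action runs after the garbage collector finds the owner unreachable, which is how direct memory is eventually returned when no one releases it by hand.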

V. ByteBuf (Netty)

public abstract class ByteBuf implements ReferenceCounted, Comparable<ByteBuf> {
  // Almost every method is abstract
  // It defines the basic operations, which closely mirror those of ByteBuffer
}

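1. ReferenceCounted

// Reference-counting interface: the resource is released when the count drops to 0, similar in spirit to reference counting in GC
public interface ReferenceCounted { 
     // Current reference count
     int refCnt();

     // Increment the reference count
     ReferenceCounted retain();

     // Decrement the reference count; release the resource when it reaches 0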
     boolean release();
}
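The contract of these three methods can be modeled in a few lines with an AtomicInteger. This is a simplified sketch, not Netty's implementation: `RefCountSketch` and `CountedResource` are hypothetical names, and it throws `IllegalStateException` where Netty uses its own exception type.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountSketch {

    static final class CountedResource {
        // A new resource starts with one reference, held by its creator.
        private final AtomicInteger refCnt = new AtomicInteger(1);
        private volatile boolean deallocated;

        int refCnt() {
            return refCnt.get();
        }

        CountedResource retain() {
            for (;;) {
                int current = refCnt.get();
                if (current == 0) {
                    throw new IllegalStateException("already released");
                }
                if (refCnt.compareAndSet(current, current + 1)) {
                    return this;
                }
            }
        }

        // Returns true only for the call that drops the count to 0.
        boolean release() {
            for (;;) {
                int current = refCnt.get();
                if (current == 0) {
                    throw new IllegalStateException("already released");
                }
                if (refCnt.compareAndSet(current, current - 1)) {
                    if (current == 1) {
                        deallocated = true; // stands in for freeing the memory
                        return true;
                    }
                    return false;
                }
            }
        }

        boolean isDeallocated() {
            return deallocated;
        }
    }

    // Returns {first release result, second release result, deallocated flag}.
    static boolean[] run() {
        CountedResource r = new CountedResource(); // refCnt = 1
        r.retain();                                // refCnt = 2
        boolean first = r.release();               // refCnt = 1, nothing freed
        boolean second = r.release();              // refCnt = 0, resource freed
        return new boolean[] { first, second, r.isDeallocated() };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints false true true
    }
}
```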

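2. ByteBufAllocator

The factory interface for creating ByteBuf instances. Its core methods:

// Only some of the methods are listed
public interface ByteBufAllocator {
    // Default instance: a static initializer reads the system property io.netty.allocator.type and creates
    // either an UnpooledByteBufAllocator or a PooledByteBufAllocator (unpooled vs pooled)
    ByteBufAllocator DEFAULT = ByteBufUtil.DEFAULT_ALLOCATOR;

    // Returns a buffer of the requested size; whether it is direct or heap is up to the implementation
    ByteBuf buffer(int initialCapacity);

    // Returns a buffer of the requested size, preferably a direct buffer suitable for I/O
    ByteBuf ioBuffer(int initialCapacity);

    // Returns a heap buffer
    ByteBuf heapBuffer(int initialCapacity);

    // Returns a direct (off-heap) buffer
    ByteBuf directBuffer(int initialCapacity);

    // CompositeByteBuf combines several ByteBufs, backed by a ByteBuf[]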
    CompositeByteBuf compositeBuffer(int maxNumComponents);

    CompositeByteBuf compositeHeapBuffer(int maxNumComponents);

    CompositeByteBuf compositeDirectBuffer(int maxNumComponents);
}

3. AbstractByteBufAllocator

The core methods newHeapBuffer and newDirectBuffer are abstract, so the next step is to see how PooledByteBufAllocator and UnpooledByteBufAllocator implement them.

public abstract class AbstractByteBufAllocator implements ByteBufAllocator {
    static final int DEFAULT_INITIAL_CAPACITY = 256;
    static final int DEFAULT_MAX_CAPACITY = Integer.MAX_VALUE;
    static final int DEFAULT_MAX_COMPONENTS = 16;
    static final int CALCULATE_THRESHOLD = 1048576 * 4; // 4 MiB page
    private final boolean directByDefault;
    private final ByteBuf emptyBuf;

    // preferDirect == true means direct buffers are the preferred type on this system
    // If direct is preferred && Unsafe is available, buffer() returns a direct buffer
    protected AbstractByteBufAllocator(boolean preferDirect) {
        directByDefault = preferDirect && PlatformDependent.hasUnsafe();
        emptyBuf = new EmptyByteBuf(this);
    }

    // The methods below also have overloads that fill in default arguments
    @Override
    public ByteBuf buffer(int initialCapacity, int maxCapacity) {
        if (directByDefault) {
            return directBuffer(initialCapacity, maxCapacity);
        }
        return heapBuffer(initialCapacity, maxCapacity);
    }

   @Override
    public ByteBuf ioBuffer(int initialCapacity, int maxCapacity) {
        if (PlatformDependent.hasUnsafe()) {
            return directBuffer(initialCapacity, maxCapacity);
        }
        return heapBuffer(initialCapacity, maxCapacity);
    }

    @Override
    public ByteBuf heapBuffer(int initialCapacity, int maxCapacity) {
        if (initialCapacity == 0 && maxCapacity == 0) {
            return emptyBuf;
        }
        validate(initialCapacity, maxCapacity);
        return newHeapBuffer(initialCapacity, maxCapacity);
    }

    @Override
    public ByteBuf directBuffer(int initialCapacity, int maxCapacity) {
        if (initialCapacity == 0 && maxCapacity == 0) {
            return emptyBuf;
        }
        validate(initialCapacity, maxCapacity);
        return newDirectBuffer(initialCapacity, maxCapacity);
    }
}

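4. UnpooledByteBufAllocator

The implementation classes below fall into two families, heap and direct, and each family has an Unsafe and a non-Unsafe variant.

If the running platform supports Unsafe, the Unsafe variant is returned, and its getByte(index) reads through Unsafe (unsafe.getByte(byte[], index) in the heap case).

All heap variants extend UnpooledHeapByteBuf. The Unsafe variant implements getByte() through the Unsafe class, while the non-Unsafe variant simply returns byte[index].

In the version read here, the direct classes UnpooledDirectByteBuf and UnpooledUnsafeDirectByteBuf are not related by inheritance. Both allocate with ByteBuffer.allocateDirect(). They differ in how they read: UnpooledDirectByteBuf goes through ByteBuffer.get(index), whereas the Unsafe variant caches the buffer's native memory address and reads from it with Unsafe directly, skipping ByteBuffer's own index checks.

As for Instrumented: it only adds a counter that records how much heap or off-heap space the allocator has handed out so far, updated whenever storage is allocated or freed.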

  
 @Override
    protected ByteBuf newHeapBuffer(int initialCapacity, int maxCapacity) {
        return PlatformDependent.hasUnsafe() ?
                new InstrumentedUnpooledUnsafeHeapByteBuf(this, initialCapacity, maxCapacity) :
                new InstrumentedUnpooledHeapByteBuf(this, initialCapacity, maxCapacity);
    }

    @Override
    protected ByteBuf newDirectBuffer(int initialCapacity, int maxCapacity) {
        final ByteBuf buf;
        if (PlatformDependent.hasUnsafe()) {
            buf = noCleaner ? new InstrumentedUnpooledUnsafeNoCleanerDirectByteBuf(this, initialCapacity, maxCapacity) :
                    new InstrumentedUnpooledUnsafeDirectByteBuf(this, initialCapacity, maxCapacity);
        } else {
            buf = new InstrumentedUnpooledDirectByteBuf(this, initialCapacity, maxCapacity);
        }
        return disableLeakDetector ? buf : toLeakAwareBuffer(buf);
    }

4.1 UnpooledHeapByteBuf

public class UnpooledHeapByteBuf extends AbstractReferenceCountedByteBuf {

    private final ByteBufAllocator alloc;
    byte[] array;
    private ByteBuffer tmpNioBuf;

    public UnpooledHeapByteBuf(ByteBufAllocator alloc, int initialCapacity, int maxCapacity) {
        super(maxCapacity);

        this.alloc = alloc;
        setArray(allocateArray(initialCapacity));
        setIndex(0, 0);
    }

    // Allocate the backing array
    protected byte[] allocateArray(int initialCapacity) {
        return new byte[initialCapacity];
    }

    @Override
    protected byte _getByte(int index) {
        return HeapByteBufUtil.getByte(array, index);
    }

    // Low-level hook that frees the buffer's storage
    @Override
    protected void deallocate() {
        // freeArray is a no-op here
       freeArray(array);
        // Point the field at an empty array ({})
        array = EmptyArrays.EMPTY_BYTES;
    }
}
final class HeapByteBufUtil {

    static byte getByte(byte[] memory, int index) {
        return memory[index];
    }
}

4.2 UnpooledDirectByteBuf

public class UnpooledDirectByteBuf extends AbstractReferenceCountedByteBuf {

      private ByteBuffer buffer;

      public UnpooledDirectByteBuf(ByteBufAllocator alloc, int initialCapacity, int maxCapacity) {
        super(maxCapacity);

        this.alloc = alloc;
        setByteBuffer(allocateDirect(initialCapacity));
    }

    protected ByteBuffer allocateDirect(int initialCapacity) {
        return ByteBuffer.allocateDirect(initialCapacity);
    }

    @Override
    protected byte _getByte(int index) {
        // buffer is a DirectByteBuffer; its get is implemented with Unsafe (native)
        return buffer.get(index);
    }
}

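5. PooledByteBufAllocator

A PooledByteBufAllocator holds two arrays of PoolArena, one for heap memory and one for direct memory. Each PoolArena is a pooled region of buffer space.
The allocator binds each thread to a heap arena and a direct arena and keeps them in a thread-local PoolThreadCache. newHeapBuffer/newDirectBuffer fetch that cache and ask its PoolArena to allocate the buffer (PooledDirectByteBuf/PooledHeapByteBuf).
The structure of the PoolArena pool and its allocation algorithm are a separate layer, left for a later, more detailed note.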

public class PooledByteBufAllocator extends AbstractByteBufAllocator implements ByteBufAllocatorMetricProvider {
  
  private final PoolThreadLocalCache threadCache = new PoolThreadLocalCache(useCacheForAllThreads);
 @Override
    protected ByteBuf newHeapBuffer(int initialCapacity, int maxCapacity) {
        // threadCache is essentially a ThreadLocal: each thread has its own pool cache
        PoolThreadCache cache = threadCache.get();

        // The heap arena (HeapArena) bound to this thread
        PoolArena<byte[]> heapArena = cache.heapArena;

        final ByteBuf buf;
        if (heapArena != null) {
            // An arena exists: allocate the new buf from the pool
            buf = heapArena.allocate(cache, initialCapacity, maxCapacity);
        } else {
            // Otherwise fall back to an unpooled buf
            buf = PlatformDependent.hasUnsafe() ?
                    new UnpooledUnsafeHeapByteBuf(this, initialCapacity, maxCapacity) :
                    new UnpooledHeapByteBuf(this, initialCapacity, maxCapacity);
        }

        return toLeakAwareBuffer(buf);
    }

    @Override
    protected ByteBuf newDirectBuffer(int initialCapacity, int maxCapacity) {
        PoolThreadCache cache = threadCache.get();

        // The off-heap arena (DirectArena) bound to this thread
        PoolArena<ByteBuffer> directArena = cache.directArena;

        final ByteBuf buf;
        if (directArena != null) {
            buf = directArena.allocate(cache, initialCapacity, maxCapacity);
        } else {
            buf = PlatformDependent.hasUnsafe() ?
                    UnsafeByteBufUtil.newUnsafeDirectByteBuf(this, initialCapacity, maxCapacity) :
                    new UnpooledDirectByteBuf(this, initialCapacity, maxCapacity);
        }

        return toLeakAwareBuffer(buf);
    }
}

6. PooledHeapByteBuf and PooledDirectByteBuf

Allocation takes buffer space from the PoolArena; release returns it to the pool for later reuse.

//io.netty.buffer.PoolArena#allocate(io.netty.buffer.PoolThreadCache, int, int)   
 PooledByteBuf<T> allocate(PoolThreadCache cache, int reqCapacity, int maxCapacity) {
        // Take a reusable PooledByteBuf object from the current thread's recycler stack
        PooledByteBuf<T> buf = newByteBuf(maxCapacity);

        // Allocate memory of the requested size for it
        allocate(cache, buf, reqCapacity);
        return buf;
    }
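The object reuse hinted at in the first comment of allocate() can be modeled with a thread-local stack. The sketch below is a deliberately simplified illustration and not Netty's Recycler or PoolArena: `RecyclerSketch`, `PooledObject` and `allocate` are hypothetical names, and only the wrapper object is recycled, not any memory.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RecyclerSketch {

    // One stack of recycled objects per thread, so no locking is needed.
    private static final ThreadLocal<Deque<PooledObject>> POOL =
            ThreadLocal.withInitial(ArrayDeque::new);

    static final class PooledObject {
        int capacity;

        // Returns the object to the current thread's stack instead of discarding it.
        void release() {
            POOL.get().push(this);
        }
    }

    // Reuses a recycled object when one is available, otherwise creates a new one.
    static PooledObject allocate(int capacity) {
        PooledObject obj = POOL.get().poll(); // null when the stack is empty
        if (obj == null) {
            obj = new PooledObject();
        }
        obj.capacity = capacity; // re-initialize for the new request
        return obj;
    }

    // Allocate, release, allocate again: the second call gets the first object back.
    static boolean run() {
        PooledObject first = allocate(16);
        first.release();
        PooledObject second = allocate(32);
        return first == second && second.capacity == 32;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints true
    }
}
```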

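A full picture will take more time and careful study. The next chapter is planned to cover the pool's allocation and recycling code.

For now, a small test to get a feel for it: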

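import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;

// The wrapper class name is arbitrary; it is added only so the snippet compiles
public class PooledReuseTest {
    public static void main(String[] args) {

        // Direct buffers are preferred by default; setting this to true selects heap buffers
        System.setProperty("io.netty.noPreferDirect", "true");

        // The default is PooledByteBufAllocator
        ByteBufAllocator byalloc = ByteBufAllocator.DEFAULT;

        // Allocate a 16-byte buffer
        ByteBuf buffer = byalloc.buffer(16);

        // Release the buffer; the pooled heap/direct implementations park the object in a thread-local so the next allocation can reuse it
        buffer.release();

        // Allocate another buffer, this time 32 bytes
        ByteBuf buffer1 = byalloc.buffer(32);

        // Printed true in the author's run: both calls returned the same object
        System.out.println(buffer == buffer1);
    }
}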