Memory Mapping and Zero-Copy in NIO, and Zero-Copy in Netty

筏镜

Category: nio. Tags: nio, zero-copy, netty, mmap, sendfile

Article link: https://blog.csdn.net/fajing_feiyue/article/details/106893591

Traditional I/O

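[Figure: flow of reading data and sending it with traditional I/O.]

[Figure: the context switches that occur while traditional I/O reads and sends data, and the copies that correspond to them.]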

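1. The read() system call causes a context switch from user mode to kernel mode. Inside the kernel, sys_read() (or an equivalent) reads from the file, and the DMA engine copies the file contents from disk into a kernel-space buffer. This is the first copy.
2. The data is copied from the kernel-space buffer into the user-space buffer (the second copy), and read() returns, switching the context from kernel mode back to user mode. The requested data now sits in the user-space buffer that was passed in (the tmp_buf parameter).

3. The write() call causes a context switch from user mode to kernel mode. The third copy moves the data from user space back into a kernel-space buffer. This time it is a different buffer, one associated with the destination socket.
4. The write() system call returns, causing the fourth context switch. The fourth copy happens when the DMA engine passes the data from the kernel-space buffer to the protocol engine; this runs independently of, and asynchronously to, our code.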
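
The four steps above correspond to an ordinary read/write loop. The following is a minimal sketch of that loop, not code from the original article: the class name TraditionalCopy and the method copy are made up for illustration, and a ByteArrayOutputStream stands in for a socket's output stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of the traditional read()/write() path.
class TraditionalCopy {

    /** Copies everything from in to out through a user-space buffer and returns the byte count. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] tmpBuf = new byte[8192]; // the user-space buffer (tmp_buf in the steps above)
        long total = 0;
        int n;
        // read(): kernel buffer -> tmpBuf, with a switch into the kernel and back
        while ((n = in.read(tmpBuf)) != -1) {
            // write(): tmpBuf -> kernel buffer of the destination, again via the kernel
            out.write(tmpBuf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("traditional-demo", ".bin");
        try {
            Files.write(src, new byte[20000]);
            ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for a socket stream
            try (InputStream in = Files.newInputStream(src)) {
                System.out.println("copied bytes: " + copy(in, sink));
            }
        } finally {
            Files.delete(src);
        }
    }
}
```

Every pass through the loop pays for two copies that the CPU performs (kernel to user, user to kernel), which is exactly the overhead that sendfile and mmap try to remove.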

sendfile

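The sendfile system call was introduced in the 2.1 kernel series (the first stable kernel to ship it was Linux 2.2) to simplify transferring data over the network and between two local files. It reduces not only the number of data copies but also the number of context switches. In Java it is reached through FileChannel.transferTo, which moves data directly from the kernel read buffer to the socket buffer and so avoids copying data between user space and kernel space. FileChannel's write and read methods are both thread-safe; internally the class uses the lock private final Object positionLock = new Object(); to control concurrent access.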

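[Figure: flow of the sendfile() system call on Linux kernels 2.1 through 2.4.]

The transferTo call makes the DMA engine copy the file contents into the kernel read buffer (the first copy); the kernel then copies the data from that buffer into the buffer associated with the socket (the second copy).
The DMA engine transfers the data from the kernel socket buffer to the protocol engine (the third copy).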

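In kernel 2.4 the socket buffer descriptor structure was changed to support gather operations, and this is what Linux calls "zero copy". It requires a network interface that supports gather operations. The approach not only reduces the number of context switches but also eliminates the redundant copy of the data. Nothing changes from the point of view of a user-level application; the code still looks like this: sendfile(socket, file, len);

[Figure: flow of sendfile() on Linux 2.4 and later.]

[Figure: execution steps of sendfile() on Linux 2.4 and later, with the corresponding context switches.]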

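1. The sendfile() call makes the DMA engine copy the file contents into the kernel buffer.
2. No data is copied into the socket buffer; only descriptors holding the location and length of the data are appended to it. The DMA engine then passes the data directly from the kernel buffer to the protocol engine, which eliminates the last remaining copy that needed the CPU.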
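
From Java, the steps above are reached through FileChannel.transferTo with a socket channel as the target. Here is a minimal sketch under stated assumptions: the class SendFileSketch and the method sendFile are hypothetical names, the target is assumed to be a blocking channel, and whether the JVM really issues sendfile depends on the platform. The demo sends a small temporary file over a loopback connection.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration: sending a file to a channel with transferTo.
class SendFileSketch {

    /** Sends the whole file to a blocking target channel and returns the number of bytes sent. */
    static long sendFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = in.size();
            long sent = 0;
            // transferTo may move fewer bytes than requested, so keep going until done
            while (sent < size) {
                sent += in.transferTo(sent, size - sent, target);
            }
            return sent;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sendfile-demo", ".bin");
        Files.write(file, new byte[4096]); // small payload so it fits in the socket buffers
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // any free loopback port
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                long sent = sendFile(file, client);
                ByteBuffer buf = ByteBuffer.allocate(4096);
                long received = 0;
                while (received < sent) {
                    int n = accepted.read(buf);
                    if (n < 0) break; // peer closed early
                    received += n;
                    buf.clear();
                }
                System.out.println("sent=" + sent + " received=" + received);
            }
        } finally {
            Files.delete(file);
        }
    }
}
```

The application never sees the file bytes in a Java buffer on the sending side; it only hands the kernel a position, a length and a target.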

Example code:

//    public static void main(String[] args) throws IOException {
//        long startTime = System.currentTimeMillis();
//       File toFile = new File("C:\\Users\\Administrator\\Desktop\\fileTest\\xcd_buffer.zip");
//        File fromFile = new File("C:\\Users\\Administrator\\Downloads\\xcd.zip");
//        /*  fileCopyWithTransfer(fromFile,toFile);*/
//        bufferedCopy(fromFile, toFile);
//        long endTime = System.currentTimeMillis();
//        System.out.println(endTime - startTime);
//    }


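    /**
     * Copies a file with FileChannel.transferTo (zero copy where the OS supports it).
     * Needs java.io.* and java.nio.channels.FileChannel.
     *
     * @param fromFile source file
     * @param toFile   destination file
     */
    public static void fileCopyWithTransfer(File fromFile, File toFile) {
        try (
             // Channel of the source FileInputStream
             FileChannel fileChannelInput = new FileInputStream(fromFile).getChannel();
             // Channel of the destination FileOutputStream
             FileChannel fileChannelOutput = new FileOutputStream(toFile).getChannel()) {

            // transferTo may move fewer bytes than requested, so loop until everything is written
            long size = fileChannelInput.size();
            long position = 0;
            while (position < size) {
                position += fileChannelInput.transferTo(position, size - position, fileChannelOutput);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }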


    static final int BUFFER_SIZE = 1024;
    /**
     * Copies a file with BufferedInputStream/BufferedOutputStream (baseline for comparison).
     *
     * @param fromFile source file
     * @param toFile   destination file
     */
    public static void bufferedCopy(File fromFile, File toFile) {
        try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(fromFile));
             BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(toFile))) {
            byte[] buf = new byte[BUFFER_SIZE];
            int n;
            // Write only the n bytes actually read; writing the whole array would append stale bytes
            while ((n = bis.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

Note that Java code calls transferTo directly rather than sendfile(socket, file, len);. So let's follow the call path starting from transferTo.

 /* * natures and states of the channels.  Fewer than the requested number of
     * bytes are transferred if this channel's file contains fewer than
     * <tt>count</tt> bytes starting at the given <tt>position</tt>, or if the
     * target channel is non-blocking and it has fewer than <tt>count</tt>
     * bytes free in its output buffer.
     *
     * <p> This method does not modify this channel's position.  If the given
     * position is greater than the file's current size then no bytes are
     * transferred.  If the target channel has a position then bytes are
     * written starting at that position and then the position is incremented
     * by the number of bytes written.
     *
     * <p> This method is potentially much more efficient than a simple loop
     * that reads from this channel and writes to the target channel.  Many
     * operating systems can transfer bytes directly from the filesystem cache
     * to the target channel without actually copying them.  </p>
     *
     */
    public abstract long transferTo(long position, long count,
                                    WritableByteChannel target)
        throws IOException;

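As the Javadoc above says, "Many operating systems can transfer bytes directly from the filesystem cache to the target channel without actually copying them." In other words, zero copy is used only when the operating system supports it; otherwise the transfer falls back to the traditional approach.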
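
Two other statements in that Javadoc are easy to check by hand: transferTo does not modify the source channel's position, and nothing is transferred when the given position is beyond the end of the file. The sketch below probes both; the class TransferToSemantics and the method probe are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of the transferTo contract quoted above.
class TransferToSemantics {

    /** Returns {bytes transferred, source channel position afterwards}. */
    static long[] probe(Path file, long position, long count) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            // An in-memory sink; any WritableByteChannel would do
            WritableByteChannel sink = Channels.newChannel(new ByteArrayOutputStream());
            long n = in.transferTo(position, count, sink);
            return new long[] { n, in.position() };
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("transfer-demo", ".bin");
        try {
            Files.write(file, new byte[100]);
            long[] inside = probe(file, 0, 100);   // starts inside the file
            long[] beyond = probe(file, 200, 10);  // starts past the end of the file
            System.out.println("inside: n=" + inside[0] + " position=" + inside[1]);
            System.out.println("beyond: n=" + beyond[0] + " position=" + beyond[1]);
        } finally {
            Files.delete(file);
        }
    }
}
```

Because the position is passed explicitly and the channel's own position is left alone, callers are expected to track progress themselves and loop when fewer bytes than requested were moved.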

public long transferTo(long var1, long var3, WritableByteChannel var5) throws IOException {
        this.ensureOpen();
     // ... (some code omitted)
     // If the kernel supports it, use direct transfer
                if ((var9 = this.transferToDirectly(var1, var8, var5)) >= 0L) {
                    return var9;
                } else {
                // Otherwise try the mmap-based transfer: this.transferToTrustedChannel(var1, (long)var8, var5)
                    return (var9 = this.transferToTrustedChannel(var1, (long)var8, var5)) >= 0L ? var9 :
                    // Finally fall back to the traditional transfer
                     this.transferToArbitraryChannel(var1, var8, var5);
                }
            }
        } else {
            throw new IllegalArgumentException();
        }
    }

 private long transferToDirectly(long var1, int var3, WritableByteChannel var4) throws IOException {
      // ... (some code omitted)
            if (var5 == null) {
                return -4L;
            } else {
                int var19 = IOUtil.fdVal(this.fd);
                int var7 = IOUtil.fdVal(var5);
                if (var19 == var7) {
                    return -4L;
                } else if (this.nd.transferToDirectlyNeedsPositionLock()) {
                    Object var8 = this.positionLock;
                    synchronized(this.positionLock) {
                        long var9 = this.position();

                        long var11;
                        try {
                        // Perform the actual file transfer
                            var11 = this.transferToDirectlyInternal(var1, var3, var4, var5);
                        } finally {
                            this.position(var9);
                        }

                        return var11;
                    }
                } else {
                 // Perform the actual file transfer
                    return this.transferToDirectlyInternal(var1, var3, var4, var5);
                }
            }
        }
    }

 private long transferToDirectlyInternal(long var1, int var3, WritableByteChannel var4, FileDescriptor var5) throws IOException {
        assert !this.nd.transferToDirectlyNeedsPositionLock() || Thread.holdsLock(this.positionLock);

// ... (some code omitted; note that Java ultimately calls transferTo0)
            do {
                var6 = this.transferTo0(this.fd, var1, (long)var3, var5);
            } while(var6 == -3L && this.isOpen());

           // ...

        return var9;
    }

In the end, transferTo() still delegates to the native method transferTo0(), whose source is in FileChannelImpl.c:

JNIEXPORT jlong JNICALL
Java_sun_nio_ch_FileChannelImpl_transferTo0(JNIEnv *env, jobject this,
                                            jobject srcFDO,
                                            jlong position, jlong count,
                                            jobject dstFDO)
{
    jint srcFD = fdval(env, srcFDO);
    jint dstFD = fdval(env, dstFDO);

#if defined(__linux__)
    off64_t offset = (off64_t)position;
    // Internally this really is the sendfile() system call
    jlong n = sendfile64(dstFD, srcFD, &offset, (size_t)count);
    // ...
    return n;
#elif defined (__solaris__)
    sendfilevec64_t sfv;
    size_t numBytes = 0;
    jlong result;

    sfv.sfv_fd = srcFD;
    sfv.sfv_flag = 0;
    sfv.sfv_off = (off64_t)position;
    sfv.sfv_len = count;
    // Here too it is a sendfile-family call (sendfilev)
    result = sendfilev64(dstFD, &sfv, 1, &numBytes);

    /* Solaris sendfilev() will return -1 even if some bytes have been
     * transferred, so we check numBytes first.
     */
    // ...
    return result;

mmap

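mmap maps a file (more precisely, the kernel buffer that holds its contents) into the address space of the user process. Once the mapping succeeds, changes the user makes to this memory region are directly visible to the kernel, and changes made on the kernel side are directly visible to the user. This removes the copy from the kernel buffer to user space: positions in the file get corresponding addresses in virtual memory, so the file can be manipulated as if it were memory. Reading and writing a file this way saves the copy from the kernel cache to user space, which is where the efficiency gain comes from.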

tmp_buf = mmap(file, len);
write(socket, tmp_buf, len);
[Figure: flow of mmap.]

[Figure: context switches with mmap.]

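1. The mmap system call makes the DMA engine copy the file contents into a kernel buffer, which is then shared with the user process, so no copy between the kernel buffer and a user buffer takes place.

2. The write system call makes the kernel copy the data from that kernel buffer into the kernel buffer associated with the socket.

3. The third copy happens when the DMA engine passes the data from the socket buffer to the protocol engine.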
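
In Java, the mmap-then-write sequence above can be sketched with FileChannel.map followed by a channel write. This is only an illustration under assumptions: the class MmapSend and the method sendMapped are hypothetical names, the file is assumed to fit in a single mapping (at most Integer.MAX_VALUE bytes), and an in-memory channel stands in for the socket.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of tmp_buf = mmap(file, len); write(socket, tmp_buf, len);
class MmapSend {

    /** Maps the whole file read-only and writes the mapped bytes to target. */
    static long sendMapped(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            // map() rejects sizes above Integer.MAX_VALUE, so this only handles files below 2 GB
            MappedByteBuffer buf = in.map(FileChannel.MapMode.READ_ONLY, 0, in.size());
            long written = 0;
            // A single write may not drain the buffer, so loop until it is empty
            while (buf.hasRemaining()) {
                written += target.write(buf);
            }
            return written;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".bin");
        Files.write(file, new byte[5000]);
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for a socket
        System.out.println("written bytes: " + sendMapped(file, Channels.newChannel(sink)));
    }
}
```

With a real SocketChannel as the target, the mapped pages are handed to the write call without first being copied into a Java heap array.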

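The size of the mapping must be specified in advance when using mmap. In Java a single map() call can cover at most Integer.MAX_VALUE bytes, just under 2 GB (the usable figure is commonly quoted as 1.5 to 2 GB); RocketMQ limits each file to 1 GB to sidestep this problem.
Flushing a mapping to disk can be controlled manually with force(), but getting this wrong can cause serious trouble.
Reclaiming a mapping is also a problem: when a MappedByteBuffer is no longer needed, the virtual memory it occupies can be released manually, but doing so is very cumbersome.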
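
One common way to live with the per-mapping size limit is to map a large file in fixed-size windows and call force() on each window. The sketch below shows the idea; the class WindowedMmapWriter, the method fill and the window size are assumptions made for this illustration and are not taken from RocketMQ or from the original article.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration: writing a file through a sequence of small mappings.
class WindowedMmapWriter {

    /** Fills the first total bytes of file with value, mapping at most windowSize bytes at a time. */
    static void fill(Path file, long total, int windowSize, byte value) throws IOException {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            for (long offset = 0; offset < total; offset += windowSize) {
                long len = Math.min(windowSize, total - offset);
                // Each window has its size fixed up front; mapping past the end grows the file
                MappedByteBuffer window = ch.map(FileChannel.MapMode.READ_WRITE, offset, len);
                while (window.hasRemaining()) {
                    window.put(value);
                }
                window.force(); // flush this window's changes to the storage device
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("window-demo", ".bin");
        fill(file, 10000, 4096, (byte) 7);
        System.out.println("file size: " + Files.size(file));
    }
}
```

Calling force() after every window is the conservative choice; flushing less often is faster but leaves more unflushed data at risk if the machine crashes.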

Example code:
Write:

public static void main(String[] args) {
    File file = new File("C:\\Users\\Administrator\\Desktop\\fileTest\\a.txt");
    try (FileChannel fileChannel = new RandomAccessFile(file, "rw").getChannel()) {
        // MappedByteBuffer is the class that exposes mmap; this maps 1.5 KB (growing the file if needed)
        MappedByteBuffer mappedByteBuffer = fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, (int) (1.5 * 1024));
        byte[] data = "你们好123".getBytes("utf-8"); // 12 bytes in UTF-8
        System.out.println(data.length);
        int position = 8;
        // Write the data at the buffer's current position (0), which advances it to 12
        mappedByteBuffer.put(data);
        // Write at a chosen position: slice() creates a new byte buffer whose content is a
        // shared subsequence of this buffer's content, starting at the current position (12),
        // so position 8 in the slice is byte 20 of the file
        MappedByteBuffer subBuffer = (MappedByteBuffer) mappedByteBuffer.slice();
        subBuffer.position(position);
        subBuffer.put(data);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Read:

public static void main(String[] args) {

    File file = new File("C:\\Users\\Administrator\\Desktop\\fileTest\\a.txt");
    try (FileChannel fileChannel = new RandomAccessFile(file, "r").getChannel()) {
        // Map the first 1.5 KB read-only (the file must already be at least that large)
        MappedByteBuffer mappedByteBuffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, (int) (1.5 * 1024));
        // Receives the 12 bytes written by the write example
        byte[] data = new byte[12];

        // The write example placed its second copy at byte 12 + 8 = 20 of the file
        int position = 8 + 12;
        // slice() creates a new byte buffer whose content is a shared subsequence of
        // this buffer's content; here it starts at position 0
        MappedByteBuffer subBuffer = (MappedByteBuffer) mappedByteBuffer.slice();
        subBuffer.position(position);
        subBuffer.get(data);

        System.out.println(new String(data, "utf-8"));

    } catch (Exception e) {
        e.printStackTrace();
    }
}

 public static void fileReadWithMmap(File fileIn,File fileOut) {

        long begin = System.currentTimeMillis();
        byte[] b = new byte[BUFFER_SIZE];
        int len = (int) fileIn.length();
        try ( FileChannel channelIn=new RandomAccessFile(fileIn, "r").getChannel();
              FileChannel channelOut=new RandomAccessFile(fileOut, "rw").getChannel();) {
            // Map all bytes of the file into memory; map() returns a MappedByteBuffer
            MappedByteBuffer mappedByteBufferInt = channelIn.map(FileChannel.MapMode.READ_ONLY, 0, len);
            MappedByteBuffer mappedByteBufferOut =channelOut.map(FileChannel.MapMode.READ_WRITE, 0, len);

            for (int offset = 0; offset < len; offset += BUFFER_SIZE) {
                if (len - offset > BUFFER_SIZE) {
                    mappedByteBufferInt.get(b);
                    mappedByteBufferOut.put(b);
                } else {
                    byte[] bytes = new byte[len - offset];
                    mappedByteBufferInt.get(bytes);
                    mappedByteBufferOut.put(bytes);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        long end = System.currentTimeMillis();
        System.out.println("time is:" + (end - begin));
    }

As shown, the mmap examples call the map() method. Let's look at the source of map():

public MappedByteBuffer map(MapMode var1, long var2, long var4) throws IOException {
        this.ensureOpen();
    // ... (some code omitted)

  

                        if (var4 != 0L) {
                            var12 = (int)(var2 % allocationGranularity);
                            long var34 = var2 - (long)var12;
                            long var15 = var4 + (long)var12;

                            try {
                            // The actual work is done by map0
                                var7 = this.map0(var6, var34, var15);
                            } catch (OutOfMemoryError var30) {
                                System.gc();

                            // ... (some code omitted)

                return (MappedByteBuffer)var10;
            }
        }
    }

JNIEXPORT jlong JNICALL
Java_sun_nio_ch_FileChannelImpl_map0(JNIEnv *env, jobject this,
                                     jint prot, jlong off, jlong len)
{
    // ... (some code omitted)
    // Internally this is indeed implemented with the mmap system call
    mapAddress = mmap64(
        0,                    /* Let OS decide location */
        len,                  /* Number of bytes to map */
        protections,          /* File permissions */
        flags,                /* Changes are shared */
        fd,                   /* File descriptor of mapped file */
        off);                 /* Offset into file */

    if (mapAddress == MAP_FAILED) {
        if (errno == ENOMEM) {
            JNU_ThrowOutOfMemoryError(env, "Map failed");
            return IOS_THROWN;
        }
        return handle(env, -1, "Map failed");
    }

    return ((jlong) (unsigned long) mapAddress);
}

Zero-Copy in Netty: Merging Buffers with CompositeByteBuf

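Two real buffers are logically merged into one CompositeByteBuf. The CompositeByteBuf does not copy their contents; it only references the two original buffers and presents them logically as a single buffer.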

Example code:

    @Test
    public void compositeTest() {
        ByteBuf buffer1 = Unpooled.buffer(3);
        buffer1.writeByte(1);
        ByteBuf buffer2 = Unpooled.buffer(3);
        buffer2.writeByte(4);
        CompositeByteBuf compositeByteBuf = Unpooled.compositeBuffer();
        CompositeByteBuf newBuffer = compositeByteBuf.addComponents(true, buffer1, buffer2);
        System.out.println(newBuffer);
    }

The result shows that components holds the two real buffers that make up the composite.
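
To make the idea of a logical merge concrete without depending on Netty, here is a deliberately tiny plain-Java sketch. It is not Netty's implementation: the class CompositeView and its methods are hypothetical, and it only shows how an index into the merged view can be resolved to one of the underlying arrays without copying them.

```java
// Hypothetical illustration only; Netty's CompositeByteBuf is far more capable.
class CompositeView {
    private final byte[][] parts; // references to the original arrays, never copied
    private final int length;

    CompositeView(byte[]... parts) {
        this.parts = parts;
        int n = 0;
        for (byte[] p : parts) {
            n += p.length;
        }
        this.length = n;
    }

    int length() {
        return length;
    }

    /** Reads the byte at a logical index by locating the component that holds it. */
    byte get(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        int i = index;
        for (byte[] p : parts) {
            if (i < p.length) {
                return p[i];
            }
            i -= p.length; // skip past this component
        }
        throw new AssertionError("unreachable");
    }

    public static void main(String[] args) {
        byte[] first = {1, 2};
        byte[] second = {4};
        CompositeView view = new CompositeView(first, second);
        first[0] = 9; // a change to a component is visible through the view
        System.out.println(view.length() + " " + view.get(0) + " " + view.get(2));
    }
}
```

Because the view stores references, merging costs nothing per byte, at the price of a lookup on every indexed access.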

Zero-Copy in Netty: Splitting a Buffer with slice

Creating a buffer with the slice method involves no copying; the two buffer objects simply share different parts of the same ByteBuf storage.

Example code:

  @Test
    public void sliceTest() {
        ByteBuf buffer1 = Unpooled.wrappedBuffer("你好123".getBytes());
        ByteBuf newBuffer = buffer1.slice(1, 2);
        //ByteBuf newBuffer = buffer1.slice();
        newBuffer.unwrap();
        System.out.println(newBuffer.toString());
    }

The whole buffer is 9 bytes long, while the buffer obtained by slicing is 2 bytes long, and the sliced buffer still points at the original buffer.
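
The same sharing behaviour can be observed with the JDK's own ByteBuffer, which is a convenient way to try it without Netty on the classpath. The sketch below is an analogue, not Netty code: the class SliceSharing and the helper sliceOf are hypothetical names, and java.nio.ByteBuffer stands in for ByteBuf.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only analogue of ByteBuf.slice(index, length).
class SliceSharing {

    /** Returns a view of length bytes starting at index, sharing storage with whole. */
    static ByteBuffer sliceOf(ByteBuffer whole, int index, int length) {
        ByteBuffer dup = whole.duplicate(); // own position and limit, same storage
        dup.limit(index + length);
        dup.position(index);
        return dup.slice();
    }

    public static void main(String[] args) {
        ByteBuffer whole = ByteBuffer.wrap("你好123".getBytes(StandardCharsets.UTF_8));
        ByteBuffer part = sliceOf(whole, 1, 2);
        part.put(0, (byte) 0x7F); // write through the slice
        System.out.println("whole=" + whole.capacity() + " part=" + part.capacity()
                + " shared=" + (whole.get(1) == 0x7F));
    }
}
```

A write through the slice shows up in the original buffer, which is why slices are cheap to create but must not outlive the buffer they point into.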
