Kafka: Zero Copy in Kafka


  Kafka uses zero copy when transferring data, and this technique greatly increases Kafka's throughput. Let's look at how zero copy is implemented in Kafka.

Traditional data transfer

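   Many web applications serve a large amount of static content, which amounts to reading data from disk and writing exactly the same data back to the response socket. This may seem to require relatively little CPU work, but it is inefficient: the kernel reads the data from disk and pushes it across the kernel-user boundary to the application, and the application then pushes it back across the kernel-user boundary so it can be written to the socket. In effect, the application is an inefficient intermediary that moves data from the disk file to the socket.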

[Figure: traditional data transfer]

 The code looks like this:

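File.read(fileDesc, buf, len);    // kernel -> user: copy the file data into the user buffer buf
Socket.send(socket, buf, len);    // user -> kernel: copy buf into the socket buffer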

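  This approach needs four context switches between user mode and kernel mode (entering and leaving read, then entering and leaving send), and the data is copied four times before the operation completes: (1) DMA from the disk into a kernel read buffer (the page cache), (2) a CPU copy from that buffer into the user buffer buf, (3) a CPU copy from buf into the kernel socket buffer, and (4) DMA from the socket buffer to the network card. The figure above shows how the data moves from the file to the socket.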
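For comparison, here is a minimal Java sketch of the traditional approach, assuming an ordinary InputStream/OutputStream pair. The helper name copyThroughUserSpace and the 8 KB buffer size are illustrative choices, not taken from Kafka. Every chunk is read into the heap array buf and then written out again, which is the kernel-to-user and user-to-kernel round trip described above; with a real socket, `out` would be socket.getOutputStream().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Traditional copy: every chunk passes through the user-space array `buf`,
// i.e. kernel -> user on read() and user -> kernel on write().
final class TraditionalCopy {

    // Hypothetical helper: copies the whole file to `out` through a heap buffer.
    static long copyThroughUserSpace(Path file, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];              // user-space buffer (the "buf" of the pseudocode)
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {    // like File.read(fileDesc, buf, len)
                out.write(buf, 0, n);             // like Socket.send(socket, buf, len)
                total += n;
            }
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment", ".log");
        Files.write(file, "hello kafka".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for socket.getOutputStream()
        long copied = copyThroughUserSpace(file, sink);
        System.out.println(copied + " bytes: " + sink.toString("UTF-8"));
        Files.delete(file);
    }
}
```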


Zero copy in Kafka

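    An application that uses zero copy asks the kernel to copy data directly from the disk file to the socket, without passing it through the application. This greatly improves performance and cuts the context switches between kernel and user mode from four to two. Only two data copies remain, both performed by DMA (disk to page cache, page cache to network card), provided the network card supports gather operations; otherwise one extra CPU copy from the page cache into the socket buffer is still made inside the kernel.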

[Figure: data transfer with zero copy]


In Java, this is done with the transferTo method of java.nio.channels.FileChannel. Let's look at how it is implemented.

FileChannel is an abstract class, and transferTo is declared in it as an abstract method:

    /**
     * Transfers bytes from this channel's file to the given writable byte
     * channel.
     *
     * <p> An attempt is made to read up to <tt>count</tt> bytes starting at
     * the given <tt>position</tt> in this channel's file and write them to the
     * target channel.  An invocation of this method may or may not transfer
     * all of the requested bytes; whether or not it does so depends upon the
     * natures and states of the channels.  Fewer than the requested number of
     * bytes are transferred if this channel's file contains fewer than
     * <tt>count</tt> bytes starting at the given <tt>position</tt>, or if the
     * target channel is non-blocking and it has fewer than <tt>count</tt>
     * bytes free in its output buffer.
     *
     * <p> This method does not modify this channel's position.  If the given
     * position is greater than the file's current size then no bytes are
     * transferred.  If the target channel has a position then bytes are
     * written starting at that position and then the position is incremented
     * by the number of bytes written.
     *
     * <p> This method is potentially much more efficient than a simple loop
     * that reads from this channel and writes to the target channel.  Many
     * operating systems can transfer bytes directly from the filesystem cache
     * to the target channel without actually copying them.  </p>
     *
     * @param  position
     *         The position within the file at which the transfer is to begin;
     *         must be non-negative
     *
     * @param  count
     *         The maximum number of bytes to be transferred; must be
     *         non-negative
     *
     * @param  target
     *         The target channel
     *
     * @return  The number of bytes, possibly zero,
     *          that were actually transferred
     *
     * @throws IllegalArgumentException
     *         If the preconditions on the parameters do not hold
     *
     * @throws  NonReadableChannelException
     *          If this channel was not opened for reading
     *
     * @throws  NonWritableChannelException
     *          If the target channel was not opened for writing
     *
     * @throws  ClosedChannelException
     *          If either this channel or the target channel is closed
     *
     * @throws  AsynchronousCloseException
     *          If another thread closes either channel
     *          while the transfer is in progress
     *
     * @throws  ClosedByInterruptException
     *          If another thread interrupts the current thread while the
     *          transfer is in progress, thereby closing both channels and
     *          setting the current thread's interrupt status
     *
     * @throws  IOException
     *          If some other I/O error occurs
     */
    public abstract long transferTo(long position, long count,
                                    WritableByteChannel target)
        throws IOException;
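Because the Javadoc above warns that a single call may transfer fewer bytes than requested, callers usually loop. The following is a minimal sketch under that assumption; transferFully is a hypothetical helper name, and in a real server the target would be a SocketChannel rather than the in-memory channel used in main.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class TransferToLoop {

    // Hypothetical helper: sends bytes [position, position + count) of the file to target.
    // transferTo may move fewer bytes than requested, so keep calling it until
    // everything is sent, the end of the file is reached, or the target stops accepting.
    static long transferFully(FileChannel file, long position, long count,
                              WritableByteChannel target) throws IOException {
        long sent = 0;
        while (sent < count) {
            long n = file.transferTo(position + sent, count - sent, target);
            if (n <= 0) {
                break; // end of file, or a non-blocking target that is currently full
            }
            sent += n;
        }
        return sent;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("segment", ".log");
        Files.write(path, "0123456789".getBytes(StandardCharsets.US_ASCII));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ);
             WritableByteChannel target = Channels.newChannel(sink)) { // a SocketChannel in real use
            long n = transferFully(ch, 2, 5, target);
            System.out.println(n + " " + sink.toString("US-ASCII")); // prints "5 23456"
        }
        Files.delete(path);
    }
}
```

With a non-blocking SocketChannel, a return value of 0 means the socket's send buffer is full; the caller would then wait until its selector reports the channel as writable and call transferFully again for the remaining range.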


Next, look at the concrete implementation class sun.nio.ch.FileChannelImpl (in IntelliJ IDEA, Ctrl + Alt + B jumps to the implementation):

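    public long transferTo(long position, long count, WritableByteChannel target) throws IOException {
        this.ensureOpen();
        if (!target.isOpen()) {
            throw new ClosedChannelException();
        } else if (!this.readable) {
            throw new NonReadableChannelException();
        } else if (target instanceof FileChannelImpl && !((FileChannelImpl) target).writable) {
            throw new NonWritableChannelException();
        } else if (position >= 0L && count >= 0L) {
            long sz = this.size();
            if (position > sz) {
                return 0L;                  // starting past the end of the file: nothing to send
            } else {
                // one call moves at most Integer.MAX_VALUE bytes, and never past EOF
                int icount = (int) Math.min(count, Integer.MAX_VALUE);
                if (sz - position < (long) icount) {
                    icount = (int) (sz - position);
                }

                long n;
                // 1. direct in-kernel transfer (sendfile on Linux), if supported for this target
                if ((n = this.transferToDirectly(position, icount, target)) >= 0L) {
                    return n;
                }
                // 2. memory-mapped transfer, only for trusted (JDK file/socket) channel types
                if ((n = this.transferToTrustedChannel(position, (long) icount, target)) >= 0L) {
                    return n;
                }
                // 3. slow path: read into a temporary buffer and write it to the target
                return this.transferToArbitraryChannel(position, icount, target);
            }
        } else {
            throw new IllegalArgumentException();
        }
    }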
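The bounds handling in transferTo (reject negative arguments, return 0 when the position is past the end of the file, and cap the count at the bytes remaining and at Integer.MAX_VALUE) can be observed from outside. The sketch below is a small demo with hypothetical class and helper names, run against a 100-byte temporary file. Because the target is a plain in-memory channel, the JDK will not use sendfile here; the point is only the argument handling, which the Javadoc guarantees.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class TransferToBounds {

    // Hypothetical helper: returns what a single transferTo call reports for these arguments.
    static long transferOnce(FileChannel ch, long position, long count) throws IOException {
        WritableByteChannel sink = Channels.newChannel(new ByteArrayOutputStream());
        return ch.transferTo(position, count, sink);
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("segment", ".log");
        Files.write(path, new byte[100]);                     // a 100-byte file
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            // a huge count: at most size - position = 60 bytes can move
            System.out.println(transferOnce(ch, 40, Long.MAX_VALUE));
            // a position beyond the end of the file: nothing is transferred, 0 is returned
            System.out.println(transferOnce(ch, 500, 10));
            try {
                transferOnce(ch, -1, 10);                     // a negative position is rejected
            } catch (IllegalArgumentException e) {
                System.out.println("IllegalArgumentException");
            }
        }
        Files.delete(path);
    }
}
```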

Following the call chain further, we finally reach this native method in sun.nio.ch.FileChannelImpl:

private native long transferTo0(int var1, long var2, long var4, int var6);


sendfile

On Linux, the native code ultimately calls the sendfile system call:

ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count)
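  • in_fd: a file descriptor opened for reading (the source file).
  • out_fd: a file descriptor opened for writing (for example, a socket).
  • offset: if not NULL, points to the position in in_fd at which reading starts; sendfile then leaves in_fd's own file offset unchanged and updates *offset to just past the last byte read. If NULL, reading starts at in_fd's current file offset, which is advanced.
  • count: the number of bytes to "move" between the two descriptors.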
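sendfile's offset argument and FileChannel.transferTo's position parameter play the same role. With a non-NULL offset, sendfile leaves in_fd's own file offset alone, and the transferTo Javadoc likewise says the method "does not modify this channel's position". The hypothetical demo below (the class and the sendRange helper are illustrative names) checks this from Java: it sets the channel position to 3, transfers 3 bytes starting at position 5, and reads the position back.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class TransferToPosition {

    // Hypothetical helper: transfers `count` bytes starting at `offset` (like sendfile's
    // *offset) and returns them as ASCII text; the channel's own position is not used.
    static String sendRange(FileChannel ch, long offset, long count) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        WritableByteChannel target = Channels.newChannel(sink);
        long done = 0;
        while (done < count) {
            long n = ch.transferTo(offset + done, count - done, target);
            if (n <= 0) {
                break;                                   // reached the end of the file
            }
            done += n;
        }
        return sink.toString("US-ASCII");
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("segment", ".log");
        Files.write(path, "abcdefghij".getBytes(StandardCharsets.US_ASCII));
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            ch.position(3);                              // the channel's own file offset
            String sent = sendRange(ch, 5, 3);           // reads from offset 5, not from 3
            System.out.println(sent + " position=" + ch.position()); // prints "fgh position=3"
        }
        Files.delete(path);
    }
}
```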

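  The sendfile system call achieves zero copy by never moving the data into user space, so the copies that accompany the user/kernel context switches disappear. The I/O itself is performed with direct memory access (DMA), and when the network card supports gather operations even the copy between kernel buffers (page cache to socket buffer) is avoided.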


Summary and analysis

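With zero copy, the data in a disk file is copied into the page cache only once and is then sent directly from the page cache to the network. When the same data goes to different subscribers, they can all be served from the same page cache, so no copy is repeated.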

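With 10 consumers, the traditional approach makes 4 × 10 = 40 copies, whereas zero copy needs only 1 + 10 = 11: one copy from disk into the page cache, plus one copy for each of the 10 consumers reading from the page cache.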
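The fan-out argument can be sketched in Java as well. This is an illustrative sketch, not Kafka's broker code: a hypothetical sendToAll helper serves the same byte range of one segment file to several consumer channels with one transferTo loop per consumer, and two small helpers reproduce the copy-count arithmetic above (4 copies per consumer, versus 1 shared disk-to-page-cache copy plus 1 per consumer). The in-memory consumers in main only demonstrate the data flow; the page-cache reuse applies when the consumers are sockets.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

final class FanOutToConsumers {

    // Copy counts from the text: 4 copies per consumer with read + send,
    // versus 1 disk -> page cache copy shared by everyone plus 1 per consumer.
    static int traditionalCopies(int consumers) { return 4 * consumers; }
    static int zeroCopyCopies(int consumers)    { return 1 + consumers; }

    // Hypothetical sketch: send the same byte range of one segment file to every consumer.
    // The application allocates no buffer of its own; with socket targets the JDK can use
    // sendfile, and the kernel can serve each call from the page cache filled by the first read.
    static void sendToAll(FileChannel segment, long position, long count,
                          List<? extends WritableByteChannel> consumers) throws IOException {
        for (WritableByteChannel consumer : consumers) {
            long sent = 0;
            while (sent < count) {
                long n = segment.transferTo(position + sent, count - sent, consumer);
                if (n <= 0) {
                    break; // end of file, or a non-blocking consumer that is currently full
                }
                sent += n;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(traditionalCopies(10) + " vs " + zeroCopyCopies(10)); // prints "40 vs 11"

        Path path = Files.createTempFile("segment", ".log");
        Files.write(path, new byte[4096]);
        List<ByteArrayOutputStream> sinks = new ArrayList<>();
        List<WritableByteChannel> consumers = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for a consumer socket
            sinks.add(sink);
            consumers.add(Channels.newChannel(sink));
        }
        try (FileChannel segment = FileChannel.open(path, StandardOpenOption.READ)) {
            sendToAll(segment, 0, segment.size(), consumers);
        }
        for (ByteArrayOutputStream sink : sinks) {
            System.out.println(sink.size()); // 4096 bytes for each consumer
        }
        Files.delete(path);
    }
}
```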

