Zero-copy explained: an analysis via sendfile

Translated article, December 2, 2013, 11:15:09

To understand the impact of sendfile, it is important to understand the common data path for transfer of data from file to socket:

  1. The operating system reads data from the disk into pagecache in kernel space
  2. The application reads the data from kernel space into a user-space buffer
  3. The application writes the data back into kernel space into a socket buffer
  4. The operating system copies the data from the socket buffer to the NIC buffer where it is sent over the network

This is clearly inefficient: there are four copies and two system calls. With sendfile, this re-copying is avoided by allowing the OS to send the data from the pagecache to the network directly, so in this optimized path only the final copy to the NIC buffer is needed.
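
The conventional path in steps 1 to 4 corresponds roughly to a read(2)/write(2) loop. The C sketch below is only an illustration: the helper name `copy_read_write` and the 64 KiB buffer size are assumptions, not something taken from the quoted text.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in_fd to out_fd the
 * conventional way. Each chunk is copied kernel -> user by read(2)
 * and user -> kernel by write(2). Returns bytes copied, or -1. */
ssize_t copy_read_write(int in_fd, int out_fd)
{
    char buf[64 * 1024];   /* user-space bounce buffer (size is arbitrary) */
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);   /* step 2 */
        if (n == 0)
            return total;                           /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        ssize_t off = 0;
        while (off < n) {                           /* step 3 */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

Steps 1 and 4 happen inside the kernel and are not visible in this code; the user-space buffer `buf` is the part that sendfile makes unnecessary.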

SENDFILE(2)                Linux Programmer's Manual               SENDFILE(2)

NAME

       sendfile - transfer data between file descriptors

SYNOPSIS

       #include <sys/sendfile.h>

       ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

DESCRIPTION

       sendfile() copies data between one file descriptor and another.
       Because this copying is done within the kernel, sendfile() is more
       efficient than the combination of read(2) and write(2), which would
       require transferring data to and from user space.

       in_fd should be a file descriptor opened for reading and out_fd
       should be a descriptor opened for writing.

       If offset is not NULL, then it points to a variable holding the file
       offset from which sendfile() will start reading data from in_fd.
       When sendfile() returns, this variable will be set to the offset of
       the byte following the last byte that was read.  If offset is not
       NULL, then sendfile() does not modify the current file offset of
       in_fd; otherwise the current file offset is adjusted to reflect the
       number of bytes read from in_fd.

       If offset is NULL, then data will be read from in_fd starting at the
       current file offset, and the file offset will be updated by the call.

       count is the number of bytes to copy between the file descriptors.
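
The offset rules above can be checked with a small C sketch. The struct `range_result` and the helper `send_range` are hypothetical names introduced here for illustration. The helper passes a non-NULL offset and then reports both the value the kernel stored back and the descriptor's own file position.

```c
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result type for this illustration. */
struct range_result {
    ssize_t sent;         /* return value of sendfile() */
    off_t   next_offset;  /* value left in the offset variable */
    off_t   fd_position;  /* in_fd's own file offset after the call */
};

/* Send `count` bytes of the file at `path`, starting at `start`,
 * to out_fd with a single sendfile() call and a non-NULL offset. */
struct range_result send_range(int out_fd, const char *path,
                               off_t start, size_t count)
{
    struct range_result r = { -1, start, -1 };

    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return r;

    off_t offset = start;
    r.sent = sendfile(out_fd, in_fd, &offset, count);
    r.next_offset = offset;                     /* byte after the last one read */
    r.fd_position = lseek(in_fd, 0, SEEK_CUR);  /* stays 0 with a non-NULL offset */

    close(in_fd);
    return r;
}
```

For a 10-byte file, `send_range(out_fd, path, 4, 3)` should report 3 bytes sent, a next offset of 7, and an unchanged file position of 0, which is the behaviour the paragraphs above describe.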

       The in_fd argument must correspond to a file which supports
       mmap(2)-like operations (i.e., it cannot be a socket).

       In Linux kernels before 2.6.33, out_fd must refer to a socket.  Since
       Linux 2.6.33 it can be any file.  If it is a regular file, then
       sendfile() changes the file offset appropriately.

RETURN VALUE

       If the transfer was successful, the number of bytes written to out_fd
       is returned.  On error, -1 is returned, and errno is set
       appropriately.
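
Because the return value is the number of bytes actually written, a caller that wants to send a whole file generally has to loop until everything has gone out. A minimal C sketch follows, assuming a blocking out_fd. The helper name `send_path` is hypothetical, and reporting an early end of file as EIO is a choice made for this example only.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send the whole file at `path` to out_fd.
 * Returns the number of bytes sent, or -1 with errno set. */
ssize_t send_path(int out_fd, const char *path)
{
    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;

    struct stat st;
    off_t offset = 0;
    int saved = 0;

    if (fstat(in_fd, &st) < 0) {
        saved = errno;
    } else {
        while (offset < st.st_size) {
            ssize_t n = sendfile(out_fd, in_fd, &offset,
                                 (size_t)(st.st_size - offset));
            if (n > 0)
                continue;              /* kernel already advanced offset */
            if (n < 0 && errno == EINTR)
                continue;              /* interrupted, try again */
            /* n == 0 means the file ended early; EIO is this sketch's choice */
            saved = (n < 0) ? errno : EIO;
            break;
        }
    }

    close(in_fd);
    if (saved) {
        errno = saved;
        return -1;
    }
    return (ssize_t)offset;
}
```

With a nonblocking out_fd the loop would stop with EAGAIN, and the caller would need to wait for writability and resume from the saved offset.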

ERRORS

       EAGAIN Nonblocking I/O has been selected using O_NONBLOCK and the
              write would block.

       EBADF  The input file was not opened for reading or the output file
              was not opened for writing.

       EFAULT Bad address.

       EINVAL Descriptor is not valid or locked, or an mmap(2)-like
              operation is not available for in_fd.

       EIO    Unspecified error while reading from in_fd.

       ENOMEM Insufficient memory to read from in_fd.

VERSIONS

       sendfile() is a new feature in Linux 2.2.  The include file
       <sys/sendfile.h> is present since glibc 2.1.

CONFORMING TO

       Not specified in POSIX.1-2001, or other standards.

       Other UNIX systems implement sendfile() with different semantics and
       prototypes.  It should not be used in portable programs.

NOTES

       If you plan to use sendfile() for sending files to a TCP socket, but
       need to send some header data in front of the file contents, you will
       find it useful to employ the TCP_CORK option, described in tcp(7), to
       minimize the number of packets and to tune performance.
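
The TCP_CORK advice might look like the following C sketch. The helper name `send_header_and_file` is hypothetical, the socket is assumed to be a connected, blocking TCP socket, and error handling is kept deliberately small.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: send the string `hdr` followed by the file at
 * `path` on TCP socket `sock`, corked so that the header and the start
 * of the file can share packets. Returns 0 on success, -1 on failure. */
int send_header_and_file(int sock, const char *hdr, const char *path)
{
    struct stat st;
    size_t hdr_len = strlen(hdr), done = 0;
    off_t pos = 0;
    int on = 1, off = 0, rc = -1;

    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;
    if (fstat(in_fd, &st) < 0)
        goto out;
    if (setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof on) < 0)
        goto out;                      /* cork: hold back partial frames */

    while (done < hdr_len) {           /* header goes through write(2) */
        ssize_t w = write(sock, hdr + done, hdr_len - done);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            goto uncork;
        }
        done += (size_t)w;
    }

    while (pos < st.st_size) {         /* body goes through sendfile(2) */
        ssize_t n = sendfile(sock, in_fd, &pos, (size_t)(st.st_size - pos));
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            goto uncork;
    }
    rc = 0;

uncork:
    /* uncork: flush whatever is still queued */
    setsockopt(sock, IPPROTO_TCP, TCP_CORK, &off, sizeof off);
out:
    close(in_fd);
    return rc;
}
```

Clearing the option at the end matters: while the socket is corked, the kernel may hold back a final partial frame.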

       In Linux 2.4 and earlier, out_fd could also refer to a regular file,
       and sendfile() changed the current offset of that file.

       The original Linux sendfile() system call was not designed to handle
       large file offsets.  Consequently, Linux 2.4 added sendfile64(), with
       a wider type for the offset argument.  The glibc sendfile() wrapper
       function transparently deals with the kernel differences.

       Applications may wish to fall back to read(2)/write(2) in the case
       where sendfile() fails with EINVAL or ENOSYS.
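
One way to arrange such a fallback is sketched below in C. The helper name `copy_with_fallback` and the 16 KiB buffer are assumptions for illustration. The sketch only falls back when the very first sendfile() call fails with EINVAL or ENOSYS, so no data has been transferred yet at that point.

```c
#include <errno.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy up to `count` bytes from in_fd to out_fd,
 * trying sendfile() first and falling back to read(2)/write(2) if the
 * kernel rejects the descriptor pair. Returns bytes copied, or -1. */
ssize_t copy_with_fallback(int out_fd, int in_fd, size_t count)
{
    ssize_t n = sendfile(out_fd, in_fd, NULL, count);
    if (n >= 0 || (errno != EINVAL && errno != ENOSYS))
        return n;   /* success (possibly partial) or an unrelated error */

    /* Fallback path: bounce the data through a user-space buffer. */
    char buf[16 * 1024];
    size_t left = count;
    ssize_t total = 0;

    while (left > 0) {
        size_t want = left < sizeof buf ? left : sizeof buf;
        ssize_t r = read(in_fd, buf, want);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            break;  /* end of input */

        ssize_t off = 0;
        while (off < r) {
            ssize_t w = write(out_fd, buf + off, (size_t)(r - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += r;
        left -= (size_t)r;
    }
    return total;
}
```

A pipe as in_fd is a simple way to exercise the fallback, since a pipe does not support mmap(2)-like operations.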

       The Linux-specific splice(2) call supports transferring data between
       arbitrary files (e.g., a pair of sockets).

SEE ALSO

       mmap(2), open(2), socket(2), splice(2)

COLOPHON

       This page is part of release 3.54 of the Linux man-pages project.  A
       description of the project, and information about reporting bugs, can
       be found at http://www.kernel.org/doc/man-pages/.

