socket

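This article covers the difference between AF and PF, socket creation and socket types, raw sockets, socket address structures, and key calls such as bind, connect, accept, and select. It also touches on concurrent connections, performance tuning, and network sniffing, and lists practice exercises such as TCP/UDP servers, a SYN flooder, and a packet sniffer.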


1. AF and PF

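    As macros, the two sets of constants have exactly the same numeric values; they differ only in name:

        AF: Address  Family

        PF: Protocol Family

    Historically there was a small difference in usage across Unix/Linux variants: 4.x BSD used PF_xxxx for the protocol family passed to socket() and AF_xxxx for address families, whereas POSIX uses AF_xxxx everywhere.

    Note: in theory, creating a socket specifies a protocol, so PF_xxxx belongs there, while AF_xxxx belongs where an address is filled in. But man socket only ever shows AF, so using AF consistently is recommended.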
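    A quick way to confirm that the two spellings are interchangeable on a given system is to compare the constants directly. The sketch below is only an illustration; the function name af_pf_match is made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>

/* Returns 1 if the AF_* and PF_* constants checked here are numerically
 * equal on this system, 0 otherwise. (Hypothetical helper name.) */
int af_pf_match(void)
{
    return AF_INET == PF_INET && AF_INET6 == PF_INET6 && AF_UNIX == PF_UNIX;
}

/* Print one pair so the values can be inspected by eye. */
void print_af_pf(void)
{
    assert(af_pf_match());
    printf("AF_INET=%d PF_INET=%d\n", AF_INET, PF_INET);
}
```

    Calling print_af_pf() from main prints the same number twice on Linux, which is why either name works as the first argument of socket().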


2. socket

        int socket(int domain, int type, int protocol);

    creates an endpoint for communication and returns a descriptor.

    The domain argument specifies a communication domain; this selects the protocol family which will be used for communication. These families are defined in <sys/socket.h>. The currently understood formats include:

       Name                Purpose                          Man page
       AF_UNIX, AF_LOCAL   Local communication              unix(7)
       AF_INET             IPv4 Internet protocols          ip(7)
       AF_INET6            IPv6 Internet protocols          ipv6(7)
       AF_IPX              IPX - Novell protocols
       AF_NETLINK          Kernel user interface device     netlink(7)
       AF_X25              ITU-T X.25 / ISO-8208 protocol   x25(7)
       AF_AX25             Amateur radio AX.25 protocol
       AF_ATMPVC           Access to raw ATM PVCs
       AF_APPLETALK        Appletalk                        ddp(7)
       AF_PACKET           Low level packet interface       packet(7)

    The socket has the indicated type, which specifies the communication semantics.  Currently defined types are:

        SOCK_STREAM     Provides sequenced, reliable, two-way, connection-based byte streams. An out-of-band data transmission mechanism may be supported.
        SOCK_DGRAM      Supports datagrams (connectionless, unreliable messages of a fixed maximum length).
        SOCK_SEQPACKET  Provides a sequenced, reliable, two-way connection-based data transmission path for datagrams of fixed maximum length; a consumer is required to read an entire packet with each input system call.
        SOCK_RAW        Provides raw network protocol access.
        SOCK_RDM        Provides a reliable datagram layer that does not guarantee ordering.
        SOCK_PACKET     Obsolete and should not be used in new programs; see packet(7).
    Some socket types may not be implemented by all protocol families; for example, SOCK_SEQPACKET is not implemented for AF_INET.
    Since Linux 2.6.27, the type argument serves a second purpose: in addition to specifying a socket type, it may include the bitwise OR of any of the following values, to modify the behavior of socket():
       SOCK_NONBLOCK   Set the O_NONBLOCK file status flag on the new open file description. Using this flag saves extra calls to fcntl(2) to achieve the same result.
       SOCK_CLOEXEC    Set the close-on-exec (FD_CLOEXEC) flag on the new file descriptor. See the description of the O_CLOEXEC flag in open(2) for reasons why this may be useful. When a process execs a new program, file descriptors that are open in the calling process stay open, except those that have the close-on-exec flag FD_CLOEXEC set.
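
    As a rough illustration of the two flags, the sketch below creates a TCP socket both ways: with the flags OR-ed into type, and with the older socket() plus fcntl() sequence. The helper names are hypothetical; only socket() and fcntl() come from the system.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* One call: non-blocking and close-on-exec from the start (Linux >= 2.6.27).
 * Returns the descriptor, or -1 on failure. */
int make_nonblocking_tcp_socket(void)
{
    return socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);
}

/* Older equivalent: plain socket() followed by fcntl() calls. */
int make_nonblocking_tcp_socket_fcntl(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int fl;

    if (fd == -1)
        return -1;
    fl = fcntl(fd, F_GETFL);
    if (fl == -1 ||
        fcntl(fd, F_SETFL, fl | O_NONBLOCK) == -1 ||  /* file status flag */
        fcntl(fd, F_SETFD, FD_CLOEXEC) == -1) {       /* descriptor flag  */
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 1 if both O_NONBLOCK and FD_CLOEXEC are set on fd, 0 if not,
 * -1 if the flags cannot be read. */
int has_nonblock_and_cloexec(int fd)
{
    int fl, fdfl;

    assert(fd >= 0);
    fl = fcntl(fd, F_GETFL);
    fdfl = fcntl(fd, F_GETFD);
    if (fl == -1 || fdfl == -1)
        return -1;
    return (fl & O_NONBLOCK) && (fdfl & FD_CLOEXEC);
}
```

    Besides saving system calls, the one-call form leaves no window in which another thread could fork and exec before FD_CLOEXEC is set.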

    The protocol specifies a particular protocol to be used with the socket. Normally only a single protocol exists to support a particular socket type within a given protocol family, in which case protocol can be specified as 0. However, it is possible that many protocols may exist, in which case a particular protocol must be specified in this manner. The protocol number to use is specific to the "communication domain" in which communication is to take place; see protocols(5). See getprotoent(3) on how to map protocol name strings to protocol numbers.
    Sockets of type SOCK_STREAM are full-duplex byte streams, similar to pipes (PS: pipes are not full-duplex). They do not preserve record boundaries. A stream socket must be in a connected state before any data may be sent or received on it. A connection to another socket is created with a connect(2) call. Once connected, data may be transferred using read(2) and write(2) calls or some variant of the send(2) and recv(2) calls. When a session has been completed a close(2) may be performed. Out-of-band data may also be transmitted as described in send(2) and received as described in recv(2).

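    Q: What is the difference between read/write and send/recv?

    A: See http://blog.csdn.net/deng529828/article/details/6245254. In short, send/recv take an extra flags argument; with flags set to 0 they behave like write/read (see send(2) and recv(2)).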
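
    To make the read/write versus send/recv question concrete, here is a small sketch on a socketpair. It assumes an AF_UNIX stream pair is an acceptable stand-in for a connected TCP socket; the function name is invented for the example.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Shows that send/recv with flags == 0 act like write/read, and that a flag
 * such as MSG_PEEK has no read() equivalent.
 * Returns 0 if every step behaved as described, -1 otherwise. */
int send_recv_vs_read_write(void)
{
    int sv[2];
    char buf[8];
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    /* write() on one end, recv(MSG_PEEK) on the other: data stays queued. */
    if (write(sv[0], "ping", 4) != 4)
        goto out;
    memset(buf, 0, sizeof(buf));
    if (recv(sv[1], buf, sizeof(buf), MSG_PEEK) != 4 ||
        memcmp(buf, "ping", 4) != 0)
        goto out;

    /* read() still sees the same bytes because the peek consumed nothing. */
    memset(buf, 0, sizeof(buf));
    if (read(sv[1], buf, sizeof(buf)) != 4 || memcmp(buf, "ping", 4) != 0)
        goto out;

    /* send() with flags 0 followed by read(): same effect as write(). */
    if (send(sv[1], "pong", 4, 0) != 4)
        goto out;
    memset(buf, 0, sizeof(buf));
    if (read(sv[0], buf, sizeof(buf)) != 4 || memcmp(buf, "pong", 4) != 0)
        goto out;
    rc = 0;
out:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```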


3. raw socket

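    With a raw socket, new IPv4 protocols can be implemented in user space. Packets sent and received on a raw socket do not include the link-level header, i.e. the hardware addresses.

        raw_socket = socket(AF_INET, SOCK_RAW, int protocol);

        Q: When creating a raw socket, why should domain be AF_INET rather than AF_PACKET?

    A: According to man packet, AF_PACKET is lower level, "at device driver level". So AF_PACKET is only for cases that involve the link layer, such as network sniffing, whereas raw sockets are for the network layer and above, such as forging the source IP address. In addition, with AF_PACKET the type argument controls whether the link-level header is included; see man packet for details.

            Another important difference between packet sockets and raw sockets is that the former do not reassemble IP fragments while the latter do: "Note that packet sockets don't reassemble IP fragments, unlike raw sockets." (man raw)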

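   The most important raw socket option is IP_HDRINCL (IP header included). It determines whether the IP header is generated automatically when the raw socket sends a packet. It is disabled by default. When it is disabled, the kernel builds the IP header for you, which is convenient but means header fields such as the source IP cannot be forged; when it is enabled, you must build the IP header yourself. The option only affects sending: when receiving on a raw socket, the IP header is always included in the packet.

    If protocol is IPPROTO_RAW (255), IP_HDRINCL is enabled implicitly, and the raw socket can only send packets, not receive them.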
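
    A minimal sketch of turning the option on and reading it back follows. Creating a raw socket needs CAP_NET_RAW (usually root), so without it the first function just returns -1. The helper names open_raw_hdrincl and hdrincl_enabled are assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a raw IPv4 socket for `protocol` and enable IP_HDRINCL.
 * Returns the descriptor, or -1 with errno set (EPERM without privileges). */
int open_raw_hdrincl(int protocol)
{
    int on = 1;
    int fd = socket(AF_INET, SOCK_RAW, protocol);

    if (fd == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on)) == -1) {
        int saved = errno;   /* keep the setsockopt error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Report whether IP_HDRINCL is set on fd: 1 = on, 0 = off, -1 = error. */
int hdrincl_enabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    assert(fd >= 0);
    if (getsockopt(fd, IPPROTO_IP, IP_HDRINCL, &val, &len) == -1)
        return -1;
    return val != 0;
}
```

    When run as root, hdrincl_enabled() on a socket created with IPPROTO_RAW should also report 1 without any setsockopt call, which matches the implicit behaviour described above.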

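    man raw contains this sentence: "In Linux 2.2, all IP header fields and options can be set using IP socket options". However, after going through all the options in man ip, it turns out that fields such as the source and destination IP cannot be set that way. man ip also says: "When this flag (IP_HDRINCL) is enabled the values set by IP_OPTIONS, IP_TTL and IP_TOS are ignored". So once IP_HDRINCL is set, you have to fill in almost all of the IP header fields yourself.

    "Almost all", because even with IP_HDRINCL enabled the raw socket still fills in some fields for you: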

              +---------------------------------------------------+
              |IP Header fields modified on sending by IP_HDRINCL |
              +----------------------+----------------------------+
              |IP Checksum           |Always filled in.           |
              +----------------------+----------------------------+
              |Source Address        |Filled in when zero.        |
              +----------------------+----------------------------+
              |Packet Id             |Filled in when zero.        |
              +----------------------+----------------------------+
              |Total Length          |Always filled in.           |
              +----------------------+----------------------------+


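    There is no need to compute the checksum or the total length yourself, which is considerate of raw sockets. But it also means the checksum and total length cannot be forged, which is inconvenient when testing protocol parsing code.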
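
    Given the table above, a header built for an IP_HDRINCL socket can leave several fields zero. The sketch below fills only the fields the kernel does not supply. It uses struct iphdr from <netinet/ip.h> (glibc's copy of the same layout as <linux/ip.h>), and build_ipv4_header is a hypothetical helper, not part of any library.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a minimal 20-byte IPv4 header (no options) into buf.
 * check and tot_len stay zero because the kernel always fills them in;
 * id stays zero so the kernel picks one.
 * Returns the header length in bytes, or 0 on bad input. */
size_t build_ipv4_header(unsigned char *buf, size_t buflen,
                         const char *src, const char *dst, uint8_t protocol)
{
    struct iphdr ip;

    if (buf == NULL || buflen < sizeof(ip))
        return 0;
    memset(&ip, 0, sizeof(ip));
    ip.version = 4;
    ip.ihl = sizeof(ip) / 4;     /* header length in 32-bit words: 5 */
    ip.ttl = 64;
    ip.protocol = protocol;
    /* inet_pton stores the addresses in network byte order. */
    if (inet_pton(AF_INET, src, &ip.saddr) != 1 ||
        inet_pton(AF_INET, dst, &ip.daddr) != 1)
        return 0;
    memcpy(buf, &ip, sizeof(ip)); /* copy avoids alignment assumptions on buf */
    return sizeof(ip);
}
```

    The payload (for example a TCP header) would follow directly after the returned length in the same buffer before the whole thing is passed to sendto().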

    All packets or errors matching protocol are delivered to the raw socket first, and only then handed to the other protocol handlers:

    "When a packet is received, it is passed to any raw sockets which have been bound to its protocol before it is passed to other protocol handlers (e.g., kernel protocol modules)."

    Raw sockets may tap all IP protocols in Linux, even protocols like ICMP or TCP which have a protocol module in the kernel. In this case, the packets are passed to both the kernel module and the raw socket(s). This should not be relied upon in portable programs, many other BSD socket implementation have limitations here.
    The IP and TCP header structures are already defined in <linux/ip.h> and <linux/tcp.h>, so there is no need to define them yourself when writing raw socket code.

        #include <linux/ip.h>

        #include <linux/tcp.h>


4. socket address structures

    Each socket domain has its own format for socket addresses, with a domain-specific address structure. Each of these structures begins with an integer "family" field (typed as sa_family_t) that indicates the type of the address structure. This allows the various system calls (e.g., connect(2), bind(2), accept(2), getsockname(2), getpeername(2)), which are generic to all socket domains, to determine the domain of a particular socket address.

    To allow any type of socket address to be passed to interfaces in the sockets API, the type struct sockaddr is defined. The purpose of this type is purely to allow casting of domain-specific socket address types to a "generic" type, so as to avoid compiler warnings about type mismatches in calls to the sockets API.

        struct sockaddr {

            sa_family_t sa_family;

            char        sa_data[14];

        };

    In practice, the equivalent structure for IPv4 is what usually gets used:

        struct sockaddr_in {

            short          sin_family;

            unsigned short sin_port;

            struct in_addr sin_addr;

            unsigned char  sin_zero[8];

        };

    Then, when calling connect or bind, it is cast to struct sockaddr.

    Put simply, struct sockaddr is awkward to use: a field like sa_data[14] is inconvenient to fill in, which is why struct sockaddr_in exists.
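
    The fill-then-cast pattern might look like the following sketch. fill_sockaddr_in and as_sockaddr are illustrative names, and inet_pton is used here as one common way to parse the dotted-quad string.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Fill *sa for the dotted-quad address `ip` and host-order `port`.
 * Returns 0 on success, -1 if `ip` is not a valid IPv4 address. */
int fill_sockaddr_in(struct sockaddr_in *sa, const char *ip, uint16_t port)
{
    assert(sa != NULL && ip != NULL);
    memset(sa, 0, sizeof(*sa));          /* also clears sin_zero */
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);          /* network byte order */
    return inet_pton(AF_INET, ip, &sa->sin_addr) == 1 ? 0 : -1;
}

/* The generic view the sockets API expects: same memory, different type. */
const struct sockaddr *as_sockaddr(const struct sockaddr_in *sa)
{
    return (const struct sockaddr *)sa;
}
```

    A call such as connect(fd, as_sockaddr(&sa), sizeof(sa)) then compiles without a type-mismatch warning, and the kernel reads sa_family to learn which concrete layout it was given.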

    Having mentioned struct sockaddr_in, struct sockaddr_un also deserves a mention:

        struct sockaddr_un {

            sa_family_t sun_family;

            char        sun_path[108];

        };

    This structure is used for Unix domain sockets. sun_family can only be AF_UNIX or AF_LOCAL, and sun_path is not required to be null-terminated.

    The length of a struct sockaddr_un is usually computed as follows:

        size = offsetof(struct sockaddr_un, sun_path) + strlen(un.sun_path);

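    The offsetof macro is defined in stddef.h; a traditional definition is:

        #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)

    It works by treating address 0 as a pointer to TYPE and taking the address of MEMBER, which is then the offset of that member within TYPE.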
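
    Putting the length formula to work, a helper along these lines fills the structure and returns the length to hand to bind() or connect(). The name fill_sockaddr_un is hypothetical, and keeping a terminating NUL is a conservative choice made in this sketch rather than a requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill *un with `path` and return offsetof(sun_path) + strlen(path).
 * Returns 0 if the path does not fit with a terminating NUL. */
socklen_t fill_sockaddr_un(struct sockaddr_un *un, const char *path)
{
    size_t len;

    assert(un != NULL && path != NULL);
    len = strlen(path);
    if (len >= sizeof(un->sun_path))
        return 0;
    memset(un, 0, sizeof(*un));
    un->sun_family = AF_UNIX;
    memcpy(un->sun_path, path, len);
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + len);
}
```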


5. bind

        int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);       

    When a socket is created with socket(2), it exists in a name space (address family) but has no address assigned to it. bind() assigns the address specified by addr to the socket referred to by the file descriptor sockfd. addrlen specifies the size, in bytes, of the address structure pointed to by addr. Traditionally, this operation is called "assigning a name to a socket".

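    Unless there is a specific need, bind can use INADDR_ANY, which means binding to all local addresses. Since INADDR_ANY is 0.0.0.0, zeroing the struct sockaddr_in and leaving the IP field unset has the same effect. Also, sin_port and sin_addr must both be in network byte order.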
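
    A sketch of binding to INADDR_ANY is shown below. Port 0 is used in the usage so the kernel picks a free port; bind_any and bound_port are names invented for this example.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a new TCP socket to INADDR_ANY on `port` (host order; 0 lets the
 * kernel choose). Returns the descriptor, or -1 on failure. */
int bind_any(uint16_t port)
{
    struct sockaddr_in sa;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;
    memset(&sa, 0, sizeof(sa));              /* zeroing already gives 0.0.0.0 */
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_ANY);  /* explicit, for readability */
    sa.sin_port = htons(port);               /* network byte order */
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Return the locally bound port in host order, or -1 on error. */
int bound_port(int fd)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);

    assert(fd >= 0);
    if (getsockname(fd, (struct sockaddr *)&sa, &len) == -1)
        return -1;
    return ntohs(sa.sin_port);
}
```

    After bind_any(0), bound_port() reports whichever ephemeral port the kernel assigned, which is handy for tests that must not collide on a fixed port.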


6. connect

        int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);       

    The connect() system call connects the socket referred to by the file descriptor sockfd to the address specified by addr. The addrlen argument specifies the size of addr. The format of the address in addr is determined by the address space of the socket sockfd; see socket(2) for further details.

    If the socket sockfd is of type SOCK_DGRAM then addr is the address to which datagrams are sent by default, and the only address from which datagrams are received. If the socket is of type SOCK_STREAM or SOCK_SEQPACKET, this call attempts to make a connection to the socket that is bound to the address specified by addr.       

    Generally, connection-based protocol sockets may successfully connect() only once; connectionless protocol sockets may use connect() multiple times to change their association. Connectionless sockets may dissolve the association by connecting to an address with the sa_family member of sockaddr set to AF_UNSPEC (supported on Linux since kernel 2.2).
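
    The datagram case is easy to try on the loopback interface. The sketch below assumes 127.0.0.1 is usable and that Linux reports EDESTADDRREQ for send() on an unconnected UDP socket; udp_connect_demo is a made-up name.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* connect() a UDP socket, send without an address, then dissolve the
 * association with AF_UNSPEC. Returns 0 if each step behaved as described. */
int udp_connect_demo(void)
{
    struct sockaddr_in srv, unspec;
    socklen_t len = sizeof(srv);
    char buf[8];
    int rc = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx == -1 || tx == -1)
        goto out;

    /* Receiver: bind to 127.0.0.1 with a kernel-chosen port, then look it up. */
    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(rx, (struct sockaddr *)&srv, sizeof(srv)) == -1 ||
        getsockname(rx, (struct sockaddr *)&srv, &len) == -1)
        goto out;

    /* Sender: connect() sets the default destination, so send() needs none. */
    if (connect(tx, (struct sockaddr *)&srv, sizeof(srv)) == -1 ||
        send(tx, "hi", 2, 0) != 2 ||
        recv(rx, buf, sizeof(buf), 0) != 2)
        goto out;

    /* Dissolve the association: connect to an AF_UNSPEC address. */
    memset(&unspec, 0, sizeof(unspec));
    unspec.sin_family = AF_UNSPEC;
    if (connect(tx, (struct sockaddr *)&unspec, sizeof(unspec)) == -1)
        goto out;

    /* With no default destination left, a plain send() must fail. */
    if (send(tx, "hi", 2, 0) != -1 || errno != EDESTADDRREQ)
        goto out;
    rc = 0;
out:
    if (rx != -1)
        close(rx);
    if (tx != -1)
        close(tx);
    return rc;
}
```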


7. listen

        int listen(int sockfd, int backlog);

   listen() marks the socket referred to by sockfd as a passive socket, that is, as a socket that will be used to accept incoming connection requests using accept(2).

    The sockfd argument is a file descriptor that refers to a socket of type SOCK_STREAM or SOCK_SEQPACKET.

    The backlog argument defines the maximum length to which the queue of pending connections for sockfd may grow. If a connection request arrives when the queue is full, the client may receive an error with an indication of ECONNREFUSED or, if the underlying protocol supports retransmission, the request may be ignored so that a later reattempt at connection succeeds.

   The behavior of the backlog argument on TCP sockets changed with Linux 2.2.

    Now it specifies the queue length for completely established sockets waiting to be accepted, instead of the number of incomplete connection requests. The maximum length of the queue for incomplete sockets can be set using /proc/sys/net/ipv4/tcp_max_syn_backlog. When syncookies are enabled there is no logical maximum length and this setting is ignored.

    In other words, since Linux 2.2 backlog is the length of the queue of connections that are established (three-way handshake completed) but not yet accepted, while the length of the SYN queue is set in /proc/sys/net/ipv4/tcp_max_syn_backlog. With syncookies enabled, that queue has no logical upper limit and the tcp_max_syn_backlog setting is ignored.

    How to enable syncookies: http://lijichao.blog.51cto.com/67487/308509

    See tcp(7) for more information.

    If the backlog argument is greater than the value in /proc/sys/net/core/somaxconn, then it is silently truncated to that value; the default value in this file is 128. In kernels before 2.4.25, this limit was a hard coded value, SOMAXCONN, with the value 128.

    /proc/sys/net/core/somaxconn is the upper limit for the backlog argument of listen. Its default is 128 on older kernels (listen(2) gives 4096 as the default since Linux 5.4). When backlog exceeds this setting, the setting wins.

    In practice, setting backlog to 128 is fine.
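
    To see what cap is in effect on a particular machine, the /proc file can simply be read. The sketch below does that and also shows a listen() call that asks for SOMAXCONN; read_somaxconn and listen_max are illustrative names.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>

/* Read the current backlog cap. Returns the value, or -1 if the file
 * cannot be read (for example on a non-Linux system). */
int read_somaxconn(void)
{
    int value = -1;
    FILE *f = fopen("/proc/sys/net/core/somaxconn", "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Put fd into the listening state, asking for the backlog the headers
 * advertise; the kernel silently caps the request at somaxconn. */
int listen_max(int fd)
{
    assert(fd >= 0);
    return listen(fd, SOMAXCONN);
}
```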


8. accept

        int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

    The accept() system call is used with connection-based socket types (SOCK_STREAM, SOCK_SEQPACKET). It extracts the first connection request on the queue of pending connections for the listening socket, sockfd, creates a new connected socket, and returns a new file descriptor referring to that socket. The newly created socket is not in the listening state. The original socket sockfd is unaffected by this call.

    The argument sockfd is a socket that has been created with socket(2), bound to a local address with bind(2), and is listening for connections after a listen(2).

    The argument addr is a pointer to a sockaddr structure. This structure is filled in with the address of the peer socket, as known to the communications layer. The exact format of the address returned in addr is determined by the socket's address family (see socket(2) and the respective protocol man pages). When addr is NULL, nothing is filled in; in this case, addrlen is not used, and should also be NULL.

    The addrlen argument is a value-result argument: the caller must initialize it to contain the size (in bytes) of the structure pointed to by addr; on return it will contain the actual size of the peer address.

    The returned address is truncated if the buffer provided is too small; in this case, addrlen will return a value greater than was supplied to the call.       

    If no pending connections are present on the queue, and the socket is not marked as nonblocking, accept() blocks the caller until a connection is present. If the socket is marked nonblocking and no pending connections are present on the queue, accept() fails with the error EAGAIN or EWOULDBLOCK.

    In order to be notified of incoming connections on a socket, you can use select(2) or poll(2). A readable event will be delivered when a new connection is attempted and you may then call accept() to get a socket for that connection. Alternatively, you can set the socket to deliver SIGIO when activity occurs on a socket; see socket(7) for details.

    On Linux, the new socket returned by accept() does not inherit file status flags such as O_NONBLOCK and O_ASYNC from the listening socket. This behavior differs from the canonical BSD sockets implementation. Portable programs should not rely on inheritance or noninheritance of file status flags and always explicitly set all required flags on the socket returned from accept().
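
    The value-result addrlen and the explicit flag setting can be sketched as follows. The loopback self-connect in accept_demo is just a convenient way to exercise accept() in a single process; both function names are invented for the example.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Accept one connection, store the peer address in *peer, and set
 * O_NONBLOCK explicitly since the flag is not inherited on Linux.
 * Returns the connected descriptor, or -1 on failure. */
int accept_nonblocking(int listen_fd, struct sockaddr_in *peer)
{
    socklen_t len = sizeof(*peer);   /* in: buffer size, out: address size */
    int fd = accept(listen_fd, (struct sockaddr *)peer, &len);
    int fl;

    if (fd == -1)
        return -1;
    /* len larger than the buffer would mean the address was truncated. */
    if (len > sizeof(*peer) ||
        (fl = fcntl(fd, F_GETFL)) == -1 ||
        fcntl(fd, F_SETFL, fl | O_NONBLOCK) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Listen on 127.0.0.1, connect to ourselves, accept the connection.
 * Returns 0 if the peer is 127.0.0.1 and O_NONBLOCK is set, else -1. */
int accept_demo(void)
{
    struct sockaddr_in addr, peer;
    socklen_t len = sizeof(addr);
    int rc = -1, conn = -1;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int cfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd == -1 || cfd == -1)
        goto out;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: kernel picks */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(lfd, 1) == -1 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) == -1 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) == -1)
        goto out;
    conn = accept_nonblocking(lfd, &peer);
    if (conn != -1 &&
        peer.sin_addr.s_addr == htonl(INADDR_LOOPBACK) &&
        (fcntl(conn, F_GETFL) & O_NONBLOCK))
        rc = 0;
out:
    if (conn != -1)
        close(conn);
    if (cfd != -1)
        close(cfd);
    if (lfd != -1)
        close(lfd);
    return rc;
}
```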


9. select

        int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

        int pselect(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, const struct timespec *timeout, const sigset_t *sigmask);

        void FD_CLR(int fd, fd_set *set);

        int FD_ISSET(int fd, fd_set *set);

        void FD_SET(int fd, fd_set *set);

        void FD_ZERO(fd_set *set);

    select() and pselect() allow a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read(2)) without blocking.

    Three independent sets of file descriptors are watched. readfds is watched to see if data is available for reading from any of its file descriptors. After select() has returned, readfds will be cleared of all file descriptors except for those that are immediately available for reading. The same applies to writefds (watched for the ability to write without blocking) and exceptfds (watched for exceptional conditions).

    On exit, the sets are modified in place to indicate which file descriptors actually changed status. Each of the three file descriptor sets may be specified as NULL if no file descriptors are to be watched for the corresponding class of events. So, after select() returns, all file descriptors in all sets should be checked to see if they are ready.

    The functions read(2), recv(2), write(2), and send(2), as well as the select() call, can return -1 with errno set to EINTR, or with errno set to EAGAIN (EWOULDBLOCK). These results must be handled properly.

    So the code should be written like this:

        r = select(nfds + 1, &rd, &wr, &er, NULL);

        if (r == -1 && errno == EINTR)

            continue;

        if (r == -1) {

            perror("select()");

            exit(EXIT_FAILURE);

        }

    If the functions read(2), recv(2), write(2), and send(2) fail with errors other than those listed above, or one of the input functions returns 0, indicating end of file, then you should not pass that descriptor to select() again.

    Four macros are provided to manipulate the sets. FD_ZERO() clears a set. FD_SET() and FD_CLR() respectively add and remove a given file descriptor from a set. FD_ISSET() tests to see if a file descriptor is part of the set; this is useful after select() returns.

    nfds is the highest-numbered file descriptor in any of the three sets, plus 1.

    The timeout argument specifies the interval that select() should block waiting for a file descriptor to become ready. (This interval will be rounded up to the system clock granularity, and kernel scheduling delays mean that the blocking interval may overrun by a small amount.) If both fields of the timeval structure are zero, then select() returns immediately. (This is useful for polling.) If timeout is NULL (no timeout), select() can block indefinitely.
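
    The pieces described so far (the macros, nfds, the timeout, and the EINTR retry) fit together in a small wrapper like the one sketched here. wait_readable is a hypothetical helper; restarting the full timeout after EINTR is a simplification chosen for this example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait until fd is readable or timeout_ms milliseconds pass.
 * Returns 1 if readable, 0 on timeout, -1 on error (errno set). */
int wait_readable(int fd, int timeout_ms)
{
    assert(fd >= 0 && fd < FD_SETSIZE);  /* fd_set cannot hold larger fds */
    for (;;) {
        fd_set rd;
        struct timeval tv;
        int r;

        /* select() may modify both the set and the timeout, so rebuild
         * them before every call. */
        FD_ZERO(&rd);
        FD_SET(fd, &rd);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        r = select(fd + 1, &rd, NULL, NULL, &tv);  /* nfds = highest fd + 1 */
        if (r == -1 && errno == EINTR)
            continue;                    /* interrupted by a signal: retry */
        if (r == -1)
            return -1;
        return (r > 0 && FD_ISSET(fd, &rd)) ? 1 : 0;
    }
}
```

    For instance, on the read end of a fresh pipe() the call returns 0 after the timeout, and once a byte has been written to the other end it returns 1, even with a timeout of 0.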

    sigmask is a pointer to a signal mask (see sigprocmask(2)); if it is not NULL, then pselect() first replaces the current signal mask by the one pointed to by sigmask, then does the "select" function, and then restores the original signal mask.

    The operation of select() and pselect() is identical, other than these three differences:

    (i) select() uses a timeout that is a struct timeval (with seconds and microseconds), while pselect() uses a struct timespec (with seconds and nanoseconds).

    (ii) select() may update the timeout argument to indicate how much time was left. pselect() does not change this argument.

    (iii) select() has no sigmask argument, and behaves as pselect() called with NULL sigmask.

    Other than the difference in the precision of the timeout argument, the following pselect() call:

        ready = pselect(nfds, &readfds, &writefds, &exceptfds, timeout, &sigmask);

    is equivalent to atomically executing the following calls:

        sigset_t origmask;

        pthread_sigmask(SIG_SETMASK, &sigmask, &origmask);

        ready = select(nfds, &readfds, &writefds, &exceptfds, timeout); 

        pthread_sigmask(SIG_SETMASK, &origmask, NULL);

    The reason that pselect() is needed is that if one wants to wait for either a signal or for a file descriptor to become ready, then an atomic test is needed to prevent race conditions. (Suppose the signal handler sets a global flag and returns. Then a test of this global flag followed by a call of select() could hang indefinitely if the signal arrived just after the test but just before the call. By contrast, pselect() allows one to first block signals, handle the signals that have come in, then call pselect() with the desired sigmask, avoiding the race.)
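
    The block-then-pselect pattern can be tried in isolation. In this sketch SIGUSR1 stands in for whatever signal the program cares about, raise() simulates it arriving while blocked, and pselect_signal_demo is a made-up name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t got_signal;

static void on_sigusr1(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Block SIGUSR1, make one pending, then let pselect() unblock it atomically.
 * Returns 1 if the handler ran inside pselect(), 0 if not, -1 on setup error. */
int pselect_signal_demo(void)
{
    struct sigaction sa;
    sigset_t block, orig, during;
    struct timespec ts = { 1, 0 };  /* 1 s upper bound so the demo cannot hang */
    int r, result;

    got_signal = 0;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &orig) == -1)
        return -1;

    raise(SIGUSR1);                 /* stays pending while the signal is blocked */
    assert(got_signal == 0);        /* handler has not run yet */

    /* Mask in effect only during the wait: original mask minus SIGUSR1. */
    during = orig;
    sigdelset(&during, SIGUSR1);
    r = pselect(0, NULL, NULL, NULL, &ts, &during);
    result = (r == -1 && errno == EINTR && got_signal) ? 1 : 0;

    sigprocmask(SIG_SETMASK, &orig, NULL);   /* restore the caller's mask */
    return result;
}
```

    Because the signal is unblocked only inside pselect(), there is no gap between checking the flag and starting to wait in which the signal could slip through unnoticed.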




job

1. write a TCP server that supports simultaneous connections

2. write a TCP server using epoll

3. write a UDP server, focusing on performance

4. UDP client + connect: send datagrams to this client from another server

5. raw socket: write a SYN flooder

6. packet socket: write a sniffer

7. domain socket

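select is fairly important in socket programming, yet beginners tend to avoid it. They are used to writing blocking code with calls such as connect, accept, recv, or recvfrom. (Blocking means that a process or thread reaching such a call must wait for some event; until the event occurs the process or thread is blocked and the call does not return.) With select, a program can work in a non-blocking fashion. (Non-blocking means the call does not have to wait for the event: it always returns, and the return value reports what happened. If the event has occurred, the result is the same as in the blocking case; if not, a code says so and the process or thread carries on, which is more efficient.) select monitors the file descriptors we care about for changes: readable, writable, or exceptional. The details follow.

The select prototype (this describes Berkeley sockets on Unix; Windows differs, as noted below):

    int select(int maxfdp, fd_set *readfds, fd_set *writefds, fd_set *errorfds, struct timeval *timeout);

Two structures need explaining first.

First, fd_set can be thought of as a set of file descriptors, i.e. file handles. These can be ordinary files, and since devices, pipes, FIFOs and so on are all files under Unix, they are all included; a socket is therefore a file too, and a socket handle is a file descriptor. An fd_set is manipulated through macros: FD_ZERO(fd_set *) empties the set, FD_SET(int, fd_set *) adds a descriptor to the set, FD_CLR(int, fd_set *) removes a descriptor from the set, and FD_ISSET(int, fd_set *) checks whether a given descriptor in the set is ready for reading or writing.

Second, struct timeval is a commonly used structure that represents a time value. It has two members: seconds and microseconds.

The select arguments in detail:

int maxfdp is an integer giving the range of all file descriptors in the sets, i.e. the highest-numbered file descriptor plus 1. It must be correct. On Windows the value of this parameter does not matter and may be set incorrectly.

fd_set *readfds points to an fd_set containing the file descriptors to be watched for readability, i.e. whether data can now be read from them. If any descriptor in the set is readable, select returns a value greater than 0. If none is readable, the timeout argument decides whether to keep waiting; once the timeout has passed, select returns 0. On error it returns a negative value. NULL may be passed to indicate that no descriptors are watched for reading.

fd_set *writefds points to an fd_set containing the file descriptors to be watched for writability, i.e. whether data can now be written to them. If any descriptor in the set is writable, select returns a value greater than 0. If none is writable, the timeout argument decides whether to keep waiting; once the timeout has passed, select returns 0. On error it returns a negative value. NULL may be passed to indicate that no descriptors are watched for writing.

fd_set *errorfds serves the same purpose as the two arguments above, but watches for exceptional conditions on the descriptors.

struct timeval *timeout is the select timeout. This argument is crucial because it puts select in one of three modes. First, if NULL is passed, select blocks until some descriptor in the watched sets changes. Second, if the time value is set to 0 seconds and 0 microseconds, select becomes purely non-blocking: it returns immediately whether or not any descriptor has changed, with 0 for no change and a positive value for a change. Third, if the timeout is greater than 0, select blocks for at most that long: it returns as soon as an event arrives within the timeout, and otherwise returns once the timeout expires, with return values as above.

Return value:
    negative: select failed
    positive: some descriptors are readable, writable, or in an exceptional state
    0: the wait timed out and no descriptor is readable, writable, or in an exceptional state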