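Linux Kernel Source Analysis: Transport Layer (SCTP & DCCP)

Stream Control Transmission Protocol (SCTP)

SCTP (Stream Control Transmission Protocol) is specified in RFC 4960, published in 2007, but it was first defined in 2000 (RFC 2960). SCTP was designed to carry Public Switched Telephone Network (PSTN) signaling over IP networks.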

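To guarantee reliable delivery, TCP enforces strict ordering. Suppose a message is split into three segments labelled A, B and C. The receiver must deliver A first, then B, then C. If B arrives before A, TCP cannot pass B to the application until A has arrived; if A was lost, B and C have to wait until the sender retransmits it.
This strict ordering can cause unnecessary delay and head-of-line blocking.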

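Because TCP is byte-stream oriented, it does not preserve record boundaries. To tell the records in the stream apart, the application has to add its own markers or encoding, such as a delimiter or a length prefix.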
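One common way to mark records in a TCP byte stream is a length prefix. The sketch below is only an illustration of that idea: the function names frame_encode() and frame_decode() and the 4-byte big-endian prefix are assumptions made for this example, not part of any protocol discussed here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write one record as [4-byte big-endian length][payload] into out.
 * Returns the number of bytes written, or 0 if out is too small. */
size_t frame_encode(const void *payload, uint32_t len,
                    uint8_t *out, size_t out_cap)
{
    if (out_cap < 4 || out_cap - 4 < len)
        return 0;
    out[0] = (uint8_t)(len >> 24);
    out[1] = (uint8_t)(len >> 16);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)len;
    if (len)
        memcpy(out + 4, payload, len);
    return 4 + (size_t)len;
}

/* Try to extract one complete record from the first `avail` bytes of a
 * received byte stream. On success *payload points into buf, *len is the
 * record length, and the return value is the number of bytes consumed.
 * Returns 0 if the stream does not yet hold a complete record. */
size_t frame_decode(const uint8_t *buf, size_t avail,
                    const uint8_t **payload, uint32_t *len)
{
    uint32_t n;

    if (avail < 4)
        return 0;
    n = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
        ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    if (avail - 4 < n)
        return 0;
    *payload = buf + 4;
    *len = n;
    return 4 + (size_t)n;
}
```

A receiver would call frame_decode() in a loop on whatever recv() has accumulated, and keep the unconsumed tail for the next read. With SCTP this bookkeeping is unnecessary because message boundaries are preserved by the protocol.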

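In addition, to improve efficiency and avoid sending many small packets, TCP may coalesce data: it waits for several small writes and merges them into one larger segment (Nagle's algorithm). An application that does not want this optimization has to turn it off explicitly, which on Linux is done with the TCP_NODELAY socket option. The PSH flag in the TCP header is set by the stack and only asks the receiver to hand the data to the application promptly; it is not something the application sets to request undelayed transmission.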
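As a small illustration of turning off TCP's coalescing of small writes, the sketch below sets the standard TCP_NODELAY socket option. The helper name tcp_disable_coalescing() is made up for this example.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm on a TCP socket so that small writes are
 * sent immediately instead of being coalesced. Returns 0 on success,
 * -1 on error (errno is set by setsockopt). */
int tcp_disable_coalescing(int fd)
{
    int one = 1;

    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}
```

The call is typically made once, right after socket() or accept(), on sockets that carry latency-sensitive request/response traffic.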

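Finally, TCP is also vulnerable to denial-of-service (DoS) attacks.

SCTP improves on both TCP and UDP. It is message-oriented like UDP, while also providing the reliability, ordered delivery and congestion control of TCP. In addition it supports multihoming and redundant paths, which increases resilience and reliability.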

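SCTP has two main characteristics:
(1) Message-based: SCTP is message-oriented. It transfers a sequence of messages, each of which is a group of bytes.
(2) Multi-streaming: SCTP can carry several independent streams in parallel.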

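With TCP there is exactly one client address and one server address per connection: it is strictly one-to-one. If the IP address or port at either end becomes unavailable, the whole TCP connection fails.

SCTP improves on TCP in this respect by strengthening reliability. An SCTP endpoint can bind several IP addresses, and each endpoint uses a heartbeat mechanism to check periodically whether the peer's primary IP address and its redundant backup addresses are reachable.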

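The heartbeat mechanism tests path connectivity by exchanging HEARTBEAT and HEARTBEAT-ACK chunks. Once the number of unacknowledged heartbeats reaches a threshold, the IP address is declared failed. By default a HEARTBEAT chunk is sent every 30 seconds to monitor an idle destination transport address. The interval can be configured through /proc/sys/net/sctp/hb_interval; the default value is 30000 (milliseconds, that is, 30 seconds).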
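The failure-detection rule can be sketched as a per-path error counter. This is a simplified model, not kernel code: struct hb_path and its functions are hypothetical names, and the threshold is passed in by the caller (RFC 4960 calls the corresponding protocol parameter Path.Max.Retrans and marks a path inactive when the counter exceeds it).

```c
#include <stdbool.h>

/* Hypothetical per-path bookkeeping; the field names are illustrative
 * and are not those of the kernel's struct sctp_transport. */
struct hb_path {
    unsigned int error_count;   /* consecutive unanswered HEARTBEATs */
    unsigned int max_retrans;   /* failure threshold for this path */
    bool active;                /* is the address considered reachable? */
};

void hb_path_init(struct hb_path *p, unsigned int max_retrans)
{
    p->error_count = 0;
    p->max_retrans = max_retrans;
    p->active = true;
}

/* A HEARTBEAT went unanswered: count it, and mark the path inactive
 * once the count exceeds the threshold. */
void hb_path_on_timeout(struct hb_path *p)
{
    p->error_count++;
    if (p->error_count > p->max_retrans)
        p->active = false;
}

/* A HEARTBEAT-ACK arrived: the path is reachable, so reset the counter. */
void hb_path_on_ack(struct hb_path *p)
{
    p->error_count = 0;
    p->active = true;
}
```

With a threshold of 2, for example, the third consecutive unanswered HEARTBEAT marks the address as failed, and a later HEARTBEAT-ACK brings it back.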

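SCTP uses a four-way handshake. When the server receives a connection request, it does not immediately allocate memory to track it; instead it returns a COOKIE to the client. The client must send this COOKIE back in its next request, and only after the server has validated the COOKIE, and thereby confirmed the client, does it establish the association. This protects SCTP against the SYN flood attack that affects TCP.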
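The idea behind the cookie is that the server can hand its state to the client, protected by a keyed checksum, and rebuild it only when the client echoes it back. The sketch below is a toy model under loud assumptions: struct toy_cookie, cookie_make() and cookie_valid() are hypothetical, and toy_mac() is a seeded FNV-1a hash that merely stands in for the cryptographic MAC a real implementation uses. It is not secure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified state cookie. */
struct toy_cookie {
    uint32_t peer_tag;      /* Initiate Tag taken from the INIT */
    uint32_t local_tag;     /* Initiate Tag chosen for the INIT ACK */
    uint32_t created;       /* creation time, in seconds */
    uint32_t mac;           /* keyed checksum over the fields above */
};

/* Toy keyed checksum (FNV-1a seeded with the secret). NOT secure; it
 * only stands in for the MAC a real implementation would compute. */
static uint32_t toy_mac(uint32_t secret, const struct toy_cookie *c)
{
    const uint32_t words[3] = { c->peer_tag, c->local_tag, c->created };
    uint32_t h = 2166136261u ^ secret;
    size_t i;
    int shift;

    for (i = 0; i < 3; i++) {
        for (shift = 0; shift < 32; shift += 8) {
            h ^= (words[i] >> shift) & 0xffu;
            h *= 16777619u;
        }
    }
    return h;
}

/* Server side, on INIT: build the cookie and keep no per-client state. */
struct toy_cookie cookie_make(uint32_t secret, uint32_t peer_tag,
                              uint32_t local_tag, uint32_t now)
{
    struct toy_cookie c = { peer_tag, local_tag, now, 0 };

    c.mac = toy_mac(secret, &c);
    return c;
}

/* Server side, on COOKIE ECHO: accept only an unmodified, fresh cookie.
 * Only after this check would memory for the association be allocated. */
bool cookie_valid(uint32_t secret, const struct toy_cookie *c,
                  uint32_t now, uint32_t lifetime)
{
    if (toy_mac(secret, c) != c->mac)
        return false;
    return now - c->created <= lifetime;
}
```

Because the server stores nothing between the INIT and the COOKIE ECHO, a flood of spoofed INITs costs it only the computation of the checksum, not memory.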

SCTP initialization: the sctp_init() function allocates memory for the various structures and registers SCTP with IPv4 and IPv6.


1. SCTP packets and chunks
Every SCTP packet starts with a common SCTP header, followed by one or more chunks. A chunk carries either data or SCTP control information.
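To make the packet layout concrete, here is a minimal user-space sketch that walks the chunks of a raw SCTP packet. It assumes the wire format of RFC 4960: a 12-byte common header (source port, destination port, verification tag, checksum) followed by chunks that each start with a 4-byte header (type, flags, length) and are padded to a multiple of 4 bytes. The function name sctp_list_chunks() is made up for this example and it does not verify the checksum.

```c
#include <stddef.h>
#include <stdint.h>

#define SCTP_COMMON_HDR_LEN 12  /* ports (2+2), verification tag (4), checksum (4) */
#define SCTP_CHUNK_HDR_LEN   4  /* type (1), flags (1), length (2) */

/* Walk the chunks of one SCTP packet held in buf[0..len). Stores up to
 * max chunk types in types[] and returns the number of chunks found, or
 * -1 if the packet is malformed. */
int sctp_list_chunks(const uint8_t *buf, size_t len,
                     uint8_t *types, int max)
{
    size_t off = SCTP_COMMON_HDR_LEN;
    int n = 0;

    if (len < SCTP_COMMON_HDR_LEN)
        return -1;
    while (off < len) {
        size_t chunk_len;

        if (len - off < SCTP_CHUNK_HDR_LEN)
            return -1;
        /* Chunk Length is big-endian and includes the 4-byte header. */
        chunk_len = ((size_t)buf[off + 2] << 8) | buf[off + 3];
        if (chunk_len < SCTP_CHUNK_HDR_LEN || chunk_len > len - off)
            return -1;
        if (n < max)
            types[n] = buf[off];
        n++;
        /* Each chunk is padded to a multiple of 4 bytes. */
        off += (chunk_len + 3) & ~(size_t)3;
    }
    return n;
}
```

In the kernel the same walk is done on an sk_buff, and each chunk found is represented by the struct sctp_chunk shown next.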

/* RFC2960 1.4 Key Terms
 *
 * o Chunk: A unit of information within an SCTP packet, consisting of
 * a chunk header and chunk-specific content.
 *
 * As a matter of convenience, we remember the SCTP common header for
 * each chunk as well as a few other header pointers...
 */
struct sctp_chunk {
	struct list_head list;

	atomic_t refcnt;

	/* How many times this chunk has been sent, for prsctp RTX policy */
	int sent_count;

	/* This is our link to the per-transport transmitted list.  */
	struct list_head transmitted_list;

	/* This field is used by chunks that hold fragmented data.
	 * For the first fragment this is the list that holds the rest of
	 * fragments. For the remaining fragments, this is the link to the
	 * frag_list maintained in the first fragment.
	 */
	struct list_head frag_list;

	/* This points to the sk_buff containing the actual data.  */
	struct sk_buff *skb;

	/* In case of GSO packets, this will store the head one */
	struct sk_buff *head_skb;

	/* These are the SCTP headers by reverse order in a packet.
	 * Note that some of these may happen more than once.  In that
	 * case, we point at the "current" one, whatever that means
	 * for that level of header.
	 */

	/* We point this at the FIRST TLV parameter to chunk_hdr.  */
	union sctp_params param_hdr;
	union {
		__u8 *v;
		struct sctp_datahdr *data_hdr;
		struct sctp_inithdr *init_hdr;
		struct sctp_sackhdr *sack_hdr;
		struct sctp_heartbeathdr *hb_hdr;
		struct sctp_sender_hb_info *hbs_hdr;
		struct sctp_shutdownhdr *shutdown_hdr;
		struct sctp_signed_cookie *cookie_hdr;
		struct sctp_ecnehdr *ecne_hdr;
		struct sctp_cwrhdr *ecn_cwr_hdr;
		struct sctp_errhdr *err_hdr;
		struct sctp_addiphdr *addip_hdr;
		struct sctp_fwdtsn_hdr *fwdtsn_hdr;
		struct sctp_authhdr *auth_hdr;
	} subh;

	__u8 *chunk_end;

	struct sctp_chunkhdr *chunk_hdr;
	struct sctphdr *sctp_hdr;

	/* This needs to be recoverable for SCTP_SEND_FAILED events. */
	struct sctp_sndrcvinfo sinfo;

	/* Which association does this belong to?  */
	struct sctp_association *asoc;

	/* What endpoint received this chunk? */
	struct sctp_ep_common *rcvr;

	/* We fill this in if we are calculating RTT. */
	unsigned long sent_at;

	/* What is the origin IP address for this chunk?  */
	union sctp_addr source;
	/* Destination address for this chunk. */
	union sctp_addr dest;

	/* For outbound message, track all fragments for SEND_FAILED. */
	struct sctp_datamsg *msg;

	/* For an inbound chunk, this tells us where it came from.
	 * For an outbound chunk, it tells us where we'd like it to
	 * go.	It is NULL if we have no preference.
	 */
	struct sctp_transport *transport;

	/* SCTP-AUTH:  For the special case inbound processing of COOKIE-ECHO
	 * we need to save a pointer to the AUTH chunk, since the SCTP-AUTH
	 * spec violates the principal premise that all chunks are processed
	 * in order.
	 */
	struct sk_buff *auth_chunk;

#define SCTP_CAN_FRTX 0x0
#define SCTP_NEED_FRTX 0x1
#define SCTP_DONT_FRTX 0x2
	__u16	rtt_in_progress:1,	/* This chunk used for RTT calc? */
		has_tsn:1,		/* Does this chunk have a TSN yet? */
		has_ssn:1,		/* Does this chunk have a SSN yet? */
		singleton:1,		/* Only chunk in the packet? */
		end_of_packet:1,	/* Last chunk in the packet? */
		ecn_ce_done:1,		/* Have we processed the ECN CE bit? */
		pdiscard:1,		/* Discard the whole packet now? */
		tsn_gap_acked:1,	/* Is this chunk acked by a GAP ACK? */
		data_accepted:1,	/* At least 1 chunk accepted */
		auth:1,			/* IN: was auth'ed | OUT: needs auth */
		has_asconf:1,		/* IN: have seen an asconf before */
		tsn_missing_report:2,	/* Data chunk missing counter. */
		fast_retransmit:2;	/* Is this chunk fast retransmitted? */
};

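2. SCTP associations
SCTP speaks of an association rather than a connection. A connection refers to communication between two IP addresses, whereas an association refers to communication between two endpoints, each of which may have several IP addresses. In the kernel source an SCTP association is represented by struct sctp_association: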

/* RFC2960
 *
 * 12. Recommended Transmission Control Block (TCB) Parameters
 *
 * This section details a recommended set of parameters that should
 * be contained within the TCB for an implementation. This section is
 * for illustrative purposes and should not be deemed as requirements
 * on an implementation or as an exhaustive list of all parameters
 * inside an SCTP TCB. Each implementation may need its own additional
 * parameters for optimization.
 */


/* Here we have information about each individual association. */
struct sctp_association {

	/* A base structure common to endpoint and association.
	 * In this context, it represents the association's view
	 * of the local endpoint of the association.
	 */
	struct sctp_ep_common base;

	/* Associations on the same socket. */
	struct list_head asocs;

	/* association id. */
	// Unique ID of the association
	sctp_assoc_t assoc_id;

	/* This is our parent endpoint.	 */
	struct sctp_endpoint *ep;

	/* These are those association elements needed in the cookie.  */
	// Association elements needed by the cookie; tied to the association state cookie (a struct sctp_cookie object)
	struct sctp_cookie c;

	/* This is all information about our peer.  */
	// Everything known about the peer endpoint
	struct {
		/* transport_addr_list
		 *
		 * Peer	       : A list of SCTP transport addresses that the
		 * Transport   : peer is bound to. This information is derived
		 * Address     : from the INIT or INIT ACK and is used to
		 * List	       : associate an inbound packet with a given
		 *	       : association. Normally this information is
		 *	       : hashed or keyed for quick lookup and access
		 *	       : of the TCB.
		 *	       : The list is also initialized with the list
		 *	       : of addresses passed with the sctp_connectx()
		 *	       : call.
		 *
		 * It is a list of SCTP_transport's.
		 */
		struct list_head transport_addr_list;

		/* rwnd
		 *
		 * Peer Rwnd   : Current calculated value of the peer's rwnd.
		 */
		__u32 rwnd;

		/* transport_count
		 *
		 * Peer        : A count of the number of peer addresses
		 * Transport   : in the Peer Transport Address List.
		 * Address     :
		 * Count       :
		 */
		__u16 transport_count;

		/* port
		 *   The transport layer port number.
		 */
		__u16 port;

		/* primary_path
		 *
		 * Primary     : This is the current primary destination
		 * Path	       : transport address of the peer endpoint.  It
		 *	       : may also specify a source transport address
		 *	       : on this endpoint.
		 *
		 * All of these paths live on transport_addr_list.
		 *
		 * At the bakeoffs, we discovered that the intent of
		 * primaryPath is that it only changes when the ULP
		 * asks to have it changed.  We add the activePath to
		 * designate the connection we are currently using to
		 * transmit new data and most control chunks.
		 */
		struct sctp_transport *primary_path;	// Address used to set up the initial association

		/* Cache the primary path address here, when we
		 * need an address for msg_name.
		 */
		union sctp_addr primary_addr;

		/* active_path
		 *   The path that we are currently using to
		 *   transmit new data and most control chunks.
		 */
		struct sctp_transport *active_path;	// Peer address currently used when sending data

		/* retran_path
		 *
		 * RFC2960 6.4 Multi-homed SCTP Endpoints
		 * ...
		 * Furthermore, when its peer is multi-homed, an
		 * endpoint SHOULD try to retransmit a chunk to an
		 * active destination transport address that is
		 * different from the last destination address to
		 * which the DATA chunk was sent.
		 */
		struct sctp_transport *retran_path;

		/* Pointer to last transport I have sent on.  */
		struct sctp_transport *last_sent_to;

		/* This is the last transport I have received DATA on.	*/
		struct sctp_transport *last_data_from;

		/*
		 * Mapping  An array of bits or bytes indicating which out of
		 * Array    order TSN's have been received (relative to the
		 *	    Last Rcvd TSN). If no gaps exist, i.e. no out of
		 *	    order packets have been received, this array
		 *	    will be set to all zero. This structure may be
		 *	    in the form of a circular buffer or bit array.
		 *
		 * Last Rcvd   : This is the last TSN received in
		 * TSN	       : sequence. This value is set initially by
		 *	       : taking the peer's Initial TSN, received in
		 *	       : the INIT or INIT ACK chunk, and subtracting
		 *	       : one from it.
		 *
		 * Throughout most of the specification this is called the
		 * "Cumulative TSN ACK Point".	In this case, we
		 * ignore the advice in 12.2 in favour of the term
		 * used in the bulk of the text.  This value is hidden
		 * in tsn_map--we get it by calling sctp_tsnmap_get_ctsn().
		 */
		struct sctp_tsnmap tsn_map;

		/* This mask is used to disable sending the ASCONF chunk
		 * with specified parameter to peer.
		 */
		__be16 addip_disabled_mask;

		/* These are capabilities which our peer advertised.  */
		__u8	ecn_capable:1,      /* Can peer do ECN? */
			ipv4_address:1,     /* Peer understands IPv4 addresses? */
			ipv6_address:1,     /* Peer understands IPv6 addresses? */
			hostname_address:1, /* Peer understands DNS addresses? */
			asconf_capable:1,   /* Does peer support ADDIP? */
			prsctp_capable:1,   /* Can peer do PR-SCTP? */
			reconf_capable:1,   /* Can peer do RE-CONFIG? */
			auth_capable:1;     /* Is peer doing SCTP-AUTH? */

		/* sack_needed : This flag indicates if the next received
		 *             : packet is to be responded to with a
		 *             : SACK. This is initialized to 0.  When a packet
		 *             : is received sack_cnt is incremented. If this value
		 *             : reaches 2 or more, a SACK is sent and the
		 *             : value is reset to 0. Note: This is used only
		 *             : when no DATA chunks are received out of
		 *             : order.  When DATA chunks are out of order,
		 *             : SACK's are not delayed (see Section 6).
		 */
		__u8    sack_needed:1,     /* Do we need to sack the peer? */
			sack_generation:1,
			zero_window_announced:1;
		__u32	sack_cnt;

		__u32   adaptation_ind;	 /* Adaptation Code point. */

		struct sctp_inithdr_host i;
		void *cookie;
		int cookie_len;

		/* ADDIP Section 4.2 Upon reception of an ASCONF Chunk.
		 * C1) ... "Peer-Serial-Number'. This value MUST be initialized to the
		 * Initial TSN Value minus 1
		 */
		__u32 addip_serial;

		/* SCTP-AUTH: We need to know the peer's random number, hmac list
		 * and authenticated chunk list.  All that is part of the
		 * cookie and these are just pointers to those locations
		 */
		sctp_random_param_t *peer_random;
		sctp_chunks_param_t *peer_chunks;
		sctp_hmac_algo_param_t *peer_hmacs;
	} peer;	// Nested struct describing the peer endpoint of the association.

	/* State       : A state variable indicating what state the
	 *	       : association is in, i.e. COOKIE-WAIT,
	 *	       : COOKIE-ECHOED, ESTABLISHED, SHUTDOWN-PENDING,
	 *	       : SHUTDOWN-SENT, SHUTDOWN-RECEIVED, SHUTDOWN-ACK-SENT.
	 *
	 *		Note: No "CLOSED" state is illustrated since if a
	 *		association is "CLOSED" its TCB SHOULD be removed.
	 *
	 *		In this implementation we DO have a CLOSED
	 *		state which is used during initiation and shutdown.
	 *
	 *		State takes values from SCTP_STATE_*.
	 */
	sctp_state_t state;

	/* Overall     : The overall association error count.
	 * Error Count : [Clear this any time I get something.]
	 */
	int overall_error_count;

	/* The cookie life I award for any cookie.  */
	ktime_t cookie_life;

	/* These are the association's initial, max, and min RTO values.
	 * These values will be initialized by system defaults, but can
	 * be modified via the SCTP_RTOINFO socket option.
	 */
	unsigned long rto_initial;
	unsigned long rto_max;
	unsigned long rto_min;

	/* Maximum number of new data packets that can be sent in a burst.  */
	int max_burst;

	/* This is the max_retrans value for the association.  This value will
	 * be initialized from system defaults, but can be
	 * modified by the SCTP_ASSOCINFO socket option.
	 */
	int max_retrans;

	/* This is the partially failed retrans value for the transport
	 * and will be initialized from the assocs value.  This can be
	 * changed using the SCTP_PEER_ADDR_THLDS socket option
	 */
	int pf_retrans;

	/* Maximum number of times the endpoint will retransmit INIT  */
	__u16 max_init_attempts;

	/* How many times have we resent an INIT? */
	__u16 init_retries;

	/* The largest timeout or RTO value to use in attempting an INIT */
	unsigned long max_init_timeo;

	/* Heartbeat interval: The endpoint sends out a Heartbeat chunk to
	 * the destination address every heartbeat interval. This value
	 * will be inherited by all new transports.
	 */
	unsigned long hbinterval;

	/* This is the max_retrans value for new transports in the
	 * association.
	 */
	__u16 pathmaxrxt;

	/* Flag that path mtu update is pending */
	__u8   pmtu_pending;

	/* Association : The smallest PMTU discovered for all of the
	 * PMTU	       : peer's transport addresses.
	 */
	__u32 pathmtu;

	/* Flags controlling Heartbeat, SACK delay, and Path MTU Discovery. */
	__u32 param_flags;

	__u32 sackfreq;
	/* SACK delay timeout */
	unsigned long sackdelay;

	unsigned long timeouts[SCTP_NUM_TIMEOUT_TYPES];
	struct timer_list timers[SCTP_NUM_TIMEOUT_TYPES];

	/* Transport to which SHUTDOWN chunk was last sent.  */
	struct sctp_transport *shutdown_last_sent_to;

	/* Transport to which INIT chunk was last sent.  */
	struct sctp_transport *init_last_sent_to;

	/* How many times have we resent a SHUTDOWN */
	int shutdown_retries;

	/* Next TSN    : The next TSN number to be assigned to a new
	 *	       : DATA chunk.  This is sent in the INIT or INIT
	 *	       : ACK chunk to the peer and incremented each
	 *	       : time a DATA chunk is assigned a TSN
	 *	       : (normally just prior to transmit or during
	 *	       : fragmentation).
	 */
	__u32 next_tsn;

	/*
	 * Last Rcvd   : This is the last TSN received in sequence.  This value
	 * TSN	       : is set initially by taking the peer's Initial TSN,
	 *	       : received in the INIT or INIT ACK chunk, and
	 *	       : subtracting one from it.
	 *
	 * Most of RFC 2960 refers to this as the Cumulative TSN Ack Point.
	 */

	__u32 ctsn_ack_point;

	/* PR-SCTP Advanced.Peer.Ack.Point */
	__u32 adv_peer_ack_point;

	/* Highest TSN that is acknowledged by incoming SACKs. */
	__u32 highest_sacked;

	/* TSN marking the fast recovery exit point */
	__u32 fast_recovery_exit;

	/* Flag to track the current fast recovery state */
	__u8 fast_recovery;

	/* The number of unacknowledged data chunks.  Reported through
	 * the SCTP_STATUS sockopt.
	 */
	__u16 unack_data;

	/* The total number of data chunks that we've had to retransmit
	 * as the result of a T3 timer expiration
	 */
	__u32 rtx_data_chunks;

	/* This is the association's receive buffer space.  This value is used
	 * to set a_rwnd field in an INIT or a SACK chunk.
	 */
	__u32 rwnd;

	/* This is the last advertised value of rwnd over a SACK chunk. */
	__u32 a_rwnd;

	/* Number of bytes by which the rwnd has slopped.  The rwnd is allowed
	 * to slop over a maximum of the association's frag_point.
	 */
	__u32 rwnd_over;

	/* Keeps track of rwnd pressure.  This happens when we have
	 * a window, but not receive buffer (i.e. small packets).  This one
	 * is released slowly (1 PMTU at a time).
	 */
	__u32 rwnd_press;

	/* This is the sndbuf size in use for the association.
	 * This corresponds to the sndbuf size for the association,
	 * as specified in the sk->sndbuf.
	 */
	int sndbuf_used;

	/* This is the amount of memory that this association has allocated
	 * in the receive path at any given time.
	 */
	atomic_t rmem_alloc;

	/* This is the wait queue head for send requests waiting on
	 * the association sndbuf space.
	 */
	wait_queue_head_t	wait;

	/* The message size at which SCTP fragmentation will occur. */
	__u32 frag_point;
	__u32 user_frag;

	/* Counter used to count INIT errors. */
	int init_err_counter;

	/* Count the number of INIT cycles (for doubling timeout). */
	int init_cycle;

	/* Default send parameters. */
	__u16 default_stream;
	__u16 default_flags;
	__u32 default_ppid;
	__u32 default_context;
	__u32 default_timetolive;

	/* Default receive parameters */
	__u32 default_rcv_context;

	/* Stream arrays */
	struct sctp_stream *stream;

	/* All outbound chunks go through this structure.  */
	struct sctp_outq outqueue;

	/* A smart pipe that will handle reordering and fragmentation,
	 * as well as handle passing events up to the ULP.
	 */
	struct sctp_ulpq ulpq;

	/* Last TSN that caused an ECNE Chunk to be sent.  */
	__u32 last_ecne_tsn;

	/* Last TSN that caused a CWR Chunk to be sent.	 */
	__u32 last_cwr_tsn;

	/* How many duplicated TSNs have we seen?  */
	int numduptsns;

	/* These are to support
	 * "SCTP Extensions for Dynamic Reconfiguration of IP Addresses
	 *  and Enforcement of Flow and Message Limits"
	 * <draft-ietf-tsvwg-addip-sctp-02.txt>
	 * or "ADDIP" for short.
	 */



	/* ADDIP Section 4.1.1 Congestion Control of ASCONF Chunks
	 *
	 * R1) One and only one ASCONF Chunk MAY be in transit and
	 * unacknowledged at any one time.  If a sender, after sending
	 * an ASCONF chunk, decides it needs to transfer another
	 * ASCONF Chunk, it MUST wait until the ASCONF-ACK Chunk
	 * returns from the previous ASCONF Chunk before sending a
	 * subsequent ASCONF. Note this restriction binds each side,
	 * so at any time two ASCONF may be in-transit on any given
	 * association (one sent from each endpoint).
	 *
	 * [This is our one-and-only-one ASCONF in flight.  If we do
	 * not have an ASCONF in flight, this is NULL.]
	 */
	struct sctp_chunk *addip_last_asconf;

	/* ADDIP Section 5.2 Upon reception of an ASCONF Chunk.
	 *
	 * This is needed to implement items E1 - E4 of the updated
	 * spec.  Here is the justification:
	 *
	 * Since the peer may bundle multiple ASCONF chunks toward us,
	 * we now need the ability to cache multiple ACKs.  The section
	 * describes in detail how they are cached and cleaned up.
	 */
	struct list_head asconf_ack_list;

	/* These ASCONF chunks are waiting to be sent.
	 *
	 * These chunks can't be pushed to outqueue until receiving
	 * ASCONF_ACK for the previous ASCONF indicated by
	 * addip_last_asconf, so as to guarantee that only one ASCONF
	 * is in flight at any time.
	 *
	 * ADDIP Section 4.1.1 Congestion Control of ASCONF Chunks
	 *
	 * In defining the ASCONF Chunk transfer procedures, it is
	 * essential that these transfers MUST NOT cause congestion
	 * within the network.	To achieve this, we place these
	 * restrictions on the transfer of ASCONF Chunks:
	 *
	 * R1) One and only one ASCONF Chunk MAY be in transit and
	 * unacknowledged at any one time.  If a sender, after sending
	 * an ASCONF chunk, decides it needs to transfer another
	 * ASCONF Chunk, it MUST wait until the ASCONF-ACK Chunk
	 * returns from the previous ASCONF Chunk before sending a
	 * subsequent ASCONF. Note this restriction binds each side,
	 * so at any time two ASCONF may be in-transit on any given
	 * association (one sent from each endpoint).
	 *
	 *
	 * [I really think this is EXACTLY the sort of intelligence
	 *  which already resides in sctp_outq.	 Please move this
	 *  queue and its supporting logic down there.	--piggy]
	 */
	struct list_head addip_chunk_list;

	/* ADDIP Section 4.1 ASCONF Chunk Procedures
	 *
	 * A2) A serial number should be assigned to the Chunk. The
	 * serial number SHOULD be a monotonically increasing
	 * number. The serial number SHOULD be initialized at
	 * the start of the association to the same value as the
	 * Initial TSN and every time a new ASCONF chunk is created
	 * it is incremented by one after assigning the serial number
	 * to the newly created chunk.
	 *
	 * ADDIP
	 * 3.1.1  Address/Stream Configuration Change Chunk (ASCONF)
	 *
	 * Serial Number : 32 bits (unsigned integer)
	 *
	 * This value represents a Serial Number for the ASCONF
	 * Chunk. The valid range of Serial Number is from 0 to
	 * 4294967295 (2^32 - 1).  Serial Numbers wrap back to 0
	 * after reaching 4294967295.
	 */
	__u32 addip_serial;
	int src_out_of_asoc_ok;
	union sctp_addr *asconf_addr_del_pending;
	struct sctp_transport *new_transport;

	/* SCTP AUTH: list of the endpoint shared keys.  These
	 * keys are provided out of band by the user application
	 * and can't change during the lifetime of the association
	 */
	struct list_head endpoint_shared_keys;

	/* SCTP AUTH:
	 * The current generated association shared key (secret)
	 */
	struct sctp_auth_bytes *asoc_shared_key;

	/* SCTP AUTH: hmac id of the first peer requested algorithm
	 * that we support.
	 */
	__u16 default_hmac_id;

	__u16 active_key_id;

	__u8 need_ecne:1,	/* Need to send an ECNE Chunk? */
	     temp:1,		/* Is it a temporary association? */
	     force_delay:1,
	     prsctp_enable:1,
	     reconf_enable:1;

	__u8 strreset_enable;
	__u8 strreset_outstanding; /* request param count on the fly */

	__u32 strreset_outseq; /* Update after receiving response */
	__u32 strreset_inseq; /* Update after receiving request */
	__u32 strreset_result[2]; /* save the results of last 2 responses */

	struct sctp_chunk *strreset_chunk; /* save request chunk */

	struct sctp_priv_assoc_stats stats;

	int sent_cnt_removable;

	__u64 abandoned_unsent[SCTP_PR_INDEX(MAX) + 1];
	__u64 abandoned_sent[SCTP_PR_INDEX(MAX) + 1];
};

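Setting up an SCTP association is a four-way handshake.
[1] Endpoint A sends an INIT chunk to the endpoint Z it wants to communicate with. The Initiate Tag field of the INIT chunk carries a locally generated tag, and the packet carries a Verification Tag of 0;
[2] After sending the INIT chunk, the association enters the SCTP_STATE_COOKIE_WAIT state;
[3] In reply, endpoint Z sends an INIT-ACK chunk to endpoint A. The Initiate Tag field of this chunk carries a tag generated locally by Z, and Z uses the remote endpoint's Initiate Tag as the Verification Tag. Z also has to generate a state cookie and send it in the INIT-ACK reply;
[4] When endpoint A receives the INIT-ACK chunk it leaves the SCTP_STATE_COOKIE_WAIT state. From now on A uses the remote endpoint's Initiate Tag as the Verification Tag in every packet it sends. A then sends the state cookie back in a COOKIE ECHO chunk and enters the SCTP_STATE_COOKIE_ECHOED state;
[5] On receiving the COOKIE ECHO chunk, endpoint Z creates a Transmission Control Block (TCB), the data structure that holds the connection information for one end of an SCTP association. Z then moves to the SCTP_STATE_ESTABLISHED state and replies with a COOKIE ACK chunk. At this point the association is established at Z, and it uses the saved tags;
[6] On receiving the COOKIE ACK, endpoint A moves from SCTP_STATE_COOKIE_ECHOED to SCTP_STATE_ESTABLISHED.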
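The handshake steps above can be summarized as a small state machine. This is only a sketch of the transitions described in the list: the enum and function names are hypothetical, and the kernel drives its SCTP_STATE_* values through its own state-machine tables rather than a switch like this one.

```c
/* Hypothetical names, modelled on the states named in the steps above. */
enum hs_state {
    HS_CLOSED,
    HS_COOKIE_WAIT,
    HS_COOKIE_ECHOED,
    HS_ESTABLISHED
};

enum hs_event {
    EV_SEND_INIT,           /* A sends INIT */
    EV_RECV_INIT_ACK,       /* A receives INIT-ACK, sends COOKIE ECHO */
    EV_RECV_COOKIE_ECHO,    /* Z receives COOKIE ECHO, sends COOKIE ACK */
    EV_RECV_COOKIE_ACK      /* A receives COOKIE ACK */
};

/* Return the state that follows s when event e occurs. Events that do
 * not apply in the current state leave it unchanged. */
enum hs_state hs_next(enum hs_state s, enum hs_event e)
{
    switch (s) {
    case HS_CLOSED:
        if (e == EV_SEND_INIT)
            return HS_COOKIE_WAIT;      /* A: steps 1 and 2 */
        if (e == EV_RECV_COOKIE_ECHO)
            return HS_ESTABLISHED;      /* Z: step 5 */
        break;
    case HS_COOKIE_WAIT:
        if (e == EV_RECV_INIT_ACK)
            return HS_COOKIE_ECHOED;    /* A: step 4 */
        break;
    case HS_COOKIE_ECHOED:
        if (e == EV_RECV_COOKIE_ACK)
            return HS_ESTABLISHED;      /* A: step 6 */
        break;
    case HS_ESTABLISHED:
        break;
    }
    return s;
}
```

Note that Z has no transition for the incoming INIT: it answers with INIT-ACK and stays in the closed state, which is what keeps the server stateless until the cookie comes back.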

3. Receiving SCTP packets
The main handler responsible for receiving SCTP packets is sctp_rcv().

4. Sending SCTP packets
Writes to an SCTP socket from user space are handled by sctp_sendmsg().

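Datagram Congestion Control Protocol (DCCP)

DCCP is an unreliable, congestion-controlled transport protocol. It borrows from both UDP and TCP and adds new features. Like UDP it is message-oriented and unreliable; like TCP it is connection-oriented and uses a three-way handshake to set up a connection.

It is an unreliable transport protocol with congestion control, and it offers several congestion control mechanisms that are negotiated when communication starts. Apart from reserved and custom identifiers, DCCP currently defines two congestion control mechanisms: TCP-like (CCID 2) and TFRC (CCID 3). TCP-like follows TCP's AIMD (additive increase, multiplicative decrease) behaviour, while TFRC is TCP-Friendly Rate Control.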

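Acknowledged but unreliable datagram flow: data is transferred in Data and DataAck packets. A Data packet carries only application data, while a DataAck packet can carry both data and acknowledgement information.
Reliable negotiation of options, including negotiation of a suitable congestion control mechanism. Half-connections: two hosts are connected by two half-connections, which may use different congestion control mechanisms, each identified by a congestion control ID (CCID). Each CCID specifies how its endpoints respond to ECN reports.
Multihoming and mobility: the DCCP design described support for multihoming, allowing an endpoint to tell its peer about a change of address or port during a connection. When a mobile endpoint obtains a new address, it sends a DCCP-Move packet from the new address to the stationary endpoint, which then updates the connection state to use the new address. (DCCP-Move is not among the packet types defined in RFC 4340.) In addition, DCCP uses a cache in place of TCP's probe segments, which reduces network overhead.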

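Every DCCP packet starts with a DCCP header, which is at least 12 bytes long. The header has a variable length of 12 to 1020 bytes, depending on whether short sequence numbers are used and which TLV packet options are included.

DCCP sequence numbers count packets sent rather than bytes, and can be shortened from 6 bytes to 3 bytes.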
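The effect of short and long sequence numbers on the header can be shown with a small parser for the generic header. This sketch assumes the layout given in RFC 4340: ports in bytes 0 to 3, then data offset, CCVal/CsCov and checksum, then a byte holding 3 reserved bits, the 4-bit type and the X bit. With X = 1 a reserved byte and a 48-bit sequence number follow (16 bytes in total); with X = 0 a 24-bit sequence number follows (12 bytes in total). struct dccp_generic and dccp_parse_generic() are hypothetical names, not the kernel's struct dccp_hdr.

```c
#include <stddef.h>
#include <stdint.h>

struct dccp_generic {
    uint16_t sport, dport;
    uint8_t  type;          /* 4-bit packet type */
    uint8_t  x;             /* 1: 48-bit sequence number, 0: 24-bit */
    uint64_t seq;
    size_t   generic_len;   /* 16 when x == 1, 12 when x == 0 */
};

/* Parse the generic header at the start of buf. Returns 0 on success,
 * -1 if the buffer is too short for the header it announces. */
int dccp_parse_generic(const uint8_t *buf, size_t len,
                       struct dccp_generic *h)
{
    if (len < 12)
        return -1;
    h->sport = (uint16_t)((buf[0] << 8) | buf[1]);
    h->dport = (uint16_t)((buf[2] << 8) | buf[3]);
    /* Byte 8 holds 3 reserved bits, 4 bits of type, and the X bit. */
    h->type = (buf[8] >> 1) & 0x0f;
    h->x = buf[8] & 0x01;
    if (h->x) {
        int i;

        if (len < 16)
            return -1;
        /* Byte 9 is reserved; bytes 10..15 carry the 48-bit number. */
        h->seq = 0;
        for (i = 10; i < 16; i++)
            h->seq = (h->seq << 8) | buf[i];
        h->generic_len = 16;
    } else {
        /* Bytes 9..11 carry the low 24 bits only. */
        h->seq = ((uint64_t)buf[9] << 16) | ((uint64_t)buf[10] << 8) |
                 buf[11];
        h->generic_len = 12;
    }
    return 0;
}
```

Options and type-specific fields follow the generic header, which is how the full header can grow well beyond these 12 or 16 bytes.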

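1. Initializing a DCCP socket
In user space a DCCP socket is created with the socket() system call; the type argument (SOCK_DCCP) indicates that a DCCP socket is wanted.

The fields of the DCCP socket are initialized to sane defaults; for example, the socket state is set to DCCP_CLOSED;
The DCCP timers are initialized by calling dccp_init_xmit_timers();
The feature negotiation part is initialized by calling dccp_feat_init().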
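From user space, creating the socket looks roughly like this. The sketch assumes a Linux system whose C library headers define SOCK_DCCP and IPPROTO_DCCP; the helper name open_dccp_socket() is made up for this example.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Create an IPv4 DCCP socket. Returns the descriptor, or -1 with errno
 * set (for example when the running kernel has no DCCP support). */
int open_dccp_socket(void)
{
    return socket(AF_INET, SOCK_DCCP, IPPROTO_DCCP);
}
```

A caller should always check for -1, since DCCP is an optional kernel feature and is not available on every system.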

2. Receiving DCCP packets from the network layer (L3)

dccp_v4_rcv() is the handler that receives DCCP packets from the network layer:

/* this is called when real data arrives */
static int dccp_v4_rcv(struct sk_buff *skb)
{
	const struct dccp_hdr *dh;
	const struct iphdr *iph;
	bool refcounted;
	struct sock *sk;
	int min_cov;

	/* Step 1: Check header basics */
	// First discard invalid packets, e.g. packets not addressed to this host or shorter than a DCCP header
	if (dccp_invalid_packet(skb))
		goto discard_it;

	iph = ip_hdr(skb);
	/* Step 1: If header checksum is incorrect, drop packet and return */
	if (dccp_v4_csum_finish(skb, iph->saddr, iph->daddr)) {
		DCCP_WARN("dropped packet with invalid checksum\n");
		goto discard_it;
	}

	dh = dccp_hdr(skb);

	DCCP_SKB_CB(skb)->dccpd_seq  = dccp_hdr_seq(dh);
	DCCP_SKB_CB(skb)->dccpd_type = dh->dccph_type;

	dccp_pr_debug("%8.8s src=%pI4@%-5d dst=%pI4@%-5d seq=%llu",
		      dccp_packet_name(dh->dccph_type),
		      &iph->saddr, ntohs(dh->dccph_sport),
		      &iph->daddr, ntohs(dh->dccph_dport),
		      (unsigned long long) DCCP_SKB_CB(skb)->dccpd_seq);

	if (dccp_packet_without_ack(skb)) {
		DCCP_SKB_CB(skb)->dccpd_ack_seq = DCCP_PKT_WITHOUT_ACK_SEQ;
		dccp_pr_debug_cat("\n");
	} else {
		DCCP_SKB_CB(skb)->dccpd_ack_seq = dccp_hdr_ack_seq(skb);
		dccp_pr_debug_cat(", ack=%llu\n", (unsigned long long)
				  DCCP_SKB_CB(skb)->dccpd_ack_seq);
	}

lookup:
	// Look up the socket that matches this flow
	sk = __inet_lookup_skb(&dccp_hashinfo, skb, __dccp_hdr_len(dh),
			       dh->dccph_sport, dh->dccph_dport, &refcounted);
	// If no matching socket is found, drop the packet
	if (!sk) {
		dccp_pr_debug("failed to look up flow ID in table and "
			      "get corresponding socket\n");
		goto no_dccp_socket;
	}

	/*
	 * Step 2:
	 *	... or S.state == TIMEWAIT,
	 *		Generate Reset(No Connection) unless P.type == Reset
	 *		Drop packet and return
	 */
	if (sk->sk_state == DCCP_TIME_WAIT) {
		dccp_pr_debug("sk->sk_state == DCCP_TIME_WAIT: do_time_wait\n");
		inet_twsk_put(inet_twsk(sk));
		goto no_dccp_socket;
	}

	if (sk->sk_state == DCCP_NEW_SYN_RECV) {
		struct request_sock *req = inet_reqsk(sk);
		struct sock *nsk;

		sk = req->rsk_listener;
		if (unlikely(sk->sk_state != DCCP_LISTEN)) {
			inet_csk_reqsk_queue_drop_and_put(sk, req);
			goto lookup;
		}
		sock_hold(sk);
		refcounted = true;
		nsk = dccp_check_req(sk, skb, req);
		if (!nsk) {
			reqsk_put(req);
			goto discard_and_relse;
		}
		if (nsk == sk) {
			reqsk_put(req);
		} else if (dccp_child_process(sk, nsk, skb)) {
			dccp_v4_ctl_send_reset(sk, skb);
			goto discard_and_relse;
		} else {
			sock_put(sk);
			return 0;
		}
	}
	/*
	 * RFC 4340, sec. 9.2.1: Minimum Checksum Coverage
	 *	o if MinCsCov = 0, only packets with CsCov = 0 are accepted
	 *	o if MinCsCov > 0, also accept packets with CsCov >= MinCsCov
	 */
	min_cov = dccp_sk(sk)->dccps_pcrlen;
	if (dh->dccph_cscov && (min_cov == 0 || dh->dccph_cscov < min_cov))  {
		dccp_pr_debug("Packet CsCov %d does not satisfy MinCsCov %d\n",
			      dh->dccph_cscov, min_cov);
		/* FIXME: "Such packets SHOULD be reported using Data Dropped
		 *         options (Section 11.7) with Drop Code 0, Protocol
		 *         Constraints."                                     */
		goto discard_and_relse;
	}

	if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb))
		goto discard_and_relse;
	nf_reset(skb);

	// All checksum and integrity checks passed; call __sk_receive_skb() to hand the packet to the transport layer (L4)
	return __sk_receive_skb(sk, skb, 1, dh->dccph_doff * 4, refcounted);

no_dccp_socket:
	if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
		goto discard_it;
	/*
	 * Step 2:
	 *	If no socket ...
	 *		Generate Reset(No Connection) unless P.type == Reset
	 *		Drop packet and return
	 */
	if (dh->dccph_type != DCCP_PKT_RESET) {
		DCCP_SKB_CB(skb)->dccpd_reset_code =
					DCCP_RESET_CODE_NO_CONNECTION;
		dccp_v4_ctl_send_reset(sk, skb);
	}

discard_it:
	kfree_skb(skb);
	return 0;

discard_and_relse:
	if (refcounted)
		sock_put(sk);
	goto discard_it;
}
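The minimum checksum coverage test in dccp_v4_rcv() is compact and easy to misread, so here it is restated as a standalone predicate. The function name dccp_cscov_acceptable() is hypothetical; the logic is the negation of the condition that leads to discard_and_relse in the listing above.

```c
#include <stdbool.h>

/* cscov:   CsCov field of the received packet (0 means the checksum
 *          covers the whole packet).
 * min_cov: the socket's minimum checksum coverage (dccps_pcrlen in the
 *          listing above).
 * Returns true if the packet passes the coverage test. */
bool dccp_cscov_acceptable(unsigned int cscov, unsigned int min_cov)
{
    if (cscov == 0)
        return true;            /* full coverage is always accepted */
    if (min_cov == 0)
        return false;           /* receiver insists on full coverage */
    return cscov >= min_cov;    /* partial coverage, but enough of it */
}
```

For example, a receiver with min_cov = 5 accepts packets with CsCov 0, 5 or 7 and rejects a packet with CsCov 3.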

3. Sending DCCP packets

When data is sent from a user-space DCCP socket, it is ultimately handled in the kernel by dccp_sendmsg().

RFC 5597 explains why DCCP cannot pass through a NAT without dedicated support:

Because changing the source or destination IP address of a DCCP packet will normally invalidate the DCCP checksum, it is not possible to use DCCP through a NAT without dedicated support. Some NAT devices are known to provide "generic" transport-protocol support, whereby only the IP header is mangled. That scheme is not sufficient to support DCCP.
https://rfc2cn.com/rfc5597.html
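The reason an address rewrite breaks the checksum is that the checksum is computed over a pseudo-header containing the IP addresses as well as over the transport segment. The sketch below is a generic Internet checksum over an IPv4 pseudo-header, in the style used by TCP, UDP and DCCP. The function names are made up for this example, and it ignores DCCP's partial checksum coverage.

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running one's-complement sum, reading the data as
 * big-endian 16-bit words (an odd trailing byte is padded with zero). */
static uint32_t csum_add(uint32_t sum, const uint8_t *p, size_t len)
{
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)p[0] << 8;
    return sum;
}

/* Internet checksum over an IPv4 pseudo-header (source address,
 * destination address, zero, protocol, length) followed by the
 * transport segment. */
uint16_t pseudo_csum(uint32_t saddr, uint32_t daddr, uint8_t proto,
                     const uint8_t *seg, uint16_t seg_len)
{
    const uint8_t ph[12] = {
        (uint8_t)(saddr >> 24), (uint8_t)(saddr >> 16),
        (uint8_t)(saddr >> 8),  (uint8_t)saddr,
        (uint8_t)(daddr >> 24), (uint8_t)(daddr >> 16),
        (uint8_t)(daddr >> 8),  (uint8_t)daddr,
        0, proto,
        (uint8_t)(seg_len >> 8), (uint8_t)seg_len
    };
    uint32_t sum = csum_add(0, ph, sizeof(ph));

    sum = csum_add(sum, seg, seg_len);
    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)~sum;
}
```

If a NAT rewrites the source address from 10.0.0.1 to 203.0.113.7 but leaves the transport header alone, the value computed by the receiver no longer matches the checksum the sender stored, and the packet is dropped. A NAT that supports the protocol must therefore update the transport checksum as well.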

http://t.csdnimg.cn/AN4yS
https://www.flydean.com/21-sctp/
https://zh.wikipedia.org/wiki/%E6%95%B0%E6%8D%AE%E6%8B%A5%E5%A1%9E%E6%8E%A7%E5%88%B6%E5%8D%8F%E8%AE%AE
