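Stream Control Transmission Protocol (SCTP)
SCTP (Stream Control Transmission Protocol) is specified in RFC 4960, published in 2007, although it was first defined in 2000 (RFC 2960). SCTP was designed to carry Public Switched Telephone Network (PSTN) signaling over IP networks.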
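To guarantee reliable delivery, TCP enforces strict ordering. Suppose a message is split into three segments labeled A, B, and C. The receiver must hand them to the application in the order A, B, C. If B arrives before A, it cannot be delivered: the receiver has to hold it until A arrives, which may require the sender to retransmit A.
Under this strict ordering requirement, a single lost segment can cause unnecessary delay and block the messages queued behind it (head-of-line blocking).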
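Because TCP is stream-oriented, it does not preserve record boundaries. To tell the records in the stream apart, the application has to add its own markers or encoding to the data.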
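One common way to mark records in a byte stream is a length prefix. The following is a minimal sketch, not taken from the original article; the function names `frame_record` and `unframe_record` and the 2-byte big-endian prefix are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write one record into buf as a 2-byte big-endian length followed by
 * the payload. Returns bytes written, or 0 if the record does not fit. */
size_t frame_record(uint8_t *buf, size_t cap, const void *payload, uint16_t len)
{
    if (cap < (size_t)len + 2)
        return 0;
    buf[0] = (uint8_t)(len >> 8);
    buf[1] = (uint8_t)(len & 0xff);
    memcpy(buf + 2, payload, len);
    return (size_t)len + 2;
}

/* Try to extract one complete record from the front of a received byte
 * stream. Returns bytes consumed, or 0 if a full record has not arrived. */
size_t unframe_record(const uint8_t *stream, size_t avail,
                      const uint8_t **payload, uint16_t *len)
{
    if (avail < 2)
        return 0;
    uint16_t n = (uint16_t)((stream[0] << 8) | stream[1]);
    if (avail < (size_t)n + 2)
        return 0;
    *payload = stream + 2;
    *len = n;
    return (size_t)n + 2;
}
```

A message-oriented protocol such as SCTP makes this extra layer unnecessary, because each send is delivered as one message.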
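In addition, to improve efficiency and avoid sending many small packets, TCP may coalesce data: it waits for several small writes and merges them into one larger segment (the Nagle algorithm). An application that does not want this delay has to say so explicitly, typically by setting the TCP_NODELAY socket option; the PSH flag in the TCP header only tells the receiver to pass the data to the application without further buffering.
Finally, TCP is vulnerable to denial-of-service (DoS) attacks such as SYN flooding.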
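SCTP improves on both TCP and UDP. It is message-oriented like UDP, yet it offers TCP's reliability, ordered delivery, and congestion control. It also supports multi-homing and redundant paths, which improves resilience and reliability.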
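SCTP has two main characteristics:
(1) Message-based: SCTP is message-oriented. It transports a sequence of messages, each of which is a group of bytes.
(2) Multi-streaming: SCTP can carry several independent streams in parallel within one association.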
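To illustrate why independent streams avoid head-of-line blocking, here is a simplified sketch of per-stream ordering. It is an assumption-laden model, not kernel code: `NUM_STREAMS`, `struct stream_rx`, and the two functions are hypothetical, and a real receiver would also buffer out-of-order messages instead of merely rejecting them.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_STREAMS 4   /* hypothetical number of streams */

struct stream_rx {
    uint16_t next_ssn[NUM_STREAMS];  /* next expected sequence number per stream */
};

/* A message is deliverable as soon as it is the next one expected on
 * its own stream, regardless of gaps on any other stream. */
bool deliverable(const struct stream_rx *rx, uint16_t stream, uint16_t ssn)
{
    return stream < NUM_STREAMS && rx->next_ssn[stream] == ssn;
}

/* Deliver a message and advance that stream's expected sequence number.
 * Returns false if the message has to wait (or the stream is invalid). */
bool deliver(struct stream_rx *rx, uint16_t stream, uint16_t ssn)
{
    if (!deliverable(rx, stream, ssn))
        return false;
    rx->next_ssn[stream]++;
    return true;
}
```

In this model a missing message on stream 0 delays only stream 0; messages on stream 1 are still delivered in order.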
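A TCP connection is strictly one-to-one: one client address and one server address. If the IP address or port at either end becomes unavailable, the whole TCP connection breaks.
SCTP can be seen as an upgraded TCP that improves reliability in this respect. An SCTP node can bind multiple IP addresses, and each node uses a heartbeat mechanism to periodically check the reachability of the remote node's primary IP address and its redundant backup addresses.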
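The heartbeat mechanism exchanges HEARTBEAT and HEARTBEAT-ACK chunks to check path connectivity. Once the number of unacknowledged heartbeats reaches the threshold, the IP address is declared failed. By default a HEARTBEAT chunk is sent every 30 seconds to monitor an idle destination transport address. The interval can be configured through /proc/sys/net/sctp/hb_interval, whose default value is 30000 (milliseconds, that is, 30 seconds).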
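The failure-detection rule described above can be modeled in a few lines. This is only a sketch under stated assumptions: `struct path_state` and its functions are hypothetical names, and the threshold is supplied by the caller instead of being read from any real configuration.

```c
#include <stdbool.h>

struct path_state {
    int  error_count;       /* consecutive heartbeats without a HEARTBEAT-ACK */
    int  path_max_retrans;  /* failure threshold (caller-chosen, hypothetical) */
    bool active;            /* is the destination address considered usable? */
};

void path_init(struct path_state *p, int path_max_retrans)
{
    p->error_count = 0;
    p->path_max_retrans = path_max_retrans;
    p->active = true;
}

/* Called when a HEARTBEAT was sent and no HEARTBEAT-ACK came back in time. */
void on_heartbeat_timeout(struct path_state *p)
{
    p->error_count++;
    if (p->error_count > p->path_max_retrans)
        p->active = false;   /* declare this address failed */
}

/* Called when a HEARTBEAT-ACK arrives: the path is reachable again. */
void on_heartbeat_ack(struct path_state *p)
{
    p->error_count = 0;
    p->active = true;
}
```

With a threshold of 2, the third consecutive missed acknowledgment marks the address inactive, and a later HEARTBEAT-ACK restores it.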
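SCTP uses a four-way handshake. On receiving a connection request, the server does not allocate memory for the association right away; instead it returns a COOKIE to the client. The client must include this COOKIE in its next request, and only after the server has validated the COOKIE, and thereby confirmed the client, is the association finally established. This avoids the SYN flood attack that affects TCP.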
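The idea behind the cookie is that the server keeps no per-client state until the client proves it received the reply. The sketch below illustrates this with a toy keyed checksum. Everything here is an assumption made for illustration: the struct layout and function names are hypothetical, and the FNV-1a hash is NOT cryptographically secure; a real implementation would use a proper HMAC and also bound the cookie lifetime.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_cookie {
    uint32_t peer_addr;   /* client IPv4 address */
    uint16_t peer_port;   /* client port */
    uint32_t peer_tag;    /* initiate tag taken from the client's INIT */
    uint64_t mac;         /* keyed checksum over the fields above */
};

/* FNV-1a step: toy stand-in for a real MAC, insecure by design. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
    const uint8_t *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

static uint64_t toy_mac(uint64_t secret, const struct toy_cookie *c)
{
    uint64_t h = 14695981039346656037ULL;
    h = fnv1a(h, &secret, sizeof secret);
    h = fnv1a(h, &c->peer_addr, sizeof c->peer_addr);
    h = fnv1a(h, &c->peer_port, sizeof c->peer_port);
    h = fnv1a(h, &c->peer_tag, sizeof c->peer_tag);
    return h;
}

/* Server side, on INIT: build the cookie and then forget about the client. */
struct toy_cookie make_cookie(uint64_t secret, uint32_t addr,
                              uint16_t port, uint32_t tag)
{
    struct toy_cookie c = { addr, port, tag, 0 };
    c.mac = toy_mac(secret, &c);
    return c;
}

/* Server side, on COOKIE ECHO: accept only cookies this server issued. */
bool verify_cookie(uint64_t secret, const struct toy_cookie *c)
{
    return c->mac == toy_mac(secret, c);
}
```

Because the association state travels inside the cookie, a flood of forged initial requests costs the server no stored state.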
SCTP initialization is performed by sctp_init(), which allocates memory for various structures and registers SCTP with IPv4 and IPv6.
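1. SCTP packets and chunks
Every SCTP packet has a common SCTP header followed by one or more chunks. A chunk carries either data or SCTP control information. The kernel represents a chunk with struct sctp_chunk: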
/* RFC2960 1.4 Key Terms
*
* o Chunk: A unit of information within an SCTP packet, consisting of
* a chunk header and chunk-specific content.
*
* As a matter of convenience, we remember the SCTP common header for
* each chunk as well as a few other header pointers...
*/
struct sctp_chunk {
struct list_head list;
atomic_t refcnt;
/* How many times this chunk have been sent, for prsctp RTX policy */
int sent_count;
/* This is our link to the per-transport transmitted list. */
struct list_head transmitted_list;
/* This field is used by chunks that hold fragmented data.
* For the first fragment this is the list that holds the rest of
* fragments. For the remaining fragments, this is the link to the
* frag_list maintained in the first fragment.
*/
struct list_head frag_list;
/* This points to the sk_buff containing the actual data. */
struct sk_buff *skb;
/* In case of GSO packets, this will store the head one */
struct sk_buff *head_skb;
/* These are the SCTP headers by reverse order in a packet.
* Note that some of these may happen more than once. In that
* case, we point at the "current" one, whatever that means
* for that level of header.
*/
/* We point this at the FIRST TLV parameter to chunk_hdr. */
union sctp_params param_hdr;
union {
__u8 *v;
struct sctp_datahdr *data_hdr;
struct sctp_inithdr *init_hdr;
struct sctp_sackhdr *sack_hdr;
struct sctp_heartbeathdr *hb_hdr;
struct sctp_sender_hb_info *hbs_hdr;
struct sctp_shutdownhdr *shutdown_hdr;
struct sctp_signed_cookie *cookie_hdr;
struct sctp_ecnehdr *ecne_hdr;
struct sctp_cwrhdr *ecn_cwr_hdr;
struct sctp_errhdr *err_hdr;
struct sctp_addiphdr *addip_hdr;
struct sctp_fwdtsn_hdr *fwdtsn_hdr;
struct sctp_authhdr *auth_hdr;
} subh;
__u8 *chunk_end;
struct sctp_chunkhdr *chunk_hdr;
struct sctphdr *sctp_hdr;
/* This needs to be recoverable for SCTP_SEND_FAILED events. */
struct sctp_sndrcvinfo sinfo;
/* Which association does this belong to? */
struct sctp_association *asoc;
/* What endpoint received this chunk? */
struct sctp_ep_common *rcvr;
/* We fill this in if we are calculating RTT. */
unsigned long sent_at;
/* What is the origin IP address for this chunk? */
union sctp_addr source;
/* Destination address for this chunk. */
union sctp_addr dest;
/* For outbound message, track all fragments for SEND_FAILED. */
struct sctp_datamsg *msg;
/* For an inbound chunk, this tells us where it came from.
* For an outbound chunk, it tells us where we'd like it to
* go. It is NULL if we have no preference.
*/
struct sctp_transport *transport;
/* SCTP-AUTH: For the special case inbound processing of COOKIE-ECHO
* we need save a pointer to the AUTH chunk, since the SCTP-AUTH
* spec violates the principle premis that all chunks are processed
* in order.
*/
struct sk_buff *auth_chunk;
#define SCTP_CAN_FRTX 0x0
#define SCTP_NEED_FRTX 0x1
#define SCTP_DONT_FRTX 0x2
__u16 rtt_in_progress:1, /* This chunk used for RTT calc? */
has_tsn:1, /* Does this chunk have a TSN yet? */
has_ssn:1, /* Does this chunk have a SSN yet? */
singleton:1, /* Only chunk in the packet? */
end_of_packet:1, /* Last chunk in the packet? */
ecn_ce_done:1, /* Have we processed the ECN CE bit? */
pdiscard:1, /* Discard the whole packet now? */
tsn_gap_acked:1, /* Is this chunk acked by a GAP ACK? */
data_accepted:1, /* At least 1 chunk accepted */
auth:1, /* IN: was auth'ed | OUT: needs auth */
has_asconf:1, /* IN: have seen an asconf before */
tsn_missing_report:2, /* Data chunk missing counter. */
fast_retransmit:2; /* Is this chunk fast retransmitted? */
};
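The on-the-wire layout (a common header followed by chunks) can be walked with a small parser. This is a user-space sketch, not the kernel's code path; `sctp_list_chunks` is a hypothetical name. It assumes the RFC 4960 layout: a 12-byte common header, then chunks that each start with a 4-byte header (type, flags, 16-bit length) and are padded to a multiple of 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define SCTP_COMMON_HDR_LEN 12  /* ports (2+2), verification tag (4), checksum (4) */
#define SCTP_CHUNK_HDR_LEN   4  /* type (1), flags (1), length (2) */

/* Read a 16-bit big-endian value. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Walk the chunks of a raw SCTP packet, storing up to max_types chunk
 * types. Returns the number of chunks, or -1 if the packet is malformed. */
int sctp_list_chunks(const uint8_t *pkt, size_t len,
                     uint8_t *types, int max_types)
{
    if (len < SCTP_COMMON_HDR_LEN)
        return -1;

    size_t off = SCTP_COMMON_HDR_LEN;
    int n = 0;

    while (off < len) {
        if (len - off < SCTP_CHUNK_HDR_LEN)
            return -1;
        /* The length field counts the chunk header but not the padding. */
        uint16_t clen = rd16(pkt + off + 2);
        if (clen < SCTP_CHUNK_HDR_LEN || clen > len - off)
            return -1;
        if (n < max_types)
            types[n] = pkt[off];
        n++;
        off += ((size_t)clen + 3) & ~(size_t)3;  /* skip padding to 4 bytes */
    }
    return n;
}
```

Bundling several chunks into one packet is what lets control information such as a SACK travel together with user data.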
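2. SCTP associations
SCTP speaks of associations rather than connections. A connection is communication between two IP addresses, whereas an association is communication between two endpoints, each of which may have several IP addresses. In the kernel source an SCTP association is represented by struct sctp_association: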
/* RFC2960
*
* 12. Recommended Transmission Control Block (TCB) Parameters
*
* This section details a recommended set of parameters that should
* be contained within the TCB for an implementation. This section is
* for illustrative purposes and should not be deemed as requirements
* on an implementation or as an exhaustive list of all parameters
* inside an SCTP TCB. Each implementation may need its own additional
* parameters for optimization.
*/
/* Here we have information about each individual association. */
struct sctp_association {
/* A base structure common to endpoint and association.
* In this context, it represents the associations's view
* of the local endpoint of the association.
*/
struct sctp_ep_common base;
/* Associations on the same socket. */
struct list_head asocs;
/* association id. */
// Unique ID of the association
sctp_assoc_t assoc_id;
/* This is our parent endpoint. */
struct sctp_endpoint *ep;
/* These are those association elements needed in the cookie. */
// Association elements needed by the cookie; related to the association state cookie (a sctp_cookie object)
struct sctp_cookie c;
/* This is all information about our peer. */
// Everything we know about the peer
struct {
/* transport_addr_list
*
* Peer : A list of SCTP transport addresses that the
* Transport : peer is bound to. This information is derived
* Address : from the INIT or INIT ACK and is used to
* List : associate an inbound packet with a given
* : association. Normally this information is
* : hashed or keyed for quick lookup and access
* : of the TCB.
* : The list is also initialized with the list
* : of addresses passed with the sctp_connectx()
* : call.
*
* It is a list of SCTP_transport's.
*/
struct list_head transport_addr_list;
/* rwnd
*
* Peer Rwnd : Current calculated value of the peer's rwnd.
*/
__u32 rwnd;
/* transport_count
*
* Peer : A count of the number of peer addresses
* Transport : in the Peer Transport Address List.
* Address :
* Count :
*/
__u16 transport_count;
/* port
* The transport layer port number.
*/
__u16 port;
/* primary_path
*
* Primary : This is the current primary destination
* Path : transport address of the peer endpoint. It
* : may also specify a source transport address
* : on this endpoint.
*
* All of these paths live on transport_addr_list.
*
* At the bakeoffs, we discovered that the intent of
* primaryPath is that it only changes when the ULP
* asks to have it changed. We add the activePath to
* designate the connection we are currently using to
* transmit new data and most control chunks.
*/
struct sctp_transport *primary_path; // address used to establish the initial connection
/* Cache the primary path address here, when we
* need a an address for msg_name.
*/
union sctp_addr primary_addr;
/* active_path
* The path that we are currently using to
* transmit new data and most control chunks.
*/
struct sctp_transport *active_path; // peer address currently used when sending data
/* retran_path
*
* RFC2960 6.4 Multi-homed SCTP Endpoints
* ...
* Furthermore, when its peer is multi-homed, an
* endpoint SHOULD try to retransmit a chunk to an
* active destination transport address that is
* different from the last destination address to
* which the DATA chunk was sent.
*/
struct sctp_transport *retran_path;
/* Pointer to last transport I have sent on. */
struct sctp_transport *last_sent_to;
/* This is the last transport I have received DATA on. */
struct sctp_transport *last_data_from;
/*
* Mapping An array of bits or bytes indicating which out of
* Array order TSN's have been received (relative to the
* Last Rcvd TSN). If no gaps exist, i.e. no out of
* order packets have been received, this array
* will be set to all zero. This structure may be
* in the form of a circular buffer or bit array.
*
* Last Rcvd : This is the last TSN received in
* TSN : sequence. This value is set initially by
* : taking the peer's Initial TSN, received in
* : the INIT or INIT ACK chunk, and subtracting
* : one from it.
*
* Throughout most of the specification this is called the
* "Cumulative TSN ACK Point". In this case, we
* ignore the advice in 12.2 in favour of the term
* used in the bulk of the text. This value is hidden
* in tsn_map--we get it by calling sctp_tsnmap_get_ctsn().
*/
struct sctp_tsnmap tsn_map;
/* This mask is used to disable sending the ASCONF chunk
* with specified parameter to peer.
*/
__be16 addip_disabled_mask;
/* These are capabilities which our peer advertised. */
__u8 ecn_capable:1, /* Can peer do ECN? */
ipv4_address:1, /* Peer understands IPv4 addresses? */
ipv6_address:1, /* Peer understands IPv6 addresses? */
hostname_address:1, /* Peer understands DNS addresses? */
asconf_capable:1, /* Does peer support ADDIP? */
prsctp_capable:1, /* Can peer do PR-SCTP? */
reconf_capable:1, /* Can peer do RE-CONFIG? */
auth_capable:1; /* Is peer doing SCTP-AUTH? */
/* sack_needed : This flag indicates if the next received
* : packet is to be responded to with a
* : SACK. This is initialized to 0. When a packet
* : is received sack_cnt is incremented. If this value
* : reaches 2 or more, a SACK is sent and the
* : value is reset to 0. Note: This is used only
* : when no DATA chunks are received out of
* : order. When DATA chunks are out of order,
* : SACK's are not delayed (see Section 6).
*/
__u8 sack_needed:1, /* Do we need to sack the peer? */
sack_generation:1,
zero_window_announced:1;
__u32 sack_cnt;
__u32 adaptation_ind; /* Adaptation Code point. */
struct sctp_inithdr_host i;
void *cookie;
int cookie_len;
/* ADDIP Section 4.2 Upon reception of an ASCONF Chunk.
* C1) ... "Peer-Serial-Number'. This value MUST be initialized to the
* Initial TSN Value minus 1
*/
__u32 addip_serial;
/* SCTP-AUTH: We need to know pears random number, hmac list
* and authenticated chunk list. All that is part of the
* cookie and these are just pointers to those locations
*/
sctp_random_param_t *peer_random;
sctp_chunks_param_t *peer_chunks;
sctp_hmac_algo_param_t *peer_hmacs;
} peer; // internal structure representing the association's peer endpoint
/* State : A state variable indicating what state the
* : association is in, i.e. COOKIE-WAIT,
* : COOKIE-ECHOED, ESTABLISHED, SHUTDOWN-PENDING,
* : SHUTDOWN-SENT, SHUTDOWN-RECEIVED, SHUTDOWN-ACK-SENT.
*
* Note: No "CLOSED" state is illustrated since if a
* association is "CLOSED" its TCB SHOULD be removed.
*
* In this implementation we DO have a CLOSED
* state which is used during initiation and shutdown.
*
* State takes values from SCTP_STATE_*.
*/
sctp_state_t state;
/* Overall : The overall association error count.
* Error Count : [Clear this any time I get something.]
*/
int overall_error_count;
/* The cookie life I award for any cookie. */
ktime_t cookie_life;
/* These are the association's initial, max, and min RTO values.
* These values will be initialized by system defaults, but can
* be modified via the SCTP_RTOINFO socket option.
*/
unsigned long rto_initial;
unsigned long rto_max;
unsigned long rto_min;
/* Maximum number of new data packets that can be sent in a burst. */
int max_burst;
/* This is the max_retrans value for the association. This value will
* be initialized initialized from system defaults, but can be
* modified by the SCTP_ASSOCINFO socket option.
*/
int max_retrans;
/* This is the partially failed retrans value for the transport
* and will be initialized from the assocs value. This can be
* changed using the SCTP_PEER_ADDR_THLDS socket option
*/
int pf_retrans;
/* Maximum number of times the endpoint will retransmit INIT */
__u16 max_init_attempts;
/* How many times have we resent an INIT? */
__u16 init_retries;
/* The largest timeout or RTO value to use in attempting an INIT */
unsigned long max_init_timeo;
/* Heartbeat interval: The endpoint sends out a Heartbeat chunk to
* the destination address every heartbeat interval. This value
* will be inherited by all new transports.
*/
unsigned long hbinterval;
/* This is the max_retrans value for new transports in the
* association.
*/
__u16 pathmaxrxt;
/* Flag that path mtu update is pending */
__u8 pmtu_pending;
/* Association : The smallest PMTU discovered for all of the
* PMTU : peer's transport addresses.
*/
__u32 pathmtu;
/* Flags controlling Heartbeat, SACK delay, and Path MTU Discovery. */
__u32 param_flags;
__u32 sackfreq;
/* SACK delay timeout */
unsigned long sackdelay;
unsigned long timeouts[SCTP_NUM_TIMEOUT_TYPES];
struct timer_list timers[SCTP_NUM_TIMEOUT_TYPES];
/* Transport to which SHUTDOWN chunk was last sent. */
struct sctp_transport *shutdown_last_sent_to;
/* Transport to which INIT chunk was last sent. */
struct sctp_transport *init_last_sent_to;
/* How many times have we resent a SHUTDOWN */
int shutdown_retries;
/* Next TSN : The next TSN number to be assigned to a new
* : DATA chunk. This is sent in the INIT or INIT
* : ACK chunk to the peer and incremented each
* : time a DATA chunk is assigned a TSN
* : (normally just prior to transmit or during
* : fragmentation).
*/
__u32 next_tsn;
/*
* Last Rcvd : This is the last TSN received in sequence. This value
* TSN : is set initially by taking the peer's Initial TSN,
* : received in the INIT or INIT ACK chunk, and
* : subtracting one from it.
*
* Most of RFC 2960 refers to this as the Cumulative TSN Ack Point.
*/
__u32 ctsn_ack_point;
/* PR-SCTP Advanced.Peer.Ack.Point */
__u32 adv_peer_ack_point;
/* Highest TSN that is acknowledged by incoming SACKs. */
__u32 highest_sacked;
/* TSN marking the fast recovery exit point */
__u32 fast_recovery_exit;
/* Flag to track the current fast recovery state */
__u8 fast_recovery;
/* The number of unacknowledged data chunks. Reported through
* the SCTP_STATUS sockopt.
*/
__u16 unack_data;
/* The total number of data chunks that we've had to retransmit
* as the result of a T3 timer expiration
*/
__u32 rtx_data_chunks;
/* This is the association's receive buffer space. This value is used
* to set a_rwnd field in an INIT or a SACK chunk.
*/
__u32 rwnd;
/* This is the last advertised value of rwnd over a SACK chunk. */
__u32 a_rwnd;
/* Number of bytes by which the rwnd has slopped. The rwnd is allowed
* to slop over a maximum of the association's frag_point.
*/
__u32 rwnd_over;
/* Keeps treack of rwnd pressure. This happens when we have
* a window, but not recevie buffer (i.e small packets). This one
* is releases slowly (1 PMTU at a time ).
*/
__u32 rwnd_press;
/* This is the sndbuf size in use for the association.
* This corresponds to the sndbuf size for the association,
* as specified in the sk->sndbuf.
*/
int sndbuf_used;
/* This is the amount of memory that this association has allocated
* in the receive path at any given time.
*/
atomic_t rmem_alloc;
/* This is the wait queue head for send requests waiting on
* the association sndbuf space.
*/
wait_queue_head_t wait;
/* The message size at which SCTP fragmentation will occur. */
__u32 frag_point;
__u32 user_frag;
/* Counter used to count INIT errors. */
int init_err_counter;
/* Count the number of INIT cycles (for doubling timeout). */
int init_cycle;
/* Default send parameters. */
__u16 default_stream;
__u16 default_flags;
__u32 default_ppid;
__u32 default_context;
__u32 default_timetolive;
/* Default receive parameters */
__u32 default_rcv_context;
/* Stream arrays */
struct sctp_stream *stream;
/* All outbound chunks go through this structure. */
struct sctp_outq outqueue;
/* A smart pipe that will handle reordering and fragmentation,
* as well as handle passing events up to the ULP.
*/
struct sctp_ulpq ulpq;
/* Last TSN that caused an ECNE Chunk to be sent. */
__u32 last_ecne_tsn;
/* Last TSN that caused a CWR Chunk to be sent. */
__u32 last_cwr_tsn;
/* How many duplicated TSNs have we seen? */
int numduptsns;
/* These are to support
* "SCTP Extensions for Dynamic Reconfiguration of IP Addresses
* and Enforcement of Flow and Message Limits"
* <draft-ietf-tsvwg-addip-sctp-02.txt>
* or "ADDIP" for short.
*/
/* ADDIP Section 4.1.1 Congestion Control of ASCONF Chunks
*
* R1) One and only one ASCONF Chunk MAY be in transit and
* unacknowledged at any one time. If a sender, after sending
* an ASCONF chunk, decides it needs to transfer another
* ASCONF Chunk, it MUST wait until the ASCONF-ACK Chunk
* returns from the previous ASCONF Chunk before sending a
* subsequent ASCONF. Note this restriction binds each side,
* so at any time two ASCONF may be in-transit on any given
* association (one sent from each endpoint).
*
* [This is our one-and-only-one ASCONF in flight. If we do
* not have an ASCONF in flight, this is NULL.]
*/
struct sctp_chunk *addip_last_asconf;
/* ADDIP Section 5.2 Upon reception of an ASCONF Chunk.
*
* This is needed to implement itmes E1 - E4 of the updated
* spec. Here is the justification:
*
* Since the peer may bundle multiple ASCONF chunks toward us,
* we now need the ability to cache multiple ACKs. The section
* describes in detail how they are cached and cleaned up.
*/
struct list_head asconf_ack_list;
/* These ASCONF chunks are waiting to be sent.
*
* These chunaks can't be pushed to outqueue until receiving
* ASCONF_ACK for the previous ASCONF indicated by
* addip_last_asconf, so as to guarantee that only one ASCONF
* is in flight at any time.
*
* ADDIP Section 4.1.1 Congestion Control of ASCONF Chunks
*
* In defining the ASCONF Chunk transfer procedures, it is
* essential that these transfers MUST NOT cause congestion
* within the network. To achieve this, we place these
* restrictions on the transfer of ASCONF Chunks:
*
* R1) One and only one ASCONF Chunk MAY be in transit and
* unacknowledged at any one time. If a sender, after sending
* an ASCONF chunk, decides it needs to transfer another
* ASCONF Chunk, it MUST wait until the ASCONF-ACK Chunk
* returns from the previous ASCONF Chunk before sending a
* subsequent ASCONF. Note this restriction binds each side,
* so at any time two ASCONF may be in-transit on any given
* association (one sent from each endpoint).
*
*
* [I really think this is EXACTLY the sort of intelligence
* which already resides in sctp_outq. Please move this
* queue and its supporting logic down there. --piggy]
*/
struct list_head addip_chunk_list;
/* ADDIP Section 4.1 ASCONF Chunk Procedures
*
* A2) A serial number should be assigned to the Chunk. The
* serial number SHOULD be a monotonically increasing
* number. The serial number SHOULD be initialized at
* the start of the association to the same value as the
* Initial TSN and every time a new ASCONF chunk is created
* it is incremented by one after assigning the serial number
* to the newly created chunk.
*
* ADDIP
* 3.1.1 Address/Stream Configuration Change Chunk (ASCONF)
*
* Serial Number : 32 bits (unsigned integer)
*
* This value represents a Serial Number for the ASCONF
* Chunk. The valid range of Serial Number is from 0 to
* 4294967295 (2^32 - 1). Serial Numbers wrap back to 0
* after reaching 4294967295.
*/
__u32 addip_serial;
int src_out_of_asoc_ok;
union sctp_addr *asconf_addr_del_pending;
struct sctp_transport *new_transport;
/* SCTP AUTH: list of the endpoint shared keys. These
* keys are provided out of band by the user applicaton
* and can't change during the lifetime of the association
*/
struct list_head endpoint_shared_keys;
/* SCTP AUTH:
* The current generated assocaition shared key (secret)
*/
struct sctp_auth_bytes *asoc_shared_key;
/* SCTP AUTH: hmac id of the first peer requested algorithm
* that we support.
*/
__u16 default_hmac_id;
__u16 active_key_id;
__u8 need_ecne:1, /* Need to send an ECNE Chunk? */
temp:1, /* Is it a temporary association? */
force_delay:1,
prsctp_enable:1,
reconf_enable:1;
__u8 strreset_enable;
__u8 strreset_outstanding; /* request param count on the fly */
__u32 strreset_outseq; /* Update after receiving response */
__u32 strreset_inseq; /* Update after receiving request */
__u32 strreset_result[2]; /* save the results of last 2 responses */
struct sctp_chunk *strreset_chunk; /* save request chunk */
struct sctp_priv_assoc_stats stats;
int sent_cnt_removable;
__u64 abandoned_unsent[SCTP_PR_INDEX(MAX) + 1];
__u64 abandoned_sent[SCTP_PR_INDEX(MAX) + 1];
};
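Establishing an SCTP association is a four-way handshake:
【1】Endpoint A sends an INIT chunk to endpoint Z, the endpoint it wants to communicate with. The Initiate Tag field of the INIT chunk holds a locally generated tag, and the packet's Verification Tag is 0.
【2】After sending the INIT chunk, the association enters the SCTP_STATE_COOKIE_WAIT state.
【3】In response, endpoint Z sends an INIT-ACK chunk to endpoint A. Its Initiate Tag field holds a tag generated locally by Z, and Z uses A's Initiate Tag as the Verification Tag. Z also generates a state cookie and sends it in the INIT-ACK.
【4】When endpoint A receives the INIT-ACK chunk, it leaves the SCTP_STATE_COOKIE_WAIT state. From then on, A uses the remote endpoint's Initiate Tag as the Verification Tag in every packet it sends. A then returns the state cookie in a COOKIE ECHO chunk and enters the SCTP_STATE_COOKIE_ECHOED state.
【5】On receiving the COOKIE ECHO chunk, endpoint Z creates a Transmission Control Block (TCB), the data structure that holds one end's state for an SCTP association. Z then moves to the SCTP_STATE_ESTABLISHED state and replies with a COOKIE ACK chunk. At this point the association is established on Z's side and uses the saved tags.
【6】On receiving the COOKIE ACK, endpoint A moves from SCTP_STATE_COOKIE_ECHOED to SCTP_STATE_ESTABLISHED.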
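The state changes in the handshake above can be summarized as a small transition function. This is a sketch for illustration only: the enum and function names are hypothetical and shorter than the kernel's SCTP_STATE_* constants, and error paths, timers, and retransmissions are left out.

```c
enum assoc_state {
    ST_CLOSED,
    ST_COOKIE_WAIT,
    ST_COOKIE_ECHOED,
    ST_ESTABLISHED
};

enum assoc_event {
    EV_SEND_INIT,         /* A sends INIT */
    EV_RECV_INIT_ACK,     /* A receives INIT-ACK and sends COOKIE ECHO */
    EV_RECV_COOKIE_ECHO,  /* Z receives a valid COOKIE ECHO, sends COOKIE ACK */
    EV_RECV_COOKIE_ACK    /* A receives COOKIE ACK */
};

/* Return the next state; events that do not apply leave the state unchanged. */
enum assoc_state assoc_next(enum assoc_state s, enum assoc_event e)
{
    switch (s) {
    case ST_CLOSED:
        if (e == EV_SEND_INIT)
            return ST_COOKIE_WAIT;
        if (e == EV_RECV_COOKIE_ECHO)
            return ST_ESTABLISHED;   /* Z builds its TCB only now */
        break;
    case ST_COOKIE_WAIT:
        if (e == EV_RECV_INIT_ACK)
            return ST_COOKIE_ECHOED;
        break;
    case ST_COOKIE_ECHOED:
        if (e == EV_RECV_COOKIE_ACK)
            return ST_ESTABLISHED;
        break;
    default:
        break;
    }
    return s;
}
```

Note that the responder goes straight from closed to established: receiving INIT and sending INIT-ACK changes no state on Z, which is exactly what makes the cookie exchange resistant to flooding.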
3. Receiving SCTP packets
The main handler for received SCTP packets is sctp_rcv().
4. Sending SCTP packets
Writes to an SCTP socket from user space are handled by sctp_sendmsg().
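Datagram Congestion Control Protocol (DCCP)
DCCP is an unreliable, congestion-controlled transport protocol that borrows from both UDP and TCP and adds new features. Like UDP it is message-oriented and unreliable; like TCP it is connection-oriented and uses a three-way handshake to establish a connection.
DCCP offers several congestion control mechanisms, and the endpoints negotiate which one to use when communication starts. Apart from reserved and custom identifiers, two mechanisms are currently defined: TCP-like, which resembles TCP's AIMD behavior, and TFRC (TCP-Friendly Rate Control).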
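An unreliable datagram flow with acknowledgments: data is carried in two packet types, Data and DataAck. Data carries payload only, while DataAck carries both payload and acknowledgment information.
Reliable negotiation of options, including the choice of a suitable congestion control mechanism. A connection between two hosts consists of two half-connections, one per direction, and each can use a different congestion control mechanism, identified by a Congestion Control Identifier (CCID). Each CCID specifies how its endpoints respond to ECN reports.
Multihoming and mobility: early DCCP drafts let an endpoint notify its peer of an address or port change during the connection. After a mobile endpoint obtained a new address, it sent a DCCP-Move packet from that address to the stationary endpoint, which then updated the connection state to use the new address. This mechanism is not part of the base specification, RFC 4340, which defines no DCCP-Move packet type. In addition, DCCP uses a cache in place of TCP's probe frames, which reduces network overhead.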
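Every DCCP packet starts with a DCCP header, which is at least 12 bytes long. The header is variable in length, from 12 to 1020 bytes, depending on whether short sequence numbers are used and which TLV options the packet carries.
A DCCP sequence number counts packets sent rather than bytes, and can be shortened from 6 bytes to 3 bytes.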
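The effect of short versus long sequence numbers on the header can be seen in a small parser for the generic header. This is a user-space sketch assuming the RFC 4340 layout (ports, data offset, CCVal/CsCov, checksum, then Res/Type/X and the sequence number); `struct dccp_generic` and `dccp_parse_generic` are hypothetical names, not kernel symbols.

```c
#include <stddef.h>
#include <stdint.h>

struct dccp_generic {
    uint16_t sport, dport;
    uint8_t  doff;     /* total header length in 32-bit words */
    uint8_t  type;     /* packet type (4 bits) */
    uint8_t  x;        /* 1 = 48-bit sequence numbers, 0 = 24-bit */
    uint64_t seq;
    size_t   hdr_len;  /* generic header length: 12 or 16 bytes */
};

/* Parse the DCCP generic header. Returns 0 on success, -1 if truncated
 * or if the data offset is smaller than the generic header itself. */
int dccp_parse_generic(const uint8_t *p, size_t len, struct dccp_generic *out)
{
    if (len < 12)
        return -1;

    out->sport = (uint16_t)((p[0] << 8) | p[1]);
    out->dport = (uint16_t)((p[2] << 8) | p[3]);
    out->doff  = p[4];
    out->type  = (p[8] >> 1) & 0x0f;
    out->x     = p[8] & 0x01;

    if (out->x) {
        /* Long form: one reserved byte, then a 48-bit sequence number. */
        if (len < 16)
            return -1;
        out->seq = 0;
        for (int i = 10; i < 16; i++)
            out->seq = (out->seq << 8) | p[i];
        out->hdr_len = 16;
    } else {
        /* Short form: 24-bit sequence number in bytes 9..11. */
        out->seq = ((uint64_t)p[9] << 16) | ((uint64_t)p[10] << 8) | p[11];
        out->hdr_len = 12;
    }

    if ((size_t)out->doff * 4 < out->hdr_len)
        return -1;
    return 0;
}
```

The 8-bit data offset counts 32-bit words, which is where the 1020-byte upper bound on the header comes from (255 × 4).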
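1. Initializing a DCCP socket
In user space, a DCCP socket is created with the socket() system call; the socket type argument (SOCK_DCCP) indicates that a DCCP socket is wanted. During initialization the kernel:
sets the DCCP socket fields to sane defaults, for example the socket state is set to DCCP_CLOSED;
initializes the DCCP timers by calling dccp_init_xmit_timers();
initializes the feature negotiation part by calling dccp_feat_init().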
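From user space, the socket() call described above might look like the following minimal sketch. It assumes a Linux system; the fallback values for SOCK_DCCP and IPPROTO_DCCP are the Linux ones and are only used if the headers do not define them. The helper name `open_dccp_socket` is hypothetical.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SOCK_DCCP
#define SOCK_DCCP 6        /* Linux value, assumed when headers lack it */
#endif
#ifndef IPPROTO_DCCP
#define IPPROTO_DCCP 33    /* IANA protocol number for DCCP */
#endif

/* Create an IPv4 DCCP socket. Returns a file descriptor, or -1 with
 * errno set, for example when the kernel has no DCCP support. */
int open_dccp_socket(void)
{
    return socket(AF_INET, SOCK_DCCP, IPPROTO_DCCP);
}
```

A caller should check for -1 before using the descriptor and close() it when done; on kernels built without DCCP the call simply fails.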
2. Receiving DCCP packets from L3
dccp_v4_rcv() is the handler that receives DCCP packets from the network layer:
/* this is called when real data arrives */
static int dccp_v4_rcv(struct sk_buff *skb)
{
const struct dccp_hdr *dh;
const struct iphdr *iph;
bool refcounted;
struct sock *sk;
int min_cov;
/* Step 1: Check header basics */
// First discard invalid packets, e.g. a packet not addressed to this host or one shorter than the DCCP header
if (dccp_invalid_packet(skb))
goto discard_it;
iph = ip_hdr(skb);
/* Step 1: If header checksum is incorrect, drop packet and return */
if (dccp_v4_csum_finish(skb, iph->saddr, iph->daddr)) {
DCCP_WARN("dropped packet with invalid checksum\n");
goto discard_it;
}
dh = dccp_hdr(skb);
DCCP_SKB_CB(skb)->dccpd_seq = dccp_hdr_seq(dh);
DCCP_SKB_CB(skb)->dccpd_type = dh->dccph_type;
dccp_pr_debug("%8.8s src=%pI4@%-5d dst=%pI4@%-5d seq=%llu",
dccp_packet_name(dh->dccph_type),
&iph->saddr, ntohs(dh->dccph_sport),
&iph->daddr, ntohs(dh->dccph_dport),
(unsigned long long) DCCP_SKB_CB(skb)->dccpd_seq);
if (dccp_packet_without_ack(skb)) {
DCCP_SKB_CB(skb)->dccpd_ack_seq = DCCP_PKT_WITHOUT_ACK_SEQ;
dccp_pr_debug_cat("\n");
} else {
DCCP_SKB_CB(skb)->dccpd_ack_seq = dccp_hdr_ack_seq(skb);
dccp_pr_debug_cat(", ack=%llu\n", (unsigned long long)
DCCP_SKB_CB(skb)->dccpd_ack_seq);
}
lookup:
// Look up the socket that matches this flow
sk = __inet_lookup_skb(&dccp_hashinfo, skb, __dccp_hdr_len(dh),
dh->dccph_sport, dh->dccph_dport, &refcounted);
// No matching socket found: drop the packet
if (!sk) {
dccp_pr_debug("failed to look up flow ID in table and "
"get corresponding socket\n");
goto no_dccp_socket;
}
/*
* Step 2:
* ... or S.state == TIMEWAIT,
* Generate Reset(No Connection) unless P.type == Reset
* Drop packet and return
*/
if (sk->sk_state == DCCP_TIME_WAIT) {
dccp_pr_debug("sk->sk_state == DCCP_TIME_WAIT: do_time_wait\n");
inet_twsk_put(inet_twsk(sk));
goto no_dccp_socket;
}
if (sk->sk_state == DCCP_NEW_SYN_RECV) {
struct request_sock *req = inet_reqsk(sk);
struct sock *nsk;
sk = req->rsk_listener;
if (unlikely(sk->sk_state != DCCP_LISTEN)) {
inet_csk_reqsk_queue_drop_and_put(sk, req);
goto lookup;
}
sock_hold(sk);
refcounted = true;
nsk = dccp_check_req(sk, skb, req);
if (!nsk) {
reqsk_put(req);
goto discard_and_relse;
}
if (nsk == sk) {
reqsk_put(req);
} else if (dccp_child_process(sk, nsk, skb)) {
dccp_v4_ctl_send_reset(sk, skb);
goto discard_and_relse;
} else {
sock_put(sk);
return 0;
}
}
/*
* RFC 4340, sec. 9.2.1: Minimum Checksum Coverage
* o if MinCsCov = 0, only packets with CsCov = 0 are accepted
* o if MinCsCov > 0, also accept packets with CsCov >= MinCsCov
*/
min_cov = dccp_sk(sk)->dccps_pcrlen;
if (dh->dccph_cscov && (min_cov == 0 || dh->dccph_cscov < min_cov)) {
dccp_pr_debug("Packet CsCov %d does not satisfy MinCsCov %d\n",
dh->dccph_cscov, min_cov);
/* FIXME: "Such packets SHOULD be reported using Data Dropped
* options (Section 11.7) with Drop Code 0, Protocol
* Constraints." */
goto discard_and_relse;
}
if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb))
goto discard_and_relse;
nf_reset(skb);
// All checks passed: call __sk_receive_skb() to hand the packet to the transport layer (L4)
return __sk_receive_skb(sk, skb, 1, dh->dccph_doff * 4, refcounted);
no_dccp_socket:
if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
goto discard_it;
/*
* Step 2:
* If no socket ...
* Generate Reset(No Connection) unless P.type == Reset
* Drop packet and return
*/
if (dh->dccph_type != DCCP_PKT_RESET) {
DCCP_SKB_CB(skb)->dccpd_reset_code =
DCCP_RESET_CODE_NO_CONNECTION;
dccp_v4_ctl_send_reset(sk, skb);
}
discard_it:
kfree_skb(skb);
return 0;
discard_and_relse:
if (refcounted)
sock_put(sk);
goto discard_it;
}
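The minimum checksum coverage test in the handler above is compact enough to misread, so here it is restated as a standalone predicate. This is a sketch that mirrors the condition in the listing; the function name `dccp_cscov_acceptable` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decide whether a packet's checksum coverage (CsCov) satisfies the
 * receiver's minimum (MinCsCov), following RFC 4340 sec. 9.2.1:
 *   - CsCov == 0 means the checksum covers the whole packet: always fine.
 *   - MinCsCov == 0 means only full coverage is accepted.
 *   - otherwise partial coverage is fine if CsCov >= MinCsCov. */
bool dccp_cscov_acceptable(uint8_t cscov, uint8_t min_cov)
{
    if (cscov == 0)
        return true;
    if (min_cov == 0)
        return false;
    return cscov >= min_cov;
}
```

The handler drops a packet exactly when this predicate is false, that is, when `cscov && (min_cov == 0 || cscov < min_cov)` holds.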
3. Sending DCCP packets
Data sent from a user-space DCCP socket is ultimately handled in the kernel by dccp_sendmsg().
Because changing the source or destination IP address of a DCCP packet will normally invalidate the DCCP checksum, it is not possible to use DCCP through a NAT without dedicated support. Some NAT devices are known to provide “generic” transport-protocol support, whereby only the IP header is mangled. That scheme is not sufficient to support DCCP.