Socket kernel parameters

Setting kernel parameters

Using somaxconn as an example:

1. Temporary change (lost after a reboot)

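step 1

echo 2048 > /proc/sys/net/core/somaxconn

(equivalent: sysctl -w net.core.somaxconn=2048)

step 2

cat /proc/sys/net/core/somaxconn

The new value takes effect immediately, so step 2 only verifies it. Do not run sysctl -p here: it reloads /etc/sysctl.conf and would overwrite the value just written if the file sets the same key.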

2. Permanent change: add the following line to /etc/sysctl.conf

step 1

net.core.somaxconn = 2048

step 2

sysctl -p
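Both approaches operate on the same files: a dotted sysctl name corresponds to a path under /proc/sys with the dots replaced by slashes. A minimal Python sketch of that mapping (the helper names are made up for illustration; reading works on any Linux system, writing would need root and is left out):

```python
from pathlib import Path

def sysctl_path(name: str) -> Path:
    """Map a dotted sysctl name such as 'net.core.somaxconn' to its /proc/sys file.

    Names whose components themselves contain dots (e.g. VLAN interfaces)
    are not handled by this simple replacement.
    """
    return Path("/proc/sys") / name.replace(".", "/")

def read_sysctl(name: str) -> str:
    """Return the current value as text, like `sysctl -n NAME`."""
    return sysctl_path(name).read_text().strip()

if __name__ == "__main__":
    name = "net.core.somaxconn"
    path = sysctl_path(name)
    print(path)                      # /proc/sys/net/core/somaxconn
    if path.exists():                # only present on Linux
        print(name, "=", read_sysctl(name))
```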

Kernel socket parameters

The files below are located in /proc/sys/net/ipv4 or /proc/sys/net/core/ (CentOS Linux release 7.2.1511).

tcp_retries1

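[TCP/IP Illustrated, Volume 1, 2nd edition (Chinese translation), p. 464]
reference
Once the number of retransmissions exceeds tcp_retries1 (default 3), the main action is to update the route cache: TCP asks IP to re-evaluate the route to the peer.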

tcp_retries2

[TCP/IP Illustrated, Volume 1, 2nd edition (Chinese translation), p. 464]
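tcp_retries2 is a retransmission count, but what matters in practice is the time it translates to; the Linux ip-sysctl documentation describes the default of 15 as a hypothetical timeout of 924.6 s. A minimal Python sketch of that arithmetic, assuming the RTO starts at the kernel's 200 ms minimum, doubles after every retransmission and is capped at 120 s (the real RTO is derived from the measured RTT, so actual timeouts are usually longer):

```python
RTO_MIN_MS = 200        # TCP_RTO_MIN in the Linux kernel
RTO_MAX_MS = 120_000    # TCP_RTO_MAX in the Linux kernel

def retransmission_timeout_ms(retries: int, rto_ms: int = RTO_MIN_MS) -> int:
    """Total time before giving up after `retries` retransmissions.

    The sender waits one RTO before each retransmission and one more RTO
    after the last one, doubling the RTO each time up to RTO_MAX_MS.
    """
    total = 0
    for _ in range(retries + 1):
        total += rto_ms
        rto_ms = min(rto_ms * 2, RTO_MAX_MS)
    return total

if __name__ == "__main__":
    print(retransmission_timeout_ms(3) / 1000)    # tcp_retries1 default: 3.0
    print(retransmission_timeout_ms(15) / 1000)   # tcp_retries2 default: 924.6
```

With 15 retries the first ten waits double from 0.2 s to 102.4 s (204.6 s in total) and the remaining six are capped at 120 s each.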

tcp_syn_retries & tcp_synack_retries

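[TCP/IP Illustrated, Volume 1, 2nd edition (Chinese translation), p. 464]
For SYN segments, net.ipv4.tcp_syn_retries and net.ipv4.tcp_synack_retries bound the number of retransmissions of SYN segments; their default value is 5 (roughly 180s).
On Linux kernels that use a 1 s initial RTO (including the 3.10 kernel of CentOS 7), tcp_syn_retries defaults to 6, so an unanswered connect() gives up after about 127 s, and tcp_synack_retries defaults to 5, so a half-open passive connection is dropped after about 63 s.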
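Both sets of figures follow from exponential backoff of the retransmission timer. A Python sketch, assuming pure doubling from a fixed initial RTO (newer kernels can use linear timeouts for the first few SYN retries, which this ignores): the 3 s initial RTO of older TCP implementations reproduces the quoted "roughly 180s", while current Linux starts from 1 s.

```python
def syn_timeout_s(retries: int, initial_rto_s: int = 1) -> int:
    """Seconds until a handshake attempt is abandoned.

    One wait of initial_rto_s before the first retransmission, doubling each
    time, plus a final wait after the last one:
    initial_rto_s * (1 + 2 + ... + 2**retries) = initial_rto_s * (2**(retries+1) - 1).
    """
    return initial_rto_s * ((1 << (retries + 1)) - 1)

if __name__ == "__main__":
    print(syn_timeout_s(5, initial_rto_s=3))  # 189: 3+6+12+24+48+96
    print(syn_timeout_s(6))                   # 127: tcp_syn_retries = 6
    print(syn_timeout_s(5))                   # 63: tcp_synack_retries = 5
```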

tcp_fin_timeout

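[TCP/IP Illustrated, Volume 1, 2nd edition (Chinese translation), p. 446]
Related to FIN_WAIT_2: it is how long an orphaned connection (one the application has already closed) may stay in FIN_WAIT_2 before the kernel aborts it. The default is 60 seconds.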
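On Linux the same timeout can also be overridden per socket with the TCP_LINGER2 option described in tcp(7); reading it back on a socket that has not set it returns the system-wide tcp_fin_timeout. A small Python sketch (the helper name is hypothetical), guarded because socket.TCP_LINGER2 only exists on Linux builds:

```python
import socket

def fin_wait2_timeout(sock: socket.socket) -> int:
    """Seconds an orphaned connection from this socket may stay in FIN_WAIT_2."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_LINGER2)

if __name__ == "__main__" and hasattr(socket, "TCP_LINGER2"):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Not set yet: the kernel reports net.ipv4.tcp_fin_timeout.
        print("system-wide:", fin_wait2_timeout(s))
        # Override for this socket only; on older kernels a value larger than
        # tcp_fin_timeout is ignored and the system-wide value is used instead.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_LINGER2, 10)
        print("this socket:", fin_wait2_timeout(s))
```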

tcp_abort_on_overflow

[TCP/IP Illustrated, Volume 1, 2nd edition (Chinese translation), p. 455]
If there is not enough room on the queue for the new connection, the TCP delays responding to the SYN, to give the application a chance to catch up. Linux is somewhat unique in this behavior—it persists in not ignoring incoming connections if it possibly can. If the net.ipv4.tcp_abort_on_overflow system control variable is set, new incoming connections are reset with a reset segment.

tcp_max_syn_backlog

[TCP/IP Illustrated, Volume 1, 2nd edition (Chinese translation), p. 458]
When a connection request arrives (i.e., the SYN segment), the system-wide parameter tcp_max_syn_backlog is checked (default 1000). If the number of connections in the SYN_RCVD state would exceed this threshold, the incoming connection is rejected.

tcp_timestamps

TCP Timestamps Option (TSopt):
Structure:

  +-------+-------+---------------------+---------------------+
  |Kind=8 |  10   |   TS Value (TSval)  |TS Echo Reply (TSecr)|
  +-------+-------+---------------------+---------------------+
      1       1              4                     4

 The Timestamps option carries two four-byte timestamp fields.
 The Timestamp Value field (TSval) contains the current value of
 the timestamp clock of the TCP sending the option.

 The Timestamp Echo Reply field (TSecr) is only valid if the ACK
 bit is set in the TCP header; if it is valid, it echos a
 timestamp value that was sent by the remote TCP in the TSval field
 of a Timestamps option.  When TSecr is not valid, its value
 must be zero.  The TSecr value will generally be from the most
 recent Timestamp option that was received; however, there are
 exceptions that are explained below.
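The layout in the diagram maps directly onto a fixed-size binary record. A Python sketch that encodes and decodes the 10-byte option with the standard struct module (the function names are illustrative; "!" selects network byte order with no padding):

```python
import struct
from typing import Tuple

TSOPT_KIND = 8    # option kind from the diagram
TSOPT_LEN = 10    # total option length in bytes: 1 + 1 + 4 + 4

def pack_tsopt(tsval: int, tsecr: int) -> bytes:
    """Encode kind, length, TSval and TSecr as they appear on the wire."""
    return struct.pack("!BBII", TSOPT_KIND, TSOPT_LEN,
                       tsval & 0xFFFFFFFF, tsecr & 0xFFFFFFFF)

def unpack_tsopt(data: bytes) -> Tuple[int, int]:
    """Decode a Timestamps option and return (TSval, TSecr)."""
    kind, length, tsval, tsecr = struct.unpack("!BBII", data)
    if kind != TSOPT_KIND or length != TSOPT_LEN:
        raise ValueError("not a Timestamps option")
    return tsval, tsecr

if __name__ == "__main__":
    # A SYN has no ACK bit set, so TSecr is not valid and is sent as zero.
    print(pack_tsopt(1000, 0).hex())  # 080a000003e800000000
```

In captured traffic Linux usually precedes this option with two NOP bytes so that the 4-byte fields stay 32-bit aligned.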

Enabled by default. Purpose: 1. more accurate RTT measurement; 2. PAWS (Protection Against Wrapped Sequence numbers).
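PAWS works because timestamps, unlike a 32-bit sequence number that wraps after 4 GiB on a fast link, let the receiver recognize an old duplicate segment. A simplified Python sketch of the check: a segment whose TSval is older than the most recently accepted timestamp is dropped, with "older" computed modulo 2^32 the way the kernel compares `(s32)(a - b)`. This ignores the refinements in the real rule (RST segments, connections idle for more than 24 days):

```python
def ts_before(a: int, b: int) -> bool:
    """True if 32-bit timestamp a is earlier than b, allowing for wraparound."""
    diff = (a - b) & 0xFFFFFFFF
    return diff >= 0x80000000   # (a - b) is negative as a signed 32-bit value

def paws_reject(seg_tsval: int, ts_recent: int) -> bool:
    """Simplified PAWS test: reject a segment carrying an older timestamp."""
    return ts_before(seg_tsval, ts_recent)

if __name__ == "__main__":
    print(paws_reject(3, 5))            # True: older timestamp, old duplicate
    print(paws_reject(1, 0xFFFFFFFF))   # False: the clock wrapped, 1 is newer
```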

reference

tcp_tw_reuse && tcp_tw_recycle

reference (an excellent article)

tcp_tw_reuse
By enabling net.ipv4.tcp_tw_reuse, Linux will reuse an existing connection in the TIME-WAIT state for a new outgoing connection if the new timestamp is strictly bigger than the most recent timestamp recorded for the previous connection: an outgoing connection in the TIME-WAIT state can be reused after just one second.
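Because timestamps advance with the clock, waiting at least one second guarantees that the new connection's timestamps are strictly larger than anything the old one sent, so the peer's PAWS check can tell the two apart. A simplified Python model of the reuse decision quoted above (the class and function names are hypothetical, and the exact time comparison differs slightly between kernel versions):

```python
from dataclasses import dataclass

@dataclass
class TimeWaitEntry:
    """Hypothetical stand-in for a kernel TIME-WAIT socket."""
    ts_recent_stamp: int   # second at which the peer's last timestamp was recorded; 0 = no timestamps

def can_reuse(entry: TimeWaitEntry, now_s: int, tw_reuse_enabled: bool) -> bool:
    """May a new outgoing connection with the same four-tuple take over this entry?

    Requires tcp_tw_reuse, timestamps negotiated on the old connection,
    and at least one second elapsed since its last recorded timestamp.
    """
    if not tw_reuse_enabled or entry.ts_recent_stamp == 0:
        return False
    return now_s - entry.ts_recent_stamp >= 1

if __name__ == "__main__":
    tw = TimeWaitEntry(ts_recent_stamp=100)
    print(can_reuse(tw, now_s=100, tw_reuse_enabled=True))  # False: same second
    print(can_reuse(tw, now_s=101, tw_reuse_enabled=True))  # True
```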

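Q: What is reused?
A: The connection, i.e. the kernel's socket data structures for it.
Q: Who reuses these data structures?
A: The side holding the connection in TIME_WAIT, when it initiates the same connection again (identical TCP four-tuple).
Q: What is the exact sequence of events, and why does it depend on tcp_timestamps?
A: See the analysis below.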
Once a new connection replaces the TIME-WAIT entry [time 1], the SYN segment of the new connection is ignored (thanks to the timestamps) [time 2] and won’t be answered by a RST but only by a retransmission of the FIN and ACK segment [time 3]. The FIN segment will then be answered with a RST (because the local connection is in the SYN-SENT state) [time 4], which will allow the transition out of the LAST-ACK state. The initial SYN segment will eventually be resent (after one second) because there was no answer, and the connection will be established without apparent error, except a slight delay:

[Figure: timeline of the exchange above, marked [time 1] to [time 4]]

tcp_tw_recycle
Recommendation: do not enable this option.
Starting from Linux 4.10 (commit 95a22caee396), Linux will randomize timestamp offsets for each connection, making this option completely broken, with or without NAT.
The option was removed entirely in Linux 4.12.

TODO: study the kernel's socket data structures.

net.ipv4.tcp_syncookies

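[TCP/IP Illustrated, Volume 1, 2nd edition (Chinese translation), p. 455]
net.ipv4.tcp_syncookies = 1 enables SYN cookies: when the SYN queue overflows, the kernel answers with cookies instead of queue entries, which protects against SYN flood attacks. 0 disables them. The default on current kernels (including CentOS 7) is 1.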

tcp_dsack

[TCP/IP Illustrated, Volume 1, 2nd edition (Chinese translation), p. 482]

tcp_sack

Enabled by default.
[TCP/IP Illustrated, Volume 1, 2nd edition (Chinese translation), p. 478]

somaxconn

[TCP/IP Illustrated, Volume 1, 2nd edition (Chinese translation), p. 455]
Each listening endpoint has a fixed-length queue of connections that have been completely accepted by TCP (i.e., the three-way handshake is complete) but not yet accepted by the application. The application specifies a limit to this queue, commonly called the backlog. This backlog must be between 0 and a system-specific maximum called net.core.somaxconn, inclusive (default 128).
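A practical consequence: listen() does not fail when the requested backlog exceeds somaxconn; the kernel silently caps it. A Python sketch of that rule and of a listening socket (helper names are hypothetical; the cap is shown for a non-negative backlog; note that kernels from 5.4 onward raised the somaxconn default from 128 to 4096):

```python
import socket
from pathlib import Path

SOMAXCONN_PATH = Path("/proc/sys/net/core/somaxconn")

def effective_backlog(requested: int, somaxconn: int) -> int:
    """Accept-queue limit actually used for a non-negative requested backlog."""
    return min(requested, somaxconn)

def listen_socket(port: int = 0, backlog: int = 4096) -> socket.socket:
    """Create a loopback listener; port 0 lets the kernel pick a free port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("127.0.0.1", port))
    s.listen(backlog)   # no error even if backlog > somaxconn; it is just capped
    return s

if __name__ == "__main__":
    if SOMAXCONN_PATH.exists():
        somaxconn = int(SOMAXCONN_PATH.read_text())
        print("effective backlog:", effective_backlog(4096, somaxconn))
    with listen_socket() as srv:
        print("listening on", srv.getsockname())
```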

netdev_max_backlog

TODO

rmem_max && wmem_max && rmem_default && wmem_default

reference

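# Default receive-buffer limit of a single socket (the memory actually used varies dynamically; this is a cap)
net.core.rmem_default = 262144
# A receive buffer set with setsockopt(SO_RCVBUF) cannot exceed rmem_max
net.core.rmem_max = 16777216
net.core.wmem_default = 262144
net.core.wmem_max = 16777216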
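socket(7) documents two details that matter when combining rmem_max with setsockopt: the requested SO_RCVBUF value is capped at rmem_max, and the kernel then doubles it to leave room for bookkeeping overhead, which is the value getsockopt returns. A minimal Python sketch (the helper name is hypothetical):

```python
import socket

def set_rcvbuf(sock: socket.socket, size: int) -> int:
    """Request a receive buffer of `size` bytes and return what the kernel reports.

    On Linux the request is capped at net.core.rmem_max and then doubled,
    so getsockopt usually returns 2 * size. Setting SO_RCVBUF also turns off
    receive-buffer auto-tuning for this socket.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("default:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
        print("after requesting 65536:", set_rcvbuf(s, 65536))
```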

tcp_moderate_rcvbuf && tcp_rmem && tcp_wmem && tcp_mem

reference

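Once the maximum buffer limits are set, can we stop worrying? A single TCP connection may already be making full use of network resources, keeping throughput high with a large window and large buffers. On a long fat network, for example, the buffer cap may be set to tens of megabytes, but total system memory is finite: if every connection runs flat out at its maximum window, 10,000 connections would consume hundreds of gigabytes (10,000 × 16 MiB alone is about 156 GiB). That rules out high-concurrency workloads, and fairness between connections is not guaranteed either. What we want is this: when there are few concurrent connections, raise the buffer limits so each TCP connection can run at full throttle; when there are many and memory runs short, lower the limits so each connection's buffers stay as small as possible and more connections fit.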

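To support this, Linux provides automatic buffer tuning, controlled by tcp_moderate_rcvbuf:
net.ipv4.tcp_moderate_rcvbuf = 1
tcp_moderate_rcvbuf defaults to 1, which enables automatic tuning of TCP receive buffers. If it is set to 0, the feature is disabled (use with caution).
If a program sets SO_SNDBUF or SO_RCVBUF on a connection, the Linux kernel stops auto-tuning the corresponding buffer for that connection!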

net.ipv4.tcp_rmem = 8192 87380 16777216  
net.ipv4.tcp_wmem = 8192 65536 16777216  
net.ipv4.tcp_mem = 8388608 12582912 16777216 

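The three-value tcp_rmem array sets the receive-buffer limits of every TCP connection. tcp_rmem[0] is the minimum: the buffer each TCP socket is guaranteed even under memory pressure. tcp_rmem[1] is the initial size (note that it overrides rmem_default, which applies to all protocols). tcp_rmem[2] is the maximum that automatic tuning may grow the buffer to; a size set explicitly with setsockopt(SO_RCVBUF) is limited by rmem_max instead.
The tcp_wmem array does the same for send buffers and works like tcp_rmem.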

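The three-value tcp_mem array governs TCP's overall memory use, so its values are large (and the unit is not bytes but memory pages, such as 4 KB or 8 KB!). The three values are TCP's no-pressure threshold, the threshold that turns on memory-pressure mode, and the maximum. Using them as markers, there are four cases: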

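1. Once total TCP memory exceeds tcp_mem[2], every new allocation fails.
2. tcp_rmem[0] and tcp_wmem[0] also have high priority: as long as case 1 does not apply, a connection whose memory use is below these values is guaranteed to get new memory.
3. As long as total memory does not exceed tcp_mem[0], a new allocation succeeds provided it stays within the connection's buffer limit.
4. tcp_mem[1] and tcp_mem[0] form the switch that turns memory-pressure mode on and off: pressure mode starts when total memory rises above tcp_mem[1] and ends when it falls below tcp_mem[0]. In pressure mode the per-connection buffer limit may shrink; outside it, the limit may grow, up to tcp_rmem[2] or tcp_wmem[2].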
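Case 4 is a hysteresis: the gap between tcp_mem[0] and tcp_mem[1] keeps the system from flapping in and out of pressure mode. A toy Python model of that switch together with the hard limit of case 1 (class and method names are invented; this is not kernel code), plus the page-to-size conversion, assuming 4 KiB pages, which turns the example tcp_mem values above into 32, 48 and 64 GiB:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages; check with `getconf PAGESIZE`

def pages_to_gib(pages: int, page_size: int = PAGE_SIZE) -> float:
    return pages * page_size / 2**30

class TcpMemPressure:
    """Toy model of cases 1 and 4; all values are in pages."""

    def __init__(self, tcp_mem):
        self.low, self.pressure_on, self.high = tcp_mem
        self.under_pressure = False

    def update(self, total_pages: int) -> str:
        if total_pages > self.pressure_on:     # rise above tcp_mem[1]: enter pressure
            self.under_pressure = True
        elif total_pages < self.low:           # fall below tcp_mem[0]: leave pressure
            self.under_pressure = False
        if total_pages > self.high:            # case 1: hard limit
            return "over limit"
        return "pressure" if self.under_pressure else "normal"

if __name__ == "__main__":
    tcp_mem = (8388608, 12582912, 16777216)
    print([pages_to_gib(p) for p in tcp_mem])  # [32.0, 48.0, 64.0]
    m = TcpMemPressure(tcp_mem)
    for total in (10_000_000, 13_000_000, 10_000_000, 8_000_000):
        print(total, m.update(total))          # normal, pressure, pressure, normal
```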

TODO

tcp_adv_win_scale
tcp_allowed_congestion_control
tcp_app_win
tcp_autocorking
tcp_available_congestion_control
tcp_base_mss
tcp_challenge_ack_limit
tcp_congestion_control

tcp_early_retrans
tcp_ecn
tcp_fack
tcp_fastopen
tcp_fastopen_key

tcp_frto
tcp_invalid_ratelimit
tcp_keepalive_intvl
tcp_keepalive_probes
tcp_keepalive_time
tcp_limit_output_bytes
tcp_low_latency
tcp_max_orphans
tcp_max_ssthresh

tcp_max_tw_buckets
tcp_mem
tcp_min_tso_segs
tcp_moderate_rcvbuf
tcp_mtu_probing
tcp_no_metrics_save
tcp_notsent_lowat
tcp_orphan_retries
tcp_reordering
tcp_retrans_collapse
tcp_rfc1337
tcp_rmem

tcp_slow_start_after_idle
tcp_stdurg
tcp_thin_dupack
tcp_thin_linear_timeouts

tcp_tso_win_divisor
tcp_tw_recycle
tcp_window_scaling
tcp_wmem
tcp_workaround_signed_windows
udp_mem
udp_rmem_min
udp_wmem_min
xfrm4_gc_thresh
