Ops issues.Docker.kernel: nf_conntrack: table full, drop?

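Incident background

1. The production Docker data directory had not been relocated, so the root filesystem filled up. We had no choice but to reboot the server to release the data held in memory.

2. The alc server running in a production Docker container needs to maintain 100,000 session connections.

3. The nf_conntrack table had filled up, so we switched to host networking, disabled the server firewall, and turned off IP forwarding (net.ipv4.ip_forward=0).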

Cause

1. /var/log/messages kept printing kernel: nf_conntrack: table full, dropping packet, and both inbound and outbound traffic suffered severe packet loss.

Jun 21 20:33:11 vm10-254-136-6 kernel: nf_conntrack: table full, dropping packet
Jun 21 20:33:11 vm10-254-136-6 kernel: nf_conntrack: table full, dropping packet
Jun 21 20:33:11 vm10-254-136-6 kernel: nf_conntrack: table full, dropping packet
Jun 21 20:33:11 vm10-254-136-6 kernel: nf_conntrack: table full, dropping packet
Jun 21 20:33:11 vm10-254-136-6 kernel: nf_conntrack: table full, dropping packet
Jun 21 20:33:11 vm10-254-136-6 kernel: nf_conntrack: table full, dropping packet
Jun 21 20:33:11 vm10-254-136-6 kernel: nf_conntrack: table full, dropping packet
Jun 21 20:33:11 vm10-254-136-6 kernel: nf_conntrack: table full, dropping packet
Jun 21 20:33:11 vm10-254-136-6 kernel: nf_conntrack: table full, dropping packet
Jun 21 20:33:11 vm10-254-136-6 kernel: nf_conntrack: table full, dropping packet
Jun 21 20:33:16 vm10-254-136-6 kernel: nf_conntrack: table full, dropping packet
Jun 21 20:33:16 vm10-254-136-6 kernel: nf_conntrack: table full, dropping packet
Jun 21 20:33:16 vm10-254-136-6 kernel: nf_conntrack: table full, dropping packet
Jun 21 20:33:16 vm10-254-136-6 kernel: nf_conntrack: table full, dropping packet
Jun 21 20:33:16 vm10-254-136-6 kernel: nf_conntrack: table full, dropping packet
Jun 21 20:33:16 vm10-254-136-6 kernel: nf_conntrack: table full, dropping packet
Jun 21 20:33:16 vm10-254-136-6 kernel: nf_conntrack: table full, dropping packet
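
To confirm that the table really is saturated, the current entry count can be compared with the limit. This is a minimal sketch, assuming the nf_conntrack module is loaded and exposes nf_conntrack_count and nf_conntrack_max under /proc/sys/net/netfilter. The function name usage_pct and the PROC_DIR variable are illustrative and do not come from the original incident.

```shell
#!/bin/sh
# Directory exposing the conntrack counters (assumption: nf_conntrack is loaded).
PROC_DIR="${PROC_DIR:-/proc/sys/net/netfilter}"

# usage_pct COUNT MAX: print the integer percentage of the table in use.
usage_pct() {
    echo $(( $1 * 100 / $2 ))
}

if [ -r "$PROC_DIR/nf_conntrack_count" ] && [ -r "$PROC_DIR/nf_conntrack_max" ]; then
    count=$(cat "$PROC_DIR/nf_conntrack_count")
    max=$(cat "$PROC_DIR/nf_conntrack_max")
    echo "conntrack: $count / $max ($(usage_pct "$count" "$max")%)"
else
    echo "nf_conntrack counters not found under $PROC_DIR (module not loaded?)"
fi
```

A value close to 100% while the log keeps printing "table full" indicates that new connections are being dropped because no tracking slot is free.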

Troubleshooting

1. We re-checked the server firewall, the Docker network mode, and the net.ipv4.ip_forward=0 setting, and found the problem: ip_forward was back to 1.

cat /proc/sys/net/ipv4/ip_forward
1
service docker status
Redirecting to /bin/systemctl status  docker.service
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-06-22 09:40:58 CST; 2h 29min ago
     Docs: https://docs.docker.com
 Main PID: 748 (docker)
   Memory: 95.6M
   CGroup: /system.slice/docker.service
           ├─ 748 /usr/bin/docker daemon -H fd://
           ├─1845 docker-containerd -l /var/run/docker/libcontainerd/docker-containerd.sock --runtime docker-runc --start-timeout 2m
           ├─2042 docker-containerd-shim 6ee6ae880f800d3b9cb0caa42d75ed6c7711ceea037a322be3af7a4f0adffa45 /var/run/docker/libcontainerd/6ee...
           ├─2070 docker-containerd-shim dca4a84bdf21720ead39cf60eea3833401dd4788049ac74fe98bff4083bd0d37 /var/run/docker/libcontainerd/dca...
           ├─2110 docker-containerd-shim 928b687adb9f993e19526b245e8e416c60d06de7da5cde44dc173fb57761e01f /var/run/docker/libcontainerd/928...
           ├─2251 docker-containerd-shim 24619ec28fd7cd50893043201a10e9702cc1053e50384c50ddfd0d428516c751 /var/run/docker/libcontainerd/246...
           ├─2285 docker-containerd-shim ff1f6a34eb3b252bedf02896a4a9b258549d4adef20f611d891579366e91426e /var/run/docker/libcontainerd/ff1...
           ├─3169 docker-containerd-shim 7c0010dc518ddc74259dc72d74af8f25a04db9a1c8a4255873c1db40d8335f77 /var/run/docker/libcontainerd/7c0...
           ├─3611 docker-containerd-shim 6b844bb2f3b8f96b054e7f1b8b99f92dd8a08975b8f2022bc915a73b066c0f56 /var/run/docker/libcontainerd/6b8...
           ├─3750 docker-containerd-shim 6b844bb2f3b8f96b054e7f1b8b99f92dd8a08975b8f2022bc915a73b066c0f56 /var/run/docker/libcontainerd/6b8...
           ├─3779 docker-containerd-shim 6b844bb2f3b8f96b054e7f1b8b99f92dd8a08975b8f2022bc915a73b066c0f56 /var/run/docker/libcontainerd/6b8...
           ├─3807 docker-containerd-shim 6b844bb2f3b8f96b054e7f1b8b99f92dd8a08975b8f2022bc915a73b066c0f56 /var/run/docker/libcontainerd/6b8...
           └─3832 docker-containerd-shim 6b844bb2f3b8f96b054e7f1b8b99f92dd8a08975b8f2022bc915a73b066c0f56 /var/run/docker/libcontainerd/6b8...

Jun 22 11:18:57 vm10-254-136-6.ksc.com docker[748]: time="2016-06-22T11:18:57.803889188+08:00" level=error msg="Handler for GET /v1.2...56e1a"
Jun 22 11:18:58 vm10-254-136-6.ksc.com docker[748]: time="2016-06-22T11:18:58.252585878+08:00" level=error msg="Handler for GET /v1.2...56e1a"
Jun 22 11:19:08 vm10-254-136-6.ksc.com docker[748]: time="2016-06-22T11:19:08.797803102+08:00" level=error msg="Handler for GET /v1.2...56e1a"
Jun 22 11:19:09 vm10-254-136-6.ksc.com docker[748]: time="2016-06-22T11:19:09.174213090+08:00" level=error msg="Handler for GET /v1.2...56e1a"
Jun 22 11:19:19 vm10-254-136-6.ksc.com docker[748]: time="2016-06-22T11:19:19.675408402+08:00" level=error msg="Handler for GET /v1.2...56e1a"
Jun 22 11:19:20 vm10-254-136-6.ksc.com docker[748]: time="2016-06-22T11:19:20.202181622+08:00" level=error msg="Handler for GET /v1.2...56e1a"
Jun 22 11:19:30 vm10-254-136-6.ksc.com docker[748]: time="2016-06-22T11:19:30.450311518+08:00" level=error msg="Handler for GET /v1.2...56e1a"
Jun 22 11:19:30 vm10-254-136-6.ksc.com docker[748]: time="2016-06-22T11:19:30.910231364+08:00" level=error msg="Handler for GET /v1.2...56e1a"
Jun 22 11:19:41 vm10-254-136-6.ksc.com docker[748]: time="2016-06-22T11:19:41.147869043+08:00" level=error msg="Handler for GET /v1.2...56e1a"
Jun 22 11:19:41 vm10-254-136-6.ksc.com docker[748]: time="2016-06-22T11:19:41.486591196+08:00" level=error msg="Handler for GET /v1.2...56e1a"
Warning: docker.service changed on disk. Run 'systemctl daemon-reload' to reload units.
Hint: Some lines were ellipsized, use -l to show in full.

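Note: the command line /usr/bin/docker daemon -H fd:// shows that Docker is running with its default --ip-forward=true. After the server was rebooted, docker.service was started again and automatically rewrote net.ipv4.ip_forward to 1, which is why the conntrack table filled up and packets kept being dropped.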
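
The check described above can be sketched as a small helper that inspects a daemon command line for an explicit --ip-forward=false. The function name forwarding_disabled is hypothetical, and the sample command line is the one shown in the status output; this is only an illustration of the reasoning, not a Docker tool.

```shell
#!/bin/sh
# forwarding_disabled CMDLINE: succeed only if the daemon command line
# explicitly carries --ip-forward=false. Without the flag, Docker falls
# back to its default (--ip-forward=true).
forwarding_disabled() {
    case " $1 " in
        *" --ip-forward=false "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Command line taken from the `service docker status` output.
cmdline='/usr/bin/docker daemon -H fd://'

if forwarding_disabled "$cmdline"; then
    echo "daemon leaves net.ipv4.ip_forward untouched"
else
    echo "daemon will enable net.ipv4.ip_forward when it starts"
fi
```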

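Solution

1. As shown above, the unit file is /usr/lib/systemd/system/docker.service. The simplest fix is to append the --ip-forward=false option to the ExecStart command. Some Docker distributions differ slightly; on those you can edit /etc/sysconfig/docker-network and add the option there instead, depending on your setup.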

[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network.target docker.socket
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/docker daemon -H fd:// --ip-forward=false
MountFlags=slave
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes

[Install]
WantedBy=multi-user.target
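
After the unit file is edited, systemd has to re-read it (the status output warns "docker.service changed on disk. Run 'systemctl daemon-reload'"), and the daemon must be restarted for the new flag to take effect. The sketch below is one possible sequence, not the exact steps taken during the incident. The helper names run and apply_unit_change and the DRY_RUN switch are assumptions added for illustration. Note that restarting the daemon may interrupt running containers, and that --ip-forward=false only stops Docker from enabling forwarding, so a value already set to 1 has to be put back by hand.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 and
# run as root to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

apply_unit_change() {
    run systemctl daemon-reload            # pick up the edited unit file
    run systemctl restart docker.service   # restart the daemon with the new flag
    run sysctl -w net.ipv4.ip_forward=0    # reset the value Docker set earlier
    run cat /proc/sys/net/ipv4/ip_forward  # verify the result
}

apply_unit_change
```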

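2. If the problem persists, try tuning the ip_conntrack kernel parameters, raising the maximum to the system's open_files handle limit. Some of our production servers use 1,000,000 and others 2,000,000.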

net.ipv4.netfilter.ip_conntrack_max = 2000000
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 1200

If that fails, use:

net.netfilter.nf_conntrack_max = 2000000
net.netfilter.nf_conntrack_tcp_timeout_established = 1200

Note: if sysctl -p reports an unknown key error, the kernel is too new for the ip_conntrack names; use the nf_conntrack parameters above instead.
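
Rather than waiting for sysctl -p to fail, the key family can be chosen by checking which entries the running kernel exposes under /proc/sys. This is a rough sketch; the function name conntrack_prefix and its optional root argument are hypothetical, and the values are the ones used in this post. It only prints the lines that would go into /etc/sysctl.conf and changes nothing.

```shell
#!/bin/sh
# conntrack_prefix [PROC_SYS_ROOT]: print the sysctl key prefix this kernel
# exposes, preferring the newer nf_conntrack names. Fails if neither exists.
conntrack_prefix() {
    root="${1:-/proc/sys}"
    if [ -e "$root/net/netfilter/nf_conntrack_max" ]; then
        echo "net.netfilter.nf_conntrack"
    elif [ -e "$root/net/ipv4/netfilter/ip_conntrack_max" ]; then
        echo "net.ipv4.netfilter.ip_conntrack"
    else
        return 1
    fi
}

if prefix=$(conntrack_prefix); then
    # Lines suitable for /etc/sysctl.conf, using the values from this post.
    echo "${prefix}_max = 2000000"
    echo "${prefix}_tcp_timeout_established = 1200"
else
    echo "no conntrack sysctl keys found (module not loaded?)" >&2
fi
```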

Reposted from: https://my.oschina.net/pydevops/blog/699173