Symptom:
Environment:
# docker info
Containers: 5
Running: 3
Paused: 0
Stopped: 2
Images: 6
Server Version: 1.12.1
Storage Driver: devicemapper
Pool Name: data-docker_thinpool
Pool Blocksize: 524.3 kB
Base Device Size: 21.47 GB
Backing Filesystem: xfs
Data file:
Metadata file:
Data Space Used: 5.911 GB
Data Space Total: 42.95 GB
Data Space Available: 37.04 GB
Metadata Space Used: 1.122 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.146 GB
Thin Pool Minimum Free Space: 4.295 GB
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Library Version: 1.02.107-RHEL7 (2016-06-09)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge null overlay host
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-327.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.389 GiB
Name: iZbp10zkx5pckci8f8gzalZ
ID: NRZQ:4DNU:U4LN:M4K2:TOKQ:Q7HP:ZI7A:Q6UT:RGLR:OS5G:5VDS:AYOH
Docker Root Dir: /data3/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
docker-registry.i.beebank.com:5000
127.0.0.0/8
dockerd has far too many open files, roughly 40,000.
They look like this:
# lsof -p 5483|tail
dockerd 5483 root *142r FIFO 0,18 0t0 65967126 /run/docker/libcontainerd/07b2c171b26c4039f97a8c9f5d322b210db674202070ef0de0bcd320734dfb68/78ca64efd92379182ea6e8cd8c4feb5e55d60881ec34f4527b446345f6755967-stderr
dockerd 5483 root *143r FIFO 0,18 0t0 66012750 /run/docker/libcontainerd/07b2c171b26c4039f97a8c9f5d322b210db674202070ef0de0bcd320734dfb68/1037b07b2024c85b77f6f3d94c9a3c60b99a546dc3ff2f2ee8ec2182bf74ff03-stderr
dockerd 5483 root *145u FIFO 0,18 0t0 66100248 /run/docker/libcontainerd/76db8384ab95c2797f6f1b5e6bdc1b6633f117648508c14a676bed53ea437731/9f440b526129813ab7bad3a880d4aa0296e8e56d6ccc076dca4064919d141520-stdin (deleted)
...
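Is this tied to one container, or is it unrelated to any specific container?
It is mainly tied to the docker-registry container (the other containers probably just hold fewer of these files; that does not mean they are entirely healthy).
After killing the docker-registry container, dockerd's memory usage did not drop, and its open-file count stayed at about 39,000: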
[root@CX-DOCKER-BASE-1-10.139.105.202 ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
76db8384ab95 00fc3bfe3cd7 "/sbin/init" 3 days ago Up 3 days 10.139.105.202:80->80/tcp, 10.139.105.202:3000->3000/tcp, 10.139.105.202:10051->10051/tcp zabbix-server
bf620dcc79e7 ed08e0e387b6 "/bin/bash" 10 days ago Up 10 days 0.0.0.0:2003-2004->2003-2004/tcp, 0.0.0.0:2023-2024->2023-2024/tcp, 0.0.0.0:8081->80/tcp graphite
[root@CX-DOCKER-BASE-1-10.139.105.202 ~]# lsof -p 5483|wc -l
39204
Yet the corresponding files no longer exist on disk:
dockerd 5483 root *317r FIFO 0,18 0t0 7272290 /run/docker/libcontainerd/07b2c171b26c4039f97a8c9f5d322b210db674202070ef0de0bcd320734dfb68/121c5aedbd12fa0a582df748b90db1699306b0fc263c104cab48182146037d74-stderr (deleted)
dockerd 5483 root *318r FIFO 0,18 0t0 7282242 /run/docker/libcontainerd/07b2c171b26c4039f97a8c9f5d322b210db674202070ef0de0bcd320734dfb68/4386ebbb7736f33ebbc74c7e6aef7d25726a3b81923cf46d86e3988224a7d84a-stderr (deleted)
dockerd 5483 root *319r FIFO 0,18 0t0 7275129 /run/docker/libcontainerd/07b2c171b26c4039f97a8c9f5d322b210db674202070ef0de0bcd320734dfb68/d6597c848c3cea0720ab58fa0091c365eb7a98b488547732a68ad08d6c0fe946-stderr (deleted)
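To see which container the lingering FIFOs belong to, the lsof output can be grouped by the container ID embedded in each path. The following is only a sketch: the function name fifo_by_container is made up for illustration, and it assumes the path layout /run/docker/libcontainerd/<container-id>/<fifo-name> seen in the listings above.

```shell
# Hypothetical helper: read lsof output on stdin and count, per container ID,
# the entries whose path lives under /run/docker/libcontainerd/.
fifo_by_container() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if (index($i, "/run/docker/libcontainerd/") == 1) {
                    # parts: "", run, docker, libcontainerd, <container-id>, <fifo-name>
                    n = split($i, parts, "/")
                    if (n >= 6) count[parts[5]]++
                }
            }
        }
        END { for (id in count) print count[id], id }
    ' | sort -rn
}

# Usage (5483 is the dockerd PID from the transcripts above):
#   lsof -p 5483 | fifo_by_container
```

The container with the largest count is listed first, which makes it easy to match against the IDs shown by docker ps.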
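So the only thing left to try was restarting dockerd:
Restarting dockerd fixed the problem (fortunately, in our setup restarting dockerd does not disturb the running containers).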
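Cause of the problem (not the ultimate root cause, but a step closer):
We monitor all of our containers, basically by using docker exec to run commands inside them. The docker-registry container is a special case: it does not contain the command we try to run. Besides leaving us unable to collect information from that container, every execution of a nonexistent command makes dockerd create a FIFO file under /run/docker/libcontainerd/ and keep it open without ever closing it (if the command exists, no FIFO is left open).
Detailed test results:
For a command that returns a nonzero exit status: with exec -i, no open files are left behind; without exec -i, open files are left behind (although further tracking showed that they do not linger for long).
For a command that does not exist, open files are always left behind, whether or not -i is used.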
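The test can be repeated by counting dockerd's open descriptors before and after a docker exec call. This is a rough sketch, not the procedure used for the results above: count_open_under is a hypothetical helper, it relies on the Linux /proc/<pid>/fd layout, and the container name and command in the usage comment are placeholders.

```shell
# Hypothetical helper: count the file descriptors of process $1 whose target
# path starts with prefix $2 (default: the libcontainerd FIFO directory).
# Deleted files still match, because readlink reports "<path> (deleted)".
count_open_under() {
    pid=$1
    prefix=${2:-/run/docker/libcontainerd/}
    n=0
    for fd in /proc/"$pid"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            "$prefix"*) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

# Usage, run as root (5483 is the dockerd PID from the transcripts above):
#   before=$(count_open_under 5483)
#   docker exec some-container no-such-command
#   after=$(count_open_under 5483)
#   echo "descriptors left open: $((after - before))"
```

Running the same comparison with and without -i, and with an existing command that exits nonzero, covers the cases listed above.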
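Solution:
To make sure the command being executed exists, the following approach can be used:
This approach not only avoids leaving extra unclosed file descriptors, it can also run a sequence of commands, whereas a direct docker exec -it cmd can only run a single command.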
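One way to get both properties is to have docker exec always launch a shell and pass the real commands to it as a script. The sketch below is an assumption about how such a wrapper might look, not a quotation of the original command: safe_exec is a made-up name, the DOCKER override exists only to allow a dry run, and the container image is assumed to ship a POSIX sh.

```shell
# Hypothetical wrapper: run a command sequence inside a container via sh -c.
# docker exec then always starts a binary that exists (sh); a missing command
# inside the script only makes sh exit with status 127.
# -i is used because, per the tests above, a nonzero exit with -i left no
# open files behind.
safe_exec() {
    container=$1
    shift
    # "$*" joins the remaining arguments into one script string for sh -c.
    "${DOCKER:-docker}" exec -i "$container" sh -c "$*"
}

# Usage (container name is a placeholder):
#   safe_exec some-container 'df -h; uptime'
# Dry run that only prints the arguments passed to docker:
#   DOCKER=echo safe_exec some-container 'df -h; uptime'
```

If the image has no sh at all, this wrapper hits the same nonexistent-command case as before, so it is worth checking that assumption for minimal images.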