Environment:
# docker info
Containers: 56
Running: 33
Paused: 0
Stopped: 23
Images: 31
Server Version: 1.12.5
Storage Driver: devicemapper
Pool Name: data-docker_thinpool
Pool Blocksize: 524.3 kB
Base Device Size: 32.21 GB
Backing Filesystem: xfs
Data file:
Metadata file:
Data Space Used: 516.2 GB
Data Space Total: 644.2 GB
Data Space Available: 128.1 GB
Metadata Space Used: 45.47 MB
Metadata Space Total: 16.98 GB
Metadata Space Available: 16.93 GB
Thin Pool Minimum Free Space: 64.42 GB
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: journald
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: host bridge null overlay
Swarm: inactive
Runtimes: runc docker-runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-514.6.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 40
Total Memory: 94.14 GiB
Name: VM-2-10-12
ID: NYBE:NZML:4KQQ:PF2J:RXCB:IPPI:Y3BI:CY7E:RVAC:WVWV:VDM2:3EEK
Docker Root Dir: /data1/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
docker-registry.i.bbtfax.com:5000
127.0.0.0/8
Registries: docker.io (secure)
Symptom:
# docker exec -it c6176f37c4b6 bash
rpc error: code = 13 desc = invalid header field value "oci runtime error: exec failed: container_linux.go:247: starting container process caused \"process_linux.go:75: starting setns process caused \\\"fork/exec /proc/self/exe: no such file or directory\\\"\"\n"
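At first glance: /proc/self/exe cannot be found?
Normally a missing file is nothing unusual. What is odd is that /proc/self/exe, of all files, should never be missing.
Because docker exec is ultimately handled by the libcontainerd process, I traced it with strace and found that the failure comes from a chdir to /root/data1/docker/devicemapper/mnt/4723e8178992b32b7284aa48c1c62f4011a6b785aca0c54e18d7ce5cc23b22dc/rootfs, whose target directory cannot be found. A quick look confirmed that the directory does not exist, but for a healthy container that can be exec'ed into, the corresponding rootfs directory does not exist (from the host's point of view) either.
Thinking...
Docker is all about namespaces and cgroups, so those have to come to mind. libcontainerd has its own (mnt) namespace. Entering the libcontainerd process's view of the filesystem makes the directory above visible, and there the difference shows: healthy containers have the corresponding directory, the broken container does not.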
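The comparison between healthy and broken containers can also be made without entering the namespace, by reading the containerd process's mountinfo file. The following is a minimal sketch, not the method used in this post: `has_mount` is a hypothetical helper name, and it assumes the standard /proc/<pid>/mountinfo layout, where the 5th field is the mount point.

```shell
#!/bin/bash
# has_mount MOUNTINFO_FILE MOUNT_POINT
# Hypothetical helper: succeeds (exit 0) if MOUNT_POINT appears as a
# mount point (5th field) in the given mountinfo file, fails otherwise.
has_mount() {
    awk -v target="$2" '$5 == target { found = 1 } END { exit !found }' "$1"
}

# Example (3639 is the containerd PID used later in this post):
#   has_mount /proc/3639/mountinfo /data1/docker/devicemapper/mnt/<id> && echo mounted
```

Matching on the exact 5th field avoids false positives from paths that merely contain the mount point as a substring.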
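The mount command reveals the pattern of the existing mounts, and the container's config.json (/var/run/docker/libcontainerd/c6176f37c4b67b03d4187edef6d1131cd44ab80bd0f0c20b24a7a20056967652/config.json) shows where the mount belongs. Entering libcontainerd's mnt namespace with nsenter and mounting the device by hand fixes the problem, as follows: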
# nsenter -m -t 3639 bash
# mount /dev/mapper/docker-253\:3-3221225568-4723e8178992b32b7284aa48c1c62f4011a6b785aca0c54e18d7ce5cc23b22dc -o rw,relatime,nouuid,attr2,inode64,sunit=512,swidth=1024,noquota -t xfs /data1/docker/devicemapper/mnt/4723e8178992b32b7284aa48c1c62f4011a6b785aca0c54e18d7ce5cc23b22dc
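The mount options above were read off the mounts of healthy containers. As a rough sketch of how that pattern could be extracted mechanically, the function below prints the filesystem type, source device, per-mount options and superblock options for one mount point in a mountinfo file. `describe_mount` is a hypothetical name, and the field positions assume the standard mountinfo layout (a lone `-` separates the optional fields from the filesystem type).

```shell
#!/bin/bash
# describe_mount MOUNTINFO_FILE MOUNT_POINT
# Hypothetical helper: prints "<fstype> <source> <mount options> <super options>"
# for the first entry whose mount point (5th field) equals MOUNT_POINT.
describe_mount() {
    awk -v target="$2" '
        $5 == target {
            # Optional fields start at field 7 and end at the "-" separator.
            for (i = 7; i <= NF; i++) {
                if ($i == "-") {
                    print $(i + 1), $(i + 2), $6, $(i + 3)
                    exit
                }
            }
        }' "$1"
}

# Example, run against a healthy container's mount point:
#   describe_mount /proc/3639/mountinfo /data1/docker/devicemapper/mnt/<healthy-id>
```

Note that nouuid is needed when mounting by hand here because every thin device is a snapshot of the same XFS base image and so carries the same filesystem UUID.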
A script to repair it automatically:
#!/bin/bash
# author: phpor
#
# Re-mount container root filesystems that are missing from containerd's mount namespace.
LIBCONTAINERD_DIR=/var/run/docker/libcontainerd
function main() {
    local pidOfContainerd mountinfo config cid rootpath device
    pidOfContainerd=$(pidof docker-containerd-current)
    # Mounts as seen from inside containerd's mount namespace
    mountinfo=$(< "/proc/$pidOfContainerd/mountinfo")
    for config in "$LIBCONTAINERD_DIR"/*/config.json; do
        # The container ID is the 6th '/'-separated field of the config path
        cid=$(awk -F'/' '{print $6}' <<< "$config")
        # Mount point = rootfs path without the trailing /rootfs
        rootpath=$(jq -r .root.path "$config" | sed 's/\/rootfs$//')
        if grep -qF -- "$rootpath" <<< "$mountinfo"; then
            echo "$cid $rootpath OK"
        else
            echo "$cid $rootpath Should repair"
            device=/dev/mapper/$(docker inspect "$cid" | jq -r '.[0].GraphDriver.Data.DeviceName')
            nsenter -m -p -t "$pidOfContainerd" mount -t xfs -o rw,nouuid,attr2,inode64,sunit=512,swidth=1024,noquota "$device" "$rootpath"
        fi
    done
}
main
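The script takes the container ID from a fixed field position in the config path and strips /rootfs with sed. As an alternative sketch (not what the script above does), both steps can be written so that they do not depend on how deep LIBCONTAINERD_DIR sits in the filesystem. The two function names below are hypothetical.

```shell
#!/bin/bash
# cid_from_config CONFIG_PATH
# Hypothetical helper: the container ID is the name of the directory
# that holds config.json, regardless of the depth of the base directory.
cid_from_config() {
    basename "$(dirname "$1")"
}

# mount_point_from_rootfs ROOTFS_PATH
# Hypothetical helper: strips one trailing "/rootfs", if present.
mount_point_from_rootfs() {
    printf '%s\n' "${1%/rootfs}"
}

# Example:
#   cid_from_config /var/run/docker/libcontainerd/<cid>/config.json
#   mount_point_from_rootfs /data1/docker/devicemapper/mnt/<id>/rootfs
```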
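So how did the mount point get lost in the first place? Would restarting dockerd repair it automatically? (Restarting the container should be enough.) To be investigated later.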