Background
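In one production environment, a newly created pod never reached the Running state on the node it had been scheduled to, and describing the pod showed an error message.
The system is CentOS 7.7 with kernel version 3.10.0-1062.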
Investigation
- The messages log on the failing node contains the following error from docker:
kuberuntime_manager.go:710] createPodSandbox for pod "media-ai-engine-test-6d75b6f4c6-7jtps_test-media(e1849041-4521-4864-bbbe-d70c70246470)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "media-ai-engine-test-6d75b6f4c6-7jtps": Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:303: getting the final child's pid from pipe caused \"EOF\"": unknown
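Judging from this error, something is wrong with the runtime that kubelet uses, which in this environment is docker. The phrase "getting the final child's pid from pipe caused" suggests that docker could not obtain information about the child process of the container it was creating. Restarting docker with systemctl restart docker to try to recover made no difference; the problem persisted.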
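The log check in this step can be sketched as a small script that pulls every createPodSandbox failure out of a syslog-style file together with a few neighbouring lines. This is only an illustrative sketch: the function names, the default path /var/log/messages, and the context window of three lines are assumptions, not something taken from the original investigation.

```python
from typing import Iterable, Iterator, List

# Substring taken from the kubelet error line quoted above.
MARKER = "createPodSandbox"


def failures_with_context(
    lines: Iterable[str], before: int = 3, after: int = 3
) -> Iterator[List[str]]:
    """Yield each line containing MARKER plus up to `before` preceding
    and `after` following lines.

    The whole input is read into memory, which is fine for a quick look
    at one node's log but not for very large files.
    """
    rows = [line.rstrip("\n") for line in lines]
    for i, row in enumerate(rows):
        if MARKER in row:
            yield rows[max(0, i - before): i + after + 1]


def print_failures(path: str = "/var/log/messages") -> None:
    """Print every failure block found in `path` (hypothetical default)."""
    with open(path, errors="replace") as fh:
        for block in failures_with_context(fh):
            print("\n".join(block))
            print("-" * 40)
```

Calling print_failures() on the affected node prints each failure with its surrounding lines. On a shell, grep -C 3 createPodSandbox /var/log/messages gives a comparable view.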
- Reading further through the node's messages log, each of the pod creation failures above is surrounded by the following kernel log:
kernel: runc:[1:CHILD]: page allocation failure: order:6, mode:0xc0d0
Sep 2 12:59:29 VM_16_54_centos kernel: CPU: 0 PID: 8880 Comm: runc:[1:CHILD] Kdump: loaded Tainted: G ------------ T 3.10.0-1062.9.1.el7.x86_64 #1
Sep 2 12:59:29 VM_16_54_centos kernel: Hardware name: Smdbmds KVM, BIOS seabios-1.9.1-qemu-project.org 04/01/2014
Sep 2 12:59:29 VM_16_54_centos kernel: Call Trace:
Sep 2 12:59:29 VM_16_54_centos kernel: [<ffffffffb817ac23>] dump_stack+0x19/0x1b
Sep 2 12:59:29 VM_16_54_centos kernel: [<ffffffffb7bc3d70>] warn_alloc_failed+0x110/0x180
Sep 2 12:59:29 VM_16_54_centos kernel: [<ffffffffb7bc897f>] __alloc_pages_nodemask+0x9df/0xbe0
Sep 2 12:59:29 VM_16_54_centos kernel: [<ffffffffb7c16b28>] alloc_pages_current+0x98/0x110
Sep 2 12:59:29 VM_16_54_centos kernel: [<ffffffffb7be3b28>] kmalloc_order+0x18/0x40
Sep 2 12:59:29 VM_16_54_centos kernel: [<ffffffffb7c22056>] kmalloc_order_trace+0x26/0xa0
Sep 2 12:59:29 VM_16_54_centos kernel: [<ffffffffb7c26611>] __kmalloc+0x211/0x230
Sep 2 12:59:29 VM_16_54_centos kernel: [<ffffffffb7c3ed61>] memcg_alloc_cache_params+0x81/0xb0
Sep 2 12:59:29 VM_16_54_centos kernel: [<ffffffffb7be37d4>] do_kmem_cache_create+0x74/0xf0
Sep 2 12:59:29 VM_16_54_centos kernel: [<ffffffffb7be3952>] kmem_cache_create+0x102/0x1b0
Sep 2 12:59:29 VM_16_54_centos kernel: [<ffffffffc0636dc1>] nf_conntrack_init_net+0xf1/0x260 [nf_conntrack]
Sep 2 12:59:29 VM_16_54_centos kernel: [<ffffffff