journalctl -xeu kubelet
5月 12 03:53:00 master-07 kubelet[1888]: E0512 03:53:00.458029 1888 controller.go:187] failed to update lease, error: etcdserver: request timed out, possibly due to previous leader failure
5月 12 07:56:46 master-07 kubelet[1888]: E0512 07:56:46.985601 1888 event.go:264] Server rejected event ‘&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:“kube-apiserver-master-07.167d8f2eec8d39c4”, GenerateName:"", Namespace:“kube-system”, SelfLink:"", UID:"", ResourceVersion:“12057210”, Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:“Pod”, Namespace:“kube-system”, Name:“kube-apiserver-master-07”, UID:“16eed824206a219d3e74c221b62fa993”, APIVersion:“v1”, ResourceVersion:"", FieldPath:“spec.containers{kube-apiserver}”}, Reason:“Unhealthy”, Message:“Readiness probe failed: HTTP probe failed with statuscode: 500”, Source:v1.EventSource{Component:“kubelet”, Host:“master-07”}, FirstTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63756205672, loc:(*time.Location)(0x70e3140)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc01ee44de3fc1638, ext:512615735833428, loc:(*time.Location)(0x70e3140)}}, Count:196, Type:“Warning”, EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}’: ‘etcdserver: request timed out’ (will not retry!)
有一个节点etcd的pod重启达到400多次,apiserver重启达到500多次,查看这个节点kube-apiserver-master-07的事件:
Events:
Type Reason Age From Message
Warning Unhealthy 46m (x36 over 2d1h) kubelet Liveness probe failed: HTTP probe failed with statuscode: 500
Warning Unhealthy 37m (x219 over 2d1h) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
找到问题的根因,下面解决它: