Why won't my pod start?
Symptoms
My CoreDNS pods could not be scheduled, and because kube-scheduler was running at the default log level, there were no detailed scheduling logs to trace.
The pod event reported that the node had no free ports (PodFitsPorts, i.e. the PodFitsHostPorts predicate). The related screenshot... is lost.
Troubleshooting
My CoreDNS is deployed as a DaemonSet and is scheduled onto hosts selected by label.
1. Check the kube-scheduler logs
Because the log level is the default, there are no scheduling logs for this pod.
2. Check the kube-controller-manager logs
Because the pod was never scheduled, the controller-manager has nothing relevant either.
3. Check the kubelet logs on the hosts matching the label
Completely empty.
4. Check port usage on the host, based on the pod event
CoreDNS is deployed with hostNetwork, so the ports it occupies are:
TCP 53, 9153 (metrics), and 8080 (HTTP health check)
UDP 53
ss -untlp | grep -E '53|9153|8080'
Result: none of the ports are in use. Thoroughly confused.
Code analysis
The analysis above narrows the problem down to the scheduling phase. Scheduling has two stages: predicates (registered via RegisterFitPredicate) and priorities (registered via RegisterPriorityFunction).
The predicate stage filters nodes mainly on memory, CPU, ports, and so on, so let's start with the predicate code.
Without further ado, here is the kube-scheduler predicate code:
pkg/scheduler/algorithm/predicates/predicates.go
func PodFitsHostPorts(pod *v1.Pod, meta algorithm.PredicateMetadata, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {
	var wantPorts []*v1.ContainerPort
	if predicateMeta, ok := meta.(*predicateMetadata); ok {
		wantPorts = predicateMeta.podPorts
	} else {
		// We couldn't parse metadata - fallback to computing it.
		wantPorts = schedutil.GetContainerPorts(pod)
	}
	if len(wantPorts) == 0 {
		return true, nil, nil
	}

	existingPorts := nodeInfo.UsedPorts()

	// try to see whether existingPorts and wantPorts will conflict or not
	if portsConflict(existingPorts, wantPorts) {
		return false, []algorithm.PredicateFailureReason{ErrPodNotFitsHostPorts}, nil
	}

	return true, nil, nil
}
From this block we can see that PodFitsHostPorts is where the host-port conflict check happens: the pod's desired ports (wantPorts) are compared against the ports already recorded in the NodeInfo that is passed in from outside.
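To make the input side concrete: wantPorts ultimately comes from the pod's own container port declarations, i.e. the hostPort fields under spec.containers[].ports. Below is a minimal standalone sketch of that collection step, using only the upstream k8s.io/api types; it is my own simplification for illustration, not the actual schedutil.GetContainerPorts source.

package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// collectHostPorts walks spec.containers[].ports the way the predicate's input
// is built: every declared port is collected; entries with HostPort == 0 are
// harmless because the later conflict check ignores port <= 0.
func collectHostPorts(pod *v1.Pod) []v1.ContainerPort {
	var ports []v1.ContainerPort
	for _, c := range pod.Spec.Containers {
		ports = append(ports, c.Ports...)
	}
	return ports
}

func main() {
	// A CoreDNS-like container declaring hostPorts, as in a hostNetwork deployment.
	pod := &v1.Pod{
		Spec: v1.PodSpec{
			Containers: []v1.Container{{
				Name: "coredns",
				Ports: []v1.ContainerPort{
					{HostPort: 53, ContainerPort: 53, Protocol: v1.ProtocolTCP},
					{HostPort: 53, ContainerPort: 53, Protocol: v1.ProtocolUDP},
					{HostPort: 9153, ContainerPort: 9153, Protocol: v1.ProtocolTCP},
				},
			}},
		},
	}
	for _, p := range collectHostPorts(pod) {
		fmt.Printf("%s %d\n", p.Protocol, p.HostPort)
	}
}

With the input side clear, let's see how the conflict itself is decided.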
pkg/scheduler/cache/host_ports.go
func (h HostPortInfo) CheckConflict(ip, protocol string, port int32) bool {
	if port <= 0 {
		return false
	}

	h.sanitize(&ip, &protocol)

	pp := NewProtocolPort(protocol, port)

	// If ip is 0.0.0.0 check all IP's (protocol, port) pair
	if ip == DefaultBindAllHostIP {
		for _, m := range h {
			if _, ok := m[*pp]; ok {
				return true
			}
		}
		return false
	}

	// If ip isn't 0.0.0.0, only check IP and 0.0.0.0's (protocol, port) pair
	for _, key := range []string{DefaultBindAllHostIP, ip} {
		if m, ok := h[key]; ok {
			if _, ok2 := m[*pp]; ok2 {
				return true
			}
		}
	}

	return false
}
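The behaviour worth calling out here is the 0.0.0.0 wildcard: a (protocol, port) pair recorded under 0.0.0.0 conflicts with the same pair on any IP, and vice versa. A self-contained toy version of that logic follows; it is my own re-implementation for illustration, not the scheduler's HostPortInfo type.

package main

import "fmt"

type protocolPort struct {
	protocol string
	port     int32
}

// hostPortInfo maps a bind IP ("0.0.0.0" meaning all interfaces) to the set of
// (protocol, port) pairs recorded for that IP.
type hostPortInfo map[string]map[protocolPort]struct{}

func (h hostPortInfo) add(ip, protocol string, port int32) {
	if ip == "" {
		ip = "0.0.0.0"
	}
	if h[ip] == nil {
		h[ip] = map[protocolPort]struct{}{}
	}
	h[ip][protocolPort{protocol, port}] = struct{}{}
}

// checkConflict mirrors the wildcard rule: asking about 0.0.0.0 scans every IP's
// entries, while asking about a specific IP also consults the 0.0.0.0 bucket.
func (h hostPortInfo) checkConflict(ip, protocol string, port int32) bool {
	if port <= 0 {
		return false
	}
	if ip == "" {
		ip = "0.0.0.0"
	}
	pp := protocolPort{protocol, port}
	if ip == "0.0.0.0" {
		for _, m := range h {
			if _, ok := m[pp]; ok {
				return true
			}
		}
		return false
	}
	for _, key := range []string{"0.0.0.0", ip} {
		if _, ok := h[key][pp]; ok {
			return true
		}
	}
	return false
}

func main() {
	used := hostPortInfo{}
	used.add("0.0.0.0", "TCP", 53) // recorded from an existing pod's hostPort declaration

	fmt.Println(used.checkConflict("10.0.0.5", "TCP", 53))  // true: the wildcard entry covers every IP
	fmt.Println(used.checkConflict("0.0.0.0", "UDP", 53))   // false: different protocol
	fmt.Println(used.checkConflict("0.0.0.0", "TCP", 9153)) // false: port not recorded
}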
So the check is simply whether the ports declared by the pending pod conflict with the ports already recorded in NodeInfo. Where does that NodeInfo come from? Let's walk up the call chain.
pkg/scheduler/algorithm/predicates/predicates.go
func EssentialPredicates(pod *v1.Pod, meta algorithm.PredicateMetadata, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {
	var predicateFails []algorithm.PredicateFailureReason
	fit, reasons, err := PodFitsHost(pod, meta, nodeInfo)
	if err != nil {
		return false, predicateFails, err
	}
	if !fit {
		predicateFails = append(predicateFails, reasons...)
	}

	// TODO: PodFitsHostPorts is essential for now, but kubelet should ideally
	// preempt pods to free up host ports too
	fit, reasons, err = PodFitsHostPorts(pod, meta, nodeInfo)
	if err != nil {
		return false, predicateFails, err
	}
	if !fit {
		predicateFails = append(predicateFails, reasons...)
	}

	fit, reasons, err = PodMatchNodeSelector(pod, meta, nodeInfo)
	if err != nil {
		return false, predicateFails, err
	}
	if !fit {
		predicateFails = append(predicateFails, reasons...)
	}
	return len(predicateFails) == 0, predicateFails, nil
}
Here PodFitsHostPorts again receives NodeInfo as a parameter, so we keep following the callers of EssentialPredicates.
Tracing the call chain upward, we finally arrive at the DaemonSet controller:
pkg/controller/daemon/daemon_controller.go
func (dsc *DaemonSetsController) simulate(newPod *v1.Pod, node *v1.Node, ds *apps.DaemonSet) ([]algorithm.PredicateFailureReason, *schedulercache.NodeInfo, error) {
	objects, err := dsc.podNodeIndex.ByIndex("nodeName", node.Name)
	if err != nil {
		return nil, nil, err
	}

	nodeInfo := schedulercache.NewNodeInfo()
	nodeInfo.SetNode(node)

	for _, obj := range objects {
		// Ignore pods that belong to the daemonset when taking into account whether a daemonset should bind to a node.
		// TODO: replace this with metav1.IsControlledBy() in 1.12
		pod, ok := obj.(*v1.Pod)
		if !ok {
			continue
		}
		if isControlledByDaemonSet(pod, ds.GetUID()) {
			continue
		}
		nodeInfo.AddPod(pod)
	}

	_, reasons, err := Predicates(newPod, nodeInfo)
	return reasons, nodeInfo, err
}
Here nodeInfo is created by schedulercache.NewNodeInfo(), and every existing pod on the node (except pods belonging to this DaemonSet) is folded into it via nodeInfo.AddPod(pod):
func (n *NodeInfo) AddPod(pod *v1.Pod) {
	res, non0CPU, non0Mem := calculateResource(pod)
	n.requestedResource.MilliCPU += res.MilliCPU
	n.requestedResource.Memory += res.Memory
	n.requestedResource.EphemeralStorage += res.EphemeralStorage
	if n.requestedResource.ScalarResources == nil && len(res.ScalarResources) > 0 {
		n.requestedResource.ScalarResources = map[v1.ResourceName]int64{}
	}
	for rName, rQuant := range res.ScalarResources {
		n.requestedResource.ScalarResources[rName] += rQuant
	}
	n.nonzeroRequest.MilliCPU += non0CPU
	n.nonzeroRequest.Memory += non0Mem
	n.pods = append(n.pods, pod)
	if hasPodAffinityConstraints(pod) {
		n.podsWithAffinity = append(n.podsWithAffinity, pod)
	}

	// Consume ports when pods added.
	n.UpdateUsedPorts(pod, true)

	n.generation = nextGeneration()
}
The port accounting happens in n.UpdateUsedPorts(pod, true):
func (n *NodeInfo) UpdateUsedPorts(pod *v1.Pod, add bool) {
	for j := range pod.Spec.Containers {
		container := &pod.Spec.Containers[j]
		for k := range container.Ports {
			podPort := &container.Ports[k]
			if add {
				n.usedPorts.Add(podPort.HostIP, string(podPort.Protocol), podPort.HostPort)
			} else {
				n.usedPorts.Remove(podPort.HostIP, string(podPort.Protocol), podPort.HostPort)
			}
		}
	}
}
So the "used ports" are taken straight from the hostPort fields declared in the pod resource object.
What is hostPort?
When a resource object exposes a port via hostPort, the corresponding port is opened on the host, and users can reach the service through the host's management-network IP address plus that port.
So here is the question: why does the host show no port in use at all?
Although the resource object declares hostPort, the container inside never actually uses that port, so nothing is really exposed on the host and no occupied port is visible to the naked eye (or to ss). But as the code analysis above shows, the scheduler treats the port as occupied based purely on the declaration, which brings us right back to the original error. (Screenshot omitted.)
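In other words, the scheduler's view of "used ports" is built entirely from pod specs via AddPod / UpdateUsedPorts; it never inspects actual sockets on the node. A tiny standalone sketch of that accounting, assuming a leftover pod on the node that declares hostPort 53 without ever listening on it (the leftover pod and all names here are illustrative, not taken from the original incident):

package main

import "fmt"

type containerPort struct {
	Protocol string
	HostPort int32
}

type podSpec struct {
	Name  string
	Ports []containerPort
}

func main() {
	// A pod already recorded on the node. Its spec declares hostPort 53,
	// but the process inside never binds it, so `ss` on the host shows nothing.
	leftover := podSpec{Name: "some-old-pod", Ports: []containerPort{{"UDP", 53}, {"TCP", 53}}}

	// The scheduler-style accounting (UpdateUsedPorts) only reads the spec.
	used := map[containerPort]bool{}
	for _, p := range leftover.Ports {
		if p.HostPort > 0 {
			used[p] = true
		}
	}

	// The pending CoreDNS pod wants the same host ports.
	coredns := []containerPort{{"UDP", 53}, {"TCP", 53}, {"TCP", 9153}, {"TCP", 8080}}
	for _, p := range coredns {
		if used[p] {
			fmt.Printf("conflict on %s/%d -> \"didn't have free ports\", even though nothing listens\n", p.Protocol, p.HostPort)
		}
	}
}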
Damn. The end: take down the offending service, and CoreDNS is back to normal.