How Service Topology Awareness Is Implemented

How ClusterIP works

When a pod on a node calls a Service in the cluster, the packet first enters the OUTPUT chain of the nat table and then jumps through the rule chains level by level (OUTPUT → KUBE-SERVICES → KUBE-SVC-XXXXXX → KUBE-SEP-XXXXXX) until iptables resolves the destination to the pod IP of one of the Service's backend endpoints.

  • Service state
[root@master test]# kubectl get pods -owide
NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
go-server-v1-57858d9f57-7zv5m   1/1     Running   0          48m   10.244.0.69   master   <none>           <none>
go-server-v2-6cf6889f8d-99lrq   1/1     Running   0          48m   10.244.1.69   slave1   <none>           <none>
[root@master test]# kubectl get endpoints go-server 
NAME        ENDPOINTS                           AGE
go-server   10.244.0.69:8083,10.244.1.69:8083   48m

  • Inspecting the OUTPUT chain
[root@slave1 ~]# iptables -t nat -nvL OUTPUT
Chain OUTPUT (policy ACCEPT 1022 packets, 58546 bytes)
 pkts bytes target     prot opt in     out     source               destination         
24907 1479K KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
 5157  309K DOCKER     all  --  *      *       0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

  • Inspecting the KUBE-SERVICES chain

KUBE-MARK-MASQ: matches traffic headed for a cluster IP from outside the pod CIDR (note the !10.244.0.0/16 source match below) and jumps to the KUBE-MARK-MASQ chain, which sets a packet mark (0x4000 by default) so the traffic can later be MASQUERADEd in KUBE-POSTROUTING.

KUBE-SVC-XXXXXX: matches traffic headed for a cluster IP and jumps to the per-service KUBE-SVC-XXXXXX chain for further processing.

[root@slave1 ~]# iptables -t nat -nvL KUBE-SERVICES
Chain KUBE-SERVICES (2 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 KUBE-MARK-MASQ  udp  --  *      *      !10.244.0.0/16        10.96.0.10           /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
    0     0 KUBE-SVC-TCOU7JCQXEZGVUNU  udp  --  *      *       0.0.0.0/0            10.96.0.10           /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
    0     0 KUBE-MARK-MASQ  tcp  --  *      *      !10.244.0.0/16        10.96.0.10           /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
    0     0 KUBE-SVC-ERIFXISQEP7F7OF4  tcp  --  *      *       0.0.0.0/0            10.96.0.10           /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
    0     0 KUBE-MARK-MASQ  tcp  --  *      *      !10.244.0.0/16        10.96.0.10           /* kube-system/kube-dns:metrics cluster IP */ tcp dpt:9153
    0     0 KUBE-SVC-JD5MR3NA4I4DYORP  tcp  --  *      *       0.0.0.0/0            10.96.0.10           /* kube-system/kube-dns:metrics cluster IP */ tcp dpt:9153
    0     0 KUBE-MARK-MASQ  tcp  --  *      *      !10.244.0.0/16        10.96.0.1            /* default/kubernetes:https cluster IP */ tcp dpt:443
    0     0 KUBE-SVC-NPX46M4PTMTKRN6Y  tcp  --  *      *       0.0.0.0/0            10.96.0.1            /* default/kubernetes:https cluster IP */ tcp dpt:443
    0     0 KUBE-MARK-MASQ  tcp  --  *      *      !10.244.0.0/16        10.96.218.181        /* default/go-server:server cluster IP */ tcp dpt:8083
    0     0 KUBE-SVC-MPJELURHHI6BMTVT  tcp  --  *      *       0.0.0.0/0            10.96.218.181        /* default/go-server:server cluster IP */ tcp dpt:8083
    2   104 KUBE-NODEPORTS  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL

  • Inspecting KUBE-SVC-XXXXXX

This chain is the load-balancing pool for the Service's backend endpoints. In the example below, the first rule matches with probability 0.5 and the second rule catches whatever falls through, so requests to the Service are spread evenly across the two backends (see the sketch after the listing).

[root@slave1 ~]# iptables -t nat -nvL KUBE-SVC-MPJELURHHI6BMTVT
Chain KUBE-SVC-MPJELURHHI6BMTVT (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 KUBE-SEP-2SMY4NG7UFZWXMZI  all  --  *      *       0.0.0.0/0            0.0.0.0/0            statistic mode random probability 0.50000000000
    0     0 KUBE-SEP-KGC5DRJ4OP3VKLM3  all  --  *      *       0.0.0.0/0            0.0.0.0/0      
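
The 0.5 is not a per-rule constant: with n backends, kube-proxy gives rule i (counting from 0) a statistic match of 1/(n-i) and leaves the last rule without any statistic match, so each backend ends up with an overall 1/n share of the traffic. A minimal Go sketch of that scheme (illustrative only, not kube-proxy's actual code):

package main

import "fmt"

// printRules sketches how kube-proxy spreads traffic across n backends:
// rule i matches the *remaining* traffic with probability 1/(n-i), and the
// final rule has no statistic match, so every backend gets 1/n overall.
func printRules(n int) {
	for i := 0; i < n; i++ {
		if i < n-1 {
			fmt.Printf("rule %d: -m statistic --mode random --probability %0.11f\n", i, 1.0/float64(n-i))
		} else {
			fmt.Printf("rule %d: no statistic match, takes the remainder\n", i)
		}
	}
}

func main() {
	printRules(2) // rule 0: probability 0.50000000000; rule 1: the remaining 50%
}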
  • Resolving the backend pod IP addresses
[root@slave1 ~]# iptables -t nat -nvL KUBE-SEP-2SMY4NG7UFZWXMZI
Chain KUBE-SEP-2SMY4NG7UFZWXMZI (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 KUBE-MARK-MASQ  all  --  *      *       10.244.0.69          0.0.0.0/0           
    0     0 DNAT       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp to:10.244.0.69:8083

[root@slave1 ~]# iptables -t nat -nvL KUBE-SEP-KGC5DRJ4OP3VKLM3
Chain KUBE-SEP-KGC5DRJ4OP3VKLM3 (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 KUBE-MARK-MASQ  all  --  *      *       10.244.1.69          0.0.0.0/0           
    0     0 DNAT       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp to:10.244.1.69:8083

Service topology awareness

Enabling the ServiceTopology feature gate by itself leaves the generated iptables chains unchanged; the chains only change once the topologyKeys field is added to the Service (see the sketch below).
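
For reference, such a Service could be built as in the following Go sketch, assuming an api version (roughly v1.17 through v1.21) that still carries the alpha TopologyKeys field in ServiceSpec (it was removed in v1.22); the app: go-server selector label is assumed for illustration:

package main

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// goServerService builds a topology-aware variant of the go-server Service
// used above. topologyKeys are tried in order: same node first, then same
// zone, then "*" as a catch-all.
func goServerService() *corev1.Service {
	return &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "go-server", Namespace: "default"},
		Spec: corev1.ServiceSpec{
			Selector: map[string]string{"app": "go-server"}, // assumed pod label
			Ports: []corev1.ServicePort{{
				Name:       "server",
				Port:       8083,
				TargetPort: intstr.FromInt(8083),
			}},
			TopologyKeys: []string{
				"kubernetes.io/hostname",      // prefer endpoints on this node
				"topology.kubernetes.io/zone", // then endpoints in the same zone
				"*",                           // v1.TopologyKeyAny: otherwise any endpoint
			},
		},
	}
}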

  • Service state
[root@master test]# kubectl get pods -owide
NAME                            READY   STATUS    RESTARTS   AGE     IP            NODE     NOMINATED NODE   READINESS GATES
go-server-v1-57858d9f57-4blgs   1/1     Running   0          4m27s   10.244.0.72   master   <none>           <none>
go-server-v2-6cf6889f8d-mm7dd   1/1     Running   0          4m27s   10.244.1.71   slave1   <none>           <none>
[root@master test]# kubectl get endpointslices.discovery.k8s.io 
NAME              ADDRESSTYPE   PORTS   ENDPOINTS                 AGE
go-server-gtmr7   IPv4          8083    10.244.0.72,10.244.1.71   6m40s
kubernetes        IPv4          6443    192.168.214.101           7d7h

  • Inspecting the KUBE-SERVICES chain
[root@slave1 ~]# iptables -t nat -nvL KUBE-SERVICES
Chain KUBE-SERVICES (2 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 KUBE-MARK-MASQ  tcp  --  *      *      !10.244.0.0/16        10.96.0.10           /* kube-system/kube-dns:metrics cluster IP */ tcp dpt:9153
    0     0 KUBE-SVC-JD5MR3NA4I4DYORP  tcp  --  *      *       0.0.0.0/0            10.96.0.10           /* kube-system/kube-dns:metrics cluster IP */ tcp dpt:9153
    0     0 KUBE-MARK-MASQ  udp  --  *      *      !10.244.0.0/16        10.96.0.10           /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
    0     0 KUBE-SVC-TCOU7JCQXEZGVUNU  udp  --  *      *       0.0.0.0/0            10.96.0.10           /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
    0     0 KUBE-MARK-MASQ  tcp  --  *      *      !10.244.0.0/16        10.96.0.10           /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
    0     0 KUBE-SVC-ERIFXISQEP7F7OF4  tcp  --  *      *       0.0.0.0/0            10.96.0.10           /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
    0     0 KUBE-MARK-MASQ  tcp  --  *      *      !10.244.0.0/16        10.96.218.181        /* default/go-server:server cluster IP */ tcp dpt:8083
    0     0 KUBE-SVC-MPJELURHHI6BMTVT  tcp  --  *      *       0.0.0.0/0            10.96.218.181        /* default/go-server:server cluster IP */ tcp dpt:8083
    0     0 KUBE-MARK-MASQ  tcp  --  *      *      !10.244.0.0/16        10.96.0.1            /* default/kubernetes:https cluster IP */ tcp dpt:443
    0     0 KUBE-SVC-NPX46M4PTMTKRN6Y  tcp  --  *      *       0.0.0.0/0            10.96.0.1            /* default/kubernetes:https cluster IP */ tcp dpt:443
    0     0 KUBE-NODEPORTS  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL

  • Inspecting KUBE-SVC-XXXXXX

Compared with the earlier chain, the chain generated now contains only the backend pods running on the current node; if the current node has no backend pod, rules are generated for backends in the same topology domain as the node, walking the Service's topologyKeys in order.

[root@slave1 ~]# iptables -t nat -nvL KUBE-SVC-MPJELURHHI6BMTVT
Chain KUBE-SVC-MPJELURHHI6BMTVT (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 KUBE-SEP-SP4MKVUDO742U2J2  all  --  *      *       0.0.0.0/0            0.0.0.0/0  
  • Resolving the backend pod IP address
[root@slave1 ~]# iptables -t nat -nvL KUBE-SEP-SP4MKVUDO742U2J2
Chain KUBE-SEP-SP4MKVUDO742U2J2 (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 KUBE-MARK-MASQ  all  --  *      *       10.244.1.71          0.0.0.0/0           
    0     0 DNAT       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp to:10.244.1.71:8083

Comparing the results

Comparing the two results shows that service topology awareness is enforced through the rule chains kube-proxy generates, so the implementation can be traced by answering three questions:

  1. How does kube-proxy obtain the Service's topologyKeys and their matching order?
  2. How does it look up pod information (state, IP address, topology domain) in the EndpointSlices following that order?
  3. Once a match is found, how does it generate the rule chains for the Service's backend pods in the selected domain?

kube-proxy implementation

  • Watching Services and EndpointSlices

In its Run() method, kube-proxy uses client-go informers to fetch the Service configuration from the api-server and invokes the registered handler methods on create, update, and delete events.

cmd/kube-proxy/app/server.go:727

	// Make informers that filter out objects that want a non-default service proxy.
	informerFactory := informers.NewSharedInformerFactoryWithOptions(s.Client, s.ConfigSyncPeriod,
		informers.WithTweakListOptions(func(options *metav1.ListOptions) {
			options.LabelSelector = labelSelector.String()
		}))

	// Create configs (i.e. Watches for Services and Endpoints or EndpointSlices)
	// Note: RegisterHandler() calls need to happen before creation of Sources because sources
	// only notify on changes, and the initial update (on process start) may be lost if no handlers
	// are registered yet.
	serviceConfig := config.NewServiceConfig(informerFactory.Core().V1().Services(), s.ConfigSyncPeriod)
	serviceConfig.RegisterEventHandler(s.Proxier)
	go serviceConfig.Run(wait.NeverStop)

	if s.UseEndpointSlices {
		endpointSliceConfig := config.NewEndpointSliceConfig(informerFactory.Discovery().V1beta1().EndpointSlices(), s.ConfigSyncPeriod)
		endpointSliceConfig.RegisterEventHandler(s.Proxier)
		go endpointSliceConfig.Run(wait.NeverStop)
	} else {
		endpointsConfig := config.NewEndpointsConfig(informerFactory.Core().V1().Endpoints(), s.ConfigSyncPeriod)
		endpointsConfig.RegisterEventHandler(s.Proxier)
		go endpointsConfig.Run(wait.NeverStop)
	}
  • The serviceConfig.Run method
func (c *ServiceConfig) Run(stopCh <-chan struct{}) {
	klog.Info("Starting service config controller")

	if !cache.WaitForNamedCacheSync("service config", stopCh, c.listerSynced) {
		return
	}

	for i := range c.eventHandlers {
		klog.V(3).Info("Calling handler.OnServiceSynced()")
		c.eventHandlers[i].OnServiceSynced()
	}
}

syncProxyRules() is the method in which kube-proxy actually generates the rules; OnServiceSynced invokes it once the caches are in sync:

func (proxier *Proxier) OnServiceSynced() {
	proxier.mu.Lock()
	proxier.servicesSynced = true
	if utilfeature.DefaultFeatureGate.Enabled(features.EndpointSliceProxying) {
		proxier.setInitialized(proxier.endpointSlicesSynced)
	} else {
		proxier.setInitialized(proxier.endpointsSynced)
	}
	proxier.mu.Unlock()

	// Sync unconditionally - this is called once per lifetime.
	proxier.syncProxyRules()
}
  • Checking whether service topology is enabled

If it is, FilterTopologyEndpoint is called to filter the endpoints collected from the EndpointSlices, after which each remaining endpoint is checked for Ready status.

pkg/proxy/iptables/proxier.go:1028

        allEndpoints := proxier.endpointsMap[svcName]

		// Service Topology will not be enabled in the following cases:
		// 1. externalTrafficPolicy=Local (mutually exclusive with service topology).
		// 2. ServiceTopology is not enabled.
		// 3. EndpointSlice is not enabled (service topology depends on endpoint slice
		// to get topology information).
		if !svcInfo.OnlyNodeLocalEndpoints() && utilfeature.DefaultFeatureGate.Enabled(features.ServiceTopology) && utilfeature.DefaultFeatureGate.Enabled(features.EndpointSliceProxying) {
			allEndpoints = proxy.FilterTopologyEndpoint(proxier.nodeLabels, svcInfo.TopologyKeys(), allEndpoints)
		}

		readyEndpoints := make([]proxy.Endpoint, 0, len(allEndpoints))
		for _, endpoint := range allEndpoints {
			if !endpoint.IsReady() {
				continue
			}

			readyEndpoints = append(readyEndpoints, endpoint)
		}
  • FilterTopologyEndpoint

The function walks the Service's topologyKeys in order and matches each key against the node's labels; the first key that yields matching endpoints determines the result (a worked example follows the listing).

pkg/proxy/topology.go

func FilterTopologyEndpoint(nodeLabels map[string]string, topologyKeys []string, endpoints []Endpoint) []Endpoint {
	// Do not filter endpoints if service has no topology keys.
	if len(topologyKeys) == 0 {
		return endpoints
	}

	filteredEndpoint := []Endpoint{}

	if len(nodeLabels) == 0 {
		if topologyKeys[len(topologyKeys)-1] == v1.TopologyKeyAny {
			// edge case: include all endpoints if topology key "Any" specified
			// when we cannot determine current node's topology.
			return endpoints
		}
		// edge case: do not include any endpoints if topology key "Any" is
		// not specified when we cannot determine current node's topology.
		return filteredEndpoint
	}

	for _, key := range topologyKeys {
	if key == v1.TopologyKeyAny { // TopologyKeyAny = "*": no topology restriction, return all endpoints
			return endpoints
		}
		topologyValue, found := nodeLabels[key]
		if !found {
			continue
		}

		for _, ep := range endpoints {
			topology := ep.GetTopology()
			if value, found := topology[key]; found && value == topologyValue {
				filteredEndpoint = append(filteredEndpoint, ep)
			}
		}
		if len(filteredEndpoint) > 0 {
			return filteredEndpoint
		}
	}
	return filteredEndpoint
}
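
To tie this back to the iptables output above, here is a self-contained sketch of the same filtering logic (fakeEndpoint is a hypothetical stand-in for the proxy.Endpoint interface, and the empty-node-labels edge cases are omitted), showing why slave1's KUBE-SVC chain contains only the local pod 10.244.1.71:

package main

import "fmt"

// fakeEndpoint carries only the fields the filter consults.
type fakeEndpoint struct {
	ip       string
	topology map[string]string
}

// filterTopology mirrors FilterTopologyEndpoint's core loop: for each key in
// order, keep endpoints whose topology value matches the node's label, and
// stop at the first key that yields a non-empty result.
func filterTopology(nodeLabels map[string]string, keys []string, eps []fakeEndpoint) []fakeEndpoint {
	for _, key := range keys {
		if key == "*" { // v1.TopologyKeyAny
			return eps
		}
		nodeVal, ok := nodeLabels[key]
		if !ok {
			continue
		}
		var matched []fakeEndpoint
		for _, ep := range eps {
			if ep.topology[key] == nodeVal {
				matched = append(matched, ep)
			}
		}
		if len(matched) > 0 {
			return matched
		}
	}
	return nil
}

func main() {
	nodeLabels := map[string]string{"kubernetes.io/hostname": "slave1"}
	keys := []string{"kubernetes.io/hostname", "*"}
	eps := []fakeEndpoint{
		{ip: "10.244.0.72:8083", topology: map[string]string{"kubernetes.io/hostname": "master"}},
		{ip: "10.244.1.71:8083", topology: map[string]string{"kubernetes.io/hostname": "slave1"}},
	}
	for _, ep := range filterTopology(nodeLabels, keys, eps) {
		fmt.Println(ep.ip) // prints only 10.244.1.71:8083
	}
}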