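EndpointSlice was introduced by the community proposal (KEP) 0752-endpointslices.
An EndpointSlice is a collection of endpoints. Kubernetes creates EndpointSlices for every Service that has a selector, and together they hold the network addresses of all Pods matched by the Service selector. Endpoints are grouped into EndpointSlices by <protocol, port, Service name>.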
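The grouping rule can be sketched in a few lines of Python. This is only an illustration, not the controller's actual code: the `Endpoint` tuple, the `build_slices` helper, and the sample IPs are hypothetical, and the limit of 100 endpoints per slice is an assumption chosen to match the tables later in this post. The real controller also does not necessarily pack slices this densely.

```python
from collections import defaultdict
from typing import NamedTuple

MAX_ENDPOINTS_PER_SLICE = 100  # assumed slice capacity, matching the 100-per-slice column


class Endpoint(NamedTuple):
    # Hypothetical, simplified stand-in for one Pod address behind a Service.
    service: str
    protocol: str
    port: int
    ip: str


def build_slices(endpoints, max_per_slice=MAX_ENDPOINTS_PER_SLICE):
    """Group endpoints by (protocol, port, service), then cut each group into slices."""
    groups = defaultdict(list)
    for ep in endpoints:
        groups[(ep.protocol, ep.port, ep.service)].append(ep.ip)

    slices = []
    for key, ips in groups.items():
        # Each chunk of at most max_per_slice addresses becomes one slice.
        for start in range(0, len(ips), max_per_slice):
            slices.append((key, ips[start:start + max_per_slice]))
    return slices


endpoints = [Endpoint("web", "TCP", 80, f"10.0.{i // 256}.{i % 256}") for i in range(250)]
slices = build_slices(endpoints)
print([len(ips) for _, ips in slices])  # [100, 100, 50]
```

With 250 Pods behind one Service and port, the sketch produces three slices instead of one object holding all 250 addresses.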
The original Endpoints object looks like this:
Subsets:
Addresses: 192.168.238.108,192.168.238.109,192.168.238.110
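An Endpoints object contains the IPs of all Pods behind a Service. So when a single Pod restarts and its IP changes, the entire Endpoints object has to be recomputed and stored again.
This is not a big problem when there are only a few Pods, but with many Pods it requires a large amount of network I/O.
With EndpointSlice, when a Pod restarts, only the slice containing the changed entry needs to be updated. The key idea is that the data is no longer stored in etcd as one monolithic object (as the original Endpoints was), but is split into several chunks that are stored separately.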
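To make the difference concrete, here is a rough Python sketch that replaces one Pod IP in both layouts and counts how many bytes of stored objects have to be rewritten. The `replace_ip` helper and the IP values are hypothetical, and 100 bytes per address is an assumed average size, not a measured one.

```python
BYTES_PER_ADDRESS = 100  # assumption: rough serialized size of one endpoint entry


def replace_ip(objects, old_ip, new_ip):
    """Swap old_ip for new_ip and return how many bytes of stored objects were rewritten."""
    rewritten = 0
    for addresses in objects:
        if old_ip in addresses:
            addresses[addresses.index(old_ip)] = new_ip
            # The whole object containing the address is written again, not just one entry.
            rewritten += len(addresses) * BYTES_PER_ADDRESS
    return rewritten


ips = [f"10.{i // 65536}.{(i // 256) % 256}.{i % 256}" for i in range(20_000)]

# Legacy layout: one big object holding every address.
single_endpoints_object = [list(ips)]
# Sliced layout: 200 independent objects of 100 addresses each.
endpoint_slices = [ips[i:i + 100] for i in range(0, len(ips), 100)]

cost_endpoints = replace_ip(single_endpoints_object, "10.0.0.7", "10.9.9.9")
cost_slices = replace_ip(endpoint_slices, "10.0.0.7", "10.9.9.9")
print(cost_endpoints, cost_slices)  # 2000000 10000
```

Under these assumptions one IP change rewrites about 2 MB in the single-object layout, but only about 10 KB when the addresses are split into slices of 100.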
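The following scenario gives an intuitive feel for the benefits EndpointSlice brings. In the tables, P is the number of endpoints, N is the number of nodes, and B is the number of endpoints per EndpointSlice.
Scenario 1: 20,000 endpoints, 5,000 nodes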
Service Creation/Deletion

| | Endpoints | 100 Endpoints per EndpointSlice | 1 Endpoint per EndpointSlice |
|---|---|---|---|
| # of writes | O(1) | O(P/B) | O(P) |
| | 1 | 200 | 20000 |
| Size of API object | O(P) | O(B) | O(1) |
| | 20k * const = ~2.0 MB | 100 * const = ~10 KB | < ~1KB |
| # of watchers per object | O(N) | O(N) | O(N) |
| | 5000 | 5000 | 5000 |
| # of total watch event | O(N) | O(NP/B) | O(NP) |
| | 5000 | 5000 * 200 = 1,000,000 | 5000 * 20000 = 100,000,000 |
| Total Bytes Transmitted | O(PN) | O(PN) | O(PN) |
| | 2.0MB * 5000 = 10GB | 10KB * 5000 * 200 = 10GB | ~10GB |

Single Endpoint Update

| | Endpoints | 100 Endpoints per EndpointSlice | 1 Endpoint per EndpointSlice |
|---|---|---|---|
| # of writes | O(1) | O(1) | O(1) |
| | 1 | 1 | 1 |
| Size of API object | O(P) | O(B) | O(1) |
| | 20k * const = ~2.0 MB | 100 * const = ~10 KB | < ~1KB |
| # of watchers per object | O(N) | O(N) | O(N) |
| | 5000 | 5000 | 5000 |
| # of total watch event | O(N) | O(N) | O(N) |
| | 5000 | 5000 | 5000 |
| Total Bytes Transmitted | O(PN) | O(BN) | O(N) |
| | ~2.0MB * 5000 = 10GB | ~10KB * 5000 = 50MB | ~1KB * 5000 = ~5MB |

Rolling Update

| | Endpoints | 100 Endpoints per EndpointSlice | 1 Endpoint per EndpointSlice |
|---|---|---|---|
| # of writes | O(P) | O(P) | O(P) |
| | 20k | 20k | 20k |
| Size of API object | O(P) | O(B) | O(1) |
| | 20k * const = ~2.0 MB | 100 * const = ~10 KB | < ~1KB |
| # of watchers per object | O(N) | O(N) | O(N) |
| | 5000 | 5000 | 5000 |
| # of total watch event | O(NP) | O(NP) | O(NP) |
| | 5000 * 20k | 5000 * 20k | 5000 * 20k |
| Total Bytes Transmitted | O(P^2N) | O(NPB) | O(NP) |
| | 2.0MB * 5000 * 20k = 200 TB | 10KB * 5000 * 20k = 1 TB | ~1KB * 5000 * 20k = ~100 GB |

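The "Total Bytes Transmitted" rows can be reproduced with a small calculation: object size times number of writes times number of watchers. The sketch below is only a back-of-the-envelope check. `CONST = 100` bytes per endpoint is an assumption implied by "20k * const = ~2.0 MB", the function and dictionary names are my own, and it covers only the Endpoints and 100-per-slice columns.

```python
P = 20_000   # endpoints (Pods) behind the Service
N = 5_000    # nodes, each holding a watch on the Service's endpoint objects
B = 100      # endpoints per EndpointSlice
CONST = 100  # assumed bytes per endpoint entry, implied by "20k * const = ~2.0 MB"


def total_bytes(object_size, writes, watchers=N):
    """Each write sends the full object to every watcher."""
    return object_size * writes * watchers


# (object size in bytes, number of writes) per scenario and storage layout
scenarios = {
    "service creation": {"Endpoints": (P * CONST, 1), "EndpointSlice": (B * CONST, P // B)},
    "single endpoint update": {"Endpoints": (P * CONST, 1), "EndpointSlice": (B * CONST, 1)},
    "rolling update": {"Endpoints": (P * CONST, P), "EndpointSlice": (B * CONST, P)},
}

totals = {
    name: {layout: total_bytes(size, writes) for layout, (size, writes) in layouts.items()}
    for name, layouts in scenarios.items()
}

for name, layouts in totals.items():
    for layout, nbytes in layouts.items():
        print(f"{name:24} {layout:14} {nbytes / 1e9:>12,.2f} GB")
```

Creation costs the same in both layouts because every address has to reach every node once either way. The gap only appears on updates, where the size of the object being rewritten dominates.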
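As the tables show, network I/O becomes very large as the number of replicas grows. In particular, a single Pod restart, which is a simple and very frequent event, requires transmitting 10GB of data with Endpoints, whereas the EndpointSlice approach needs only 50MB.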
因为更