Preface:
Note: the deployment method differs between cluster versions. This article takes the approach of editing the initial startup files, which requires restarting the docker and kubelet services, so proceed with caution! From v1.21 onward, node scheduling rules can be configured dynamically and managed in YAML form.
Official docs:
1. Limit k8s node compute resources (by editing startup files; applies to k8s 1.7+): link
2. Node scheduling/eviction policy (dynamic node resource rules; applies to k8s 1.21+): link
Reference: "k8s node allocatable resource limits"
# kubectl api-versions
Use this to confirm whether the current version supports dynamic node configuration;
it lists the API versions available to the current cluster.
Prerequisites
The cgroup driver must be switched to cgroupfs (Docker and kubelet must use the same driver).
1. First confirm Docker's cgroup driver:
# docker info | grep "Cgroup Driver"
Cgroup Driver: cgroupfs
If Docker's Cgroup Driver is not cgroupfs, configure it as follows.
2. Edit the Docker configuration (typically /etc/docker/daemon.json). JSON does not allow inline comments; the line to change is "exec-opts":
{
  "registry-mirrors": ["https://bk6kzfqm.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=cgroupfs"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
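Since JSON forbids comments, a stray "# change here" marker left in the file will stop Docker from starting. As a quick sanity check (a minimal sketch, not one of this article's steps; the content below mirrors the file above), you can parse the edited file and confirm the driver:

```python
import json

# Hypothetical daemon.json content after the edit above; in practice you
# would read it from /etc/docker/daemon.json on the host.
daemon_json = '''
{
  "registry-mirrors": ["https://bk6kzfqm.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=cgroupfs"],
  "log-driver": "json-file",
  "log-opts": {"max-size": "100m"},
  "storage-driver": "overlay2",
  "storage-opts": ["overlay2.override_kernel_check=true"]
}
'''

cfg = json.loads(daemon_json)  # raises ValueError if the JSON is malformed
driver = [o for o in cfg["exec-opts"] if o.startswith("native.cgroupdriver=")]
print(driver[0].split("=", 1)[1])  # -> cgroupfs
```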
3. Change the kubelet cgroup driver from systemd to cgroupfs
# vim /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.2"
Change the --cgroup-driver parameter to cgroupfs.
4. Review all of kubelet's configuration files
# /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
# vim /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS=--cgroup-driver=cgroupfs    ## change to cgroupfs
5. Restart docker and kubelet
systemctl restart docker && systemctl restart kubelet
## If anything fails, troubleshoot with:
# systemctl status kubelet.service -l
# journalctl _PID=<pid>
Kubelet Node Allocatable (constraining node resources)
1. Check the node's current available resources
kubectl describe nodes <node_name>
...
Capacity: ## total resources
cpu: 2
ephemeral-storage: 99561988Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 4026372Ki
pods: 110
Allocatable: ## allocatable resources
cpu: 2
ephemeral-storage: 91756327989
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 2868278679 # roughly 2.7 GiB
pods: 110
...
2. Concepts
Kubelet Node Allocatable reserves resources for Kube components and system daemons, so that even when the node runs at full load, those processes are still guaranteed enough resources.
Three resource types can currently be reserved: cpu, memory, and ephemeral-storage.
Node Capacity is all of the node's hardware resources; kube-reserved is the amount reserved for Kube components; system-reserved is the amount reserved for system daemons; eviction-threshold is the threshold at which kubelet starts evicting (reclaiming) pods.
Allocatable is the value the scheduler actually uses when placing Pods (it guarantees that the total resource requests of all Pods on the node do not exceed Allocatable):
Node Allocatable Resource = Node Capacity - kube-reserved - system-reserved - eviction-threshold
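The formula can be checked against this article's own numbers. A small sketch (using the capacity 4026372Ki shown above, the 1000Mi reservations and 5% hard eviction threshold configured below):

```python
# Sketch: reproduce Allocatable memory from the formula above, using the
# values that appear in this article (capacity 4026372Ki,
# kube-reserved=1000Mi, system-reserved=1000Mi, eviction-hard 5%).
Ki = 1024
Mi = 1024 * 1024

capacity = 4026372 * Ki                # bytes
kube_reserved = 1000 * Mi
system_reserved = 1000 * Mi
eviction_hard = int(capacity * 0.05)   # memory.available < 5%

allocatable = capacity - kube_reserved - system_reserved - eviction_hard
print(allocatable)                     # within a few bytes of the 1819702679 kubectl reports
print(round(allocatable / Ki**3, 2))   # -> 1.69 (GiB)
```

The few-byte gap from kubectl's value comes from how kubelet rounds the percentage threshold.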
The modified /var/lib/kubelet/kubeadm-flags.env:
KUBELET_KUBEADM_ARGS="--cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.2 \
--enforce-node-allocatable=pods,kube-reserved,system-reserved \
--kube-reserved-cgroup=/system.slice/kubelet.service \
--system-reserved-cgroup=/system.slice \
--kube-reserved=cpu=0,memory=1000Mi \
--system-reserved=cpu=1,memory=1000Mi \
--eviction-hard=memory.available<5%,nodefs.available<10%,imagefs.available<10% \
--eviction-soft=memory.available<10%,nodefs.available<15%,imagefs.available<15% \
--eviction-soft-grace-period=memory.available=2m,nodefs.available=2m,imagefs.available=2m \
--eviction-max-pod-grace-period=30 \
--eviction-minimum-reclaim=memory.available=0Mi,nodefs.available=500Mi,imagefs.available=500Mi"
Parameter explanations:
--enforce-node-allocatable=pods,kube-reserved,system-reserved
Meaning: tells kubelet which cgroups to enforce hard limits on. Possible values:
- pods
- kube-reserved #resources reserved for Kube components: kubelet, kube-proxy, docker, etc.
- system-reserved #resources reserved for system daemons
--kube-reserved-cgroup=/system.slice/kubelet.service
Meaning: specifies the cgroup used by the k8s system components.
Note: this cgroup and its subsystems must be created in advance; kubelet will not create them for you.
--system-reserved-cgroup=/system.slice
Meaning: specifies the cgroup used by system daemons.
Note: this cgroup and its subsystems must be created in advance; kubelet will not create them for you.
--kube-reserved=cpu=1,memory=250Mi
Meaning: kube-reserved reserves resources only for Kube components that are not started as pods.
--system-reserved=cpu=200m,memory=250Mi
Meaning: the amount of resources reserved for system daemons (sshd, udev, etc.),
e.g. --system-reserved=cpu=500m,memory=1Gi,ephemeral-storage=1Gi.
Note: besides the amount reserved for system daemons, you should also reserve some memory for the kernel and for user login sessions.
--eviction-hard=memory.available<5%,nodefs.available<10%,imagefs.available<10%
Meaning: sets the hard thresholds for pod eviction; this parameter only supports memory and disk.
Once some memory is reserved via --eviction-hard, kubelet starts evicting pods as soon as the node's available memory drops below the reserved value.
--eviction-soft=memory.available<10%,nodefs.available<15%,imagefs.available<15%
Meaning: configures the soft thresholds for pod eviction.
--eviction-soft-grace-period=memory.available=2m,nodefs.available=2m,imagefs.available=2m
Meaning: how long a soft threshold must remain exceeded before eviction starts.
--eviction-max-pod-grace-period=30
Meaning: maximum wait before evicting a pod = min(pod.Spec.TerminationGracePeriodSeconds, eviction-max-pod-grace-period), in seconds.
--eviction-minimum-reclaim=memory.available=0Mi,nodefs.available=500Mi,imagefs.available=500Mi
Meaning: the minimum amount of resources to reclaim per eviction.
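To make the percentage thresholds concrete, a small sketch (using this article's 4026372Ki node; not part of the deployment steps) converts them to bytes and applies the grace-period rule:

```python
# Sketch: turn the percentage eviction thresholds above into bytes for the
# 4026372Ki node used in this article, then apply the grace-period rule.
Ki = 1024
capacity = 4026372 * Ki  # bytes

hard = int(capacity * 0.05)   # memory.available < 5%  -> hard eviction
soft = int(capacity * 0.10)   # memory.available < 10% -> soft eviction

print(hard // (1024 * 1024), "MiB")  # hard threshold, ~196 MiB
print(soft // (1024 * 1024), "MiB")  # soft threshold, ~393 MiB

# --eviction-max-pod-grace-period=30 caps the pod's own grace period:
def eviction_grace(pod_termination_grace_s, eviction_max_s=30):
    return min(pod_termination_grace_s, eviction_max_s)

print(eviction_grace(60))  # a pod asking for 60s only gets 30s
print(eviction_grace(10))  # a pod asking for 10s keeps its 10s
```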
3. Make the changes and apply them
After editing the values to fit your node, save the file:
# cat /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.2 \
--enforce-node-allocatable=pods,kube-reserved,system-reserved \
--kube-reserved-cgroup=/system.slice/kubelet.service \
--system-reserved-cgroup=/system.slice \
--kube-reserved=cpu=0,memory=1000Mi \
--system-reserved=cpu=1,memory=1000Mi \
--eviction-hard=memory.available<5%,nodefs.available<10%,imagefs.available<10% \
--eviction-soft=memory.available<10%,nodefs.available<15%,imagefs.available<15% \
--eviction-soft-grace-period=memory.available=2m,nodefs.available=2m,imagefs.available=2m \
--eviction-max-pod-grace-period=30 \
--eviction-minimum-reclaim=memory.available=0Mi,nodefs.available=500Mi,imagefs.available=500Mi"
Modify the kubelet systemd service file /lib/systemd/system/kubelet.service so the reserved cgroups exist before kubelet starts:
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/home/
[Service]
ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/cpuset/system.slice/kubelet.service
ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/hugetlb/system.slice/kubelet.service
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10
[Install]
WantedBy=multi-user.target
4. Restart docker and kubelet, then check the node's Capacity and Allocatable again
# systemctl restart docker && systemctl restart kubelet
# kubectl describe nodes <node-name>
Addresses:
InternalIP: 192.168.17.150
Hostname: k8s-01
Capacity:
cpu: 2
ephemeral-storage: 99561988Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 4026372Ki
pods: 110
Allocatable: ## allocatable resources
cpu: 1
ephemeral-storage: 91756327989
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1819702679 # roughly 1.7 GiB
pods: 110
Comparison: allocatable cpu dropped from 2 to 1, and allocatable memory from roughly 2.7 GiB to roughly 1.7 GiB.
To repeat: this approach requires restarting docker and kubelet, so be very careful in production. From v1.21 on you can instead use a YAML file for dynamic configuration,
something like this:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "500Mi"
  nodefs.available: "1Gi"
  imagefs.available: "100Gi"
evictionMinimumReclaim:
  memory.available: "0Mi"
  nodefs.available: "500Mi"
  imagefs.available: "2Gi"
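The flag-based setup earlier in this article maps onto KubeletConfiguration fields. A sketch with the same values (field names should be verified against the kubelet API reference for your cluster version):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
enforceNodeAllocatable:
- pods
- kube-reserved
- system-reserved
kubeReservedCgroup: /system.slice/kubelet.service
systemReservedCgroup: /system.slice
kubeReserved:
  cpu: "0"
  memory: 1000Mi
systemReserved:
  cpu: "1"
  memory: 1000Mi
evictionHard:
  memory.available: "5%"
  nodefs.available: "10%"
  imagefs.available: "10%"
evictionSoft:
  memory.available: "10%"
  nodefs.available: "15%"
  imagefs.available: "15%"
evictionSoftGracePeriod:
  memory.available: 2m
  nodefs.available: 2m
  imagefs.available: 2m
evictionMaxPodGracePeriod: 30
evictionMinimumReclaim:
  memory.available: 0Mi
  nodefs.available: 500Mi
  imagefs.available: 500Mi
```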