(1) PaaS: e.g. Cloud Foundry. PaaS offers application-hosting capability, built on top of IaaS (Infrastructure as a Service).
Docker images keep the local environment and the cloud environment highly consistent.
(2) Orchestration: the process by which a user defines, configures, creates, and deletes a group of virtual machines and their associated resources through tools or configuration, and the cloud platform then carries out that specified logic.
CNCF: Cloud Native Computing Foundation
Borg, Omega
Istio, Operator, Rook
1. Container Technology Basics
1.1.Linux Namespace
The Linux namespace mechanism is an optional set of flags passed when Linux creates a new process, via a call such as:
int pid = clone(main_function, stack_size, CLONE_NEWPID | SIGCHLD, NULL)
Besides the PID namespace, Linux also provides Mount, UTS, IPC, Network, and User namespaces, which apply a similar "sleight of hand" to different aspects of a process's context.
- Mount namespace: the isolated process only sees the mount points inside its own namespace;
- Network namespace: the isolated process only sees the network devices and configuration inside its own namespace.
When Docker creates a container process, it specifies the set of namespaces that process should be started with. The container can only see the resources, files, devices, state, and configuration scoped by those namespaces. A container is just a special kind of process; namespaces merely shrink what it can see, creating the illusion of isolation from the host operating system.
# Enter the Docker Desktop VM's terminal on macOS
screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty
The Docker engine mostly plays a bypass / auxiliary-management role.
- Virtualization: a hypervisor creates virtual machines. Each VM really exists and must run a complete guest OS to execute the user's application processes; the VM itself consumes memory, and calls to the host OS inevitably go through interception and processing by the virtualization software.
- Containerization: compared with virtualization, the isolation is less complete. Containers are just special processes running on the host, so multiple containers still share the same host OS kernel. Moreover, many kernel resources and objects cannot be namespaced.
Time is one example: if you change the system time inside a container, the host's time changes too.
To mitigate this, techniques such as Seccomp can filter and screen every system call issued from inside the container for hardening, but because this adds an extra layer of syscall filtering, it degrades container performance.
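As a brief illustration of the hardening mentioned above (the profile path is a placeholder you must supply; Docker applies its own default seccomp profile when the flag is omitted):

```shell
# Run a container with a custom seccomp profile (profile.json is a placeholder):
docker run --rm --security-opt seccomp=./profile.json busybox echo ok

# Disable seccomp filtering entirely (for comparison only; not recommended):
docker run --rm --security-opt seccomp=unconfined busybox echo ok
```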
1.2.Linux Cgroup
Linux Control Groups (cgroups) are an important kernel feature for placing resource limits on processes, covering CPU, memory, disk I/O, and network bandwidth.
In Linux, the interface cgroups expose to users is a filesystem: it is organized as files and directories under /sys/fs/cgroup.
Cgroup configuration and usage:
(1) Mount the cgroup filesystem
(2) Create a "container" directory group; the related limit files are generated under it automatically
(3) Start a process to test with
(4) Write that process's PID into the tasks file
Writing to the cpu.cfs_quota_us file to set a CPU quota of 20ms (against the default 100ms period) limits CPU usage to 20%.
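The four steps can be sketched as follows (cgroup v1; the group name `container` and the endless-loop test process are illustrative, and the commands need root):

```shell
# (1) Most distributions mount cgroupfs automatically; otherwise:
#     mount -t cgroup -o cpu cgroup /sys/fs/cgroup/cpu
# (2) Create a control group; the limit files appear automatically:
mkdir /sys/fs/cgroup/cpu/container
# (3) Start a CPU-hungry test process in the background:
while :; do :; done &
# (4) Put it into the group, then set a 20ms quota per 100ms period (20% CPU):
echo $! > /sys/fs/cgroup/cpu/container/tasks
echo 20000 > /sys/fs/cgroup/cpu/container/cpu.cfs_quota_us
# Observe with top: the loop should now consume only ~20% of one core.
```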
Besides limiting CPU, other cgroup subsystems include:
- blkio: sets I/O limits for block devices, typically disks
- cpuset: pins processes to specific CPU cores and the corresponding memory nodes
- memory: sets memory-usage limits for processes
The Linux cgroups design is fairly easy to use: it can be understood as a subsystem directory plus a set of resource-limit files.
Under this directory we can see the docker and kubepods groups.
docker run -it --cpu-period=100000 --cpu-quota=20000 busybox /bin/bash
We can see that every container started by docker run gets cgroup limits under the corresponding directory.
A running Docker container is really an application process started with several Linux namespaces enabled, whose resource usage is constrained by its cgroup configuration. A container is a single-process model.
## ns.c
#define _GNU_SOURCE
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <sched.h>
#include <signal.h>
#include <unistd.h>
#define STACK_SIZE (1024 * 1024)
static char container_stack[STACK_SIZE];
char* const container_args[] = {
"/bin/bash",
NULL
};
int container_main(void* arg)
{
printf("Container - inside the container!\n");
// If your machine's root mount is of type shared, remount it private first:
// mount("", "/", NULL, MS_PRIVATE, "");
mount("none", "/tmp", "tmpfs", 0, "");
execv(container_args[0], container_args);
printf("Something's wrong!\n");
return 1;
}
int main()
{
printf("Parent - start a container!\n");
int container_pid = clone(container_main, container_stack+STACK_SIZE, CLONE_NEWNS | SIGCHLD , NULL);
waitpid(container_pid, NULL, 0);
printf("Parent - container stopped!\n");
return 0;
}
# Compile and run the code above to enter the new "container"
gcc -o ns ns.c
./ns
# But the container's filesystem still looks identical to the host's
ls -al
In practice, mount("none", "/tmp", "tmpfs", 0, "") mounts a tmpfs over the container's /tmp, so we can control which files are visible inside the container.
mkdir -p ~/test
mkdir -p ~/test/{bin,lib64,lib}
cd ~/test
# -v prints each operation the command performs
cp -v /bin/{bash,ls,sh} ~/test/bin
list="$(ldd /bin/ls | egrep -o '/lib.*\.[0-9]')"
for i in $list; do mkdir ~/test`dirname $i`; cp -v $i ~/test$i; done
# Change the root directory of the /bin/sh process to ~/test
chroot ~/test /bin/sh
The filesystem mounted on a container's root directory to provide its isolated execution environment is called the container image, also known as rootfs.
- pivot_root: switches the root filesystem, moving the current process's old root under the put_old directory and making new_root the new root filesystem
- chroot: runs a command with the specified directory as its root
A rootfs is an operating system's files, configuration, and directories, but not the OS kernel. It is the "body", while the Linux kernel is the "soul": all containers share the same kernel, which means a change to that kernel affects every container.
In its image design, Docker introduced the concept of layers: each step a user takes while building an image produces a layer, i.e. an incremental rootfs.
- Union File System: jointly mounts multiple directories from different locations onto the same directory
# Union-mount ./B, ./A, ./C into ./D; files override each other by layer order as described below
mkdir A
echo aa > ./A/a
echo xx > ./A/x
mkdir B
echo bbb > ./B/b
echo xxx > ./B/x
mkdir BB
echo bbbbbb > ./BB/b
echo xxxxxx > ./BB/x
mkdir C
mkdir D
# ./A is the top layer, ./BB the second, ./B the third
# Principle: reads fall through to the lower layers; when a lower-layer file is modified, it is first copied up into the upper layer (copy-on-write), so the original lower-layer files are never changed
mount -t overlay overlay -o lowerdir=./BB:./B,upperdir=./A,workdir=./C ./D
We can see that Docker uses overlay2 for union-mounting image layers.
docker pull nginx
docker inspect nginx
root@raspberrypi:~# docker inspect nginx
[
{
"Id": "sha256:b7dd3d7d83385d0bad882b2a2e1298d2c2003dd58eeae7d959e183b8d8392b9b",
"RepoTags": [
"nginx:latest"
],
"RepoDigests": [
"nginx@sha256:790711e34858c9b0741edffef6ed3d8199d8faa33f2870dea5db70f16384df79"
],
"Parent": "",
"Comment": "",
"Created": "2022-08-03T20:11:32.842853023Z",
"Container": "a562c0d94e0989705c34428ecc49bc8cf0339b71469f2574ce156593e1e05368",
"ContainerConfig": {
"Hostname": "a562c0d94e09",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"ExposedPorts": {
"80/tcp": {}
},
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"NGINX_VERSION=1.23.1",
"NJS_VERSION=0.7.6",
"PKG_RELEASE=1~bullseye"
],
"Cmd": [
"/bin/sh",
"-c",
"#(nop) ",
"CMD [\"nginx\" \"-g\" \"daemon off;\"]"
],
"Image": "sha256:8a3bbfa267ceda280efc568b8faa75fde2761c14607d20256d73b1a390188daa",
"Volumes": null,
"WorkingDir": "",
"Entrypoint": [
"/docker-entrypoint.sh"
],
"OnBuild": null,
"Labels": {
"maintainer": "NGINX Docker Maintainers <docker-maint@nginx.com>"
},
"StopSignal": "SIGQUIT"
},
"DockerVersion": "20.10.12",
"Author": "",
"Config": {
"Hostname": "",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"ExposedPorts": {
"80/tcp": {}
},
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"NGINX_VERSION=1.23.1",
"NJS_VERSION=0.7.6",
"PKG_RELEASE=1~bullseye"
],
"Cmd": [
"nginx",
"-g",
"daemon off;"
],
"Image": "sha256:8a3bbfa267ceda280efc568b8faa75fde2761c14607d20256d73b1a390188daa",
"Volumes": null,
"WorkingDir": "",
"Entrypoint": [
"/docker-entrypoint.sh"
],
"OnBuild": null,
"Labels": {
"maintainer": "NGINX Docker Maintainers <docker-maint@nginx.com>"
},
"StopSignal": "SIGQUIT"
},
"Architecture": "arm",
"Variant": "v7",
"Os": "linux",
"Size": 109131098,
"VirtualSize": 109131098,
"GraphDriver": {
"Data": {
"LowerDir": "/var/lib/docker/overlay2/3e8262849af4cbc4cde895685870114127ffd57eead9416c928784291f4ba53f/diff:/var/lib/docker/overlay2/e277a2e14d5ba7c3c7edfb746bb8e074f6bdd96fb0427a8d520e2efd4b9e43c9/diff:/var/lib/docker/overlay2/a05ffd4fecc6c4e32ed65ab5a88c4214969b88b152dab7612dd28ef1b4ebc704/diff:/var/lib/docker/overlay2/dd7005f61f9f06554ab770992ac1463727a7cd79398cde359ba5795fb381eb48/diff:/var/lib/docker/overlay2/a471495d68abb4636ca1414d1cf5e5fcd8241df8d82c4c24a8931f25d0f70ac9/diff",
"MergedDir": "/var/lib/docker/overlay2/670fa195d32249e3ba78bf989417dcf09272b0f5a950b3edcfb82fdf2d9f02de/merged",
"UpperDir": "/var/lib/docker/overlay2/670fa195d32249e3ba78bf989417dcf09272b0f5a950b3edcfb82fdf2d9f02de/diff",
"WorkDir": "/var/lib/docker/overlay2/670fa195d32249e3ba78bf989417dcf09272b0f5a950b3edcfb82fdf2d9f02de/work"
},
"Name": "overlay2"
},
"RootFS": {
"Type": "layers",
"Layers": [
"sha256:ddffee9f8d11b7acf1e1f7b78d0ec21b79d51a283a3fdaf10fd4d13d14693648",
"sha256:29110869bb9fdb3ebf6da2f4297c36cb3bd8733fdbe5bb33bfe6d77745640d3d",
"sha256:b3c88d2b26a3f1920c90dea050fa4478d71b9ccf7a6b1eea601e4f369ec52132",
"sha256:51d4212bec0900e2747c5c7ca68777922ec775dcd3b462c0e279b1c519195bec",
"sha256:54c0a6219829bca9912b9ac604b110af109a0a876c57b4716a9a68ed662dca5d",
"sha256:e44bd7094faabcc9f4e6c77e635a98a586affa70b7c36f0709848c60fc858d58"
]
},
"Metadata": {
"LastTagTime": "0001-01-01T00:00:00Z"
}
}
]
As the RootFS section above shows, this nginx image consists of 6 layers.
The merged directory is a complete Linux directory tree.
If you modify files inside a container, a new layer is produced; at that point docker commit and docker push let you publish your own image.
Union filesystems supported by Docker: aufs, devicemapper, btrfs, overlayfs, vfs, zfs.
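A minimal sketch of that commit-and-push flow (the container name, image tag, and registry account are placeholders, not from the original notes):

```shell
# Modify a running container, then snapshot its top layer into a new image:
docker run -d --name web nginx
docker exec web sh -c 'echo hello > /usr/share/nginx/html/index.html'
docker commit web myuser/mynginx:v1   # myuser/mynginx:v1 is a placeholder tag

# Push it to a registry you are logged in to:
docker login
docker push myuser/mynginx:v1
```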
Control plane: kube-apiserver, kube-scheduler, and kube-controller-manager; the cluster's persistent data is processed by kube-apiserver and stored in etcd.
Device Plugin: the main K8s component for managing physical host devices such as GPUs;
OCI (Open Container Initiative): the runtime and image standard that container runtimes implement, translating container definitions into Linux kernel system calls.
CRI (Container Runtime Interface): the interface through which kubelet drives container runtimes to run images; Docker is one implementation.
CNI (Container Network Interface): networking plugins
CSI (Container Storage Interface): storage plugins
A Service's main role is to act as a Pod's portal, exposing a fixed network address on the Pods' behalf.
In ipvs mode, a load-balancing policy can be specified per Service.
IPVS provides several options for balancing traffic to backend Pods:
Name | Meaning |
---|---|
rr | Round-Robin |
lc | Least Connection: fewest open connections first |
dh | Destination Hashing |
sh | Source Hashing |
sed | Shortest Expected Delay |
nq | Never Queue |
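To make kube-proxy use one of these schedulers, its ConfigMap can be edited and the resulting rules inspected on a node (a hedged sketch; assumes a kubeadm-style cluster where kube-proxy reads the kube-proxy ConfigMap, and that ipvsadm is installed):

```shell
# Set mode to ipvs and pick a scheduler (e.g. lc) in the kube-proxy config:
kubectl -n kube-system edit configmap kube-proxy
#   mode: "ipvs"
#   ipvs:
#     scheduler: "lc"
# Restart the kube-proxy pods so they pick up the change:
kubectl -n kube-system delete pod -l k8s-app=kube-proxy
# On a node, inspect the IPVS virtual servers and their schedulers:
ipvsadm -Ln
```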
Declarative API
Container = container runtime + container image
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
labels:
app: nginx
spec:
replicas: 2
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:1.7.9
ports:
- containerPort: 80
2. Kubernetes Cluster Setup and Practice
2.1. Raspberry Pi (CentOS 7.9, armv7l, Kubernetes 1.19.0)
Deploying a Kubernetes cluster on Raspberry Pi
2.2. Prerequisite of Ubuntu
2.2.1. Docker installation
apt install docker.io
# Restart fails
systemctl restart docker
systemctl enable docker
# Check the logs
journalctl -xeu docker
# Run the daemon manually to see the problem; it throws "Devices cgroup isn't mounted"
sudo dockerd --debug
# Note: fetch the raw script, not the GitHub HTML page
curl https://raw.githubusercontent.com/tianon/cgroupfs-mount/master/cgroupfs-mount -o mount_cgroup.sh
chmod +x mount_cgroup.sh
# The script must run after every server reboot, so it can be added to /etc/profile
./mount_cgroup.sh
# Download the required components
# List the required component images: kubeadm config images list
# kubeadm init
# chmod +x xxx
REGISTRY=registry.cn-hangzhou.aliyuncs.com
VERSION="v1.20.0"
echo "Pulling docker images..."
imageList=(
kube-apiserver:$VERSION
kube-controller-manager:$VERSION
kube-scheduler:$VERSION
kube-proxy:$VERSION
etcd:3.4.13-0
pause:3.2
# coredns-arm64:1.7.0
coredns:1.7.0
)
for image in ${imageList[@]}
do
echo "Pulling "$image
docker pull $REGISTRY/google_containers/$image
docker tag $REGISTRY/google_containers/$image k8s.gcr.io/$image
docker rmi $REGISTRY/google_containers/$image
done
echo "Finish pulling images..."
2.2.2. Disable the swap partition
Reboot after editing the partition (fstab) configuration.
sudo curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
sudo tee /etc/apt/sources.list.d/kubernetes.list <<-'EOF'
deb https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main
EOF
apt update
# List all available kubeadm versions
apt-cache madison kubeadm
apt install kubeadm=1.20.0-00 kubectl=1.20.0-00 kubelet=1.20.0-00
# Initialize a master node
kubeadm init --image-repository=registry.aliyuncs.com/google_containers --kubernetes-version=1.20.0 --ignore-preflight-errors=all --v=10
# Join a worker node to the cluster (the token and CA cert hash are printed by kubeadm init)
kubeadm join <master_ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
2.2.3. Configuring the K8s network plugin
docker pull weaveworks/weave-npc:2.8.1
docker tag weaveworks/weave-npc:2.8.1 ghcr.io/weaveworks/launcher/weave-npc:2.8.1
docker pull weaveworks/weave-kube:2.8.1
docker tag weaveworks/weave-kube:2.8.1 ghcr.io/weaveworks/launcher/weave-kube:2.8.1
wget https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n') -O weave.yaml
k apply -f weave-2.8.1.yaml
Common practice: run kubelet directly on the host, then deploy the other Kubernetes components as containers.
The certificates kubeadm generates for K8s live under /etc/kubernetes/pki on the master node; the most important files there are ca.crt and its private key ca.key.
When we fetch container logs or perform other streaming operations through kubelet, kube-apiserver must connect to kubelet, and that connection must also be secure. For this step, kubeadm generates apiserver-kubelet-client.crt with the private key apiserver-kubelet-client.key.
(1) Certificate generation: instead of letting kubeadm generate these certificates, you can copy existing ones into the certificate directory:
/etc/kubernetes/pki/ca.{crt,key}
(2) kubeadm then generates the configuration files the other components need to access kube-apiserver
Deploying K3s on Raspberry Pi
(5) Install K3s via Rancher's mirror
curl -sfL https://rancher-mirror.oss-cn-beijing.aliyuncs.com/k3s/k3s-install.sh | INSTALL_K3S_MIRROR=cn K3S_KUBECONFIG="644" sh -s -
(6) Join a Raspberry Pi node to the cluster
Get the K3s token on the server: cat /var/lib/rancher/k3s/server/node-token
curl https://get.k3s.io | K3S_TOKEN="xxx" K3S_URL="https://[your server ip]:6443" K3S_NODE_NAME="servername" sh -
3. Kubernetes Container Orchestration
Namespaces provide isolation, cgroups provide limits, and rootfs provides the filesystem.
A container is a process.
Pod: all containers in a Pod share the same Network namespace and can declare shared Volumes.
Infra: all traffic of the containers in a Pod goes through the Infra container; they share the Infra container's Network namespace.
3.1.Pod
3.1.1.nodeSelector
- nodeSelector: a field that binds a Pod to Nodes with matching labels
apiVersion: v1
kind: Pod
...
spec:
nodeSelector:
disktype: ssd
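For the selector above to match, a node must carry the disktype=ssd label; it can be added with kubectl (the node name is a placeholder):

```shell
# Label a node so Pods with nodeSelector disktype=ssd can schedule onto it:
kubectl label nodes <node-name> disktype=ssd
# Verify:
kubectl get nodes --show-labels | grep disktype
```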
3.1.2.hostAlias
- hostAliases: adds entries to the Pod's /etc/hosts file
apiVersion: v1
kind: Pod
...
spec:
hostAliases:
- ip: "10.1.2.3"
hostnames:
- "foo.remote"
- "bar.remote"
...
3.1.3.shareProcessNamespace
- shareProcessNamespace: containers in the Pod share one PID namespace
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
# All containers in the Pod share the same PID namespace
shareProcessNamespace: true
containers:
- name: nginx
image: nginx
- name: shell
image: busybox
# Equivalent to docker run -it: accept stdin and allocate a TTY
stdin: true
tty: true
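With shareProcessNamespace enabled, attaching to the shell container should reveal the other containers' processes (a sketch; the exact output depends on the cluster):

```shell
# Attach to the busybox shell container of the nginx Pod:
kubectl attach -it nginx -c shell
# Inside it, processes of ALL containers in the Pod are visible:
ps ax
# Expect to see /pause (the Infra container) alongside the nginx processes.
```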
3.1.4.hostNetwork, hostIPC, hostPID
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
# Share the host's network, IPC, and PID namespaces
hostNetwork: true
hostIPC: true
hostPID: true
containers:
- name: nginx
image: nginx
- name: shell
image: busybox
stdin: true
tty: true
3.1.5.lifecycle
apiVersion: v1
kind: Pod
metadata:
name: lifecycle-demo
spec:
containers:
- name: lifecycle-demo-container
image: nginx
lifecycle:
postStart:
exec:
command: ["/bin/sh", "-c", "echo Hello from postStart > /usr/share/message"]
preStop:
exec:
command: ["/usr/sbin/nginx", "-s", "quit"]
3.2. Projected Volume
apiVersion: v1
kind: Pod
metadata:
name: test-projected-volume
spec:
containers:
- name: test-secret-volume
image: busybox
command: ["sh", "-c"]
args:
- while true; do
if [[ -e /etc/podinfo/labels ]]; then
echo -en '\n\n'; cat /etc/podinfo/labels; fi;
sleep 5;
done;
volumeMounts:
- name: mysql-cred
mountPath: "/projected-volume"
readOnly: true
- name: podinfo
mountPath: /etc/podinfo
readOnly: false
env:
- name: USER
valueFrom:
secretKeyRef:
name: mysecret
key: user
optional: false
- name: LOG_LEVEL
valueFrom:
configMapKeyRef:
name: env-config
key: log_level
envFrom:
- configMapRef:
name: special-config
volumes:
- name: mysql-cred
secret:
secretName: mysecret
optional: false
- name: podinfo
projected:
sources:
- downwardAPI:
items:
- path: "labels"
fieldRef:
fieldPath: metadata.labels
3.2.1.Secret
apiVersion: v1
kind: Secret
metadata:
name: mysecret
type: Opaque
data:
user: bGl5dWFu
pass: MTIzNDU2
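The data values in a Secret are base64-encoded, not encrypted; the two fields above decode as shown (a quick check with the base64 tool):

```shell
# Encode a value for the Secret's data field:
echo -n 'liyuan' | base64        # → bGl5dWFu
# Decode what is stored:
echo 'MTIzNDU2' | base64 -d      # → 123456
```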
3.2.2.ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
name: env-config
namespace: default
data:
log_level: DEBUG
---
apiVersion: v1
kind: ConfigMap
metadata:
name: special-config
namespace: default
data:
special.how: very
time: "2022-09-19"
3.2.3. PodPreset
# Enable the alpha API in the apiserver manifest (runtime-config settings.k8s.io/v1alpha1 plus the PodPreset admission plugin); note that PodPreset never left alpha and was removed in Kubernetes v1.20
vim /etc/kubernetes/manifests/kube-apiserver.yaml
apiVersion: settings.k8s.io/v1alpha1
kind: PodPreset
metadata:
name: allow-database
spec:
selector:
matchLabels:
role: frontend
env:
- name: DB_PORT
value: "6379"
volumeMounts:
- mountPath: /cache
name: cache-volume
volumes:
- name: cache-volume
emptyDir: {}
3.3.Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
spec:
selector:
matchLabels:
app: nginx
replicas: 8
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
volumeMounts:
- name: nginx-volume
mountPath: /usr/share/nginx/html
volumes:
- name: nginx-volume
hostPath:
path: "/var/data"
# --record logs the command that triggered each rollout, for use when rolling back
k apply -f nginx-deployment.yaml --record
k scale --replicas=3 deploy nginx-deployment
# Roll back to the previous revision
k rollout undo deploy nginx-deployment
k rollout history deploy nginx-deployment
# Show the details of revision 4
k rollout history deploy nginx-deployment --revision=4
# Roll back to revision 4
kubectl rollout undo deployment/nginx-deployment --to-revision=4
# Pause the current rolling update of nginx-deployment
k rollout pause deploy nginx-deployment
# While the rollout is paused, we can modify the nginx-deployment spec
k edit deploy nginx-deployment
# Resume the rolling update
k rollout resume deploy nginx-deployment
3.3.1. Canary Release
Upgrade a small number of machines first; once verified, roll the update out to the rest.
# partition: 2 means that when the Pod template changes (e.g. a new image), only Pods with ordinal >= 2 are updated; if a Pod with ordinal < 2 is deleted, it is recreated from the old template
kubectl patch statefulset mysql -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":2}}}}'
statefulset.apps/mysql patched
3.3.2. Blue-Green Deployment
Two groups of machines: blue is the current version v1, green is the upgraded v2; once v2 is verified, the load balancer directs all traffic to v2 to complete the rollout.
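One hedged way to flip traffic in Kubernetes is to patch the Service selector from the blue label to the green one (the service name and version labels are placeholders):

```shell
# Assume two Deployments labelled version=v1 (blue) and version=v2 (green).
# Point the Service at green once it is verified:
kubectl patch service my-app -p '{"spec":{"selector":{"app":"my-app","version":"v2"}}}'
# Rolling back is the symmetric patch to version=v1.
```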
3.4.Service
An abstraction that exposes an application running on a set of Pods as a network service.
3.4.1.Headless Service
A Headless Service gets no VIP when created; instead it exposes the Pods it proxies via DNS records of the form:
<pod-name>.<svc-name>.<namespace>.svc.cluster.local
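For the tomcat StatefulSet defined in the next section, the first replica's stable DNS name can be assembled from this pattern (namespace assumed to be default):

```shell
# Assemble the per-Pod DNS name behind a headless Service:
pod="tomcat-statefulset-0"
svc="tomcat-svc"
ns="default"
echo "${pod}.${svc}.${ns}.svc.cluster.local"
# From inside the cluster it can be resolved, e.g.:
#   kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- \
#     nslookup tomcat-statefulset-0.tomcat-svc
```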
3.5.StatefulSet
3.5.1 Headless Service
apiVersion: v1
kind: Service
metadata:
name: tomcat-svc
labels:
app: tomcat
spec:
ports:
- port: 8080
name: tomcat-web
clusterIP: None
selector:
app: tomcat
3.5.2. tomcat-statefulset
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: tomcat-statefulset
spec:
serviceName: tomcat-svc
replicas: 5
selector:
matchLabels:
app: tomcat
template:
metadata:
labels:
app: tomcat
spec:
containers:
- name: tomcat
image: tomcat
ports:
- containerPort: 8080
name: tomcat-web
3.5.3.busybox:1.28.3
apiVersion: apps/v1
kind: Deployment
metadata:
name: busybox-deployment
spec:
selector:
matchLabels:
app: busybox
replicas: 1
template:
metadata:
labels:
app: busybox
spec:
containers:
- name: busybox
image: busybox:1.28.3
args:
- /bin/sh
- -c
- sleep 10000; ping tomcat-statefulset-0.tomcat-svc
3.6.StorageClass
Deploy the NFS StorageClass:
nfs-subdir-external-provisioner
docker pull willdockerhub/nfs-subdir-external-provisioner:v4.0.2
docker tag docker.io/willdockerhub/nfs-subdir-external-provisioner:v4.0.2 k8s.gcr.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner --set nfs.server=192.168.31.175 --set nfs.path=/root/kubernetes/data/nfs
3.7.Persistent Volume Claim
3.7.1. Declaring a PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-claim
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: nfs-client # this storageClassName was created by the NFS provisioner installed above
3.7.2. Using the PVC
apiVersion: v1
kind: Pod
metadata:
name: pv-pod
spec:
containers:
- name: pv-container
image: nginx
ports:
- containerPort: 80
name: http-server
volumeMounts:
- mountPath: /usr/share/nginx/html
name: pvc-storage
volumes:
- name: pvc-storage
persistentVolumeClaim:
claimName: pvc-claim
3.7.3. Using a PVC from a StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: tomcat-statefulset
spec:
serviceName: tomcat-svc
replicas: 5
selector:
matchLabels:
app: tomcat
template:
metadata:
labels:
app: tomcat
spec:
containers:
- name: tomcat
image: tomcat
ports:
- containerPort: 8080
name: tomcat-web
volumeMounts:
- name: tomcat-statefulset-pvc
mountPath: /usr/local/tomcat/webapps
volumeClaimTemplates:
- metadata:
name: tomcat-statefulset-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: nfs-client
4. StatefulSet Master-Slave MySQL (CentOS 7.9.2009, armv7l)
apiVersion: v1
kind: ConfigMap
metadata:
name: mysql
labels:
app: mysql
data:
master.cnf: |
[mysqld]
log-bin
slave.cnf: |
[mysqld]
super-read-only
---
apiVersion: v1
kind: Service
metadata:
name: mysql-external
labels:
app: mysql
spec:
type: NodePort
ports:
- name: mysql-port
port: 3306
targetPort: 3306
protocol: TCP
selector:
app: mysql
---
apiVersion: v1
kind: Service
metadata:
name: mysql
labels:
app: mysql
spec:
ports:
- name: mysql
port: 3306
clusterIP: None
selector:
app: mysql
---
apiVersion: v1
kind: Service
metadata:
name: mysql-read
labels:
app: mysql
spec:
ports:
- name: mysql
port: 3306
selector:
app: mysql
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: mysql
spec:
selector:
matchLabels:
app: mysql
serviceName: mysql
replicas: 2
template:
metadata:
labels:
app: mysql
spec:
initContainers:
- name: init-mysql
image: biarms/mysql:5.7.33-beta-circleci
command:
- bash
- "-c"
- |
set -ex
[[ `hostname` =~ -([0-9]+)$ ]] || exit 1
order=${BASH_REMATCH[1]}
echo [mysqld] > /mnt/conf.d/server-id.cnf
echo server-id=$((100 + $order)) >> /mnt/conf.d/server-id.cnf
if [[ $order -eq 0 ]]; then
cp /mnt/config-map/master.cnf /mnt/conf.d/
else
cp /mnt/config-map/slave.cnf /mnt/conf.d/
fi
volumeMounts:
- name: conf
mountPath: /mnt/conf.d
- name: config-map
mountPath: /mnt/config-map
- name: clone-mysql
image: ysakashita/xtrabackup:latest
command:
- bash
- "-c"
- |
set -ex
# Skip if data already exists
[[ -d /var/lib/mysql/mysql ]] && exit 0
# The master, node 0, does not need to clone
[[ `hostname` =~ -([0-9]+)$ ]] || exit 1
order=${BASH_REMATCH[1]}
[[ $order -eq 0 ]] && exit 0
# clone from master node
ncat --recv-only mysql-0.mysql 3307 | xbstream -x -C /var/lib/mysql
xtrabackup --prepare --target-dir=/var/lib/mysql
volumeMounts:
- name: data
mountPath: /var/lib/mysql
subPath: mysql
- name: conf
mountPath: /etc/mysql/conf.d
containers:
- name: mysql
image: biarms/mysql:5.7.33-beta-circleci
env:
- name: MYSQL_ALLOW_EMPTY_PASSWORD
value: "1"
ports:
- name: mysql
containerPort: 3306
volumeMounts:
- name: data
mountPath: /var/lib/mysql
subPath: mysql
- name: conf
mountPath: /etc/mysql/conf.d
resources:
requests:
cpu: 500m
memory: 1Gi
livenessProbe:
exec:
command: ["mysqladmin", "ping"]
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
readinessProbe:
exec:
command: ["mysql","-h","127.0.0.1","-e","select 1"]
initialDelaySeconds: 5
periodSeconds: 4
timeoutSeconds: 2
- name: xtrabackup
image: ysakashita/xtrabackup:latest
ports:
- name: xtrabackup
containerPort: 3307
command:
- bash
- "-c"
- |
set -ex
cd /var/lib/mysql
# Read MASTER_LOG_FILE and MASTER_LOG_POS from the backup info file and assemble the initialization SQL
if [[ -f xtrabackup_slave_info ]]; then
echo "Found xtrabackup_slave_info..."
mv xtrabackup_slave_info change_master_to.sql.in
rm -f xtrabackup_binlog_info
# Note: this checks xtrabackup_binlog_pos_innodb rather than xtrabackup_binlog_info, because the xtrabackup version used here is an older one
elif [[ -f xtrabackup_binlog_pos_innodb ]]; then
echo "Found xtrabackup_binlog_pos_innodb..."
[[ `cat xtrabackup_binlog_pos_innodb` =~ ^(.*?)[[:space:]]+(.*?)$ ]] || exit 1
# retrieve the binlog filename and position from xtrabackup_binlog_pos_innodb
rm xtrabackup_binlog_pos_innodb
echo "CHANGE MASTER TO MASTER_LOG_FILE='${BASH_REMATCH[1]}',MASTER_LOG_POS=${BASH_REMATCH[2]}" > change_master_to.sql.in
fi
ls -al;
# If change_master_to.sql.in exists, replication needs to be initialized
if [[ -f change_master_to.sql.in ]]; then
cat change_master_to.sql.in;
echo "Waiting for mysqld to be ready (accepting connections)"
until mysql -h 127.0.0.1 -e "select 1"; do sleep 1; done
echo "Initializing replication from clone position"
# mv change_master_to.sql.orig change_master_to.sql.orig.in
# as the change_master_to.sql.in is empty, I shall generate by myself, but not sure whether above [[ xtrabackup_binlog_info ]] would influence it or not, so I use >>
# echo "CHANGE MASTER TO MASTER_USER='root'" >> change_master_to.sql.in
mysql -h 127.0.0.1 \
-e "$(<change_master_to.sql.in),\
MASTER_HOST='mysql-0.mysql',\
MASTER_USER='root',\
MASTER_PASSWORD='',\
MASTER_CONNECT_RETRY=10;\
START SLAVE;" || exit 1;
mv change_master_to.sql.in change_master_to.sql.orig
fi
# Listen on port 3307 with ncat: upon a transfer request, run xtrabackup --backup to back up the MySQL data and stream it to the requester
exec ncat --listen --keep-open --send-only --max-conns=1 3307 -c "xtrabackup --backup --slave-info --stream=xbstream --host=127.0.0.1 --user=root"
volumeMounts:
- name: data
mountPath: /var/lib/mysql
subPath: mysql
- name: conf
mountPath: /etc/mysql/conf.d
volumes:
- name: conf
emptyDir: {}
- name: config-map
configMap:
name: mysql
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes:
- ReadWriteOnce
storageClassName: nfs-client
resources:
requests:
storage: 6Gi
5.DaemonSet
- The Pod runs on every Node
- Each Node runs exactly one instance of the Pod
- When a new Node joins the cluster, the Pod is automatically created on it
5.1. fluentd-kubernetes-daemonset:v1.9.1-debian-elasticsearch7-1.0
apiVersion: v1
kind: ServiceAccount
metadata:
name: fluentd
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: fluentd
rules:
- apiGroups:
- ""
resources:
- pods
- namespaces
verbs:
- get
- list
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: fluentd
roleRef:
kind: ClusterRole
name: fluentd
apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
name: fluentd
namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: fluentd
namespace: kube-system
labels:
k8s-app: fluentd-logging
version: v1
spec:
selector:
matchLabels:
k8s-app: fluentd-logging
version: v1
template:
metadata:
labels:
k8s-app: fluentd-logging
version: v1
spec:
serviceAccount: fluentd
serviceAccountName: fluentd
tolerations:
- key: node-role.kubernetes.io/control-plane
effect: NoSchedule
- key: node-role.kubernetes.io/master
effect: NoSchedule
containers:
- name: fluentd
image: r1cebank/fluentd-kubernetes-daemonset:v1.9.1-debian-elasticsearch7-1.0
env:
resources:
limits:
memory: 200Mi
requests:
cpu: 100m
memory: 200Mi
volumeMounts:
- name: varlog
mountPath: /var/log
- name: dockercontainerlogdirectory
mountPath: /var/log/pods
readOnly: true
volumes:
- name: varlog
hostPath:
path: /var/log
- name: dockercontainerlogdirectory
hostPath:
path: /var/log/pods
5.2.controllerrevision
View the revision history:
k get controllerrevision --all-namespaces
k describe controllerrevision mysql-69bf89c9c9
# A DaemonSet is very similar to a Deployment, and also supports rollback
kubectl rollout undo daemonset fluentd-elasticsearch --to-revision=1 -n kube-system
6.Job & CronJob
A Job is a Pod controller.
[root@master job]# cat batch_job.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: pi
spec:
# Maximum number of Pods running in parallel
parallelism: 2
# Number of successful completions required
completions: 4
template:
spec:
containers:
- name: pi
image: busybox:latest
command: ["sh","-c","echo $((3*4))"]
# With restartPolicy set to OnFailure, the Pod's containers are restarted on failure instead
restartPolicy: Never
# After 100s the Pods are terminated regardless of state
activeDeadlineSeconds: 100
# Number of retries on failure
backoffLimit: 4
A CronJob is a Job controller.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: hello
spec:
# minute hour day-of-month month day-of-week
schedule: "*/1 * * * *"
# Allow: concurrent runs are allowed
# Forbid: concurrent runs are rejected
# Replace: an unfinished run is replaced by the new one
concurrencyPolicy: Allow
# If a run misses its scheduled time by more than 200s it counts as failed; after 100 such misses the Job is no longer created
startingDeadlineSeconds: 200
jobTemplate:
spec:
template:
spec:
containers:
- name: hello
image: busybox
args:
- /bin/sh
- -c
- date; echo Hello from the kubernetes cluster
restartPolicy: OnFailure
7. Custom Resource Definition
8.RBAC(Role Based Access Control)
8.1.Role
# example-role.yaml
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
namespace: default
name: example-role
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get","list","watch"]
8.2.RoleBinding
# example-rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: example-rolebinding
namespace: default
subjects:
- kind: User
name: example-user
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: Role
name: example-role
apiGroup: rbac.authorization.k8s.io
8.3.ClusterRole
# example-clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: example-clusterrole
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get","watch","list"]
# full verb list: ["get","list","watch","create","update","patch","delete"]
8.4.ClusterRolebinding
# example-clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: example-clusterrolebinding
subjects:
- kind: User
name: example-user
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: ClusterRole
name: example-clusterrole
apiGroup: rbac.authorization.k8s.io
8.5.ServiceAccount
# example-account.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
namespace: default
name: example-sa
# example-rolebinding.yaml
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: example-rolebinding
namespace: default
subjects:
- kind: User
name: example-user
apiGroup: rbac.authorization.k8s.io
- kind: ServiceAccount
name: example-sa
namespace: default
roleRef:
kind: Role
name: example-role
apiGroup: rbac.authorization.k8s.io
- ServiceAccount: system:serviceaccount:{namespace}:{serviceAccount}
- ServiceAccount group: system:serviceaccounts:{namespace}
...
subjects:
- kind: Group
name: system:serviceaccounts:default
apiGroup: rbac.authorization.k8s.io
# List the built-in ClusterRoles
k get clusterrole
# Bind view-only permissions for all service accounts
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: example-role-view
rules:
- apiGroups: ["*"]
resources: ["*"]
verbs: ["get","list","watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: rolebindingallusersviewpermission
subjects:
- kind: Group
name: system:serviceaccounts:default
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: ClusterRole
name: example-role-view
apiGroup: rbac.authorization.k8s.io
---
9.Operator
10. PersistentVolume
A PV describes a persistent storage volume.
# example-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: example-pv
spec:
storageClassName: manual
capacity:
storage: 1Gi
accessModes:
- ReadWriteOnce
nfs:
server: 192.168.31.175
path: /root/kubernetes/data/nfs/pvtest
11.PersistentVolumeClaim
A PVC describes the persistent storage attributes a Pod wants.
For a PVC to bind, the following must hold:
- The PV must satisfy the PVC's spec fields, e.g. the requested storage size
- The PV and PVC must have the same storageClassName
# example-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: example-pvc
spec:
accessModes:
- ReadWriteOnce
storageClassName: manual
resources:
requests:
storage: 1Gi
12.PersistentVolumeController
Continuously checks whether each PVC is already in the Bound state; if not, it iterates over all available PVs and tries to bind one to the PVC, i.e. it writes that PV's name into the PVC's spec.volumeName.
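A hedged way to observe the result of that binding, using the example-pvc defined earlier:

```shell
# After binding, the controller has filled in spec.volumeName:
kubectl get pvc example-pvc -o jsonpath='{.spec.volumeName}'
# Both the PVC and the PV report phase Bound:
kubectl get pvc example-pvc
kubectl get pv example-pv
```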
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"example-pvc","namespace":"default"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"1Gi"}},"storageClassName":"manual"}}
pv.kubernetes.io/bind-completed: "yes"
pv.kubernetes.io/bound-by-controller: "yes"
creationTimestamp: "2022-10-19T12:16:28Z"
finalizers:
- kubernetes.io/pvc-protection
managedFields:
- apiVersion: v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.: {}
f:kubectl.kubernetes.io/last-applied-configuration: {}
f:spec:
f:accessModes: {}
f:resources:
f:requests:
.: {}
f:storage: {}
f:storageClassName: {}
f:volumeMode: {}
manager: kubectl-client-side-apply
operation: Update
time: "2022-10-19T12:16:28Z"
- apiVersion: v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
f:pv.kubernetes.io/bind-completed: {}
f:pv.kubernetes.io/bound-by-controller: {}
f:spec:
f:volumeName: {}
f:status:
f:accessModes: {}
f:capacity:
.: {}
f:storage: {}
f:phase: {}
manager: kube-controller-manager
operation: Update
time: "2022-10-19T12:16:56Z"
name: example-pvc
namespace: default
resourceVersion: "7442712"
selfLink: /api/v1/namespaces/default/persistentvolumeclaims/example-pvc
uid: 3bb225f0-8a38-4834-802f-335e08137311
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: manual
volumeMode: Filesystem
volumeName: example-pv
status:
accessModes:
- ReadWriteOnce
capacity:
storage: 1Gi
phase: Bound
# Find the Pod's uid
k get po web-frontend -oyaml
echo "hello world by liyuan 2022-10-19 20:51" > /root/kubernetes/data/nfs/pvtest/hello.html
# hello.html also appears in the following directory
ls /var/lib/kubelet/pods/c61b2f65-2d72-4d5f-a87b-ea7bf9bdde33/volumes/kubernetes.io~nfs/example-pv/
# Equivalent to
mount -t nfs 192.168.31.175:/root/kubernetes/data/nfs/pvtest/ /var/lib/kubelet/pods/c61b2f65-2d72-4d5f-a87b-ea7bf9bdde33/volumes/kubernetes.io~nfs/example-pv/
# (1) Per the PV's NFS config, k8s mounts the remote file server onto this host-side volume directory
# (2) The container runtime (docker) then bind-mounts that directory to the path the application wants
docker run -itd --rm --name nginx-test -v /var/lib/kubelet/pods/c61b2f65-2d72-4d5f-a87b-ea7bf9bdde33/volumes/kubernetes.io~nfs/example-pv/:/usr/share/nginx/html nginx
Two-phase processing:
- Attach: acquire the storage resource. K8s passes in the nodeName; controlled by AttachDetachController, which runs on the master node as part of kube-controller-manager and continuously checks whether each Pod's PV is attached to its host;
- Mount: K8s provides the host Volume directory as the dir parameter; controlled by VolumeManagerReconciler, which runs on every node as part of kubelet.
13.StorageClass
Dynamic Provisioning
A StorageClass is a template for creating PVs:
- it defines PV attributes such as size and storage type;
- it binds the storage plugin (Ceph, NFS, ...) needed to create those PVs.
allowVolumeExpansion: true allows volumes to be expanded after creation.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
annotations:
meta.helm.sh/release-name: nfs-subdir-external-provisioner
meta.helm.sh/release-namespace: default
creationTimestamp: "2022-09-17T14:42:58Z"
labels:
app: nfs-subdir-external-provisioner
app.kubernetes.io/managed-by: Helm
chart: nfs-subdir-external-provisioner-4.0.17
heritage: Helm
release: nfs-subdir-external-provisioner
managedFields:
- apiVersion: storage.k8s.io/v1
fieldsType: FieldsV1
fieldsV1:
f:allowVolumeExpansion: {}
f:metadata:
f:annotations:
.: {}
f:meta.helm.sh/release-name: {}
f:meta.helm.sh/release-namespace: {}
f:labels:
.: {}
f:app: {}
f:app.kubernetes.io/managed-by: {}
f:chart: {}
f:heritage: {}
f:release: {}
f:parameters:
.: {}
f:archiveOnDelete: {}
f:provisioner: {}
f:reclaimPolicy: {}
f:volumeBindingMode: {}
manager: helm
operation: Update
time: "2022-09-17T14:42:58Z"
name: nfs-client
resourceVersion: "365896"
selfLink: /apis/storage.k8s.io/v1/storageclasses/nfs-client
uid: 255aa2de-2322-4a6a-a193-632786458ee4
parameters:
archiveOnDelete: "true"
provisioner: cluster.local/nfs-subdir-external-provisioner
reclaimPolicy: Delete
volumeBindingMode: Immediate
If the cluster enables the DefaultStorageClass admission plugin, a default StorageClass is automatically added to PVCs and PVs that don't specify one; otherwise such a PVC's storageClassName is "", and it can only bind to PVs whose storageClassName is also "".
14. Understanding PV and PVC
- We should not use an arbitrary host directory as a PV; use a disk or block device additionally attached to the host (one PV per disk);
Why use a Local PersistentVolume?
If we wrote hostPath in the YAML, we would not know which path to write; with a Local PV the user does not need to care where on which physical disk the data lands.
Local PersistentVolume
abstracts an attached disk as a PV.
# local_pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: local-pv
spec:
capacity:
storage: 5Gi
volumeMode: Filesystem
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Delete
storageClassName: local-storage
local:
path: /root/test
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- master
# local_sc.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: local-storage
provisioner: kubernetes.io/no-provisioner
# volumeBindingMode=WaitForFirstConsumer
# delays binding until a Pod using the PVC is scheduled
volumeBindingMode: WaitForFirstConsumer
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: example-local-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
storageClassName: local-storage
To delete such a PV:
- delete the Pod
- umount the disk
- delete the PVC
- delete the PV
References
Zhihu: Building a K8s cluster with a Raspberry Pi 4B, step by step
StatefulSet master-slave MySQL on GitHub
Master-slave MySQL - 深入剖析Kubernetes (Kubernetes in Depth)