1. Introduction
Base cluster components
0. k8s cluster (k8s-1.23.17) (k8s-1.29.0)
1. helm, kubens, kubectl completion
2. MetalLB
3. ingress-nginx
4. istio
5. argocd
6. Argo Rollouts
7. metrics-server
8. vertical-pod-autoscaler
9. kuboard-v3
10. nfs-subdir-external-provisioner
11. Velero + minio (backup & disaster recovery)
12. gitlab
13. harbor
14. jenkins
15. prometheus + grafana
16. ELK
SpringCloud business components
0. nginx-1.25.1
1. mysql-8.0.22
2. nacos-2.1.0
3. redis-7.2
4. mongo-7.0.0
5. kafka-3.5.1
6. minio
7. xxl-job-2.4.0
8. skywalking
DevOps (ideal) | CPU / memory | Disk |
---|---|---|
k8s-master | 4c8g | 200g |
k8s-node1 (middleware) | 8c16g | 200g |
k8s-node2 (middleware) | 8c16g | 200g |
k8s-node3 (middleware) | 8c16g | 200g |
k8s-node4 (backend) | 8c16g | 200g |
k8s-node5 (backend) | 8c16g | 200g |
k8s-node6 (backend) | 8c16g | 200g |
k8s-node7 (frontend) | 8c16g | 200g |
k8s-node8 (frontend) | 8c16g | 200g |
k8s-node9 (frontend) | 8c16g | 200g |
k8s-ingress-nginx | 4c8g | 100g |
k8s-nfs-storage | 2c4g | 1t |
MySQL (primary) | 8c16g | 200g |
harbor | 2c4g | 400g |
gitlab | 4c8g | 100g |
k8s-node (Jenkins, desktops) | 8c16g*3 | 200g |
Memory: 4*2 + 8*3 + 16*10 = 192g
CPU: 2*2 + 4*3 + 8*10 = 96c
Disk: 100*2 + 200*11 + 400g*1 + 1t = 3824g
Dell R750 (80c 192g, 4t) + desktop PCs (24c 64g, 1t)
skywalking.huanghuanhui.cloud
kibana.huanghuanhui.cloud
argocd.huanghuanhui.cloud
gitlab.huanghuanhui.cloud
harbor.huanghuanhui.cloud
jenkins-prod.huanghuanhui.cloud
kuboard.huanghuanhui.cloud
minio.huanghuanhui.cloud
minio-console.huanghuanhui.cloud
webstatic.huanghuanhui.cloud
uploadstatic.huanghuanhui.cloud
mirrors.huanghuanhui.cloud
grafana.huanghuanhui.cloud
prometheus.huanghuanhui.cloud
www.huanghuanhui.cloud/nacos
www.huanghuanhui.cloud/xxl-job-admin
2. Base Cluster Components
2.0 k8s cluster (k8s-1.29.0)
Install the latest k8s-1.29.0 with kubeadm, using containerd as the container runtime:
containerd-1.6.26
k8s-1.29.0
2.0.0 Environment preparation (CentOS 7 configuration + tuning)
# Prompt colors (two alternatives; pick one)
echo "PS1='\[\033[35m\][\[\033[00m\]\[\033[31m\]\u\[\033[33m\]\[\033[33m\]@\[\033[03m\]\[\033[35m\]\h\[\033[00m\] \[\033[5;32m\]\w\[\033[00m\]\[\033[35m\]]\[\033[00m\]\[\033[5;31m\]\\$\[\033[00m\] '" >> ~/.bashrc && source ~/.bashrc
echo 'PS1="[\[\e[33m\]\u\[\e[0m\]\[\e[31m\]@\[\e[0m\]\[\e[35m\]\h\[\e[0m\]:\[\e[32m\]\w\[\e[0m\]] \[\e[33m\]\t\[\e[0m\] \[\e[31m\]Power\[\e[0m\]=\[\e[32m\]\!\[\e[0m\] \[\e[35m\]^0^\[\e[0m\]\n\[\e[95m\]公主请输命令^0^\[\e[0m\] \[\e[36m\]\\$\[\e[0m\] "' >> ~/.bashrc && source ~/.bashrc
# 0. CentOS 7 base configuration
# Install vim and basic tools
yum -y install vim wget net-tools
# Show line numbers in vim
echo "set nu" >> /root/.vimrc
# Highlight grep matches
sed -i "8calias grep='grep --color'" /root/.bashrc
# Tencent mirror repos
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.cloud.tencent.com/repo/centos7_base.repo
wget -O /etc/yum.repos.d/CentOS-Epel.repo http://mirrors.cloud.tencent.com/repo/epel-7.repo
yum clean all
yum makecache
# 1. Set the hostname
hostnamectl set-hostname k8s-master && su -
# 2. Add hosts entries
cat >> /etc/hosts << EOF
192.168.1.201 k8s-master
192.168.1.202 k8s-node1
192.168.1.203 k8s-node2
192.168.1.204 k8s-node3
EOF
# 3. Time sync
yum -y install ntp
systemctl enable ntpd --now
# 4. Permanently disable SELinux (takes effect after reboot)
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
# 5. Permanently disable swap (takes effect after reboot)
swapoff -a # disable now
sed -i 's/.*swap.*/#&/g' /etc/fstab # disable permanently
# 6. Upgrade the kernel to 5.4 LTS (takes effect after reboot)
# https://elrepo.org/tiki/kernel-lt
# https://elrepo.org/linux/kernel/el7/x86_64/RPMS/
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-6.el7.elrepo.noarch.rpm
yum --disablerepo="*" --enablerepo="elrepo-kernel" list available
yum --enablerepo=elrepo-kernel install -y kernel-lt
grub2-set-default 0
# Reboot here before continuing
# reboot
# 7. Disable the firewall and flush iptables rules
systemctl disable firewalld && systemctl stop firewalld
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X && iptables -P FORWARD ACCEPT
# 8. Disable NetworkManager
systemctl disable NetworkManager && systemctl stop NetworkManager
# 9. Load the IPVS modules
yum -y install ipset ipvsadm
cat > /etc/sysconfig/modules/ipvs.modules <<EOF
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack
EOF
modprobe -- nf_conntrack
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack
# 10. Enable br_netfilter and IPv4 forwarding
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# Required sysctl parameters; they persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
# Apply the sysctl parameters without rebooting
sudo sysctl --system
# Verify
lsmod | grep br_netfilter
lsmod | grep overlay
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
# 11. Kernel tuning
cat > /etc/sysctl.d/99-sysctl.conf << 'EOF'
# sysctl settings are defined through files in
# /usr/lib/sysctl.d/, /run/sysctl.d/, and /etc/sysctl.d/.
#
# Vendors settings live in /usr/lib/sysctl.d/.
# To override a whole file, create a new file with the same name in
# /etc/sysctl.d/ and put new settings there. To override
# only specific settings, add a file with a lexically later
# name in /etc/sysctl.d/ and put new settings there.
#
# For more information, see sysctl.conf(5) and sysctl.d(5).
# Controls IP packet forwarding
# Controls source route verification
net.ipv4.conf.default.rp_filter = 1
# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0
# Controls the System Request debugging functionality of the kernel
# Controls whether core dumps will append the PID to the core filename.
# Useful for debugging multi-threaded applications.
kernel.core_uses_pid = 1
# Controls the use of TCP syncookies
net.ipv4.tcp_syncookies = 1
# Controls the maximum size of a message, in bytes
kernel.msgmnb = 65536
# Controls the default maximum size of a message queue
kernel.msgmax = 65536
net.ipv4.conf.all.promote_secondaries = 1
net.ipv4.conf.default.promote_secondaries = 1
net.ipv6.neigh.default.gc_thresh3 = 4096
kernel.sysrq = 1
net.ipv6.conf.all.disable_ipv6=0
net.ipv6.conf.default.disable_ipv6=0
net.ipv6.conf.lo.disable_ipv6=0
kernel.numa_balancing = 0
kernel.shmmax = 68719476736
kernel.printk = 5
net.core.rps_sock_flow_entries=8192
net.bridge.bridge-nf-call-ip6tables=1
net.ipv4.ip_local_reserved_ports=60001,60002
net.core.rmem_max=16777216
fs.inotify.max_user_watches=524288
kernel.core_pattern=core
net.core.dev_weight_tx_bias=1
net.ipv4.tcp_max_orphans=32768
kernel.pid_max=4194304
kernel.softlockup_panic=1
fs.file-max=3355443
net.core.bpf_jit_harden=1
net.ipv4.tcp_max_tw_buckets=32768
fs.inotify.max_user_instances=8192
net.core.bpf_jit_kallsyms=1
vm.max_map_count=262144
kernel.threads-max=262144
net.core.bpf_jit_enable=1
net.ipv4.tcp_keepalive_time=600
net.ipv4.tcp_wmem=4096 12582912 16777216
net.core.wmem_max=16777216
net.ipv4.neigh.default.gc_thresh1=2048
net.core.somaxconn=32768
net.ipv4.neigh.default.gc_thresh3=8192
net.ipv4.ip_forward=1
net.ipv4.neigh.default.gc_thresh2=4096
net.ipv4.tcp_max_syn_backlog=8096
net.bridge.bridge-nf-call-iptables=1
net.ipv4.tcp_rmem=4096 12582912 16777216
EOF
# Apply the sysctl parameters without rebooting
sudo sysctl --system
# 12. Resource limits
cat >> /etc/security/limits.conf << 'EOF'
* soft nofile 100001
* hard nofile 100002
root soft nofile 100001
root hard nofile 100002
* soft memlock unlimited
* hard memlock unlimited
* soft nproc 254554
* hard nproc 254554
* soft sigpending 254554
* hard sigpending 254554
EOF
grep -vE "^\s*#" /etc/security/limits.conf
ulimit -a
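A quick sanity-check sketch (assumes you already rebooted after the SELinux/swap/kernel steps and logged back in):
getenforce                 # expect Disabled
free -h | grep -i swap     # expect 0B swap
uname -r                   # expect a 5.4.x elrepo kernel
sysctl net.ipv4.ip_forward # expect = 1
ulimit -n                  # expect 100001 after re-login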
2.0.1 Install containerd-1.6.26 (official repo)
wget -O /etc/yum.repos.d/docker-ce.repo https://download.docker.com/linux/centos/docker-ce.repo
yum makecache fast
yum list containerd.io --showduplicates | sort -r
yum -y install containerd.io-1.6.26-3.1.el7
containerd config default | sudo tee /etc/containerd/config.toml
# Set the cgroup driver to systemd
sed -ri 's#SystemdCgroup = false#SystemdCgroup = true#' /etc/containerd/config.toml
# Change the sandbox_image
sed -ri 's#registry.k8s.io\/pause:3.6#registry.aliyuncs.com\/google_containers\/pause:3.9#' /etc/containerd/config.toml
# Configure registry mirrors
# https://github.com/DaoCloud/public-image-mirror
# 1. Point config_path at the certs.d directory
sed -i 's/config_path = ""/config_path = "\/etc\/containerd\/certs.d\/"/g' /etc/containerd/config.toml
# 2. Configure the mirrors
# docker.io mirrors
mkdir -p /etc/containerd/certs.d/docker.io
cat > /etc/containerd/certs.d/docker.io/hosts.toml << 'EOF'
server = "https://docker.io" # 源镜像地址
[host."https://xk9ak4u9.mirror.aliyuncs.com"] # 镜像加速地址
capabilities = ["pull","resolve"]
[host."https://dockerproxy.com"] # 镜像加速地址
capabilities = ["pull", "resolve"]
[host."https://docker.mirrors.ustc.edu.cn"] # 镜像加速地址
capabilities = ["pull","resolve"]
[host."https://registry-1.docker.io"]
capabilities = ["pull","resolve","push"]
EOF
# registry.k8s.io mirror
mkdir -p /etc/containerd/certs.d/registry.k8s.io
cat > /etc/containerd/certs.d/registry.k8s.io/hosts.toml << 'EOF'
server = "https://registry.k8s.io"
[host."https://k8s.m.daocloud.io"]
capabilities = ["pull", "resolve", "push"]
EOF
# quay.io mirror
mkdir -p /etc/containerd/certs.d/quay.io
cat > /etc/containerd/certs.d/quay.io/hosts.toml << 'EOF'
server = "https://quay.io"
[host."https://quay.m.daocloud.io"]
capabilities = ["pull", "resolve", "push"]
EOF
systemctl daemon-reload
systemctl enable containerd --now
systemctl restart containerd
systemctl status containerd
Mirror configuration takes effect without restarting the service.
# Configure crictl
cat << EOF > /etc/crictl.yaml
runtime-endpoint: unix:///var/run/containerd/containerd.sock
image-endpoint: unix:///var/run/containerd/containerd.sock
timeout: 10
debug: false
EOF
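To confirm containerd and the mirror config work end to end, a minimal pull test (the image name is just an example):
crictl info | grep config_path   # should point at /etc/containerd/certs.d/
crictl pull docker.io/library/busybox:1.28.4
crictl images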
2.0.2 Install k8s (kubeadm-1.29.0, kubelet-1.29.0, kubectl-1.29.0)
cat > /etc/yum.repos.d/kubernetes.repo << 'EOF'
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.29/rpm/
enabled=1
gpgcheck=0
EOF
yum makecache fast
yum -y install kubeadm-1.29.0 kubelet-1.29.0 kubectl-1.29.0
systemctl enable --now kubelet
2.0.3 Initialize the k8s-1.29.0 cluster
mkdir ~/kubeadm_init && cd ~/kubeadm_init
kubeadm config print init-defaults > kubeadm-init.yaml
cat > ~/kubeadm_init/kubeadm-init.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.1.201 # change to your own IP
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: k8s-master
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/k8s-master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.29.0
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
scheduler: {}
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
EOF
# List the required images
kubeadm config images list --config kubeadm-init.yaml
# Pre-pull the images
kubeadm config images pull --config kubeadm-init.yaml
# Initialize
kubeadm init --config=kubeadm-init.yaml | tee kubeadm-init.log
# Configure kubectl
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
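Worker nodes join with the command printed by kubeadm init; if the token has expired, regenerate it on k8s-master:
kubeadm token create --print-join-command
# run the printed command on each worker, e.g. (token and hash are placeholders):
# kubeadm join 192.168.1.201:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>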
2.0.4 Install the cluster network (calico)
Check the calico/k8s version compatibility matrix first.
For k8s-1.29.0, use calico v3.27.0 (matching versions is critical).
mkdir -p ~/calico-yml
cd ~/calico-yml && wget https://github.com/projectcalico/calico/raw/v3.27.0/manifests/calico.yaml
1. Set the CIDR:
            - name: CALICO_IPV4POOL_CIDR
              value: "10.244.0.0/16"
2. Pin the interface:
            # Cluster type to identify the deployment type
            - name: CLUSTER_TYPE
              value: "k8s,bgp"
            # add the two lines below
            - name: IP_AUTODETECTION_METHOD
              value: "interface=ens33"
            # ens33 is the local NIC name (use whatever your machine has)
# 1. Set the CIDR
sed -i 's/192\.168/10\.244/g' calico.yaml
sed -i 's/# \(- name: CALICO_IPV4POOL_CIDR\)/\1/' calico.yaml
sed -i 's/# \(\s*value: "10.244.0.0\/16"\)/\1/' calico.yaml
# 2. Pin the interface (ens33 is the local NIC name; use your own)
sed -i '/value: "k8s,bgp"/a\            - name: IP_AUTODETECTION_METHOD' calico.yaml
sed -i '/- name: IP_AUTODETECTION_METHOD/a\              value: "interface=ens33"' calico.yaml
kubectl apply -f ~/calico-yml/calico.yaml
2.0.5 Verify coredns resolution
[root@k8s-master ~]# kubectl run -it --rm dns-test --image=busybox:1.28.4 sh
If you don't see a command prompt, try pressing enter.
/ # nslookup kubernetes
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local # this line confirms DNS is resolving
Name: kubernetes
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
/ #
kubectl run -it --rm dns-test --image=busybox:1.28.4 sh
kubectl run -it --rm dns-test --image=ccr.ccs.tencentyun.com/huanghuanhui/busybox:1.28.4 sh
nslookup kubernetes
2.1 helm, kubens, kubectl completion
2.1.1 helm
cd && wget https://repo.huaweicloud.com/helm/v3.13.3/helm-v3.13.3-linux-amd64.tar.gz
tar xf ~/helm-v3.13.3-linux-amd64.tar.gz
cp ~/linux-amd64/helm /usr/local/sbin/helm
rm -rf ~/helm-v3.13.3-linux-amd64.tar.gz && rm -rf ~/linux-amd64
helm version
2.1.2 kubectx, kubens
wget -O /usr/local/sbin/kubens https://github.com/ahmetb/kubectx/raw/v0.9.5/kubens
# via proxy (alternative)
wget -O /usr/local/sbin/kubens https://gh-proxy.com/https://github.com/ahmetb/kubectx/raw/v0.9.5/kubens --no-check-certificate
chmod +x /usr/local/sbin/kubens
wget -O /usr/local/sbin/kubectx https://github.com/ahmetb/kubectx/raw/v0.9.5/kubectx
chmod +x /usr/local/sbin/kubectx
2.1.3 kubectl completion
yum -y install bash-completion
source /etc/profile.d/bash_completion.sh
echo "source <(crictl completion bash)" >> ~/.bashrc
echo "source <(kubectl completion bash)" >> ~/.bashrc
echo "source <(helm completion bash)" >> ~/.bashrc
source ~/.bashrc && su -
2.1.4 Aliases
cat >> ~/.bashrc << 'EOF'
alias pod='kubectl get pod'
alias po='kubectl get pod'
alias svc='kubectl get svc'
alias ns='kubectl get ns'
alias pvc='kubectl get pvc'
alias pv='kubectl get pv'
alias sc='kubectl get sc'
alias ingress='kubectl get ingress'
alias all='kubectl get all'
alias deployment='kubectl get deployments'
alias vs='kubectl get vs'
alias gateway='kubectl get gateway'
EOF
source ~/.bashrc
2.2 MetalLB
Use MetalLB to give LoadBalancer Services an EXTERNAL-IP. Install MetalLB from k8s-master.
kube-proxy must run in ipvs mode with strictARP enabled; check the current config:
kubectl get configmap -n kube-system kube-proxy
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  strictARP: true
# see what changes would be made, returns nonzero returncode if different
kubectl get configmap kube-proxy -n kube-system -o yaml | \
sed -e "s/strictARP: false/strictARP: true/" | \
kubectl diff -f - -n kube-system
# actually apply the changes, returns nonzero returncode on errors only
kubectl get configmap kube-proxy -n kube-system -o yaml | \
sed -e "s/strictARP: false/strictARP: true/" | \
kubectl apply -f - -n kube-system
Configure MetalLB in Layer 2 mode
# kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.12/config/manifests/metallb-native.yaml
mkdir -p ~/MetalLB-yml && cd ~/MetalLB-yml
# via proxy
wget https://gh-proxy.com/https://raw.githubusercontent.com/metallb/metallb/v0.13.12/config/manifests/metallb-native.yaml --no-check-certificate
kubectl apply -f ~/MetalLB-yml/metallb-native.yaml
# Create the IP address pool (on k8s-master)
cat > ~/MetalLB-yml/iptool.yaml << 'EOF'
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.100-192.168.1.119 # same L2 segment as the nodes
  autoAssign: true
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system
spec:
  ipAddressPools:
  - default
EOF
kubectl apply -f ~/MetalLB-yml/iptool.yaml
# kubectl get IPAddressPool -n metallb-system
NAME AUTO ASSIGN AVOID BUGGY IPS ADDRESSES
default true false ["192.168.1.100-192.168.1.119"]
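A quick way to confirm MetalLB hands out addresses is a throwaway LoadBalancer Service (a sketch; image and names are arbitrary):
kubectl create deployment lb-test --image=nginx:1.25.2-alpine
kubectl expose deployment lb-test --port=80 --type=LoadBalancer
kubectl get svc lb-test   # EXTERNAL-IP should come from 192.168.1.100-119
kubectl delete svc/lb-test deployment/lb-test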
2.3 ingress-nginx
Install ingress-nginx with helm on k8s-master (edge node).
chart version: 4.9.0 (k8s: 1.29, 1.28, 1.27, 1.26, 1.25)
cluster version here: k8s-v1.29.0
https://github.com/kubernetes/ingress-nginx
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm search repo ingress-nginx/ingress-nginx
helm pull ingress-nginx/ingress-nginx --version 4.9.0 --untar
Option 1: DaemonSet + HostNetwork + nodeSelector
cat > ~/ingress-nginx/values-prod.yaml << 'EOF'
controller:
  name: controller
  image:
    registry: dyrnq
    image: controller
    tag: "v1.9.5"
    digest:
    pullPolicy: IfNotPresent
  dnsPolicy: ClusterFirstWithHostNet
  hostNetwork: true
  publishService: # with hostNetwork, set false; ingress status is reported via node IPs
    enabled: false
  kind: DaemonSet
  tolerations: # kubeadm taints the master by default; tolerate it to schedule here
  - key: "node-role.kubernetes.io/k8s-master"
    operator: "Equal"
    effect: "NoSchedule"
  nodeSelector: # pin to the k8s-master node (use your own master's name)
    kubernetes.io/hostname: "k8s-master"
  service: # HostNetwork mode needs no Service
    enabled: false
  admissionWebhooks: # strongly recommended: keep the admission webhook enabled
    enabled: true
    patch:
      enabled: true
      image:
        registry: dyrnq
        image: kube-webhook-certgen
        tag: v20241011-8b53cabe0
        digest:
        pullPolicy: IfNotPresent
defaultBackend:
  enabled: true
  name: defaultbackend
  image:
    registry: dyrnq
    image: defaultbackend-amd64
    tag: "1.5"
    digest:
    pullPolicy: IfNotPresent
EOF
kubectl create ns ingress-nginx
helm upgrade --install --namespace ingress-nginx ingress-nginx -f ./values-prod.yaml .
Option 2: MetalLB + Deployment + LoadBalancer (multiple replicas, HA)
cat > ~/ingress-nginx/values-prod.yaml << 'EOF'
controller:
  name: controller
  image:
    registry: dyrnq
    image: controller
    tag: "v1.9.5"
    digest:
    pullPolicy: IfNotPresent
  kind: Deployment
  replicaCount: 3 # 3 replicas
  affinity: # soft pod anti-affinity to spread replicas across nodes
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - ingress-nginx
          topologyKey: kubernetes.io/hostname
        weight: 100
  admissionWebhooks: # strongly recommended: keep the admission webhook enabled
    enabled: true
    patch:
      enabled: true
      image:
        registry: dyrnq
        image: kube-webhook-certgen
        tag: v20241011-8b53cabe0
        digest:
        pullPolicy: IfNotPresent
defaultBackend:
  enabled: true
  name: defaultbackend
  image:
    registry: dyrnq
    image: defaultbackend-amd64
    tag: "1.5"
    digest:
    pullPolicy: IfNotPresent
EOF
kubectl create ns ingress-nginx
helm upgrade --install --namespace ingress-nginx ingress-nginx -f ./values-prod.yaml .
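To smoke-test the controller before putting real apps behind it, a throwaway Ingress sketch (the host is an example and needs DNS or a hosts entry pointing at the controller):
kubectl create deployment ingress-test --image=nginx:1.25.2-alpine
kubectl expose deployment ingress-test --port=80
kubectl create ingress ingress-test --class=nginx --rule="ingress-test.huanghuanhui.cloud/*=ingress-test:80"
curl -H 'Host: ingress-test.huanghuanhui.cloud' http://<controller node or EXTERNAL-IP>/
kubectl delete ingress/ingress-test svc/ingress-test deployment/ingress-test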
Uninstall
[root@k8s-master ~/ingress-nginx]# helm delete ingress-nginx -n ingress-nginx
[root@k8s-master ~/ingress-nginx]# kubectl delete ns ingress-nginx
2.4 istio
Version: istio-1.20.2
cd && wget https://github.com/istio/istio/releases/download/1.20.2/istio-1.20.2-linux-amd64.tar.gz
tar xf istio-1.20.2-linux-amd64.tar.gz
cp ~/istio-1.20.2/bin/istioctl /usr/bin/istioctl
# istioctl version
no ready Istio pods in "istio-system"
1.20.2
istioctl install --set profile=demo -y
# istioctl version
client version: 1.20.2
control plane version: 1.20.2
data plane version: 1.20.2 (2 proxies)
istioctl command completion
yum -y install bash-completion
source /etc/profile.d/bash_completion.sh
cp ~/istio-1.20.2/tools/istioctl.bash ~/.istioctl.bash
source ~/.istioctl.bash
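To see the mesh in action, a sketch using the bookinfo sample that ships in the istio tarball:
kubectl label namespace default istio-injection=enabled --overwrite
kubectl apply -f ~/istio-1.20.2/samples/bookinfo/platform/kube/bookinfo.yaml
kubectl get pods   # each pod should show 2/2 once the sidecar is injected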
Uninstall
istioctl manifest generate --set profile=demo | kubectl delete --ignore-not-found=true -f -
kubectl delete namespace istio-system
2.5 argocd
https://github.com/argoproj/argo-cd/releases
mkdir -p ~/argocd-yml
kubectl create ns argocd
cd ~/argocd-yml && wget https://github.com/argoproj/argo-cd/raw/v2.9.0/manifests/install.yaml
kubectl apply -f ~/argocd-yml/install.yaml -n argocd
cat > ~/argocd-yml/argocd-Ingress.yml << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: argocd-ingress
  namespace: argocd
  annotations:
    kubernetes.io/tls-acme: "true"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
    nginx.ingress.kubernetes.io/proxy-body-size: '4G'
spec:
  ingressClassName: nginx
  rules:
  - host: argocd.huanghuanhui.cloud
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: argocd-server # send all requests to the argocd-server Service on port 80
            port:
              number: 80
  tls:
  - hosts:
    - argocd.huanghuanhui.cloud
    secretName: argocd-ingress-tls
EOF
kubectl create secret -n argocd \
tls argocd-ingress-tls \
--key=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud.key \
--cert=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud_bundle.crt
kubectl apply -f ~/argocd-yml/argocd-Ingress.yml
URL: argocd.huanghuanhui.cloud
Username: admin
# Get the initial password either way:
# 1)
# echo $(kubectl get secret -n argocd argocd-initial-admin-secret -o yaml | grep password | awk -F: '{print $2}') | base64 -d
# 2)
# kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d && echo
Credentials: admin / xzeXgS0aSIcIq-x5
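Once logged in, applications are declared as Application CRs. A minimal sketch (the repo URL and path are placeholders for your own GitOps repo):
cat > ~/argocd-yml/demo-app.yml << 'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: demo
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitlab.huanghuanhui.cloud/devops/demo.git # placeholder repo
    targetRevision: main
    path: deploy # directory of manifests in the repo
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
EOF
kubectl apply -f ~/argocd-yml/demo-app.yml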
2.6 Argo Rollouts
mkdir -p ~/argo-rollouts-yml
kubectl create ns argo-rollouts
cd ~/argo-rollouts-yml && wget https://github.com/argoproj/argo-rollouts/releases/download/v1.6.4/install.yaml
cd ~/argo-rollouts-yml && wget https://github.com/argoproj/argo-rollouts/releases/download/v1.6.4/dashboard-install.yaml
kubectl apply -n argo-rollouts -f ~/argo-rollouts-yml/install.yaml
kubectl apply -n argo-rollouts -f ~/argo-rollouts-yml/dashboard-install.yaml
# curl -LO https://github.com/argoproj/argo-rollouts/releases/download/v1.6.4/kubectl-argo-rollouts-linux-amd64
# via proxy
curl -LO https://gh-proxy.com/https://github.com/argoproj/argo-rollouts/releases/download/v1.6.4/kubectl-argo-rollouts-linux-amd64
chmod +x ./kubectl-argo-rollouts-linux-amd64
mv ./kubectl-argo-rollouts-linux-amd64 /usr/local/bin/kubectl-argo-rollouts
kubectl argo rollouts version
cat > ~/argo-rollouts-yml/argo-rollouts-dashboard-Ingress.yml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: argo-rollouts-dashboard-ingress
  namespace: argo-rollouts
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: 'true'
    nginx.ingress.kubernetes.io/proxy-body-size: '4G'
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: argo-rollouts-dashboard-auth
spec:
  ingressClassName: nginx
  rules:
  - host: argo-rollouts-dashboard.huanghuanhui.cloud
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: argo-rollouts-dashboard
            port:
              number: 3100
  tls:
  - hosts:
    - argo-rollouts-dashboard.huanghuanhui.cloud
    secretName: argo-rollouts-dashboard-ingress-tls
EOF
yum -y install httpd-tools
htpasswd -nb admin Admin@2024 > ~/argo-rollouts-yml/auth
kubectl create secret generic argo-rollouts-dashboard-auth --from-file=/root/argo-rollouts-yml/auth -n argo-rollouts
kubectl create secret -n argo-rollouts \
tls argo-rollouts-dashboard-ingress-tls \
--key=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud.key \
--cert=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud_bundle.crt
kubectl apply -f ~/argo-rollouts-yml/argo-rollouts-dashboard-Ingress.yml
URL: argo-rollouts-dashboard.huanghuanhui.cloud
Credentials: admin / Admin@2024
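A minimal canary Rollout sketch to try the controller and dashboard (names and image are arbitrary):
cat > ~/argo-rollouts-yml/rollout-demo.yml << 'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-demo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rollout-demo
  template:
    metadata:
      labels:
        app: rollout-demo
    spec:
      containers:
      - name: nginx
        image: nginx:1.25.2-alpine
  strategy:
    canary:
      steps:
      - setWeight: 20
      - pause: {} # wait for manual promotion
      - setWeight: 60
      - pause: {duration: 30s}
EOF
kubectl apply -f ~/argo-rollouts-yml/rollout-demo.yml
kubectl argo rollouts get rollout rollout-demo --watch
# after changing the image, promote past the pause:
# kubectl argo rollouts promote rollout-demo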
2.7 metrics-server
Version: v0.6.4 (cluster: k8s-v1.29.0)
Metrics Server | Metrics API group/version | Supported Kubernetes version |
---|---|---|
0.6.x | metrics.k8s.io/v1beta1 | 1.19+ |
0.5.x | metrics.k8s.io/v1beta1 | *1.8+ |
0.4.x | metrics.k8s.io/v1beta1 | *1.8+ |
0.3.x | metrics.k8s.io/v1beta1 | 1.8-1.21 |
mkdir -p ~/metrics-server
# wget -O ~/metrics-server/components.yaml https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.6.4/components.yaml
# via proxy
wget -O ~/metrics-server/components.yaml https://gh-proxy.com/https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.6.4/components.yaml --no-check-certificate
# 1. Add the "- --kubelet-insecure-tls" flag (appended after the matching line)
sed -i '/15s/a\ - --kubelet-insecure-tls' ~/metrics-server/components.yaml
# 2. Swap the image registry (defaults to registry.k8s.io)
sed -i 's/registry.k8s.io\/metrics-server/dyrnq/g' ~/metrics-server/components.yaml
kubectl apply -f ~/metrics-server/components.yaml
kubectl get pods -n kube-system -l k8s-app=metrics-server
[root@k8s-master ~/metrics-server]# kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k8s-master 211m 5% 1882Mi 24%
k8s-node1 155m 3% 985Mi 12%
k8s-node2 164m 4% 1249Mi 15%
[root@k8s-master ~/metrics-server]# kubectl top pod
NAME CPU(cores) MEMORY(bytes)
calico-kube-controllers-646b6595d5-5fgj9 2m 28Mi
calico-node-c8pfd 33m 137Mi
calico-node-ck4kt 36m 137Mi
calico-node-gw7xs 37m 138Mi
coredns-6d8c4cb4d-mk5f2 4m 22Mi
coredns-6d8c4cb4d-r7xfv 4m 22Mi
etcd-k8s-master 17m 86Mi
kube-apiserver-k8s-master 52m 422Mi
kube-controller-manager-k8s-master 20m 73Mi
kube-proxy-fzpcp 8m 30Mi
kube-proxy-l6jhz 4m 32Mi
kube-proxy-m6s7s 10m 30Mi
kube-scheduler-k8s-master 3m 25Mi
metrics-server-848b755f94-jv6mq 4m 21Mi
[root@k8s-master ~/metrics-server]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane,master 43d v1.28.4
k8s-node1 Ready <none> 43d v1.28.4
k8s-node2 Ready <none> 43d v1.28.4
[root@k8s-master ~/metrics-server]#
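With metrics-server serving resource metrics, HPA works out of the box. A sketch (the target Deployment name is an example; point it at any existing Deployment with CPU requests):
cat > ~/metrics-server/hpa-demo.yml << 'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment-basic # example target
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
EOF
kubectl apply -f ~/metrics-server/hpa-demo.yml
kubectl get hpa hpa-demo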
2.8 kube-state-metrics
Cluster version: k8s-v1.29.0
kube-state-metrics | Kubernetes client-go Version |
---|---|
v2.6.0 | v1.24 |
v2.7.0 | v1.25 |
v2.8.2 | v1.26 |
v2.9.2 | v1.26 |
v2.10.0 | v1.27 |
main | v1.28 |
wget https://github.com/kubernetes/kube-state-metrics/raw/main/examples/standard/service-account.yaml
wget https://github.com/kubernetes/kube-state-metrics/raw/main/examples/standard/cluster-role.yaml
wget https://github.com/kubernetes/kube-state-metrics/raw/main/examples/standard/cluster-role-binding.yaml
wget https://github.com/kubernetes/kube-state-metrics/raw/main/examples/standard/deployment.yaml
wget https://github.com/kubernetes/kube-state-metrics/raw/main/examples/standard/service.yaml
# Swap the image registry (defaults to registry.k8s.io)
sed -i 's/registry.k8s.io\/kube-state-metrics/dyrnq/g' deployment.yaml
kubectl apply -f .
kube_state_metrics_podIP=`kubectl get pods -n kube-system -o custom-columns='NAME:metadata.name,podIP:status.podIPs[*].ip' |grep kube-state-metrics |awk '{print $2}'`
curl "http://$kube_state_metrics_podIP:8080/metric"
2.9 vertical-pod-autoscaler
Cluster version: k8s-v1.23.17
mkdir -p ~/vertical-pod-autoscaler-yml
wget https://kgithub.com/kubernetes/autoscaler/raw/vertical-pod-autoscaler-0.13.0/vertical-pod-autoscaler/deploy/vpa-rbac.yaml
wget https://kgithub.com/kubernetes/autoscaler/raw/vertical-pod-autoscaler-0.13.0/vertical-pod-autoscaler/deploy/vpa-v1-crd-gen.yaml
wget https://kgithub.com/kubernetes/autoscaler/raw/vertical-pod-autoscaler-0.8.0/vertical-pod-autoscaler/pkg/admission-controller/gencerts.sh
wget https://kgithub.com/kubernetes/autoscaler/raw/vertical-pod-autoscaler-0.13.0/vertical-pod-autoscaler/deploy/admission-controller-deployment.yaml
wget https://kgithub.com/kubernetes/autoscaler/raw/vertical-pod-autoscaler-0.13.0/vertical-pod-autoscaler/deploy/recommender-deployment.yaml
wget https://kgithub.com/kubernetes/autoscaler/raw/vertical-pod-autoscaler-0.13.0/vertical-pod-autoscaler/deploy/updater-deployment.yaml
sed -i 's/Always/IfNotPresent/g' ./*
sed -i 's/k8s\.gcr\.io\/autoscaling/registry.cn-hangzhou.aliyuncs.com\/acs/g' ./*
kubectl apply -f vpa-rbac.yaml
kubectl apply -f vpa-v1-crd-gen.yaml
sh gencerts.sh
kubectl apply -f admission-controller-deployment.yaml
kubectl apply -f recommender-deployment.yaml
kubectl apply -f updater-deployment.yaml
Verify VPA
cat > nginx-deployment-basic.yml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-basic
  namespace: default
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25.2-alpine
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: default
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: ClusterIP
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-deployment-basic-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: nginx-deployment-basic
  updatePolicy:
    updateMode: "Off"
EOF
kubectl apply -f nginx-deployment-basic.yml
# wait about two minutes for recommendations to appear
kubectl get vpa -n default
kubectl describe vpa nginx-deployment-basic-vpa -n default |tail -n 16
Example output:
[root@k8s-master ~]# kubectl get vpa -n default
NAME MODE CPU MEM PROVIDED AGE
nginx-deployment-basic-vpa Off 25m 262144k True 5m3s
[root@k8s-master ~]# kubectl describe vpa nginx-deployment-basic-vpa -n default |tail -n 16
Recommendation:
  Container Recommendations:
    Container Name: nginx
    Lower Bound:
      Cpu: 25m
      Memory: 262144k
    Target:
      Cpu: 25m
      Memory: 262144k
    Uncapped Target:
      Cpu: 25m
      Memory: 262144k
    Upper Bound:
      Cpu: 4089m
      Memory: 8765548505
Events: <none>
yum -y install httpd-tools
# 50 concurrent clients, 2000 requests
ab -c 50 -n 2000 http://<sample-app LoadBalancer IP>:8080/
ab -c 50 -n 2000 http://10.103.87.82/
ab -c 1000 -n 100000000 http://10.103.87.82/
2.10 kuboard-v3
mkdir -p ~/kuboard-v3-yml
cat > ~/kuboard-v3-yml/kuboard-v3.yaml << 'EOF'
---
apiVersion: v1
kind: Namespace
metadata:
  name: kuboard
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kuboard-v3-config
  namespace: kuboard
data:
  KUBOARD_ENDPOINT: 'http://192.168.1.201:30080'
  KUBOARD_AGENT_SERVER_UDP_PORT: '30081'
  KUBOARD_AGENT_SERVER_TCP_PORT: '30081'
  KUBOARD_SERVER_LOGRUS_LEVEL: info
  KUBOARD_AGENT_KEY: 32b7d6572c6255211b4eec9009e4a816
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kuboard-etcd
  namespace: kuboard
  labels:
    app: kuboard-etcd
spec:
  serviceName: kuboard-etcd
  replicas: 3
  selector:
    matchLabels:
      app: kuboard-etcd
  template:
    metadata:
      name: kuboard-etcd
      labels:
        app: kuboard-etcd
    spec:
      containers:
      - name: kuboard-etcd
        image: swr.cn-east-2.myhuaweicloud.com/kuboard/etcd:v3.4.14
        ports:
        - containerPort: 2379
          name: client
        - containerPort: 2380
          name: peer
        env:
        - name: KUBOARD_ETCD_ENDPOINTS
          value: >-
            kuboard-etcd-0.kuboard-etcd:2379,kuboard-etcd-1.kuboard-etcd:2379,kuboard-etcd-2.kuboard-etcd:2379
        #volumeMounts:
        #- name: data
        #  mountPath: /data
        command:
        - /bin/sh
        - -c
        - |
          PEERS="kuboard-etcd-0=http://kuboard-etcd-0.kuboard-etcd:2380,kuboard-etcd-1=http://kuboard-etcd-1.kuboard-etcd:2380,kuboard-etcd-2=http://kuboard-etcd-2.kuboard-etcd:2380"
          exec etcd --name ${HOSTNAME} \
            --listen-peer-urls http://0.0.0.0:2380 \
            --listen-client-urls http://0.0.0.0:2379 \
            --advertise-client-urls http://${HOSTNAME}.kuboard-etcd:2379 \
            --initial-advertise-peer-urls http://${HOSTNAME}:2380 \
            --initial-cluster-token kuboard-etcd-cluster-1 \
            --initial-cluster ${PEERS} \
            --initial-cluster-state new \
            --data-dir /data/kuboard.etcd
  #volumeClaimTemplates:
  #- metadata:
  #    name: data
  #  spec:
  #    storageClassName: please-provide-a-valid-StorageClass-name-here
  #    accessModes: [ "ReadWriteMany" ]
  #    resources:
  #      requests:
  #        storage: 5Gi
---
apiVersion: v1
kind: Service
metadata:
  name: kuboard-etcd
  namespace: kuboard
spec:
  type: ClusterIP
  ports:
  - port: 2379
    name: client
  - port: 2380
    name: peer
  selector:
    app: kuboard-etcd
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: '9'
    k8s.kuboard.cn/ingress: 'false'
    k8s.kuboard.cn/service: NodePort
    k8s.kuboard.cn/workload: kuboard-v3
  labels:
    k8s.kuboard.cn/name: kuboard-v3
  name: kuboard-v3
  namespace: kuboard
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s.kuboard.cn/name: kuboard-v3
  template:
    metadata:
      labels:
        k8s.kuboard.cn/name: kuboard-v3
    spec:
      containers:
      - env:
        - name: KUBOARD_ETCD_ENDPOINTS
          value: >-
            kuboard-etcd-0.kuboard-etcd:2379,kuboard-etcd-1.kuboard-etcd:2379,kuboard-etcd-2.kuboard-etcd:2379
        envFrom:
        - configMapRef:
            name: kuboard-v3-config
        image: 'swr.cn-east-2.myhuaweicloud.com/kuboard/kuboard:v3'
        imagePullPolicy: Always
        name: kuboard
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    k8s.kuboard.cn/workload: kuboard-v3
  labels:
    k8s.kuboard.cn/name: kuboard-v3
  name: kuboard-v3
  namespace: kuboard
spec:
  ports:
  - name: webui
    nodePort: 30080
    port: 80
    protocol: TCP
    targetPort: 80
  - name: agentservertcp
    nodePort: 30081
    port: 10081
    protocol: TCP
    targetPort: 10081
  - name: agentserverudp
    nodePort: 30081
    port: 10081
    protocol: UDP
    targetPort: 10081
  selector:
    k8s.kuboard.cn/name: kuboard-v3
  sessionAffinity: None
  type: NodePort
EOF
kubectl apply -f ~/kuboard-v3-yml/kuboard-v3.yaml
cat > ~/kuboard-v3-yml/kuboard-Ingress.yml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kuboard-ingress
  namespace: kuboard
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: 'true'
spec:
  ingressClassName: nginx
  rules:
  - host: kuboard.huanghuanhui.cloud
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: kuboard-v3
            port:
              number: 80
  tls:
  - hosts:
    - kuboard.huanghuanhui.cloud
    secretName: kuboard-ingress-tls
EOF
kubectl create secret -n kuboard \
tls kuboard-ingress-tls \
--key=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud.key \
--cert=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud_bundle.crt
kubectl apply -f ~/kuboard-v3-yml/kuboard-Ingress.yml
URL: kuboard.huanghuanhui.cloud
Credentials: admin / Kuboard123
2.11 nfs-subdir-external-provisioner
Dynamic PV/PVC provisioning via StorageClass.
Persistent storage for k8s-1.29.0 (dynamic NFS storage).
2.11.1 Deploy NFS
NFS server (k8s-master)
# install nfs on all server nodes
yum -y install nfs-utils
systemctl enable nfs-server rpcbind --now
# create and permission the NFS export directory
mkdir -p /data/k8s && chmod -R 777 /data/k8s
# write the exports
cat > /etc/exports << EOF
/data/k8s 192.168.1.0/24(rw,sync,no_root_squash)
EOF
systemctl reload nfs-server
Verify with:
# showmount -e 192.168.1.201
Export list for 192.168.1.201:
/data/k8s 192.168.1.0/24
NFS clients (k8s-node)
yum -y install nfs-utils
systemctl enable rpcbind --now
Verify with:
# showmount -e 192.168.1.201
Export list for 192.168.1.201:
/data/k8s 192.168.1.0/24
Backup
mkdir -p /data/k8s && chmod -R 777 /data/k8s
rsync -avzP /data/k8s root@192.168.1.203:/data
00 2 * * * rsync -avz /data/k8s root@192.168.1.203:/data &>/dev/null
2.11.2 Dynamically provisioned NFS storage
https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner
mkdir ~/nfs-subdir-external-provisioner-4.0.18 && cd ~/nfs-subdir-external-provisioner-4.0.18
Version: nfs-subdir-external-provisioner-4.0.18
# wget https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner/raw/nfs-subdir-external-provisioner-4.0.18/deploy/deployment.yaml
# wget https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner/raw/nfs-subdir-external-provisioner-4.0.18/deploy/rbac.yaml
# wget https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner/raw/nfs-subdir-external-provisioner-4.0.18/deploy/class.yaml
# via proxy
wget https://gh-proxy.com/https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner/raw/nfs-subdir-external-provisioner-4.0.18/deploy/deployment.yaml --no-check-certificate
wget https://gh-proxy.com/https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner/raw/nfs-subdir-external-provisioner-4.0.18/deploy/rbac.yaml --no-check-certificate
wget https://gh-proxy.com/https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner/raw/nfs-subdir-external-provisioner-4.0.18/deploy/class.yaml --no-check-certificate
# 1. Swap the image registry (defaults to registry.k8s.io)
sed -i 's/registry.k8s.io\/sig-storage/dyrnq/g' deployment.yaml
# 2. Set the NFS server address
sed -i 's/10.3.243.101/192.168.1.201/g' deployment.yaml
# 3. Set the export path (/data/k8s)
sed -i 's#\/ifs\/kubernetes#\/data\/k8s#g' deployment.yaml
sed -i 's#nfs-client#nfs-storage#g' class.yaml
sed -i 's/namespace: default/namespace: nfs-storage/g' rbac.yaml deployment.yaml
Image used: dyrnq/nfs-subdir-external-provisioner:v4.0.2
Docker Hub: https://hub.docker.com/r/dyrnq/nfs-subdir-external-provisioner/tags
kubectl create ns nfs-storage
kubectl -n nfs-storage apply -f .
kubectl get pods -n nfs-storage -l app=nfs-client-provisioner
kubectl get storageclass
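A throwaway PVC confirms the provisioner creates a subdirectory under /data/k8s (a sketch; names are arbitrary):
cat > ~/nfs-subdir-external-provisioner-4.0.18/test-pvc.yml << 'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-nfs-pvc
spec:
  storageClassName: nfs-storage
  accessModes: [ReadWriteMany]
  resources:
    requests:
      storage: 1Gi
EOF
kubectl apply -f ~/nfs-subdir-external-provisioner-4.0.18/test-pvc.yml
kubectl get pvc test-nfs-pvc   # should go Bound; check ls /data/k8s on the NFS server
kubectl delete -f ~/nfs-subdir-external-provisioner-4.0.18/test-pvc.yml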
2.12 Velero + minio (backup & disaster recovery)
Velero with minio for backing up and restoring Kubernetes workloads.
Back up k8s, gitlab, and jenkins to minio for disaster recovery.
2.12.1 minio
docker run -d \
--name minio \
--restart always \
--privileged=true \
-p 9000:9000 \
-p 5000:5000 \
-v ~/minio/data:/data \
-e "MINIO_ROOT_USER=admin" \
-e "MINIO_ROOT_PASSWORD=Admin@2024" \
-v /etc/localtime:/etc/localtime \
-v /etc/timezone:/etc/timezone \
minio/minio:RELEASE.2024-11-01T18-37-25Z \
server /data --console-address ":5000"
Web UI: http://192.168.1.201:5000
Credentials: admin / Admin@2024
docker pull minio/mc:latest
0. Create the velero-k8s bucket
docker run --rm -it --entrypoint=/bin/sh minio/mc -c "
mc alias set minio http://192.168.1.201:9000 f23qFGN5X8WQMcciTa9u kCvoVXPZd4kGk1fJ2J3XnubLwTmvgiG83kQXsRVQ
mc mb minio/velero-k8s"
1. Create the gitlab bucket
docker run --rm -it --entrypoint=/bin/sh minio/mc -c "
mc alias set minio http://192.168.1.201:9000 f23qFGN5X8WQMcciTa9u kCvoVXPZd4kGk1fJ2J3XnubLwTmvgiG83kQXsRVQ
mc mb minio/gitlab"
2. Create the jenkins bucket
docker run --rm -it --entrypoint=/bin/sh minio/mc -c "
mc config host add minio http://192.168.1.201:9000 f23qFGN5X8WQMcciTa9u kCvoVXPZd4kGk1fJ2J3XnubLwTmvgiG83kQXsRVQ
mc mb minio/jenkins"
2.12.2 velero (cluster A and cluster B)
1. Install
wget https://github.com/vmware-tanzu/velero/releases/download/v1.12.1/velero-v1.12.1-linux-amd64.tar.gz
tar xf ~/velero-v1.12.1-linux-amd64.tar.gz
cp ~/velero-v1.12.1-linux-amd64/velero /usr/local/sbin
mkdir -p ~/velero
# Create the credentials file velero uses to access minio
cat > ~/velero/velero-auth.txt << 'EOF'
[default]
aws_access_key_id = admin
aws_secret_access_key = Admin@2024
EOF
velero install --help |grep Image
(default "velero/velero:v1.12.1")
# install
velero --kubeconfig /root/.kube/config \
install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.8.1 \
--bucket velero-k8s \
--secret-file ~/velero/velero-auth.txt \
--use-volume-snapshots=false \
--uploader-type=restic \
--use-node-agent \
--image=velero/velero:v1.12.1 \
--namespace velero-system \
--backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.168.1.201:9000
# uninstall
velero uninstall --namespace velero-system --kubeconfig /root/.kube/config
2.12.3 k8s backup & disaster recovery
2.12.3.1 Manual backup
Back up pods without PVs
DATE=`date +%F-%H-%M-%S`
k8s_ns=ruoyi-cloud
velero backup create ${k8s_ns}-backup-${DATE} \
--include-namespaces ${k8s_ns} \
--kubeconfig=/root/.kube/config \
--namespace velero-system
velero backup get --kubeconfig=/root/.kube/config --namespace velero-system
DATE=`date +%F-%H-%M-%S`
k8s_ns=redis-2
velero backup create ${k8s_ns}-backup-${DATE} \
--include-namespaces ${k8s_ns} \
--kubeconfig=/root/.kube/config \
--namespace velero-system
# Velero can restore resources into a different namespace than they were backed up from, via the --namespace-mappings flag
# e.g. restore the redis namespace into redis-bak:
kubectl create ns redis-bak
velero restore create --from-backup "redis-2-backup-2024-11-07-22-47-17" --namespace-mappings redis:redis-bak --wait --kubeconfig=/root/.kube/config --namespace velero-system
Back up pods with PVs
DATE=`date +%F-%H-%M-%S`
k8s_ns=jenkins-prod
velero backup create ${k8s_ns}-backup-${DATE} \
--include-namespaces ${k8s_ns} \
--default-volumes-to-fs-backup \
--kubeconfig=/root/.kube/config \
--namespace velero-system
velero backup get --kubeconfig=/root/.kube/config --namespace velero-system
# restore from the backup created above, e.g.:
velero restore create --from-backup "jenkins-prod-backup-2024-11-08-13-05-31" --wait --kubeconfig=/root/.kube/config --namespace velero-system
2.12.3.2 Scheduled backups
Production: daily at 00:00, retained for 7 days
# Create the backup schedule (retain backups for 7 days)
k8s_ns=jenkins-prod
velero schedule create ${k8s_ns}-backup \
--schedule="0 0 * * *" \
--ttl 168h0m0s \
--include-namespaces ${k8s_ns} \
--default-volumes-to-fs-backup \
--kubeconfig=/root/.kube/config \
--namespace velero-system
Production: hourly, retained for 7 days
# Create the backup schedule (retain backups for 7 days)
k8s_ns=kube-system
velero schedule create ${k8s_ns}-backup \
--schedule="0 * * * *" \
--ttl 168h0m0s \
--include-namespaces ${k8s_ns} \
--kubeconfig=/root/.kube/config \
--namespace velero-system
# List backup schedules
velero schedule get --kubeconfig=/root/.kube/config --namespace velero-system
# List backups
velero backup get --kubeconfig=/root/.kube/config --namespace velero-system
# Delete a backup
velero backup delete kube-system-backup-2024-11-07-13-50-34 --kubeconfig=/root/.kube/config --namespace velero-system
# Delete a backup schedule
k8s_ns=jenkins-prod
velero schedule delete ${k8s_ns}-backup --kubeconfig=/root/.kube/config --namespace velero-system
2.12.3.3 Restore
# List backups
velero backup get --kubeconfig=/root/.kube/config --namespace velero-system
velero restore create --from-backup kube-system-backup-2024-11-07-13-50-34 --wait --kubeconfig=/root/.kube/config --namespace velero-system
Pods with PVs restore directly in the same cluster; for cross-cluster migration (cluster A to cluster B), both clusters must use the same StorageClass.
2.12.3.4 Migration
Cluster A and cluster B both need a Velero instance (v1.5 or later) sharing the same object-storage bucket as the backend store.
# Cluster A: create a backup (as in 2.12.3.1)
# Cluster B: restore it
velero restore create --from-backup kube-system-backup-2024-11-07-13-50-34 --wait --kubeconfig=/root/.kube/config --namespace velero-system
2.12.4 Gitlab (backup & DR)
Back up gitlab:
docker exec -t gitlab gitlab-backup create
Upload the gitlab backup to the gitlab bucket on minio:
docker run --rm -it --entrypoint=/bin/sh -v ~/gitlab/data/backups:/data minio/mc -c "
mc config host add minio http://192.168.1.201:9000 f23qFGN5X8WQMcciTa9u kCvoVXPZd4kGk1fJ2J3XnubLwTmvgiG83kQXsRVQ
mc cp -r /data/ minio/gitlab"
docker run --rm -it --entrypoint=/bin/sh -v ~/gitlab/config/gitlab.rb:/data/gitlab.rb minio/mc -c "
mc config host add minio http://192.168.1.201:9000 f23qFGN5X8WQMcciTa9u kCvoVXPZd4kGk1fJ2J3XnubLwTmvgiG83kQXsRVQ
mc cp /data/gitlab.rb minio/gitlab/gitlab.rb-$(date +%Y-%m-%d_%H:%M:%S)"
docker run --rm -it --entrypoint=/bin/sh -v ~/gitlab/config/gitlab-secrets.json:/data/gitlab-secrets.json minio/mc -c "
mc config host add minio http://192.168.1.201:9000 f23qFGN5X8WQMcciTa9u kCvoVXPZd4kGk1fJ2J3XnubLwTmvgiG83kQXsRVQ
mc cp /data/gitlab-secrets.json minio/gitlab/gitlab-secrets.json-$(date +%Y-%m-%d_%H:%M:%S)"
2.12.5 Jenkins (backup & DR)
Upload the jenkins data to the jenkins bucket on minio:
docker run --rm -it --entrypoint=/bin/sh -v /data/k8s/jenkins-prod-jenkins-home-prod-pvc-69144358-4a0c-489f-a8f0-089fe28eed21/jobs:/jobs minio/mc -c "
mc config host add minio http://192.168.1.201:9000 f23qFGN5X8WQMcciTa9u kCvoVXPZd4kGk1fJ2J3XnubLwTmvgiG83kQXsRVQ
mc cp -r /jobs/ minio/jenkins/jobs-$(date +%Y-%m-%d_%H:%M:%S)"
docker run --rm -it --entrypoint=/bin/sh -v /data/k8s/jenkins-prod-jenkins-home-prod-pvc-69144358-4a0c-489f-a8f0-089fe28eed21/config.xml:/jobs/config.xml minio/mc -c "
mc config host add minio http://192.168.1.201:9000 f23qFGN5X8WQMcciTa9u kCvoVXPZd4kGk1fJ2J3XnubLwTmvgiG83kQXsRVQ
mc cp /jobs/config.xml minio/jenkins/config.xml-$(date +%Y-%m-%d_%H:%M:%S)"
Cron job
cat > minio-bak.sh << 'EOF'
# === 1. gitlab === #
# back up gitlab
docker exec -t gitlab gitlab-backup create
# upload the gitlab backup to the gitlab bucket on minio
docker run --rm --entrypoint=/bin/sh -v /root/gitlab/data/backups:/data minio/mc -c "
mc config host add minio http://192.168.1.201:9000 f23qFGN5X8WQMcciTa9u kCvoVXPZd4kGk1fJ2J3XnubLwTmvgiG83kQXsRVQ
mc cp -r /data/ minio/gitlab"
docker run --rm --entrypoint=/bin/sh -v /root/gitlab/config/gitlab.rb:/data/gitlab.rb minio/mc -c "
mc config host add minio http://192.168.1.201:9000 f23qFGN5X8WQMcciTa9u kCvoVXPZd4kGk1fJ2J3XnubLwTmvgiG83kQXsRVQ
mc cp /data/gitlab.rb minio/gitlab/gitlab.rb-$(date +%Y-%m-%d_%H:%M:%S)"
docker run --rm --entrypoint=/bin/sh -v /root/gitlab/config/gitlab-secrets.json:/data/gitlab-secrets.json minio/mc -c "
mc config host add minio http://192.168.1.201:9000 f23qFGN5X8WQMcciTa9u kCvoVXPZd4kGk1fJ2J3XnubLwTmvgiG83kQXsRVQ
mc cp /data/gitlab-secrets.json minio/gitlab/gitlab-secrets.json-$(date +%Y-%m-%d_%H:%M:%S)"
# === 2. jenkins === #
# upload the jenkins data to the jenkins bucket on minio
docker run --rm --entrypoint=/bin/sh -v /data/k8s/jenkins-prod-jenkins-home-prod-pvc-69144358-4a0c-489f-a8f0-089fe28eed21/jobs:/jobs minio/mc -c "
mc config host add minio http://192.168.1.201:9000 f23qFGN5X8WQMcciTa9u kCvoVXPZd4kGk1fJ2J3XnubLwTmvgiG83kQXsRVQ
mc cp -r /jobs/ minio/jenkins/jobs-$(date +%Y-%m-%d_%H:%M:%S)"
docker run --rm --entrypoint=/bin/sh -v /data/k8s/jenkins-prod-jenkins-home-prod-pvc-69144358-4a0c-489f-a8f0-089fe28eed21/config.xml:/jobs/config.xml minio/mc -c "
mc config host add minio http://192.168.1.201:9000 f23qFGN5X8WQMcciTa9u kCvoVXPZd4kGk1fJ2J3XnubLwTmvgiG83kQXsRVQ
mc cp /jobs/config.xml minio/jenkins/config.xml-$(date +%Y-%m-%d_%H:%M:%S)"
EOF
Schedule it daily at 02:00
# crontab -l
0 2 * * * sh /root/minio-bak.sh >> /root/minio-bak.log 2>&1
2.12.6 Back up the whole cluster with the etcd client (etcdctl)
Cron-based k8s-etcd backups
wget https://kgithub.com/etcd-io/etcd/releases/download/v3.5.9/etcd-v3.5.9-linux-amd64.tar.gz
tar xf etcd-v3.5.9-linux-amd64.tar.gz
cp etcd-v3.5.9-linux-amd64/etcdctl /usr/local/sbin
ETCDCTL_API=3 etcdctl \
--write-out=table \
--cert="/etc/kubernetes/pki/etcd/server.crt" \
--key="/etc/kubernetes/pki/etcd/server.key" \
--cacert="/etc/kubernetes/pki/etcd/ca.crt" \
--endpoints 127.0.0.1:2379 \
endpoint health
mkdir -p ~/crontab
mkdir -p /data/k8s-etcd-backup
cat > ~/crontab/k8s-etcd-pod.sh << 'EOF'
#!/bin/bash
# Daily backup at 00:00 (k8s-etcd static pod)
# 0 0 * * * /bin/sh /root/crontab/k8s-etcd-pod.sh
k8s_etcd_DATE=`date +%F-%H-%M-%S`
ETCDCTL_API=3 /usr/local/sbin/etcdctl \
--write-out=table \
--cert="/etc/kubernetes/pki/etcd/server.crt" \
--key="/etc/kubernetes/pki/etcd/server.key" \
--cacert="/etc/kubernetes/pki/etcd/ca.crt" \
--endpoints 127.0.0.1:2379 \
snapshot save /data/k8s-etcd-backup/${k8s_etcd_DATE}-snapshot.bak
# retain backups for 7 days
find /data/k8s-etcd-backup -name "*.bak" -mtime +7 -exec rm -rf {} \;
EOF
[root@master ~]# crontab -l
0 0 * * * sh /root/crontab/k8s-etcd-pod.sh
[root@master ~]# crontab -l
* * * * * sh /root/crontab/k8s-etcd-pod.sh
tail -f /var/spool/mail/root
# retain backups for 7 days
find /data/k8s-etcd-backup -name "*.bak"
find /data/k8s-etcd-backup -name "*.bak" -mtime +7 -exec rm -rf {} \;
# retain backups for 7 minutes (for testing)
find /data/k8s-etcd-backup -name "*.bak"
find /data/k8s-etcd-backup -name "*.bak" -mmin +7 -exec rm -rf {} \;
2.13 gitlab
4c8g, 100g
Install gitlab with docker (exposed via the k8s ingress).
Version: Tags · GitLab.org / GitLab FOSS · GitLab
Official docker repo: https://hub.docker.com/r/gitlab/gitlab-ce/tags
docker pull gitlab/gitlab-ce:16.7.3-ce.0
docker pull ccr.ccs.tencentyun.com/huanghuanhui/gitlab:16.7.3-ce.0
cd && mkdir gitlab && cd gitlab && export GITLAB_HOME=/root/gitlab
docker run -d \
--name gitlab \
--hostname 'gitlab.huanghuanhui.cloud' \
--restart always \
--privileged=true \
-p 9797:80 \
-v $GITLAB_HOME/config:/etc/gitlab \
-v $GITLAB_HOME/logs:/var/log/gitlab \
-v $GITLAB_HOME/data:/var/opt/gitlab \
-e TIME_ZONE='Asia/Shanghai' \
ccr.ccs.tencentyun.com/huanghuanhui/gitlab:16.7.3-ce.0
Initial root password:
docker exec -it gitlab grep 'Password:' /etc/gitlab/initial_root_password
Expose it via the k8s ingress
mkdir -p ~/gitlab-yml
kubectl create ns gitlab
cat > ~/gitlab-yml/gitlab-endpoints.yml << 'EOF'
apiVersion: v1
kind: Endpoints
metadata:
  name: gitlab-service
  namespace: gitlab
subsets:
- addresses:
  - ip: 192.168.1.201
  ports:
  - port: 9797
EOF
kubectl apply -f ~/gitlab-yml/gitlab-endpoints.yml
cat > ~/gitlab-yml/gitlab-Service.yml << 'EOF'
apiVersion: v1
kind: Service
metadata:
  name: gitlab-service
  namespace: gitlab
spec:
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9797
EOF
kubectl apply -f ~/gitlab-yml/gitlab-Service.yml
cat > ~/gitlab-yml/gitlab-Ingress.yml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: gitlab-ingress
  namespace: gitlab
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: 'true'
    nginx.ingress.kubernetes.io/proxy-body-size: '4G'
spec:
  ingressClassName: nginx
  rules:
  - host: gitlab.huanghuanhui.cloud
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: gitlab-service
            port:
              number: 80
  tls:
  - hosts:
    - gitlab.huanghuanhui.cloud
    secretName: gitlab-ingress-tls
EOF
kubectl create secret -n gitlab \
tls gitlab-ingress-tls \
--key=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud.key \
--cert=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud_bundle.crt
kubectl apply -f ~/gitlab-yml/gitlab-Ingress.yml
URL: gitlab.huanghuanhui.cloud
https://gitlab.huanghuanhui.cloud/admin/users/root/edit
Set the credentials to: root / huanghuanhui@2024
2.14 harbor
2c4g, 400g
Install harbor v2.10.0 with docker-compose
2.14.1 Install docker
Tencent mirror
wget -O /etc/yum.repos.d/docker-ce.repo https://download.docker.com/linux/centos/docker-ce.repo
sudo sed -i 's+download.docker.com+mirrors.cloud.tencent.com/docker-ce+' /etc/yum.repos.d/docker-ce.repo
yum -y install docker-ce
2.14.2 Install docker-compose
Official docs: Overview of installing Docker Compose | Docker Docs
github: https://github.com/docker/compose/releases/
wget -O /usr/local/sbin/docker-compose https://gh-proxy.com/https://github.com/docker/compose/releases/download/v2.20.3/docker-compose-linux-x86_64 --no-check-certificate
chmod +x /usr/local/sbin/docker-compose
2.14.3 Install harbor
https://github.com/goharbor/harbor/releases (download offline and upload)
wget https://github.com/goharbor/harbor/releases/download/v2.10.0/harbor-offline-installer-v2.10.0.tgz
cd && tar xf harbor-offline-installer-v2.10.0.tgz -C /usr/local/
ls -la /usr/local/harbor/
cp /usr/local/harbor/harbor.yml.tmpl /usr/local/harbor/harbor.yml
Edit the config file:
# harbor.yml
1. Set the hostname to this machine's IP (or domain)
hostname: harbor.huanghuanhui.cloud
2. Point the https cert paths at your certificate
https:
  port: 443
  certificate: /root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud_bundle.crt
  private_key: /root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud.key
3. Change the admin password (always change it in production)
harbor_admin_password: Admin@2024
sed -i.bak 's/reg\.mydomain\.com/harbor.huanghuanhui.cloud/g' /usr/local/harbor/harbor.yml
sed -i 's#certificate: .*#certificate: /root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud_bundle.crt#g' /usr/local/harbor/harbor.yml
sed -i 's#private_key: .*#private_key: /root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud.key#g' /usr/local/harbor/harbor.yml
sed -i 's/Harbor12345/Admin@2024/g' /usr/local/harbor/harbor.yml
# ./install.sh (run the installer)
/usr/local/harbor/install.sh
docker ps |grep harbor
URL: harbor.huanghuanhui.cloud
Credentials: admin / Admin@2024
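To verify push/pull from a docker client (a sketch; the library project exists by default, the image tag is an example):
docker login harbor.huanghuanhui.cloud -u admin -p Admin@2024
docker tag nginx:1.25.1 harbor.huanghuanhui.cloud/library/nginx:1.25.1
docker push harbor.huanghuanhui.cloud/library/nginx:1.25.1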
2.15 jenkins
Desktop machines
Deploy Jenkins 2.440 (jdk-21) on k8s with hand-written YAML (jenkins-prod)
mkdir -p ~/jenkins-prod-yml
kubectl create ns jenkins-prod
kubectl label node k8s-node1 jenkins-prod=jenkins-prod
cat > ~/jenkins-prod-yml/Jenkins-prod-rbac.yml << 'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: jenkins-prod
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: jenkins-prod
  namespace: jenkins-prod
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: jenkins-prod
rules:
- apiGroups:
  - '*'
  resources:
  - statefulsets
  - services
  - replicationcontrollers
  - replicasets
  - podtemplates
  - podsecuritypolicies
  - pods
  - pods/log
  - pods/exec
  - podpreset
  - poddisruptionbudget
  - persistentvolumes
  - persistentvolumeclaims
  - jobs
  - endpoints
  - deployments
  - deployments/scale
  - daemonsets
  - cronjobs
  - configmaps
  - namespaces
  - events
  - secrets
  verbs:
  - create
  - get
  - watch
  - delete
  - list
  - patch
  - update
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
  - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: jenkins-prod
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: jenkins-prod
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts:jenkins-prod
EOF
kubectl apply -f ~/jenkins-prod-yml/Jenkins-prod-rbac.yml
cat > ~/jenkins-prod-yml/Jenkins-prod-Service.yml << 'EOF'
apiVersion: v1
kind: Service
metadata:
  name: jenkins-prod
  namespace: jenkins-prod
  labels:
    app: jenkins-prod
spec:
  selector:
    app: jenkins-prod
  type: NodePort
  ports:
  - name: web
    nodePort: 30456
    port: 8080
    targetPort: web
  - name: agent
    nodePort: 30789
    port: 50000
    targetPort: agent
EOF
kubectl apply -f ~/jenkins-prod-yml/Jenkins-prod-Service.yml
cat > ~/jenkins-prod-yml/Jenkins-prod-Deployment.yml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jenkins-prod
  namespace: jenkins-prod
  labels:
    app: jenkins-prod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jenkins-prod
  template:
    metadata:
      labels:
        app: jenkins-prod
    spec:
      tolerations:
      - effect: NoSchedule
        key: no-pod
        operator: Exists
      nodeSelector:
        jenkins-prod: jenkins-prod
      containers:
      - name: jenkins-prod
        #image: jenkins/jenkins:2.440-jdk21
        image: ccr.ccs.tencentyun.com/huanghuanhui/jenkins:2.440-jdk21
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            cpu: "2"
            memory: "4Gi"
          requests:
            cpu: "1"
            memory: "2Gi"
        securityContext:
          runAsUser: 0
        ports:
        - containerPort: 8080
          name: web
          protocol: TCP
        - containerPort: 50000
          name: agent
          protocol: TCP
        env:
        - name: LIMITS_MEMORY
          valueFrom:
            resourceFieldRef:
              resource: limits.memory
              divisor: 1Mi
        - name: JAVA_OPTS
          value: -Dhudson.security.csrf.GlobalCrumbIssuerConfiguration.DISABLE_CSRF_PROTECTION=true
        volumeMounts:
        - name: jenkins-home-prod
          mountPath: /var/jenkins_home
        - mountPath: /etc/localtime
          name: localtime
      volumes:
      - name: jenkins-home-prod
        persistentVolumeClaim:
          claimName: jenkins-home-prod
      - name: localtime
        hostPath:
          path: /etc/localtime
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-home-prod
  namespace: jenkins-prod
spec:
  storageClassName: "nfs-storage"
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 2Ti
EOF
kubectl apply -f ~/jenkins-prod-yml/Jenkins-prod-Deployment.yml
cat > ~/jenkins-prod-yml/Jenkins-prod-Ingress.yml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: jenkins-prod-ingress
  namespace: jenkins-prod
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: 'true'
    nginx.ingress.kubernetes.io/proxy-body-size: '4G'
spec:
  ingressClassName: nginx
  rules:
  - host: jenkins-prod.huanghuanhui.cloud
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: jenkins-prod # send all requests to the jenkins-prod Service on port 8080
            port:
              number: 8080
  tls:
  - hosts:
    - jenkins-prod.huanghuanhui.cloud
    secretName: jenkins-prod-ingress-tls
EOF
kubectl create secret -n jenkins-prod \
tls jenkins-prod-ingress-tls \
--key=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud.key \
--cert=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud_bundle.crt
kubectl apply -f ~/jenkins-prod-yml/Jenkins-prod-Ingress.yml
URL: jenkins-prod.huanghuanhui.cloud
Set the credentials to: admin / Admin@2024
# Plugins
1. Localization: Chinese (Simplified)
2. Pipeline
3. Kubernetes
4. Git
5. Git Parameter
6. GitLab # webhook-triggered builds
7. Config File Provider # connect to remote k8s clusters
#8. Extended Choice Parameter
9. SSH Pipeline Steps # run remote commands over ssh in Pipelines
10. Pipeline: Stage View
11. Role-based Authorization Strategy
12. DingTalk # DingTalk bot notifications
# In-cluster Jenkins URL (e.g. for the Kubernetes plugin):
http://jenkins-prod.jenkins-prod:8080
cat > ~/jenkins-prod-yml/Jenkins-prod-slave-maven-cache.yml << 'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-prod-slave-maven-cache
  namespace: jenkins-prod
spec:
  storageClassName: "nfs-storage"
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 2Ti
EOF
cat > ~/jenkins-prod-yml/Jenkins-prod-slave-node-cache.yml << 'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-prod-slave-node-cache
  namespace: jenkins-prod
spec:
  storageClassName: "nfs-storage"
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 2Ti
EOF
cat > ~/jenkins-prod-yml/Jenkins-prod-slave-golang-cache.yml << 'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-prod-slave-golang-cache
  namespace: jenkins-prod
spec:
  storageClassName: "nfs-storage"
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 2Ti
EOF
cat > ~/jenkins-prod-yml/Jenkins-prod-slave-go-build-cache.yml << 'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-prod-slave-go-build-cache
  namespace: jenkins-prod
spec:
  storageClassName: "nfs-storage"
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 2Ti
EOF
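These cache PVCs are meant to be mounted into build agents. A minimal pod-template sketch for the Kubernetes plugin (the maven image tag is an assumption; paste it as a pod template's raw YAML or inline it in a Jenkinsfile):
cat > ~/jenkins-prod-yml/maven-agent-pod.yml << 'EOF'
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: maven
    image: maven:3.9-eclipse-temurin-21 # image tag is an assumption
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: maven-cache
      mountPath: /root/.m2 # reuse the local maven repo across builds
  volumes:
  - name: maven-cache
    persistentVolumeClaim:
      claimName: jenkins-prod-slave-maven-cache
EOF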
2.16 prometheus + grafana + alertmanager
Install prometheus + grafana + alertmanager on k8s with hand-written YAML.
k8s version: k8s-1.28.4
prometheus + grafana + alertmanager for monitoring and alerting.
2.16.1 Install prometheus with hand-written YAML
mkdir ~/prometheus-yml
kubectl create ns monitoring
cat > ~/prometheus-yml/prometheus-rbac.yml << 'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
EOF
kubectl apply -f ~/prometheus-yml/prometheus-rbac.yml
cat > ~/prometheus-yml/prometheus-ConfigMap.yml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets: ['localhost:9090']
EOF
kubectl apply -f ~/prometheus-yml/prometheus-ConfigMap.yml
For now this only scrapes prometheus itself.
When new targets need monitoring, just update this ConfigMap.
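A reload sketch for after such an update (assumes the prometheus container runs with --web.enable-lifecycle and the Service is named prometheus):
kubectl apply -f ~/prometheus-yml/prometheus-ConfigMap.yml
kubectl -n monitoring port-forward svc/prometheus 9090:9090 &
curl -X POST http://127.0.0.1:9090/-/reload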
cat > ~/prometheus-yml/prometheus-ConfigMap.yml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: monitoring
data:
prometheus.yml: |
global:
scrape_interval: 15s
evaluation_interval: 15s
# 告警规则文件
rule_files:
- /etc/prometheus/rules.yml
- /etc/prometheus/rules/*.rules.yml
# 对接alertmanager
alerting:
alertmanagers:
- static_configs:
- targets: ["alertmanager-service.monitoring.svc.cluster.local:9093"]
scrape_configs:
#0、监控 prometheus
- job_name: prometheus
static_configs:
- targets: ['localhost:9090']
- job_name: 1.15.172.119
static_configs:
- targets: ['1.15.172.119:9100']
#1、监控 k8s节点
- job_name: 'k8s-nodes'
kubernetes_sd_configs:
- role: node
relabel_configs:
- source_labels: [__address__]
regex: '(.*):10250'
replacement: '${1}:9100'
target_label: __address__
action: replace
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
#2、监控 k8s-etcd
- job_name: 'k8s-etcd'
metrics_path: /metrics
scheme: http
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_name]
regex: etcd-k8s
action: keep
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
#3、监控 kube-apiserver
- job_name: 'kube-apiserver'
kubernetes_sd_configs:
- role: endpoints
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: default;kubernetes;https
#4、监控 kube-controller-manager
- job_name: 'kube-controller-manager'
kubernetes_sd_configs:
- role: endpoints
scheme: https
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
action: keep
regex: kube-system;kube-controller-manager
#5、监控 kube-scheduler
- job_name: 'kube-scheduler'
kubernetes_sd_configs:
- role: endpoints
scheme: https
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
action: keep
regex: kube-system;kube-scheduler
#6、监控 kubelet
- job_name: 'kubelet'
kubernetes_sd_configs:
- role: node
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
replacement: $1
#7、监控 kube-proxy
- job_name: 'kube-proxy'
metrics_path: /metrics
scheme: http
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: false
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_name]
regex: kube-proxy
action: keep
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
#8、监控 coredns
- job_name: 'coredns'
static_configs:
- targets: ['kube-dns.kube-system.svc.cluster.local:9153']
#9、监控容器
- job_name: 'kubernetes-cadvisor'
kubernetes_sd_configs:
- role: node
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
replacement: $1
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
replacement: /metrics/cadvisor
target_label: __metrics_path__
#10、svc自动发现
- job_name: 'k8s-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name
# 11、监控 kube-state-metrics
- job_name: "kube-state-metrics"
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name]
regex: kube-system;kube-state-metrics
action: keep
# 告警规则
rules.yml: |
groups:
- name: test-node-mem
rules:
- alert: NodeMemoryUsage
expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 20
for: 1m
labels:
cluster: RTG
severity: P1
annotations:
summary: "{{$labels.instance}}: High Memory usage detected"
description: "{{$labels.instance}}: Memory usage is above 20% (current value is: {{ $value }})"
- name: Hosts.rules
rules:
## Custom By huanghuanhui
- alert: HostDown
expr: up == 0
for: 1m
labels:
cluster: RTG
severity: P1
annotations:
summary: '主机{{ $labels.instance }} {{ $labels.job }} down'
description: "主机: 【{{ $labels.instance }}】has been down for more than 1 minute"
- alert: HostCpuLoadAvage
expr: node_load5 / count by (instance, job) (node_cpu_seconds_total{mode="idle"}) >= 0.95
for: 1m
annotations:
summary: "主机{{ $labels.instance }} cpu 5分钟负载与核心数之比大于0.95 (当前值:{{ $value }})"
description: "主机: 【{{ $labels.instance }}】 cpu_load5 与核心数之比达到0.95以上 (当前比率值:{{ $value }})"
labels:
cluster: RTG
severity: 'P3'
- alert: HostCpuUsage
expr: (1-((sum(increase(node_cpu_seconds_total{mode="idle"}[5m])) by (instance))/ (sum(increase(node_cpu_seconds_total[5m])) by (instance))))*100 > 80
for: 1m
annotations:
summary: "主机{{ $labels.instance }} CPU 5分钟使用率大于80% (当前值:{{ $value }})"
description: "主机: 【{{ $labels.instance }}】 5分钟内CPU使用率超过80% (当前值:{{ $value }})"
labels:
cluster: RTG
severity: 'P1'
- alert: HostMemoryUsage
expr: (1-((node_memory_Buffers_bytes + node_memory_Cached_bytes + node_memory_MemFree_bytes)/node_memory_MemTotal_bytes))*100 > 80
for: 1m
annotations:
summary: "主机{{ $labels.instance }} 内存使用率大于80% (当前值:{{ $value }})"
description: "主机: 【{{ $labels.instance }}】 内存使用率超过80% (当前使用率:{{ $value }}%)"
labels:
cluster: RTG
severity: 'P3'
- alert: HostIOWait
expr: ((sum(increase(node_cpu_seconds_total{mode="iowait"}[5m])) by (instance))/(sum(increase(node_cpu_seconds_total[5m])) by (instance)))*100 > 10
for: 1m
annotations:
summary: "主机{{ $labels.instance }} iowait大于10% (当前值:{{ $value }})"
description: "主机: 【{{ $labels.instance }}】 5分钟内磁盘IO过高 (当前负载值:{{ $value }})"
labels:
cluster: RTG
severity: 'P3'
- alert: HostFileSystemUsage
expr: (1-(node_filesystem_free_bytes{fstype=~"ext4|xfs",mountpoint!~".*tmp|.*boot" }/node_filesystem_size_bytes{fstype=~"ext4|xfs",mountpoint!~".*tmp|.*boot" }))*100 > 80
for: 1m
annotations:
summary: "主机{{ $labels.instance }} {{ $labels.mountpoint }} 磁盘空间使用大于80% (当前值:{{ $value }})"
description: "主机: 【{{ $labels.instance }}】 {{ $labels.mountpoint }}分区使用率超过80%, 当前值使用率:{{ $value }}%"
labels:
cluster: RTG
severity: 'P3'
- alert: HostSwapIsFillingUp
expr: (1 - (node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes)) * 100 > 80
for: 2m
labels:
cluster: RTG
severity: 'P4'
annotations:
summary: "主机: 【{{ $labels.instance }}】 swap分区使用超过 (>80%), 当前值使用率: {{ $value }}%"
description: "主机: 【{{ $labels.instance }}】 swap分区使用超过 (>80%), 当前值使用率: {{ $value }}%"
- alert: HostNetworkConnection-ESTABLISHED
expr: sum(node_netstat_Tcp_CurrEstab) by (instance) > 2000
for: 5m
labels:
cluster: RTG
severity: 'P4'
annotations:
summary: "主机{{ $labels.instance }} ESTABLISHED连接数过高 (当前值:{{ $value }})"
description: "主机: 【{{ $labels.instance }}】 ESTABLISHED连接数超过2000, 当前ESTABLISHED连接数: {{ $value }}"
- alert: HostNetworkConnection-TIME_WAIT
expr: sum(node_sockstat_TCP_tw) by (instance) > 1000
for: 5m
labels:
cluster: RTG
severity: 'P3'
annotations:
summary: "主机{{ $labels.instance }} TIME_WAIT连接数过高 (当前值:{{ $value }})"
description: "主机: 【{{ $labels.instance }}】 TIME_WAIT连接数超过1000, 当前TIME_WAIT连接数: {{ $value }}"
- alert: HostUnusualNetworkThroughputIn
expr: sum by (instance, device) (rate(node_network_receive_bytes_total{device=~"eth.*"}[2m])) / 1024 / 1024 > 300
for: 5m
labels:
cluster: RTG
severity: 'P3'
annotations:
summary: "主机{{ $labels.instance }} 入口流量超过 (> 300 MB/s) (当前值:{{ $value }})"
description: "主机: 【{{ $labels.instance }}】, 网卡: {{ $labels.device }} 入口流量超过 (> 300 MB/s), 当前值: {{ $value }}"
- alert: HostUnusualNetworkThroughputOut
expr: sum by (instance, device) (rate(node_network_transmit_bytes_total{device=~"eth.*"}[2m])) / 1024 / 1024 > 300
for: 5m
labels:
cluster: RTG
severity: 'P4'
annotations:
summary: "主机{{ $labels.instance }} 出口流量超过 (> 300 MB/s) (当前值:{{ $value }})"
description: "主机: 【{{ $labels.instance }}】, 网卡: {{ $labels.device }} 出口流量超过 (> 300 MB/s), 当前值: {{ $value }}"
- alert: HostUnusualDiskReadRate
expr: sum by (instance, device) (rate(node_disk_read_bytes_total[2m])) / 1024 / 1024 > 50
for: 5m
labels:
cluster: RTG
severity: 'P4'
annotations:
summary: "主机{{ $labels.instance }} 磁盘读取速率超过(50 MB/s) (当前值:{{ $value }})"
description: "主机: 【{{ $labels.instance }}】, 磁盘: {{ $labels.device }} 读取速度超过(50 MB/s), 当前值: {{ $value }}"
- alert: HostUnusualDiskWriteRate
expr: sum by (instance, device) (rate(node_disk_written_bytes_total[2m])) / 1024 / 1024 > 50
for: 2m
labels:
cluster: RTG
severity: 'P4'
annotations:
summary: "主机{{ $labels.instance }} 磁盘写入速率超过(50 MB/s) (当前值:{{ $value }})"
description: "主机: 【{{ $labels.instance }}】, 磁盘: {{ $labels.device }} 写入速度超过(50 MB/s), 当前值: {{ $value }}"
- alert: HostOutOfInodes
expr: node_filesystem_files_free{fstype=~"ext4|xfs",mountpoint!~".*tmp|.*boot" } / node_filesystem_files{fstype=~"ext4|xfs",mountpoint!~".*tmp|.*boot" } * 100 < 10
for: 2m
labels:
cluster: RTG
severity: 'P3'
annotations:
summary: "主机{{ $labels.instance }} {{ $labels.mountpoint }}分区可用Inode低于10% (当前值:{{ $value }})"
description: "主机: 【{{ $labels.instance }}】 {{ $labels.mountpoint }}分区inode节点不足 (可用值小于{{ $value }}%)"
- alert: HostUnusualDiskReadLatency
expr: rate(node_disk_read_time_seconds_total[2m]) / rate(node_disk_reads_completed_total[2m]) * 1000 > 100 and rate(node_disk_reads_completed_total[2m]) > 0
for: 5m
labels:
cluster: RTG
severity: 'P4'
annotations:
summary: "主机{{ $labels.instance }} 主机磁盘Read延迟大于100ms (当前值:{{ $value }}ms)"
description: "主机: 【{{ $labels.instance }}】, 磁盘: {{ $labels.device }} Read延迟过高 (read operations > 100ms), 当前延迟值: {{ $value }}ms"
- alert: HostUnusualDiskWriteLatency
expr: rate(node_disk_write_time_seconds_total[2m]) / rate(node_disk_writes_completed_total[2m]) * 1000 > 100 and rate(node_disk_writes_completed_total[2m]) > 0
for: 5m
labels:
cluster: RTG
severity: 'P4'
annotations:
summary: "主机{{ $labels.instance }} 主机磁盘write延迟大于100ms (当前值:{{ $value }}ms)"
description: "主机: 【{{ $labels.instance }}】, 磁盘: {{ $labels.device }} Write延迟过高 (write operations > 100ms), 当前延迟值: {{ $value }}ms"
- alert: NodeFileDescriptorLimit
annotations:
description: '主机:{{ $labels.instance }} 文件描述符使用率超过70% {{ printf "%.2f" $value }}%.'
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefiledescriptorlimit
summary: '主机: {{ $labels.instance }} 文件描述符即将被耗尽. (当前值:{{ $value }})'
expr: |
(
node_filefd_allocated{job=~"node-exporter|vm-node-exporter"} * 100 / node_filefd_maximum{job=~"node-exporter|vm-node-exporter"} > 70
)
for: 15m
labels:
severity: p3
action: monitor
cluster: RTG
- alert: NodeClockSkewDetected
annotations:
description: '主机: {{ $labels.instance }} 时钟偏移超过 0.05s.'
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodeclockskewdetected
summary: '主机: {{ $labels.instance }} 时钟偏移超过 0.05s (当前值:{{ $value }})'
expr: |
(
node_timex_offset_seconds > 0.05
and
deriv(node_timex_offset_seconds[5m]) >= 0
)
or
(
node_timex_offset_seconds < -0.05
and
deriv(node_timex_offset_seconds[5m]) <= 0
)
for: 10m
labels:
severity: p3
cluster: RTG
- alert: NodeFilesystemFilesFillingUp
annotations:
description: '预计4小时后 分区:{{ $labels.device }} 主机:{{ $labels.instance }} 可用inode仅剩余 {{ printf "%.2f" $value }}%.'
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemfilesfillingup
summary: '主机{{ $labels.instance }} 预计4小时后可用inode数会低于15% (当前值:{{ $value }})'
expr: |
(
node_filesystem_files_free{job=~"node-exporter|vm-node-exporter",fstype!=""} / node_filesystem_files{job=~"node-exporter|vm-node-exporter",fstype!=""} * 100 < 15
and
predict_linear(node_filesystem_files_free{job=~"node-exporter|vm-node-exporter",fstype!=""}[6h], 4*60*60) < 0
and
node_filesystem_readonly{job=~"node-exporter|vm-node-exporter",fstype!=""} == 0
)
for: 1h
labels:
severity: p3
cluster: RTG
- alert: NodeFilesystemSpaceFillingUp
annotations:
description: '主机: {{ $labels.instance }} 分区: {{ $labels.device }} 预计4小时后空间仅剩 {{ printf "%.2f" $value }}%.'
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemspacefillingup
summary: "主机: {{ $labels.instance }} 预计4小时后磁盘空闲会低于15% (当前值:{{ $value }})"
expr: |
(
node_filesystem_avail_bytes{job=~"node-exporter|vm-node-exporter",fstype!=""} / node_filesystem_size_bytes{job=~"node-exporter|vm-node-exporter",fstype!=""} * 100 < 15
and
predict_linear(node_filesystem_avail_bytes{job=~"node-exporter|vm-node-exporter",fstype!=""}[6h], 4*60*60) < 0
and
node_filesystem_readonly{job=~"node-exporter|vm-node-exporter",fstype!=""} == 0
)
for: 1h
labels:
severity: p3
cluster: RTG
- alert: NodeNetworkReceiveErrs
annotations:
description: '{{ $labels.instance }} interface {{ $labels.device }} has encountered
{{ printf "%.0f" $value }} receive errors in the last two minutes.'
runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-nodenetworkreceiveerrs
summary: "主机{{ $labels.instance }} 网卡{{ $labels.device }} Node网络接收错误 (当前值:{{ $value }})"
expr: |
increase(node_network_receive_errs_total[2m]) > 10
for: 2h
labels:
severity: p3
cluster: RTG
- alert: NodeNetworkTransmitErrs
annotations:
description: '{{ $labels.instance }} interface {{ $labels.device }} has encountered
{{ printf "%.0f" $value }} transmit errors in the last two minutes.'
runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-nodenetworktransmiterrs
summary: "主机{{ $labels.instance }} 网卡{{ $labels.device }} Node网络传输错误 (当前值:{{ $value }})"
expr: |
increase(node_network_transmit_errs_total[2m]) > 10
for: 1h
labels:
severity: p3
cluster: RTG
- alert: NodeHighNumberConntrackEntriesUsed
annotations:
description: '{{ $value | humanizePercentage }} of conntrack entries are used.'
runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-nodehighnumberconntrackentriesused
summary: 主机{{ $labels.instance }} Conntrack条目使用率大于75% (当前值:{{ $value }})
expr: |
(node_nf_conntrack_entries / node_nf_conntrack_entries_limit) > 0.75
labels:
severity: p2
cluster: RTG
- alert: NodeTextFileCollectorScrapeError
annotations:
description: Node Exporter text file collector failed to scrape.
runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-nodetextfilecollectorscrapeerror
summary: 主机{{ $labels.instance }} 打开或读取文件时出错,(当前值:{{ $value }})
expr: |
node_textfile_scrape_error{job=~"node-exporter|vm-node-exporter"} == 1
labels:
severity: p2
cluster: RTG
- alert: NodeClockNotSynchronising
annotations:
description: Clock on {{ $labels.instance }} is not synchronising. Ensure NTP is configured on this host.
runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-nodeclocknotsynchronising
summary: 主机{{ $labels.instance }} 时间不同步(当前值:{{ $value }})
expr: |
min_over_time(node_timex_sync_status[5m]) == 0
for: 10m
labels:
severity: p4
cluster: RTG
EOF
kubectl apply -f ~/prometheus-yml/prometheus-ConfigMap.yml
prometheus_podIP=`kubectl get pods -n monitoring -o custom-columns='NAME:metadata.name,podIP:status.podIPs[*].ip' |grep prometheus |awk '{print $2}'`
curl -X POST "http://$prometheus_podIP:9090/-/reload"
# The alert rules are mounted into Prometheus from a ConfigMap. To make adding rules easy later, first create an empty prometheus-rules ConfigMap so Prometheus can start cleanly.
kubectl create configmap prometheus-rules -n monitoring --from-literal=empty=empty
cat > ~/prometheus-yml/prometheus-Deployment.yml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: prometheus
namespace: monitoring
labels:
app: prometheus
spec:
replicas: 1
selector:
matchLabels:
app: prometheus
template:
metadata:
labels:
app: prometheus
spec:
serviceAccountName: prometheus
containers:
- name: prometheus
image: prom/prometheus:v2.49.0-rc.2
imagePullPolicy: IfNotPresent
args:
- "--config.file=/etc/prometheus/prometheus.yml"
- "--storage.tsdb.path=/prometheus"
- "--storage.tsdb.retention.time=30d"
- "--web.enable-admin-api"
- "--web.enable-lifecycle"
ports:
- containerPort: 9090
name: http
volumeMounts:
- mountPath: "/prometheus"
subPath: prometheus
name: data
- mountPath: "/etc/prometheus"
name: config
- mountPath: "/etc/prometheus/rules"
name: rules
- name: localtime
mountPath: /etc/localtime
resources:
limits:
cpu: "2"
memory: "4Gi"
requests:
cpu: "1"
memory: "2Gi"
volumes:
- name: data
persistentVolumeClaim:
claimName: prometheus-nfs-client-pvc
- name: config
configMap:
name: prometheus-config
- name: rules
configMap:
name: prometheus-rules
- name: localtime
hostPath:
path: /etc/localtime
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: prometheus-nfs-client-pvc
namespace: monitoring
spec:
storageClassName: nfs-storage
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 2Ti
EOF
kubectl apply -f ~/prometheus-yml/prometheus-Deployment.yml
cat > ~/prometheus-yml/prometheus-Service.yml << 'EOF'
apiVersion: v1
kind: Service
metadata:
name: prometheus-service
namespace: monitoring
labels:
app: prometheus
annotations:
prometheus.io/port: "9090"
prometheus.io/scrape: "true"
spec:
selector:
app: prometheus
type: NodePort
ports:
- name: web
port: 9090
targetPort: http
nodePort: 31111
EOF
kubectl apply -f ~/prometheus-yml/prometheus-Service.yml
cat > ~/prometheus-yml/prometheus-Ingress.yml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: prometheus-ingress
namespace: monitoring
annotations:
nginx.ingress.kubernetes.io/ssl-redirect: 'true'
nginx.ingress.kubernetes.io/proxy-body-size: '4G'
spec:
ingressClassName: nginx
rules:
- host: prometheus.huanghuanhui.cloud
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: prometheus-service
port:
number: 9090
tls:
- hosts:
- prometheus.huanghuanhui.cloud
secretName: prometheus-ingress-tls
EOF
kubectl create secret -n monitoring \
tls prometheus-ingress-tls \
--key=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud.key \
--cert=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud_bundle.crt
kubectl apply -f ~/prometheus-yml/prometheus-Ingress.yml
访问地址:prometheus.huanghuanhui.cloud
Alerting rules
For more rules, see: Awesome Prometheus alerts | Collection of alerting rules
mkdir -p ~/prometheus-yml/rules-yml
pod.rules
cat > ~/prometheus-yml/rules-yml/pod.rules.yml << 'EOF'
groups:
- name: pod.rules
rules:
- alert: PodDown
expr: kube_pod_container_status_running != 1
for: 2s
labels:
severity: warning
cluster: k8s
annotations:
summary: 'Container: {{ $labels.container }} down'
description: 'Namespace: {{ $labels.namespace }}, Pod: {{ $labels.pod }} is not running'
- alert: PodReady
expr: kube_pod_container_status_ready != 1
for: 5m # Ready持续5分钟,说明启动有问题
labels:
severity: warning
cluster: k8s
annotations:
summary: 'Container: {{ $labels.container }} ready'
description: 'Namespace: {{ $labels.namespace }}, Pod: {{ $labels.pod }} has not been ready for 5 minutes'
- alert: PodRestart
expr: changes(kube_pod_container_status_restarts_total[30m]) > 0 # 最近30分钟pod重启
for: 2s
labels:
severity: warning
cluster: k8s
annotations:
summary: 'Container: {{ $labels.container }} restart'
description: 'namespace: {{ $labels.namespace }}, pod: {{ $labels.pod }} restart {{ $value }} times'
- alert: PodFailed
expr: sum (kube_pod_status_phase{phase="Failed"}) by (pod,namespace) > 0
for: 5s
labels:
severity: error
annotations:
summary: "命名空间: {{ $labels.namespace }} | Pod名称: {{ $labels.pod }} Pod状态Failed (当前值: {{ $value }})"
- alert: PodPending
expr: sum (kube_pod_status_phase{phase="Pending"}) by (pod,namespace) > 0
for: 1m
labels:
severity: error
annotations:
summary: "命名空间: {{ $labels.namespace }} | Pod名称: {{ $labels.pod }} Pod状态Pending (当前值: {{ $value }})"
- alert: PodErrImagePull
expr: sum by(namespace,pod) (kube_pod_container_status_waiting_reason{reason="ErrImagePull"}) == 1
for: 1m
labels:
severity: warning
annotations:
summary: "命名空间: {{ $labels.namespace }} | Pod名称: {{ $labels.pod }} Pod状态ErrImagePull (当前值: {{ $value }})"
- alert: PodImagePullBackOff
expr: sum by(namespace,pod) (kube_pod_container_status_waiting_reason{reason="ImagePullBackOff"}) == 1
for: 1m
labels:
severity: warning
annotations:
summary: "命名空间: {{ $labels.namespace }} | Pod名称: {{ $labels.pod }} Pod状态ImagePullBackOff (当前值: {{ $value }})"
- alert: PodCrashLoopBackOff
expr: sum by(namespace,pod) (kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}) == 1
for: 1m
labels:
severity: warning
annotations:
summary: "命名空间: {{ $labels.namespace }} | Pod名称: {{ $labels.pod }} Pod状态CrashLoopBackOff (当前值: {{ $value }})"
- alert: PodCPUUsage
expr: sum by(pod, namespace) (rate(container_cpu_usage_seconds_total{image!=""}[5m]) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "命名空间: {{ $labels.namespace }} | Pod名称: {{ $labels.pod }} CPU使用大于80% (当前值: {{ $value }})"
- alert: PodMemoryUsage
expr: sum(container_memory_rss{image!=""}) by(pod, namespace) / sum(container_spec_memory_limit_bytes{image!=""}) by(pod, namespace) * 100 != +inf > 80
for: 5m
labels:
severity: error
annotations:
summary: "命名空间: {{ $labels.namespace }} | Pod名称: {{ $labels.pod }} 内存使用大于80% (当前值: {{ $value }})"
- alert: PodStatusChange # Pod 状态异常变更警报
expr: changes(kube_pod_status_phase[5m]) > 5
for: 5m
annotations:
summary: "Pod 状态异常变更"
description: "Pod {{ $labels.pod }} 的状态异常变更次数超过 5 次."
- alert: ContainerCpuThrottled # 原注释为"容器崩溃",但该指标统计的是 CFS 限流,这里按实际含义命名
expr: increase(container_cpu_cfs_throttled_seconds_total{container!="",pod!=""}[5m]) > 0
for: 5m
annotations:
summary: "Pod 容器 CPU 被限流"
description: "Pod {{ $labels.pod }} 中的容器近5分钟发生 CPU 限流."
EOF
svc.rules
cat > ~/prometheus-yml/rules-yml/svc.rules.yml << 'EOF'
groups:
- name: svc.rules
rules:
- alert: ServiceDown
expr: avg_over_time(up[5m]) * 100 < 50
annotations:
description: The service {{ $labels.job }} instance {{ $labels.instance }} is not responding for more than 50% of the time for 5 minutes.
summary: The service {{ $labels.job }} is not responding
EOF
pvc.rules
cat > ~/prometheus-yml/rules-yml/pvc.rules.yml << 'EOF'
groups:
- name: pvc.rules
rules:
- alert: PersistentVolumeClaimLost
expr: sum by(namespace, persistentvolumeclaim) (kube_persistentvolumeclaim_status_phase{phase="Lost"}) == 1
for: 2m
labels:
severity: warning
annotations:
summary: "PersistentVolumeClaim {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is lost\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
- alert: PersistentVolumeClaimPending
expr: sum by(namespace, persistentvolumeclaim) (kube_persistentvolumeclaim_status_phase{phase="Pending"}) == 1
for: 2m
labels:
severity: warning
annotations:
summary: "PersistentVolumeClaim {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is pending\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
- alert: HighPersistentVolumeUsage # PersistentVolume 使用率过高警报
expr: kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes * 100 > 90
for: 5m
annotations:
summary: "PersistentVolume 使用率过高"
description: "PersistentVolume {{ $labels.persistentvolume }} 的使用率超过 90%."
- alert: HighPVUsageForPod # Pod 挂载的 PersistentVolume 使用率过高警报
expr: kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes * 100 > 90
for: 5m
annotations:
summary: "Pod 挂载的 PersistentVolume 使用率过高"
description: "Pod {{ $labels.pod }} 挂载的 PersistentVolume 使用率超过 90%."
EOF
kubeadm.rules
cat > ~/prometheus-yml/rules-yml/kubeadm.rules.yml << 'EOF'
groups:
- name: kubeadm.rules
rules:
# Kubelet 健康状态检查
- alert: KubeletDown
expr: up{job="kubelet"} == 0
for: 1m
annotations:
summary: "Kubelet 不可用"
description: "Kubelet {{ $labels.instance }} 不可用."
# Node 不可用警报:
- alert: NodeDown
expr: up{job="k8s-nodes"} == 0
for: 1m
annotations:
summary: "Node 不可用"
description: "Node {{ $labels.node }} 不可用."
# Kube Proxy 健康状态检查
- alert: KubeProxyDown
expr: up{job="kube-proxy"} == 0
for: 1m
annotations:
summary: "Kube Proxy 不可用"
description: "Kube Proxy {{ $labels.instance }} 不可用."
# Kube Scheduler 健康状态检查
- alert: KubeSchedulerDown
expr: up{job="kube-scheduler"} == 0
for: 1m
annotations:
summary: "Kube Scheduler 不可用"
description: "Kube Scheduler 不可用."
# Kube Controller Manager 健康状态检查
- alert: KubeControllerManagerDown
expr: up{job="kube-controller-manager"} == 0
for: 1m
annotations:
summary: "Kube Controller Manager 不可用"
description: "Kube Controller Manager 不可用."
# Kube State Metrics 健康状态检查
- alert: KubeStateMetricsDown
expr: up{job="kube-state-metrics"} == 0
for: 1m
annotations:
summary: "Kube State Metrics 不可用"
description: "Kube State Metrics 不可用."
# KubernetesNodeNotReady
- alert: KubernetesNodeNotReady
expr: sum(kube_node_status_condition{condition="Ready",status="true"}) by (node) == 0
for: 10m
labels:
severity: critical
annotations:
summary: Kubernetes node is not ready
description: A node in the cluster is not ready, which may cause issues with cluster functionality.
EOF
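Before loading them into the ConfigMap, the rule files can be sanity-checked locally with promtool (assuming a local prometheus/promtool binary is available; this step is not in the original):

promtool check rules ~/prometheus-yml/rules-yml/*.rules.yml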
# Update the empty prometheus-rules ConfigMap created earlier
cd ~/prometheus-yml/rules-yml && kubectl create configmap prometheus-rules \
-n monitoring \
--from-file=pod.rules.yml \
--from-file=svc.rules.yml \
--from-file=pvc.rules.yml \
--from-file=kubeadm.rules.yml \
-o yaml --dry-run=client | kubectl apply -f -
prometheus_podIP=`kubectl get pods -n monitoring -o custom-columns='NAME:metadata.name,podIP:status.podIPs[*].ip' |grep prometheus |awk '{print $2}'`
curl -X POST "http://$prometheus_podIP:9090/-/reload"
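To confirm the rule groups were actually loaded after the reload, query the rules API (jq is assumed to be installed):

curl -s "http://$prometheus_podIP:9090/api/v1/rules" | jq -r '.data.groups[].name'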
2.16.1.0、Monitoring the k8s nodes
cat > ~/prometheus-yml/node-exporter.yml << 'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: node-exporter
namespace: monitoring
labels:
app: node-exporter
spec:
selector:
matchLabels:
app: node-exporter
template:
metadata:
labels:
app: node-exporter
spec:
hostPID: true
hostIPC: true
hostNetwork: true
nodeSelector:
kubernetes.io/os: linux
containers:
- name: node-exporter
image: prom/node-exporter:v1.7.0
args:
- --web.listen-address=$(HOSTIP):9100
- --path.procfs=/host/proc
- --path.sysfs=/host/sys
- --path.rootfs=/host/root
# node_exporter v1.5+ removed the deprecated ignored-* flags; use the *-exclude forms
- --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+)($|/)
- --collector.filesystem.fs-types-exclude=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
ports:
- containerPort: 9100
env:
- name: HOSTIP
valueFrom:
fieldRef:
fieldPath: status.hostIP
resources:
requests:
cpu: 150m
memory: 180Mi
limits:
cpu: 150m
memory: 180Mi
securityContext:
runAsNonRoot: true
runAsUser: 65534
volumeMounts:
- name: proc
mountPath: /host/proc
- name: sys
mountPath: /host/sys
- name: root
mountPath: /host/root
mountPropagation: HostToContainer
readOnly: true
- name: localtime
mountPath: /etc/localtime
tolerations:
- operator: "Exists"
volumes:
- name: proc
hostPath:
path: /proc
- name: dev
hostPath:
path: /dev
- name: sys
hostPath:
path: /sys
- name: root
hostPath:
path: /
- name: localtime
hostPath:
path: /etc/localtime
EOF
kubectl apply -f ~/prometheus-yml/node-exporter.yml
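A quick check that one exporter pod is running per node, and that the hostNetwork port answers (any node IP works):

kubectl get pods -n monitoring -l app=node-exporter -o wide
curl -s http://192.168.1.201:9100/metrics | head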
docker run -d \
--name node-exporter \
--restart=always \
--net="host" \
--pid="host" \
-v "/proc:/host/proc:ro" \
-v "/sys:/host/sys:ro" \
-v "/:/rootfs:ro" \
-e TZ=Asia/Shanghai \
-v /etc/localtime:/etc/localtime \
prom/node-exporter:v1.7.0 \
--path.procfs=/host/proc \
--path.rootfs=/rootfs \
--path.sysfs=/host/sys \
--collector.filesystem.mount-points-exclude='^/(sys|proc|dev|host|etc)($|/)'
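The docker run above is meant for hosts outside the cluster (e.g. the 1.15.172.119 / 192.168.1.200 targets referenced in this section). A quick local check on such a host:

curl -s http://localhost:9100/metrics | grep node_exporter_build_info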
Grafana dashboard templates: 8919, 12159
Option 1:
Statically configure node-exporter targets
# prometheus-ConfigMap.yml
- job_name: 192.168.1.200
static_configs:
- targets: ['192.168.1.200:9100']
Option 2:
Auto-discover node-exporter via consul
mkdir -p ~/prometheus-yml/consul-yml
cat > ~/prometheus-yml/consul-yml/consul.yaml << 'EOF'
---
apiVersion: v1
kind: Service
metadata:
name: consul-server
namespace: monitoring
labels:
name: consul-server
spec:
selector:
name: consul-server
ports:
- name: http
port: 8500
targetPort: 8500
- name: https
port: 8443
targetPort: 8443
- name: rpc
port: 8400
targetPort: 8400
- name: serf-lan-tcp
protocol: "TCP"
port: 8301
targetPort: 8301
- name: serf-lan-udp
protocol: "UDP"
port: 8301
targetPort: 8301
- name: serf-wan-tcp
protocol: "TCP"
port: 8302
targetPort: 8302
- name: serf-wan-udp
protocol: "UDP"
port: 8302
targetPort: 8302
- name: server
port: 8300
targetPort: 8300
- name: consul-dns
port: 8600
targetPort: 8600
---
apiVersion: v1
kind: Service
metadata:
name: consul-server-http
namespace: monitoring
spec:
selector:
name: consul-server
type: NodePort
ports:
- protocol: TCP
port: 8500
targetPort: 8500
nodePort: 32685
name: consul-server-tcp
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: consul-server
namespace: monitoring
labels:
name: consul-server
spec:
serviceName: consul-server
selector:
matchLabels:
name: consul-server
replicas: 3
template:
metadata:
labels:
name: consul-server
annotations:
prometheus.io/scrape: "true" # prometueus自动发现标签
prometheus.io/path: "v1/agent/metrics" # consul的metrics路径
prometheus.io/port: "8500"
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: "name"
operator: In
values:
- consul-server
topologyKey: "kubernetes.io/hostname"
terminationGracePeriodSeconds: 10
containers:
- name: consul
image: ccr.ccs.tencentyun.com/huanghuanhui/consul:1.15.4
imagePullPolicy: IfNotPresent
args:
- "agent"
- "-server"
- "-bootstrap-expect=3"
- "-ui"
- "-data-dir=/consul/data"
- "-bind=0.0.0.0"
- "-client=0.0.0.0"
- "-advertise=$(POD_IP)"
- "-retry-join=consul-server-0.consul-server.$(NAMESPACE).svc.cluster.local"
- "-retry-join=consul-server-1.consul-server.$(NAMESPACE).svc.cluster.local"
- "-retry-join=consul-server-2.consul-server.$(NAMESPACE).svc.cluster.local"
- "-domain=cluster.local"
- "-disable-host-node-id"
volumeMounts:
- name: consul-nfs-client-pvc
mountPath: /consul/data
- name: localtime
mountPath: /etc/localtime
env:
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
ports:
- containerPort: 8500
name: http
- containerPort: 8400
name: rpc
- containerPort: 8443
name: https-port
- containerPort: 8301
name: serf-lan
- containerPort: 8302
name: serf-wan
- containerPort: 8600
name: consul-dns
- containerPort: 8300
name: server
volumes:
- name: localtime
hostPath:
path: /etc/localtime
volumeClaimTemplates:
- metadata:
name: consul-nfs-client-pvc
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: nfs-storage
resources:
requests:
storage: 20Gi
EOF
kubectl apply -f ~/prometheus-yml/consul-yml/consul.yaml
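Once the StatefulSet is up, cluster membership can be verified from inside any consul pod (all three servers should show as alive):

kubectl exec -n monitoring consul-server-0 -- consul members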
cat > ~/prometheus-yml/consul-yml/consul-Ingress.yml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: consul-ingress
namespace: monitoring
annotations:
nginx.ingress.kubernetes.io/ssl-redirect: 'true'
nginx.ingress.kubernetes.io/proxy-body-size: '4G'
spec:
ingressClassName: nginx
rules:
- host: consul.huanghuanhui.cloud
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: consul-server
port:
number: 8500
tls:
- hosts:
- consul.huanghuanhui.cloud
secretName: consul-ingress-tls
EOF
kubectl create secret -n monitoring \
tls consul-ingress-tls \
--key=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud.key \
--cert=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud_bundle.crt
kubectl apply -f ~/prometheus-yml/consul-yml/consul-Ingress.yml
Access via IP: 192.168.1.201:32685
Access via domain: consul.huanghuanhui.cloud
# prometheus-ConfigMap.yml
- job_name: 'consul-prometheus'
consul_sd_configs:
- server: 'consul-server-http.monitoring.svc.cluster.local:8500'
relabel_configs:
- source_labels: [__meta_consul_service_id]
regex: (.+)
target_label: 'node_name'
replacement: '$1'
- source_labels: [__meta_consul_service]
regex: '.*(node-exporter|hosts).*'
action: keep
# register services (via IP)
curl -X PUT -d '{"id": "1.15.172.119-node-exporter","name": "1.15.172.119-node-exporter","address": "1.15.172.119","port": 9100,"checks": [{"http": "http://1.15.172.119:9100/","interval": "5s"}]}' http://192.168.1.201:32685/v1/agent/service/register
curl -X PUT -d '{"id": "192.168.1.200-node-exporter","name": "192.168.1.200-node-exporter","address": "192.168.1.200","port": 9100,"checks": [{"http": "http://192.168.1.200:9100/","interval": "5s"}]}' http://192.168.1.201:32685/v1/agent/service/register
# register services (via domain)
curl -X PUT -d '{"id": "1.15.172.119-node-exporter","name": "1.15.172.119-node-exporter","address": "1.15.172.119","port": 9100,"checks": [{"http": "http://1.15.172.119:9100/","interval": "5s"}]}' https://consul.huanghuanhui.cloud/v1/agent/service/register
curl -X PUT -d '{"id": "192.168.1.200-node-exporter","name": "192.168.1.200-node-exporter","address": "192.168.1.200","port": 9100,"checks": [{"http": "http://192.168.1.200:9100/","interval": "5s"}]}' https://consul.huanghuanhui.cloud/v1/agent/service/register
The service id or name must contain node-exporter or hosts, otherwise the keep relabel rule above will not auto-discover it.
# deregister services (via IP)
curl -X PUT http://192.168.1.201:32685/v1/agent/service/deregister/1.15.172.119-node-exporter
curl -X PUT http://192.168.1.201:32685/v1/agent/service/deregister/192.168.1.200-node-exporter
# deregister services (via domain)
curl -X PUT https://consul.huanghuanhui.cloud/v1/agent/service/deregister/1.15.172.119-node-exporter
curl -X PUT https://consul.huanghuanhui.cloud/v1/agent/service/deregister/192.168.1.200-node-exporter
Batch registration scripts for consul
mkdir -p ~/prometheus-yml/consul-yml/node-exporter-json
cat > ~/prometheus-yml/consul-yml/node-exporter-json/node-exporter-1.15.172.119.json << 'EOF'
{
"id": "1.15.172.119-node-exporter",
"name": "1.15.172.119-node-exporter",
"address": "1.15.172.119",
"port": 9100,
"tags": ["node-exporter"],
"checks": [{
"http": "http://1.15.172.119:9100/metrics",
"interval": "5s"
}]
}
EOF
cat > ~/prometheus-yml/consul-yml/node-exporter-json/node-exporter-192.168.1.201.json << 'EOF'
{
"id": "192.168.1.201-node-exporter",
"name": "192.168.1.201-node-exporter",
"address": "192.168.1.201",
"port": 9100,
"tags": ["node-exporter"],
"checks": [{
"http": "http://192.168.1.201:9100/metrics",
"interval": "5s"
}]
}
EOF
cat > ~/prometheus-yml/consul-yml/node-exporter-json/node-exporter-192.168.1.202.json << 'EOF'
{
"id": "192.168.1.202-node-exporter",
"name": "192.168.1.202-node-exporter",
"address": "192.168.1.202",
"port": 9100,
"tags": ["node-exporter"],
"checks": [{
"http": "http://192.168.1.202:9100/metrics",
"interval": "5s"
}]
}
EOF
cat > ~/prometheus-yml/consul-yml/node-exporter-json/node-exporter-192.168.1.203.json << 'EOF'
{
"id": "192.168.1.203-node-exporter",
"name": "192.168.1.203-node-exporter",
"address": "192.168.1.203",
"port": 9100,
"tags": ["node-exporter"],
"checks": [{
"http": "http://192.168.1.203:9100/metrics",
"interval": "5s"
}]
}
EOF
cat > ~/prometheus-yml/consul-yml/node-exporter-json/node-exporter-192.168.1.204.json << 'EOF'
{
"id": "192.168.1.204-node-exporter",
"name": "192.168.1.204-node-exporter",
"address": "192.168.1.204",
"port": 9100,
"tags": ["node-exporter"],
"checks": [{
"http": "http://192.168.1.204:9100/metrics",
"interval": "5s"
}]
}
EOF
cat > ~/prometheus-yml/consul-yml/node-exporter-json/node-exporter-192.168.1.200.json << 'EOF'
{
"id": "192.168.1.200-node-exporter",
"name": "192.168.1.200-node-exporter",
"address": "192.168.1.200",
"port": 9100,
"tags": ["node-exporter"],
"checks": [{
"http": "http://192.168.1.200:9100/metrics",
"interval": "5s"
}]
}
EOF
# add more JSON files, one service per file
# batch registration script
cat > ~/prometheus-yml/consul-yml/node-exporter-json/register-service.sh << 'EOF'
#!/bin/bash
CONSUL_API="https://consul.huanghuanhui.cloud/v1/agent/service/register"
declare -a SERVICES=(
"node-exporter-1.15.172.119.json"
"node-exporter-192.168.1.201.json"
"node-exporter-192.168.1.202.json"
"node-exporter-192.168.1.203.json"
"node-exporter-192.168.1.204.json"
"node-exporter-192.168.1.200.json"
# 添加更多 JSON 文件,每个文件包含一个服务的信息
)
for SERVICE_FILE in "${SERVICES[@]}"; do
curl -X PUT --data @"$SERVICE_FILE" "$CONSUL_API"
done
EOF
# batch deregistration script
cat > ~/prometheus-yml/consul-yml/node-exporter-json/deregister-service.sh << 'EOF'
#!/bin/bash
CONSUL_API="https://consul.huanghuanhui.cloud/v1/agent/service/deregister"
declare -a SERVICES=(
"node-exporter-1.15.172.119.json"
"node-exporter-192.168.1.201.json"
"node-exporter-192.168.1.202.json"
"node-exporter-192.168.1.203.json"
"node-exporter-192.168.1.204.json"
"node-exporter-192.168.1.200.json"
# 添加更多 JSON 文件,每个文件包含一个服务的信息
)
for SERVICE_FILE in "${SERVICES[@]}"; do
SERVICE_ID=$(jq -r .id "$SERVICE_FILE")
curl -X PUT "$CONSUL_API/$SERVICE_ID"
done
EOF
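The scripts read the JSON files from the current directory, so run them from there (hypothetical usage, not in the original):

cd ~/prometheus-yml/consul-yml/node-exporter-json
bash register-service.sh      # register all listed services
bash deregister-service.sh    # take them all down again (requires jq)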
mkdir -p ~/prometheus-yml/kube-yml
2.16.1.1、Monitoring kube-controller-manager
sed -i 's/bind-address=127.0.0.1/bind-address=0.0.0.0/g' /etc/kubernetes/manifests/kube-controller-manager.yaml
cat > ~/prometheus-yml/kube-yml/prometheus-kube-controller-manager-Service.yml << 'EOF'
apiVersion: v1
kind: Service
metadata:
namespace: kube-system
name: kube-controller-manager
labels:
app.kubernetes.io/name: kube-controller-manager
spec:
selector:
component: kube-controller-manager
ports:
- name: https-metrics
port: 10257
targetPort: 10257
EOF
kubectl apply -f ~/prometheus-yml/kube-yml/prometheus-kube-controller-manager-Service.yml
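2.16.1.2、Monitoring kube-scheduler
The scrape job 'kube-scheduler' above keys on a kube-system/kube-scheduler Service that the original never defines (its section 2.16.1.2 is missing). A minimal sketch, assuming the default secure metrics port 10259:

sed -i 's/bind-address=127.0.0.1/bind-address=0.0.0.0/g' /etc/kubernetes/manifests/kube-scheduler.yaml

cat > ~/prometheus-yml/kube-yml/prometheus-kube-scheduler-Service.yml << 'EOF'
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    app.kubernetes.io/name: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  ports:
  - name: https-metrics
    port: 10259
    targetPort: 10259
EOF

kubectl apply -f ~/prometheus-yml/kube-yml/prometheus-kube-scheduler-Service.yml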
2.16.1.3、Monitoring kube-proxy
kubectl get configmap kube-proxy -n kube-system -o yaml | \
sed -e 's/metricsBindAddress: ""/metricsBindAddress: "0.0.0.0:10249"/' | \
kubectl diff -f - -n kube-system
kubectl get configmap kube-proxy -n kube-system -o yaml | \
sed -e 's/metricsBindAddress: ""/metricsBindAddress: "0.0.0.0:10249"/' | \
kubectl apply -f - -n kube-system
kubectl rollout restart daemonset kube-proxy -n kube-system
netstat -tnlp |grep kube-proxy
netstat -antp|grep 10249
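With the metrics address opened up, every node should now serve kube-proxy metrics locally:

curl -s http://127.0.0.1:10249/metrics | head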
cat > ~/prometheus-yml/kube-yml/prometheus-kube-proxy-Service.yml << 'EOF'
apiVersion: v1
kind: Service
metadata:
name: kube-proxy
namespace: kube-system
labels:
k8s-app: kube-proxy
spec:
selector:
k8s-app: kube-proxy
ports:
- name: metrics
port: 10249
targetPort: 10249
protocol: TCP
EOF
kubectl apply -f ~/prometheus-yml/kube-yml/prometheus-kube-proxy-Service.yml
2.16.1.4、Monitoring k8s-etcd
sed -i 's/127.0.0.1:2381/0.0.0.0:2381/g' /etc/kubernetes/manifests/etcd.yaml
cat > ~/prometheus-yml/kube-yml/etcd-k8s-master-Service.yml << 'EOF'
apiVersion: v1
kind: Service
metadata:
name: etcd-k8s
namespace: kube-system
labels:
k8s-app: etcd
spec:
type: ClusterIP
clusterIP: None
ports:
- name: port
port: 2381
---
apiVersion: v1
kind: Endpoints
metadata:
name: etcd-k8s
namespace: kube-system
labels:
k8s-app: etcd
subsets:
- addresses:
- ip: 192.168.1.201
nodeName: k8s-master
ports:
- name: port
port: 2381
EOF
kubectl apply -f ~/prometheus-yml/kube-yml/etcd-k8s-master-Service.yml
Etcd-for-k8s-cn (Chinese) | Grafana Labs
Dashboard template: 9733
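After the static pod restarts with the new listen address, the etcd metrics endpoint can be checked directly on the master (port 2381 as configured above):

curl -s http://192.168.1.201:2381/metrics | grep etcd_server_has_leader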
2.16.2、Installing grafana from raw manifests
cat > ~/prometheus-yml/grafana-ConfigMap.yml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-config
namespace: monitoring
data:
grafana.ini: |
[smtp]
enabled = false
host = localhost:25
user =
password =
skip_verify = false
from_address = admin@grafana.localhost
from_name = Grafana
[alerting]
enabled =
execute_alerts = true
EOF
kubectl apply -f ~/prometheus-yml/grafana-ConfigMap.yml
cat > ~/prometheus-yml/grafana-Deployment.yml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: grafana
namespace: monitoring
labels:
app: grafana
spec:
selector:
matchLabels:
app: grafana
template:
metadata:
labels:
app: grafana
spec:
securityContext:
fsGroup: 472
supplementalGroups:
- 0
containers:
- name: grafana
image: grafana/grafana:10.2.3
imagePullPolicy: IfNotPresent
ports:
- containerPort: 3000
name: http-grafana
protocol: TCP
env:
- name: TZ
value: Asia/Shanghai
- name: GF_SECURITY_ADMIN_USER
value: admin
- name: GF_SECURITY_ADMIN_PASSWORD
value: Admin@2024
readinessProbe:
failureThreshold: 3
httpGet:
path: /robots.txt
port: 3000
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 2
livenessProbe:
failureThreshold: 3
initialDelaySeconds: 30
periodSeconds: 10
successThreshold: 1
tcpSocket:
port: 3000
timeoutSeconds: 1
resources:
limits:
cpu: "1"
memory: "2Gi"
requests:
cpu: "0.5"
memory: "1Gi"
volumeMounts:
- mountPath: /var/lib/grafana
name: grafana-data
- mountPath: /etc/grafana
name: config
volumes:
- name: grafana-data
persistentVolumeClaim:
claimName: grafana-nfs-client-pvc
- name: config
configMap:
name: grafana-config
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: grafana-nfs-client-pvc
namespace: monitoring
spec:
storageClassName: nfs-storage
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 2Ti
---
apiVersion: v1
kind: Service
metadata:
name: grafana-service
namespace: monitoring
labels:
app: grafana
spec:
type: NodePort
ports:
- nodePort: 31300
port: 3000
selector:
app: grafana
EOF
kubectl apply -f ~/prometheus-yml/grafana-Deployment.yml
cat > ~/prometheus-yml/grafana-Ingress.yml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: grafana-ingress
namespace: monitoring
annotations:
nginx.ingress.kubernetes.io/ssl-redirect: 'true'
nginx.ingress.kubernetes.io/proxy-body-size: '4G'
spec:
ingressClassName: nginx
rules:
- host: grafana.huanghuanhui.cloud
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: grafana-service
port:
number: 3000
tls:
- hosts:
- grafana.huanghuanhui.cloud
secretName: grafana-ingress-tls
EOF
kubectl create secret -n monitoring \
tls grafana-ingress-tls \
--key=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud.key \
--cert=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud_bundle.crt
kubectl apply -f ~/prometheus-yml/grafana-Ingress.yml
Access: grafana.huanghuanhui.cloud
Credentials: admin / Admin@2024
Dashboard templates: 8919, 12159, 13105, 9276, 12006
2.16.3、Installing alertmanager from raw manifests
Integration with QQ mail:
cat > ~/prometheus-yml/alertmanager-qq-ConfigMap.yml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
name: alertmanager-config
namespace: monitoring
data:
alertmanager.yml: |-
global:
resolve_timeout: 5m
smtp_smarthost: 'smtp.qq.com:465'
smtp_from: '1308470940@qq.com'
smtp_auth_username: '1308470940@qq.com'
smtp_auth_password: 'kgwsqpzsvhxvjjii'
smtp_hello: 'qq.com'
smtp_require_tls: false
route:
group_by: ['alertname', 'cluster']
group_wait: 30s
group_interval: 5m
repeat_interval: 5m
receiver: default
routes:
- receiver: email
group_wait: 10s
match:
team: node
templates:
- '/etc/config/template/email.tmpl'
receivers:
- name: 'default'
email_configs:
- to: '1308470940@qq.com'
html: '{{ template "email.html" . }}'
headers: { Subject: "[WARN] Prometheus 告警邮件" }
- name: 'email'
email_configs:
- to: '1308470940@qq.com'
send_resolved: true
EOF
Integration with DingTalk (the variant actually applied below):
cat > ~/prometheus-yml/alertmanager-webhook-dingtalk-ConfigMap.yml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
name: alertmanager-config
namespace: monitoring
data:
alertmanager.yml: |-
global:
resolve_timeout: 5m
route:
receiver: webhook
group_wait: 30s
group_interval: 5m
repeat_interval: 4h
group_by: [alertname]
routes:
- receiver: webhook
group_wait: 10s
match:
team: node
receivers:
- name: webhook
webhook_configs:
- url: 'http://alertmanager-webhook-dingtalk.monitoring.svc.cluster.local:8060/dingtalk/webhook1/send'
send_resolved: true
EOF
kubectl apply -f ~/prometheus-yml/alertmanager-webhook-dingtalk-ConfigMap.yml
cat > ~/prometheus-yml/alertmanager-Deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: alertmanager
namespace: monitoring
spec:
replicas: 1
selector:
matchLabels:
app: alertmanager
template:
metadata:
labels:
app: alertmanager
spec:
containers:
- name: alertmanager
image: prom/alertmanager:v0.26.0
ports:
- containerPort: 9093
name: http
volumeMounts:
- name: alertmanager-config
mountPath: /etc/alertmanager
- name: alertmanager-data
mountPath: /alertmanager
- name: localtime
mountPath: /etc/localtime
command:
- "/bin/alertmanager"
- "--config.file=/etc/alertmanager/alertmanager.yml"
- "--storage.path=/alertmanager"
volumes:
- name: alertmanager-config
configMap:
name: alertmanager-config
- name: alertmanager-data
persistentVolumeClaim:
claimName: alertmanager-nfs-client-pvc
- name: localtime
hostPath:
path: /etc/localtime
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: alertmanager-nfs-client-pvc
namespace: monitoring
spec:
storageClassName: nfs-storage
accessModes:
- ReadWriteOnce
resources:
requests:
storage: "20Gi"
EOF
kubectl apply -f ~/prometheus-yml/alertmanager-Deployment.yaml
cat > ~/prometheus-yml/alertmanager-Service.yml << 'EOF'
apiVersion: v1
kind: Service
metadata:
name: alertmanager-service
namespace: monitoring
spec:
selector:
app: alertmanager
type: NodePort
ports:
- name: web
port: 9093
targetPort: http
nodePort: 30093
EOF
kubectl apply -f ~/prometheus-yml/alertmanager-Service.yml
cat > ~/prometheus-yml/alertmanager-Ingress.yml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: alertmanager-ingress
namespace: monitoring
annotations:
nginx.ingress.kubernetes.io/ssl-redirect: 'true'
nginx.ingress.kubernetes.io/proxy-body-size: '4G'
spec:
ingressClassName: nginx
rules:
- host: alertmanager.huanghuanhui.cloud
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: alertmanager-service
port:
number: 9093
tls:
- hosts:
- alertmanager.huanghuanhui.cloud
secretName: alertmanager-ingress-tls
EOF
kubectl create secret -n monitoring \
tls alertmanager-ingress-tls \
--key=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud.key \
--cert=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud_bundle.crt
kubectl apply -f ~/prometheus-yml/alertmanager-Ingress.yml
Access: alertmanager.huanghuanhui.cloud
DingTalk integration: alertmanager-webhook-dingtalk
cat > ~/prometheus-yml/alertmanager-webhook-dingtalk-Deployment.yaml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
name: alertmanager-webhook-dingtalk
namespace: monitoring
data:
config.yaml: |-
templates:
- /config/template.tmpl
targets:
webhook1:
url: https://oapi.dingtalk.com/robot/send?access_token=423eedfe3802198314e15f712f0578545b74a44cb982723623db2fb034bdc83e
secret: SECd3c53fbbb1df76a987a658e0ca759ef371ae955ff731af8945219e99d143d3ae
# 告警模版(也就是钉钉收到怎样的信息模板)
template.tmpl: |-
{{ define "__subject" }}
[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}]
{{ end }}
{{ define "__alert_list" }}{{ range . }}
---
{{ if .Labels.owner }}@{{ .Labels.owner }}{{ end }}
>- **告警状态 :** {{ .Status }}
>- **告警级别 :** **{{ .Labels.severity }}**
>- **告警类型 :** {{ .Labels.alertname }}
>- **告警主机 :** {{ .Labels.instance }}
>- **告警主题 :** {{ .Annotations.summary }}
>- **告警信息 :** {{ index .Annotations "description" }}
>- **告警时间 :** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
{{ end }}{{ end }}
{{ define "__resolved_list" }}{{ range . }}
---
{{ if .Labels.owner }}@{{ .Labels.owner }}{{ end }}
>- **告警状态 :** {{ .Status }}
>- **告警类型 :** {{ .Labels.alertname }}
>- **告警主机 :** {{ .Labels.instance }}
>- **告警主题 :** {{ .Annotations.summary }}
>- **告警信息 :** {{ index .Annotations "description" }}
>- **告警时间 :** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
>- **恢复时间 :** {{ dateInZone "2006.01.02 15:04:05" (.EndsAt) "Asia/Shanghai" }}
{{ end }}{{ end }}
{{ define "default.title" }}
{{ template "__subject" . }}
{{ end }}
{{ define "default.content" }}
{{ if gt (len .Alerts.Firing) 0 }}
**Prometheus-Alertmanager 监控到{{ .Alerts.Firing | len }}个故障**
{{ template "__alert_list" .Alerts.Firing }}
---
{{ end }}
{{ if gt (len .Alerts.Resolved) 0 }}
**恢复{{ .Alerts.Resolved | len }}个故障**
{{ template "__resolved_list" .Alerts.Resolved }}
{{ end }}
{{ end }}
{{ define "ding.link.title" }}{{ template "default.title" . }}{{ end }}
{{ define "ding.link.content" }}{{ template "default.content" . }}{{ end }}
{{ template "default.title" . }}
{{ template "default.content" . }}
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: alertmanager-webhook-dingtalk
namespace: monitoring
spec:
replicas: 1
selector:
matchLabels:
app: alertmanager-webhook-dingtalk
template:
metadata:
labels:
app: alertmanager-webhook-dingtalk
spec:
volumes:
- name: config
configMap:
name: alertmanager-webhook-dingtalk
containers:
- name: alertmanager-webhook-dingtalk
image: ccr.ccs.tencentyun.com/huanghuanhui/prometheus-alertmanager-webhook-dingtalk:v1
imagePullPolicy: Always
args:
- --web.listen-address=:8060
- --config.file=/config/config.yaml
volumeMounts:
- name: config
mountPath: /config
resources:
limits:
cpu: 100m
memory: 100Mi
ports:
- name: http
containerPort: 8060
---
apiVersion: v1
kind: Service
metadata:
name: alertmanager-webhook-dingtalk
namespace: monitoring
spec:
selector:
app: alertmanager-webhook-dingtalk
ports:
- name: http
port: 8060
targetPort: http
EOF
kubectl apply -f ~/prometheus-yml/alertmanager-webhook-dingtalk-Deployment.yaml
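To exercise the whole DingTalk pipeline without waiting for a real alert, a synthetic alert can be pushed into Alertmanager's v2 API (NodePort 30093 from the Service above; the alert labels here are made up for the test):

curl -s -X POST http://192.168.1.201:30093/api/v2/alerts \
-H 'Content-Type: application/json' \
-d '[{"labels":{"alertname":"DingTalkSmokeTest","severity":"P4","team":"node"},"annotations":{"summary":"test alert","description":"manual smoke test from curl"}}]'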
2.17、ELK
Installing elkfk with helm (kafka reachable from outside the cluster)
ES/Kibana <--- Logstash <--- Kafka <--- Filebeat
Deployment order:
1、elasticsearch
2、kibana
3、kafka
4、logstash
5、filebeat
kubectl taint node k8s-master node-role.kubernetes.io/control-plane-
kubectl create ns elk
2.17.1、Deploying elkfk with helm3
2.17.1.1、elasticsearch
helm repo add elastic https://helm.elastic.co
helm repo list
helm repo update
helm search repo elastic/elasticsearch
cd && helm pull elastic/elasticsearch --untar --version 7.17.3
cd elasticsearch
cat > values-prod.yaml << EOF
# 集群名称
clusterName: "elasticsearch"
# ElasticSearch 6.8+ 默认安装了 x-pack 插件,部分功能免费,这里选禁用
image: "docker.elastic.co/elasticsearch/elasticsearch"
imageTag: "7.17.3"
imagePullPolicy: "IfNotPresent"
esConfig:
elasticsearch.yml: |
network.host: 0.0.0.0
cluster.name: "elasticsearch"
xpack.security.enabled: false
resources:
limits:
cpu: "2"
memory: "4Gi"
requests:
cpu: "1"
memory: "2Gi"
volumeClaimTemplate:
storageClassName: "nfs-storage"
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 2Ti
service:
type: NodePort
port: 9000
nodePort: 31311
EOF
Setting xpack.security.enabled: false disables the Kibana security banner ("Elasticsearch built-in security features are not enabled").
helm upgrade --install --namespace elk es -f ./values-prod.yaml .
Verify:
curl 192.168.1.201:31311/_cat/health
curl 192.168.1.201:31311/_cat/nodes
2.17.1.2、kibana
helm search repo elastic/kibana
cd && helm pull elastic/kibana --untar --version 7.17.3
cd kibana
cat > values-prod.yaml << 'EOF'
kibanaConfig:
kibana.yml: |
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: [ "http://elasticsearch-master-headless:9200" ]
i18n.locale: "zh-CN"
resources:
limits:
cpu: "2"
memory: "2Gi"
requests:
cpu: "1"
memory: "1Gi"
service:
#type: ClusterIP
type: NodePort
loadBalancerIP: ""
port: 5601
nodePort: "30026"
EOF
helm upgrade --install --namespace elk kibana -f ./values-prod.yaml .
cat > ~/kibana/kibana-Ingress.yml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: kibana-ingress
namespace: elk
annotations:
nginx.ingress.kubernetes.io/ssl-redirect: 'true'
nginx.ingress.kubernetes.io/proxy-body-size: '4G'
nginx.ingress.kubernetes.io/auth-type: basic
nginx.ingress.kubernetes.io/auth-secret: kibana-auth-secret
nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required - admin'
spec:
ingressClassName: nginx
rules:
- host: kibana.huanghuanhui.cloud
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: kibana-kibana
port:
number: 5601
tls:
- hosts:
- kibana.huanghuanhui.cloud
secretName: kibana-ingress-tls
EOF
yum -y install httpd-tools
cd ~/kibana && htpasswd -bc auth admin Admin@2024
kubectl create secret generic kibana-auth-secret --from-file=auth -n elk
# or, idempotently (safe to re-run):
kubectl create secret generic kibana-auth-secret -n elk --dry-run=client -o yaml \
--from-file=auth | kubectl apply -f -
kubectl create secret -n elk \
tls kibana-ingress-tls \
--key=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud.key \
--cert=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud_bundle.crt
kubectl apply -f ~/kibana/kibana-Ingress.yml
Access: kibana.huanghuanhui.cloud
Credentials: admin / Admin@2024
http://192.168.1.201:30026/app/dev_tools#/console
GET _cat/nodes
GET _cat/health
GET _cat/indices
2.17.1.3、kafka (k8s kafka cluster ==> externally accessible)
mkdir -p ~/kafka-yml && cd ~/kafka-yml
cat > ~/kafka-yml/zk.yml << 'EOF'
apiVersion: v1
kind: Service
metadata:
labels:
app: zookeeper-cluster
namespace: elk
name: zookeeper-cluster
spec:
selector:
app: zookeeper-cluster
ports:
- name: client
port: 2181
targetPort: 2181
- name: follower
port: 2888
targetPort: 2888
- name: leader
port: 3888
targetPort: 3888
clusterIP: None
---
apiVersion: v1
kind: Service
metadata:
namespace: elk
name: zookeeper-cs
spec:
selector:
app: zookeeper-cluster
type: NodePort
ports:
- name: client
port: 2181
nodePort: 30152
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
namespace: elk
name: crs-zookeeper
spec:
replicas: 3
podManagementPolicy: Parallel
serviceName: zookeeper-cluster
selector:
matchLabels:
app: zookeeper-cluster
template:
metadata:
labels:
component: zookeeper-cluster
app: zookeeper-cluster
spec:
containers:
- name: zookeeper
image: bitnami/zookeeper:3.8.2
imagePullPolicy: IfNotPresent
securityContext:
runAsUser: 0
ports:
- containerPort: 2181
- containerPort: 2888
- containerPort: 3888
lifecycle:
postStart:
exec:
command:
- "sh"
- "-c"
- >
echo $(( $(cat /etc/hosts | grep zookeeper | awk '{print($3)}' | awk '{split($0,array,"-")} END{print array[3]}') + 1 )) > /bitnami/zookeeper/data/myid
env:
- name: ALLOW_ANONYMOUS_LOGIN
value: "yes"
- name: ZOO_SERVERS
value: crs-zookeeper-0.zookeeper-cluster.elk.svc.cluster.local:2888:3888,crs-zookeeper-1.zookeeper-cluster.elk.svc.cluster.local:2888:3888,crs-zookeeper-2.zookeeper-cluster.elk.svc.cluster.local:2888:3888
volumeMounts:
- name: zoodata-outer
mountPath: /bitnami/zookeeper
volumeClaimTemplates:
- metadata:
name: zoodata-outer
spec:
storageClassName: nfs-storage
accessModes:
- "ReadWriteOnce"
resources:
requests:
storage: 2Ti
EOF
kubectl apply -f ~/kafka-yml/zk.yml
cat > ~/kafka-yml/kafka.yml << 'EOF'
apiVersion: v1
kind: Service
metadata:
namespace: elk
name: kafka-headless
spec:
selector:
app: kafka-cluster
ports:
- name: client
port: 9092
targetPort: 9092
clusterIP: None
---
apiVersion: v1
kind: Service
metadata:
name: kafka-0
namespace: elk
labels:
app: kafka-cluster
spec:
ports:
- port: 9092
targetPort: 9092
nodePort: 30127
name: server
type: NodePort
selector:
statefulset.kubernetes.io/pod-name: crs-kafka-0
# app: kafka-cluster
---
apiVersion: v1
kind: Service
metadata:
name: kafka-1
namespace: elk
labels:
app: kafka-cluster
spec:
ports:
- port: 9092
targetPort: 9092
nodePort: 30128
name: server
type: NodePort
selector:
statefulset.kubernetes.io/pod-name: crs-kafka-1
---
apiVersion: v1
kind: Service
metadata:
name: kafka-2
namespace: elk
labels:
app: kafka-cluster
spec:
ports:
- port: 9092
targetPort: 9092
nodePort: 30129
name: server
type: NodePort
selector:
statefulset.kubernetes.io/pod-name: crs-kafka-2
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
namespace: elk
name: crs-kafka
spec:
replicas: 3
podManagementPolicy: Parallel
serviceName: kafka-cluster
selector:
matchLabels:
app: kafka-cluster
template:
metadata:
labels:
app: kafka-cluster
spec:
hostname: kafka
containers:
- name: kafka
command:
- bash
- -ec
- |
HOSTNAME=`hostname -s`
if [[ $HOSTNAME =~ (.*)-([0-9]+)$ ]]; then
ORD=${BASH_REMATCH[2]}
PORT=$((ORD + 30127))
export KAFKA_CFG_ADVERTISED_LISTENERS="PLAINTEXT://58.34.61.154:$PORT"
else
echo "Failed to get index from hostname $HOSTNAME"
exit 1
fi
exec /entrypoint.sh /run.sh
image: bitnami/kafka:3.5.1
# image: bitnami/kafka:latest
imagePullPolicy: IfNotPresent
securityContext:
runAsUser: 0
# resources:
# requests:
# memory: "1G"
# cpu: "0.5"
ports:
- containerPort: 9092
env:
- name: KAFKA_CFG_ZOOKEEPER_CONNECT
value: crs-zookeeper-0.zookeeper-cluster.elk.svc.cluster.local:2181,crs-zookeeper-1.zookeeper-cluster.elk.svc.cluster.local:2181,crs-zookeeper-2.zookeeper-cluster.elk.svc.cluster.local:2181
# value: zookeeper-cluster:2181
- name: ALLOW_PLAINTEXT_LISTENER
value: "yes"
volumeMounts:
- name: kafkadata-outer
mountPath: /bitnami/kafka
volumeClaimTemplates:
- metadata:
name: kafkadata-outer
spec:
storageClassName: nfs-storage
accessModes:
- "ReadWriteOnce"
resources:
requests:
storage: 2Ti
EOF
kubectl apply -f ~/kafka-yml/kafka.yml
Note: update the IP address in the export statement of kafka.yml (the KAFKA_CFG_ADVERTISED_LISTENERS line, around line 98).
Here it is set to the public IP: 58.34.61.154
kafka ui
docker pull provectuslabs/kafka-ui:latest
docker pull freakchicken/kafka-ui-lite
docker run -d \
--name kafka-ui1 \
--restart always \
--privileged=true \
-p 8888:8080 \
-e KAFKA_CLUSTERS_0_NAME=k8s-kafka \
-e KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS=192.168.1.201:30127,192.168.1.201:30128,192.168.1.201:30129 \
provectuslabs/kafka-ui:latest
Access URL: 192.168.1.201:8888
docker run -d \
--name kafka-ui2 \
--restart always \
--privileged=true \
-p 8889:8889 \
freakchicken/kafka-ui-lite
Access URL: 192.168.1.201:8889
2.17.1.4、filebeat
helm search repo elastic/filebeat
cd && helm pull elastic/filebeat --untar --version 7.17.3
cd filebeat
cat > my-values.yaml << 'EOF'
daemonset:
filebeatConfig:
filebeat.yml: |
filebeat.inputs:
- type: container
paths:
- /var/log/containers/*.log
output.elasticsearch:
enabled: false
host: '${NODE_NAME}'
hosts: '${ELASTICSEARCH_HOSTS:elasticsearch-master:9200}'
output.kafka:
enabled: true
hosts: ["192.168.1.201:30127","192.168.1.201:30128","192.168.1.201:30129"]
topic: test
EOF
helm install filebeat elastic/filebeat -f my-values.yaml --namespace elk
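Once the chart is installed, a quick sanity check that the DaemonSet landed on every node (the app=filebeat-filebeat label follows the elastic chart's default naming; adjust if you used a different release name):
kubectl -n elk get pods -l app=filebeat-filebeat -o wide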
kubectl -n elk exec -it crs-kafka-0 -- bash
# inside the pod: tail the topic filebeat writes to
kafka-console-consumer.sh --bootstrap-server kafka-headless.elk.svc.cluster.local:9092 --topic test
# consumer-group lag overview
kafka-consumer-groups.sh --bootstrap-server kafka-headless.elk.svc.cluster.local:9092 --describe --group mygroup
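To generate traffic by hand instead of waiting for filebeat, a sketch using the producer script shipped in the same bitnami image:
kubectl -n elk exec -it crs-kafka-0 -- kafka-console-producer.sh --bootstrap-server kafka-headless.elk.svc.cluster.local:9092 --topic test
Each line typed here should show up in the console consumer above.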
2.17.1.5、logstash
helm search repo elastic/logstash
cd && helm pull elastic/logstash --untar --version 7.17.3
cd logstash
cat <<EOF> my-values.yaml
logstashConfig:
logstash.yml: |
xpack.monitoring.enabled: false
logstashPipeline:
  logstash.conf: |
input {
kafka {
bootstrap_servers => "58.34.61.154:30127,58.34.61.154:30128,58.34.61.154:30129"
topics => ["test"]
#group_id => "mygroup"
        #when using kafka metadata, leave the byte-array deserializers below commented out, otherwise logstash errors
#key_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
#value_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
consumer_threads => 1
        #defaults to false; metadata is only populated when set to true
decorate_events => true
auto_offset_reset => "earliest"
}
}
filter {
mutate {
        #take the kafka key and split it on commas
split => ["[@metadata][kafka][key]", ","]
add_field => {
          #put the first element of the split into a custom "index" field
"index" => "%{[@metadata][kafka][key][0]}"
}
}
}
output {
elasticsearch {
pool_max => 1000
pool_max_per_route => 200
hosts => ["elasticsearch-master-headless.elk.svc.cluster.local:9200"]
index => "test-%{+YYYY.MM.dd}"
}
}
# resource limits
resources:
requests:
cpu: "100m"
memory: "256Mi"
limits:
cpu: "1000m"
memory: "1Gi"
persistence:
enabled: true
volumeClaimTemplate:
accessModes: ["ReadWriteOnce"]
storageClassName: nfs-storage
resources:
requests:
storage: 3Gi
EOF
helm install logstash elastic/logstash -f my-values.yaml --namespace elk
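Once logstash starts consuming, daily test-* indices should appear in elasticsearch. A quick check (a sketch: pod and service names follow the elastic charts' defaults, and curl is bundled in the logstash image):
kubectl -n elk exec logstash-logstash-0 -- curl -s 'elasticsearch-master:9200/_cat/indices?v' | grep test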
2.17.1.6、Collecting logs from cloud hosts
mkdir -p ~/filebeat/config
cat > ~/filebeat/config/filebeat.yml << 'EOF'
# log input configuration (multiple inputs can be defined)
filebeat.inputs:
- type: log
enabled: true
paths:
- /mnt/nfs/logs/*/*/*.log
tags: ["gateway"]
fields:
server: 49.235.249.203
fields_under_root: true
# log output configuration
output.kafka:
enabled: true
hosts: ["58.34.61.154:30127","58.34.61.154:30128","58.34.61.154:30129"]
topic: "sit"
partition.round_robin:
reachable_only: false
required_acks: 1
compression: gzip
max_message_bytes: 1000000
EOF
docker run -d --name filebeat --user=root \
-v /mnt/nfs/logs/:/mnt/nfs/logs/ \
-v /root/filebeat/config/filebeat.yml:/usr/share/filebeat/filebeat.yml \
elastic/filebeat:7.17.3
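Confirm the harvesters picked up the log files (look for "Harvester started" lines):
docker logs -f filebeat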
Operations on the local k8s cluster
cat > logstash-sit.yaml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
name: logstash-sit-configmap
namespace: elk
data:
logstash.yml: |
http.host: "0.0.0.0"
path.config: /usr/share/logstash/pipeline
logstash.conf: |
input {
kafka {
bootstrap_servers => "58.34.61.154:30127,58.34.61.154:30128,58.34.61.154:30129"
topics => ["sit"]
codec => "json"
type => "sit"
group_id => "sit"
consumer_threads => 1
}
}
filter {
if [type] == "sit" {
json {
source => ["message"]
remove_field => ["offset","host","beat","@version","event","agent","ecs"]
}
mutate {
add_field => {
project_path => "%{[log][file][path]}"
}
}
mutate {
split => ["project_path", "/"]
add_field => {
"project_name" => "%{[project_path][-3]}"
}
}
date {
match => ["time","yyyy-MM-dd HH:mm:ss.SSS"]
timezone => "Asia/Shanghai"
target => "@timestamp"
}
mutate {
remove_field => ["log","project_path","time","input"]
}
}
}
output {
elasticsearch {
hosts => ["elasticsearch-master-headless.elk.svc.cluster.local:9200"]
index => "sit-%{+YYYY.MM.dd}"
}
}
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: logstash-sit
namespace: elk
spec:
selector:
matchLabels:
app: logstash-sit
replicas: 1
template:
metadata:
labels:
app: logstash-sit
spec:
containers:
- name: logstash
image: docker.elastic.co/logstash/logstash:7.17.3
ports:
- containerPort: 5044
volumeMounts:
- name: logstash-pipeline-volume
mountPath: /usr/share/logstash/pipeline
- mountPath: /etc/localtime
name: localtime
volumes:
- name: logstash-pipeline-volume
configMap:
name: logstash-sit-configmap
items:
- key: logstash.conf
path: logstash.conf
- hostPath:
path: /etc/localtime
name: localtime
---
kind: Service
apiVersion: v1
metadata:
name: logstash-sit
namespace: elk
spec:
selector:
    app: logstash-sit
type: ClusterIP
ports:
- protocol: TCP
port: 5044
targetPort: 5044
EOF
kubectl apply -f logstash-sit.yaml
三、Cloud-Native Distributed Storage
3.1、Installing rook-ceph with helm (Tencent Cloud image registry)
Deploy a Ceph distributed storage cluster on Kubernetes via Rook
Prerequisites
- Kubernetes 1.22+
- Helm 3.x
- One master plus three worker nodes (minimum)
- A spare raw (unformatted) disk on every k8s node (/dev/sdb)
3.1.1、rook-ceph-operator
helm repo add rook-release https://charts.rook.io/release
helm repo update
helm search repo rook-release/rook-ceph
helm pull rook-release/rook-ceph --version v1.12.2 --untar
cat > ~/rook-ceph/values-prod.yml << 'EOF'
image:
repository: ccr.ccs.tencentyun.com/huanghuanhui/rook-ceph
tag: ceph-v1.12.2
pullPolicy: IfNotPresent
resources:
limits:
cpu: "2"
memory: "4Gi"
requests:
cpu: "1"
memory: "2Gi"
csi:
cephcsi:
# @default -- `quay.io/cephcsi/cephcsi:v3.9.0`
image: ccr.ccs.tencentyun.com/huanghuanhui/rook-ceph:cephcsi-v3.9.0
registrar:
# @default -- `registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.8.0`
image: ccr.ccs.tencentyun.com/huanghuanhui/rook-ceph:csi-node-driver-registrar-v2.8.0
provisioner:
# @default -- `registry.k8s.io/sig-storage/csi-provisioner:v3.5.0`
image: ccr.ccs.tencentyun.com/huanghuanhui/rook-ceph:csi-provisioner-v3.5.0
snapshotter:
# @default -- `registry.k8s.io/sig-storage/csi-snapshotter:v6.2.2`
image: ccr.ccs.tencentyun.com/huanghuanhui/rook-ceph:csi-snapshotter-v6.2.2
attacher:
# @default -- `registry.k8s.io/sig-storage/csi-attacher:v4.3.0`
image: ccr.ccs.tencentyun.com/huanghuanhui/rook-ceph:csi-attacher-v4.3.0
resizer:
# @default -- `registry.k8s.io/sig-storage/csi-resizer:v1.8.0`
image: ccr.ccs.tencentyun.com/huanghuanhui/rook-ceph:csi-resizer-v1.8.0
# -- Image pull policy
imagePullPolicy: IfNotPresent
EOF
cd ~/rook-ceph
helm upgrade --install --create-namespace --namespace rook-ceph rook-ceph -f ./values-prod.yml .
3.1.2、rook-ceph-cluster
helm repo add rook-release https://charts.rook.io/release
helm repo update
helm search repo rook-release/rook-ceph-cluster
helm pull rook-release/rook-ceph-cluster --version v1.12.2 --untar
cat > ~/rook-ceph-cluster/values-prod.yml << 'EOF'
toolbox:
enabled: true
image: ccr.ccs.tencentyun.com/huanghuanhui/rook-ceph:ceph-ceph-v17.2.6
cephClusterSpec:
cephVersion:
image: ccr.ccs.tencentyun.com/huanghuanhui/rook-ceph:ceph-ceph-v17.2.6
EOF
cd ~/rook-ceph-cluster
helm upgrade --install --create-namespace --namespace rook-ceph rook-ceph-cluster --set operatorNamespace=rook-ceph -f ./values-prod.yml .
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- bash
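Inside the toolbox, the usual ceph health commands are available, e.g.:
ceph -s
ceph osd status
ceph df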
3.1.3、NodePort (access via NodePort)
kubectl -n rook-ceph expose pod $(kubectl get pod -n rook-ceph | grep rook-ceph-mgr-a | awk '{print $1}') --type=NodePort --name=rook-ceph-mgr-a-service --port=8443
# kubectl -n rook-ceph delete service rook-ceph-mgr-a-service
# Access URL: https://<node-ip>:<NodePort>
Password:
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
3.1.4、rook-ceph-mgr-dashboard-Ingress (access via Ingress domain name)
cat > ~/rook-ceph/rook-ceph-mgr-dashboard-Ingress.yml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: rook-ceph-mgr-dashboard-ingress
namespace: rook-ceph
annotations:
kubernetes.io/ingress.class: "nginx"
kubernetes.io/tls-acme: "true"
nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
nginx.ingress.kubernetes.io/server-snippet: |
proxy_ssl_verify off;
spec:
rules:
- host: rook-ceph.huanghuanhui.cloud
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: rook-ceph-mgr-dashboard
port:
name: https-dashboard
tls:
- hosts:
- rook-ceph.huanghuanhui.cloud
secretName: rook-ceph-mgr-dashboard-ingress-tls
EOF
kubectl create secret -n rook-ceph \
tls rook-ceph-mgr-dashboard-ingress-tls \
--key=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud.key \
--cert=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud_bundle.crt
kubectl apply -f ~/rook-ceph/rook-ceph-mgr-dashboard-Ingress.yml
Password:
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
Access URL: rook-ceph.huanghuanhui.cloud
Username / password: admin, ()
[root@k8s-master ~]# kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
ceph-block (default) rook-ceph.rbd.csi.ceph.com Delete Immediate true 5h38m
ceph-bucket rook-ceph.ceph.rook.io/bucket Delete Immediate false 5h38m
ceph-filesystem rook-ceph.cephfs.csi.ceph.com Delete Immediate true 5h38m
Installing rook-ceph with helm creates 3 StorageClasses automatically; ceph-block is the recommended one
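A minimal PVC sketch against the default ceph-block class (the claim name and size here are only illustrative):
cat > ~/rook-ceph/test-pvc.yml << 'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-block-test-pvc
spec:
  storageClassName: ceph-block
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
EOF
kubectl apply -f ~/rook-ceph/test-pvc.yml && kubectl get pvc ceph-block-test-pvc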
# https://github.com/rook/rook/issues/12758
ceph crash prune 3 # keep the last 3 days of crash reports and drop anything older
ceph crash prune 0
ceph crash ls
[root@k8s-master ~/rook-ceph-cluster]# pod
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-h6hrh 2/2 Running 0 18m
csi-cephfsplugin-j5h6h 2/2 Running 0 18m
csi-cephfsplugin-lhtt7 2/2 Running 0 18m
csi-cephfsplugin-provisioner-5c4cddd6b-ghpv2 5/5 Running 0 18m
csi-cephfsplugin-provisioner-5c4cddd6b-zknlk 5/5 Running 0 18m
csi-rbdplugin-7mgjv 2/2 Running 0 18m
csi-rbdplugin-pksw6 2/2 Running 0 18m
csi-rbdplugin-provisioner-5c6b576c5d-l47gs 5/5 Running 0 18m
csi-rbdplugin-provisioner-5c6b576c5d-sgtqc 5/5 Running 0 18m
csi-rbdplugin-xcjn8 2/2 Running 0 16s
rook-ceph-crashcollector-k8s-node1-5dc5b587fd-zq5jg 1/1 Running 0 15m
rook-ceph-crashcollector-k8s-node2-7f457d645-2h6lf 1/1 Running 0 14m
rook-ceph-crashcollector-k8s-node3-69d797bd46-bm8vc 1/1 Running 0 14m
rook-ceph-mds-ceph-filesystem-a-7df575df4d-w5zkt 2/2 Running 0 14m
rook-ceph-mds-ceph-filesystem-b-67896bc489-qxp44 2/2 Running 0 14m
rook-ceph-mgr-a-696c6b65f7-k4nng 3/3 Running 0 15m
rook-ceph-mgr-b-765ff4f954-h7fpw 3/3 Running 0 15m
rook-ceph-mon-a-6fcf8f985b-wg6zv 2/2 Running 0 18m
rook-ceph-mon-b-8d768bb94-fdb9r 2/2 Running 0 15m
rook-ceph-mon-c-784d9fc768-z2hs5 2/2 Running 0 15m
rook-ceph-operator-86888fdb75-7h4kl 1/1 Running 0 2m17s
rook-ceph-osd-0-85d4cf449-mq8pz 2/2 Running 0 14m
rook-ceph-osd-1-bfdff5dd-5m8lw 2/2 Running 0 14m
rook-ceph-osd-2-7d4f96f5f5-7k62p 2/2 Running 0 14m
rook-ceph-osd-prepare-k8s-node1-t9r2r 0/1 Completed 0 102s
rook-ceph-osd-prepare-k8s-node2-tr926 0/1 Completed 0 99s
rook-ceph-osd-prepare-k8s-node3-fsdfp 0/1 Completed 0 96s
rook-ceph-rgw-ceph-objectstore-a-5d9fdbbbff-sntfb 2/2 Running 0 13m
rook-ceph-tools-c9b9dd85f-b9g5s 1/1 Running 0 21m
[root@k8s-master ~/rook-ceph-cluster]# svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
rook-ceph-mgr ClusterIP 10.103.189.158 <none> 9283/TCP 15m
rook-ceph-mgr-a-service NodePort 10.106.35.196 <none> 8443:31397/TCP 5m49s
rook-ceph-mgr-dashboard ClusterIP 10.105.214.130 <none> 8443/TCP 15m
rook-ceph-mon-a ClusterIP 10.102.21.160 <none> 6789/TCP,3300/TCP 18m
rook-ceph-mon-b ClusterIP 10.101.131.168 <none> 6789/TCP,3300/TCP 15m
rook-ceph-mon-c ClusterIP 10.108.229.248 <none> 6789/TCP,3300/TCP 15m
rook-ceph-rgw-ceph-objectstore ClusterIP 10.105.52.243 <none> 80/TCP 14m
[root@k8s-master ~/rook-ceph-cluster]# sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
ceph-block (default) rook-ceph.rbd.csi.ceph.com Delete Immediate true 21m
ceph-bucket rook-ceph.ceph.rook.io/bucket Delete Immediate false 21m
ceph-filesystem rook-ceph.cephfs.csi.ceph.com Delete Immediate true 21m
nfs-storage k8s-sigs.io/nfs-subdir-external-provisioner Delete Immediate false 40d
[root@k8s-master ~/rook-ceph-cluster]#
四、SpringCloud Business Components
0、nginx-1.25.3
1、mysql-8.0.28
2、nacos-2.2.3
3、redis-7.2.4
4、mongo-7.0.5
5、kafka-3.5.1
6、minio
7、xxl-job-2.4.0
4.0、nginx-1.25.3
mkdir -p ~/nginx-yml
kubectl create ns nginx
cat > ~/nginx-yml/nginx-Deployment.yml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx
namespace: nginx
labels:
app: nginx
spec:
replicas: 3
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: nginx
topologyKey: kubernetes.io/hostname
containers:
- name: nginx
image: nginx:1.25.3-alpine
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
env:
- name: TZ
value: "Asia/Shanghai"
resources:
limits:
cpu: "2"
memory: "4Gi"
requests:
cpu: "1"
memory: "2Gi"
volumeMounts:
- name: nginx-html-volume
mountPath: /usr/share/nginx/html
volumes:
- name: nginx-html-volume
persistentVolumeClaim:
claimName: nginx-html-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: nginx-html-pvc
namespace: nginx
spec:
storageClassName: "nfs-storage"
  accessModes: [ReadWriteMany]
resources:
requests:
storage: 10Gi
EOF
kubectl apply -f ~/nginx-yml/nginx-Deployment.yml
cat > ~/nginx-yml/nginx-Service.yml << 'EOF'
apiVersion: v1
kind: Service
metadata:
name: nginx-service
namespace: nginx
spec:
selector:
app: nginx
type: NodePort
ports:
- protocol: TCP
port: 80
targetPort: 80
EOF
kubectl apply -f ~/nginx-yml/nginx-Service.yml
cat > ~/nginx-yml/nginx-Ingress.yml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: nginx-ingress
namespace: nginx
annotations:
nginx.ingress.kubernetes.io/ssl-redirect: 'false'
spec:
ingressClassName: nginx
rules:
- host: www.huanghuanhui.cloud
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: nginx-service
port:
number: 80
tls:
- hosts:
- www.huanghuanhui.cloud
secretName: nginx-ingress-tls
EOF
kubectl create secret -n nginx \
tls nginx-ingress-tls \
--key=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud.key \
--cert=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud_bundle.crt
kubectl apply -f ~/nginx-yml/nginx-Ingress.yml
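A quick end-to-end test without touching DNS: pin the domain to the ingress controller's node IP (the IP below is illustrative) and curl through the TLS endpoint:
curl -k --resolve www.huanghuanhui.cloud:443:192.168.1.201 https://www.huanghuanhui.cloud/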
cat > ~/nginx-yml/nginx-HPA.yml << 'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
  namespace: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
EOF
kubectl apply -f ~/nginx-yml/nginx-HPA.yml
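Watch the autoscaler react while you load the service with any HTTP benchmark tool:
kubectl -n nginx get hpa nginx-hpa -w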
4.1、mysql-8.0.28
wget https://cdn.mysql.com/archives/mysql-8.0/mysql-8.0.28-linux-glibc2.12-x86_64.tar.xz
yum -y install expect
cat > ~/install-mysql-8.0.28.sh << 'eof'
useradd mysql -r -s /sbin/nologin
tar xf ~/mysql-8.0.28-linux-glibc2.12-x86_64.tar.xz
mv ~/mysql-8.0.28-linux-glibc2.12-x86_64 /usr/local/mysql
cd /usr/local/mysql
# Alibaba Cloud parameter template (8.0.28)
cat > my.cnf << 'EOF'
[client]
port=3306
socket=/usr/local/mysql/mysql.sock
[mysql]
socket=/usr/local/mysql/mysql.sock
[mysqld]
user=mysql
port=3306
basedir=/usr/local/mysql
datadir=/usr/local/mysql/data
socket=/usr/local/mysql/mysql.sock
pid-file=/usr/local/mysql/mysqld.pid
admin_address='127.0.0.1'
admin_port=33062
innodb_flush_log_at_trx_commit=2
loose_recycle_scheduler=OFF
innodb_buffer_pool_load_at_startup=ON
loose_performance_schema_max_index_stat=0
bulk_insert_buffer_size=4194304
show_old_temporals=OFF
ft_query_expansion_limit=20
innodb_old_blocks_time=1000
loose_ccl_queue_hot_delete=OFF
loose_rds_audit_log_event_buffer_size=8192
thread_stack=1048576
loose_performance_schema_max_digest_sample_age=0
innodb_thread_concurrency=0
loose_innodb_rds_flashback_task_enabled=OFF
default_time_zone=+8:00
loose_performance_schema_max_digest_length=0
loose_recycle_bin=OFF
optimizer_search_depth=62
max_sort_length=1024
max_binlog_cache_size=18446744073709547520
init_connect=''
innodb_adaptive_max_sleep_delay=150000
innodb_purge_rseg_truncate_frequency=128
innodb_lock_wait_timeout=50
loose_json_document_max_depth=100
innodb_compression_pad_pct_max=50
max_connections=2520
loose_binlog_parallel_flush=OFF
#opt_tablestat=OFF
max_execution_time=0
event_scheduler=ON
innodb_flush_method=O_DIRECT
loose_performance_schema_accounts_size=0
loose_optimizer_trace_features=greedy_search=on,range_optimizer=on,dynamic_range=on,repeated_subselect=on
innodb_purge_batch_size=300
loose_performance_schema_events_statements_history_size=0
avoid_temporal_upgrade=OFF
loose_group_replication_flow_control_member_quota_percent=0
innodb_sync_array_size=1
binlog_transaction_dependency_history_size=500000
net_read_timeout=30
end_markers_in_json=OFF
loose_performance_schema_hosts_size=0
loose_innodb_numa_interleave=ON
loose_performance_schema_max_cond_instances=0
max_binlog_stmt_cache_size=18446744073709547520
innodb_checksum_algorithm=crc32
loose_performance_schema_events_waits_history_long_size=0
innodb_ft_enable_stopword=ON
loose_innodb_undo_retention=0
#opt_indexstat=OFF
disconnect_on_expired_password=ON
default_storage_engine=InnoDB
loose_group_replication_flow_control_min_quota=0
loose_performance_schema_session_connect_attrs_size=0
#innodb_data_file_purge_max_size=128
innodb_ft_result_cache_limit=2000000000
explicit_defaults_for_timestamp=OFF
ft_max_word_len=84
innodb_autoextend_increment=64
sql_mode=ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
innodb_stats_transient_sample_pages=8
# table_open_cache={LEAST(DBInstanceClassMemory/1073741824*512, 8192)}
loose_performance_schema_max_rwlock_classes=0
range_optimizer_max_mem_size=8388608
loose_innodb_rds_faster_ddl=ON
innodb_status_output=OFF
innodb_log_compressed_pages=OFF
slave_net_timeout=60
max_points_in_geometry=65536
max_prepared_stmt_count=16382
wait_timeout=86400
loose_group_replication_flow_control_mode=DISABLED
innodb_print_all_deadlocks=OFF
loose_thread_pool_size=1
binlog_stmt_cache_size=32768
transaction_isolation=READ-COMMITTED
optimizer_trace_limit=1
innodb_max_purge_lag=0
innodb_buffer_pool_dump_pct=25
max_sp_recursion_depth=0
updatable_views_with_limit=YES
local_infile=ON
loose_opt_rds_last_error_gtid=ON
innodb_ft_max_token_size=84
loose_thread_pool_enabled=ON
innodb_adaptive_hash_index=OFF
net_write_timeout=60
flush_time=0
character_set_filesystem=binary
loose_performance_schema_max_statement_classes=0
key_cache_division_limit=100
#innodb_data_file_purge=ON
innodb_read_ahead_threshold=56
loose_optimizer_switch=index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,subquery_materialization_cost_based=on,use_index_extensions=on
loose_performance_schema_max_socket_classes=0
innodb_monitor_disable=
loose_performance_schema_max_program_instances=0
innodb_adaptive_flushing_lwm=10
innodb_log_checksums=ON
innodb_ft_sort_pll_degree=2
log_slow_admin_statements=OFF
innodb_stats_on_metadata=OFF
stored_program_cache=256
group_concat_max_len=1024
innodb_rollback_segments=128
loose_information_schema_stats_expiry=86400
innodb_commit_concurrency=0
# table_definition_cache={LEAST(DBInstanceClassMemory/1073741824*512, 8192)}
auto_increment_increment=1
max_seeks_for_key=18446744073709500000
#performance_point_iostat_volume_size=10000
loose_persist_binlog_to_redo=OFF
loose_ccl_queue_hot_update=OFF
back_log=3000
binlog_transaction_dependency_tracking=WRITESET
loose_recycle_bin_retention=604800
innodb_io_capacity_max=40000
loose_performance_schema_events_transactions_history_size=0
min_examined_row_limit=0
loose_performance_schema_events_transactions_history_long_size=0
sync_relay_log_info=10000
innodb_stats_auto_recalc=ON
max_connect_errors=100
loose_performance_schema_max_file_classes=0
innodb_change_buffering=all
loose_opt_rds_enable_show_slave_lag=ON
loose_group_replication_flow_control_min_recovery_quota=0
loose_performance_schema_max_statement_stack=0
max_join_size=18446744073709551615
loose_validate_password_length=8
innodb_max_purge_lag_delay=0
loose_optimizer_trace=enabled=off,one_line=off
default_week_format=0
innodb_cmp_per_index_enabled=OFF
host_cache_size=644
auto_increment_offset=1
ft_min_word_len=4
default_authentication_plugin=mysql_native_password
loose_performance_schema_max_sql_text_length=0
slave_type_conversions=
loose_group_replication_flow_control_certifier_threshold=25000
optimizer_trace_offset=-1
loose_force_memory_to_innodb=OFF
character_set_server=utf8
innodb_adaptive_flushing=ON
#performance_point_iostat_interval=2
innodb_monitor_enable=
loose_group_replication_flow_control_applier_threshold=25000
table_open_cache_instances=16
innodb_buffer_pool_instances=8
loose_multi_blocks_ddl_count=0
loose_performance_schema_max_table_instances=0
loose_group_replication_flow_control_release_percent=50
loose_innodb_undo_space_reserved_size=0
innodb_log_file_size=1500M
lc_time_names=en_US
sync_master_info=10000
innodb_compression_level=6
loose_innodb_log_optimize_ddl=OFF
loose_performance_schema_max_prepared_statements_instances=0
loose_innodb_log_write_ahead_size=4096
loose_performance_schema_max_mutex_classes=0
innodb_online_alter_log_max_size=134217728
key_cache_block_size=1024
mysql_native_password_proxy_users=OFF
loose_innodb_rds_chunk_flush_interval=100
query_alloc_block_size=8192
loose_performance_schema_max_socket_instances=0
#innodb_purge_threads={LEAST(DBInstanceClassMemory/1073741824, 8)}
loose_group_replication_transaction_size_limit=150000000
innodb_compression_failure_threshold_pct=5
loose_performance_schema_error_size=0
binlog_rows_query_log_events=OFF
loose_innodb_undo_space_supremum_size=10240
innodb_stats_persistent_sample_pages=20
innodb_ft_total_cache_size=640000000
eq_range_index_dive_limit=100
loose_sql_safe_updates=OFF
loose_performance_schema_events_stages_history_long_size=0
connect_timeout=10
div_precision_increment=4
#performance_point_lock_rwlock_enabled=ON
sync_binlog=1000
innodb_stats_method=nulls_equal
lock_wait_timeout=31536000
innodb_deadlock_detect=ON
innodb_write_io_threads=4
loose_ccl_queue_bucket_count=4
ngram_token_size=2
loose_performance_schema_max_table_lock_stat=0
loose_performance_schema_max_table_handles=0
loose_performance_schema_max_memory_classes=0
loose_ignore_index_hint_error=OFF
loose_innodb_rds_free_resize=ON
innodb_ft_enable_diag_print=OFF
innodb_io_capacity=20000
slow_launch_time=2
innodb_table_locks=ON
loose_performance_schema_events_stages_history_size=0
innodb_stats_persistent=ON
tmp_table_size=2097152
loose_performance_schema_max_thread_classes=0
net_retry_count=10
innodb_ft_cache_size=8000000
binlog_cache_size=1M
innodb_max_dirty_pages_pct=75
innodb_disable_sort_file_cache=OFF
# innodb_lru_scan_depth={LEAST(DBInstanceClassMemory/1048576/8, 8192)}
loose_performance_schema_max_mutex_instances=0
long_query_time=1
interactive_timeout=7200
innodb_read_io_threads=4
transaction_prealloc_size=4096
open_files_limit=655350
loose_performance_schema_max_metadata_locks=0
temptable_max_ram=1073741824
# innodb_open_files={LEAST(DBInstanceClassCPU*500, 8000)}
max_heap_table_size=67108864
loose_performance_schema_digests_size=0
automatic_sp_privileges=ON
max_user_connections=2000
innodb_random_read_ahead=OFF
loose_group_replication_flow_control_max_commit_quota=0
delay_key_write=ON
general_log=OFF
log_bin_use_v1_row_events=1
loose_performance_schema_setup_actors_size=0
#innodb_data_file_purge_interval=100
innodb_buffer_pool_dump_at_shutdown=ON
query_prealloc_size=8192
key_cache_age_threshold=300
loose_performance_schema_setup_objects_size=0
transaction_alloc_block_size=8192
optimizer_prune_level=1
loose_performance_schema_max_file_instances=0
innodb_max_dirty_pages_pct_lwm=0
innodb_status_output_locks=OFF
binlog_row_image=full
innodb_change_buffer_max_size=25
innodb_optimize_fulltext_only=OFF
loose_performance_schema_max_file_handles=0
loose_performance_schema_users_size=0
innodb_max_undo_log_size=1073741824
slave_parallel_type=LOGICAL_CLOCK
innodb_sync_spin_loops=30
loose_group_replication_flow_control_period=1
loose_internal_tmp_mem_storage_engine=MEMORY
lower_case_table_names=0
sha256_password_proxy_users=OFF
innodb_flush_sync=ON
#tls_version=TLSv1,TLSv1.1,TLSv1.2
loose_performance_schema_max_rwlock_instances=0
delayed_insert_timeout=300
preload_buffer_size=32768
concurrent_insert=1
block_encryption_mode="aes-128-ecb"
slow_query_log=ON
net_buffer_length=16384
#innodb_buffer_pool_size={DBInstanceClassMemory*3/4}
delayed_insert_limit=100
delayed_queue_size=1000
session_track_gtids=OFF
innodb_thread_sleep_delay=10000
sql_require_primary_key=OFF
innodb_old_blocks_pct=37
innodb_sort_buffer_size=1048576
innodb_page_cleaners=8
loose_innodb_parallel_read_threads=1
innodb_spin_wait_delay=6
myisam_sort_buffer_size=262144
innodb_concurrency_tickets=5000
loose_performance_schema_max_cond_classes=0
loose_innodb_doublewrite_pages=64
transaction_write_set_extraction=XXHASH64
binlog_checksum=CRC32
loose_performance_schema_max_stage_classes=0
loose_performance_schema_events_statements_history_long_size=0
loose_ccl_queue_bucket_size=64
max_length_for_sort_data=1024
max_error_count=64
innodb_strict_mode=OFF
binlog_order_commits=OFF
#performance_schema={LEAST(DBInstanceClassMemory/8589934592, 1)}
innodb_ft_min_token_size=3
join_buffer_size=1M
optimizer_trace_max_mem_size=16384
innodb_autoinc_lock_mode=2
innodb_rollback_on_timeout=OFF
loose_performance_schema_max_thread_instances=0
max_write_lock_count=102400
loose_innodb_trx_resurrect_table_lock_accelerate=OFF
master_verify_checksum=OFF
innodb_ft_num_word_optimize=2000
log_error_verbosity=3
log_throttle_queries_not_using_indexes=0
loose_group_replication_flow_control_hold_percent=10
low_priority_updates=0
range_alloc_block_size=4096
sort_buffer_size=2M
max_allowed_packet=1073741824
read_buffer_size=1M
thread_cache_size=100
loose_performance_schema_events_waits_history_size=0
loose_thread_pool_oversubscribe=32
log_queries_not_using_indexes=OFF
innodb_flush_neighbors=0
EOF
chown -R mysql.mysql /usr/local/mysql
./bin/mysqld --defaults-file=/usr/local/mysql/my.cnf --initialize --user=mysql 2>&1 | tee password.txt
mysql_password=`awk '/A temporary password/{print $NF}' /usr/local/mysql/password.txt`
bin/mysql_ssl_rsa_setup --datadir=/usr/local/mysql/data
cat > /usr/lib/systemd/system/mysqld.service << 'EOF'
[Unit]
Description=MySQL Server
After=network.target
After=syslog.target
[Service]
User=mysql
Group=mysql
Type=notify
TimeoutSec=0
PermissionsStartOnly=true
# ExecStart points at the my.cnf generated above
ExecStart=/usr/local/mysql/bin/mysqld --defaults-file=/usr/local/mysql/my.cnf $MYSQLD_OPTS
EnvironmentFile=-/etc/sysconfig/mysql
LimitNOFILE = 10000
Restart=on-failure
RestartPreventExitStatus=1
Environment=MYSQLD_PARENT_PID=1
PrivateTmp=false
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable mysqld
systemctl start mysqld
./bin/mysqladmin -S /usr/local/mysql/mysql.sock -uroot password 'Admin@2024' -p$mysql_password
ln -sv /usr/local/mysql/bin/* /usr/bin/ &> /dev/null
expect &> /dev/null <<EOF
spawn ./bin/mysql_secure_installation -S /usr/local/mysql/mysql.sock
expect {
"Enter password" { send "Admin@2024\n";exp_continue }
"Press y" { send "n\n";exp_continue }
"Change the password" { send "n\n";exp_continue }
"Remove anonymous users" { send "y\n";exp_continue }
"Disallow root login" { send "n\n";exp_continue }
"Remove test database" { send "y\n";exp_continue }
"Reload privilege" { send "y\n" }
}
EOF
mysql -S /usr/local/mysql/mysql.sock -pAdmin@2024 -e "update mysql.user set host = '%' where user = 'root';"
mysql -S /usr/local/mysql/mysql.sock -pAdmin@2024 -e "flush privileges;"
mysql -S /usr/local/mysql/mysql.sock -pAdmin@2024 -e "select host,user from mysql.user;"
systemctl stop mysqld && systemctl start mysqld
echo "数据库安装成功"
eof
sh -x ~/install-mysql-8.0.28.sh
mysql -h 192.168.1.201 -u root -P 3306 -pAdmin@2024 -e "select host,user from mysql.user;"
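Rather than handing the root account to every application, you may want a dedicated user; a sketch (user name, password, and grants are illustrative):
mysql -h 192.168.1.201 -P 3306 -uroot -pAdmin@2024 -e "CREATE USER 'app'@'%' IDENTIFIED BY 'App@2024'; GRANT ALL PRIVILEGES ON nacos.* TO 'app'@'%'; FLUSH PRIVILEGES;"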
cat > ~/remove-mysql-8.0.28.sh << 'EOF'
systemctl stop mysqld.service
rm -rf /usr/local/mysql
echo "数据库卸载成功"
EOF
sh -x ~/remove-mysql-8.0.28.sh
4.2、nacos-2.2.3
mkdir -p ~/nacos-yml
kubectl create ns nacos
cat > ~/nacos-yml/nacos-mysql.yml << 'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: mysql
namespace: nacos
spec:
serviceName: mysql-headless
replicas: 1
selector:
matchLabels:
app: mysql
template:
metadata:
labels:
app: mysql
spec:
containers:
- name: mysql
image: mysql:5.7.40
imagePullPolicy: IfNotPresent
resources:
limits:
cpu: "2"
memory: "4Gi"
requests:
cpu: "1"
memory: "2Gi"
ports:
- name: mysql
containerPort: 3306
env:
- name: MYSQL_ROOT_PASSWORD
value: "Admin@2024"
- name: MYSQL_DATABASE
value: "nacos"
- name: MYSQL_USER
value: "nacos"
- name: MYSQL_PASSWORD
value: "nacos@2024"
volumeMounts:
- name: nacos-mysql-data-pvc
mountPath: /var/lib/mysql
- mountPath: /etc/localtime
name: localtime
volumes:
- name: localtime
hostPath:
path: /etc/localtime
volumeClaimTemplates:
- metadata:
name: nacos-mysql-data-pvc
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: nfs-storage
resources:
requests:
storage: 10Gi
---
apiVersion: v1
kind: Service
metadata:
name: mysql-headless
namespace: nacos
labels:
app: mysql
spec:
clusterIP: None
ports:
- port: 3306
name: mysql
targetPort: 3306
selector:
app: mysql
EOF
kubectl apply -f ~/nacos-yml/nacos-mysql.yml
https://github.com/alibaba/nacos/blob/2.2.3/config/src/main/resources/META-INF/nacos-db.sql (SQL file URL)
cd ~/nacos-yml && wget https://github.com/alibaba/nacos/raw/2.2.3/config/src/main/resources/META-INF/nacos-db.sql
kubectl -n nacos cp nacos-db.sql mysql-0:/
kubectl -n nacos exec mysql-0 -- mysql -pAdmin@2024 -e "use nacos;source /nacos-db.sql;"
kubectl -n nacos exec mysql-0 -- mysql -pAdmin@2024 -e "use nacos;show tables;"
cat > ~/nacos-yml/nacos-v2.2.3-yml << 'EOF'
apiVersion: v1
kind: Service
metadata:
name: nacos-headless
namespace: nacos
labels:
app: nacos
spec:
clusterIP: None
ports:
- port: 8848
name: server
targetPort: 8848
- port: 9848
name: client-rpc
targetPort: 9848
- port: 9849
name: raft-rpc
targetPort: 9849
  ## election port kept for nacos 1.4.x compatibility
- port: 7848
name: old-raft-rpc
targetPort: 7848
selector:
app: nacos
---
apiVersion: v1
kind: Service
metadata:
name: nacos
namespace: nacos
labels:
app: nacos
spec:
type: NodePort
ports:
- port: 8848
name: server
targetPort: 8848
nodePort: 31000
- port: 9848
name: client-rpc
targetPort: 9848
nodePort: 32000
- port: 9849
name: raft-rpc
nodePort: 32001
  ## election port kept for nacos 1.4.x compatibility
- port: 7848
name: old-raft-rpc
targetPort: 7848
nodePort: 30000
selector:
app: nacos
---
apiVersion: v1
kind: ConfigMap
metadata:
name: nacos-cm
namespace: nacos
data:
mysql.host: "mysql-headless.nacos.svc.cluster.local"
mysql.db.name: "nacos"
mysql.port: "3306"
mysql.user: "nacos"
mysql.password: "nacos@2024"
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: nacos
namespace: nacos
spec:
serviceName: nacos-headless
replicas: 3
template:
metadata:
labels:
app: nacos
annotations:
pod.alpha.kubernetes.io/initialized: "true"
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: "app"
operator: In
values:
              - nacos
topologyKey: "kubernetes.io/hostname"
containers:
- name: k8snacos
image: nacos/nacos-server:v2.2.3
imagePullPolicy: IfNotPresent
resources:
limits:
cpu: 8
memory: 8Gi
requests:
cpu: 2
memory: 2Gi
ports:
- containerPort: 8848
name: client
- containerPort: 9848
name: client-rpc
- containerPort: 9849
name: raft-rpc
- containerPort: 7848
name: old-raft-rpc
env:
- name: NACOS_AUTH_ENABLE
value: "true"
- name: NACOS_AUTH_IDENTITY_KEY
value: "nacosAuthKey"
- name: NACOS_AUTH_IDENTITY_VALUE
value: "nacosSecurtyValue"
- name: NACOS_AUTH_TOKEN
value: "SecretKey012345678901234567890123456789012345678901234567890123456789"
- name: NACOS_AUTH_TOKEN_EXPIRE_SECONDS
value: "18000"
- name: NACOS_REPLICAS
value: "3"
- name: MYSQL_SERVICE_HOST
valueFrom:
configMapKeyRef:
name: nacos-cm
key: mysql.host
- name: MYSQL_SERVICE_DB_NAME
valueFrom:
configMapKeyRef:
name: nacos-cm
key: mysql.db.name
- name: MYSQL_SERVICE_PORT
valueFrom:
configMapKeyRef:
name: nacos-cm
key: mysql.port
- name: MYSQL_SERVICE_USER
valueFrom:
configMapKeyRef:
name: nacos-cm
key: mysql.user
- name: MYSQL_SERVICE_PASSWORD
valueFrom:
configMapKeyRef:
name: nacos-cm
key: mysql.password
- name: SPRING_DATASOURCE_PLATFORM
value: "mysql"
- name: MODE
value: "cluster"
- name: NACOS_SERVER_PORT
value: "8848"
- name: PREFER_HOST_MODE
value: "hostname"
- name: NACOS_SERVERS
value: "nacos-0.nacos-headless.nacos.svc.cluster.local:8848 nacos-1.nacos-headless.nacos.svc.cluster.local:8848 nacos-2.nacos-headless.nacos.svc.cluster.local:8848"
selector:
matchLabels:
app: nacos
EOF
kubectl apply -f ~/nacos-yml/nacos-v2.2.3-yml
cat > ~/nacos-yml/nacos-Ingress.yml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: nacos-ingress
namespace: nacos
annotations:
nginx.ingress.kubernetes.io/ssl-redirect: 'true'
nginx.ingress.kubernetes.io/proxy-body-size: '4G'
spec:
ingressClassName: nginx
rules:
- host: nacos.huanghuanhui.cloud
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: nacos-headless
port:
number: 8848
tls:
- hosts:
- nacos.huanghuanhui.cloud
secretName: nacos-ingress-tls
EOF
kubectl create secret -n nacos \
tls nacos-ingress-tls \
--key=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud.key \
--cert=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud_bundle.crt
kubectl apply -f ~/nacos-yml/nacos-Ingress.yml
kubectl -n nacos exec -it nacos-0 -- bash
# run inside the container
curl -X POST 'http://nacos-headless.nacos.svc.cluster.local:8848/nacos/v1/ns/instance?serviceName=nacos.naming.serviceName&ip=20.18.7.10&port=8080'
# run from outside the cluster
curl -X POST 'http://192.168.1.201:31000/nacos/v1/ns/instance?serviceName=nacos.naming.serviceName&ip=20.18.7.10&port=8080'
In-cluster connection address: nacos-headless.nacos.svc.cluster.local:8848
Access URL (IP): http://192.168.1.201:31000/nacos/#/login
Access URL (domain): https://nacos.huanghuanhui.cloud/nacos/#/login
Default username/password: nacos, nacos
Username/password: nacos, nacos@2024
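Note that with NACOS_AUTH_ENABLE=true the open-API calls above are rejected without a token. A sketch of the login flow against the v1 auth API (append the returned accessToken to subsequent requests):
curl -X POST 'http://192.168.1.201:31000/nacos/v1/auth/login' -d 'username=nacos&password=nacos@2024'
# then: curl -X POST 'http://192.168.1.201:31000/nacos/v1/ns/instance?serviceName=nacos.naming.serviceName&ip=20.18.7.10&port=8080&accessToken=<token>'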
4.3、redis-7.2.4
4.3.1、(standalone)
mkdir -p ~/redis-yml
kubectl create ns redis
cat > ~/redis-yml/redis-ConfigMap.yml << 'EOF'
kind: ConfigMap
apiVersion: v1
metadata:
name: redis-cm
namespace: redis
labels:
app: redis
data:
redis.conf: |-
dir /data
port 6379
bind 0.0.0.0
appendonly yes
protected-mode no
requirepass Admin@2024
pidfile /data/redis-6379.pid
EOF
kubectl apply -f ~/redis-yml/redis-ConfigMap.yml
cat > ~/redis-yml/redis-StatefulSet.yml << 'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: redis
namespace: redis
spec:
replicas: 1
serviceName: redis
selector:
matchLabels:
app: redis
template:
metadata:
name: redis
labels:
app: redis
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: redis
topologyKey: kubernetes.io/hostname
containers:
- name: redis
image: redis:7.2.4-alpine
imagePullPolicy: IfNotPresent
env:
- name: TZ
value: Asia/Shanghai
command:
- "sh"
- "-c"
- "redis-server /etc/redis/redis.conf"
ports:
- containerPort: 6379
name: tcp-redis
protocol: TCP
resources:
limits:
cpu: "2"
memory: "4Gi"
requests:
cpu: "1"
memory: "2Gi"
volumeMounts:
- name: redis-data
mountPath: /data
- name: config
mountPath: /etc/redis/redis.conf
subPath: redis.conf
volumes:
- name: config
configMap:
name: redis-cm
volumeClaimTemplates:
- metadata:
name: redis-data
spec:
storageClassName: "nfs-storage"
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 2Ti
EOF
kubectl apply -f ~/redis-yml/redis-StatefulSet.yml
cat > ~/redis-yml/redis-Service.yml << 'EOF'
apiVersion: v1
kind: Service
metadata:
name: redis
namespace: redis
spec:
type: NodePort
ports:
- name: redis
port: 6379
targetPort: 6379
protocol: TCP
nodePort: 30078
selector:
app: redis
EOF
kubectl apply -f ~/redis-yml/redis-Service.yml
Access address: IP 192.168.1.213 (NodePort 30078)
Password: Admin@2024
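Connectivity check from inside the cluster (redis-cli ships in the alpine image):
kubectl -n redis exec -it redis-0 -- redis-cli -a Admin@2024 ping
# expected: PONG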
4.3.2、(sharded cluster)
Install bitnami redis-cluster with helm
Version: redis-7.2.0
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm search repo bitnami/redis-cluster
helm pull bitnami/redis-cluster --version 8.8.0 --untar
cat > ~/redis-cluster/values-prod.yml << EOF
global:
storageClass: "nfs-storage"
redis:
password: "Admin@2024"
redis:
livenessProbe:
enabled: true
initialDelaySeconds: 60
periodSeconds: 5
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
readinessProbe:
enabled: true
initialDelaySeconds: 60
periodSeconds: 5
timeoutSeconds: 1
successThreshold: 1
failureThreshold: 5
persistence:
enabled: true
size: 100Gi
service:
ports:
redis: 6379
type: NodePort
nodePorts:
redis: 30079
metrics:
enabled: true
EOF
kubectl create ns redis-cluster
cd ~/redis-cluster && helm upgrade --install --namespace redis-cluster redis-cluster -f ./values-prod.yml .
kubectl -n redis-cluster logs -f redis-cluster-0 -c redis-cluster
kubectl get secret --namespace redis-cluster redis-cluster -o jsonpath="{.data.redis-password}" | base64 -d
kubectl -n redis-cluster exec -it redis-cluster-0 -- redis-cli -c -h redis-cluster -a Admin@2024
kubectl -n redis-cluster exec -it redis-cluster-0 -- redis-cli -c -h 192.168.1.201 -p 30079 -a Admin@2024
kubectl -n redis-cluster expose pod redis-cluster-0 --type=NodePort --name=redis-cluster-0
kubectl -n redis-cluster expose pod redis-cluster-1 --type=NodePort --name=redis-cluster-1
kubectl -n redis-cluster expose pod redis-cluster-2 --type=NodePort --name=redis-cluster-2
kubectl -n redis-cluster expose pod redis-cluster-3 --type=NodePort --name=redis-cluster-3
kubectl -n redis-cluster expose pod redis-cluster-4 --type=NodePort --name=redis-cluster-4
kubectl -n redis-cluster expose pod redis-cluster-5 --type=NodePort --name=redis-cluster-5
# check cluster status
> cluster info
> cluster nodes
In-cluster connection address: redis-cluster-headless.redis-cluster.svc.cluster.local:6379
Password: Admin@2024
RedisInsight (GUI tool)
cat > ~/redis-cluster/RedisInsight-Deployment.yml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: redisinsight
namespace: redis-cluster
spec:
replicas: 1
selector:
matchLabels:
app: redisinsight
template:
metadata:
labels:
app: redisinsight
spec:
containers:
- name: redisinsight
image: redislabs/redisinsight:1.14.0
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8001
volumeMounts:
- name: db
mountPath: /db
volumes:
- name: db
emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
name: redisinsight-service
namespace: redis-cluster
spec:
type: NodePort
ports:
- port: 8001
targetPort: 8001
nodePort: 31888
selector:
app: redisinsight
EOF
kubectl apply -f ~/redis-cluster/RedisInsight-Deployment.yml
https://github.com/RedisInsight/RedisInsight/issues/1931
If the pod restarts, the connection has to be re-added
cat > ~/redis-cluster/RedisInsight-Ingress.yml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: redisinsight-ingress
namespace: redis-cluster
annotations:
nginx.ingress.kubernetes.io/ssl-redirect: 'true'
nginx.ingress.kubernetes.io/proxy-body-size: '4G'
spec:
ingressClassName: nginx
rules:
- host: redisinsight.huanghuanhui.cloud
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: redisinsight-service
port:
number: 8001
tls:
- hosts:
- redisinsight.huanghuanhui.cloud
secretName: redisinsight-ingress-tls
EOF
kubectl create secret -n redis-cluster \
tls redisinsight-ingress-tls \
--key=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud.key \
--cert=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud_bundle.crt
kubectl apply -f ~/redis-cluster/RedisInsight-Ingress.yml
Access URL: redisinsight.huanghuanhui.cloud
4.4、mongo-7.0.5
4.4.1、(standalone)
mkdir -p ~/mongodb-yml
kubectl create ns mongodb
cat > ~/mongodb-yml/mongodb-StatefulSet.yml << 'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: mongodb
namespace: mongodb
spec:
replicas: 1
serviceName: mongodb-headless
selector:
matchLabels:
app: mongodb
template:
metadata:
labels:
app: mongodb
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: mongodb
topologyKey: kubernetes.io/hostname
containers:
- name: mongodb
image: mongo:7.0.5
imagePullPolicy: IfNotPresent
env:
- name: MONGO_INITDB_ROOT_USERNAME
value: root
- name: MONGO_INITDB_ROOT_PASSWORD
value: 'Admin@2024'
ports:
- containerPort: 27017
volumeMounts:
- name: mongo-data
mountPath: /data/db
- mountPath: /etc/localtime
name: localtime
      volumes:
      - name: localtime
        hostPath:
          path: /etc/localtime
volumeClaimTemplates:
- metadata:
name: mongo-data
spec:
storageClassName: "nfs-storage"
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 2Ti
EOF
kubectl apply -f ~/mongodb-yml/mongodb-StatefulSet.yml
cat > ~/mongodb-yml/mongodb-Service.yml << 'EOF'
apiVersion: v1
kind: Service
metadata:
name: mongodb-headless
namespace: mongodb
labels:
app: mongodb
spec:
clusterIP: None
ports:
- port: 27017
name: mongodb
targetPort: 27017
selector:
app: mongodb
---
apiVersion: v1
kind: Service
metadata:
name: mongodb-service
namespace: mongodb
spec:
type: NodePort
ports:
- name: mongodb
port: 27017
targetPort: 27017
protocol: TCP
nodePort: 30017
selector:
app: mongodb
EOF
kubectl apply -f ~/mongodb-yml/mongodb-Service.yml
In-cluster connection address: mongodb-headless.mongodb.svc.cluster.local:27017
Navicat connection: IP 192.168.1.201, port 30017
Username/password: root, Admin@2024 (auth database: admin)
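Connectivity check (mongosh ships in the mongo:7 image):
kubectl -n mongodb exec -it mongodb-0 -- mongosh -u root -p Admin@2024 --authenticationDatabase admin --eval 'db.runCommand({ ping: 1 })'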
4.4.2、(sharded cluster)
Install bitnami mongodb-sharded with helm
Version: mongodb-6.0.9
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm search repo bitnami/mongodb-sharded
helm pull bitnami/mongodb-sharded --version 6.6.2 --untar
Minimal sharding layout
cat > ~/mongodb-sharded/values-prod.yml << EOF
global:
storageClass: "nfs-storage"
auth:
rootPassword: "Admin@2024"
service:
ports:
mongodb: 27017
type: NodePort
nodePorts:
mongodb: 30018
metrics:
enabled: true
EOF
Recommended sharding layout
cat > ~/mongodb-sharded/values-prod.yml << EOF
global:
storageClass: "nfs-storage"
auth:
rootPassword: "Admin@2024"
##### multi-replica settings #####
shards: 4 # number of shards
shardsvr:
  dataNode:
    replicaCount: 2 # replicas per shard
  persistence:
    enabled: true
    size: 100Gi
configsvr: # config servers
  replicaCount: 3
  persistence:
    enabled: true
    size: 10Gi
mongos: # mongos routers
  replicaCount: 3
##### multi-replica settings #####
service:
ports:
mongodb: 27017
type: NodePort
nodePorts:
mongodb: 30018
metrics:
enabled: true
image:
pullPolicy: IfNotPresent
EOF
kubectl create ns mongodb-sharded
cd ~/mongodb-sharded && helm upgrade --install --namespace mongodb-sharded mongodb-sharded -f ./values-prod.yml .
kubectl -n mongodb-sharded logs -f mongodb-sharded-shard0-data-0 -c mongodb
kubectl get secret --namespace mongodb-sharded mongodb-sharded -o jsonpath="{.data.mongodb-root-password}" | base64 -d
kubectl -n mongodb-sharded exec -it mongodb-sharded-shard0-data-0 -- mongosh --host mongodb-sharded --port 27017 --authenticationDatabase admin -u root -p Admin@2024
kubectl -n mongodb-sharded exec -it mongodb-sharded-shard0-data-0 -- mongosh --host 192.168.1.201 --port 30018 --authenticationDatabase admin -u root -p Admin@2024
In-cluster connection address: mongodb-sharded-headless.mongodb-sharded.svc.cluster.local:27017
Navicat connection: IP 192.168.1.201, port 30018
Username/password: root, Admin@2024 (auth database: admin)
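To confirm data actually spreads across the 4 shards, sh.status() gives the balancer and chunk view:
kubectl -n mongodb-sharded exec -it mongodb-sharded-shard0-data-0 -- mongosh --host mongodb-sharded --authenticationDatabase admin -u root -p Admin@2024 --eval 'sh.status()'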
4.5、kafka-3.5.1
4.6、minio
mkdir -p ~/minio-yml
kubectl create ns minio
kubectl label node k8s-node1 node=minio
kubectl label node k8s-node2 node=minio
cat > ~/minio-yml/minio-StatefulSet.yml << 'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: minio
namespace: minio
spec:
serviceName: "minio-headless"
replicas: 4
selector:
matchLabels:
app: minio
template:
metadata:
labels:
app: minio
spec:
nodeSelector:
node: minio
containers:
- name: minio
env:
- name: MINIO_ROOT_USER
value: "admin"
- name: MINIO_ROOT_PASSWORD
value: "Admin@2024"
image: minio/minio:RELEASE.2024-11-01T18-37-25Z
imagePullPolicy: IfNotPresent
command:
- /bin/sh
- -c
- minio server --console-address ":5000" http://minio-{0...3}.minio-headless.minio.svc.cluster.local:9000/data
ports:
- name: data
containerPort: 9000
protocol: "TCP"
- name: console
containerPort: 5000
protocol: "TCP"
volumeMounts:
- name: minio-data
mountPath: /data
- name: time-mount
mountPath: /etc/localtime
volumes:
- name: time-mount
hostPath:
path: /usr/share/zoneinfo/Asia/Shanghai
volumeClaimTemplates:
- metadata:
name: minio-data
spec:
storageClassName: "nfs-storage"
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 2Ti
EOF
kubectl apply -f ~/minio-yml/minio-StatefulSet.yml
cat > ~/minio-yml/minio-Service.yml << 'EOF'
apiVersion: v1
kind: Service
metadata:
name: minio-headless
namespace: minio
labels:
app: minio
spec:
clusterIP: None
ports:
- port: 9000
name: data
- port: 5000
name: console
selector:
app: minio
---
apiVersion: v1
kind: Service
metadata:
name: minio-service
namespace: minio
spec:
type: NodePort
  ports:
  - name: data
    nodePort: 31900
    port: 9000
    targetPort: 9000
    protocol: TCP
  - name: console
    nodePort: 31901
    port: 5000
    targetPort: 5000
    protocol: TCP
selector:
app: minio
EOF
kubectl apply -f ~/minio-yml/minio-Service.yml
cat > ~/minio-yml/minio-console-Ingress.yml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: minio-console-ingress
namespace: minio
annotations:
nginx.ingress.kubernetes.io/ssl-redirect: 'true'
nginx.ingress.kubernetes.io/proxy-body-size: '4G'
spec:
ingressClassName: nginx
rules:
- host: minio-console.huanghuanhui.cloud
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: minio-service
port:
number: 5000
tls:
- hosts:
    - minio-console.huanghuanhui.cloud
secretName: minio-ingress-tls
EOF
cat > ~/minio-yml/minio-Ingress.yml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: minio-ingress
namespace: minio
annotations:
nginx.ingress.kubernetes.io/ssl-redirect: 'true'
nginx.ingress.kubernetes.io/proxy-body-size: '4G'
spec:
ingressClassName: nginx
rules:
- host: minio.huanghuanhui.cloud
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: minio-service
port:
number: 9000
- host: webstatic.huanghuanhui.cloud
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: minio-service
port:
number: 9000
- host: uploadstatic.huanghuanhui.cloud
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: minio-service
port:
number: 9000
tls:
- hosts:
- minio.huanghuanhui.cloud
- webstatic.huanghuanhui.cloud
- uploadstatic.huanghuanhui.cloud
secretName: minio-ingress-tls
EOF
kubectl create secret -n minio \
tls minio-ingress-tls \
--key=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud.key \
--cert=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud_bundle.crt
kubectl apply -f ~/minio-yml/minio-console-Ingress.yml
kubectl apply -f ~/minio-yml/minio-Ingress.yml
Console access URL: minio-console.huanghuanhui.cloud
Username/password: admin, Admin@2024
Data access URLs: minio.huanghuanhui.cloud, webstatic.huanghuanhui.cloud, uploadstatic.huanghuanhui.cloud
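Buckets can be bootstrapped with the mc client; a sketch (alias and bucket names are illustrative; anonymous download matches the webstatic use case):
docker run --rm --entrypoint /bin/sh minio/mc -c "mc alias set myminio http://192.168.1.201:31900 admin Admin@2024 && mc mb myminio/webstatic && mc anonymous set download myminio/webstatic"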
4.7、xxl-job-2.4.0
mkdir -p ~/xxl-job-yml
kubectl create ns xxl-job
cd ~/xxl-job-yml && wget https://github.com/xuxueli/xxl-job/raw/2.4.0/doc/db/tables_xxl_job.sql
mysql -h 192.168.1.201 -P 3306 -uroot -pAdmin@2024 < tables_xxl_job.sql
cat > ~/xxl-job-yml/xxl-job-admin-Deployment.yml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: xxl-job-admin
namespace: xxl-job
spec:
replicas: 3
selector:
matchLabels:
app: xxl-job-admin
template:
metadata:
labels:
app: xxl-job-admin
spec:
containers:
- name: xxl-job-admin
image: xuxueli/xxl-job-admin:2.4.0
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8080
volumeMounts:
- mountPath: /etc/localtime
name: localtime
env:
- name: PARAMS
value: "--spring.datasource.url=jdbc:mysql://192.168.1.201:3306/xxl_job?Unicode=true&characterEncoding=UTF-8&useSSL=false --spring.datasource.username=root --spring.datasource.password=Admin@2024"
volumes:
- name: localtime
hostPath:
path: /etc/localtime
EOF
kubectl apply -f ~/xxl-job-yml/xxl-job-admin-Deployment.yml
cat > ~/xxl-job-yml/xxl-job-admin-Service.yml << 'EOF'
apiVersion: v1
kind: Service
metadata:
name: xxl-job-admin-service
namespace: xxl-job
labels:
app: xxl-job-admin
spec:
type: NodePort
ports:
- port: 8080
protocol: TCP
name: http
nodePort: 30008
selector:
app: xxl-job-admin
EOF
kubectl apply -f ~/xxl-job-yml/xxl-job-admin-Service.yml
cat > ~/xxl-job-yml/xxl-job-admin-Ingress.yml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: xxl-job-admin-ingress
namespace: xxl-job
annotations:
nginx.ingress.kubernetes.io/ssl-redirect: 'true'
nginx.ingress.kubernetes.io/proxy-body-size: '4G'
spec:
ingressClassName: nginx
rules:
- host: www.huanghuanhui.cloud
http:
paths:
- path: /xxl-job-admin
pathType: Prefix
backend:
service:
name: xxl-job-admin-service
port:
number: 8080
tls:
- hosts:
- www.huanghuanhui.cloud
secretName: xxl-job-admin-ingress-tls
EOF
kubectl create secret -n xxl-job \
tls xxl-job-admin-ingress-tls \
--key=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud.key \
--cert=/root/ssl/huanghuanhui.cloud_nginx/huanghuanhui.cloud_bundle.crt
kubectl apply -f ~/xxl-job-yml/xxl-job-admin-Ingress.yml
Web access URL: www.huanghuanhui.cloud/xxl-job-admin
Default username/password: admin, 123456
Username/password: admin, Admin@2024
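For executors to register against this admin, the Spring side needs the usual xxl-job properties; a sketch (appname and port are illustrative; default_token is xxl-job 2.4.0's default accessToken):
xxl.job.admin.addresses=https://www.huanghuanhui.cloud/xxl-job-admin
xxl.job.accessToken=default_token
xxl.job.executor.appname=demo-executor
xxl.job.executor.port=9999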