8. Service Scheduling in Kubernetes

This article covers Kubernetes health-check mechanisms, including process-level and application-level checks and the three probe types. It then discusses how the Scheduler works and its scheduling policies, such as affinity, anti-affinity, and taints/tolerations. Finally, it walks through the common deployment strategies, namely recreate, rolling, blue-green, and canary deployments, and their practical use and caveats.

Health Checks

In Kubernetes, health checking of the system and of applications is performed by the kubelet.

Health checks comprise the container-level livenessProbe and the service-level readinessProbe. The livenessProbe verifies that the application process is running; the readinessProbe verifies that the application is ready to serve traffic.

  • Process-level health checks:

The simplest health check is at the process level: verifying that the container process is alive. The monitoring granularity is a single container running in the Kubernetes cluster. The kubelet periodically asks the Docker daemon for the state of all Docker processes and restarts any container that is not running properly. Process-level health checks are currently enabled by default.

  • Application-level health checks:

In practice, process-level checks alone are far from sufficient. Sometimes, from Docker's point of view the container is still running, i.e. the entrypoint process (PID 1) is alive, while the application itself has already exited; the container can then no longer serve user traffic, which is unacceptable.

To address this, Kubernetes introduced the livenessProbe, executed inside the container, so that users can implement application-level health checks. The kubelet runs these checks on the user's behalf to make sure the application is working correctly; what counts as "correct" is defined by the user.

Kubernetes supports three kinds of health-check actions: Container Exec, HTTP Get, and TCP Socket. All three can be used for both livenessProbe and readinessProbe.

  • An example of the process-level health check's blind spot:
# vim web.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-demo
spec:
  selector:
    matchLabels:
      app: web-demo
  replicas: 1
  template:
    metadata:
      labels:
        app: web-demo
    spec:
      containers:
      - name: web-demo
        image: hub.lzxlinux.cn/kubernetes/web:latest
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: web-demo
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: web-demo
  type: ClusterIP
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: web-demo
spec:
  rules:
  - host: web.lzxlinux.cn
    http:
      paths:
      - path: /
        backend:
          serviceName: web-demo
          servicePort: 80
# kubectl apply -f web.yaml

# kubectl get pods -o wide

NAME                        READY   STATUS    RESTARTS   AGE    IP             NODE    NOMINATED NODE   READINESS GATES
web-demo-786c69fdf4-kxsmr   1/1     Running   0          4m8s   172.10.4.102   node1   <none>           <none>

As shown, the pod is running on node1. Add a local DNS entry to the hosts file on your Windows machine:

192.168.1.54 web.lzxlinux.cn

Then visit http://web.lzxlinux.cn/examples/index.html:

The page loads normally, confirming the service is up and the container is running. Enter the container and inspect the processes:

# kubectl exec -it web-demo-786c69fdf4-kxsmr bash

bash-4.4# ps aux

PID   USER     TIME   COMMAND
    1 root       0:00 /bin/sh -c sh /usr/local/tomcat/bin/startup.sh && tail -f /usr/local/tomcat/logs/catalina.out
   13 root       0:20 /usr/lib/jvm/java-1.7-openjdk/jre/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLo
   14 root       0:00 tail -f /usr/local/tomcat/logs/catalina.out
   70 root       0:00 bash
   81 root       0:00 ps aux
   
bash-4.4# netstat -lntp

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 127.0.0.1:8005          0.0.0.0:*               LISTEN      13/java
tcp        0      0 :::8009                 :::*                    LISTEN      13/java
tcp        0      0 :::8080                 :::*                    LISTEN      13/java

The program is running and the ports are listening. The entrypoint process has PID 1, while the application process has PID 13.

Kill the application process and see whether the container exits:

bash-4.4# kill 13

bash-4.4# ps aux

PID   USER     TIME   COMMAND
    1 root       0:00 /bin/sh -c sh /usr/local/tomcat/bin/startup.sh && tail -f /usr/local/tomcat/logs/catalina.out
   14 root       0:00 tail -f /usr/local/tomcat/logs/catalina.out
   70 root       0:00 bash
   90 root       0:00 ps aux

bash-4.4# !net

netstat -lntp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name

With the application process dead, the container keeps running, and checking the pod shows that it has not been restarted.

bash-4.4# kill 1

bash-4.4# ps aux

PID   USER     TIME   COMMAND
    1 root       0:00 /bin/sh -c sh /usr/local/tomcat/bin/startup.sh && tail -f /usr/local/tomcat/logs/catalina.out
   14 root       0:00 tail -f /usr/local/tomcat/logs/catalina.out
   70 root       0:00 bash
   92 root       0:00 ps aux

bash-4.4# kill -9 1

bash-4.4# ps aux

PID   USER     TIME   COMMAND
    1 root       0:00 /bin/sh -c sh /usr/local/tomcat/bin/startup.sh && tail -f /usr/local/tomcat/logs/catalina.out
   14 root       0:00 tail -f /usr/local/tomcat/logs/catalina.out
   70 root       0:00 bash
   93 root       0:00 ps aux

The container's entrypoint process cannot be killed, not even forcibly.

bash-4.4# kill 14
bash-4.4# command terminated with exit code 137

But killing the process the entrypoint is waiting on (PID 14) makes the container exit immediately. Check the pod:

# kubectl get pod -o wide

NAME                        READY   STATUS    RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
web-demo-786c69fdf4-kxsmr   1/1     Running   1          24m   172.10.4.102   node1   <none>           <none>

The pod's RESTARTS count is 1, so it has been restarted once. Enter the container again and inspect the processes:

# kubectl exec -it web-demo-786c69fdf4-kxsmr bash

bash-4.4# ps aux

PID   USER     TIME   COMMAND
    1 root       0:00 /bin/sh -c sh /usr/local/tomcat/bin/startup.sh && tail -f /usr/local/tomcat/logs/catalina.out
   13 root       0:19 /usr/lib/jvm/java-1.7-openjdk/jre/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLo
   14 root       0:00 tail -f /usr/local/tomcat/logs/catalina.out
   71 root       0:00 bash
   77 root       0:00 ps aux

bash-4.4# netstat -lntp

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 127.0.0.1:8005          0.0.0.0:*               LISTEN      13/java
tcp        0      0 :::8009                 :::*                    LISTEN      13/java
tcp        0      0 :::8080                 :::*                    LISTEN      13/java

All processes are running again.

Process-level health checking only cares about the container's entrypoint process: k8s restarts the pod only after the entrypoint (via the process it blocks on) dies. This makes process-level checks unreliable, and that unreliability must be avoided, especially in production.

  • Container Exec example:

A command line is run inside the container to verify that the application process is alive. If the check command succeeds (its exit status $? is 0), the application process is considered running; otherwise it is not. When the application process is not running, Kubernetes restarts the pod.

# vim web-exec.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-demo
spec:
  selector:
    matchLabels:
      app: web-demo
  replicas: 1
  template:
    metadata:
      labels:
        app: web-demo
    spec:
      containers:
      - name: web-demo
        image: hub.lzxlinux.cn/kubernetes/web:latest
        ports:
        - containerPort: 8080
        livenessProbe:              # liveness probe
          exec:                     # check command
            command:
            - /bin/bash
            - -c
            - ps -ef |grep java |grep -v grep
          initialDelaySeconds: 10   # wait this long after container start before the first check
          periodSeconds: 10         # interval between checks
          failureThreshold: 2       # consecutive failures tolerated before the probe fails
          successThreshold: 1       # consecutive successes required for the probe to pass
          timeoutSeconds: 5         # timeout for each command run
# kubectl apply -f web-exec.yaml

# kubectl get pod -o wide

NAME                        READY   STATUS        RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
web-demo-6dc98db94c-xt97m   1/1     Running       0          23s   172.10.4.103   node1   <none>           <none>
web-demo-786c69fdf4-kxsmr   1/1     Terminating   1          47m   172.10.4.102   node1   <none>           <none>

# kubectl describe pod web-demo-6dc98db94c-xt97m

Liveness:       exec [/bin/bash -c ps -ef |grep java |grep -v grep] delay=10s timeout=5s period=10s #success=1 #failure=2

# kubectl exec -it web-demo-6dc98db94c-xt97m bash

bash-4.4# /bin/bash -c ps -ef |grep java |grep -v grep

   13 root       0:19 /usr/lib/jvm/java-1.7-openjdk/jre/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dignore.endorsed.dirs= -classpath /usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/usr/local/tomcat -Dcatalina.home=/usr/local/tomcat -Djava.io.tmpdir=/usr/local/tomcat/temp org.apache.catalina.startup.Bootstrap start
   
bash-4.4# echo $?
0

bash-4.4# /bin/bash -c ps -ef |grep javaaaa |grep -v grep

bash-4.4# echo $?
1

bash-4.4# kill 13               # a moment after the kill, the container exits on its own
bash-4.4# command terminated with exit code 137

# kubectl get pod -o wide

NAME                        READY   STATUS    RESTARTS   AGE     IP             NODE    NOMINATED NODE   READINESS GATES
web-demo-6dc98db94c-xt97m   1/1     Running   1          8m49s   172.10.4.103   node1   <none>           <none>

# kubectl describe pod web-demo-6dc98db94c-xt97m

Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  10m                    default-scheduler  Successfully assigned default/web-demo-6dc98db94c-xt97m to node1
  Warning  Unhealthy  2m22s (x2 over 2m32s)  kubelet, node1     Liveness probe failed:              # the liveness probe failed
  Normal   Killing    2m22s                  kubelet, node1     Container web-demo failed liveness probe, will be restarted
  Normal   Pulling    112s (x2 over 10m)     kubelet, node1     Pulling image "hub.lzxlinux.cn/kubernetes/web:latest"
  Normal   Pulled     111s (x2 over 10m)     kubelet, node1     Successfully pulled image "hub.lzxlinux.cn/kubernetes/web:latest"
  Normal   Created    111s (x2 over 10m)     kubelet, node1     Created container web-demo
  Normal   Started    111s (x2 over 10m)     kubelet, node1     Started container web-demo

So the configured check command works: while the application process is running, the command returns 0 and the pod is judged healthy.

  • HTTP Get example:

A web page is fetched to verify that the application process is alive in the container. If the fetch succeeds, the application process is considered running; otherwise it is not, and Kubernetes restarts the pod.

# vim web-http.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-demo
spec:
  selector:
    matchLabels:
      app: web-demo
  replicas: 1
  template:
    metadata:
      labels:
        app: web-demo
    spec:
      containers:
      - name: web-demo
        image: hub.lzxlinux.cn/kubernetes/web:latest
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /examples/index.html              # URL path to probe
            port: 8080              # container port
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          failureThreshold: 2
          successThreshold: 1
          timeoutSeconds: 5
# kubectl apply -f web-http.yaml

# kubectl get pods -o wide

NAME                        READY   STATUS        RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
web-demo-68b658757-m25jg    1/1     Running       0          26s   172.10.2.194   node2   <none>           <none>
web-demo-6dc98db94c-xt97m   1/1     Terminating   1          23m   172.10.4.103   node1   <none>           <none>

# kubectl describe pod web-demo-68b658757-m25jg

Liveness:       http-get http://:8080/examples/index.html delay=10s timeout=5s period=10s #success=1 #failure=2

The kubelet calls the web application's HTTP endpoint inside the container; a returned HTTP status code between 200 and 399 means the container is healthy, anything else means it is not. Each HTTP health check is one request to the configured URL.
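The probe can be reproduced by hand from any cluster node (a sketch; it assumes the pod IP 172.10.2.194 shown above is still current). Any printed status code in the 200~399 range would pass the probe:

# curl -s -o /dev/null -w '%{http_code}\n' http://172.10.2.194:8080/examples/index.html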

  • TCP Socket example:

A Socket connection is opened to verify that the application process is alive in the container. If the connection succeeds, the application process is considered running; otherwise it is not, and Kubernetes restarts the pod.

# vim web-tcp.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-demo
spec:
  selector:
    matchLabels:
      app: web-demo
  replicas: 1
  template:
    metadata:
      labels:
        app: web-demo
    spec:
      containers:
      - name: web-demo
        image: hub.lzxlinux.cn/kubernetes/web:latest
        ports:
        - containerPort: 8080
        livenessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
          failureThreshold: 2
          successThreshold: 1
          timeoutSeconds: 5
# kubectl apply -f web-tcp.yaml

# kubectl get pods -o wide

NAME                        READY   STATUS        RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
web-demo-65dcb9f7db-7jnj2   1/1     Running       0          9s    172.10.5.236   node3   <none>           <none>
web-demo-68b658757-m25jg    1/1     Terminating   0          10m   172.10.2.194   node2   <none>           <none>

# kubectl describe pod web-demo-65dcb9f7db-7jnj2

Liveness:       tcp-socket :8080 delay=10s timeout=5s period=10s #success=1 #failure=2
  • Health-check summary:
Probe type      | How it checks                                           | Pass criterion
ExecAction      | runs a shell command inside the container               | command exits 0
TCPSocketAction | opens a TCP connection to the container's IP and port   | the port accepts the connection
HTTPGetAction   | HTTP GET against the container's IP, port, and path     | status code 200~399

Recommendations:

1. Configure both livenessProbe and readinessProbe health checks for all services.

2. The TCP port check (TCPSocketAction) only covers the cases where the port is closed or the process has stopped; as long as the port is open, the check passes even if the service itself is broken.

3. In general, use ExecAction to implement custom health-check logic, or use an HTTP GET check (HTTPGetAction).

4. Whichever probe type you use, set the readiness check times shorter than the liveness check times (equal is also acceptable).
   The goal is to take a failing service out of rotation first; if it does not recover on its own after a while, the restart policy restarts the container or recreates the pod on another machine (a sketch follows this list).
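A minimal sketch of recommendation 4, reusing the probe blocks from the deployment examples later in this section: the readiness probe starts earlier and runs more often, so a failing pod is pulled out of the Service endpoints before a restart is even considered.

        readinessProbe:             # removes the pod from Service endpoints on failure
          httpGet:
            path: /examples/index.html
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 10   # earlier than the liveness delay below
          periodSeconds: 5          # more frequent than the liveness check
        livenessProbe:              # restarts the container on repeated failure
          tcpSocket:
            port: 8080
          initialDelaySeconds: 20
          periodSeconds: 10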

The Scheduler

When the Scheduler learns of a newly created Pod replica through the API server's watch interface, it examines the list of Nodes that meet that Pod's requirements, runs the scheduling logic, and on success binds the Pod to the target node.

The Scheduler plays a bridging role in the system: upstream, it accepts newly created Pods and finds each one a place to land (a Node); downstream, once placement is done, the kubelet on the target Node takes over and manages the rest of the Pod's lifecycle.

Concretely, the Scheduler binds each pending Pod to a suitable Node in the cluster according to specific scheduling algorithms and policies, then passes the binding to the API server, which writes it to etcd. The whole process involves three objects: the list of Pods awaiting scheduling, the list of candidate Nodes, and the scheduling algorithms and policies.

  • Scheduling flow:
1. Predicate: the node list is traversed and the candidate nodes that meet the Pod's requirements are selected; Kubernetes ships many built-in predicate rules to choose from.

2. Priority: each candidate node is scored with the priority rules, and the highest-scoring node wins.

3. Select: if several nodes tie for the highest score, one of them is picked at random; the Pod is then bound to the selected node.
  • Scheduling mechanisms:
Node selectors: nodeSelector, nodeName (a nodeSelector sketch follows this list)

Node affinity: nodeAffinity

Pod affinity: podAffinity

Pod anti-affinity: podAntiAffinity

Taint-and-toleration scheduling: taint a node so that only specific pods may run on it
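The node selectors listed first are the simplest mechanism and are not demonstrated below, so here is a minimal nodeSelector sketch (assuming a node carries the disktype=ssd label, which the nodeAffinity example below applies to node2 and node3): the pod may only be scheduled onto nodes carrying that exact label.

    spec:
      nodeSelector:
        disktype: ssd             # hard requirement: the node must carry this exact label
      containers:
      - name: web-demo
        image: hub.lzxlinux.cn/kubernetes/web:latest
        ports:
        - containerPort: 8080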
  • nodeAffinity example:

nodeAffinity expresses affinity between a pod and nodes.

# vim web-node.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-demo
spec:
  selector:
    matchLabels:
      app: web-demo
  replicas: 1
  template:
    metadata:
      labels:
        app: web-demo
    spec:
      containers:
      - name: web-demo
        image: hub.lzxlinux.cn/kubernetes/web:latest
        ports:
        - containerPort: 8080
      affinity:
        nodeAffinity:               # node affinity
          requiredDuringSchedulingIgnoredDuringExecution:               # hard requirement
            nodeSelectorTerms:
            - matchExpressions:             # match expressions
              - key: beta.kubernetes.io/arch
                operator: In
                values:
                - amd64
          preferredDuringSchedulingIgnoredDuringExecution:              # soft preference
          - weight: 1               # weight
            preference:
              matchExpressions:             # match expressions
              - key: disktype
                operator: NotIn
                values:
                - ssd
# kubectl label nodes node2 disktype=ssd

# kubectl label nodes node3 disktype=ssd

# kubectl apply -f web-node.yaml

# kubectl get pods -o wide

NAME                        READY   STATUS        RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
web-demo-59966976c-bj45q    1/1     Running       0          14s   172.10.4.104   node1   <none>           <none>
web-demo-65dcb9f7db-7jnj2   1/1     Terminating   0          15m   172.10.5.236   node3   <none>           <none>

As shown: node2 and node3 both carry the label disktype=ssd, and the soft preference is NotIn ssd, so the new pod prefers node1, and that is where it runs.
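To confirm which nodes carry the label that the preference keys off, it can be listed as a column:

# kubectl get nodes -L disktype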

  • podAffinity example:

podAffinity expresses affinity between pods.

# vim web-pod.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-demo
spec:
  selector:
    matchLabels:
      app: web-demo
  replicas: 1
  template:
    metadata:
      labels:
        app: web-demo
    spec:
      containers:
      - name: web-demo
        image: hub.lzxlinux.cn/kubernetes/web:latest
        ports:
        - containerPort: 8080
      affinity:
        podAffinity:                # pod affinity
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-demo
            topologyKey: kubernetes.io/hostname             # topology scope: per node
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - web-demo-node
              topologyKey: kubernetes.io/hostname
# kubectl apply -f web-pod.yaml

# kubectl get pods -o wide

NAME                        READY   STATUS        RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
web-demo-59966976c-bj45q    1/1     Terminating   0          14m   172.10.4.104   node1   <none>           <none>
web-demo-758cccbcfc-4v8z2   1/1     Running       0          2s    172.10.4.105   node1   <none>           <none>

As shown: the pod is required to land on a node already running a pod labeled app=web-demo, and prefers a node running a pod labeled app=web-demo-node; since no pod with the latter label exists, the new pod runs on node1, where the existing web-demo pod is.

  • podAntiAffinity example:

podAntiAffinity expresses anti-affinity between pods.

# vim web-antipod.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-demo
spec:
  selector:
    matchLabels:
      app: web-demo
  replicas: 1
  template:
    metadata:
      labels:
        app: web-demo
    spec:
      containers:
      - name: web-demo
        image: hub.lzxlinux.cn/kubernetes/web:latest
        ports:
        - containerPort: 8080
      affinity:
        podAntiAffinity:                # pod anti-affinity
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-demo
            topologyKey: kubernetes.io/hostname
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - web-demo-node
              topologyKey: kubernetes.io/hostname
# kubectl apply -f web-antipod.yaml 

# kubectl get pods -o wide
NAME                        READY   STATUS        RESTARTS   AGE     IP             NODE    NOMINATED NODE   READINESS GATES
web-demo-5f9c6fb459-ctck4   1/1     Running       0          3s      172.10.2.195   node2   <none>           <none>
web-demo-758cccbcfc-4v8z2   1/1     Terminating   0          8m53s   172.10.4.105   node1   <none>           <none>

As shown: the pod must not land on a node running a pod labeled app=web-demo, so the new pod runs on a node picked at random between node2 and node3.

When a pod is made anti-affine to nodes running copies of itself and replicas is greater than 1, each node runs at most one copy of the pod (a sketch of this follows).
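A quick way to observe this behavior (a sketch): scale the anti-affine deployment and watch the pods spread out, at most one per node. With the required rule above, a 4th replica on a 3-node cluster would stay Pending.

# kubectl scale deployment web-demo --replicas=3

# kubectl get pods -o wide              # expect one pod on each of node1, node2, node3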

  • Taint and toleration example:

In all of the above it is the pod that chooses the node; with taint scheduling it is the node that chooses the pod. A taint is key-value attribute data defined on a node whose main purpose is to let the node reject pods that do not satisfy its rules. Taints and tolerations work in tandem and can be used to keep pods off unsuitable nodes: a node may carry one or more taints, and pods that do not tolerate those taints will not be accepted by that node.

Taint effects:

NoSchedule: affects scheduling only; existing pods on the node are untouched

PreferNoSchedule: the system tries to avoid placing pods that do not tolerate the taint on the node, but this is not guaranteed

NoExecute: affects both scheduling and existing pods; pods that do not tolerate the taint are evicted
# kubectl taint node node3 gpu=true:NoSchedule

# vim web-node.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-demo
spec:
  selector:
    matchLabels:
      app: web-demo
  replicas: 1
  template:
    metadata:
      labels:
        app: web-demo
    spec:
      containers:
      - name: web-demo
        image: hub.lzxlinux.cn/kubernetes/web:latest
        ports:
        - containerPort: 8080
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: beta.kubernetes.io/arch
                operator: In
                values:
                - amd64
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: disktype
                operator: In
                values:
                - ssd
# kubectl apply -f web-node.yaml

# kubectl get pod -o wide

NAME                        READY   STATUS        RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
web-demo-5f9c6fb459-ctck4   1/1     Terminating   0          18m   172.10.2.195   node2   <none>           <none>
web-demo-8df95f56d-bwv77    1/1     Running       0          10s   172.10.4.106   node1   <none>           <none>

As shown: by affinity the new pod should run on node3, but node3 carries a taint and the new pod has no matching toleration, so the pod cannot run on node3.

Next, configure a toleration for the taint:

# vim web-node.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-demo
spec:
  selector:
    matchLabels:
      app: web-demo
  replicas: 1
  template:
    metadata:
      labels:
        app: web-demo
    spec:
      containers:
      - name: web-demo
        image: hub.lzxlinux.cn/kubernetes/web:latest
        ports:
        - containerPort: 8080
      tolerations:
      - key: "gpu"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"                #需要与打污点时的effect一致
# kubectl apply -f web-node.yaml

# kubectl get pod -o wide

NAME                        READY   STATUS        RESTARTS   AGE    IP             NODE    NOMINATED NODE   READINESS GATES
web-demo-68b65c8579-hgn8l   1/1     Running       0          2s     172.10.5.237   node3   <none>           <none>
web-demo-8df95f56d-bwv77    1/1     Terminating   0          6m9s   172.10.4.106   node1   <none>           <none>

As shown, once the toleration is configured the new pod can run on node3.

Of course, a toleration merely tolerates the taint; when the pod's replicas is greater than 1, it does not mean all the pods will run on the tainted node.

Additionally, to remove the taint:

# kubectl taint node node3 gpu-
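To check which taints a node currently carries, before or after these changes:

# kubectl describe node node3 | grep -i taints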

Deployment Strategies

Common deployment strategies:

Recreate: version A goes offline, then version B comes online

Rolling (rolling update / incremental release): version B is rolled out slowly and replaces version A

Blue-green: version B is deployed alongside version A, then traffic is switched to version B

Canary: version B is released to a subset of users, then opened up fully
  • Recreate deployment example:

Recreate is the bluntest approach: take version A offline, then deploy version B. Downtime therefore depends on how long the application takes to shut down and start up, so the user impact is significant.

# vim web-recreate.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-demo
spec:
  strategy:
    type: Recreate              # recreate strategy
  selector:
    matchLabels:
      app: web-demo
  replicas: 3
  template:
    metadata:
      labels:
        app: web-demo
    spec:
      containers:
      - name: web-demo
        image: hub.lzxlinux.cn/kubernetes/web:latest
        ports:
        - containerPort: 8080
        livenessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 20
          periodSeconds: 10
          failureThreshold: 2
          successThreshold: 1
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /examples/index.html
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 2
          successThreshold: 1
          timeoutSeconds: 5
# kubectl apply -f web-recreate.yaml

# kubectl get pod -o wide

NAME                        READY   STATUS        RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
web-demo-68b65c8579-hgn8l   1/1     Terminating   0          19m   172.10.5.237   node3   <none>           <none>

# kubectl get pod -o wide

NAME                        READY   STATUS    RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
web-demo-5ff7d6c7d6-5ssll   0/1     Running   0          4s    172.10.2.196   node2   <none>           <none>
web-demo-5ff7d6c7d6-nv27k   0/1     Running   0          4s    172.10.4.107   node1   <none>           <none>
web-demo-5ff7d6c7d6-thddv   0/1     Running   0          4s    172.10.5.238   node3   <none>           <none>

As shown, the Recreate strategy stops the old pods first and only then starts the new ones.

  • Rolling deployment example:

A rolling deployment releases a new version of an application slowly, by replacing its instances one after another.

# vim web-rollingupdate.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-demo
spec:
  strategy:
    rollingUpdate:
      maxSurge: 25%             # max extra pods above the desired count (percentage or integer)
      maxUnavailable: 25%               # max pods that may be unavailable (percentage or integer)
    type: RollingUpdate             # rolling update strategy
  selector:
    matchLabels:
      app: web-demo
  replicas: 4
  template:
    metadata:
      labels:
        app: web-demo
    spec:
      containers:
      - name: web-demo
        image: hub.lzxlinux.cn/kubernetes/web:release
        ports:
        - containerPort: 8080
        livenessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 20
          periodSeconds: 10
          failureThreshold: 2
          successThreshold: 1
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /examples/index.html
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 2
          successThreshold: 1
          timeoutSeconds: 5

Rolling update is the default deployment strategy for a k8s Deployment, and the parameters above are the defaults; both parameters accept integers as well as percentages.

# kubectl apply -f web-rollingupdate.yaml

# kubectl get pod -o wide

NAME                        READY   STATUS              RESTARTS   AGE     IP             NODE    NOMINATED NODE   READINESS GATES
web-demo-5ff7d6c7d6-5ssll   1/1     Running             0          27m     172.10.2.196   node2   <none>           <none>
web-demo-5ff7d6c7d6-nv27k   1/1     Running             0          27m     172.10.4.107   node1   <none>           <none>
web-demo-5ff7d6c7d6-thddv   1/1     Running             0          27m     172.10.5.238   node3   <none>           <none>
web-demo-5ff7d6c7d6-xbdv6   1/1     Terminating         0          2m27s   172.10.4.108   node1   <none>           <none>
web-demo-745cf4796-66jxh    0/1     Running             0          4s      172.10.5.239   node3   <none>           <none>
web-demo-745cf4796-zg7pm    0/1     ContainerCreating   0          4s      <none>         node1   <none>           <none>

# kubectl get pod -o wide

NAME                       READY   STATUS    RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
web-demo-745cf4796-66jxh   1/1     Running   0          77s   172.10.5.239   node3   <none>           <none>
web-demo-745cf4796-j2lkd   1/1     Running   0          63s   172.10.2.197   node2   <none>           <none>
web-demo-745cf4796-qw7xx   1/1     Running   0          63s   172.10.4.110   node1   <none>           <none>
web-demo-745cf4796-zg7pm   1/1     Running   0          77s   172.10.4.109   node1   <none>           <none>

A new pod is created first, then an old pod is stopped; the process repeats until every pod has been replaced (an integer-valued, zero-downtime variant is sketched below).
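Since the parameters also accept integers, a common zero-downtime variant is the following sketch:

  strategy:
    rollingUpdate:
      maxSurge: 1               # at most 1 pod above the desired replica count
      maxUnavailable: 0         # never stop an old pod before its replacement is Ready
    type: RollingUpdate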

If the new version has problems, roll it back:

# kubectl rollout undo deployment web-demo 
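Related rollout commands (revision numbers depend on your deployment's history):

# kubectl rollout status deployment web-demo              # wait for the rollout to finish

# kubectl rollout history deployment web-demo             # list recorded revisions

# kubectl rollout undo deployment web-demo --to-revision=1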
  • Blue-green deployment:

Unlike rolling deployment, blue-green deploys version B (green) in equal numbers side by side with version A (blue). Once the new version passes the tests required for release, traffic is switched from version A to version B at the load-balancer layer.

# vim web-deploy-blue.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-blue
spec:
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate             # rolling update strategy
  selector:
    matchLabels:
      app: web-blue
  replicas: 4
  template:
    metadata:
      labels:
        app: web-blue
        version: v1.0               # version label
    spec:
      containers:
      - name: web-blue
        image: hub.lzxlinux.cn/kubernetes/springboot-web:latest
        ports:
        - containerPort: 8080
        livenessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 20
          periodSeconds: 10
          failureThreshold: 2
          successThreshold: 1
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /hello?name=linux
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 2
          successThreshold: 1
          timeoutSeconds: 5
# vim web-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: web-blue
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: web-blue
    version: v1.0               # select pods with this version label
  type: ClusterIP
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: web-blue
spec:
  rules:
  - host: web.lzxlinux.cn
    http:
      paths:
      - path: /
        backend:
          serviceName: web-blue
          servicePort: 80
# kubectl apply -f web-deploy-blue.yaml

# kubectl apply -f web-svc.yaml

# kubectl get pods -o wide

NAME                        READY   STATUS        RESTARTS   AGE    IP             NODE    NOMINATED NODE   READINESS GATES
web-blue-6b9bd7468c-2c8mh   1/1     Running       0          23s    172.10.2.203   node2   <none>           <none>
web-blue-6b9bd7468c-7mpxx   1/1     Running       0          34s    172.10.5.245   node3   <none>           <none>
web-blue-6b9bd7468c-bwzpt   1/1     Running       0          35s    172.10.4.115   node1   <none>           <none>
web-blue-6b9bd7468c-qb52h   1/1     Running       0          18s    172.10.4.116   node1   <none>           <none>

# echo "192.168.1.54 web.lzxlinux.cn" >> /etc/hosts

# while sleep 2; do curl "http://web.lzxlinux.cn/hello?name=linux"; echo ""; done

Hi linux! Cicd for the springboot-web-demo project in k8s!
Hi linux! Cicd for the springboot-web-demo project in k8s!
Hi linux! Cicd for the springboot-web-demo project in k8s!
Hi linux! Cicd for the springboot-web-demo project in k8s!
Hi linux! Cicd for the springboot-web-demo project in k8s!
Hi linux! Cicd for the springboot-web-demo project in k8s!

The service responds correctly; now upgrade the version via blue-green deployment. Create a new deployment:

# vim web-deploy-green.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-blue-v2
spec:
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  selector:
    matchLabels:
      app: web-blue
  replicas: 4
  template:
    metadata:
      labels:
        app: web-blue
        version: v2.0               # version label changed
    spec:
      containers:
      - name: web-blue
        image: hub.lzxlinux.cn/kubernetes/web:latest                # image changed
        ports:
        - containerPort: 8080
        livenessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 20
          periodSeconds: 10
          failureThreshold: 2
          successThreshold: 1
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /k8s?name=linux
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 2
          successThreshold: 1
          timeoutSeconds: 5
# kubectl apply -f web-deploy-green.yaml

# kubectl get pods -o wide

NAME                          READY   STATUS    RESTARTS   AGE     IP             NODE    NOMINATED NODE   READINESS GATES
web-blue-6b9bd7468c-2c8mh     1/1     Running   0          2m20s   172.10.2.203   node2   <none>           <none>
web-blue-6b9bd7468c-7mpxx     1/1     Running   0          2m31s   172.10.5.245   node3   <none>           <none>
web-blue-6b9bd7468c-bwzpt     1/1     Running   0          2m32s   172.10.4.115   node1   <none>           <none>
web-blue-6b9bd7468c-qb52h     1/1     Running   0          2m15s   172.10.4.116   node1   <none>           <none>
web-blue-v2-c4df64998-g6t6v   1/1     Running   0          26s     172.10.4.117   node1   <none>           <none>
web-blue-v2-c4df64998-nstpm   1/1     Running   0          26s     172.10.2.204   node2   <none>           <none>
web-blue-v2-c4df64998-tbzmp   1/1     Running   0          26s     172.10.5.246   node3   <none>           <none>
web-blue-v2-c4df64998-zgmnm   1/1     Running   0          26s     172.10.2.205   node2   <none>           <none>

The deployment succeeded; now modify the service to switch traffic to the new version:

# vim web-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: web-blue
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: web-blue
    version: v2.0               # select pods with the new version label
  type: ClusterIP
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: web-blue
spec:
  rules:
  - host: web.lzxlinux.cn
    http:
      paths:
      - path: /
        backend:
          serviceName: web-blue
          servicePort: 80
# kubectl apply -f web-svc.yaml

# while sleep 2; do curl "http://web.lzxlinux.cn/k8s?name=linux"; echo ""; done

Hello linux! This is my dubbo service! This is the Web Service CI/CD!
Hello linux! This is my dubbo service! This is the Web Service CI/CD!
Hello linux! This is my dubbo service! This is the Web Service CI/CD!
Hello linux! This is my dubbo service! This is the Web Service CI/CD!
Hello linux! This is my dubbo service! This is the Web Service CI/CD!
Hello linux! This is my dubbo service! This is the Web Service CI/CD!
Hello linux! This is my dubbo service! This is the Web Service CI/CD!

As shown, the responses differ from before.

Blue-green deployment never takes the old version offline; the application's state is uniform and switched in one step. When the new version passes its release tests, simply switch traffic to it; if the new version misbehaves, switch traffic back to the old one (a one-line switch is sketched below).
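Instead of editing and re-applying the YAML, the selector can also be flipped in place with kubectl patch (a sketch using the labels from the manifests above):

# kubectl patch service web-blue -p '{"spec":{"selector":{"app":"web-blue","version":"v1.0"}}}'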

  • Canary deployment example:

A canary deployment gradually shifts production traffic from version A to version B, usually split by ratio: for example, 90% of requests go to version A and 10% to version B.

A small change to the blue-green setup above turns it into a canary deployment:

# vim web-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: web-blue
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: web-blue               # version label removed: selects pods of both old and new versions
  type: ClusterIP
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: web-blue
spec:
  rules:
  - host: web.lzxlinux.cn
    http:
      paths:
      - path: /
        backend:
          serviceName: web-blue
          servicePort: 80
# kubectl apply -f web-svc.yaml

# while sleep 2; do curl "http://web.lzxlinux.cn/k8s?name=linux"; echo ""; done

{"timestamp":"2019-11-29T06:32:58.421+0000","status":404,"error":"Not Found","message":"No message available","path":"/k8s"}
{"timestamp":"2019-11-29T06:33:00.540+0000","status":404,"error":"Not Found","message":"No message available","path":"/k8s"}
Hello linux! This is my dubbo service! This is the Web Service CI/CD!
Hello linux! This is my dubbo service! This is the Web Service CI/CD!
Hello linux! This is my dubbo service! This is the Web Service CI/CD!
Hello linux! This is my dubbo service! This is the Web Service CI/CD!
{"timestamp":"2019-11-29T06:33:10.697+0000","status":404,"error":"Not Found","message":"No message available","path":"/k8s"}
Hello linux! This is my dubbo service! This is the Web Service CI/CD!

# while sleep 2; do curl "http://web.lzxlinux.cn/hello?name=linux"; echo ""; done

Hi linux! Cicd for the springboot-web-demo project in k8s!
<!DOCTYPE html><html><head><title>Apache Tomcat/8.0.51 - Error report</title><style type="text/css">H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}.line {height: 1px; background-color: #525D76; border: none;}</style> </head><body><h1>HTTP Status 404 - </h1><div class="line"></div><p><b>type</b> Status report</p><p><b>message</b> <u></u></p><p><b>description</b> <u>The requested resource is not available.</u></p><hr class="line"><h3>Apache Tomcat/8.0.51</h3></body></html>

Because these are really two different projects, you have to switch URLs to test each version, which is not very intuitive, but the results match expectations.

With a canary deployment, the old and new versions can be accessed at the same time. Under round-robin scheduling, the ratio of old to new pod counts is the traffic split (see the sketch below).
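For example, to approximate the 90%/10% split mentioned earlier, scale the two deployments accordingly (a sketch):

# kubectl scale deployment web-blue --replicas=9

# kubectl scale deployment web-blue-v2 --replicas=1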

