Deployment YAML Fields Explained

Author: 运维开发小白丶

Article link: https://blog.csdn.net/liulunan_lln/article/details/133885997

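When a Deployment is created from a YAML manifest, Kubernetes creates the resources in this order:

1. The user creates the Deployment with kubectl.
2. The Deployment creates a ReplicaSet.
3. The ReplicaSet creates the Pods.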
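This chain is recorded in each object's metadata.ownerReferences: a Pod points to its ReplicaSet, and the ReplicaSet points to its Deployment. The minimal Python sketch below walks that chain over plain dictionaries. The in-memory object store and the Pod name suffix abcde are hypothetical; the ReplicaSet name alert-webui-644c99fd98 is taken from the status message of the full sample below.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (kind, name)

# Hypothetical in-memory copies of three objects, trimmed to metadata.ownerReferences.
OBJECTS: Dict[Key, dict] = {
    ("Deployment", "alert-webui"): {"metadata": {}},
    ("ReplicaSet", "alert-webui-644c99fd98"): {"metadata": {"ownerReferences": [
        {"kind": "Deployment", "name": "alert-webui", "controller": True}]}},
    ("Pod", "alert-webui-644c99fd98-abcde"): {"metadata": {"ownerReferences": [
        {"kind": "ReplicaSet", "name": "alert-webui-644c99fd98", "controller": True}]}},
}


def controller_of(obj: dict) -> Optional[Key]:
    """Return the (kind, name) of the owner marked controller=True, or None."""
    for ref in obj["metadata"].get("ownerReferences", []):
        if ref.get("controller"):
            return (ref["kind"], ref["name"])
    return None


def owner_chain(kind: str, name: str) -> List[Key]:
    """Follow controller ownerReferences upward until an object has no controller."""
    chain = [(kind, name)]
    parent = controller_of(OBJECTS[(kind, name)])
    while parent is not None:
        chain.append(parent)
        parent = controller_of(OBJECTS[parent])
    return chain


print(owner_chain("Pod", "alert-webui-644c99fd98-abcde"))
```

Deleting the Deployment lets the garbage collector remove its ReplicaSets and Pods through these same references.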

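A Deployment has five top-level fields:

apiVersion: the API version of the resource
kind: the resource type
metadata: the resource's metadata
spec: the specification of the resource, i.e. its desired state
status: the actual state of the resource (written by the system, not by the user)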

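Full sample (it also contains fields populated by the API server and controllers, such as uid, resourceVersion, managedFields and status):

kind: Deployment # resource type
apiVersion: apps/v1 # API version; must be one of the versions listed by kubectl api-versions
metadata: # metadata of the resource
  annotations: # annotations: free-form, non-identifying key/value metadata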
    deployment.kubernetes.io/revision: '5'
  resourceVersion: '222060129'
  name: alert-webui # resource name; must be unique within the namespace
  uid: 7132d0b2-7519-4c3c-8ad6-7b4d06b758af
  creationTimestamp: '2022-10-21T08:23:10Z'
  generation: 7
  managedFields:
    - manager: kubectl-create
      operation: Update
      apiVersion: apps/v1
      time: '2022-10-21T08:23:10Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:labels':
            .: {}
            'f:app': {}
            'f:application': {}
            'f:createdBy': {}
        'f:spec':
          'f:progressDeadlineSeconds': {}
          'f:revisionHistoryLimit': {}
          'f:selector': {}
          'f:strategy':
            'f:rollingUpdate':
              .: {}
              'f:maxSurge': {}
              'f:maxUnavailable': {}
            'f:type': {}
          'f:template':
            'f:metadata':
              'f:labels':
                .: {}
                'f:app': {}
            'f:spec':
              'f:volumes':
                .: {}
                'k:{"name":"volume-gzsy1"}':
                  .: {}
                  'f:configMap':
                    .: {}
                    'f:defaultMode': {}
                    'f:items': {}
                    'f:name': {}
                  'f:name': {}
              'f:containers':
                'k:{"name":"alert-webui"}':
                  .: {}
                  'f:imagePullPolicy': {}
                  'f:name': {}
                  'f:resources':
                    .: {}
                    'f:limits':
                      .: {}
                      'f:cpu': {}
                      'f:memory': {}
                    'f:requests':
                      .: {}
                      'f:cpu': {}
                      'f:memory': {}
                  'f:securityContext':
                    .: {}
                    'f:privileged': {}
                  'f:terminationMessagePath': {}
                  'f:terminationMessagePolicy': {}
                  'f:volumeMounts':
                    .: {}
                    'k:{"mountPath":"/etc/nginx/conf.d/"}':
                      .: {}
                      'f:mountPath': {}
                      'f:name': {}
              'f:dnsPolicy': {}
              'f:serviceAccount': {}
              'f:restartPolicy': {}
              'f:schedulerName': {}
              'f:terminationGracePeriodSeconds': {}
              'f:imagePullSecrets':
                .: {}
                'k:{"name":"amcrobot"}': {}
              'f:serviceAccountName': {}
              'f:securityContext': {}
    - manager: Mozilla
      operation: Update
      apiVersion: apps/v1
      time: '2023-01-07T01:57:54Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:spec':
          'f:template':
            'f:spec':
              'f:containers':
                'k:{"name":"alert-webui"}':
                  'f:image': {}
    - manager: kube-controller-manager
      operation: Update
      apiVersion: apps/v1
      time: '2023-04-12T08:01:00Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:deployment.kubernetes.io/revision': {}
        'f:status':
          'f:availableReplicas': {}
          'f:conditions':
            .: {}
            'k:{"type":"Available"}':
              .: {}
              'f:lastTransitionTime': {}
              'f:lastUpdateTime': {}
              'f:message': {}
              'f:reason': {}
              'f:status': {}
              'f:type': {}
            'k:{"type":"Progressing"}':
              .: {}
              'f:lastTransitionTime': {}
              'f:lastUpdateTime': {}
              'f:message': {}
              'f:reason': {}
              'f:status': {}
              'f:type': {}
          'f:observedGeneration': {}
          'f:readyReplicas': {}
          'f:replicas': {}
          'f:updatedReplicas': {}
      subresource: status
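  namespace: amc # namespace the Deployment is created in
  labels: # labels that identify this resource
    app: new-amc
    application: alert-webui
    createdBy: xxx
spec: # specification / desired state of the resource
  replicas: 1 # desired number of pods (default 1)
  selector: # label selector for the pods this Deployment manages
    matchLabels: # pods must carry all of these labels
      app: alert-webui
  template: # required: the template for the pods this Deployment manages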
    metadata:
      creationTimestamp: null
      labels:
        app: alert-webui
    spec:
      restartPolicy: Always # container restart policy; a Deployment only accepts Always (see 2.6.6)
      serviceAccountName: privilege-user
      imagePullSecrets:
        - name: amcrobot
      schedulerName: default-scheduler
      terminationGracePeriodSeconds: 30
      securityContext: {}
      containers:
        - name: alert-webui
          image: 'xxx'
          readinessProbe: # readiness check; a per-container field (see 2.6.3)
            httpGet:
              httpHeaders:
              - name: Authorization
                value: Bearer xxxxxxx # token
              path: /health           # request path
              port: 8888              # request port
              scheme: HTTP            # request scheme
            initialDelaySeconds: 30   # seconds after the container starts before the first probe
            periodSeconds: 30         # probe interval in seconds (default 10)
            successThreshold: 1       # consecutive successes needed after a failure (default 1)
            failureThreshold: 2       # consecutive failures before the probe counts as failed (default 3)
            timeoutSeconds: 3         # probe timeout in seconds (default 1)
          resources:
            limits:
              cpu: '2'
              memory: 512Mi
            requests:
              cpu: 100m
              memory: 100Mi
          volumeMounts:
            - name: volume-gzsy1
              mountPath: /etc/nginx/conf.d/
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: Always
          securityContext:
            privileged: true
      serviceAccount: privilege-user
      volumes:
        - name: volume-gzsy1
          configMap:
            name: alert-webui
            items:
              - key: default.conf
                path: default.conf
            defaultMode: 420
      dnsPolicy: ClusterFirst
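  strategy: # how new pods replace old ones: RollingUpdate or Recreate
    type: RollingUpdate # RollingUpdate: replace pods gradually; Recreate: kill all existing pods before creating new ones
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10 # number of old ReplicaSets kept for rollback
  progressDeadlineSeconds: 600 # seconds before a stalled rollout is reported as failed
status: # actual state of the resource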
  observedGeneration: 7
  replicas: 1
  updatedReplicas: 1
  readyReplicas: 1
  availableReplicas: 1
  conditions:
    - type: Available
      status: 'True'
      lastUpdateTime: '2023-03-15T16:11:41Z'
      lastTransitionTime: '2023-03-15T16:11:41Z'
      reason: MinimumReplicasAvailable
      message: Deployment has minimum availability.
    - type: Progressing
      status: 'True'
      lastUpdateTime: '2023-04-12T08:01:00Z'
      lastTransitionTime: '2022-10-21T08:23:10Z'
      reason: NewReplicaSetAvailable
      message: ReplicaSet "alert-webui-644c99fd98" has successfully progressed.

1. metadata

Sample metadata:

metadata: # metadata of the resource
  annotations: # annotations: free-form, non-identifying key/value metadata
    deployment.kubernetes.io/revision: '5'
  resourceVersion: '222060129'
  name: alert-webui # resource name; must be unique within the namespace
  uid: 7132d0b2-7519-4c3c-8ad6-7b4d06b758af
  creationTimestamp: '2022-10-21T08:23:10Z'
  generation: 7
  # managedFields omitted here: identical to the managedFields block in the full sample above (field-ownership records maintained by the API server)
  namespace: amc # namespace the Deployment is created in
  labels: # labels that identify this resource
    app: new-amc
    application: alert-webui
    createdBy: xxx

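In an online-serving scenario, several versions of a model service often run at the same time. Each version has its own Deployment, and all versions share a single Service. The Service selects pods by the shared app label in the pod templates, so it spans every version, while the app + version label pair distinguishes the individual Deployments behind that Service.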
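The label matching described above can be sketched as an equality check: a selector matches a pod when every selector key is present on the pod with the same value. The Python sketch below is illustrative only; the pod names and the model/v1/v2 label values are hypothetical, and only matchLabels-style equality is modeled (no matchExpressions).

```python
from typing import Dict, List

# Hypothetical pod labels: two Deployments (v1 and v2) of one model service, plus an unrelated pod.
PODS: Dict[str, Dict[str, str]] = {
    "model-v1-7d9f-aaaaa": {"app": "model", "version": "v1"},
    "model-v2-5c4b-bbbbb": {"app": "model", "version": "v2"},
    "other-6f8e-ccccc": {"app": "other"},
}


def matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """Equality-based selection: every selector key must be present with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(selector: Dict[str, str]) -> List[str]:
    """Names of all pods matched by the selector, sorted for stable output."""
    return sorted(name for name, labels in PODS.items() if matches(selector, labels))


print(select({"app": "model"}))                   # Service selector: both versions
print(select({"app": "model", "version": "v2"}))  # one Deployment's selector: only v2
```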

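2. spec

Sample spec:

spec: # specification / desired state of the resource
  replicas: 1 # desired number of pods (default 1)
  selector: # label selector for the pods this Deployment manages
    matchLabels: # pods must carry all of these labels
      app: alert-webui
  template: # required: the template for the pods this Deployment manages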
    metadata:
      creationTimestamp: null
      labels:
        app: alert-webui
    spec:
      restartPolicy: Always # container restart policy; a Deployment only accepts Always (see 2.6.6)
      nodeSelector: # pod scheduling constraint, see 2.6.5
        node: worker  # schedule only onto nodes labeled node=worker
      serviceAccountName: privilege-user
      imagePullSecrets:
        - name: amcrobot
      schedulerName: default-scheduler
      terminationGracePeriodSeconds: 30 # grace period before the container is force-killed, see 2.6.7
      securityContext: {}
      containers:
        - name: alert-webui
          image: 'xxx'
          resources:
            limits:
              cpu: '2'
              memory: 512Mi
            requests:
              cpu: 100m
              memory: 100Mi
          volumeMounts:
            - name: volume-gzsy1
              mountPath: /etc/nginx/conf.d/
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: Always
          securityContext:
            privileged: true
      serviceAccount: privilege-user
      volumes:
        - name: volume-gzsy1
          configMap:
            name: alert-webui
            items:
              - key: default.conf
                path: default.conf
            defaultMode: 420
      dnsPolicy: ClusterFirst
  strategy: # how new pods replace old ones: RollingUpdate or Recreate
    type: RollingUpdate # RollingUpdate: replace pods gradually, see 2.5
    rollingUpdate:
      maxUnavailable: 25% # see 2.5
      maxSurge: 25% # see 2.5
  revisionHistoryLimit: 10 # number of old ReplicaSets kept for rollback, see 2.3
  progressDeadlineSeconds: 600 # see 2.1

2.1 progressDeadlineSeconds

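Optional. The number of seconds the Deployment controller waits for a rollout to make progress before it reports the rollout as stalled. The stall shows up in the Deployment status as a Progressing condition with status False and reason ProgressDeadlineExceeded. Unit: seconds; default: 600.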
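A simplified Python sketch of the deadline check, assuming the controller compares the Progressing condition's lastUpdateTime plus progressDeadlineSeconds against the current time. The function names are hypothetical, and the in-progress condition (reason ReplicaSetUpdated) is an illustrative value rather than part of the sample.

```python
from datetime import datetime, timedelta


def parse_time(ts: str) -> datetime:
    """Parse an RFC 3339 timestamp such as '2023-04-12T08:01:00Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def deadline_exceeded(progressing: dict, progress_deadline_seconds: int, now: datetime) -> bool:
    """Simplified check on the Deployment's Progressing condition."""
    if progressing["reason"] == "NewReplicaSetAvailable":
        return False  # the rollout already completed
    if progressing["reason"] == "ProgressDeadlineExceeded":
        return True   # already reported as stalled
    last_update = parse_time(progressing["lastUpdateTime"])
    return last_update + timedelta(seconds=progress_deadline_seconds) < now


# Hypothetical in-progress condition whose last update was at 08:01:00.
cond = {"type": "Progressing", "reason": "ReplicaSetUpdated",
        "lastUpdateTime": "2023-04-12T08:01:00Z"}
start = parse_time(cond["lastUpdateTime"])
print(deadline_exceeded(cond, 600, start + timedelta(seconds=599)))  # False
print(deadline_exceeded(cond, 600, start + timedelta(seconds=601)))  # True
```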

2.2 replicas

Optional. The desired number of pods. Default: 1.

2.3 revisionHistoryLimit

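Optional. The number of old ReplicaSets to keep so that the Deployment can be rolled back to an earlier revision; older ones are garbage-collected in the background. Default: 10.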
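A minimal Python sketch of the cleanup rule, assuming the controller sorts the old ReplicaSets by revision, treats the oldest ones beyond revisionHistoryLimit as candidates, and only deletes candidates that are already scaled down to zero. The ReplicaSet names and the dictionary layout are hypothetical.

```python
from typing import List


def replicasets_to_delete(old_replicasets: List[dict], revision_history_limit: int) -> List[str]:
    """Pick the oldest surplus ReplicaSets, skipping any that still run pods."""
    surplus = len(old_replicasets) - revision_history_limit
    if surplus <= 0:
        return []
    oldest_first = sorted(old_replicasets, key=lambda rs: rs["revision"])
    return [rs["name"] for rs in oldest_first[:surplus] if rs["replicas"] == 0]


# Hypothetical old ReplicaSets (the current one is excluded), all scaled down to 0.
old = [{"name": f"alert-webui-rs{r}", "revision": r, "replicas": 0} for r in range(1, 5)]
print(replicasets_to_delete(old, 2))  # ['alert-webui-rs1', 'alert-webui-rs2']
```

Setting revisionHistoryLimit to 0 removes all old ReplicaSets, which also removes the ability to roll back.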

2.4 selector

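Required in apps/v1. A label selector that defines which pods the Deployment manages. It must match the labels in spec.template.metadata.labels, and it cannot be changed after the Deployment is created.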
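The "selector must match the template labels" rule can be sketched as the check below. This is a simplified illustration that only looks at matchLabels (matchExpressions are ignored); the error messages are paraphrased rather than copied from the API server, and the function name is hypothetical.

```python
from typing import Dict


def validate_selector(deployment: dict) -> None:
    """Reject a Deployment whose matchLabels is missing or not satisfied by the template labels."""
    spec = deployment["spec"]
    match_labels: Dict[str, str] = spec.get("selector", {}).get("matchLabels", {})
    template_labels: Dict[str, str] = spec["template"]["metadata"].get("labels", {})
    if not match_labels:
        raise ValueError("spec.selector is required")
    for key, value in match_labels.items():
        if template_labels.get(key) != value:
            raise ValueError("selector does not match template labels")


# Trimmed copy of the sample's selector and template labels.
sample = {"spec": {"selector": {"matchLabels": {"app": "alert-webui"}},
                   "template": {"metadata": {"labels": {"app": "alert-webui"}}}}}
validate_selector(sample)  # passes: the selector and template labels agree
```

Note that the sample's Deployment-level label app: new-amc plays no part here; only the pod template labels matter.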

2.5 strategy

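Specifies how new pods replace old ones. Two types are supported, RollingUpdate (the default) and Recreate:

RollingUpdate
- Pods are updated gradually, a few at a time.
- maxUnavailable sets the maximum number of pods that may be unavailable during the update. It can be an absolute number or a percentage of the desired replicas; a percentage is rounded down. Default: 25%.
- maxSurge sets the maximum number of pods that may be created above the desired replica count. It can be an absolute number or a percentage; a percentage is rounded up. Default: 25%.
Recreate
- All existing pods are killed before any new pod is created.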
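The rounding rules above can be sketched in Python as follows. One extra rule is included: if both values resolve to 0 (possible after rounding maxUnavailable down), the Deployment controller treats maxUnavailable as 1 so the rollout can proceed. Function names are illustrative.

```python
from typing import Tuple, Union

IntOrPercent = Union[int, str]  # e.g. 1 or "25%"


def scaled(value: IntOrPercent, replicas: int, round_up: bool) -> int:
    """Resolve an absolute number or a percentage of replicas, rounding as requested."""
    if isinstance(value, int):
        return value
    product = int(value.rstrip("%")) * replicas
    return -(-product // 100) if round_up else product // 100  # integer ceil / floor


def resolve_fenceposts(max_surge: IntOrPercent, max_unavailable: IntOrPercent,
                       replicas: int) -> Tuple[int, int]:
    surge = scaled(max_surge, replicas, round_up=True)               # maxSurge rounds up
    unavailable = scaled(max_unavailable, replicas, round_up=False)  # maxUnavailable rounds down
    if surge == 0 and unavailable == 0:
        unavailable = 1  # both zero would block the rollout, so allow one unavailable pod
    return surge, unavailable


print(resolve_fenceposts("25%", "25%", 1))   # (1, 0)
print(resolve_fenceposts("25%", "25%", 10))  # (3, 2)
```

With the sample's 1 replica and 25%/25%, at most 2 pods exist at once and none may be unavailable, so the new pod must become available before the old one is removed.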

2.6 template

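Required. The template for the pods the Deployment manages. It is nested inside the Deployment and has exactly the same schema as a Pod, except that it has no apiVersion or kind field.

template: # required: the template for the pods this Deployment manages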
    metadata:
      creationTimestamp: null
      labels:
        app: alert-webui
    spec:
      restartPolicy: Always # container restart policy; a Deployment only accepts Always (see 2.6.6)
      nodeSelector: # pod scheduling constraint, see 2.6.5
        node: worker  # schedule only onto nodes labeled node=worker
      serviceAccountName: privilege-user
      imagePullSecrets:
        - name: amcrobot
      schedulerName: default-scheduler
      terminationGracePeriodSeconds: 30 # grace period before the container is force-killed, see 2.6.7
      securityContext: {}
      containers:
        - name: alert-webui
          image: 'xxx'
          readinessProbe: # readiness check; a per-container field (see 2.6.3)
            httpGet:
              httpHeaders:
              - name: Authorization
                value: Bearer xxxxxxx # token
              path: /health           # request path
              port: 8888              # request port
              scheme: HTTP            # request scheme
            initialDelaySeconds: 30   # seconds after the container starts before the first probe
            periodSeconds: 30         # probe interval in seconds (default 10)
            successThreshold: 1       # consecutive successes needed after a failure (default 1)
            failureThreshold: 2       # consecutive failures before the probe counts as failed (default 3)
            timeoutSeconds: 3         # probe timeout in seconds (default 1)
          resources:
            limits: # upper bound on resource usage
              cpu: '2' # CPU, in cores
              memory: 512Mi # memory; Mi/Gi are binary units, a bare number means bytes
            requests: # amount the scheduler reserves for the container
              cpu: 100m # CPU in millicores; 100m = 0.1 core
              memory: 100Mi # memory
          volumeMounts:
            - name: volume-gzsy1
              mountPath: /etc/nginx/conf.d/
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: Always
          securityContext:
            privileged: true
      serviceAccount: privilege-user # deprecated alias of serviceAccountName
      volumes:
        - name: volume-gzsy1
          configMap: # mount selected keys of the ConfigMap alert-webui as files
            name: alert-webui
            items:
              - key: default.conf
                path: default.conf
            defaultMode: 420
      dnsPolicy: ClusterFirst

2.6.1 Environment variables

spec.containers.env: 
    - name: VECLIB_MAXIMUM_THREADS
      value: "1"
    - name: MKL_NUM_THREADS
      value: "1"
    - name: NUMEXPR_NUM_THREADS
      value: "1"
    - name: OPENBLAS_NUM_THREADS
      value: "1"
    - name: OMP_NUM_THREADS
      value: "1"
    - name: NVIDIA_VISIBLE_DEVICES
      value: none
    - name: ConCurrencyFlag
      value: "false"
    - name: SERVER_PROCESS_NUM
      value: "1"

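VECLIB_MAXIMUM_THREADS, MKL_NUM_THREADS, NUMEXPR_NUM_THREADS, OPENBLAS_NUM_THREADS, OMP_NUM_THREADS: these five variables cap the number of threads used by the numerical libraries (vecLib, MKL, NumExpr, OpenBLAS, OpenMP); set them to the number of CPUs allocated to the pod.
NVIDIA_VISIBLE_DEVICES: GPU setting; when the pod has no GPU, add this variable and set it to none.
ConCurrencyFlag and SERVER_PROCESS_NUM: MPS-related environment variables.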
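A minimal Python sketch that derives the five thread variables from a pod's CPU limit. The helper names are hypothetical, and rounding fractional CPUs down with a minimum of 1 is an assumption of this sketch, not something the text prescribes. Values are strings because env values in a manifest are strings.

```python
from typing import Dict, List

THREAD_VARS = ["VECLIB_MAXIMUM_THREADS", "MKL_NUM_THREADS", "NUMEXPR_NUM_THREADS",
               "OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS"]


def cpu_cores(quantity: str) -> float:
    """Convert a CPU quantity such as '2' or '1500m' into cores."""
    return int(quantity[:-1]) / 1000 if quantity.endswith("m") else float(quantity)


def thread_env(cpu_limit: str) -> List[Dict[str, str]]:
    """Build env entries that set every thread pool to the pod's whole CPUs (at least 1)."""
    threads = max(1, int(cpu_cores(cpu_limit)))  # assumption: round down, never below 1
    return [{"name": name, "value": str(threads)} for name in THREAD_VARS]


print(thread_env("2")[0])  # {'name': 'VECLIB_MAXIMUM_THREADS', 'value': '2'}
```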

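2.6.2 Image pull policy (template.spec.containers[].imagePullPolicy)

Always: the kubelet checks the image registry every time it starts the container; if the image digest is already cached locally, the cached image is reused.
Never: only the local image is used; if it is not present, the container fails to start.
IfNotPresent: the local image is used if it exists; otherwise the image is pulled from the registry.
If imagePullPolicy is omitted, it defaults to Always when the image has the :latest tag or neither a tag nor a digest, and to IfNotPresent otherwise.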
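The three policies and the defaulting rule can be sketched in Python as follows. This is a simplified model with hypothetical function names; image-reference parsing is reduced to splitting off the digest and reading the tag from the last path component, so a registry port such as host:5000 is not mistaken for a tag.

```python
def default_pull_policy(image: str) -> str:
    """Default applied when imagePullPolicy is omitted."""
    name, _, digest = image.partition("@")
    last_component = name.rsplit("/", 1)[-1]  # a registry port like host:5000 is not a tag
    tag = last_component.partition(":")[2]
    if tag == "latest" or (not tag and not digest):
        return "Always"
    return "IfNotPresent"


def should_pull(policy: str, cached_locally: bool) -> bool:
    """Whether the kubelet contacts the registry before starting the container."""
    if policy == "Always":
        return True
    if policy == "Never":
        return False  # if the image is not cached, the container fails to start instead
    return not cached_locally  # IfNotPresent
```

For example, default_pull_policy("nginx") gives "Always", while default_pull_policy("nginx:1.25") gives "IfNotPresent".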

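2.6.3 Health checks (template.spec.containers[].readinessProbe)

livenessProbe: when the check fails, the kubelet restarts the container.
readinessProbe: when the check fails, the pod is removed from the Service endpoints and stops receiving traffic; the container is not restarted.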

readinessProbe: # health check
  httpGet:
    httpHeaders:
    - name: Authorization
      value: Bearer xxxxxxx # token
    path: /health           # request path
    port: 8888              # request port
    scheme: HTTP            # request scheme
  initialDelaySeconds: 30   # seconds after the container starts before the first probe
  periodSeconds: 30         # probe interval in seconds (default 10)
  successThreshold: 1       # consecutive successes needed after a failure (default 1)
  failureThreshold: 2       # consecutive failures before the probe counts as failed (default 3)
  timeoutSeconds: 3         # probe timeout in seconds (default 1)
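How successThreshold and failureThreshold combine can be sketched as a small state machine over a sequence of probe results. This simplified Python model ignores initialDelaySeconds and treats a timeout as an ordinary failure; the function name is hypothetical. With the sample's periodSeconds: 30 and failureThreshold: 2, a backend that starts failing is marked unready after roughly two probe periods (about 60 seconds).

```python
from typing import List


def readiness_over_time(results: List[bool], success_threshold: int = 1,
                        failure_threshold: int = 3) -> List[bool]:
    """Replay probe results and return the container's Ready state after each probe."""
    ready = False  # readiness starts as not ready until a probe succeeds
    successes = failures = 0
    states = []
    for ok in results:
        if ok:
            successes, failures = successes + 1, 0
            if successes >= success_threshold:
                ready = True
        else:
            successes, failures = 0, failures + 1
            if failures >= failure_threshold:
                ready = False
        states.append(ready)
    return states


# The sample uses successThreshold 1 and failureThreshold 2.
print(readiness_over_time([True, False, True, False, False, True], 1, 2))
# [True, True, True, True, False, True]
```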

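2.6.4 Resources

resources:
  limits: # upper bound: a container over its memory limit is OOM-killed, CPU above the limit is throttled
    cpu: '2' # CPU, in cores
    memory: 512Mi # memory; Mi/Gi are binary units (M/G are decimal), a bare number means bytes
  requests: # amount the scheduler reserves for the container
    cpu: 100m # CPU in millicores; 100m = 0.1 core
    memory: 100Mi # memory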
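The units above can be made concrete with a small Python converter. This sketch handles only integer mantissas with the common suffixes (no exponent forms such as 1e3); the function names are hypothetical.

```python
# Suffixes ordered so that binary ones ("Mi") are tried before decimal ones ("M").
SUFFIXES = [("Ki", 2**10), ("Mi", 2**20), ("Gi", 2**30), ("Ti", 2**40),
            ("k", 10**3), ("M", 10**6), ("G", 10**9), ("T", 10**12)]


def cpu_millicores(quantity: str) -> int:
    """'100m' -> 100, '2' -> 2000, '0.5' -> 500."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_bytes(quantity: str) -> int:
    """'512Mi' -> 536870912, '1G' -> 1000000000, '1024' -> 1024 (plain bytes)."""
    for suffix, factor in SUFFIXES:
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)
```

For the sample, the CPU request 100m is 5% of the 2-core limit, and 512Mi is 536,870,912 bytes, slightly more than 512M (512,000,000 bytes).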

2.6.5 Pod scheduling (nodeSelector)

spec.nodeSelector:
  node: worker  # the pod is scheduled only onto nodes that have the label node=worker

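2.6.6 Restart policy (template.spec.restartPolicy)

Always: the container is restarted however it terminates.
Never: the container is never restarted, however it terminates.
OnFailure: the container is restarted only if it exits with a non-zero exit code.
In a Deployment, Always is the only allowed value; Never and OnFailure apply to standalone Pods and to Jobs.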
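The three policies reduce to a decision on the exit code, sketched below in Python together with the Deployment-specific restriction. Function names and the error wording are illustrative.

```python
def should_restart(restart_policy: str, exit_code: int) -> bool:
    """Whether the kubelet restarts a container that exited with exit_code."""
    if restart_policy == "Always":
        return True
    if restart_policy == "OnFailure":
        return exit_code != 0
    if restart_policy == "Never":
        return False
    raise ValueError(f"unknown restartPolicy: {restart_policy}")


def check_deployment_restart_policy(restart_policy: str) -> None:
    """Deployments accept only Always; other values are rejected at creation time."""
    if restart_policy != "Always":
        raise ValueError(f'unsupported restartPolicy "{restart_policy}" for a Deployment')
```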

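2.6.7 Container termination (terminationGracePeriodSeconds)

terminationGracePeriodSeconds: 30

How a pod is replaced and the old one deleted during an update:

1. Kubernetes first starts the new pod.
2. When the new pod becomes Ready, it is added to the Service endpoints and starts receiving load-balanced traffic.
3. Kubernetes marks the old pod as Terminating and removes it from the endpoints, so new requests stop being routed to it. Endpoint removal runs in parallel with the next step, so a few requests may still arrive for a short time.
4. At the same time, Kubernetes sends SIGTERM to the old pod's containers and waits up to terminationGracePeriodSeconds (default 30 seconds).
5. If the containers are still running when the grace period expires, Kubernetes kills them with SIGKILL.

Therefore, set terminationGracePeriodSeconds to a value large enough for the program to handle SIGTERM, finish all in-flight requests and transactions, and then exit.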
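On the application side, graceful shutdown means catching SIGTERM, finishing the work already accepted, and exiting before the grace period ends. The Python sketch below is a minimal illustration under stated assumptions: a POSIX system, a hypothetical in-process queue standing in for incoming requests, and a timer thread simulating the SIGTERM that Kubernetes would send.

```python
import os
import queue
import signal
import threading
from typing import List

shutting_down = threading.Event()


def on_sigterm(signum, frame) -> None:
    """SIGTERM marks the start of the grace period: stop taking new work."""
    shutting_down.set()


signal.signal(signal.SIGTERM, on_sigterm)


def serve(work: "queue.Queue[str]") -> List[str]:
    """Handle requests until SIGTERM, then drain the ones already accepted and return."""
    handled = []
    while not shutting_down.is_set():
        try:
            handled.append(work.get(timeout=0.1))
        except queue.Empty:
            continue
    while True:  # finish in-flight work before exiting
        try:
            handled.append(work.get_nowait())
        except queue.Empty:
            return handled


inbox = queue.Queue()
for r in ("req-1", "req-2"):
    inbox.put(r)
# Simulate Kubernetes sending SIGTERM 0.3 s later (POSIX only).
threading.Timer(0.3, os.kill, args=(os.getpid(), signal.SIGTERM)).start()
result = serve(inbox)
print(result)
```

All of this must complete within terminationGracePeriodSeconds; anything still running afterwards is killed with SIGKILL, which cannot be caught.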

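3. status

Sample status:

status: # actual state of the resource
  observedGeneration: 7 # most recent metadata.generation observed by the Deployment controller
  replicas: 1 # total number of non-terminated pods targeted by this Deployment
  updatedReplicas: 1 # pods that match the current pod template
  readyReplicas: 1 # pods whose Ready condition is true
  availableReplicas: 1 # pods that have been ready for at least minReadySeconds; this damps status flapping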
  conditions:
    - type: Available
      status: 'True'
      lastUpdateTime: '2023-03-15T16:11:41Z'
      lastTransitionTime: '2023-03-15T16:11:41Z'
      reason: MinimumReplicasAvailable
      message: Deployment has minimum availability.
    - type: Progressing
      status: 'True'
      lastUpdateTime: '2023-04-12T08:01:00Z'
      lastTransitionTime: '2022-10-21T08:23:10Z'
      reason: NewReplicaSetAvailable
      message: ReplicaSet "alert-webui-644c99fd98" has successfully progressed.

status: the actual state of the Kubernetes object in the cluster, typically maintained by the resource's controller rather than by the user.
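The status fields are enough to tell whether a rollout has finished. The Python sketch below applies checks similar to those performed by kubectl rollout status, in a simplified form; the function name and message texts are illustrative, and the numbers in sample are copied from the sample's metadata.generation, spec.replicas and status.

```python
def rollout_state(deployment: dict) -> str:
    """Classify a Deployment's rollout from its metadata, spec and status."""
    status = deployment.get("status", {})
    if deployment["metadata"]["generation"] > status.get("observedGeneration", 0):
        return "waiting: spec change not yet observed by the controller"
    for cond in status.get("conditions", []):
        if cond["type"] == "Progressing" and cond.get("reason") == "ProgressDeadlineExceeded":
            return "failed: progress deadline exceeded"
    desired = deployment["spec"].get("replicas", 1)
    updated = status.get("updatedReplicas", 0)
    if updated < desired:
        return f"waiting: {updated} of {desired} new replicas updated"
    if status.get("replicas", 0) > updated:
        return "waiting: old replicas pending termination"
    if status.get("availableReplicas", 0) < updated:
        return "waiting: updated replicas not yet available"
    return "complete"


sample = {
    "metadata": {"generation": 7},
    "spec": {"replicas": 1},
    "status": {"observedGeneration": 7, "replicas": 1, "updatedReplicas": 1,
               "readyReplicas": 1, "availableReplicas": 1,
               "conditions": [{"type": "Progressing", "reason": "NewReplicaSetAvailable"}]},
}
print(rollout_state(sample))  # complete
```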