Kubernetes Cluster: A Collection of Common Commands

Common commands for troubleshooting pods

Listing pods

kubectl get pod -o wide 

kubectl get pods --namespace kube-system

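Viewing pod container logs

  1. kubectl logs <pod name>: shows the logs of the pod's container.

Retrieving the events related to a pod

  2. kubectl describe pod <pod name>: shows the pod's details, including the list of events related to it.

  3. kubectl get pod <pod name> -o yaml: prints the pod's YAML definition as stored in Kubernetes.

Running interactive commands

  4. kubectl exec -it <pod name> -- bash: runs an interactive command in one of the pod's containers. The -- separates kubectl's own flags from the command to run; the form without it is deprecated.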
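
The first three commands are usually run together when a pod misbehaves. As a minimal sketch, they can be wrapped in one shell function; the name pod_triage, the default namespace, and the 50-line log limit are assumptions made for this example, not part of kubectl.

```shell
# pod_triage POD [NAMESPACE]
# Prints status, events, and recent logs for one pod.
# NAMESPACE falls back to "default" (an assumption of this sketch).
pod_triage() {
  echo "== status =="
  kubectl get pod "$1" -n "${2:-default}" -o wide
  echo "== details and events =="
  kubectl describe pod "$1" -n "${2:-default}"
  echo "== last 50 log lines =="
  kubectl logs "$1" -n "${2:-default}" --tail=50
}
```

Usage would look like `pod_triage cloud-bmp-7d688998f8-qprvw cos`.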

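Node maintenance: taking a worker node out of service

Three kubectl commands, cordon, drain, and delete, can each stop scheduling onto a node, so that pods created afterwards are no longer placed on it. The main difference between them is how disruptive they are.

Cordon

  • Temporarily isolates the node from the Kubernetes cluster
  • Has the least impact: it only marks the node as SchedulingDisabled, i.e. unschedulable
  • Pods created afterwards will not be scheduled onto this node
  • Pods already running on the node are unaffected and keep serving traffic
  • Command:
kubectl cordon [node name]
  • To resume scheduling:
kubectl uncordon [node name]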
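
To check from a script whether a node is currently cordoned, one option is to read the node's spec.unschedulable field, which is set to true while the node is cordoned and absent otherwise. The helper name node_is_cordoned below is hypothetical.

```shell
# node_is_cordoned NODE
# Exits 0 when the node's spec.unschedulable field is "true", non-zero otherwise.
node_is_cordoned() {
  state=$(kubectl get node "$1" -o jsonpath='{.spec.unschedulable}')
  [ "$state" = "true" ]
}
```

For example: `node_is_cordoned k8s-prod-node2 && echo "scheduling disabled"`.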

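Drain

Overview

  • Goal: mark the node unschedulable first, then evict (drain) the pods running on it
  • First, the node is marked as SchedulingDisabled, i.e. unschedulable (the same effect as cordon)
  • Then, the existing pods are evicted; pods managed by a controller are recreated on other nodes
  • Command:
kubectl drain [node name] --force --ignore-daemonsets --delete-emptydir-data

--force: continue even if some pods are not managed by a ReplicationController, ReplicaSet, Job, DaemonSet, or StatefulSet. Without --force, drain refuses to proceed when such unmanaged pods exist; with it, they are deleted as well, and since no controller owns them they are not recreated.

--ignore-daemonsets: skip pods managed by a DaemonSet. Without this flag, drain refuses to proceed when such pods exist. Deleting them would be pointless anyway, because the DaemonSet controller ignores the unschedulable mark and would recreate them right away.

--delete-emptydir-data (the older name --delete-local-data was deprecated in kubectl 1.20): continue even if pods use emptyDir volumes; that local data is deleted when the pods are evicted.

  • To resume scheduling:
kubectl uncordon [node name]
  • drain is comparatively safe: it waits for the containers in each pod to terminate gracefully before the pod is removed
  • In detail: a pod is first deleted on the current node and then recreated on another node. To keep the service uninterrupted during the drain, each evicted workload should therefore have more than one replica, with an anti-affinity rule spreading those replicas across different nodes.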
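
Before draining, it can help to see which pods will actually be affected. A small sketch, assuming a hypothetical helper name pods_on_node, uses the spec.nodeName field selector to list every pod bound to a node:

```shell
# pods_on_node NODE
# Lists the pods in all namespaces that are currently bound to NODE.
pods_on_node() {
  kubectl get pods --all-namespaces -o wide --field-selector "spec.nodeName=$1"
}
```

Workloads that show only a single pod in this list are the ones most likely to see an interruption during the drain.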

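Notes:

  1. Before performing a kernel upgrade, hardware maintenance, or similar work on a node, you can use kubectl drain to safely evict the pods on it
  2. By default, drain evicts pods through the eviction API, so it respects any PodDisruptionBudgets that apply to them as well as each pod's termination grace period; the container processes are stopped gracefully
  3. When kubectl drain returns successfully, the pods have been evicted
  4. After that, powering off the physical machine or deleting the virtual machine on the cloud platform does not affect the rest of the cluster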
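
Put together, a maintenance window might be scripted roughly as follows. This is only a sketch: the function names and the 300-second timeout are assumptions, and --delete-emptydir-data is the current spelling of the older --delete-local-data flag.

```shell
# drain_for_maintenance NODE
# Cordons the node and evicts its pods, giving up after 300 seconds.
drain_for_maintenance() {
  kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data --timeout=300s
}

# finish_maintenance NODE
# Makes the node schedulable again once the work is done.
finish_maintenance() {
  kubectl uncordon "$1"
}
```

Run `drain_for_maintenance [node name]`, do the upgrade or repair, then run `finish_maintenance [node name]`.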

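Ideally, workloads are configured so that replacing a pod never reduces capacity. The fragment below is a Deployment rolling-update strategy with maxUnavailable set to 0 and maxSurge set to 1. Note that these two fields belong to the Deployment, not to a PodDisruptionBudget; a PodDisruptionBudget is a separate resource configured with minAvailable or maxUnavailable.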

replicas: 2
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
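
The rolling-update fields above govern Deployment rollouts; what kubectl drain consults during eviction is a PodDisruptionBudget. As a minimal sketch, the snippet below writes such a manifest to a file. The names web-pdb, pdb-web.yaml, and the app=web label are hypothetical, and the policy/v1 API version assumes a reasonably recent cluster.

```shell
# Write a hypothetical PodDisruptionBudget that keeps at least one
# pod labelled app=web available during voluntary evictions.
cat > pdb-web.yaml <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: web
EOF
# Apply it with: kubectl apply -f pdb-web.yaml
```

With two replicas and minAvailable set to 1, a drain can evict only one of the pods at a time.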

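Delete

  • The pods on the node are removed, and those managed by a controller are recreated on other nodes
  • Sequence: the pods are first deleted on the current node, then created on other nodes
  • Once the Node object is deleted, the control plane no longer manages the node and it is removed from the cluster
  • delete is a very blunt way to remove a node: the pods are not terminated gracefully, so by comparison drain is the safer choice
  • Command:
kubectl delete node [node name]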
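
Because delete alone is abrupt, a gentler way to retire a node for good is to drain it first and delete it only if the drain succeeded. The helper name remove_node is an assumption of this sketch.

```shell
# remove_node NODE
# Drains the node, then deletes the Node object only when the drain succeeded.
remove_node() {
  kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data &&
    kubectl delete node "$1"
}
```

If the drain fails (for example because a PodDisruptionBudget blocks an eviction), the node stays registered and can be inspected before retrying.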

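Using the kubectl CLI

By default, both minikube and a regular Kubernetes installation create a ~/.kube/config file in the user's home directory; kubectl reads the cluster information from this file unless told otherwise.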
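
When several clusters are in play, a different file can be selected per command with kubectl's --kubeconfig flag (or for a whole shell session with the KUBECONFIG environment variable). The wrapper name kubectl_with_config below is hypothetical.

```shell
# kubectl_with_config FILE ARGS...
# Runs one kubectl command against the kubeconfig file FILE
# instead of the default ~/.kube/config.
kubectl_with_config() {
  cfg="$1"
  shift
  kubectl --kubeconfig "$cfg" "$@"
}
```

For example, `kubectl_with_config ~/.kube/staging.conf get nodes`, where the file name is only an illustration.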

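Listing resource types

The resource types the cluster supports, including those added by CRDs, can be listed with the following command: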

[root@master ~]# kubectl api-resources 
NAME                              SHORTNAMES         APIVERSION                             NAMESPACED   KIND
bindings                                             v1                                     true         Binding
componentstatuses                 cs                 v1                                     false        ComponentStatus
configmaps                        cm                 v1                                     true         ConfigMap
endpoints                         ep                 v1                                     true         Endpoints
events                            ev                 v1                                     true         Event
limitranges                       limits             v1                                     true         LimitRange
namespaces                        ns                 v1                                     false        Namespace
nodes                             no                 v1                                     false        Node
persistentvolumeclaims            pvc                v1                                     true         PersistentVolumeClaim
persistentvolumes                 pv                 v1                                     false        PersistentVolume
pods                              po                 v1                                     true         Pod
podtemplates                                         v1                                     true         PodTemplate
replicationcontrollers            rc                 v1                                     true         ReplicationController
resourcequotas                    quota              v1                                     true         ResourceQuota
secrets                                              v1                                     true         Secret
serviceaccounts                   sa                 v1                                     true         ServiceAccount
services                          svc                v1                                     true         Service
mutatingwebhookconfigurations                        admissionregistration.k8s.io/v1        false        MutatingWebhookConfiguration
validatingwebhookconfigurations                      admissionregistration.k8s.io/v1        false        ValidatingWebhookConfiguration
agents                            agent              agent.k8s.elastic.co/v1alpha1          true         Agent
customresourcedefinitions         crd,crds           apiextensions.k8s.io/v1                false        CustomResourceDefinition
apiservices                                          apiregistration.k8s.io/v1              false        APIService
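  • NAME: name of the API resource
  • SHORTNAMES: short name of the API resource, which can be used in queries
  • APIVERSION: API group and version of the resource
  • NAMESPACED: whether the resource is namespace-scoped; for example, pv shows false, meaning PersistentVolumes are cluster-wide and not tied to a particular namespace
  • KIND: kind of the API resource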
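
The NAMESPACED column can also be used as a filter. A small sketch, with the hypothetical helper name resources_by_scope, relies on the --namespaced flag of kubectl api-resources and prints only the resource names:

```shell
# resources_by_scope true|false
# Lists namespaced resources (true) or cluster-scoped resources (false), names only.
resources_by_scope() {
  kubectl api-resources --namespaced="$1" -o name
}
```

`resources_by_scope false` should include entries such as nodes and persistentvolumes from the table above.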

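Inspecting the structure of a resource manifest

When writing a YAML manifest for some resource and you are unsure of a field's exact path, its value type, or whether it is required, you can look it up with the following command. For example, for the pod configuration:

# Show the top-level fields of pod; each field comes with a detailed description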
[root@master ~]# kubectl explain pod
KIND:     Pod
VERSION:  v1

DESCRIPTION:
   Pod is a collection of containers that can run on a host. This resource is
   created by clients and scheduled onto hosts.

FIELDS:
 apiVersion   <string>
   APIVersion defines the versioned schema of this representation of an
   object. Servers should convert recognized schemas to the latest internal
   value, and may reject unrecognized values. More info:
   https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources

 kind <string>
   Kind is a string value representing the REST resource this object
   represents. Servers may infer this from the endpoint the client submits
   requests to. Cannot be updated. In CamelCase. More info:
   https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
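
kubectl explain also accepts a dotted field path, so deeper levels can be inspected the same way, and --recursive prints the whole field tree without descriptions. The two wrapper names below are assumptions made for this sketch.

```shell
# explain_field PATH
# Shows the documentation for one field, e.g. pod.spec.containers.resources
explain_field() {
  kubectl explain "$1"
}

# explain_tree PATH
# Prints the names and types of all nested fields below PATH.
explain_tree() {
  kubectl explain "$1" --recursive
}
```

For instance, `explain_field pod.spec.containers.resources` describes the limits and requests fields that appear in the pod details later in this post.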

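Querying resources of a given type

The examples use pods; other resource types are queried the same way, with only the type name replaced:

# kubectl get pod -n [namespace]; without -n, only the default namespace is queried
[root@master ~]# kubectl get pod -n cos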
NAME                                                   READY   STATUS    RESTARTS       AGE
cloud-bmp-7d688998f8-qprvw                             1/1     Running   0              9d
cloud-bsm-5df444986b-r9vmb                             1/1     Running   0              9d
cloud-component-elasticsearch-server-75c494957-n7phx   1/1     Running   0              16d

# -o wide prints additional columns
[root@master ~]# kubectl get pod -n cos -o wide
NAME                                                   READY   STATUS    RESTARTS       AGE     IP                NODE             NOMINATED NODE   READINESS GATES
cloud-bmp-7d688998f8-qprvw                             1/1     Running   0              9d      192.168.219.78    master           <none>           <none>
cloud-bsm-5df444986b-r9vmb                             1/1     Running   0              9d      192.168.219.82    master           <none>           <none>
cloud-component-elasticsearch-server-75c494957-n7phx   1/1     Running   0              16d     192.168.186.149   k8s-prod-node2   <none>           <none>

# Show the details of a specific pod
[root@master ~]# kubectl describe pod cloud-bmp-7d688998f8-qprvw -n cos
Name:         cloud-bmp-7d688998f8-qprvw
Namespace:    cos
Priority:     0
Node:         master/172.28.105.220
Start Time:   Tue, 12 Jul 2022 18:33:47 +0800
Labels:       app=cloud-bmp
             pod-template-hash=7d688998f8
Annotations:  cni.projectcalico.org/containerID: 8adb5ce7ffa7a891b28612646dacfd4f3a05084e1abd86dec9f9d5e1013ba869
             cni.projectcalico.org/podIP: 192.168.219.78/32
             cni.projectcalico.org/podIPs: 192.168.219.78/32
Status:       Running
IP:           192.168.219.78
IPs:
 IP:           192.168.219.78
Controlled By:  ReplicaSet/cloud-bmp-7d688998f8
Init Containers:
 sw-agent-sidecar:
   Container ID:  docker://6b24e1fa2b7768ce233926813d4c7e2ea7c220a660c6e293cfcfeac3866e3ef9
   Image:         reg.kolla.org/brs-dev/skywalking-agent-sidecar:8.9.0
   Image ID:      docker-pullable://reg.kolla.org/brs-dev/skywalking-agent-sidecar@sha256:6178f1bc6454523900f6d8a7bec15da3f086a175b213dc96af312e09690d8c26
   Port:          <none>
   Host Port:     <none>
   Command:
     sh
   Args:
     -c
     mkdir -p /skywalking/agent && cp -r /usr/skywalking/agent/* /skywalking/agent
   State:          Terminated
     Reason:       Completed
     Exit Code:    0
     Started:      Tue, 12 Jul 2022 18:34:03 +0800
     Finished:     Tue, 12 Jul 2022 18:34:04 +0800
   Ready:          True
   Restart Count:  0
   Environment:    <none>
   Mounts:
     /skywalking/agent from sw-agent (rw)
     /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hdpg5 (ro)
Containers:
 cloud-bmp:
   Container ID:   docker://0bc7917aff37fcd791162c9e27c908060240805a9abdf107b574fdc6b02a81d4
   Image:          reg.kolla.org/cos/bmp:ddc41bc5
   Image ID:       docker-pullable://reg.kolla.org/cos/bmp@sha256:291afcd930ca57e70171f26d7f73c8ecbe032db77ea4f731af542b6a5f248651
   Port:           8080/TCP
   Host Port:      0/TCP
   State:          Running
     Started:      Tue, 12 Jul 2022 18:34:09 +0800
   Ready:          True
   Restart Count:  0
   Limits:
     cpu:     1
     memory:  2Gi
   Requests:
     cpu:      200m
     memory:   1Gi
 ......
 
# Print the pod's configuration as YAML
[root@master ~]# kubectl get pod cloud-bmp-7d688998f8-qprvw -n cos -o yaml
apiVersion: v1
kind: Pod
metadata:
 annotations:
   cni.projectcalico.org/containerID: 8adb5ce7ffa7a891b28612646dacfd4f3a05084e1abd86dec9f9d5e1013ba869
   cni.projectcalico.org/podIP: 192.168.219.78/32
   cni.projectcalico.org/podIPs: 192.168.219.78/32
 creationTimestamp: "2022-07-12T10:33:47Z"
 generateName: cloud-bmp-7d688998f8-
 labels:
   app: cloud-bmp
   pod-template-hash: 7d688998f8
 name: cloud-bmp-7d688998f8-qprvw
 namespace: cos
 ownerReferences:
 - apiVersion: apps/v1
   blockOwnerDeletion: true
......

# Filter with a label selector
[root@master ~]# kubectl get pod -n cos -l app=cloud-bmp
NAME                         READY   STATUS    RESTARTS   AGE
cloud-bmp-7d688998f8-qprvw   1/1     Running   0          9d
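
Besides label selectors, kubectl get supports field selectors. As a sketch, the hypothetical helper below lists the pods of a namespace whose phase is not Running, which is a quick way to spot Pending or Failed pods.

```shell
# pods_not_running [NAMESPACE]
# Lists pods whose status.phase is anything other than Running.
# Note: a crash-looping pod can still report phase Running, so this is
# a first filter only, not a full health check.
pods_not_running() {
  kubectl get pods -n "${1:-default}" --field-selector 'status.phase!=Running'
}
```

Example: `pods_not_running cos`.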

Pod logs & shell access

  • Viewing pod logs

# kubectl logs [pod name] -n [namespace]
[root@master ~]# kubectl logs cloud-bmp-7d688998f8-qprvw -n cos
[INFO ] 2022-07-21 17:50:12.695 [org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer#2-1raceId] [org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer#2-1] TaskServiceImpl - received message: {"host": "172.28.102.242", "message": {"changed": false, "skipped": true, "task": "TASK [command - shown /nasfs/orabaknas owner]"}, "playbook": true, "recordId": 116006, "success": true}
[INFO ] 2022-07-21 17:50:12.790 [org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer#2-1raceId] [org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer#2-1] TaskServiceImpl - received message: {"host": "172.28.102.242", "message": {"changed": false, "skipped": true, "task": "TASK [template - generate create oracle instance shell]"}, "playbook": true, "recordId": 116006, "success": true}
[INFO ] 2022-07-21 17:50:12.893 [org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer#2-1raceId] [org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer#2-1] TaskServiceImpl - received message: {"host": "172.28.102.242", "message": {"changed": false, "skipped": true, "task": "TASK [copy - copy pw.dmp]"}, "playbook": true, "recordId": 116006, "success": true}
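
For a busy application the full log is rarely what you want. The sketch below shows two common variations; the function names, the 100-line limit, and the one-hour window are assumptions of this example, while --tail, --since, and --previous are standard kubectl logs flags.

```shell
# recent_logs POD NAMESPACE
# Shows at most the last 100 lines written during the past hour.
recent_logs() {
  kubectl logs "$1" -n "$2" --tail=100 --since=1h
}

# crash_logs POD NAMESPACE
# Shows the logs of the previous container instance, useful after a restart.
crash_logs() {
  kubectl logs "$1" -n "$2" --previous
}
```

Adding -f to a kubectl logs command streams new lines as they arrive.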
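  • Opening a shell in a pod container

# kubectl exec -it [pod name] -c [container name] -n [namespace] -- [command] (the command can be a shell such as bash for an interactive session, or any one-off command to run remotely)
# The recorded session below used the older form without --, which is why kubectl prints a deprecation warning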
[root@master ~]# kubectl exec -it cloud-bmp-7d688998f8-qprvw -c cloud-bmp -n cos bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
bash-4.4# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
  link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
  inet 127.0.0.1/8 scope host lo
     valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN qlen 1000
  link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if6385: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1440 qdisc noqueue state UP 
  link/ether 6e:d6:24:df:1f:ee brd ff:ff:ff:ff:ff:ff
  inet 192.168.219.78/32 scope global eth0
     valid_lft forever preferred_lft forever
bash-4.4#

# Remote command (run ls without opening a shell)
[root@master ~]# kubectl exec -it cloud-bmp-7d688998f8-qprvw -c cloud-bmp -n cos ls 
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
BUILDING.txt     README.md        conf             native-jni-lib
CONTRIBUTING.md  RELEASE-NOTES    include          temp
LICENSE          RUNNING.txt      lib              webapps
NOTICE           bin              logs             work
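
The deprecation warning in the sessions above goes away once the command is separated from kubectl's own flags with --. A minimal sketch, using a hypothetical wrapper name pod_exec:

```shell
# pod_exec POD NAMESPACE COMMAND [ARGS...]
# Runs a one-off command in the pod using the non-deprecated "--" form.
pod_exec() {
  pod="$1"
  ns="$2"
  shift 2
  kubectl exec "$pod" -n "$ns" -- "$@"
}
```

For example, `pod_exec cloud-bmp-7d688998f8-qprvw cos ls`. For an interactive shell, add -it before the -- when calling kubectl directly.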

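Copying files between a pod and the host

  • Copying a file from a pod to the host

# kubectl cp [namespace]/[podname]:[path in container] [destination path on host] (the container path is written without its leading /, which avoids a tar warning about removing the leading slash)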
[root@master ~]# kubectl cp default/node:etc/hosts  ./hosts
[root@master ~]# ll
total 258400
-rw-------. 1 root root      1461 Nov 10  2021 anaconda-ks.cfg
-rw-r--r--  1 root root    217525 Nov 10  2021 calico.yaml
-rw-------  1 root root   7463424 Feb 23 13:30 curl.tar
drwxr-xr-x  3 root root        17 Dec 25  2021 go
-rw-r--r--  1 root root       204 Jul 21 23:28 hosts
drwxr-xr-x  2 root root         6 Dec 24  2021 images
-rwxr-xr-x. 1 root root       444 Nov 10  2021 image.sh
drwxr-xr-x  2 root root        57 Nov 10  2021 ingress
-rw-r--r--  1 root root       248 Nov 10  2021 ingress-http.yaml
-rw-r--r--  1 root root       465 Dec 16  2021 kubeadm.config
-rw-r--r--  1 root root     37261 Feb 28 11:12 nginx-9.5.18.tgz
drwxr-xr-x  3 root root        19 Jul 21 23:28 ssl
-rw-r--r--  1 root root      1179 Mar 24 20:35 test.json
-rw-r--r--  1 root root 256823296 Jun  8 17:50 test.tar.gz
-rw-r--r--  1 root root       663 Dec 29  2021 test.yaml
-rw-r--r--  1 root root     19539 Jul 21 23:26 xx.txt
drwxr-xr-x  7 root root       285 Jun  1 21:43 yy_work
[root@master ~]# more hosts 
# Kubernetes-managed hosts file.
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
192.168.219.189 node
  • Copying a file from the host into a container

# kubectl cp [file path on host] [namespace]/[podname]:[destination path in container]
[root@master ~]# kubectl cp /root/test.yaml  default/node:/etc
[root@master ~]# kubectl exec -it node -n default ls more /etc/test.yaml
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
ls: more: No such file or directory
/etc/test.yaml
command terminated with exit code 1
[root@master ~]# kubectl exec -it node -n default more /etc/test.yaml
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: saturn-executor
  name: saturn-executor
spec:
  selector:
    matchLabels:
      app: saturn-executor
  template:
    metadata:
      labels:
        app: saturn-executor
    spec:
      containers:
      - name: saturn-executor
        image: reg.kolla.org/saturn/demo-java-job:0.0.2-saturn-v3.5.1
        imagePullPolicy: IfNotPresent
        #resources:
        #  limits:  # maximum usage
        #    cpu: 500m  
        #    memory: 2Gi
--More-- (85% of 663 bytes)
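
Both copy directions can be wrapped in small helpers. The function names below are hypothetical; also note that kubectl cp relies on a tar binary being present in the container image.

```shell
# pod_cp_out NAMESPACE POD CONTAINER_PATH LOCAL_PATH
# Copies a file from the pod to the host.
pod_cp_out() {
  kubectl cp "$1/$2:$3" "$4"
}

# pod_cp_in LOCAL_PATH NAMESPACE POD CONTAINER_PATH
# Copies a file from the host into the pod.
pod_cp_in() {
  kubectl cp "$1" "$2/$3:$4"
}
```

The two transfers shown above would become `pod_cp_out default node etc/hosts ./hosts` and `pod_cp_in /root/test.yaml default node /etc`. For a pod with several containers, kubectl cp accepts -c [container name] to pick one.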