MGR_MODULE_ERROR: Module 'dashboard' has failed: OSError("No socket could be created -- (('10.xx.xx.xx', 8443): [Errno 99] Cannot assign requested address)")
Solution
This was caused by the dashboard settings I had created manually earlier.
# Inspect the current config
ceph config dump
# Remove my earlier setting
ceph config rm mgr mgr/dashboard/server_addr
ceph mgr services
ceph mgr module disable dashboard
ceph mgr module enable dashboard
# Finally, delete both the mgr-a and mgr-b pods and re-apply the dashboard-https.yaml manifest
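An alternative to removing the option (untested here) is to rebind the dashboard to all interfaces so the mgr can always create its socket. A minimal sketch: `0.0.0.0` is an assumption about the desired bind address, and the script only prints the ceph commands (drop the `echo` in `CEPH` to actually run them):

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the ceph commands instead of executing them.
set -euo pipefail
CEPH="echo ceph"   # drop the echo to run against a real cluster

rebind_dashboard() {
  # Bind the dashboard to all interfaces instead of a fixed IP.
  $CEPH config set mgr mgr/dashboard/server_addr 0.0.0.0
  # Restart the module so the new bind address takes effect.
  $CEPH mgr module disable dashboard
  $CEPH mgr module enable dashboard
}

rebind_dashboard
```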
Node hangs after reboot
Added 2023-3-30: stop the other nodes first and the monitor nodes last; cordon, then stop kubelet and docker. Untested.
When the reboot command is issued, network interfaces are terminated before disks are unmounted. This leaves the node hanging as repeated attempts to unmount Ceph persistent volumes fail.
Solution
# The node needs to be drained before reboot. After the successful drain, the node can be rebooted as usual.
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data  # --delete-local-data on kubectl < 1.20
# The node needs to be uncordoned once it's back online.
kubectl uncordon <node-name>
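The drain/reboot/uncordon cycle above can be sketched as one helper. This is a dry-run sketch: it only prints the kubectl commands (drop the `echo` in `KUBECTL` to execute them), and `--delete-emptydir-data` is the newer spelling of the deprecated `--delete-local-data`:

```shell
#!/usr/bin/env bash
# Dry-run sketch of the node maintenance cycle; prints commands only.
set -euo pipefail
KUBECTL="echo kubectl"   # drop the echo to run against a real cluster

maintain_node() {
  local node="$1"
  # Drain before reboot so Ceph volumes are unmounted cleanly.
  $KUBECTL drain "$node" --ignore-daemonsets --delete-emptydir-data
  # After the node is rebooted and back online, uncordon it.
  $KUBECTL uncordon "$node"
}

maintain_node node-1
```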
Ways to make a node unschedulable for graceful maintenance: cordon, drain, delete
Shutting down and restarting rook-ceph without losing data (not actually tested)
Back up important data first.
1. Stop all applications and clients to stop IO to the storage cluster
2. Stop the Rook-Ceph operator
kubectl delete -f operator.yaml
3. Cordon all nodes; shut down the mon nodes first, then the osd nodes
Reverse the order when starting back up
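The untested steps above can be sketched as a dry-run script. Scaling the operator deployment to 0 replicas is an alternative to `kubectl delete -f operator.yaml`; the node names are placeholders, and the commands are printed rather than executed (drop the `echo` in `KUBECTL` to run them):

```shell
#!/usr/bin/env bash
# Dry-run sketch of the shutdown preparation; prints commands only.
set -euo pipefail
KUBECTL="echo kubectl"   # drop the echo to run against a real cluster

shutdown_prep() {
  # Stop the operator first so it does not restart the daemons we stop.
  $KUBECTL -n rook-ceph scale deployment rook-ceph-operator --replicas=0
  # Cordon every node so nothing gets rescheduled during shutdown.
  for node in "$@"; do
    $KUBECTL cordon "$node"
  done
}

shutdown_prep node-1 node-2 node-3
```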
Issues
rook-ceph inside k8s
# Rook pod status
kubectl get pod -n rook-ceph -o wide
# Logs for the operator or other pod
kubectl logs -n rook-ceph -l app=rook-ceph-operator
kubectl logs -n rook-ceph -l mon=a
# Log on to a specific node to find why a PVC is failing to mount:
journalctl -u kubelet
# Logs for pods that are no longer running (--previous shows the previous container instance)
kubectl -n <cluster-namespace> logs --previous <pod-name>
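When the logs alone don't explain a failure, pod events are often quicker. A dry-run sketch (the commands are printed rather than run, and the pod name is a placeholder; drop the `echo` in `KUBECTL` to execute):

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the kubectl inspection commands.
set -euo pipefail
KUBECTL="echo kubectl"   # drop the echo to run against a real cluster

inspect_pod() {
  local pod="$1"
  # Events often explain scheduling / mount failures faster than logs.
  $KUBECTL -n rook-ceph describe pod "$pod"
  $KUBECTL -n rook-ceph get events --sort-by=.metadata.creationTimestamp
}

inspect_pod rook-ceph-mon-a-xxxxx
```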
For Ceph issues, search the official docs yourself via the links
Tools
# rook-ceph-tools pod
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[*].metadata.name}') -- bash
ceph status
ceph osd status
ceph osd df
ceph osd utilization
ceph osd pool stats
ceph osd tree
ceph pg stat
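For the maintenance flows above it usually helps to set the `noout` flag first, so Ceph does not start rebalancing data while a node is temporarily down. A dry-run sketch of the two ceph commands (printed, not run; drop the `echo` in `CEPH` and execute them from the tools pod):

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the ceph maintenance-flag commands.
set -euo pipefail
CEPH="echo ceph"   # drop the echo to run from the tools pod

# Before taking a node down: prevent OSDs from being marked out.
maintenance_on()  { $CEPH osd set noout; }
# After the node is back: allow normal rebalancing again.
maintenance_off() { $CEPH osd unset noout; }

maintenance_on
maintenance_off
```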
mon restarts frequently
Monitors are the only pods running
After the cluster starts, the operator brings up the mons one at a time; if all three mons cannot come up, it will not proceed any further
Same link as above
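To see where the operator is stuck waiting on mons, check the mon pods and their logs. A dry-run sketch (commands are printed, not run; `mon=a` is one of the a/b/c labels Rook assigns, and the `--tail` count is arbitrary):

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the mon-inspection commands.
set -euo pipefail
KUBECTL="echo kubectl"   # drop the echo to run against a real cluster

check_mons() {
  # Which mon pods exist and whether they are Running / CrashLooping.
  $KUBECTL -n rook-ceph get pod -l app=rook-ceph-mon
  # Recent log lines from one mon; repeat with mon=b / mon=c as needed.
  $KUBECTL -n rook-ceph logs -l mon=a --tail=50
}

check_mons
```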