1 Rook Overview
1.1 Ceph Introduction
Ceph is a highly scalable distributed storage solution that provides object, file, and block storage. On each storage node you will find a filesystem holding Ceph's storage objects and a Ceph OSD (Object Storage Daemon) process. A Ceph cluster also runs Ceph MON (monitor) daemons, which keep the cluster highly available.
More on Ceph: https://www.cnblogs.com/itzgr/category/1382602.html
1.2 Rook Introduction
Rook is an open-source cloud-native storage orchestrator: it provides a platform, a framework, and support for a variety of storage solutions so that they integrate natively with cloud-native environments. It currently focuses on file, block, and object storage services for cloud-native environments, implementing a self-managing, self-scaling, self-healing distributed storage service.
Rook automates deployment, bootstrapping, configuration, provisioning, scaling up and down, upgrades, migration, disaster recovery, monitoring, and resource management. To achieve all of this, Rook relies on the underlying container orchestration platform, such as Kubernetes.
Rook currently supports building Ceph, NFS, Minio Object Store, EdgeFS, Cassandra, and CockroachDB storage.
How Rook works:
Rook provides volume plugins that extend the Kubernetes storage system; through the kubelet agent, Pods can mount block devices and filesystems managed by Rook.
The Rook Operator starts and monitors the entire underlying storage system (for example the Ceph Pods, including the Ceph OSDs) and also manages the CRDs, object stores, and filesystems.
The Rook Agent runs as a Pod on every Kubernetes node. Each agent Pod is configured with a Flexvolume driver that integrates with the Kubernetes volume control framework; node-local operations such as attaching storage devices, mounting, formatting, and removing storage are all performed by this agent.
See the official sites for more details:
https://rook.io
https://ceph.com/
1.3 Rook Architecture
The Rook architecture is shown below:
The Rook architecture integrated with Kubernetes is shown below:
2 Rook Deployment
2.1 Planning
Host Disk IP
centos8-master01 sdb 192.168.10.131
centos8-master02 sdb 192.168.10.132
centos8-master03 sdb 192.168.10.133
centos8-node01 sdb 192.168.10.181
centos8-node02 sdb 192.168.10.182
centos8-node03 sdb 192.168.10.183
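Before touching the cluster, the plan above can be sanity-checked with a short script (a hypothetical helper; the hostnames, disks, and IPs are the table's values):

```python
import ipaddress

# Planned inventory from the table above: host -> (disk, IP)
nodes = {
    "centos8-master01": ("sdb", "192.168.10.131"),
    "centos8-master02": ("sdb", "192.168.10.132"),
    "centos8-master03": ("sdb", "192.168.10.133"),
    "centos8-node01": ("sdb", "192.168.10.181"),
    "centos8-node02": ("sdb", "192.168.10.182"),
    "centos8-node03": ("sdb", "192.168.10.183"),
}

# Basic checks: valid IPv4 addresses, no duplicates, enough nodes for 3 mons
ips = [ipaddress.ip_address(ip) for _, ip in nodes.values()]
assert len(set(ips)) == len(ips), "duplicate IP in plan"
assert len(nodes) >= 3, "Ceph needs at least 3 nodes for a healthy mon quorum"
print(f"{len(nodes)} nodes planned, all IPs valid")
```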
2.2 Obtain the YAML
# Cloning from GitHub can be slow; use the gitee mirror instead
git clone --single-branch --branch release-1.5 https://gitee.com/estarhaohao/rook.git
2.3 Deploy the Rook Operator
If you are not deploying onto the master nodes, you do not need to taint them, and this step can be skipped.
Enter the ceph deployment directory:
[root@centos8-master01 ]# cd rook/cluster/examples/kubernetes/ceph
[root@centos8-master01 ceph]# kubectl taint node centos8-master01 node-role.kubernetes.io/master="":NoSchedule
[root@centos8-master01 ceph]# kubectl taint node centos8-master02 node-role.kubernetes.io/master="":NoSchedule
[root@centos8-master01 ceph]# kubectl taint node centos8-master03 node-role.kubernetes.io/master="":NoSchedule
# Label the master nodes so the Ceph components are scheduled onto them
[root@k8smaster01 ceph]# kubectl label nodes {centos8-master01,centos8-master02,centos8-master03} ceph-osd=enabled
[root@k8smaster01 ceph]# kubectl label nodes {centos8-master01,centos8-master02,centos8-master03} ceph-mon=enabled
[root@k8smaster01 ceph]# kubectl label nodes centos8-master01 ceph-mgr=enabled
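The taint and label commands above follow a fixed pattern per master, so they can be generated from a single node list to keep the set of masters consistent; a small sketch (the node names match the planning table, the helper itself is hypothetical):

```python
# Generate the taint/label commands from one node list, so the same set of
# masters is used everywhere (hypothetical helper; run the printed commands).
masters = ["centos8-master01", "centos8-master02", "centos8-master03"]

# Re-apply the NoSchedule master taint on each master node
cmds = [f'kubectl taint node {n} node-role.kubernetes.io/master="":NoSchedule'
        for n in masters]
# Label all masters for OSD and MON placement, and one master for MGR
for label in ("ceph-osd", "ceph-mon"):
    cmds.append(f"kubectl label nodes {' '.join(masters)} {label}=enabled")
cmds.append(f"kubectl label nodes {masters[0]} ceph-mgr=enabled")

print("\n".join(cmds))
```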
Start the deployment:
[root@k8smaster01 ceph]# kubectl create -f common.yaml # this file needs no changes
[root@k8smaster01 ceph]# kubectl create -f operator.yaml
# The image references below have been switched to Alibaba Cloud mirrors, so the image pulls will not fail
# The manifest can be copied and pasted as-is
#################################################################################################################
# The deployment for the rook operator
# Contains the common settings for most Kubernetes deployments.
# For example, to create the rook-ceph cluster:
# kubectl create -f crds.yaml -f common.yaml -f operator.yaml
# kubectl create -f cluster.yaml
#
# Also see other operator sample files for variations of operator.yaml:
# - operator-openshift.yaml: Common settings for running in OpenShift
###############################################################################################################
# Rook Ceph Operator Config ConfigMap
# Use this ConfigMap to override Rook-Ceph Operator configurations.
# NOTE! Precedence will be given to this config if the same Env Var config also exists in the
# Operator Deployment.
# To move a configuration(s) from the Operator Deployment to this ConfigMap, add the config
# here. It is recommended to then remove it from the Deployment to eliminate any future confusion.
kind: ConfigMap
apiVersion: v1
metadata:
name: rook-ceph-operator-config
# should be in the namespace of the operator
namespace: rook-ceph # namespace:operator
data:
# Enable the CSI driver.
# To run the non-default version of the CSI driver, see the override-able image properties in operator.yaml
ROOK_CSI_ENABLE_CEPHFS: "true"
# Enable the default version of the CSI RBD driver. To start another version of the CSI driver, see image properties below.
ROOK_CSI_ENABLE_RBD: "true"
ROOK_CSI_ENABLE_GRPC_METRICS: "false"
# Set logging level for csi containers.
# Supported values from 0 to 5. 0 for general useful logs, 5 for trace level verbosity.
# CSI_LOG_LEVEL: "0"
# OMAP generator will generate the omap mapping between the PV name and the RBD image.
# CSI_ENABLE_OMAP_GENERATOR need to be enabled when we are using rbd mirroring feature.
# By default OMAP generator sidecar is deployed with CSI provisioner pod, to disable
# it set it to false.
# CSI_ENABLE_OMAP_GENERATOR: "false"
# set to false to disable deployment of snapshotter container in CephFS provisioner pod.
CSI_ENABLE_CEPHFS_SNAPSHOTTER: "true"
# set to false to disable deployment of snapshotter container in RBD provisioner pod.
CSI_ENABLE_RBD_SNAPSHOTTER: "true"
# Enable cephfs kernel driver instead of ceph-fuse.
# If you disable the kernel client, your application may be disrupted during upgrade.
# See the upgrade guide: https://rook.io/docs/rook/master/ceph-upgrade.html
# NOTE! cephfs quota is not supported in kernel version < 4.17
CSI_FORCE_CEPHFS_KERNEL_CLIENT: "true"
# (Optional) policy for modifying a volume's ownership or permissions when the RBD PVC is being mounted.
# supported values are documented at https://kubernetes-csi.github.io/docs/support-fsgroup.html
CSI_RBD_FSGROUPPOLICY: "ReadWriteOnceWithFSType"
# (Optional) policy for modifying a volume's ownership or permissions when the CephFS PVC is being mounted.
# supported values are documented at https://kubernetes-csi.github.io/docs/support-fsgroup.html
CSI_CEPHFS_FSGROUPPOLICY: "ReadWriteOnceWithFSType"
# (Optional) Allow starting unsupported ceph-csi image
ROOK_CSI_ALLOW_UNSUPPORTED_VERSION: "false"
# The default version of CSI supported by Rook will be started. To change the version
# of the CSI driver to something other than what is officially supported, change
# these images to the desired release of the CSI driver.
# changed: the images below point to Alibaba Cloud mirrors
ROOK_CSI_CEPH_IMAGE: "registry.cn-hangzhou.aliyuncs.com/haoyustorage/cephcsi:v3.2.2"
ROOK_CSI_REGISTRAR_IMAGE: "registry.cn-hangzhou.aliyuncs.com/haoyustorage/csi-node-driver-registrar:v2.0.1"
ROOK_CSI_RESIZER_IMAGE: "registry.cn-hangzhou.aliyuncs.com/haoyustorage/csi-resizer:v1.0.1"
ROOK_CSI_PROVISIONER_IMAGE: "registry.cn-hangzhou.aliyuncs.com/haoyustorage/csi-provisioner:v2.0.4"
ROOK_CSI_SNAPSHOTTER_IMAGE: "registry.cn-hangzhou.aliyuncs.com/haoyustorage/csi-snapshotter:v3.0.2"
ROOK_CSI_ATTACHER_IMAGE: "registry.cn-hangzhou.aliyuncs.com/haoyustorage/csi-attacher:v3.0.2"
# (Optional) set user created priorityclassName for csi plugin pods.
# CSI_PLUGIN_PRIORITY_CLASSNAME: "system-node-critical"
# (Optional) set user created priorityclassName for csi provisioner pods.
# CSI_PROVISIONER_PRIORITY_CLASSNAME: "system-cluster-critical"
# CSI CephFS plugin daemonset update strategy, supported values are OnDelete and RollingUpdate.
# Default value is RollingUpdate.
# CSI_CEPHFS_PLUGIN_UPDATE_STRATEGY: "OnDelete"
# CSI RBD plugin daemonset update strategy, supported values are OnDelete and RollingUpdate.
# Default value is RollingUpdate.
# CSI_RBD_PLUGIN_UPDATE_STRATEGY: "OnDelete"
# kubelet directory path, if kubelet configured to use other than /var/lib/kubelet path.
ROOK_CSI_KUBELET_DIR_PATH: "/data/kubernetes/kubelet"
# Labels to add to the CSI CephFS Deployments and DaemonSets Pods.
# ROOK_CSI_CEPHFS_POD_LABELS: "key1=value1,key2=value2"
# Labels to add to the CSI RBD Deployments and DaemonSets Pods.
# ROOK_CSI_RBD_POD_LABELS: "key1=value1,key2=value2"
# (Optional) Ceph Provisioner NodeAffinity.
# CSI_PROVISIONER_NODE_AFFINITY: "role=storage-node; storage=rook, ceph"
# (Optional) CEPH CSI provisioner tolerations list. Put here list of taints you want to tolerate in YAML format.
# CSI provisioner would be best to start on the same nodes as other ceph daemons.
# CSI_PROVISIONER_TOLERATIONS: |
# - effect: NoSchedule
# key: node-role.kubernetes.io/controlplane
# operator: Exists
# - effect: NoExecute
# key: node-role.kubernetes.io/etcd
# operator: Exists
# (Optional) Ceph CSI plugin NodeAffinity.
# CSI_PLUGIN_NODE_AFFINITY: "role=storage-node; storage=rook, ceph"
# (Optional) CEPH CSI plugin tolerations list. Put here list of taints you want to tolerate in YAML format.
# CSI plugins need to be started on all the nodes where the clients need to mount the storage.
# CSI_PLUGIN_TOLERATIONS: |
# - effect: NoSchedule
# key: node-role.kubernetes.io/controlplane
# operator: Exists
# - effect: NoExecute
# key: node-role.kubernetes.io/etcd
# operator: Exists
# (Optional) CEPH CSI RBD provisioner resource requirement list, Put here list of resource
# requests and limits you want to apply for provisioner pod
# CSI_RBD_PROVISIONER_RESOURCE: |
# - name : csi-provisioner
# resource:
# requests:
# memory: 128Mi
# cpu: 100m
# limits:
# memory: 256Mi
# cpu: 200m
# - name : csi-resizer
# resource:
# requests:
# memory: 128Mi
# cpu: 100m
# limits:
# memory: 256Mi
# cpu: 200m
# - name : csi-attacher
# resource:
# requests:
# memory: 128Mi
# cpu: 100m
# limits:
# memory: 256Mi
# cpu: 200m
# - name : csi-snapshotter
# resource:
# requests:
# memory: 128Mi
# cpu: 100m
# limits:
# memory: 256Mi
# cpu: 200m
# - name : csi-rbdplugin
# resource:
# requests:
# memory: 512Mi
# cpu: 250m
# limits:
# memory: 1Gi
# cpu: 500m
# - name : liveness-prometheus
# resource:
# requests:
# memory: 128Mi
# cpu: 50m
# limits:
# memory: 256Mi
# cpu: 100m
# (Optional) CEPH CSI RBD plugin resource requirement list, Put here list of resource
# requests and limits you want to apply for plugin pod
# CSI_RBD_PLUGIN_RESOURCE: |
# - name : driver-registrar
# resource:
# requests:
# memory: 128Mi
# cpu: 50m
# limits:
# memory: 256Mi
# cpu: 100m
# - name : csi-rbdplugin
# resource:
# requests:
# memory: 512Mi
# cpu: 250m
# limits:
# memory: 1Gi
# cpu: 500m
# - name : liveness-prometheus
# resource:
# requests:
# memory: 128Mi
# cpu: 50m
# limits:
# memory: 256Mi
# cpu: 100m
# (Optional) CEPH CSI CephFS provisioner resource requirement list, Put here list of resource
# requests and limits you want to apply for provisioner pod
# CSI_CEPHFS_PROVISIONER_RESOURCE: |
# - name : csi-provisioner
# resource:
# requests:
# memory: 128Mi
# cpu: 100m
# limits:
# memory: 256Mi
# cpu: 200m
# - name : csi-resizer
# resource:
# requests:
# memory: 128Mi
# cpu: 100m
# limits:
# memory: 256Mi
# cpu: 200m
# - name : csi-attacher
# resource:
# requests:
# memory: 128Mi
# cpu: 100m
# limits:
# memory: 256Mi
# cpu: 200m
# - name : csi-cephfsplugin
# resource:
# requests:
# memory: 512Mi
# cpu: 250m
# limits:
# memory: 1Gi
# cpu: 500m
# - name : liveness-prometheus
# resource:
# requests:
# memory: 128Mi
# cpu: 50m
# limits:
# memory: 256Mi
# cpu: 100m
# (Optional) CEPH CSI CephFS plugin resource requirement list, Put here list of resource
# requests and limits you want to apply for plugin pod
# CSI_CEPHFS_PLUGIN_RESOURCE: |
# - name : driver-registrar
# resource:
# requests:
# memory: 128Mi
# cpu: 50m
# limits:
# memory: 256Mi
# cpu: 100m
# - name : csi-cephfsplugin
# resource:
# requests:
# memory: 512Mi
# cpu: 250m
# limits:
# memory: 1Gi
# cpu: 500m
# - name : liveness-prometheus
# resource:
# requests:
# memory: 128Mi
# cpu: 50m
# limits:
# memory: 256Mi
# cpu: 100m
# Configure CSI CephFS grpc and liveness metrics port
# CSI_CEPHFS_GRPC_METRICS_PORT: "9091"
# CSI_CEPHFS_LIVENESS_METRICS_PORT: "9081"
# Configure CSI RBD grpc and liveness metrics port
# CSI_RBD_GRPC_METRICS_PORT: "9090"
# CSI_RBD_LIVENESS_METRICS_PORT: "9080"
# Whether the OBC provisioner should watch on the operator namespace or not, if not the namespace of the cluster will be used
ROOK_OBC_WATCH_OPERATOR_NAMESPACE: "true"
# (Optional) Admission controller NodeAffinity.
# ADMISSION_CONTROLLER_NODE_AFFINITY: "role=storage-node; storage=rook, ceph"
# (Optional) Admission controller tolerations list. Put here list of taints you want to tolerate in YAML format.
# Admission controller would be best to start on the same nodes as other ceph daemons.
# ADMISSION_CONTROLLER_TOLERATIONS: |
# - effect: NoSchedule
# key: node-role.kubernetes.io/controlplane
# operator: Exists
# - effect: NoExecute
# key: node-role.kubernetes.io/etcd
# operator: Exists
---
# OLM: BEGIN OPERATOR DEPLOYMENT
apiVersion: apps/v1
kind: Deployment
metadata:
name: rook-ceph-operator
namespace: rook-ceph # namespace:operator
labels:
operator: rook
storage-backend: ceph
spec:
selector:
matchLabels:
app: rook-ceph-operator
replicas: 1
template:
metadata:
labels:
app: rook-ceph-operator
spec:
serviceAccountName: rook-ceph-system
containers:
- name: rook-ceph-operator
image: registry.cn-hangzhou.aliyuncs.com/haoyustorage/rookceph:v1.5.12
args: ["ceph", "operator"]
volumeMounts:
- mountPath: /var/lib/rook
name: rook-config
- mountPath: /etc/ceph
name: default-config-dir
env:
# If the operator should only watch for cluster CRDs in the same namespace, set this to "true".
# If this is not set to true, the operator will watch for cluster CRDs in all namespaces.
- name: ROOK_CURRENT_NAMESPACE_ONLY
value: "false"
# To disable RBAC, uncomment the following:
# - name: RBAC_ENABLED
# value: "false"
# Rook Agent toleration. Will tolerate all taints with all keys.
# Choose between NoSchedule, PreferNoSchedule and NoExecute:
# - name: AGENT_TOLERATION
# value: "NoSchedule"
# (Optional) Rook Agent toleration key. Set this to the key of the taint you want to tolerate
# - name: AGENT_TOLERATION_KEY
# value: "<KeyOfTheTaintToTolerate>"
# (Optional) Rook Agent tolerations list. Put here list of taints you want to tolerate in YAML format.
# - name: AGENT_TOLERATIONS
# value: |
# - effect: NoSchedule
# key: node-role.kubernetes.io/controlplane
# operator: Exists
# - effect: NoExecute
# key: node-role.kubernetes.io/etcd
# operator: Exists
# (Optional) Rook Agent priority class name to set on the pod(s)
# - name: AGENT_PRIORITY_CLASS_NAME
# value: "<PriorityClassName>"
# (Optional) Rook Agent NodeAffinity.
# - name: AGENT_NODE_AFFINITY
# value: "role=storage-node; storage=rook,ceph"
# (Optional) Rook Agent mount security mode. Can be `Any` or `Restricted`.
# `Any` uses Ceph admin credentials by default/fallback.
# For using `Restricted` you must have a Ceph secret in each namespace storage should be consumed from and
# set `mountUser` to the Ceph user, `mountSecret` to the Kubernetes secret name.
# to the namespace in which the `mountSecret` Kubernetes secret namespace.
# - name: AGENT_MOUNT_SECURITY_MODE
# value: "Any"
# Set the path where the Rook agent can find the flex volumes
# - name: FLEXVOLUME_DIR_PATH
# value: "<PathToFlexVolumes>"
# Set the path where kernel modules can be found
# - name: LIB_MODULES_DIR_PATH
# value: "<PathToLibModules>"
# Mount any extra directories into the agent container
# - name: AGENT_MOUNTS
# value: "somemount=/host/path:/container/path,someothermount=/host/path2:/container/path2"
# Rook Discover toleration. Will tolerate all taints with all keys.
# Choose between NoSchedule, PreferNoSchedule and NoExecute:
# - name: DISCOVER_TOLERATION
# value: "NoSchedule"
# (Optional) Rook Discover toleration key. Set this to the key of the taint you want to tolerate
# - name: DISCOVER_TOLERATION_KEY
# value: "<KeyOfTheTaintToTolerate>"
# (Optional) Rook Discover tolerations list. Put here list of taints you want to tolerate in YAML format.
# - name: DISCOVER_TOLERATIONS
# value: |
# - effect: NoSchedule
# key: node-role.kubernetes.io/controlplane
# operator: Exists
# - effect: NoExecute
# key: node-role.kubernetes.io/etcd
# operator: Exists
# (Optional) Rook Discover priority class name to set on the pod(s)
# - name: DISCOVER_PRIORITY_CLASS_NAME
# value: "<PriorityClassName>"
# (Optional) Discover Agent NodeAffinity.
# - name: DISCOVER_AGENT_NODE_AFFINITY
# value: "role=storage-node; storage=rook, ceph"
# (Optional) Discover Agent Pod Labels.
# - name: DISCOVER_AGENT_POD_LABELS
# value: "key1=value1,key2=value2"
# Allow rook to create multiple file systems. Note: This is considered
# an experimental feature in Ceph as described at
# http://docs.ceph.com/docs/master/cephfs/experimental-features/#multiple-filesystems-within-a-ceph-cluster
# which might cause mons to crash as seen in https://github.com/rook/rook/issues/1027
- name: ROOK_ALLOW_MULTIPLE_FILESYSTEMS
value: "false"
# The logging level for the operator: INFO | DEBUG
- name: ROOK_LOG_LEVEL
value: "INFO"
# The duration between discovering devices in the rook-discover daemonset.
- name: ROOK_DISCOVER_DEVICES_INTERVAL
value: "60m"
# Whether to start pods as privileged that mount a host path, which includes the Ceph mon and osd pods.
# Set this to true if SELinux is enabled (e.g. OpenShift) to workaround the anyuid issues.
# For more details see https://github.com/rook/rook/issues/1314#issuecomment-355799641
- name: ROOK_HOSTPATH_REQUIRES_PRIVILEGED
value: "false"
# In some situations SELinux relabelling breaks (times out) on large filesystems, and doesn't work with cephfs ReadWriteMany volumes (last relabel wins).
# Disable it here if you have similar issues.
# For more details see https://github.com/rook/rook/issues/2417
- name: ROOK_ENABLE_SELINUX_RELABELING
value: "true"
# In large volumes it will take some time to chown all the files. Disable it here if you have performance issues.
# For more details see https://github.com/rook/rook/issues/2254
- name: ROOK_ENABLE_FSGROUP
value: "true"
# Disable automatic orchestration when new devices are discovered
- name: ROOK_DISABLE_DEVICE_HOTPLUG
value: "false"
# Provide customised regex as the values using comma. For eg. regex for rbd based volume, value will be like "(?i)rbd[0-9]+".
# In case of more than one regex, use comma to separate between them.
# Default regex will be "(?i)dm-[0-9]+,(?i)rbd[0-9]+,(?i)nbd[0-9]+"
# Add regex expression after putting a comma to blacklist a disk
# If value is empty, the default regex will be used.
- name: DISCOVER_DAEMON_UDEV_BLACKLIST
value: "(?i)dm-[0-9]+,(?i)rbd[0-9]+,(?i)nbd[0-9]+"
# Whether to enable the flex driver. By default it is enabled and is fully supported, but will be deprecated in some future release
# in favor of the CSI driver.
- name: ROOK_ENABLE_FLEX_DRIVER
value: "false"
# Whether to start the discovery daemon to watch for raw storage devices on nodes in the cluster.
# This daemon does not need to run if you are only going to create your OSDs based on StorageClassDeviceSets with PVCs.
- name: ROOK_ENABLE_DISCOVERY_DAEMON
value: "false"
# Time to wait until the node controller will move Rook pods to other
# nodes after detecting an unreachable node.
# Pods affected by this setting are:
# mgr, rbd, mds, rgw, nfs, PVC based mons and osds, and ceph toolbox
# The value used in this variable replaces the default value of 300 secs
# added automatically by k8s as Toleration for
# <node.kubernetes.io/unreachable>
# The total amount of time to reschedule Rook pods in healthy nodes
# before detecting a <not ready node> condition will be the sum of:
# --> node-monitor-grace-period: 40 seconds (k8s kube-controller-manager flag)
# --> ROOK_UNREACHABLE_NODE_TOLERATION_SECONDS: 5 seconds
- name: ROOK_UNREACHABLE_NODE_TOLERATION_SECONDS
value: "5"
# The name of the node to pass with the downward API
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
# The pod name to pass with the downward API
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
# The pod namespace to pass with the downward API
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
# Uncomment it to run lib bucket provisioner in multithreaded mode
#- name: LIB_BUCKET_PROVISIONER_THREADS
# value: "5"
# Uncomment it to run rook operator on the host network
#hostNetwork: true
volumes:
- name: rook-config
emptyDir: {}
- name: default-config-dir
emptyDir: {}
# OLM: END OPERATOR DEPLOYMENT
Explanation: the manifests above create the supporting resources (such as the serviceaccounts), and rook-ceph-operator then creates rook-ceph-agent and rook-discover on every node.
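One setting in operator.yaml worth understanding is DISCOVER_DAEMON_UDEV_BLACKLIST: it is a comma-separated list of regular expressions for device names the discover daemon should ignore. Its effect can be checked locally (a sketch; the matching here approximates what the daemon does):

```python
import re

# Default value of DISCOVER_DAEMON_UDEV_BLACKLIST from operator.yaml above;
# a comma-separated list of regexes for devices the discover daemon skips.
blacklist = "(?i)dm-[0-9]+,(?i)rbd[0-9]+,(?i)nbd[0-9]+"
patterns = [re.compile(p) for p in blacklist.split(",")]

def is_blacklisted(device: str) -> bool:
    """True if the device name matches any blacklist regex."""
    return any(p.match(device) for p in patterns)

# sdb (our OSD disk) stays eligible; mapper/rbd/nbd devices are skipped,
# case-insensitively because of the (?i) flag
for dev in ["sdb", "dm-0", "rbd0", "NBD3"]:
    print(dev, "skipped" if is_blacklisted(dev) else "eligible")
```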
2.4 Configure the cluster
[root@k8smaster01 ceph]# vi cluster.yaml
#################################################################################################################
# Define the settings for the rook-ceph cluster with common settings for a production cluster.
# All nodes with available raw devices will be used for the Ceph cluster. At least three nodes are required
# in this example. See the documentation for more details on storage settings available.
# For example, to create the cluster:
# kubectl create -f crds.yaml -f common.yaml -f operator.yaml
# kubectl create -f cluster.yaml
#################################################################################################################
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
name: rook-ceph
namespace: rook-ceph # namespace:cluster
spec:
cephVersion:
# The container image used to launch the Ceph daemon pods (mon, mgr, osd, mds, rgw).
# v13 is mimic, v14 is nautilus, and v15 is octopus.
# RECOMMENDATION: In production, use a specific version tag instead of the general v14 flag, which pulls the latest release and could result in different
# versions running within the cluster. See tags available at https://hub.docker.com/r/ceph/ceph/tags/.
# If you want to be more precise, you can always use a timestamp tag such as ceph/ceph:v15.2.11-20200419
# This tag might not contain a new Ceph version, just security fixes from the underlying operating system, which will reduce vulnerabilities
image: registry.cn-hangzhou.aliyuncs.com/haoyustorage/ceph:v15.2.11
# Whether to allow unsupported versions of Ceph. Currently `nautilus` and `octopus` are supported.
# Future versions such as `pacific` would require this to be set to `true`.
# Do not set to true in production.
allowUnsupported: false
# The path on the host where configuration files will be persisted. Must be specified.
# Important: if you reinstall the cluster, make sure you delete this directory from each host or else the mons will fail to start on the new cluster.
# In Minikube, the '/data' directory is configured to persist across reboots. Use "/data/rook" in Minikube environment.
dataDirHostPath: /var/lib/rook
# Whether or not upgrade should continue even if a check fails
# This means Ceph's status could be degraded and we don't recommend upgrading but you might decide otherwise
# Use at your OWN risk
# To understand Rook's upgrade process of Ceph, read https://rook.io/docs/rook/master/ceph-upgrade.html#ceph-version-upgrades
skipUpgradeChecks: false
# Whether or not continue if PGs are not clean during an upgrade
continueUpgradeAfterChecksEvenIfNotHealthy: false
# WaitTimeoutForHealthyOSDInMinutes defines the time (in minutes) the operator would wait before an OSD can be stopped for upgrade or restart.
# If the timeout exceeds and OSD is not ok to stop, then the operator would skip upgrade for the current OSD and proceed with the next one
# if `continueUpgradeAfterChecksEvenIfNotHealthy` is `false`. If `continueUpgradeAfterChecksEvenIfNotHealthy` is `true`, then the operator would
# continue with the upgrade of an OSD even if its not ok to stop after the timeout. This timeout won't be applied if `skipUpgradeChecks` is `true`.
# The default wait timeout is 10 minutes.
waitTimeoutForHealthyOSDInMinutes: 10
mon:
count: 3
allowMultiplePerNode: false
mgr:
modules:
- name: pg_autoscaler
enabled: true
# enable the ceph dashboard for viewing cluster status
dashboard:
enabled: true
# serve the dashboard under a subpath (useful when you are accessing the dashboard via a reverse proxy)
# urlPrefix: /ceph-dashboard
# serve the dashboard at the given port.
# port: 8443
# serve the dashboard using SSL
ssl: true
# enable prometheus alerting for cluster
monitoring:
# requires Prometheus to be pre-installed
enabled: false
# namespace to deploy prometheusRule in. If empty, namespace of the cluster will be used.
# Recommended:
# If you have a single rook-ceph cluster, set the rulesNamespace to the same namespace as the cluster or keep it empty.
# If you have multiple rook-ceph clusters in the same k8s cluster, choose the same namespace (ideally, namespace with prometheus
# deployed) to set rulesNamespace for all the clusters. Otherwise, you will get duplicate alerts with multiple alert definitions.
rulesNamespace: rook-ceph
network:
# enable host networking
#provider: host
# EXPERIMENTAL: enable the Multus network provider
#provider: multus
#selectors:
# The selector keys are required to be `public` and `cluster`.
# Based on the configuration, the operator will do the following:
# 1. if only the `public` selector key is specified both public_network and cluster_network Ceph settings will listen on that interface
# 2. if both `public` and `cluster` selector keys are specified the first one will point to 'public_network' flag and the second one to 'cluster_network'
#
# In order to work, each selector value must match a NetworkAttachmentDefinition object in Multus
#
#public: public-conf --> NetworkAttachmentDefinition object name in Multus
#cluster: cluster-conf --> NetworkAttachmentDefinition object name in Multus
# Provide internet protocol version. IPv6, IPv4 or empty string are valid options. Empty string would mean IPv4
#ipFamily: "IPv6"
# enable the crash collector for ceph daemon crash collection
crashCollector:
disable: false
# enable log collector, daemons will log on files and rotate
# logCollector:
# enabled: true
# periodicity: 24h # SUFFIX may be 'h' for hours or 'd' for days.
# automate [data cleanup process](https://github.com/rook/rook/blob/master/Documentation/ceph-teardown.md#delete-the-data-on-hosts) in cluster destruction.
cleanupPolicy:
# Since cluster cleanup is destructive to data, confirmation is required.
# To destroy all Rook data on hosts during uninstall, confirmation must be set to "yes-really-destroy-data".
# This value should only be set when the cluster is about to be deleted. After the confirmation is set,
# Rook will immediately stop configuring the cluster and only wait for the delete command.
# If the empty string is set, Rook will not destroy any data on hosts during uninstall.
confirmation: ""
# sanitizeDisks represents settings for sanitizing OSD disks on cluster deletion
# To control where various services will be scheduled by kubernetes, use the placement configuration sections below.
# The example under 'all' would have all services scheduled on kubernetes nodes labeled with 'role=storage-node' and
# tolerate taints with a key of 'storage-node'.
placement: # use node affinity to pin components to the designated storage nodes
# all:
# nodeAffinity:
# requiredDuringSchedulingIgnoredDuringExecution:
# nodeSelectorTerms:
# - matchExpressions:
# - key: role
# operator: In
# values:
# - storage-node
# tolerations:
# - key: storage-node
# operator: Exists
mon:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: ceph-mon
operator: In
values:
- enabled
tolerations:
- key: ceph-mon
operator: Exists
osd:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: ceph-osd
operator: In
values:
- enabled
tolerations:
- key: ceph-osd
operator: Exists
mgr:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: ceph-mgr
operator: In
values:
- enabled
tolerations:
- key: ceph-mgr
operator: Exists
annotations:
resources:
# placement:
# all:
# nodeAffinity:
# requiredDuringSchedulingIgnoredDuringExecution:
# nodeSelectorTerms:
# - matchExpressions:
# - key: role
# operator: In
# values:
# - storage-node
# podAffinity:
# podAntiAffinity:
# topologySpreadConstraints:
# tolerations:
# - key: storage-node
# operator: Exists
# The above placement information can also be specified for mon, osd, and mgr components
# mon:
# Monitor deployments may contain an anti-affinity rule for avoiding monitor
# collocation on the same node. This is a required rule when host network is used
# or when AllowMultiplePerNode is false. Otherwise this anti-affinity rule is a
# preferred rule with weight: 50.
# osd:
# mgr:
# cleanup:
# all:
# mon:
# osd:
# cleanup:
# prepareosd:
# If no mgr annotations are set, prometheus scrape annotations will be set by default.
# mon:
# osd:
# cleanup:
# mgr:
# prepareosd:
# The requests and limits set here, allow the mgr pod to use half of one CPU core and 1 gigabyte of memory
# mgr:
# limits:
# cpu: "500m"
# memory: "1024Mi"
# requests:
# cpu: "500m"
# memory: "1024Mi"
# The above example requests/limits can also be added to the mon and osd components
# mon:
# osd:
# prepareosd:
# crashcollector:
# logcollector:
# cleanup:
# The option to automatically remove OSDs that are out and are safe to destroy.
removeOSDsIfOutAndSafeToRemove: false
# priorityClassNames:
# all: rook-ceph-default-priority-class
# mon: rook-ceph-mon-priority-class
# osd: rook-ceph-osd-priority-class
# mgr: rook-ceph-mgr-priority-class
storage:
useAllNodes: false # do not use every node in the cluster
useAllDevices: false # do not use every device on the nodes
deviceFilter: sdb
config:
metadataDevice:
databaseSizeMB: "1024"
journalSizeMB: "1024"
nodes:
- name: "centos8-master01" #指定存储节点主机
config:
storeType: bluestore #指定类型为裸磁盘
devices:
- name: "sdb" #指定磁盘为sdb
- name: "centos8-master02"
config:
storeType: bluestore
devices:
- name: "sdb"
- name: "centos8-master03"
config:
storeType: bluestore
devices:
- name: "sdb"
# crushRoot: "custom-root" # specify a non-default root label for the CRUSH map
# metadataDevice: "md0" # specify a non-rotational storage so ceph-volume will use it as block db device of bluestore.
# databaseSizeMB: "1024" # uncomment if the disks are smaller than 100 GB
# journalSizeMB: "1024" # uncomment if the disks are 20 GB or smaller
# osdsPerDevice: "1" # this value can be overridden at the node or device level
# encryptedDevice: "true" # the default value for this option is "false"
# Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
# nodes below will be used as storage resources. Each node's 'name' field should match their 'kubernetes.io/hostname' label.
# nodes:
# - name: "172.17.4.201"
# devices: # specific devices to use for storage can be specified for each node
# - name: "sdb"
# - name: "nvme01" # multiple osds can be created on high performance devices
# config:
# osdsPerDevice: "5"
# - name: "/dev/disk/by-id/ata-ST4000DM004-XXXX" # devices can be specified using full udev paths
# config: # configuration can be specified at the node level which overrides the cluster level config
# storeType: filestore
# - name: "172.17.4.301"
# deviceFilter: "^sd."
# The section for configuring management of daemon disruptions during upgrade or fencing.
disruptionManagement:
# If true, the operator will create and manage PodDisruptionBudgets for OSD, Mon, RGW, and MDS daemons. OSD PDBs are managed dynamically
# via the strategy outlined in the [design](https://github.com/rook/rook/blob/master/design/ceph/ceph-managed-disruptionbudgets.md). The operator will
# block eviction of OSDs by default and unblock them safely when drains are detected.
managePodBudgets: false
# A duration in minutes that determines how long an entire failureDomain like `region/zone/host` will be held in `noout` (in addition to the
# default DOWN/OUT interval) when it is draining. This is only relevant when `managePodBudgets` is `true`. The default value is `30` minutes.
osdMaintenanceTimeout: 30
# A duration in minutes that the operator will wait for the placement groups to become healthy (active+clean) after a drain was completed and OSDs came back up.
# Operator will continue with the next drain if the timeout exceeds. It only works if `managePodBudgets` is `true`.
# No values or 0 means that the operator will wait until the placement groups are healthy before unblocking the next drain.
pgHealthCheckTimeout: 0
# If true, the operator will create and manage MachineDisruptionBudgets to ensure OSDs are only fenced when the cluster is healthy.
# Only available on OpenShift.
manageMachineDisruptionBudgets: false
# Namespace in which to watch for the MachineDisruptionBudgets.
machineDisruptionBudgetNamespace: openshift-machine-api
# healthChecks
# Valid values for daemons are 'mon', 'osd', 'status'
healthCheck:
daemonHealth:
mon:
disabled: false
interval: 45s
osd:
disabled: false
interval: 60s
status:
disabled: false
interval: 60s
# Change pod liveness probe, it works for all mon,mgr,osd daemons
livenessProbe:
mon:
disabled: false
mgr:
disabled: false
osd:
disabled: false
Tip: for more cluster CRD options, see https://github.com/rook/rook/blob/master/Documentation/ceph-cluster-crd.md and https://blog.gmem.cc/rook-based-k8s-storage-solution.
2.5 Deploy the cluster
[root@k8smaster01 ceph]# kubectl create -f cluster.yaml
[root@k8smaster01 ceph]# kubectl logs -f -n rook-ceph rook-ceph-operator-cb47c46bc-pszfh    # follow the deployment log
[root@k8smaster01 ceph]# kubectl get pods -n rook-ceph -o wide    # this takes a while; some intermediate containers may restart before settling
[root@centos8-master01 ceph]# kubectl get pod -n rook-ceph
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-5fpwr 3/3 Running 0 168m
csi-cephfsplugin-7zpd6 3/3 Running 0 168m
csi-cephfsplugin-cp27r 3/3 Running 0 168m
csi-cephfsplugin-nfl9t 3/3 Running 2 168m
csi-cephfsplugin-provisioner-f57576c9f-6k7dd 6/6 Running 5 168m
csi-cephfsplugin-provisioner-f57576c9f-9td6k 6/6 Running 0 168m
csi-cephfsplugin-qf7kw 3/3 Running 0 168m
csi-cephfsplugin-srf8k 3/3 Running 0 168m
csi-rbdplugin-5srf4 3/3 Running 0 168m
csi-rbdplugin-br5gm 3/3 Running 0 168m
csi-rbdplugin-dnmnl 3/3 Running 0 168m
csi-rbdplugin-gbpf4 3/3 Running 0 168m
csi-rbdplugin-n5bm9 3/3 Running 0 168m
csi-rbdplugin-provisioner-8557f6cd8-7t4nb 6/6 Running 5 168m
csi-rbdplugin-provisioner-8557f6cd8-pnn6f 6/6 Running 0 168m
csi-rbdplugin-rj6j5 3/3 Running 2 168m
rook-ceph-crashcollector-centos8-master01-64f57f48b8-xfgpb 1/1 Running 0 166m
rook-ceph-crashcollector-centos8-master02-778cf6d7f6-wpnxv 1/1 Running 0 166m
rook-ceph-crashcollector-centos8-master03-56476849b7-hp5m8 1/1 Running 0 166m
rook-ceph-mgr-a-f5d8d9fc8-5plv9 1/1 Running 0 166m
rook-ceph-mon-a-66c78577df-jsj2m 1/1 Running 0 168m
rook-ceph-mon-b-5cb6bdbb-b7g4r 1/1 Running 0 167m
rook-ceph-mon-c-78c5889d7-hwpxk 1/1 Running 0 166m
rook-ceph-operator-7968ff9886-tszjt 1/1 Running 0 169m
rook-ceph-osd-0-7dcddcbc6b-6m2hc 1/1 Running 0 166m
rook-ceph-osd-1-54b47bf778-zlftn 1/1 Running 0 166m
rook-ceph-osd-2-56c5b9c48b-jcbcp 1/1 Running 0 166m
rook-ceph-osd-prepare-centos8-master01-q9v7d 0/1 Completed 0 95m
rook-ceph-osd-prepare-centos8-master02-kqlqz 0/1 Completed 0 95m
rook-ceph-osd-prepare-centos8-master03-jxdk8 0/1 Completed 0 94m
Tip: if the deployment fails, remove the resources from a master node:
[root@k8smaster01 ceph]# kubectl delete -f ./
Then run the following cleanup on all master nodes:
rm -rf /var/lib/rook
ls /dev/mapper/ceph-*    # check for leftover ceph device-mapper entries
dmsetup ls
dmsetup remove_all
dd if=/dev/zero of=/dev/sdb bs=512k count=1
wipefs -af /dev/sdb
2.7 Deploy the toolbox
The toolbox is a Rook utility container whose commands are used to debug and test Rook; ad-hoc Ceph operations are normally run from inside it.
[root@centos8-master01 ceph]# kubectl create -f toolbox.yaml
[root@centos8-master01 ceph]# kubectl -n rook-ceph get pod -l "app=rook-ceph-tools"
NAME READY STATUS RESTARTS AGE
rook-ceph-tools-8574b74c5d-25bp9 1/1 Running 0 143m
2.8 Test Rook
[root@centos8-master01 ceph]# kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
[root@rook-ceph-tools-8574b74c5d-25bp9 /]# ceph status
cluster:
id: 0b5933ca-7a97-4176-b17f-9a07aa19560b
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 98m)
mgr: a(active, since 2h)
osd: 3 osds: 3 up (since 2h), 3 in (since 2h)
data:
pools: 2 pools, 33 pgs
objects: 5 objects, 19 B
usage: 3.0 GiB used, 897 GiB / 900 GiB avail
pgs: 33 active+clean
[root@rook-ceph-tools-8574b74c5d-25bp9 /]# ceph osd status
ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 centos8-master02 1027M 298G 0 0 0 0 exists,up
1 centos8-master03 1027M 298G 0 0 0 0 exists,up
2 centos8-master01 1027M 298G 0 0 0 0 exists,up
[root@rook-ceph-tools-8574b74c5d-25bp9 /]# ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 900 GiB 897 GiB 11 MiB 3.0 GiB 0.33
TOTAL 900 GiB 897 GiB 11 MiB 3.0 GiB 0.33
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
device_health_metrics 1 1 0 B 0 0 B 0 284 GiB
replicapool 2 32 19 B 5 192 KiB 0 284 GiB
[root@rook-ceph-tools-8574b74c5d-25bp9 /]# rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
device_health_metrics 0 B 0 0 0 0 0 0 0 0 B 0 0 B 0 B 0 B
replicapool 192 KiB 5 0 15 0 0 0 1214 8.8 MiB 220 1.6 MiB 0 B 0 B
total_objects 5
total_used 3.0 GiB
total_avail 897 GiB
total_space 900 GiB
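The relationship between the raw and per-pool numbers above can be sanity-checked with a little shell arithmetic. This is only a sketch: the 897 GiB figure is taken from the `ceph df` output above, and the replica count from the pool spec.

```shell
# With 3-way replication, usable capacity is roughly raw AVAIL / 3.
# MAX AVAIL in `ceph df` (284 GiB) is lower still, because Ceph also
# reserves headroom for the full ratio and for OSD imbalance.
raw_avail_gib=897   # AVAIL from the RAW STORAGE table
replicas=3          # size: 3 in the replicated pool spec
echo $((raw_avail_gib / replicas))   # -> 299
```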
[root@rook-ceph-tools-8574b74c5d-25bp9 /]# ceph auth ls
installed auth entries:
osd.0
key: AQAfvIxhHjXgLRAA1oy7wvyANwp0EE9wm5Q+cQ==
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
osd.1
key: AQAgvIxhF9MVABAAN5XEgvloNL5SCFUwjcL99g==
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
osd.2
key: AQAivIxh7oI7MRAAq1QMd4Pnkc1n93mcS8ibzw==
caps: [mgr] allow profile osd
caps: [mon] allow profile osd
caps: [osd] allow *
client.admin
key: AQCCu4xhsRX/AhAA9j6pgi7ZxSmcOz9g1bnQXA==
caps: [mds] allow *
caps: [mgr] allow *
caps: [mon] allow *
caps: [osd] allow *
client.bootstrap-mds
key: AQDvu4xhLtqcOhAARwiC7ZGStBjajADg3d/dTQ==
caps: [mon] allow profile bootstrap-mds
client.bootstrap-mgr
key: AQDvu4xhd+KcOhAA0x/8kUkT7ibC+6VfXszGSw==
caps: [mon] allow profile bootstrap-mgr
client.bootstrap-osd
key: AQDvu4xh4emcOhAAUnKWOm+ZFZ9wL24JyFPtrA==
caps: [mon] allow profile bootstrap-osd
client.bootstrap-rbd
key: AQDvu4xhbfKcOhAAi+7TE6qVR5c5PDZJVwLBRg==
caps: [mon] allow profile bootstrap-rbd
client.bootstrap-rbd-mirror
key: AQDvu4xhbvqcOhAA78ZXb46BrBfL6A7xRLZuDw==
caps: [mon] allow profile bootstrap-rbd-mirror
client.bootstrap-rgw
key: AQDvu4xhigKdOhAA4IEL9YbcPTmb2kbCPYsSOw==
caps: [mon] allow profile bootstrap-rgw
client.crash
key: AQAUvIxhhvI3DhAAx+p9+90bnD7HNp8/ilGLSg==
caps: [mgr] allow profile crash
caps: [mon] allow profile crash
client.csi-cephfs-node
key: AQATvIxh6NPyNxAAZopNxf/8vkU4raGaRl5B1g==
caps: [mds] allow rw
caps: [mgr] allow rw
caps: [mon] allow r
caps: [osd] allow rw tag cephfs *=*
client.csi-cephfs-provisioner
key: AQATvIxheVlIKxAAI+pnqJaqu9XXdTEezaHL9g==
caps: [mgr] allow rw
caps: [mon] allow r
caps: [osd] allow rw tag cephfs metadata=*
client.csi-rbd-node
key: AQATvIxhOer9HRAAWwnDk/dmahHyWZbGWSbWjg==
caps: [mgr] allow rw
caps: [mon] profile rbd
caps: [osd] profile rbd
client.csi-rbd-provisioner
key: AQATvIxhI1kPEBAA2KRxn7qIdb92z/GSUruZxw==
caps: [mgr] allow rw
caps: [mon] profile rbd
caps: [osd] profile rbd
mgr.a
key: AQAUvIxhEmq6KhAA2cOgyM4GyoysKPZfqDOlsw==
caps: [mds] allow *
caps: [mon] allow profile mgr
caps: [osd] allow *
[root@rook-ceph-tools-8574b74c5d-25bp9 /]# ceph version
ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)
Tip: for more on Ceph administration see《008.RHCS-管理Ceph存储集群》. The toolbox also accepts standalone ceph commands, e.g. ceph osd pool create ceph-test 512 to create a pool. In a Rook-managed Kubernetes cluster, however, operating directly on the underlying Ceph is discouraged, as Kubernetes may end up with an inconsistent view of the data.
2.10 Copy the key and config
For convenient management, copy Ceph's keyring and config to the master node as well, allowing simple inspection of the Rook Ceph cluster from the host, outside Kubernetes.
[root@k8smaster01 ~]# kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') cat /etc/ceph/ceph.conf > /etc/ceph/ceph.conf
[root@k8smaster01 ~]# kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') cat /etc/ceph/keyring > /etc/ceph/keyring
[root@k8smaster01 ceph]# tee /etc/yum.repos.d/ceph.repo <<-'EOF'
[Ceph]
name=Ceph packages for $basearch
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/$basearch
enabled=1
gpgcheck=0
type=rpm-md
gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc
priority=1
[Ceph-noarch]
name=Ceph noarch packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch
enabled=1
gpgcheck=0
type=rpm-md
gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc
priority=1
[ceph-source]
name=Ceph source packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/SRPMS
enabled=1
gpgcheck=0
type=rpm-md
gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc
priority=1
EOF
[root@k8smaster01 ceph]# yum -y install ceph-common ceph-fuse    # install the Ceph client
[root@k8smaster01 ~]# ceph status
Tip: the repo release (rpm-nautilus above) should match the Ceph version seen in 2.8; since 2.8 reports 15.2.11 octopus, rpm-octopus would be the matching choice. On a Rook-managed cluster, do not administer Ceph directly with the ceph command (it risks inconsistencies); restrict it to simple read-only checks and manage the cluster as shown from section 三 onward.
三 Ceph Block Storage
3.1 Create a StorageClass
Before provisioning block storage, create a StorageClass and a storage pool. Kubernetes needs both in order to interact with Rook and provision persistent volumes (PVs).
[root@k8smaster01 ceph]# kubectl create -f csi/rbd/storageclass.yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
name: replicapool
namespace: rook-ceph
spec:
failureDomain: host
replicated:
size: 3
requireSafeReplicaSize: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
clusterID: rook-ceph # namespace:cluster
pool: replicapool
imageFormat: "2"
imageFeatures: layering
csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph # namespace:cluster
csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph # namespace:cluster
csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph # namespace:cluster
csi.storage.k8s.io/fstype: ext4
allowVolumeExpansion: true
reclaimPolicy: Delete
[root@centos8-master01 rbd]# kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
rook-ceph-block rook-ceph.rbd.csi.ceph.com Delete Immediate true 107m
3.2 Create a PVC
[root@centos8-master01 rbd]# cat pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: block-pvc
spec:
storageClassName: rook-ceph-block
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 200Mi
[root@centos8-master01 rbd]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
block-pvc Bound pvc-6328beff-bfe6-4a26-be53-4c1ffc4c9bb3 200Mi RWO rook-ceph-block 8s
[root@centos8-master01 rbd]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-6328beff-bfe6-4a26-be53-4c1ffc4c9bb3 200Mi RWO Delete Bound default/block-pvc rook-ceph-block 9s
Note: the PVC above specifies storageClassName: rook-ceph-block, the StorageClass backed by the Rook Ceph cluster.
3.3 Consume the block device
[root@centos8-master01 rbd]# cat podo1.yaml
apiVersion: v1
kind: Pod
metadata:
name: rookpod01
spec:
restartPolicy: OnFailure
containers:
- name: test-container
image: busybox
volumeMounts:
- name: block-pvc
mountPath: /var/test
command: ['sh', '-c', 'echo "Hello World" > /var/test/data; exit 0']
volumes:
- name: block-pvc
persistentVolumeClaim:
claimName: block-pvc
[root@centos8-master01 rbd]# kubectl apply -f podo1.yaml
pod/rookpod01 created
[root@centos8-master01 rbd]# kubectl get pod
NAME READY STATUS RESTARTS AGE
rookpod01 0/1 ContainerCreating 0 9s
[root@centos8-master01 rbd]# kubectl get pod
NAME READY STATUS RESTARTS AGE
rookpod01 0/1 Completed 0 81s
Note: the Pod above mounts the PVC created in 3.2 and writes a file to it; wait for it to reach Completed.
3.4 Test persistence
[root@centos8-master01 rbd]# kubectl delete -f podo1.yaml
pod "rookpod01" deleted
[root@centos8-master01 rbd]# cat pod02.yaml
apiVersion: v1
kind: Pod
metadata:
name: rookpod02
spec:
restartPolicy: OnFailure
containers:
- name: test-container
image: busybox
volumeMounts:
- name: block-pvc
mountPath: /var/test
command: ['sh', '-c', 'cat /var/test/data; exit 0']
volumes:
- name: block-pvc
persistentVolumeClaim:
claimName: block-pvc
[root@k8smaster01 ceph]# kubectl create -f pod02.yaml
[root@k8smaster01 ceph]# kubectl logs rookpod02 test-container
Hello World
Note: rookpod02 reuses the same PVC: the "Hello World" written by rookpod01 is read back even though that pod was deleted, confirming the data persisted.
Tip: for more on Ceph block devices see《003.RHCS-RBD块存储使用》.
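When debugging block storage it can help to know which RBD image backs a given PV. A small helper sketch follows; the `imageName` volume attribute is what the ceph-csi driver records in the PV spec, but verify the attribute name on your ceph-csi version.

```shell
# Hypothetical helper: print the rbd image name backing a CSI-provisioned
# PV. ceph-csi stores it under spec.csi.volumeAttributes.imageName.
# Nothing contacts the cluster until you call the function yourself.
pv_rbd_image() {
  # usage: pv_rbd_image pvc-6328beff-bfe6-4a26-be53-4c1ffc4c9bb3
  kubectl get pv "$1" -o jsonpath='{.spec.csi.volumeAttributes.imageName}'
}
# The image can then be inspected from the toolbox, e.g.:
#   rbd -p replicapool info <image-name>
```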
四 Ceph Object Storage
4.1 Create a CephObjectStore
Before object storage can be provided, the supporting resources must be created; the official default yaml below deploys a CephObjectStore.
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
name: my-store
namespace: rook-ceph # namespace:cluster
spec:
# The pool spec used to create the metadata pools. Must use replication.
metadataPool:
failureDomain: host
replicated:
size: 3
# Disallow setting pool with replica 1, this could lead to data loss without recovery.
# Make sure you're *ABSOLUTELY CERTAIN* that is what you want
requireSafeReplicaSize: true
parameters:
# Inline compression mode for the data pool
# Further reference: https://docs.ceph.com/docs/nautilus/rados/configuration/bluestore-config-ref/#inline-compression
compression_mode: none
# gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool
# for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size
#target_size_ratio: ".5"
# The pool spec used to create the data pool. Can use replication or erasure coding.
dataPool:
failureDomain: host
replicated:
size: 3
# Disallow setting pool with replica 1, this could lead to data loss without recovery.
# Make sure you're *ABSOLUTELY CERTAIN* that is what you want
requireSafeReplicaSize: true
parameters:
# Inline compression mode for the data pool
# Further reference: https://docs.ceph.com/docs/nautilus/rados/configuration/bluestore-config-ref/#inline-compression
compression_mode: none
# gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool
# for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size
#target_size_ratio: ".5"
# Whether to preserve metadata and data pools on object store deletion
preservePoolsOnDelete: false
# The gateway service configuration
gateway:
# type of the gateway (s3)
type: s3
# A reference to the secret in the rook namespace where the ssl certificate is stored
sslCertificateRef:
# The port that RGW pods will listen on (http)
port: 80
# The port that RGW pods will listen on (https). An ssl certificate is required.
# securePort: 443
# The number of pods in the rgw deployment
instances: 1
# The affinity rules to apply to the rgw deployment or daemonset.
placement:
# nodeAffinity:
# requiredDuringSchedulingIgnoredDuringExecution:
# nodeSelectorTerms:
# - matchExpressions:
# - key: role
# operator: In
# values:
# - rgw-node
# topologySpreadConstraints:
# tolerations:
# - key: rgw-node
# operator: Exists
# podAffinity:
# podAntiAffinity:
# A key/value list of annotations
annotations:
# key: value
# A key/value list of labels
labels:
# key: value
resources:
# The requests and limits set here, allow the object store gateway Pod(s) to use half of one CPU core and 1 gigabyte of memory
# limits:
# cpu: "500m"
# memory: "1024Mi"
# requests:
# cpu: "500m"
# memory: "1024Mi"
# priorityClassName: my-priority-class
#zone:
#name: zone-a
# service endpoint healthcheck
healthCheck:
bucket:
disabled: false
interval: 60s
# Configure the pod liveness probe for the rgw daemon
livenessProbe:
disabled: false
[root@centos8-master01 ceph]# kubectl apply -f object.yaml
cephobjectstore.ceph.rook.io/my-store created
[root@centos8-master01 ceph]# kubectl -n rook-ceph get pod -l app=rook-ceph-rgw -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
rook-ceph-rgw-my-store-a-857d775fbd-bl2x7 1/1 Running 0 39s 10.10.108.168 centos8-master02 <none> <none>
4.2 Create a StorageClass
The official default yaml below deploys the object-store StorageClass.
[root@centos8-master01 ceph]# cat storageclass-bucket-delete.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: rook-ceph-delete-bucket
provisioner: rook-ceph.ceph.rook.io/bucket # driver:namespace:cluster
# set the reclaim policy to delete the bucket and all objects
# when its OBC is deleted.
reclaimPolicy: Delete
parameters:
objectStoreName: my-store
objectStoreNamespace: rook-ceph # namespace:cluster
region: us-east-1
# To accommodate brownfield cases reference the existing bucket name here instead
# of in the ObjectBucketClaim (OBC). In this case the provisioner will grant
# access to the bucket by creating a new user, attaching it to the bucket, and
# providing the credentials via a Secret in the namespace of the requesting OBC.
#bucketName:
[root@centos8-master01 ceph]# kubectl apply -f storageclass-bucket-delete.yaml
storageclass.storage.k8s.io/rook-ceph-delete-bucket created
[root@centos8-master01 ceph]# kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
rook-ceph-block rook-ceph.rbd.csi.ceph.com Delete Immediate true 3h2m
rook-ceph-delete-bucket rook-ceph.ceph.rook.io/bucket Delete Immediate false 37s
4.3 Create a bucket
The official default yaml below creates an object-store bucket via an ObjectBucketClaim.
[root@centos8-master01 ceph]# cat object-bucket-claim-delete.yaml
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
name: ceph-delete-bucket
spec:
# To create a new bucket specify either `bucketName` or
# `generateBucketName` here. Both cannot be used. To access
# an existing bucket the bucket name needs to be defined in
# the StorageClass referenced here, and both `bucketName` and
# `generateBucketName` must be omitted in the OBC.
#bucketName:
generateBucketName: ceph-bkt
storageClassName: rook-ceph-delete-bucket
additionalConfig:
# To set for quota for OBC
#maxObjects: "1000"
#maxSize: "2G"
[root@centos8-master01 ceph]# kubectl apply -f object-bucket-claim-delete.yaml
[root@centos8-master01 ceph]# kubectl get cm
NAME DATA AGE
ceph-delete-bucket 5 41s
4.4 Set up object-store access
[root@k8smaster01 ceph]# kubectl -n default get cm ceph-delete-bucket -o yaml | grep BUCKET_HOST | awk '{print $2}'
rook-ceph-rgw-my-store.rook-ceph.svc
[root@k8smaster01 ceph]# export AWS_HOST=$(kubectl -n default get cm ceph-delete-bucket -o yaml | grep BUCKET_HOST | awk '{print $2}')
[root@k8smaster01 ceph]# export AWS_ACCESS_KEY_ID=$(kubectl -n default get secret ceph-delete-bucket -o yaml | grep AWS_ACCESS_KEY_ID | awk '{print $2}' | base64 --decode)
[root@k8smaster01 ceph]# export AWS_SECRET_ACCESS_KEY=$(kubectl -n default get secret ceph-delete-bucket -o yaml | grep AWS_SECRET_ACCESS_KEY | awk '{print $2}' | base64 --decode)
[root@k8smaster01 ceph]# s3cmd put test.txt --no-ssl --host=${AWS_HOST} --host-bucket= s3://ceph-bkt-377bf96f-aea8-4838-82bc-2cb2c16cccfb/test.txt    # test an upload to the bucket
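The grep | awk | base64 pipeline used above can be tried without a live cluster. The sketch below runs it against a canned Secret manifest; the key values are fabricated for illustration only.

```shell
# Demonstrates what the access-key pipeline extracts: the second
# whitespace-separated field of the matching line, base64-decoded.
# The manifest and keys below are made-up examples, not real credentials.
secret_yaml='apiVersion: v1
kind: Secret
data:
  AWS_ACCESS_KEY_ID: QUtJQUZBS0VLRVk=
  AWS_SECRET_ACCESS_KEY: ZmFrZXNlY3JldQ=='
access_key=$(printf '%s\n' "$secret_yaml" | grep AWS_ACCESS_KEY_ID | awk '{print $2}' | base64 --decode)
echo "$access_key"    # -> AKIAFAKEKEY
```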
Tip: for more on Rook object storage, such as creating users, see https://rook.io/docs/rook/v1.1/ceph-object.html.
五 Ceph File Storage
5.1 Create a CephFilesystem
Ceph is deployed without CephFS support by default; the official default yaml below deploys a filesystem.
[root@k8smaster01 ceph]# kubectl create -f filesystem.yaml
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
    - failureDomain: host
      replicated:
        size: 3
  preservePoolsOnDelete: true
  metadataServer:
    activeCount: 1
    activeStandby: true
    placement:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - rook-ceph-mds
          topologyKey: kubernetes.io/hostname
    annotations:
    resources:
[root@k8smaster01 ceph]# kubectl get cephfilesystems.ceph.rook.io -n rook-ceph
NAME ACTIVEMDS AGE
myfs 1 27s
5.2 Create a StorageClass
The official default yaml below deploys the CephFS StorageClass.
[root@k8smaster01 ceph]# kubectl create -f csi/cephfs/storageclass.yaml
[root@k8smaster01 ceph]# vi csi/cephfs/storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: myfs
  pool: myfs-data0
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
mountOptions:
[root@k8smaster01 ceph]# kubectl get sc
NAME PROVISIONER AGE
csi-cephfs rook-ceph.cephfs.csi.ceph.com 10m
5.3 Create a PVC
[root@k8smaster01 ceph]# vi rookpvc03.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc
spec:
  storageClassName: csi-cephfs
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Mi
[root@k8smaster01 ceph]# kubectl create -f rookpvc03.yaml
[root@k8smaster01 ceph]# kubectl get pv
[root@k8smaster01 ceph]# kubectl get pvc
5.4 Consume the PVC
[root@k8smaster01 ceph]# vi rookpod03.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: csicephfs-demo-pod
spec:
  containers:
    - name: web-server
      image: nginx
      volumeMounts:
        - name: mypvc
          mountPath: /var/lib/www/html
  volumes:
    - name: mypvc
      persistentVolumeClaim:
        claimName: cephfs-pvc
        readOnly: false
[root@k8smaster01 ceph]# kubectl create -f rookpod03.yaml
[root@k8smaster01 ceph]# kubectl get pods
NAME READY STATUS RESTARTS AGE
csicephfs-demo-pod 1/1 Running 0 24s
六 Configure the dashboard
6.1 Deploy a NodePort Service
Step 2.4 already created the dashboard, but it is exposed only through a ClusterIP. The official default yaml below exposes it externally as a NodePort service.
[root@k8smaster01 ceph]# kubectl create -f dashboard-external-https.yaml
[root@k8smaster01 ceph]# vi dashboard-external-https.yaml
apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-mgr-dashboard-external-https
  namespace: rook-ceph
  labels:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
spec:
  ports:
    - name: dashboard
      port: 8443
      protocol: TCP
      targetPort: 8443
  selector:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
  sessionAffinity: None
  type: NodePort
[root@k8smaster01 ceph]# kubectl get svc -n rook-ceph
6.2 Verify
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath='{.data.password}' | base64 --decode    # retrieve the initial password
Open https://172.24.8.71:31097 in a browser.
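The port in that URL is whatever NodePort Kubernetes assigned to the Service above. A small helper sketch to look it up, rather than reading it out of `kubectl get svc` by eye (function only; nothing runs until you call it against a live cluster):

```shell
# Hypothetical helper: print the NodePort assigned to the external
# dashboard Service created in 6.1, so the URL can be formed as
# https://<any-node-ip>:<port>.
dashboard_port() {
  kubectl -n rook-ceph get svc rook-ceph-mgr-dashboard-external-https \
    -o jsonpath='{.spec.ports[0].nodePort}'
}
```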
Account: admin; password: as retrieved above.
七 Cluster Management
7.1 Modify the configuration
The Ceph cluster's configuration parameters are generated when the Cluster is created. To adjust them after deployment:
[root@k8smaster01 ceph]# kubectl -n rook-ceph get configmap rook-config-override -o yaml    # view the parameters
[root@k8snode02 ~]# cat /var/lib/rook/rook-ceph/rook-ceph.config    # can also be inspected on any node
[root@k8smaster01 ceph]# kubectl -n rook-ceph edit configmap rook-config-override -o yaml    # edit the parameters
……
apiVersion: v1
data:
  config: |
    [global]
    osd pool default size = 2
……
Then restart the Ceph components one by one:
[root@k8smaster01 ceph]# kubectl -n rook-ceph delete pod rook-ceph-mgr-a-5699bb7984-kpxgp
[root@k8smaster01 ceph]# kubectl -n rook-ceph delete pod rook-ceph-mon-a-85698dfff9-w5l8c
[root@k8smaster01 ceph]# kubectl -n rook-ceph delete pod rook-ceph-mgr-a-d58847d5-dj62p
[root@k8smaster01 ceph]# kubectl -n rook-ceph delete pod rook-ceph-mon-b-76559bf966-652nl
[root@k8smaster01 ceph]# kubectl -n rook-ceph delete pod rook-ceph-mon-c-74dd86589d-s84cz
Note: delete the ceph-mon and ceph-osd pods one by one, waiting for the cluster to return to HEALTH_OK before deleting the next.
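That one-by-one advice can be sketched as a pair of shell helpers. This is an assumption-laden sketch, not part of Rook: the toolbox label, polling interval, and timeout are all choices to adjust, and nothing touches the cluster until you call restart_daemon yourself.

```shell
# Sketch: delete one rook-ceph daemon pod, then block until `ceph health`
# (run inside the toolbox pod) reports HEALTH_OK again. Function
# definitions only; no kubectl command executes at load time.
wait_health_ok() {
  local toolbox i status
  toolbox=$(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" \
    -o jsonpath='{.items[0].metadata.name}')
  for i in $(seq 1 60); do          # poll up to ~5 minutes
    status=$(kubectl -n rook-ceph exec "$toolbox" -- ceph health 2>/dev/null)
    [ "$status" = "HEALTH_OK" ] && return 0
    sleep 5
  done
  echo "cluster did not return to HEALTH_OK" >&2
  return 1
}

restart_daemon() {
  # usage: restart_daemon rook-ceph-mon-a-85698dfff9-w5l8c
  kubectl -n rook-ceph delete pod "$1"
  wait_health_ok
}
```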
Tip: for more Rook configuration options see https://rook.io/docs/rook/v1.1/.
7.2 Create a Pool
Create pools on the Rook Ceph cluster the Kubernetes way, rather than with ceph commands in the toolbox.
The official default yaml below creates a pool.
[root@k8smaster01 ceph]# kubectl create -f pool.yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool2
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
  annotations:
7.3 Delete a Pool
[root@k8smaster01 ceph]# kubectl delete -f pool.yaml
Tip: for more on pool management, such as erasure-coded pools, see https://rook.io/docs/rook/v1.1/ceph-pool-crd.html.
7.4 Add an OSD node
This step adds k8smaster01's sdb as an OSD.
[root@k8smaster01 ceph]# kubectl taint node k8smaster01 node-role.kubernetes.io/master-    # allow Pods to be scheduled
[root@k8smaster01 ceph]# kubectl label nodes k8smaster01 ceph-osd=enabled    # set the label
[root@k8smaster01 ceph]# vi cluster.yaml    # append the k8smaster01 entry
……
- name: "k8smaster01"
  config:
    storeType: bluestore
  devices:
  - name: "sdb"
……
[root@k8smaster01 ceph]# kubectl apply -f cluster.yaml
[root@k8smaster01 ceph]# kubectl -n rook-ceph get pod -o wide -w
Then run ceph osd tree in the toolbox to confirm the new OSD has joined.
7.5 Remove an OSD node
[root@k8smaster01 ceph]# kubectl label nodes k8smaster01 ceph-osd-    # remove the label
[root@k8smaster01 ceph]# vi cluster.yaml    # remove the k8smaster01 entry below
……
- name: "k8smaster01"
  config:
    storeType: bluestore
  devices:
  - name: "sdb"
……
[root@k8smaster01 ceph]# kubectl apply -f cluster.yaml
[root@k8smaster01 ceph]# kubectl -n rook-ceph get pod -o wide -w
[root@k8smaster01 ceph]# rm -rf /var/lib/rook
7.6 Delete the Cluster
For the complete, graceful teardown of a Rook cluster see: https://github.com/rook/rook/blob/master/Documentation/ceph-teardown.md
7.7 Upgrade Rook
See: http://www.yangguanjun.com/2018/12/28/rook-ceph-practice-part2/.
More official documentation: https://rook.github.io/docs/rook/v1.1/
Recommended posts: http://www.yangguanjun.com/archives/ and https://sealyun.com/post/rook/