Ceph CRUSH rules

givenchy_yzl

Article link: https://blog.csdn.net/givenchy_yzl/article/details/117088581

Configuring CRUSH rules for Ceph distributed storage

1. Building the OSD tree from the command line

# Create the data center: datacenter0
ceph osd crush add-bucket datacenter0 datacenter

# Create the room: room0
ceph osd crush add-bucket room0 room

# Create the racks: rack0, rack1, rack2
ceph osd crush add-bucket rack0 rack
ceph osd crush add-bucket rack1 rack
ceph osd crush add-bucket rack2 rack

# Move room0 under the data center datacenter0
ceph osd crush move room0 datacenter=datacenter0

# Move the racks rack0, rack1 and rack2 under room0
ceph osd crush move rack0 room=room0
ceph osd crush move rack1 room=room0
ceph osd crush move rack2 room=room0

# Move host osd01 to datacenter0/room0/rack0
# Move host osd02 to datacenter0/room0/rack1
# Move host osd03 to datacenter0/room0/rack2
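
The three host moves are listed above without their commands. A minimal sketch of what they could look like, assuming the host buckets osd01 to osd03 already exist in the CRUSH map. The run helper and the DRY_RUN variable are illustrative additions, not part of Ceph; by default they only print the commands, and DRY_RUN=0 executes them.

```bash
# Hypothetical helper: print the command by default, execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Place osd01 under rack0, osd02 under rack1 and osd03 under rack2,
# giving the full location (data center, room, rack) for each host.
move_hosts() {
    i=0
    for host in osd01 osd02 osd03; do
        run ceph osd crush move "$host" datacenter=datacenter0 room=room0 rack="rack$i"
        i=$((i + 1))
    done
}

move_hosts
```

Printing first makes it easy to review the target locations before anything in the cluster changes.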

Check the result:

[root@admin ~]# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME                   STATUS REWEIGHT PRI-AFF
 -9       0.17537 datacenter datacenter0
-10       0.17537     room room0
-11       0.05846         rack rack0
 -3       0.05846             host osd01
  0   hdd 0.01949                 osd.0           up  1.00000 1.00000
  1   hdd 0.01949                 osd.1           up  1.00000 1.00000
  2   hdd 0.01949                 osd.2           up  1.00000 1.00000
-12       0.05846         rack rack1
 -5       0.05846             host osd02
  3   hdd 0.01949                 osd.3           up  1.00000 1.00000
  4   hdd 0.01949                 osd.4           up  1.00000 1.00000
  5   hdd 0.01949                 osd.5           up  1.00000 1.00000
-13       0.05846         rack rack2
 -7       0.05846             host osd03
  6   hdd 0.01949                 osd.6           up  1.00000 1.00000
  7   hdd 0.01949                 osd.7           up  1.00000 1.00000
  8   hdd 0.01949                 osd.8           up  1.00000 1.00000
 -1             0 root default

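2. The CRUSH map

The cluster's current CRUSH map can be exported from the command line.

# Export the binary CRUSH map to the file test.bin
ceph osd getcrushmap -o test.bin

# Use crushtool to decompile the binary data in test.bin into text form, saved as test.txt
crushtool -d test.bin -o test.txt

The core of a CRUSH map is its rules. A CRUSH rule decides three important things:
1. Which node of the OSDMap the search starts from
2. Which node type serves as the failure domain
3. The search mode used to locate replicas (breadth-first or depth-first)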

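# rules
rule egon_rule {                        # rule name; it can be specified when creating a pool
    id 1                                # rule ID, numbered sequentially
    type replicated                     # the pool type is replicated (the other mode is erasure coding)
    min_size 1                          # a pool's replica count must not be lower than 1
    max_size 10                         # a pool's replica count must not be higher than 10
    step take default                   # entry point from which a PG starts looking for replicas
    step chooseleaf firstn 0 type host  # choose leaf nodes, depth-first, failure domain host
    step emit                           # end
}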

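Summary

When a PG selects OSDs, the rule first states which node of the OSDMap the search starts from. The entry point defaults to default, that is, the root node.
The failure domain is then the host level, which means two OSDs under the same host cannot both be chosen. In the walk from default down to the 3 hosts,
default picks its next child according to its own bucket type, and each child in turn continues the selection according to its type, until an OSD leaf is reached.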
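
For a simple replicated rule like the one above, the entry point and the failure domain can also be passed on the command line instead of being written by hand. A hedged sketch using ceph osd crush rule create-replicated, which recent Ceph releases provide. The rule name egon_rule comes from this article; the run helper and DRY_RUN switch are illustrative additions that only print the commands by default.

```bash
# Hypothetical helper: print the command by default, execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Arguments: <rule name> <root bucket the search starts from> <failure-domain type>
create_rule() {
    run ceph osd crush rule create-replicated "$1" "$2" "$3"
}

create_rule egon_rule default host
# Dump the rule so its generated steps can be compared with the text form.
run ceph osd crush rule dump egon_rule
```

The second and third arguments correspond to the first two decisions listed earlier: where the search starts and which level isolates failures.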

3. Modifying the CRUSH map

3.1 Export the CRUSH map
Export Ceph's binary CRUSH map and convert it to text form.

# Export the binary CRUSH map to the file test.bin
ceph osd getcrushmap -o test.bin
# Use crushtool to decompile the binary data in test.bin into text form, saved as test.txt
crushtool -d test.bin -o test.txt

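3.2 Edit test.txt

# 1. Change take default to take datacenter0 and adjust the weights.
# 2. Set the weight of the leaf nodes:
By capacity: use 1 TB as the base unit with a weight of 1.0, so a 500 GB disk gets 0.5 and a 3 TB disk gets 3.0.

By performance: for example, give faster disks a weight of 1.2 and slower disks a weight of 0.8.
The weight of a bucket node is the sum of the weights of the items directly below it.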
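
The capacity rule and the bucket-sum rule above are plain arithmetic. A small sketch, assuming 1 TB is counted as 1000 GB so that it matches the 500 GB to 0.5 example; weight_for_gb and bucket_weight are hypothetical helper names.

```bash
# Capacity-based weight: 1 TB (taken as 1000 GB here) maps to a weight of 1.0.
weight_for_gb() {
    awk -v gb="$1" 'BEGIN { printf "%.3f\n", gb / 1000 }'
}

# A bucket's weight is the sum of the weights of the items directly below it.
bucket_weight() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.3f\n", sum }'
}

weight_for_gb 500                  # prints 0.500 (a 500 GB disk)
weight_for_gb 3000                 # prints 3.000 (a 3 TB disk)
bucket_weight 1.000 1.000 1.000    # prints 3.000 (a host holding three weight-1.0 OSDs)
```

The resulting values go into the item ... weight lines of the text map; on a live cluster a single OSD can also be adjusted with ceph osd crush reweight.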

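Reference configuration

# tunables: fine-tuning parameters. Changing them is generally not recommended; that is usually left to experts running large clusters.
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices: basic information for every OSD in the cluster is listed here.
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd

# types: the failure domains available in the cluster are listed here; they can be customized.
type 0 osd # disk
type 1 host # server
type 2 chassis # chassis
type 3 rack # rack (one rack holds several chassis)
type 4 row # row of racks
type 5 pdu # power distribution unit (several rows may share one PDU)
type 6 pod # several rows
type 7 room # machine room
type 8 datacenter # data center (may consist of several rooms)
type 9 region # region (East China 1, East China 2, and so on)
type 10 root # top level, must exist
# Note: these failure domains are also called buckets, but they are not the buckets found in radosgw.

# buckets: the failure-domain instances are defined here.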

root default {
 id -1 # do not change unnecessarily
 id -2 class hdd # do not change unnecessarily
 # weight 0.000
 alg straw2
 hash 0 # rjenkins1
}
host osd01 {
 id -3 # do not change unnecessarily
 id -4 class hdd # do not change unnecessarily
 # weight 0.058
 alg straw2
 hash 0 # rjenkins1
 item osd.0 weight 1.000
 item osd.1 weight 1.000
 item osd.2 weight 1.000
}
host osd02 {
 id -5 # do not change unnecessarily
 id -6 class hdd # do not change unnecessarily
 # weight 0.058
 alg straw2
 hash 0 # rjenkins1
 item osd.3 weight 1.000
 item osd.4 weight 1.000
 item osd.5 weight 1.000
}
host osd03 {
 id -7 # do not change unnecessarily
 id -8 class hdd # do not change unnecessarily
 # weight 0.058
 alg straw2
 hash 0 # rjenkins1
 item osd.6 weight 1.000
 item osd.7 weight 1.000
 item osd.8 weight 1.000
}
rack rack0 {
 id -11 # do not change unnecessarily
 id -16 class hdd # do not change unnecessarily
 # weight 0.058
 alg straw2
 hash 0 # rjenkins1
 item osd01 weight 3.000
}
rack rack1 {
 id -12 # do not change unnecessarily
 id -15 class hdd # do not change unnecessarily
 # weight 0.058
 alg straw2
 hash 0 # rjenkins1
 item osd02 weight 3.000
}
rack rack2 {
 id -13 # do not change unnecessarily
 id -14 class hdd # do not change unnecessarily
 # weight 0.058
 alg straw2
 hash 0 # rjenkins1
 item osd03 weight 3.000
}
room room0 {
 id -10 # do not change unnecessarily
 id -17 class hdd # do not change unnecessarily
 # weight 0.175
 alg straw2
 hash 0 # rjenkins1
 item rack0 weight 3.000
 item rack1 weight 3.000
 item rack2 weight 3.000
}
datacenter datacenter0 {
 id -9 # do not change unnecessarily
 id -18 class hdd # do not change unnecessarily
 # weight 0.175
 alg straw2
 hash 0 # rjenkins1
 item room0 weight 9.000
}

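# rules: the placement rules for pools are defined here.
# 1. type is the pool type; replicated means a replicated pool. If erasure-coded pools existed, a default rule for them would also have been created; there is none here.
# 2. min_size is the minimum number of replicas allowed.
# 3. max_size is the maximum number of replicas allowed.
# 4. Each step is one stage of the selection. The second step generally decides how OSDs are found and needs a failure-domain level; here it is host. If rooms or other levels exist, the failure domain can be raised to a higher level.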
rule replicated_rule {
 id 0
 type replicated
 min_size 1
 max_size 10
 step take default
 step chooseleaf firstn 0 type host
 step emit
}

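rule egon_rule {                        # rule name; it can be specified when creating a pool
    id 1                                # the id is set to 1
    type replicated                     # the pool type is replicated (the other mode is erasure coding)
    min_size 1                          # a pool's replica count must not be lower than 1
    max_size 10                         # a pool's replica count must not be higher than 10
    step take datacenter0               # entry point from which a PG starts looking for replicas
    step chooseleaf firstn 0 type host  # depth-first; the failure domain defaults to host and is set to host
    step emit                           # end
}
# end crush map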
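
Before an edited map is imported, placements can be simulated offline with crushtool --test. A sketch, assuming the text map was saved as test.txt and that rule id 1 with three replicas is the case to check; the run helper and DRY_RUN switch are illustrative additions that only print the commands by default.

```bash
# Hypothetical helper: print the command by default, execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Arguments: <text map> <binary output> <rule id> <replica count>
check_map() {
    txt="$1"; bin="$2"; rule="$3"; replicas="$4"
    # Compile the text map; syntax errors in the edited file surface at this step.
    run crushtool -c "$txt" -o "$bin"
    # Simulate placements for the rule and report inputs that received fewer OSDs than requested.
    run crushtool -i "$bin" --test --rule "$rule" --num-rep "$replicas" --show-bad-mappings
}

check_map test.txt new.bin 1 3
```

The compile command is the same one used in the import step; only the --test run is extra. A run that reports no bad mappings suggests the rule can place every replica under the chosen failure domain.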

3.3 Import the modified CRUSH map into the cluster
```bash
# Compile test.txt into binary form
crushtool -c test.txt -o new.bin
# Import new.bin into the cluster
ceph osd setcrushmap -i new.bin
```

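3.4 Update the crush_rule of existing pools
After the map has been imported, the crush_rule of every pool that already existed needs to be updated; otherwise PGs may show up as unknown or fail to reach the active+clean state.

ceph osd pool set cephfs_data crush_rule egon_rule
ceph osd pool set cephfs_metadata crush_rule egon_rule
ceph osd pool set egon_test crush_rule egon_rule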

3.5 Check the cluster state
ceph osd dump # the crush_rule id used by the pools is now 1
