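Source files
File | Description |
builder.c/.h | Builds and modifies a crush_map. |
crush.c/.h | Defines the crush_map data structures, plus simple basic operations such as destroying them. |
CrushCompiler.cc/.h | Compiles/decompiles the crush_map data structures (text <-> binary). |
CrushWrapper.cc/.h | Wraps the CRUSH operations in a C++ class. |
hash.c/.h | Hashes 1, 2, 3, 4, or 5 32-bit unsigned integers into a single 32-bit unsigned hash value; currently only the rjenkins algorithm is implemented. |
mapper.c/.h | crush_do_rule(), crush_find_rule() |
CrushTester.cc/.h | Tests CRUSH operations. |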
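The hash.c row describes a family of functions that mix a fixed number of 32-bit unsigned integers into one 32-bit value. The sketch below only illustrates that interface shape for one, two, and three inputs. The mixer is a generic multiply/xor-shift stand-in, not the rjenkins code in Ceph, and the names toy_mix and toy_hash32_N are hypothetical.

```c
#include <stdint.h>

/* Stand-in mixer (NOT rjenkins): an invertible multiply/xor-shift finalizer. */
uint32_t toy_mix(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* One input: fold in a fixed seed, then mix. */
uint32_t toy_hash32_1(uint32_t a)
{
    return toy_mix(a ^ 0x9e3779b9u);
}

/* Two inputs: chain the previous result with the next input. */
uint32_t toy_hash32_2(uint32_t a, uint32_t b)
{
    return toy_mix(toy_hash32_1(a) ^ b);
}

/* Three inputs: same chaining, one step further. */
uint32_t toy_hash32_3(uint32_t a, uint32_t b, uint32_t c)
{
    return toy_mix(toy_hash32_2(a, b) ^ c);
}
```

The same chaining would extend to four and five inputs. The property that matters for CRUSH is that the result is a deterministic function of the inputs, so every client computes the same placement.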
CRUSH maps
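As the table below shows, a CRUSH map consists of three parts:
A list of OSDs (devices);
A list of buckets, which describes how the storage devices are organized;
A list of rules, which describes how data is replicated.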
Device | device 0 osd.0 device 1 osd.1 device 2 osd.2 device 3 osd.3 device 4 osd.4 device 5 osd.5 device 6 osd.6 device 7 osd.7 |
Bucket | host ceph-osd-ssd-server-1 { id -1 alg straw hash 0 item osd.0 weight 1.00 item osd.1 weight 1.00 }
host ceph-osd-ssd-server-2 { id -2 alg straw hash 0 item osd.2 weight 1.00 item osd.3 weight 1.00 }
host ceph-osd-platter-server-1 { id -3 alg straw hash 0 item osd.4 weight 1.00 item osd.5 weight 1.00 }
host ceph-osd-platter-server-2 { id -4 alg straw hash 0 item osd.6 weight 1.00 item osd.7 weight 1.00 }
root platter { id -5 alg straw hash 0 item ceph-osd-platter-server-1 weight 2.00 item ceph-osd-platter-server-2 weight 2.00 }
root ssd { id -6 alg straw hash 0 item ceph-osd-ssd-server-1 weight 2.00 item ceph-osd-ssd-server-2 weight 2.00 } |
Rule | rule data { ruleset 0 type replicated min_size 2 max_size 2 step take platter step chooseleaf firstn 0 type host step emit }
rule metadata { ruleset 1 type replicated min_size 0 max_size 10 step take platter step chooseleaf firstn 0 type host step emit }
rule rbd { ruleset 2 type replicated min_size 0 max_size 10 step take platter step chooseleaf firstn 0 type host step emit }
rule platter { ruleset 3 type replicated min_size 0 max_size 10 step take platter step chooseleaf firstn 0 type host step emit }
rule ssd { ruleset 4 type replicated min_size 0 max_size 4 step take ssd step chooseleaf firstn 0 type host step emit }
rule ssd-primary { ruleset 5 type replicated min_size 5 max_size 10 step take ssd step chooseleaf firstn 1 type host step emit step take platter step chooseleaf firstn -1 type host step emit } |
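In the Bucket rows above, the weight a root assigns to a host (2.00) equals the sum of the weights of that host's OSDs (1.00 + 1.00). The sketch below models two of those buckets and checks that relationship. The struct toy_bucket layout is a hypothetical simplification, not the crush_bucket definition in crush.h, and storing weights as 16.16 fixed-point values is an assumption made here for exact arithmetic.

```c
#include <stddef.h>

#define TOY_WEIGHT_ONE 0x10000u  /* 1.00 in 16.16 fixed point (assumed encoding) */

/* Hypothetical, simplified bucket: an ID plus a short list of weighted items. */
struct toy_bucket {
    int id;               /* buckets carry negative IDs */
    const char *name;
    size_t size;          /* number of items in use */
    int items[4];         /* negative = nested bucket, non-negative = device */
    unsigned weights[4];  /* weight of each item */
};

/* Total weight of a bucket: the sum of its items' weights. */
unsigned toy_bucket_weight(const struct toy_bucket *b)
{
    unsigned sum = 0;
    for (size_t i = 0; i < b->size; i++)
        sum += b->weights[i];
    return sum;
}

/* host ceph-osd-ssd-server-1 { id -1 ... item osd.0 weight 1.00 item osd.1 weight 1.00 } */
const struct toy_bucket toy_ssd_server_1 = {
    -1, "ceph-osd-ssd-server-1", 2,
    { 0, 1 }, { TOY_WEIGHT_ONE, TOY_WEIGHT_ONE }
};

/* root ssd { id -6 ... item ceph-osd-ssd-server-1 weight 2.00 item ceph-osd-ssd-server-2 weight 2.00 } */
const struct toy_bucket toy_ssd_root = {
    -6, "ssd", 2,
    { -1, -2 }, { 2 * TOY_WEIGHT_ONE, 2 * TOY_WEIGHT_ONE }
};
```

Calling toy_bucket_weight(&toy_ssd_server_1) yields the same value the root stores for that host in toy_ssd_root.weights[0].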
Do Rule
do_rule
CrushWrapper.h
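void do_rule(int rule, int x, vector<int>& out, int maxout,
             const vector<__u32>& weight) const;
@rule: index of the crush_rule to use within the crush_map's rules list
@x: the input hash ID (an object_id or pg_id)
@out: the output list of device IDs
@maxout: the maximum number of device IDs to output, i.e. the number of replicas
@weight: the per-device weight list (indexed by device ID)
The method does its actual work by calling crush_do_rule.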
crush_do_rule
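mapper.c
int crush_do_rule(const struct crush_map *map,
                  int ruleno, int x, int *result, int result_max,
                  const __u32 *weight, int weight_max,
                  int *scratch);
@map: the crush_map held by the CrushWrapper class; it contains the crush_bucket and crush_rule entries
@scratch: a temporary array three times the size of the result list, used as working space while executing the rule steps
The function executes the steps listed in map->rules[ruleno] in order.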
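The step-by-step execution can be pictured as a small interpreter over a working set: a take step seeds the set, a choose step replaces it with the chosen items, and an emit step appends it to the result. The following is a minimal sketch under that reading, not the code in mapper.c. The types toy_op and toy_step, the function toy_do_rule, and the fixed two-host hierarchy in toy_children are all hypothetical. This toy needs only two regions of the scratch array; the real function is given three times result_max, as noted above.

```c
#include <assert.h>

enum toy_op { TOY_TAKE, TOY_CHOOSE, TOY_EMIT };

struct toy_step {
    enum toy_op op;
    int arg;  /* TAKE: the starting item; CHOOSE: how many items to pick */
};

/* Hypothetical fixed hierarchy: bucket -5 directly contains hosts -3 and -4.
 * Writes up to n children of `bucket` into out and returns how many. */
static int toy_children(int bucket, int n, int *out)
{
    static const int hosts[] = { -3, -4 };
    int count = 0;
    if (bucket != -5)
        return 0;
    for (int i = 0; i < 2 && count < n; i++)
        out[count++] = hosts[i];
    return count;
}

/* Runs the steps in order and returns the number of items written to result. */
int toy_do_rule(const struct toy_step *steps, int n_steps,
                int *result, int result_max, int *scratch)
{
    int *w = scratch;               /* current working set */
    int *o = scratch + result_max;  /* output of the step in progress */
    int wsize = 0, result_len = 0;

    assert(result_max > 0);
    for (int s = 0; s < n_steps; s++) {
        switch (steps[s].op) {
        case TOY_TAKE:
            w[0] = steps[s].arg;
            wsize = 1;
            break;
        case TOY_CHOOSE: {
            int osize = 0;
            for (int i = 0; i < wsize; i++) {
                int room = result_max - osize;
                int want = steps[s].arg < room ? steps[s].arg : room;
                osize += toy_children(w[i], want, o + osize);
            }
            int *tmp = w;  /* the step's output becomes the new working set */
            w = o;
            o = tmp;
            wsize = osize;
            break;
        }
        case TOY_EMIT:
            for (int i = 0; i < wsize && result_len < result_max; i++)
                result[result_len++] = w[i];
            wsize = 0;
            break;
        }
    }
    return result_len;
}
```

With the steps { TOY_TAKE, -5 }, { TOY_CHOOSE, 2 }, { TOY_EMIT, 0 }, a result_max of 2, and a scratch array of six ints, the call fills result with the two host IDs and returns 2.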
CRUSH_RULE_TAKE
Category | Rule step name | Description |
Start/End | CRUSH_RULE_TAKE | Start step; sets the starting item |
 | CRUSH_RULE_EMIT | End step |
Select buckets | CRUSH_RULE_CHOOSE_FIRSTN | |
 | CRUSH_RULE_CHOOSE_INDEP | |
Select devices | CRUSH_RULE_CHOOSELEAF_FIRSTN | |
 | CRUSH_RULE_CHOOSELEAF_INDEP | |
Set parameters | CRUSH_RULE_SET_CHOOSE_TRIES | |
 | CRUSH_RULE_SET_CHOOSELEAF_TRIES | |
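The choose steps take a count, and the rules listed earlier use `firstn 0`, `firstn 1`, and `firstn -1`. The sketch below assumes the convention that a non-positive count is interpreted relative to the requested result size: 0 means "as many as requested" and -1 means "one fewer than requested". The helper name toy_resolve_numrep is hypothetical.

```c
/* Resolves a rule's choose count against the requested result size.
 * Returns the number of items to pick, or 0 if the step should be skipped. */
int toy_resolve_numrep(int numrep, int result_max)
{
    if (numrep <= 0) {
        numrep += result_max;  /* 0 -> result_max, -1 -> result_max - 1, ... */
        if (numrep <= 0)
            return 0;          /* nothing left to pick */
    }
    return numrep;
}
```

For the ssd-primary rule with three replicas requested, `firstn 1` would resolve to 1 (one host under ssd) and `firstn -1` to 2 (the remaining hosts under platter).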
crush_choose_firstn & crush_choose_indep
static int crush_choose_firstn(const struct crush_map *map,
                               struct crush_bucket *bucket,
                               const __u32 *weight, int weight_max,
                               int x, int numrep, int type,
                               int *out, int outpos,
                               unsigned attempts, unsigned recurse_attempts,
                               int recurse_to_leaf,
                               int descend_once, int *out2)
@map: the crush_map describing the storage hierarchy
@bucket: the bucket we are choosing an item from
@weight/weight_max: the list of device weights and its length
@x: the CRUSH input value (the hash ID: an object_id or pg_id)
@numrep: the number of items to choose
@type: the type of item to choose
@out: pointer to the output vector
@outpos: our position in that vector (j = 0)
@attempts: the number of attempts to make when choosing (choose_tries)
@recurse_attempts: the number of attempts to make when recursing to a leaf (choose_leaf_tries)
@recurse_to_leaf: true if we want one device under each item of the given type
@descend_once: true if we should only try one descent before giving up
@out2: second output vector for leaf items (device IDs), used if recurse_to_leaf is set
crush_choose_firstn: used for replicated pools
crush_choose_indep: used for erasure-coded pools
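One way to see why the two variants suit different pool types is to compare how each lays out its output when a chosen item turns out to be unusable. The sketch below illustrates only that layout difference and deliberately ignores the retry logic of the real functions. The helper names are hypothetical, and the sentinel value used for an empty slot is an assumption.

```c
#define TOY_ITEM_NONE 0x7fffffff  /* assumed sentinel for "no item in this slot" */

/* firstn-style layout: usable picks are packed to the front, so later
 * items shift up when an earlier one is dropped. Returns the count kept. */
int toy_fill_firstn(const int *picks, const int *failed, int n, int *out)
{
    int len = 0;
    for (int i = 0; i < n; i++)
        if (!failed[i])
            out[len++] = picks[i];
    return len;
}

/* indep-style layout: every pick keeps its own slot, and an unusable
 * pick leaves a hole instead of shifting its neighbours. Returns n. */
int toy_fill_indep(const int *picks, const int *failed, int n, int *out)
{
    for (int i = 0; i < n; i++)
        out[i] = failed[i] ? TOY_ITEM_NONE : picks[i];
    return n;
}
```

With picks {4, 5, 6} and the middle one marked failed, the firstn layout gives {4, 6} while the indep layout gives {4, NONE, 6}. Positional stability matters for erasure-coded pools, where each slot holds a distinct chunk, whereas replicas are interchangeable.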
bucket choose methods
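Each bucket algorithm has a corresponding choose method.
The method takes the following input parameters:
@bucket: the bucket to choose an item from
@x: the CRUSH input value (an object_id or pg_id)
@r: the replica position (usually the position in the output set)
The method returns:
The ID of the chosen item; bucket IDs are negative and device IDs are non-negative.
The method only chooses among the items the bucket contains directly, never among items nested inside those items.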
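As an illustration of such a choose method, the sketch below follows the general idea behind the `alg straw` buckets used in the example map: each direct item draws a pseudo-random "straw" from (x, item, r), scaled by a per-item factor, and the longest straw wins. It is a simplified sketch, not the implementation in mapper.c. The hash is a generic stand-in rather than rjenkins, and toy_hash3 and toy_bucket_choose are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in mixer (NOT rjenkins). */
static uint32_t toy_mix(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Stand-in three-input hash built by chaining the mixer. */
uint32_t toy_hash3(uint32_t a, uint32_t b, uint32_t c)
{
    return toy_mix(toy_mix(toy_mix(a ^ 0x9e3779b9u) ^ b) ^ c);
}

/* Picks one of the bucket's direct items for input x and replica position r.
 * items[i] is an item ID (negative = bucket, non-negative = device) and
 * straws[i] is its scaling factor. */
int toy_bucket_choose(const int *items, const uint32_t *straws, int size,
                      int x, int r)
{
    int high = 0;
    uint64_t high_draw = 0;

    assert(size > 0);
    for (int i = 0; i < size; i++) {
        /* 16-bit pseudo-random draw, scaled by the item's straw length */
        uint64_t draw = toy_hash3((uint32_t)x, (uint32_t)items[i],
                                  (uint32_t)r) & 0xffff;
        draw *= straws[i];
        if (i == 0 || draw > high_draw) {
            high = i;
            high_draw = draw;
        }
    }
    return items[high];
}
```

Because the draw depends only on x, the item ID, and r, the same inputs always select the same item, and changing r gives a different draw for the next replica position.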
References:
CRUSH Operation
http://ceph.com/docs/master/rados/operations/crush-map/
http://www.sebastien-han.fr/blog/2012/12/07/ceph-2-speed-storage-with-crush/