Learning Ceph: Introduction to the CRUSH Algorithm (Part 2)

 

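Source files

File                   Description
builder.c/.h           Operations such as building a crush_map.
crush.c/.h             Defines the crush_map data structures and simple basic operations such as destroying them.
CrushCompiler.cc/.h    Compiles and decompiles the crush_map data structures (text <-> binary).
CrushWrapper.cc/.h     Wraps the CRUSH operations in a C++ class.
hash.c/.h              Hash functions that take 1, 2, 3, 4 or 5 unsigned 32-bit integers and return an unsigned 32-bit hash value. Only the rjenkins algorithm is implemented at present.
mapper.c/.h            Implements crush_do_rule() and crush_find_rule().
CrushTester.cc/.h      Tests CRUSH operations.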
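To make the hash.c entry more concrete, here is a minimal sketch of a Jenkins-style hash over two unsigned 32-bit integers. The names MIX32, HASH_SEED and hash32_2, as well as the seed and the two padding constants, are assumptions chosen for illustration. The mixing step follows Robert Jenkins' well-known 96-bit mix, but nothing here is guaranteed to match Ceph's hash.c bit for bit.

```c
#include <stdint.h>

/* One round of Robert Jenkins' mix over three 32-bit words.
 * All arithmetic is unsigned, so wrap-around is well defined. */
#define MIX32(a, b, c) do {                 \
        a -= b; a -= c; a ^= (c >> 13);     \
        b -= c; b -= a; b ^= (a << 8);      \
        c -= a; c -= b; c ^= (b >> 13);     \
        a -= b; a -= c; a ^= (c >> 12);     \
        b -= c; b -= a; b ^= (a << 16);     \
        c -= a; c -= b; c ^= (b >> 5);      \
        a -= b; a -= c; a ^= (c >> 3);      \
        b -= c; b -= a; b ^= (a << 10);     \
        c -= a; c -= b; c ^= (b >> 15);     \
    } while (0)

#define HASH_SEED 1315423911u   /* assumed seed, for illustration only */

/* Hash two unsigned 32-bit integers into one unsigned 32-bit value. */
uint32_t hash32_2(uint32_t a, uint32_t b)
{
    uint32_t hash = HASH_SEED ^ a ^ b;
    uint32_t x = 231232u;       /* assumed padding constants */
    uint32_t y = 1232u;

    MIX32(a, b, hash);
    MIX32(x, a, hash);
    MIX32(b, y, hash);
    return hash;
}
```

The important property for CRUSH is that the function is deterministic: the same inputs always produce the same 32-bit value, so every client computes the same placement without consulting a lookup table.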

 

CRUSH maps

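As shown below, a CRUSH map consists of three parts:

A list of OSDs (devices);

A list of buckets, which describes how the storage devices are organized;

A list of rules, which describes how data is replicated.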

 

Device

    device 0 osd.0
    device 1 osd.1
    device 2 osd.2
    device 3 osd.3
    device 4 osd.4
    device 5 osd.5
    device 6 osd.6
    device 7 osd.7

Bucket

    host ceph-osd-ssd-server-1 {
        id -1
        alg straw
        hash 0
        item osd.0 weight 1.00
        item osd.1 weight 1.00
    }

    host ceph-osd-ssd-server-2 {
        id -2
        alg straw
        hash 0
        item osd.2 weight 1.00
        item osd.3 weight 1.00
    }

    host ceph-osd-platter-server-1 {
        id -3
        alg straw
        hash 0
        item osd.4 weight 1.00
        item osd.5 weight 1.00
    }

    host ceph-osd-platter-server-2 {
        id -4
        alg straw
        hash 0
        item osd.6 weight 1.00
        item osd.7 weight 1.00
    }

    root platter {
        id -5
        alg straw
        hash 0
        item ceph-osd-platter-server-1 weight 2.00
        item ceph-osd-platter-server-2 weight 2.00
    }

    root ssd {
        id -6
        alg straw
        hash 0
        item ceph-osd-ssd-server-1 weight 2.00
        item ceph-osd-ssd-server-2 weight 2.00
    }

Rule

    rule data {
        ruleset 0
        type replicated
        min_size 2
        max_size 2
        step take platter
        step chooseleaf firstn 0 type host
        step emit
    }

    rule metadata {
        ruleset 1
        type replicated
        min_size 0
        max_size 10
        step take platter
        step chooseleaf firstn 0 type host
        step emit
    }

    rule rbd {
        ruleset 2
        type replicated
        min_size 0
        max_size 10
        step take platter
        step chooseleaf firstn 0 type host
        step emit
    }

    rule platter {
        ruleset 3
        type replicated
        min_size 0
        max_size 10
        step take platter
        step chooseleaf firstn 0 type host
        step emit
    }

    rule ssd {
        ruleset 4
        type replicated
        min_size 0
        max_size 4
        step take ssd
        step chooseleaf firstn 0 type host
        step emit
    }

    rule ssd-primary {
        ruleset 5
        type replicated
        min_size 5
        max_size 10
        step take ssd
        step chooseleaf firstn 1 type host
        step emit
        step take platter
        step chooseleaf firstn -1 type host
        step emit
    }
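The firstn arguments used in these rules (0, 1 and -1) are counts relative to the number of replicas requested. The sketch below shows one way such an argument can be resolved. It assumes the convention that a value less than or equal to zero is added to the requested replica count, and that the step is skipped if the sum is still not positive. The helper name resolve_numrep is hypothetical and does not exist in the Ceph source.

```c
/*
 * Resolve the count argument of a "choose ... firstn N" step.
 *   arg:        the N written in the rule (may be 0 or negative)
 *   result_max: number of replicas requested by the caller
 * Returns the number of items the step asks for, or 0 if the step is skipped.
 * resolve_numrep is a hypothetical helper used only for illustration.
 */
int resolve_numrep(int arg, int result_max)
{
    int numrep = arg;

    if (numrep <= 0) {
        numrep += result_max;   /* 0 -> all replicas, -1 -> all but one */
        if (numrep <= 0)
            return 0;           /* nothing left to choose */
    }
    return numrep;
}
```

Under this assumption, a request for 5 replicas through rule ssd-primary asks for 1 host under ssd (firstn 1) and then 5 - 1 = 4 hosts under platter (firstn -1), while firstn 0 in the other rules asks for as many hosts as there are replicas.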

 

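Do Rule

do_rule

CrushWrapper.h

    void do_rule(int rule, int x, vector<int>& out, int maxout,
                 const vector<__u32>& weight) const;

@rule: index of the crush_rule to use within the crush_map's rules list

@x: input hash ID (object_id or pg_id)

@out: output list of device IDs

@maxout: maximum number of device IDs to output, i.e. the number of replicas

@weight: list of device weights (an input, since it is passed by const reference)

 

The actual work is done by calling crush_do_rule.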

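crush_do_rule

mapper.c

    int crush_do_rule(const struct crush_map *map,
                      int ruleno, int x, int *result, int result_max,
                      const __u32 *weight, int weight_max,
                      int *scratch);

@map: the crush_map held by the CrushWrapper class; it contains the crush_bucket and crush_rule structures

@scratch: a scratch array three times the size of the result list, used as temporary space while executing rule steps

 

The function executes the steps defined in the steps list of map->rules[ruleno], in order.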

Category         Rule step name                     Description
Start/end        CRUSH_RULE_TAKE                    Start step; sets the starting item (e.g. step take platter)
                 CRUSH_RULE_EMIT                    End step
Choose bucket    CRUSH_RULE_CHOOSE_FIRSTN
                 CRUSH_RULE_CHOOSE_INDEP
Choose device    CRUSH_RULE_CHOOSELEAF_FIRSTN
                 CRUSH_RULE_CHOOSELEAF_INDEP
Set parameter    CRUSH_RULE_SET_CHOOSE_TRIES
                 CRUSH_RULE_SET_CHOOSELEAF_TRIES
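The way a rule's steps are executed in order can be sketched as a small interpreter. This is a toy model, not the code in mapper.c: the names step_op, step, choose_stub and run_rule are hypothetical, only three step kinds are modeled, and the chooser is a stub that pretends bucket -N directly contains devices N*10, N*10+1, and so on. The split of scratch into result_max-sized regions (a working set, an output set, and a third region left for leaf items) is an assumption about how the 3x scratch array is used.

```c
/* Hypothetical step kinds; the real ones are the CRUSH_RULE_* constants. */
enum step_op { STEP_TAKE, STEP_CHOOSE_FIRSTN, STEP_EMIT };

struct step {
    enum step_op op;
    int arg1;   /* TAKE: item to start from; CHOOSE: number of items */
    int arg2;   /* CHOOSE: item type (ignored by this toy model) */
};

/* Stub chooser: pretend bucket -N contains devices N*10, N*10+1, ...
 * Writes at most `room` items to out and returns how many were written. */
static int choose_stub(int bucket, int numrep, int *out, int room)
{
    int n = numrep < room ? numrep : room;
    for (int i = 0; i < n; i++)
        out[i] = (-bucket) * 10 + i;
    return n;
}

/* Execute the steps in order; scratch must hold 3 * result_max ints. */
int run_rule(const struct step *steps, int nsteps,
             int *result, int result_max, int *scratch)
{
    int *w = scratch;               /* working set */
    int *o = scratch + result_max;  /* output of the current choose step */
    /* scratch + 2 * result_max is left unused here (leaf items). */
    int wsize = 0, result_len = 0;

    for (int s = 0; s < nsteps; s++) {
        const struct step *cur = &steps[s];
        switch (cur->op) {
        case STEP_TAKE:             /* start from a single item */
            w[0] = cur->arg1;
            wsize = 1;
            break;
        case STEP_CHOOSE_FIRSTN: {  /* replace the working set by the choices */
            int osize = 0;
            for (int i = 0; i < wsize; i++) {
                if (w[i] >= 0)
                    continue;       /* a device has nothing below it */
                osize += choose_stub(w[i], cur->arg1,
                                     o + osize, result_max - osize);
            }
            int *tmp = o; o = w; w = tmp;   /* output becomes the working set */
            wsize = osize;
            break;
        }
        case STEP_EMIT:             /* append the working set to the result */
            for (int i = 0; i < wsize && result_len < result_max; i++)
                result[result_len++] = w[i];
            wsize = 0;
            break;
        }
    }
    return result_len;
}
```

With the steps take -1, choose 2, emit, take -2, choose 1, emit and result_max = 3, this model returns the three stub devices 10, 11 and 20, which mirrors how rule ssd-primary emits twice into a single result list.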

 

 

 

crush_choose_firstn & crush_choose_indep

    static int crush_choose_firstn(const struct crush_map *map,
                                   struct crush_bucket *bucket,
                                   const __u32 *weight, int weight_max,
                                   int x, int numrep, int type,
                                   int *out, int outpos,
                                   unsigned attempts, unsigned recurse_attempts,
                                   int recurse_to_leaf,
                                   int descend_once, int *out2)

@map: the crush_map that describes the storage hierarchy

@bucket: the bucket we are choosing an item from

@weight / @weight_max: the list of device weights and its size

@x: CRUSH input value, i.e. the input hash ID (object_id or pg_id)

@numrep: the number of items to choose

@type: the type of item to choose

@out: pointer to the output vector

@outpos: our position in that vector (j = 0)

@attempts: number of attempts to make when choosing (choose_tries)

@recurse_attempts: number of attempts for the recursive leaf choice (choose_leaf_tries)

@recurse_to_leaf: true if we want one device under each item of the given type

@descend_once: true if we should only try one descent before giving up

@out2: second output vector for leaf items (device IDs), used if recurse_to_leaf is set

 

crush_choose_firstn: used for replicated pools

crush_choose_indep: used for erasure-coded pools
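The reason for having two variants is positional stability. In an erasure-coded pool each output position holds a different shard, so a failed device should be replaced in its own slot without shifting the others, whereas replicas are interchangeable. The sketch below is a heavily simplified model of that difference. The names pick, in_use, choose_firstn_sketch and choose_indep_sketch are hypothetical, pick is a stand-in for a bucket choose method, and the retry formulas (r = rep + ftotal for firstn, r = rep + numrep * ftotal for indep) are simplifications that should be checked against mapper.c.

```c
#define ITEM_NONE   0x7fffffff  /* hypothetical marker for an unfilled slot */
#define NUM_DEVICES 5
#define MAX_TRIES   10

/* Hypothetical stand-in for a bucket choose method: maps (x, r) to a device. */
static int pick(int x, int r)
{
    return (x + r) % NUM_DEVICES;
}

/* Returns 1 if item already appears in the first n entries of out. */
static int in_use(const int *out, int n, int item)
{
    for (int i = 0; i < n; i++)
        if (out[i] == item)
            return 1;
    return 0;
}

/* firstn: fill the output front to back; a rejected pick is retried
 * with the next r, so later entries may shift. Returns the count. */
int choose_firstn_sketch(int x, int numrep, const int *down, int *out)
{
    int outpos = 0;

    for (int rep = 0; rep < numrep; rep++) {
        for (int ftotal = 0; ftotal < MAX_TRIES; ftotal++) {
            int item = pick(x, rep + ftotal);
            if (down[item] || in_use(out, outpos, item))
                continue;       /* reject and retry */
            out[outpos++] = item;
            break;
        }
    }
    return outpos;
}

/* indep: every slot is tried independently; filled slots are never
 * moved, and a slot that cannot be filled stays ITEM_NONE. */
void choose_indep_sketch(int x, int numrep, const int *down, int *out)
{
    for (int rep = 0; rep < numrep; rep++)
        out[rep] = ITEM_NONE;

    for (int ftotal = 0; ftotal < MAX_TRIES; ftotal++) {
        for (int rep = 0; rep < numrep; rep++) {
            if (out[rep] != ITEM_NONE)
                continue;       /* slot already filled, keep it */
            int item = pick(x, rep + numrep * ftotal);
            if (down[item] || in_use(out, numrep, item))
                continue;       /* leave the slot for the next round */
            out[rep] = item;
        }
    }
}
```

In this model, with x = 0, three replicas and all devices up, both variants return devices 0, 1, 2. If device 1 is marked down, the firstn variant returns 0, 2, 3 (the last two entries shift), while the indep variant returns 0, 4, 2 (only the failed slot changes).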

 

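bucket choose methods

Each bucket algorithm has a corresponding choose method.

The method takes the following input parameters:

@bucket: the bucket to choose an item from

@x: CRUSH input value (object_id or pg_id)

@r: replica position (usually, position in output set)

Return value of the method:

The ID of the chosen item: negative for a bucket, non-negative for a device

The method only chooses among the items that the bucket directly contains; it never chooses items nested inside those items.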
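As an illustration of what such a choose method can look like, here is a minimal sketch in the spirit of the straw algorithm used by the buckets in the map above: every direct item draws a pseudo-random value from (x, item, r), the draw is scaled by the item's weight, and the largest draw wins. The names bucket_sketch, toy_hash3 and bucket_choose_sketch are hypothetical, toy_hash3 is a stand-in rather than the rjenkins hash, and scaling by the raw weight is a simplification (the real straw bucket is understood to scale by a precomputed per-item straw length).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, much simpler than the real crush_bucket. */
struct bucket_sketch {
    int id;                     /* negative: this is a bucket */
    int size;                   /* number of direct items */
    const int *items;           /* item IDs: negative = bucket, >= 0 = device */
    const uint32_t *weights;    /* one weight per item (0x10000 assumed = 1.00) */
};

/* Hypothetical stand-in for a 3-input hash of (x, item, r). */
static uint32_t toy_hash3(uint32_t x, uint32_t item, uint32_t r)
{
    uint32_t h = x * 2654435761u;
    h ^= item * 40503u + 0x9e3779b9u + (h << 6) + (h >> 2);
    h ^= r * 2246822519u + 0x9e3779b9u + (h << 6) + (h >> 2);
    return h;
}

/* Choose one of the bucket's direct items for input x and replica r.
 * Only direct children are considered; descending further is the
 * caller's job. */
int bucket_choose_sketch(const struct bucket_sketch *b, int x, int r)
{
    int best = 0;
    uint64_t best_draw = 0;

    assert(b->size > 0);
    for (int i = 0; i < b->size; i++) {
        /* 16-bit draw scaled by the item's weight */
        uint64_t draw = toy_hash3((uint32_t)x, (uint32_t)b->items[i],
                                  (uint32_t)r) & 0xffff;
        draw *= b->weights[i];
        if (i == 0 || draw > best_draw) {
            best = i;
            best_draw = draw;
        }
    }
    return b->items[best];
}
```

For host ceph-osd-ssd-server-1 this would be called with items {0, 1} and equal weights, and it returns either 0 or 1. Because the result depends only on (x, item, r), the same input always maps to the same item, and changing r gives the caller a way to retry with a different draw.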

 

 


 

