Ceph details: chooseleaf_stable and reducing PG migration

Preface

I came across this parameter while reading the code. The official documentation describes it as follows:
chooseleaf_stable: Whether a recursive chooseleaf attempt will use a better value for an inner loop that greatly reduces the number of mapping changes when an OSD is marked out. The legacy value is 0, while the new value of 1 uses the new approach.
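In other words, it controls whether CRUSH uses a method that changes the mapping less once an OSD is marked out: as few PG mappings as possible are modified, so as little data as possible is migrated.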
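The "inner loop" in that description can be illustrated with a toy model, loosely based on crush_choose_firstn in Ceph's src/crush/mapper.c: in legacy mode the loop that picks an OSD inside a host starts its replica rank at the host's position in the output, while with chooseleaf_stable = 1 it always starts at 0. This is a sketch under heavy assumptions, not CRUSH itself: SHA-256 stands in for rjenkins, all weights are equal (a plain argmax instead of straw2), the inner pick gets a single try (as with chooseleaf_descend_once = 1) and its rank adds the outer r (as with chooseleaf_vary_r = 1). HOSTS, score, place and compare are hypothetical names.

```python
import hashlib

# Hypothetical topology like the test cluster: 3 hosts x 3 OSDs.
HOSTS = {"host0": [0, 1, 2], "host1": [3, 4, 5], "host2": [6, 7, 8]}


def score(x, item, r):
    """Deterministic pseudo-random draw (stand-in for CRUSH's hash + straw2)."""
    digest = hashlib.sha256(f"{x}:{item}:{r}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(x, out, stable, size=3, tries=100):
    """Toy chooseleaf firstn: pick `size` distinct hosts, one OSD in each."""
    used_hosts, result = [], []
    for rep in range(size):
        for ftotal in range(tries):
            r = rep + ftotal
            host = max(HOSTS, key=lambda h: score(x, h, r))
            if host in used_hosts:
                continue  # host collision: retry with the next r
            # Legacy: the inner rank also depends on the output position.
            # Stable: it depends only on r.
            inner_r = r if stable else r + len(result)
            osd = max(HOSTS[host], key=lambda o: score(x, o, inner_r))
            if osd in out:
                continue  # leaf is out: reject this host and retry
            used_hosts.append(host)
            result.append(osd)
            break
    return result


def compare(stable, out_osd=0, pg_num=128):
    """Return {n: number of PGs in which n OSDs of the old up set changed}."""
    counts = {n: 0 for n in range(4)}
    for pg in range(pg_num):
        before = place(pg, set(), stable)
        after = place(pg, {out_osd}, stable)
        counts[sum(1 for osd in before if osd not in after)] += 1
    return counts


if __name__ == "__main__":
    for stable in (0, 1):
        print("chooseleaf_stable =", stable, compare(stable))
```

In this model a stable mapping changes at most one OSD per PG: each host's OSD depends only on the draw r at which that host is first selected, which osd.0 going out does not change for the other hosts. In legacy mode, a host that slides to another position in the up set also redraws its OSD.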

Experiment

Set chooseleaf_stable to each value in turn, mark one OSD out, and compare how the PG mappings change.
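The article does not show how chooseleaf_stable was switched. One common route is to round-trip the CRUSH map through crushtool (ceph osd getcrushmap, crushtool -d, crushtool -c, ceph osd setcrushmap); the sketch below assumes that route, and set_tunable and apply_chooseleaf_stable are hypothetical helper names. Note that changing a tunable on a cluster that holds data will itself remap PGs.

```python
import os
import re
import subprocess


def set_tunable(crush_text, name, value):
    """Return decompiled CRUSH map text with `tunable <name> <value>` set."""
    line = "tunable %s %d" % (name, value)
    pattern = re.compile(r"^tunable %s \d+$" % re.escape(name), re.MULTILINE)
    if pattern.search(crush_text):
        return pattern.sub(line, crush_text)
    # Line missing: append it after the last existing tunable line (or at the top).
    lines = crush_text.splitlines(keepends=True)
    idx = 0
    for i, text_line in enumerate(lines):
        if text_line.startswith("tunable "):
            idx = i + 1
    lines.insert(idx, line + "\n")
    return "".join(lines)


def apply_chooseleaf_stable(value, workdir="/tmp"):
    """Hypothetical helper: decompile, edit, recompile and inject the CRUSH map."""
    binmap, txtmap, newmap = (os.path.join(workdir, f)
                              for f in ("crush.bin", "crush.txt", "crush.new"))
    subprocess.check_call(["ceph", "osd", "getcrushmap", "-o", binmap])
    subprocess.check_call(["crushtool", "-d", binmap, "-o", txtmap])
    with open(txtmap) as f:
        text = set_tunable(f.read(), "chooseleaf_stable", value)
    with open(txtmap, "w") as f:
        f.write(text)
    subprocess.check_call(["crushtool", "-c", txtmap, "-o", newmap])
    subprocess.check_call(["ceph", "osd", "setcrushmap", "-i", newmap])
```

For example, apply_chooseleaf_stable(0) before the first run and apply_chooseleaf_stable(1) before the second; `ceph osd crush dump | grep stable` then confirms the value, as in the logs.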
Environment
3 replicas, 3 nodes, 3 OSDs per node
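Test script
Get the up set of every PG in one pool, mark an OSD out, get the up sets again, and count how many OSDs changed within each PG: the more OSDs change, the more data has to migrate. Because the cluster holds no data, waiting 10 s for peering was enough.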

import os
import subprocess
import time
from collections import Counter

POOL_ID = 8  # id of the pool under test


def getPgUpSet():
    """Return {pgid: [osd, ...]} holding the UP set of every PG in POOL_ID."""
    # Column 1 is the pgid and column 15 the UP set in the plain "ceph pg dump"
    # output of the Ceph version used here; the column layout differs between releases.
    # stderr is dropped so the "dumped all" status line does not need skipping.
    cmd = r"ceph pg dump 2>/dev/null | grep '^%d\.' | awk '{print $1, $15}'" % POOL_ID
    output = subprocess.check_output(cmd, shell=True, universal_newlines=True)
    pgosdset = {}
    for line in output.splitlines():
        pgid, upset = line.split()
        # "[0,8,3]" -> [0, 8, 3]: compare OSD ids as integers, not characters,
        # so that e.g. osd.1 is not "found" inside "[10,...]"
        pgosdset[pgid] = [int(osd) for osd in upset.strip("[]").split(",") if osd]
    return pgosdset


def fmt(osds):
    return "[%s]" % ",".join(str(osd) for osd in osds)


# compare the up sets before and after the OSD was marked out
def compBASet(bpgosdset, apgosdset):
    # changes[n] = number of PGs in which n OSDs of the old up set were replaced
    changes = Counter()
    for pgid, before in bpgosdset.items():
        after = apgosdset.get(pgid, [])
        change = sum(1 for osd in before if osd not in after)
        changes[change] += 1
        if change >= 2:
            print("change %d osd in pg  %s" % (change, pgid))
            print("before out up set ", fmt(before))
            print("after out up set ", fmt(after))
    for n in range(4):  # 3 replicas: at most 3 OSDs can change
        print("change%d = " % n, changes[n])


def test():
    bpgosdset = getPgUpSet()
    os.system("systemctl stop ceph-osd@0")
    os.system("ceph osd out 0")
    # wait for peering; the pool holds no data, so 10 s was enough here
    time.sleep(10)
    apgosdset = getPgUpSet()
    compBASet(bpgosdset, apgosdset)


if __name__ == "__main__":
    test()


Steps

  1. Set chooseleaf_stable to 0, then run the script.
[root@host196 yg]# ceph osd crush dump | grep stable
        "chooseleaf_stable": 0,
[root@host196 home]# python pgmapcompare.py 
marked out osd.0. 
change 2 osd in pg  8.14
before out up set  [0,8,3]
after out up set  [8,4,2]
change 2 osd in pg  8.10
before out up set  [0,4,7]
after out up set  [4,6,1]
change 2 osd in pg  8.1f
before out up set  [0,7,4]
after out up set  [6,1,4]
change 2 osd in pg  8.64
before out up set  [4,0,8]
after out up set  [4,7,2]
change 3 osd in pg  8.66
before out up set  [0,4,8]
after out up set  [5,6,2]
change 2 osd in pg  8.4b
before out up set  [0,5,8]
after out up set  [4,1,8]
change 2 osd in pg  8.71
before out up set  [0,4,6]
after out up set  [3,2,6]
change 2 osd in pg  8.7e
before out up set  [3,0,6]
after out up set  [3,7,1]
change 2 osd in pg  8.59
before out up set  [0,3,7]
after out up set  [4,1,7]
change 2 osd in pg  8.78
before out up set  [0,5,8]
after out up set  [3,2,8]
change 2 osd in pg  8.9
before out up set  [0,8,4]
after out up set  [8,3,2]
change 2 osd in pg  8.23
before out up set  [0,4,7]
after out up set  [5,1,7]
change 2 osd in pg  8.31
before out up set  [5,0,7]
after out up set  [5,6,2]
change 3 osd in pg  8.39
before out up set  [0,6,5]
after out up set  [7,3,1]
change 2 osd in pg  8.3d
before out up set  [0,8,5]
after out up set  [6,5,1]
change 2 osd in pg  8.5a
before out up set  [0,4,6]
after out up set  [5,6,1]
change0 =  86
change1 =  26
change2 =  14
change3 =  2

After osd.0 was marked out, the PGs were remapped: 86 PGs kept the same up set, 26 had one OSD replaced, 14 had two replaced, and 2 had all three replaced.
The pool has 128 PGs.

  2. Bring osd.0 back into the cluster, set chooseleaf_stable to 1, and run the script again.
[root@host196 yg]# ceph osd crush dump | grep stable
        "chooseleaf_stable": 1,
[root@host196 home]# python pgmapcompare.py 
marked out osd.0. 
change0 =  88
change1 =  40
change2 =  0
change3 =  0
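In the legacy run, 16 PGs lost two or three OSDs even though only osd.0 went out. One plausible reading is that, with chooseleaf_stable = 0, the OSD chosen inside a host depends on that host's position in the up set. The sketch below checks the chooseleaf_stable = 0 log against this: every replaced OSD other than osd.0 should belong to a host that moved to a different position. It assumes host N holds osd.3N to osd.3N+2; the article does not list the layout, but every logged up set has exactly one OSD from each such group. PAIRS, host_of and check are hypothetical names.

```python
# (before, after) up sets of the PGs that changed 2 or 3 OSDs in the
# chooseleaf_stable = 0 run, copied from the log.
PAIRS = {
    "8.14": ([0, 8, 3], [8, 4, 2]), "8.10": ([0, 4, 7], [4, 6, 1]),
    "8.1f": ([0, 7, 4], [6, 1, 4]), "8.64": ([4, 0, 8], [4, 7, 2]),
    "8.66": ([0, 4, 8], [5, 6, 2]), "8.4b": ([0, 5, 8], [4, 1, 8]),
    "8.71": ([0, 4, 6], [3, 2, 6]), "8.7e": ([3, 0, 6], [3, 7, 1]),
    "8.59": ([0, 3, 7], [4, 1, 7]), "8.78": ([0, 5, 8], [3, 2, 8]),
    "8.9": ([0, 8, 4], [8, 3, 2]), "8.23": ([0, 4, 7], [5, 1, 7]),
    "8.31": ([5, 0, 7], [5, 6, 2]), "8.39": ([0, 6, 5], [7, 3, 1]),
    "8.3d": ([0, 8, 5], [6, 5, 1]), "8.5a": ([0, 4, 6], [5, 6, 1]),
}
OUT_OSD = 0


def host_of(osd):
    # ASSUMPTION: host N holds osd.3N..osd.3N+2 (inferred, not given in the article)
    return osd // 3


def check(before, after):
    """Return (ok, extra): ok is False if a host kept its position but changed
    OSD; extra counts replaced OSDs other than OUT_OSD."""
    where_before = {host_of(o): (i, o) for i, o in enumerate(before)}
    where_after = {host_of(o): (i, o) for i, o in enumerate(after)}
    ok, extra = True, 0
    for host, (i, osd) in where_before.items():
        j, new_osd = where_after[host]
        if osd != new_osd and osd != OUT_OSD:
            extra += 1
            if i == j:
                ok = False  # OSD changed without a position shift
    return ok, extra


results = [check(b, a) for b, a in PAIRS.values()]
print(all(ok for ok, _ in results), sum(extra for _, extra in results))  # → True 18
```

All 18 extra replacements in these PGs happen in hosts whose position shifted, and hosts that kept their position kept their OSD, which fits the position-dependent inner loop that chooseleaf_stable = 1 removes.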

Conclusion

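With chooseleaf_stable = 0, 26×1 + 14×2 + 2×3 = 60 PG shards were migrated; with chooseleaf_stable = 1, only 40×1 = 40 were. In the stable run every affected PG replaced exactly one OSD. In the legacy run 42 PGs changed but 60 shards moved; since at most one move per PG can be the copy leaving osd.0, at least 18 shards were shuffled between OSDs that stayed in the cluster.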
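The shard counts in the conclusion are a weighted sum over the change0 to change3 counters. A short sketch reproducing them from the two runs (RUNS, moved_shards and changed_pgs are illustrative names):

```python
# PG counts by number of changed OSDs, copied from the two runs.
RUNS = {
    "chooseleaf_stable=0": {0: 86, 1: 26, 2: 14, 3: 2},
    "chooseleaf_stable=1": {0: 88, 1: 40, 2: 0, 3: 0},
}


def moved_shards(counts):
    """A PG in which n OSDs changed moves n shards."""
    return sum(n * pgs for n, pgs in counts.items())


def changed_pgs(counts):
    return sum(pgs for n, pgs in counts.items() if n > 0)


for name, counts in RUNS.items():
    assert sum(counts.values()) == 128  # the pool has 128 PGs
    moved = moved_shards(counts)
    # at most one move per changed PG can be the copy leaving osd.0
    print(name, moved, moved - changed_pgs(counts))
# → chooseleaf_stable=0 60 18
# → chooseleaf_stable=1 40 0
```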
