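1.1 Scrub scheduling
Scheduling decides when a PG starts a Scrub scan. There are three main ways to trigger it:
1. Start a scan manually, right away.
2. Run scans in the background at a configured interval. By default a scan runs once a day.
3. Restrict scans to a configured time window, usually one in which the system load is light.
Data structures
The class OSDService holds the data structures related to Scrub (file src/osd/OSD.h):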
private:
  // -- scrub scheduling --
  Mutex sched_scrub_lock;  // lock protecting the Scrub-related variables
  int scrubs_pending;      // PGs that have reserved resources and are waiting to Scrub
  int scrubs_active;       // PGs that are currently being Scrubbed

public:
  struct ScrubJob {
    CephContext* cct;
    /// pg to be scrubbed
    spg_t pgid;
    /// a time scheduled for scrub. but the scrub could be delayed if system
    /// load is too high or it fails to fall in the scrub hours
    utime_t sched_time;
    /// the hard upper bound of scrub time
    utime_t deadline;      // once this time has passed, the job must be scheduled
    ScrubJob() : cct(nullptr) {}
    explicit ScrubJob(CephContext* cct, const spg_t& pg,
                      const utime_t& timestamp,
                      double pool_scrub_min_interval = 0,
                      double pool_scrub_max_interval = 0, bool must = true);
    /// order the jobs by sched_time
    bool operator<(const ScrubJob& rhs) const;
  };
  set<ScrubJob> sched_scrub_pg;  // all ScrubJobs of the PGs, ordered by sched_time
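To make the role of operator< more concrete, here is a minimal sketch, not the Ceph implementation. It assumes a simplified job type (SimpleScrubJob, with a string PG id and plain double timestamps, all hypothetical) kept in a std::set, so that the job with the earliest sched_time is always at begin(). The helpers first_job and next_job are also hypothetical; they are only loosely modeled on the first_scrub_stamp and next_scrub_stamp calls that appear in OSD::sched_scrub.

```cpp
#include <cassert>
#include <set>
#include <string>

// Simplified stand-in for OSDService::ScrubJob (hypothetical type, not Ceph code).
struct SimpleScrubJob {
    std::string pgid;   // stand-in for spg_t
    double sched_time;  // stand-in for utime_t: earliest time to scrub
    double deadline;    // stand-in for utime_t: hard upper bound

    // Order by sched_time; break ties by pgid so that two different PGs
    // with the same sched_time can both live in the set.
    bool operator<(const SimpleScrubJob& rhs) const {
        if (sched_time != rhs.sched_time)
            return sched_time < rhs.sched_time;
        return pgid < rhs.pgid;
    }
};

// Returns true and fills *out with the earliest job, false if the set is empty.
inline bool first_job(const std::set<SimpleScrubJob>& jobs, SimpleScrubJob* out) {
    if (jobs.empty())
        return false;
    *out = *jobs.begin();
    return true;
}

// Returns true and fills *out with the job that follows `cur`, false at the end.
inline bool next_job(const std::set<SimpleScrubJob>& jobs,
                     const SimpleScrubJob& cur, SimpleScrubJob* out) {
    auto it = jobs.upper_bound(cur);
    if (it == jobs.end())
        return false;
    *out = *it;
    return true;
}
```

Because the container is sorted, a scheduler that walks it from begin() can stop at the first job whose sched_time is still in the future.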
1.2 Implementation of Scrub scheduling
1.2.1 Registering the timer tasks
The OSD initialization function OSD::init first initializes the timers:
int OSD::init()
{
  CompatSet initial, diff;
  Mutex::Locker lock(osd_lock);
  if (is_stopping())
    return 0;

  tick_timer.init();
  tick_timer_without_osd_lock.init();
  …
It then registers the timer tasks:
  // tick
  tick_timer.add_event_after(cct->_conf->osd_heartbeat_interval,
                             new C_Tick(this));
  {
    Mutex::Locker l(tick_timer_lock);
    tick_timer_without_osd_lock.add_event_after(cct->_conf->osd_heartbeat_interval,
                                                new C_Tick_WithoutOSDLock(this));
  }
[Figure: flowchart of the timer registration in OSD::init]
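1.2.2 Task scheduling
Once registered, the timer fires its callback, which calls OSD::tick_without_osd_lock(). The first firing happens osd_heartbeat_interval after registration; after that the callback re-registers itself, so it fires again every OSD_TICK_INTERVAL.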
class OSD::C_Tick_WithoutOSDLock : public Context {
  OSD *osd;
public:
  explicit C_Tick_WithoutOSDLock(OSD *o) : osd(o) {}
  void finish(int r) override {
    osd->tick_without_osd_lock();
  }
};
tick_without_osd_lock is processed as follows; when the OSD is active it calls sched_scrub():
void OSD::tick_without_osd_lock()
{
  assert(tick_timer_lock.is_locked());
  dout(10) << "tick_without_osd_lock" << dendl;
  …
  // osd_lock is not being held, which means the OSD state
  // might change when doing the monitor report
  if (is_active() || is_waiting_for_healthy()) {
    …
  }

  if (is_active()) {
    if (!scrub_random_backoff()) {
      sched_scrub();
    }
    service.promote_throttle_recalibrate();
  }

  check_ops_in_flight();
  service.kick_recovery_queue();
  tick_timer_without_osd_lock.add_event_after(OSD_TICK_INTERVAL,
                                              new C_Tick_WithoutOSDLock(this));
}
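The self re-registering pattern used by tick_without_osd_lock can be illustrated with a small simulation. This is only a sketch under stated assumptions: ToyTimer and PeriodicTick are hypothetical names, time is simulated with plain doubles, and everything runs on one thread, whereas the timer class used by the OSD works with real clocks and locks.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Toy single-threaded timer with simulated time (hypothetical, not Ceph code).
class ToyTimer {
public:
    // Schedule `cb` to run `delay` seconds after the current simulated time.
    void add_event_after(double delay, std::function<void()> cb) {
        events_.emplace(now_ + delay, std::move(cb));
    }

    // Advance simulated time to `until`, running due callbacks in time order.
    void run_until(double until) {
        while (!events_.empty() && events_.begin()->first <= until) {
            auto it = events_.begin();
            now_ = it->first;
            std::function<void()> cb = std::move(it->second);
            events_.erase(it);
            cb();  // the callback may call add_event_after() again
        }
        now_ = until;
    }

private:
    double now_ = 0.0;
    std::multimap<double, std::function<void()>> events_;
};

// A periodic task that re-registers itself at the end of every run.
struct PeriodicTick {
    ToyTimer& timer;
    double interval;  // plays the role of OSD_TICK_INTERVAL
    int fired;

    PeriodicTick(ToyTimer& t, double i) : timer(t), interval(i), fired(0) {}

    void tick() {
        ++fired;  // the real callback would do its periodic work here
        timer.add_event_after(interval, [this] { tick(); });
    }
};
```

Note that the first delay and the repeat interval are independent: the first is chosen by whoever registers the event, the second by the callback itself.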
[Figure: flowchart of the tick_without_osd_lock processing]
1.2.3 The function OSD::sched_scrub
OSD::sched_scrub controls when the Scrub of a PG is started. The code is as follows:
void OSD::sched_scrub()
{
  // if not permitted, fail fast
  // check whether the number of PGs reserved for, or running, Scrub exceeds the quota
  if (!service.can_inc_scrubs_pending()) {
    return;
  }

  // check whether we are inside the permitted time window
  utime_t now = ceph_clock_now();
  bool time_permit = scrub_time_permit(now);
  // check whether the system load permits scrubbing
  bool load_is_low = scrub_load_below_threshold();
  dout(20) << "sched_scrub load_is_low=" << (int)load_is_low << dendl;

  OSDService::ScrubJob scrub;
  if (service.first_scrub_stamp(&scrub)) {
    // walk the job list and decide, one job at a time, whether Scrub can run
    do {
      dout(30) << "sched_scrub examine " << scrub.pgid << " at " << scrub.sched_time << dendl;

      if (scrub.sched_time > now) {
        // save ourselves some effort
        dout(10) << "sched_scrub " << scrub.pgid << " scheduled at" << scrub.sched_time
                 << "> " << now << dendl;
        break;
      }

      if (!cct->_conf->osd_scrub_during_recovery && service.is_recovery_active()) {
        dout(10) << __func__ << "not scheduling scrub of " << scrub.pgid
                 << " due to active recovery ops" << dendl;
        break;
      }

      if ((scrub.deadline >= now) && !(time_permit && load_is_low)) {
        dout(10) << __func__ << " not scheduling scrub for " << scrub.pgid << " due to "
                 << (!time_permit ? "time not permit" : "high load") << dendl;
        continue;
      }

      PG *pg = _lookup_lock_pg(scrub.pgid);
      if (!pg)
        continue;
      if (pg->get_pgbackend()->scrub_supported() && pg->is_active()) {
        dout(10) << "sched_scrub scrubbing " << scrub.pgid << " at" << scrub.sched_time
                 << (pg->scrubber.must_scrub ? ", explicitly requested" :
                     (load_is_low ? ", load_is_low" : " deadline < now"))
                 << dendl;
        if (pg->sched_scrub()) {
          pg->unlock();
          break;
        }
      }
      pg->unlock();
    } while (service.next_scrub_stamp(scrub, &scrub));
  }
  dout(20) << "sched_scrub done" << dendl;
}
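The three early checks inside the loop above can be read as one small decision rule. The following sketch isolates that rule so it can be tested on its own; it is an illustration under stated assumptions, not code from Ceph. The enum ScrubDecision and the function decide are hypothetical names, and timestamps are reduced to plain doubles.

```cpp
#include <cassert>

// Outcome of examining one job in the scheduling loop (hypothetical enum).
enum class ScrubDecision {
    StopScanning,  // corresponds to `break` in the loop
    SkipJob,       // corresponds to `continue` in the loop
    TryScrub       // go on to look up the PG and call pg->sched_scrub()
};

// Inputs are plain values so that the rule can be exercised in isolation.
inline ScrubDecision decide(double sched_time, double deadline, double now,
                            bool time_permit, bool load_is_low,
                            bool scrub_during_recovery, bool recovery_active) {
    // Jobs are sorted by sched_time, so every later job is further away still.
    if (sched_time > now)
        return ScrubDecision::StopScanning;
    // Recovery is running and scrubbing during recovery is not allowed.
    if (!scrub_during_recovery && recovery_active)
        return ScrubDecision::StopScanning;
    // Before the deadline, both the time window and the load must allow it.
    if (deadline >= now && !(time_permit && load_is_low))
        return ScrubDecision::SkipJob;
    // Either the deadline has passed or the conditions are favorable.
    return ScrubDecision::TryScrub;
}
```

The last case is the important one: once the deadline has passed, neither the time window nor the load can hold the job back any longer.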
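Steps:
1. Check whether the number of PGs that are reserved for, or running, Scrub exceeds the quota.
The variable scrubs_pending records the number of PGs that have completed resource reservation and are waiting to Scrub, and scrubs_active records the number of PGs that are currently scrubbing. Their sum must not exceed osd_max_scrubs, the maximum number of PGs allowed to Scrub at the same time.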
bool OSDService::can_inc_scrubs_pending()
{
  bool can_inc = false;
  Mutex::Locker l(sched_scrub_lock);

  if (scrubs_pending + scrubs_active < cct->_conf->osd_max_scrubs) {
    dout(20) << __func__ << " " << scrubs_pending << " -> " << (scrubs_pending+1)
             << " (max " << cct->_conf->osd_max_scrubs << ", active " << scrubs_active
             << ")" << dendl;
    can_inc = true;
  } else {
    dout(20) << __func__ << scrubs_pending << " + " << scrubs_active
             << "active >= max " << cct->_conf->osd_max_scrubs << dendl;
  }

  return can_inc;
}
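2. Check whether the current time falls inside the permitted window.
If cct->_conf->osd_scrub_begin_hour is less than cct->_conf->osd_scrub_end_hour, the current hour must lie between the two (at or after the begin hour and before the end hour).
If cct->_conf->osd_scrub_begin_hour is greater than or equal to cct->_conf->osd_scrub_end_hour, the window wraps past midnight: the current hour is permitted if it is at or after the begin hour, or before the end hour.
3. Check whether the system load permits Scrub.
Get the system load averages of the last 1, 5 and 15 minutes, then decide:
If the 1-minute load is less than cct->_conf->osd_scrub_load_threshold, Scrub is permitted.
If the 1-minute load is less than daily_loadavg and also less than the 15-minute load (that is, the load is falling), Scrub is permitted as well.
4. Walk the list of waiting Scrub jobs. The jobs are ordered by sched_time, so when a PG's scheduled time has not arrived yet the scan stops there: every job after it is scheduled later still.
5. Look up the pg object of the PG. If the PG's pgbackend supports Scrub and the PG is in the active state:
If scrub.deadline is less than now, the final deadline has passed and Scrub must be started.
If both the time window and the system load permit it, Scrub may be started as well.
In either of these two cases, call pg->sched_scrub to carry out the Scrub.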
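The time-window and load checks of steps 2 and 3 can be written as two small pure functions. This is a sketch, not the Ceph source: hour_in_window and load_below_threshold are hypothetical names, the inputs are passed in as plain values instead of being read from the configuration and from getloadavg(3), and the window test assumes the usual convention that a begin hour later than the end hour means the window wraps past midnight.

```cpp
#include <cassert>

// Hour-window test. begin_hour and end_hour play the role of
// osd_scrub_begin_hour and osd_scrub_end_hour; `hour` is the local hour (0-23).
inline bool hour_in_window(int begin_hour, int end_hour, int hour) {
    if (begin_hour < end_hour)
        return hour >= begin_hour && hour < end_hour;  // e.g. 1 -> 5
    return hour >= begin_hour || hour < end_hour;      // wraps midnight, e.g. 22 -> 6
}

// Load test. load_1min and load_15min would come from the system load
// averages; `threshold` plays the role of osd_scrub_load_threshold.
inline bool load_below_threshold(double load_1min, double load_15min,
                                 double daily_loadavg, double threshold) {
    // Below the configured threshold: always fine.
    if (load_1min < threshold)
        return true;
    // Otherwise allow only if below the daily average and currently falling.
    return load_1min < daily_loadavg && load_1min < load_15min;
}
```

For example, with a window of 22 to 6 the hours 23 and 2 are permitted while 12 is not, and a 1-minute load of 0.8 passes a threshold of 0.5 only when it is below both the daily average and the 15-minute load.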
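1.2.4 The function PG::sched_scrub
PG::sched_scrub sets the parameters of the Scrub task and reserves the resources it needs. The flow is as follows:
1. Check the state of the PG: this OSD must be the primary, and the PG must be active and clean and not already scrubbing. Otherwise no Scrub is performed. This check prevents several nodes from scrubbing the same PG.
2. Set the value of deep_scrub_interval, which defaults to cct->_conf->osd_deep_scrub_interval.
3. Check whether, at the current time, a deep_scrub is due.
4. If the Scrub was not requested manually by the user (scrubber.must_scrub is not set), the system additionally decides at random, with a certain probability, to perform a deep_scrub.
5. Based on the checks in steps 3 and 4, decide whether to start a deep_scrub.
6. If deep_scrub is disabled on the osdmap or on the pool, do not start a deep_scrub.
7. If scrub is disabled on the osdmap or on the pool, do not start a scrub.
8. Check whether automatic repair needs to be enabled.
9. Check whether the resources have already been reserved; if not, reserve them. Then add the PG to the work queue op_wq, which triggers the start of the Scrub task (by calling the function queue_scrub).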
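The deep-scrub decision in steps 3 to 6 can be summarized in a few lines. The sketch below is a simplified illustration, not the body of PG::sched_scrub: the struct DeepScrubInputs and the function want_deep_scrub are hypothetical, timestamps are plain doubles, and the result of the random draw is passed in as a bool (coin_flip) so that the logic stays deterministic and testable.

```cpp
#include <cassert>

// Inputs to the deep-scrub decision (hypothetical struct, for illustration only).
struct DeepScrubInputs {
    double now;
    double last_deep_scrub_stamp;
    double deep_scrub_interval;  // plays the role of osd_deep_scrub_interval
    bool must_scrub;             // the scrub was explicitly requested by the user
    bool coin_flip;              // result of the random draw, passed in by the caller
    bool nodeep_scrub_flag;      // deep scrub is disabled on the osdmap or the pool
};

// Returns true if the scrub about to be scheduled should be a deep scrub.
inline bool want_deep_scrub(const DeepScrubInputs& in) {
    // Step 3: is a deep scrub due by time?
    bool time_for_deep =
        in.now >= in.last_deep_scrub_stamp + in.deep_scrub_interval;
    // Step 4: random promotion applies only to scrubs the user did not request.
    bool random_deep = !in.must_scrub && in.coin_flip;
    // Step 5: either reason is enough.
    bool deep = time_for_deep || random_deep;
    // Step 6: a disabling flag overrides both.
    if (in.nodeep_scrub_flag)
        deep = false;
    return deep;
}
```

Passing the random draw in from the outside is a design choice of this sketch only; it keeps the function free of hidden state.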