Two members of the OSD class matter for this topic.
-
Member 1: ShardedThreadPool osd_op_tp
-
Initialization
osd_op_tp(cct, "OSD::osd_op_tp", "tp_osd_tp", get_num_op_threads())
This member is initialized in the constructor. get_num_op_threads, implemented as follows, returns the number of worker threads in the pool.
    int OSD::get_num_op_threads()
    {
      if (cct->_conf->osd_op_num_threads_per_shard)
        return get_num_op_shards() * cct->_conf->osd_op_num_threads_per_shard;
      if (store_is_rotational)
        return get_num_op_shards() * cct->_conf->osd_op_num_threads_per_shard_hdd;
      else
        return get_num_op_shards() * cct->_conf->osd_op_num_threads_per_shard_ssd;
    }
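The thread count is therefore just shards multiplied by threads per shard, with the generic setting taking precedence over the HDD/SSD-specific ones. A minimal sketch of that arithmetic follows. `OpThreadConfig` and `num_op_threads` are hypothetical names invented for illustration, and the numbers used with them are example values, not necessarily Ceph defaults.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the few config values the OSD consults.
struct OpThreadConfig {
  uint32_t num_shards;             // what get_num_op_shards() would return
  uint32_t threads_per_shard;      // osd_op_num_threads_per_shard (0 = unset)
  uint32_t threads_per_shard_hdd;  // osd_op_num_threads_per_shard_hdd
  uint32_t threads_per_shard_ssd;  // osd_op_num_threads_per_shard_ssd
};

// Follows the same branch order as OSD::get_num_op_threads():
// the generic value wins when set, otherwise pick by device type.
uint32_t num_op_threads(const OpThreadConfig& c, bool rotational) {
  assert(c.num_shards > 0);
  if (c.threads_per_shard)
    return c.num_shards * c.threads_per_shard;
  return c.num_shards *
         (rotational ? c.threads_per_shard_hdd : c.threads_per_shard_ssd);
}
```

With these example values, 5 shards on a rotational device with 1 thread per shard give 5 threads, and 8 shards on an SSD with 2 threads per shard give 16.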
-
Key functions
    void ShardedThreadPool::start()
    {
      ldout(cct, 10) << "start" << dendl;
      shardedpool_lock.lock();
      start_threads();
      shardedpool_lock.unlock();
      ldout(cct, 15) << "started" << dendl;
    }
start_threads launches num_threads worker threads.
    void ShardedThreadPool::shardedthreadpool_worker(uint32_t thread_index)
    {
      ...
      wq->_process(thread_index, hb);
    }
shardedthreadpool_worker is the entry function of each worker thread. It ends up calling the processing function of the work queue attached to the pool, which in this case is
    void OSD::ShardedOpWQ::_process(uint32_t thread_index, heartbeat_handle_d *hb)
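The relationship between the pool and its work queue can be sketched as below: every worker thread loops and hands its own index to the queue's `_process`. This is a simplified illustration, not Ceph's implementation. `WorkQueue`, `MiniShardedPool`, and `CountingQueue` are hypothetical names, and heartbeats, pausing, and draining are left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical work-queue interface; only the name _process mirrors Ceph.
struct WorkQueue {
  virtual ~WorkQueue() = default;
  virtual void _process(uint32_t thread_index) = 0;
};

class MiniShardedPool {
 public:
  explicit MiniShardedPool(uint32_t num_threads) : num_threads_(num_threads) {}
  ~MiniShardedPool() { stop(); }

  void set_wq(WorkQueue* wq) { wq_ = wq; }  // the queue registers itself here

  void start() {
    assert(wq_ != nullptr);
    for (uint32_t i = 0; i < num_threads_; ++i)
      threads_.emplace_back([this, i] { worker(i); });
  }

  void stop() {
    stop_ = true;
    for (auto& t : threads_) t.join();
    threads_.clear();
  }

 private:
  // Each worker repeatedly asks the queue to process work for its index.
  void worker(uint32_t thread_index) {
    while (!stop_) wq_->_process(thread_index);
  }

  uint32_t num_threads_;
  WorkQueue* wq_ = nullptr;
  std::atomic<bool> stop_{false};
  std::vector<std::thread> threads_;
};

// Example queue that only counts how often each thread index called in.
struct CountingQueue : WorkQueue {
  explicit CountingQueue(uint32_t n) : calls(n) {}
  void _process(uint32_t thread_index) override {
    ++calls[thread_index];
    std::this_thread::yield();
  }
  std::vector<std::atomic<uint64_t>> calls;
};
```

The pool knows nothing about ops or PGs. All business logic lives behind `_process`, which is why the same pool type can drive different queues.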
-
-
Member 2: ShardedOpWQ op_shardedwq, whose type is a class nested inside OSD
- Initialization:
① op_shardedwq(
      this,
      cct->_conf->osd_op_thread_timeout,
      cct->_conf->osd_op_thread_suicide_timeout,
      &osd_op_tp)
② ShardedWQ(time_t ti, time_t sti, ShardedThreadPool* tp)
      : BaseShardedWQ(ti, sti), sharded_pool(tp)
  {
    tp->set_wq(this);
  }
① The fourth argument is member 1 described above, the ShardedThreadPool, which determines how many threads process this queue.
② The constructor registers the work queue with the thread pool.
- Key functions
    void OSD::ShardedOpWQ::_enqueue(OpQueueItem&& item)
    {
      uint32_t shard_index =
        item.get_ordering_token().hash_to_shard(osd->shards.size());
      OSDShard* sdata = osd->shards[shard_index];
      assert (NULL != sdata);
      unsigned priority = item.get_priority();
      unsigned cost = item.get_cost();
      sdata->shard_lock.lock();
      dout(20) << __func__ << " " << item << dendl;
      if (priority >= osd->op_prio_cutoff)
        sdata->pqueue->enqueue_strict(
          item.get_owner(), priority, std::move(item));
      else
        sdata->pqueue->enqueue(
          item.get_owner(), priority, cost, std::move(item));
      sdata->shard_lock.unlock();
      std::lock_guard l{sdata->sdata_wait_lock};
      sdata->sdata_cond.notify_one();
    }
An OSDShard can be thought of as one shard of the overall queue: the ordering token of an item is hashed to pick a shard, and each shard holds its own queue.
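The routing done by _enqueue can be sketched as follows: hash the ordering token to a shard, then place the item in either a strict or a weighted queue depending on the priority cutoff. This is a simplified illustration under stated assumptions. `Item`, `Shard`, and `MiniShardedQueue` are hypothetical types, `std::hash` on a string stands in for hash_to_shard, two plain deques stand in for the shard's pqueue, and the condition-variable wakeup is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

struct Item {
  std::string token;   // stands in for the ordering token (the PG)
  unsigned priority;
};

struct Shard {
  std::mutex lock;            // plays the role of shard_lock
  std::deque<Item> strict;    // stands in for pqueue->enqueue_strict
  std::deque<Item> weighted;  // stands in for pqueue->enqueue
};

class MiniShardedQueue {
 public:
  MiniShardedQueue(std::size_t num_shards, unsigned prio_cutoff)
      : shards_(num_shards), cutoff_(prio_cutoff) {
    assert(num_shards > 0);
  }

  // Same token always maps to the same shard, so per-PG order is kept.
  std::size_t shard_of(const std::string& token) const {
    return std::hash<std::string>{}(token) % shards_.size();
  }

  // Returns the index of the shard the item was placed in.
  std::size_t enqueue(Item item) {
    std::size_t idx = shard_of(item.token);
    Shard& s = shards_[idx];
    std::lock_guard<std::mutex> l(s.lock);
    if (item.priority >= cutoff_)
      s.strict.push_back(std::move(item));
    else
      s.weighted.push_back(std::move(item));
    return idx;
  }

  const Shard& shard(std::size_t i) const { return shards_[i]; }

 private:
  std::vector<Shard> shards_;
  unsigned cutoff_;
};
```

Because the shard is chosen from the token alone, two ops for the same PG land in the same shard regardless of priority. Only the sub-queue inside that shard differs.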
    void OSD::ShardedOpWQ::_process(uint32_t thread_index, heartbeat_handle_d *hb)
    {
      uint32_t shard_index = thread_index % osd->num_shards;
      auto& sdata = osd->shards[shard_index];
      ...
      OpQueueItem item = sdata->pqueue->dequeue();
      ...
      auto r = sdata->pg_slots.emplace(token, nullptr);
      if (r.second) {
        r.first->second = make_unique<OSDShardPGSlot>();
      }
      OSDShardPGSlot *slot = r.first->second.get();
      pg->lock();
      auto qi = std::move(slot->to_process.front());
      qi.run(osd, sdata, pg, tp_handle);
    }
This function takes the PG lock, a coarse-grained lock covering the whole PG. As the code shows, it finishes by calling run on the dequeued op. For an ordinary PG op, the corresponding function is:
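    void PGOpItem::run(
      OSD *osd, OSDShard *sdata, PGRef& pg, ThreadPool::TPHandle &handle)
    {
      osd->dequeue_op(pg, op, handle);
      pg->unlock();
    }
The PG lock is released immediately after dequeue_op returns.
Taking a write as an example, dequeue_op returns once the local transaction has been submitted to BlueStore, and the PG lock is then released.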
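Two details of the dequeue path are easy to miss: a worker picks its shard by taking its thread index modulo the shard count, and the PG lock is acquired in _process but released inside run. The sketch below illustrates both. `PG`, `OpItem`, `shard_for_thread`, and `process` are hypothetical stand-ins, and incrementing a counter replaces the real dequeue_op work.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

struct PG {
  std::mutex lock;      // plays the role of the PG lock
  int ops_applied = 0;
};

struct OpItem {
  // Mirrors PGOpItem::run: do the work, then drop the lock the caller took.
  void run(PG& pg) {
    ++pg.ops_applied;   // stands in for osd->dequeue_op(pg, op, handle)
    pg.lock.unlock();
  }
};

// Several threads can serve one shard: indices 0, 5, 10 all map to
// shard 0 when there are 5 shards.
uint32_t shard_for_thread(uint32_t thread_index, uint32_t num_shards) {
  assert(num_shards > 0);
  return thread_index % num_shards;
}

// Mirrors the tail of _process: lock the PG, then hand off to run().
void process(PG& pg, OpItem& item) {
  pg.lock.lock();
  item.run(pg);         // returns with pg.lock already released
}
```

Since lock and unlock happen on the same thread, the hand-off is safe with a plain mutex. The consequence is that ops on one PG are serialized for the full duration of dequeue_op.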