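OSDService compares the epoch of the osdmap it currently holds with the epoch carried by the request. If the request carries an older epoch, OSDService shares the newer osdmap with the requester.
1. Whether to share the osdmap with the requester
// Check whether the osdmap needs to be shared with the requester; if so, send the server-side osdmap to the requester.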
void OSD::maybe_share_map(
  Session *session,
  OpRequestRef op,
  OSDMapRef osdmap)
{
  if (!op->check_send_map) {
    return;
  }
  epoch_t last_sent_epoch = 0;
  session->sent_epoch_lock.lock();
  last_sent_epoch = session->last_sent_epoch;
  session->sent_epoch_lock.unlock();
  const Message *m = op->get_req();
  service.share_map(
    m->get_source(),
    m->get_connection().get(),
    op->sent_epoch,
    osdmap,
    session ? &last_sent_epoch : NULL);
  session->sent_epoch_lock.lock();
  if (session->last_sent_epoch < last_sent_epoch) {
    session->last_sent_epoch = last_sent_epoch;
  }
  session->sent_epoch_lock.unlock();
  op->check_send_map = false;
}
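The locking pattern in maybe_share_map can be looked at in isolation: the session's epoch is copied out under the lock, the share itself runs without holding the lock, and the result is written back only if it moves the epoch forward. The following is a minimal sketch of that pattern, not Ceph code. `SessionSketch`, `read_last_sent`, and `advance_last_sent` are hypothetical names, `std::mutex` stands in for the session's lock, and `epoch_t` is assumed to be a 32-bit unsigned integer.

```cpp
#include <cstdint>
#include <mutex>

using epoch_t = std::uint32_t;  // assumption: epochs are 32-bit unsigned

// Hypothetical stand-in for the per-connection session state.
struct SessionSketch {
  std::mutex sent_epoch_lock;
  epoch_t last_sent_epoch = 0;
};

// Copy the epoch out under the lock so the caller can work on a private value.
epoch_t read_last_sent(SessionSketch& s) {
  std::lock_guard<std::mutex> l(s.sent_epoch_lock);
  return s.last_sent_epoch;
}

// Write back only if the new value is larger, so a concurrent update that
// already recorded a newer epoch is never overwritten by an older one.
void advance_last_sent(SessionSketch& s, epoch_t e) {
  std::lock_guard<std::mutex> l(s.sent_epoch_lock);
  if (s.last_sent_epoch < e) {
    s.last_sent_epoch = e;
  }
}
```

Because the lock is released between the read and the write-back, the "only if larger" check is what keeps the recorded epoch from moving backwards when two requests on the same session are handled at the same time.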
2. Sharing the osdmap
// Share the server-side osdmap with the requester (send the newest osdmap held by this OSD to the requester).
void OSDService::share_map(
    entity_name_t name,
    Connection *con,
    epoch_t epoch,
    OSDMapRef& osdmap,
    epoch_t *sent_epoch_p)
{
  dout(20) << "share_map "
           << name << " " << con->get_peer_addr()
           << " " << epoch << dendl;
  if (!osd->is_active()) {
    /*It is safe not to proceed as OSD is not in healthy state*/
    return;
  }
  // Decide whether the osdmap needs to be shared.
  bool want_shared = should_share_map(name, con, epoch,
                                      osdmap, sent_epoch_p);
  if (want_shared){
    // If the requester is a client, update the epoch in its session and send the map to the requester.
    if (name.is_client()) {
      dout(10) << name << " has old map " << epoch
               << " < " << osdmap->get_epoch() << dendl;
      // we know the Session is valid or we wouldn't be sending
      if (sent_epoch_p) {
        *sent_epoch_p = osdmap->get_epoch();
      }
      send_incremental_map(epoch, con, osdmap);
    } else if (con->get_messenger() == osd->cluster_messenger &&
               osdmap->is_up(name.num()) &&
               (osdmap->get_cluster_addr(name.num()) == con->get_peer_addr() ||
                osdmap->get_hb_back_addr(name.num()) == con->get_peer_addr())) {
      dout(10) << name << " " << con->get_peer_addr()
               << " has old map " << epoch << " < "
               << osdmap->get_epoch() << dendl;
      // If the requester is a peer OSD, record the epoch for that peer and send it the newest osdmap held by this OSD.
      note_peer_epoch(name.num(), osdmap->get_epoch());
      send_incremental_map(epoch, con, osdmap);
    }
  }
}
3. Deciding whether to share the osdmap
// Decide whether the current osdmap should be shared with the requester: return true if so, false otherwise.
bool OSDService::should_share_map(entity_name_t name, Connection *con,
                                  epoch_t epoch, const OSDMapRef& osdmap,
                                  const epoch_t *sent_epoch_p)
{
  dout(20) << "should_share_map "
           << name << " " << con->get_peer_addr()
           << " " << epoch << dendl;
  // The requester may be a client or a peer OSD. For a client, return true if the epoch
  // carried by the op is older than the epoch of the server-side PG osdmap and the epoch
  // last sent on this session is older as well.
  // does client have old map?
  if (name.is_client()) {
    bool message_sendmap = epoch < osdmap->get_epoch();
    if (message_sendmap && sent_epoch_p) {
      dout(20) << "client session last_sent_epoch: "
               << *sent_epoch_p
               << " versus osdmap epoch " << osdmap->get_epoch() << dendl;
      if (*sent_epoch_p < osdmap->get_epoch()) {
        return true;
      } // else we don't need to send it out again
    }
  }
  // If the requester is a peer OSD, return true if the epoch of the PG osdmap is greater than
  // both the epoch recorded for that peer (get_peer_epoch) and the epoch carried by the op.
  if (con->get_messenger() == osd->cluster_messenger &&
      con != osd->cluster_messenger->get_loopback_connection() &&
      osdmap->is_up(name.num()) &&
      (osdmap->get_cluster_addr(name.num()) == con->get_peer_addr() ||
       osdmap->get_hb_back_addr(name.num()) == con->get_peer_addr())) {
    // remember
    epoch_t has = MAX(get_peer_epoch(name.num()), epoch);
    // share?
    if (has < osdmap->get_epoch()) {
      dout(10) << name << " " << con->get_peer_addr()
               << " has old map " << epoch << " < "
               << osdmap->get_epoch() << dendl;
      return true;
    }
  }
  return false;
}
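Stripped of the connection and address checks, the two decisions above reduce to epoch comparisons. The sketch below isolates just those comparisons so they can be tried with concrete numbers. It is a simplification, not the Ceph implementation: `client_needs_map` and `peer_needs_map` are hypothetical helper names, and `epoch_t` is assumed to be a 32-bit unsigned integer.

```cpp
#include <algorithm>
#include <cstdint>

using epoch_t = std::uint32_t;  // assumption: epochs are 32-bit unsigned

// Client case: the op must carry an older epoch than the current map, and the
// session (if one is known) must not already have been sent the current epoch.
// With no session pointer this branch never answers "share", as in the code above.
bool client_needs_map(epoch_t op_epoch, epoch_t current,
                      const epoch_t* last_sent) {
  if (op_epoch >= current) {
    return false;
  }
  return last_sent != nullptr && *last_sent < current;
}

// Peer case: take the newest epoch the peer is known to have (either recorded
// earlier or carried by this op) and share only if it is still behind.
bool peer_needs_map(epoch_t op_epoch, epoch_t recorded_peer_epoch,
                    epoch_t current) {
  epoch_t has = std::max(recorded_peer_epoch, op_epoch);
  return has < current;
}
```

For example, a client op at epoch 5 against a current epoch of 10 triggers a share when the session last saw epoch 7, but not when the session was already sent epoch 10. This is what prevents the same map from being resent for every stale op on one session.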
4. Recording the peer-to-epoch mapping
// The OSD maintains peer_map_epoch internally, which records the osdmap epoch known for each peer.
epoch_t OSDService::note_peer_epoch(int peer, epoch_t e)
{
  Mutex::Locker l(peer_map_epoch_lock);
  map<int,epoch_t>::iterator p = peer_map_epoch.find(peer);
  if (p != peer_map_epoch.end()) {
    if (p->second < e) {
      dout(10) << "note_peer_epoch osd." << peer << " has " << e << dendl;
      p->second = e;
    } else {
      dout(30) << "note_peer_epoch osd." << peer << " has " << p->second << " >= " << e << dendl;
    }
    return p->second;
  } else {
    dout(10) << "note_peer_epoch osd." << peer << " now has " << e << dendl;
    peer_map_epoch[peer] = e;
    return e;
  }
}
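The behaviour of note_peer_epoch is that of a monotonic per-peer counter: a new entry is created on first sight, and an existing entry only ever increases. A minimal standalone sketch follows. `PeerEpochTable` is a hypothetical class, `std::mutex` replaces Ceph's `Mutex`, and the `get` method returning 0 for an unknown peer is an assumption about how the lookup side behaves.

```cpp
#include <cstdint>
#include <map>
#include <mutex>

using epoch_t = std::uint32_t;  // assumption: epochs are 32-bit unsigned

// Hypothetical table mapping a peer OSD id to the newest epoch it is known to have.
class PeerEpochTable {
 public:
  // Record that `peer` has at least epoch `e`; returns the stored epoch,
  // which is the larger of the old and new values.
  epoch_t note(int peer, epoch_t e) {
    std::lock_guard<std::mutex> l(lock_);
    std::map<int, epoch_t>::iterator p = epochs_.find(peer);
    if (p == epochs_.end()) {
      epochs_[peer] = e;
      return e;
    }
    if (p->second < e) {
      p->second = e;
    }
    return p->second;
  }

  // Assumed lookup behaviour: an unknown peer is treated as epoch 0.
  epoch_t get(int peer) const {
    std::lock_guard<std::mutex> l(lock_);
    std::map<int, epoch_t>::const_iterator p = epochs_.find(peer);
    return p == epochs_.end() ? 0 : p->second;
  }

 private:
  mutable std::mutex lock_;
  std::map<int, epoch_t> epochs_;
};
```

Noting epoch 20 and then epoch 15 for the same peer leaves the stored value at 20, so a late or reordered message cannot make the OSD believe a peer is further behind than it is.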
5. Sending the osdmap message
// Build the osdmap message and send it to the requester.
void OSDService::send_incremental_map(epoch_t since, Connection *con,
                                      OSDMapRef& osdmap)
{
  epoch_t to = osdmap->get_epoch();
  dout(10) << "send_incremental_map " << since << " -> " << to
           << " to " << con << " " << con->get_peer_addr() << dendl;
  MOSDMap *m = NULL;
  while (!m) {
    OSDSuperblock sblock(get_superblock());
    if (since < sblock.oldest_map) {
      // just send latest full map
      MOSDMap *m = new MOSDMap(monc->get_fsid());
      m->oldest_map = max_oldest_map;
      m->newest_map = sblock.newest_map;
      get_map_bl(to, m->maps[to]);
      send_map(m, con);
      return;
    }
    if (to > since && (int64_t)(to - since) > cct->_conf->osd_map_share_max_epochs) {
      dout(10) << " " << (to - since) << " > max " << cct->_conf->osd_map_share_max_epochs
               << ", only sending most recent" << dendl;
      since = to - cct->_conf->osd_map_share_max_epochs;
    }
    if (to - since > (epoch_t)cct->_conf->osd_map_message_max)
      to = since + cct->_conf->osd_map_message_max;
    m = build_incremental_map_msg(since, to, sblock);
  }
  send_map(m, con);
}
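The two limits in send_incremental_map narrow the epoch range from both ends: osd_map_share_max_epochs pulls `since` forward so that only the most recent epochs are considered, and osd_map_message_max then pulls `to` back so that a single message does not carry too many maps. The arithmetic can be sketched on its own as below. `EpochRange` and `clamp_epoch_range` are hypothetical names, the sketch assumes `since <= to`, and it leaves out the full-map fallback taken when `since` is older than the superblock's oldest map.

```cpp
#include <cassert>
#include <cstdint>

using epoch_t = std::uint32_t;  // assumption: epochs are 32-bit unsigned

struct EpochRange {
  epoch_t since;  // newest epoch the receiver already has
  epoch_t to;     // newest epoch that will be sent
};

// Apply the two limits in the same order as the function above.
// share_max_epochs and message_max play the roles of
// osd_map_share_max_epochs and osd_map_message_max.
EpochRange clamp_epoch_range(epoch_t since, epoch_t to,
                             epoch_t share_max_epochs, epoch_t message_max) {
  assert(since <= to);  // assumption of this sketch
  // Too far behind: skip ahead and only cover the most recent epochs.
  if (to > since && to - since > share_max_epochs) {
    since = to - share_max_epochs;
  }
  // Still too many for one message: cut the upper end.
  if (to - since > message_max) {
    to = since + message_max;
  }
  return {since, to};
}
```

With illustrative values (not Ceph defaults) of since = 100, to = 200, a share limit of 40 and a message limit of 30, the first step moves `since` to 160 and the second moves `to` to 190, so the message covers epochs after 160 up to 190.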
// Look up the osdmap (as a bufferlist) for the given epoch in OSDService::map_bl_cache; on a cache miss, reload it by reading it from disk.
bool OSDService::_get_map_bl(epoch_t e, bufferlist& bl)
{
  bool found = map_bl_cache.lookup(e, &bl);
  if (found) {
    if (logger)
      logger->inc(l_osd_map_bl_cache_hit);
    return true;
  }
  if (logger)
    logger->inc(l_osd_map_bl_cache_miss);
  found = store->read(coll_t::meta(),
                      OSD::get_osdmap_pobject_name(e), 0, 0, bl,
                      CEPH_OSD_OP_FLAG_FADVISE_WILLNEED) >= 0;
  if (found) {
    _add_map_bl(e, bl);
  }
  return found;
}
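_get_map_bl follows a read-through cache pattern: try the in-memory cache, count the hit or miss, fall back to the backing store on a miss, and populate the cache with what was read. The sketch below shows that pattern with standard containers only. `MapBlobCache` is a hypothetical class, a `std::string` stands in for the bufferlist, one `std::map` stands in for the on-disk store, and another unbounded `std::map` stands in for map_bl_cache, whose real eviction policy is not modelled here.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

using epoch_t = std::uint32_t;  // assumption: epochs are 32-bit unsigned

// Hypothetical read-through cache of encoded maps keyed by epoch.
class MapBlobCache {
 public:
  explicit MapBlobCache(std::map<epoch_t, std::string> store)
      : store_(std::move(store)) {}

  // Returns true and fills `out` if the epoch is available, either from the
  // cache or from the backing store; returns false if neither has it.
  bool get(epoch_t e, std::string& out) {
    std::map<epoch_t, std::string>::iterator c = cache_.find(e);
    if (c != cache_.end()) {
      ++hits;
      out = c->second;
      return true;
    }
    ++misses;
    std::map<epoch_t, std::string>::iterator s = store_.find(e);
    if (s == store_.end()) {
      return false;  // nothing is cached for a failed read
    }
    cache_[e] = s->second;  // populate the cache for the next lookup
    out = s->second;
    return true;
  }

  int hits = 0;
  int misses = 0;

 private:
  std::map<epoch_t, std::string> store_;  // stands in for the on-disk store
  std::map<epoch_t, std::string> cache_;  // stands in for map_bl_cache
};
```

The first lookup of an epoch counts as a miss and reads the store; a second lookup of the same epoch is served from the cache and counts as a hit. A lookup for an epoch the store does not hold counts as a miss and leaves the cache unchanged.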