ceph rados_write: flow and source-code analysis


Reference: In-depth understanding of ceph CRUSH (3), source analysis of the Object-to-PG mapping: https://www.dovefi.com/post/%E6%B7%B1%E5%85%A5%E7%90%86%E8%A7%A3crush3object%E8%87%B3pg%E6%98%A0%E5%B0%84%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90/
I/O path: cd /home/xb/project/stor/ceph/xb/docker/ceph/test/librados_example; ./gdb_rados_write.sh
rados_write.c:83 -> main
  rados_create2(&cluster, cluster_name, user_name, flags) -> extern "C" int _rados_create2 -> extended version of rados_create: like rados_create, but does not assume "client." + id; the full name can be specified, and a cluster name and flags are accepted for future extension
    CephInitParameters iparams(CEPH_ENTITY_TYPE_CLIENT) -> CephInitParameters::CephInitParameters -> name.set(module_type, "admin") -> librados, rados.py: add rados_create2/init2; librados clients, notably the ceph tool, need to be able to specify a full "name"; rados_create forces "client.<param>" with no workaround, hence the new interface. Python's Rados().__init__ picks the appropriate create function depending on whether a name or an id is given
      name.set(module_type, "admin") -> module_type=CEPH_ENTITY_TYPE_CLIENT(0x8)
    rados_create_cct -> static CephContext *rados_create_cct -> create the context
      common_preinit CODE_ENVIRONMENT_LIBRARY -> CephContext *common_preinit -> pre-initialize the common bits. Note: if you are writing a Ceph daemon, ignore this function and call global_init instead; it calls common_preinit for you. common_preinit creates the CephContext. Once this function hands you a CephContext, you still need to fill in the Ceph configuration, which lives inside the CephContext as an md_config_t. The initial settings are not very useful because they do not reflect what the user asked for; this is usually done with cct->_conf.parse_env(); cct->_conf.apply_changes(); your library may also provide functions that read a config file
        annotate a benign race; helgrind: annotate false-positive race conditions
        new CephContext -> class CephContext -> instantiate the ceph context
          _log = new ceph::logging::Log(&_conf->subsys) -> instantiate the log
          _log_obs = new LogObs(_log) -> observe logging config changes; the logging subsystem sits below most of the ceph code (including the config subsystem) to keep it simple and self-contained, so logging-related config changes are fed into the log here
          _conf.add_observer(_log_obs) -> common,rbd,rgw,osd: extract config values into ConfigValues. This change introduces three classes: ConfigValues, ConfigProxy and ConfigReader. In the seastar port of the OSD, each CPU shard holds its own reference to the config, and when a setting changes every shard is updated asynchronously with the new settings, which forces us to keep two sets of config around at the same time. So the changeable part of md_config_t is extracted, letting us replace the old values with new ones as needed while different shards share the unchanged part, along with other things like the option map and lookup tables; that is why ConfigValues is needed. A policy template is added to the class so it can be specialized for the seastar implementation, allowing different ConfigProxy instances to point md_config_impl<> at different ConfigValues. Since the observer interface still uses md_config_t, handle_conf_change()/handle_subsys_change() are left unchanged to minimize the impact of this change. But because it takes a `const md_config_t`, it cannot be used to create/reference the ConfigProxy holding it, so ConfigReader is introduced to read updated settings from md_config_t in a simpler way without exposing the internal "values" member
            obs_mgr.add_observer(obs) -> we could put the implementation in a .cc file and only explicitly instantiate the template specializations that are used, but that would drag unused headers and libraries into the build. For instance, to instantiate seastar's ObserverMgr we would have to include seastar headers to get the necessary types, which would force non-seastar binaries to link against the seastar library. So, to avoid introducing unused dependencies at the cost of longer compile times, the implementation lives in the header
              const char **keys = observer->get_tracked_conf_keys() -> fetch all conf keys tracked by the log -> "log_coarse_timestamps" ...
              observers.emplace(*k, observer) -> register the log keys with the observer map
            obs_call_gate.emplace(obs, std::make_unique<CallGate>()) -> call-gate mechanism -> config: drop config_proxy::lock when invoking config observers, to prevent a deadlock when an observer takes its own lock (lock order: config_proxy::lock -> foo::lock) while another thread, say the I/O path, tries to read a config value (lock order: foo::lock -> config_proxy::lock). A side effect of releasing the lock while observers run is that remove_observer() can sneak in while an observer is still executing, causing a use-after-free. To mitigate this, any in-flight observer calls must finish before the observer is removed; also, remove_observer() must be called without holding any observer lock, lest it deadlock
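The call gate described in that commit can be sketched as a tiny standalone class (my simplified reconstruction, not ceph's actual CallGate): observers "enter" before a callback runs and "leave" afterwards, and removal waits until no call is in flight.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Minimal call gate: close() blocks until every in-flight observer call
// has left, so an observer can be removed without racing a running callback.
class CallGate {
  std::mutex m;
  std::condition_variable cv;
  int in_flight = 0;
  bool closed = false;
public:
  bool enter() {                      // returns false once the gate is closed
    std::lock_guard<std::mutex> l(m);
    if (closed) return false;
    ++in_flight;
    return true;
  }
  void leave() {
    std::lock_guard<std::mutex> l(m);
    --in_flight;
    cv.notify_all();
  }
  void close() {                      // drain: wait for in-flight calls to finish
    std::unique_lock<std::mutex> l(m);
    closed = true;
    cv.wait(l, [this] { return in_flight == 0; });
  }
};
```

The key point mirrored from the commit message: close() is the step that must complete before the observer object is freed, and it must be called without holding any observer lock.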
          _cct_obs = new CephContextObs(this) -> enable_experimental_unrecoverable_data_corrupting_features | crush_location | container_image
          _lockdep_obs = new LockdepObs(this) -> lockdep
          _perf_counters_collection = new PerfCountersCollection(this) -> instantiate the perf-counters collection
          _admin_socket = new AdminSocket(this) -> instantiate the asok (admin socket)
          _heartbeat_map = new HeartbeatMap(this) -> heartbeat map
          _plugin_registry = new PluginRegistry(this) -> plugin registry (e.g. EC plugins?)
          _admin_hook = new CephContextHook(this) -> asok: instantiate the admin-socket hook
          _admin_socket->register_command("assert", _admin_hook, "") -> int AdminSocket::register_command
            cmddesc_get_prefix(cmddesc)
            hooks.find(prefix)
            hooks.emplace_hint -> emplace()/emplace_hint() were added to the associative-container class templates in C++11; compared with insert(), which accomplishes the same task, emplace()/emplace_hint() can be more efficient
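A minimal generic illustration of the hint optimization (plain C++, not ceph code; `register_sorted` is a made-up helper):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// When keys arrive in sorted order (as a command table often does), passing
// end() as the hint lets the tree append each node in amortized O(1) instead
// of searching from the root the way insert()/emplace() would.
inline std::map<std::string, int>
register_sorted(const std::vector<std::pair<std::string, int>>& cmds) {
  std::map<std::string, int> hooks;
  for (const auto& [name, id] : cmds)
    hooks.emplace_hint(hooks.end(), name, id);  // hint = "goes at the end"
  return hooks;
}
```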
          ...
          _crypto_none = CryptoHandler::create(CEPH_CRYPTO_NONE)
          _crypto_aes = CryptoHandler::create(CEPH_CRYPTO_AES)
          _crypto_random.reset(new CryptoRandom())
          lookup_or_create_singleton_object<MempoolObs>("mempool_obs", false, this) -> instantiate the mempool observer
        conf.set_val_default("log_to_stderr", "false")
      parse_env
      apply_changes -> void apply_changes(std::ostream* oss) -> expand all meta variables; do any pending observer callbacks
        _gather_changes -> the "call_gate_leave" method accesses the "obs_call_gate" map without holding the required lock; that data structure can be manipulated by another thread in the context of an observer callback
          map_observer_changes
        call_observers
          obs->handle_conf_change
          call_gate_leave
      TracepointProvider::initialize<tracepoint_traits>(cct) -> initialize the lttng tracepoints
    new librados::RadosClient -> librados::RadosClient::RadosClient
      add_observer -> void add_observer(md_config_obs_t* obs)
        obs_mgr.add_observer(obs) -> (same rationale as above: the implementation lives in the header so that explicit template instantiation does not pull unused dependencies such as the seastar library into non-seastar binaries)
          observers.emplace -> rados_mon_op_timeout
        obs_call_gate.emplace -> emplace
    cct->put() -> void CephContext::put() -> drop a reference
      if (--nref == 0)
  rados_conf_read_file(cluster, "/home/xb/project/ceph/xb/ceph/build/ceph.conf") -> extern "C" int _rados_conf_read_file
    librados::RadosClient *client = (librados::RadosClient *)cluster -> cast the cluster handle to a RadosClient
    conf.parse_config_files -> int md_config_t::parse_config_files
      const char *c = getenv("CEPH_CONF") -> if no config file was specified, fall back to the environment variable
      void md_config_t::early_expand_meta
        _expand_meta -> Option::value_t md_config_t::_expand_meta
          ...
        conf_stringify -> to_str -> stringify(v)
      cf.parse_file
      _get_my_sections
      ...
    parse_env
    apply_changes
    complain_about_parse_error
  rados_conf_parse_argv -> extern "C" int _rados_conf_parse_argv
    argv_to_vec
    parse_argv -> int md_config_t::parse_argv -> Ceph command-line argument handling: read config values from the command-line arguments, https://runsisi.com/2019/02/23/ceph-opt/, configurable options: https://zhuanlan.zhihu.com/p/110079635
    conf.apply_changes
  rados_connect -> extern "C" int _rados_connect(rados_t cluster) -> connect to the cluster, ref: https://www.jianshu.com/p/58956728dadc
    client->connect() -> int librados::RadosClient::connect()
      state = CONNECTING -> connection state machine
      cct->_log->start() -> start the log thread
        create("log") -> void Thread::create
      MonClient mc_bootstrap(cct) -> MonClient::MonClient(CephContext *cct_) -> instantiate the monitor client (bootstrap monc)
        want_monmap(true)
      MonClient::get_monmap_and_config -> mon/MonClient: one-shot mon connection at startup to fetch the monmap and config. Not particularly efficient, but it works: connect to a monitor to get the monmap and config, tear it all down, and proceed with normal startup (which may involve reconnecting to the mons again). This lets us apply config options that affect mon communication itself, such as ms_type -> sets up the messenger and starts 3 msgr-worker threads, ref: https://blog.csdn.net/DeamonXiao/article/details/120879244
        init_crypto ... -> static void init() 
        build_initial_monmap
          int MonMap::build_initial
            init_with_ips -> ceph.conf -> global -> mon host -> [v2:172.17.0.2:40287,v1:172.17.0.2:40288] [v2:172.17.0.2:40289,v1:172.17.0.2:40290] [v2:172.17.0.2:40291,v1:172.17.0.2:40292]
              init_with_addrs ->
                _add_ambiguous_addr -> no; an ambiguous mon addr may be legacy or may be msgr2; we are not sure which when this happens, so we need to try both (unless we can reasonably infer it from the port number)
                add(name, addr, 0) -> void add(const std::string &name -> void add(const mon_info_t& m) -> add a monitor to the monmap, with priority and weight
                  mon_info[m.name] = m -> noname-a:m
                  calc_legacy_ranks()
                  calc_addr_mons()
                    addr_mons
            calc_legacy_ranks -> mon/MonMap: separate rank order from entity_addr_t. We currently define mon rank order by the sort order of the mon addresses; change it so the rank order is explicitly encoded in the MonMap's ranks field. If we load a legacy MonMap, calculate the legacy ordering. If the monmap does not yet require the nautilus feature, force the legacy ordering. Once all mons are >= nautilus we can reorder the ranks. Note that daemons and clients (MonClients) may see different rank orders; that should be fine
            calc_legacy_ranks -> compute the legacy ranks
            monmap.print(*_dout) -> print the monmap
        messenger = Messenger::create_client_messenger temp_mon_client -> create the temporary mon-client messenger object and start 3 worker threads; the messenger is a MonClient member
        add_dispatcher_head -> starts 2 threads
        messenger->start() -> int AsyncMessenger::start()
        make_scope_guard
        init() -> int MonClient::init() -> MonClient initialization
          refresh_config
          new RotatingKeyRing
          set_auth_client
          add_dispatcher_head
          timer.init() -> start the MonClient timer thread
            thread = new SafeTimerThread(this)
            thread->create("safe_timer") -> void SafeTimer::timer_thread() -> the SafeTimer class in ceph: https://blog.csdn.net/turou3442/article/details/96441221, https://blog.csdn.net/tiantao2012/article/details/78426276?ydreferer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8%3D; since SafeTimerThread is a Thread subclass, its entry() implementation shows the thread body is timer_thread, void *entry() override
          finisher.start() -> finisher_thread_entry -> finisher_empty_cond.notify_all() -> wake the condition variable and run the callbacks -> the Finisher class in ceph, https://blog.csdn.net/tiantao2012/article/details/79419556?ydreferer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8%3D; a class that wants a completion run adds itself to the finisher's queue, and the finisher thread then calls each queued item's complete function
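The finisher pattern just described (a queue of completions drained by a dedicated thread) can be sketched roughly like this (a simplified stand-in, not ceph's actual Finisher):

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// Completions are queued by the requesting code; the finisher thread pops
// each one and invokes it, mirroring Finisher::queue()/finisher_thread_entry().
class MiniFinisher {
  std::mutex m;
  std::condition_variable cv;
  std::deque<std::function<void(int)>> q;
  bool stopping = false;
  std::thread t;
  void entry() {
    std::unique_lock<std::mutex> l(m);
    while (true) {
      cv.wait(l, [this] { return stopping || !q.empty(); });
      if (q.empty() && stopping) return;   // drained and asked to stop
      auto c = std::move(q.front());
      q.pop_front();
      l.unlock();
      c(0);                                // run the completion outside the lock
      l.lock();
    }
  }
public:
  void start() { t = std::thread([this] { entry(); }); }
  void queue(std::function<void(int)> c) {
    std::lock_guard<std::mutex> l(m);
    q.push_back(std::move(c));
    cv.notify_all();
  }
  void stop() {                            // drain the queue, then join
    { std::lock_guard<std::mutex> l(m); stopping = true; cv.notify_all(); }
    t.join();
  }
};
```

Running the callback with the lock dropped is the same discipline ceph uses: a completion may itself queue more work or take other locks.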
          schedule_tick() -> void MonClient::schedule_tick() -> periodic scan-and-send: arms a timer whose callback runs (scanning requests) and then arms a fresh timer at the end, so the scan repeats periodically -> void MonClient::tick()
            auto do_tick = make_lambda_context([this](int) { tick(); })
            timer.add_event_after(hunt_interval, do_tick)
        map_cond.wait_for <- wait for the monitor to push the config to monc; monc handles the message and signals (MonClient::handle_config), after which the mon-provided startup config is applied
        cct->_conf.set_mon_vals(cct, config->config, config_cb) -> int set_mon_vals(CephContext *cct
          config.set_mon_vals(cct, values, obs_mgr, kv, config_cb)
          _gather_changes(values.changed, &rev_obs, nullptr)
          call_observers(locker, rev_obs)
      common_init_finish(cct)
        cct->start_service_thread
          _enable_perf_counter
          call_all_observers -> invoke all registered observers
          _admin_socket->init -> bool AdminSocket::init -> initialize the asok -> the thread that services ceph daemon commands; together with the service thread it provides performance monitoring: the service thread updates each module's perf counters, while the admin_socket thread exposes the external query interface
            create_wakeup_pipe 
            bind_and_listen
      monclient.build_initial_monmap() -> fetch the monmap a second time
      Messenger::create_client_messenger(cct, "radosclient")
      objecter = new (std::nothrow) Objecter(cct, messenger, &monclient, &finisher) -> Objecter::Objecter -> constructor
        ...
        osdmap{std::make_unique<OSDMap>()}
        homeless_session(new OSDSession(cct, -1))
        mon_timeout = cct->_conf.get_val<std::chrono::seconds>("rados_mon_op_timeout")
        osd_timeout = cct->_conf.get_val<std::chrono::seconds>("rados_osd_op_timeout")
        ...
      objecter->set_balanced_budget()
      monclient.set_messenger(messenger) -> bind the messenger to monc
      mgrclient.set_messenger(messenger)
      objecter->init() -> void Objecter::init() -> ... -> create_rados_client -> initialize the objecter, perf counters, etc.
        PerfCountersBuilder pcb(cct, "objecter", l_osdc_first, l_osdc_last) -> perf counters
        RequestStateHook
        AdminSocket* admin_socket = cct->get_admin_socket()
        admin_socket->register_command("objecter_requests"
        update_crush_location
        add_observer
          obs_mgr.add_observer(obs)
            get_tracked_conf_keys -> crush_location -> CRUSH algorithm (crush map): https://docs.ceph.com/en/mimic/rados/operations/crush-map/
        initialized = true
      messenger->add_dispatcher_head(&mgrclient)
      messenger->add_dispatcher_tail(objecter)
      messenger->start()
      monclient.set_want_keys -> monc subscribes to the monitor's OSDMap and MonMap feeds: CEPH_ENTITY_TYPE_MON | CEPH_ENTITY_TYPE_OSD | CEPH_ENTITY_TYPE_MGR
      monclient.init() -> formal (non-temporary) monc initialization
      monclient.authenticate -> monc authentication
      monclient.sub_want("mgrmap", 0, 0) -> subscribe to mgrmap updates, mgr subscription model: https://blog.csdn.net/tiantao2012/article/details/80109739 -> switch (m->get_type()) -> bool Objecter::ms_dispatch(Message *m) -> case MSG_MGR_MAP -> handle_mgr_map
        return sub.want(what, start, flags) -> bool MonSub::want(
      monclient.renew_subs() -> void renew_subs() -> void MonClient::_renew_subs
        _reopen_session
        _send_mon_message(std::move(m)) -> monc -> monitor
          ...
        sub.renewed()
      mgrclient.init()
      objecter->set_client_incarnation -> client incarnation; subscribe to osdmap updates from the monitor and fetch the osdmap
      objecter->start()  -> void Objecter::start
        start_tick -> _send_linger_ping -> heartbeat
        Objecter::tick()
          osd_sessions.begin
          timer.reschedule_me
      timer.init()
      finisher.start()
      instance_id = monclient.get_global_id()
  rados_ioctx_create(cluster, poolname, &io) -> extern "C" int _rados_ioctx_create -> create an I/O context from a pool and the cluster; an ioctx can read/write data and attributes, snapshot the pool, read snapshots, and so on
    create_ioctx -> int librados::RadosClient::create_ioctx(
      lookup_pool -> int64_t librados::RadosClient::lookup_pool -> lookup_pg_pool_name -> look up the pool ID in the osdmap?
        wait_for_osdmap
          with_osdmap
      *io = new librados::IoCtxImpl(this, objecter, poolid, CEPH_NOSNAP) -> librados::IoCtxImpl::IoCtxImpl -> instantiate the io context with the pool id, objecter and snapshot (CEPH_NOSNAP)
    *io = ctx -> io is now a context carrying the pool and cluster state
  rados_write synchronous write -> CEPH_RADOS_API int rados_write -> LIBRADOS_C_API_BASE_DEFAULT(rados_write) -> extern "C" int _rados_write -> bl.append(buf, len) append the I/O data (onto the tail of _buffers) -> ctx->write(oid, bl, len, off) -> int librados::IoCtxImpl::write
    ::ObjectOperation op -> build the OP
    prepare_assert_ops(&op) -> add any version-assert operations appropriate for the stats in the given IoCtx, whether a target version assert or any src object asserts. These affect a single ioctx operation, so clear the ioctx state while we perform the op. Returns a pointer to the ObjectOperation if we added any events; this is handy for passing the extra_ops argument into Objecter methods
    mybl.substr_of(bl, 0, len) -> declare mybl first, then slice out a sub-range of bl
    op.write(off, mybl) -> void write(uint64_t off, ceph::buffer::list& bl) -> write(off, bl, 0 /* truncate_size */, 0 /* truncate_seq */)
      add_data(CEPH_OSD_OP_WRITE, off, bl.length(), bl) -> wrap the OP and add the opcode -> void add_data(int op, uint64_t off, uint64_t len, ceph::buffer::list& bl)
        OSDOp& osd_op = add_op(op) -> add the opcode
          ops[s].op.op = op
          ...
        osd_op.indata.claim_append(bl) -> move bl's data onto the tail of _buffers (indata), then clear bl -> osd/osd_types: add per-op return fields to record [dup] entries, allowing an overall positive return value plus per-op return values and output data for each op in the request
      OSDOp& o = *ops.rbegin() -> reverse iteration: rbegin returns a reverse iterator to the reverse beginning, i.e. the last element of the vector; reverse iterators iterate backwards, so incrementing one moves it toward the start of the container. Note that unlike vector::back, which returns a reference to that same element, this returns a reverse random-access iterator
      o.op.extent.truncate_size = truncate_size -> the write op supports a truncate parameter
    return operate(oid, &op, NULL) -> int librados::IoCtxImpl::operate -> pool io-context operation
      Context *oncommit = new C_SafeCond(mylock, cond, &done, &r) -> completion notification: finish -> cond.notify_all (a callback)
      Objecter::Op *objecter_op = objecter->prepare_mutate_op(oid, oloc, *o, snapc, ut, flags, oncommit, &ver) -> prepare the object OP (wrap the ObjectOperation into an Op type) -> Op *prepare_mutate_op
        Op *o = new Op(oid, oloc, op.ops, flags | global_op_flags | CEPH_OSD_FLAG_WRITE, oncommit, objver, nullptr, parent_trace) -> Op(const object_t& o, const object_locator_t& ol -> construct the object OP and set the callback: oncommit -> onfinish
          ops.swap(op) -> vector swap: exchanges the containers' contents
          ...
          trace.init("op", nullptr, parent_trace) -> initialize tracing; adds blkin tracing for librados and the objecter
      objecter->op_submit(objecter_op) -> submit the OP -> void Objecter::op_submit -> op->trace.event("op submit") insert a tracepoint -> _op_submit_with_budget(op, rl, ptid, ctx_budget) with budget -> void Objecter::_op_submit_with_budget
        int op_budget = _take_op_budget(op, sul) -> throttling: subtract this Op's budget -> _op_submit(op, sul, ptid) -> void Objecter::_op_submit
          bool check_for_latest_map = _calc_target(&op->target, nullptr) -> RECALC_OP_TARGET_POOL_DNE -> compute the target node (CRUSH) -> false -> target structure, see: (gdb) p op->target -> key function: int Objecter::_calc_target(op_target_t *t, Connection *con, bool any_change) -> CRUSH algorithm
            decide read vs write from the flags, get the epoch (85), print basic info; determine the pg pool from the pool that the target's base object locator points at (t->base_oloc.pool); the osdmap carries the pools info; no tiering by default
            get_epoch
            get_pg_pool -> object locator structure: the locator constrains the object's placement, chiefly which pool it goes into (here, 3)
            if ((t->flags & CEPH_OSD_FLAG_IGNORE_OVERLAY) == 0) -> osdc/Objecter: recalculate target_* on every _calc_target call. Any time we are asked to calculate a target, we should apply the pool tiering parameters. The previous logic of only doing so when the target had not been calculated yet made little sense, and broke the *pi update we need to get the correct pg_num for the target pool. For old clusters using raw pgs this did not matter, but for luminous and beyond we need a precise spg_t, which requires the correct pg_num. If the flag (CEPH_OSD_FLAG_IGNORE_OVERLAY) is set, the op is sent to the specified pool and the overlay is ignored; note this obsoletes the global Objecter flag
            pi = osdmap->get_pg_pool(t->target_oloc.pool)
            int ret = osdmap->object_locator_to_pg(t->target_oid, t->target_oloc, pgid) -> map the object (oid) to a placement group -> int OSDMap::object_locator_to_pg( -> after this we only have the object's hash value: a pgid made of pool_id + pool_seed
              return map_to_pg(loc.get_pool(), oid.name, loc.key, loc.nspace, &pg)
                const pg_pool_t *pool = get_pg_pool(poolid)
                pool->hash_key(key, nspace) -> uint32_t pg_pool_t::hash_key
                  return ceph_str_hash(object_hash, &buf[0], len)
                    return ceph_str_hash_linux(s, len) -> unsigned ceph_str_hash_linux(const char *str, unsigned length) -> the hash algorithm
            ps_t actual_ps = ceph_stable_mod(pgid.ps(), pg_num, pg_num_mask) -> stable_mod -> ceph_stable_mod(int x, int b, int bmask) -> stable modulo function, used to control the number of placement groups (pg_num). Similar to a straight modulo, but produces a stable mapping as b increases over time. b is the number of bins; bmask is the containing power of two minus 1, with b <= bmask and bmask=(2**n)-1, e.g. b=12 -> bmask=15, b=123 -> bmask=127. The pool seed (x) produced by the hash is the input; continuing the computation yields the final, actual ps
            pg_t actual_pgid(actual_ps, pgid.pool())
            lookup_pg_mapping
            osdmap->pg_to_up_acting_osds
            update_pg_mapping(actual_pgid, std::move(pg_mapping))
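Pulling the steps above together (name hash -> stable mod -> placement seed), here is a self-contained sketch of the mapping. This is my reimplementation for illustration: `str_hash_linux` follows the classic Linux dcache hash formula that ceph_str_hash_linux is based on, and `stable_mod` mirrors ceph_stable_mod's published trick; it is not the ceph source itself.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Linux dcache-style string hash (the formula ceph_str_hash_linux derives from).
inline uint32_t str_hash_linux(const std::string& s) {
  unsigned long hash = 0;
  for (unsigned char c : s)
    hash = (hash + (c << 4) + (c >> 4)) * 11;
  return static_cast<uint32_t>(hash);
}

// Like x % b, but stable as b (pg_num) grows over time.
// bmask is the containing power of two minus 1.
inline int stable_mod(int x, int b, int bmask) {
  if ((x & bmask) < b)
    return x & bmask;          // already within [0, b)
  return x & (bmask >> 1);     // fold into the lower half
}

// oid -> placement seed, mirroring OSDMap::object_locator_to_pg followed
// by the ceph_stable_mod call in _calc_target.
inline int object_to_ps(const std::string& oid, int pg_num, int pg_num_mask) {
  return stable_mod(static_cast<int>(str_hash_linux(oid)), pg_num, pg_num_mask);
}
```

The stable-mod fold is what lets a pool's pg_num grow without remapping every object: seeds below the old bin count stay put, and only the folded half moves.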
        _get_session(op->target.osd, &s, sul) -> get a session -> int Objecter::_get_session
          if (osd < 0) -> osd invalid
            *session = homeless_session -> use the homeless session (do not send yet)
          map<int,OSDSession*>::iterator p = osd_sessions.find(osd)
          if (p != osd_sessions.end())
            *session = s -> if a session is found in the session (connection) pool, return it directly
          OSDSession *s = new OSDSession(cct, osd) -> otherwise create a new OSD session
          s->con = messenger->connect_to_osd(osdmap->get_addrs(osd)) -> connect to the target OSD
          *session = s -> return that session
        if (orig_epoch != osdmap->get_epoch()) -> compare epochs (terms)
        _send_op_account(op) -> account for this operation
          if (op->onfinish) -> if there is a completion callback, track it via the in-flight counter
            num_in_flight++
          op->target.flags & CEPH_OSD_FLAG_WRITE
            logger->inc(l_osdc_op_w) -> bump the write counter (perf counter) -> parent metric: PerfCountersBuilder pcb(cct, "objecter", l_osdc_first, l_osdc_last) -> pcb.add_u64_counter(l_osdc_op_w, "op_w", "Write operations", "wr", PerfCountersBuilder::PRIO_CRITICAL)
          case CEPH_OSD_OP_WRITE: code = l_osdc_osdop_write -> pick the code; all ceph osd opcodes: __CEPH_FORALL_OSD_OPS
          logger->inc(code) -> bump the stats counter
        if (op->target.paused) -> the target osd is paused (unavailable)
          _maybe_request_map() -> refresh the map
            monc->sub_want("osdmap"
            monc->renew_subs()
        if (!s->is_homeless()) -> not homeless (osd is not -1)
          need_send = true -> not homeless, so we need to send
        _session_op_assign(s, op)
          get_session(to) -> s->get()
        if (need_send)
          _send_op(op) -> needs sending; send the OP -> void Objecter::_send_op(Op *op)
            backoff ? -> backoff, client protocol: https://docs.ceph.com/en/latest/dev/rados-client-protocol/
            MOSDOp *m = _prepare_osd_op(op) -> convert the object op into an MOSDOp; later processing operates on the MOSDOp. MOSDOp wraps the basics of a request; its ops vector packs multiple OSDOp operations, each of which carries an soid. Everything an MOSDOp wraps concerns the same oid, i.e. one MOSDOp only wraps operations against a single oid. But for operations like rados_clone_range there is a dest oid and a src oid; the src oid is kept in the OSDOp's soid
              hobject_t hobj = op->target.get_hobj()
              MOSDOp *m = new MOSDOp
              m->set_snapid(op->snapid) -> set the snap id, snap seq, snaps, etc.
              m->set_retry_attempt(op->attempts++) -> set the retry attempt
              m->set_priority(cct->_conf->osd_client_op_priority) -> set the op priority
              logger->inc(l_osdc_op_send)
              logger->inc(l_osdc_op_send_bytes, sum) -> bump the stats counters
            op->session->con->send_message(m) -> send the message over the op's session connection -> int AsyncConnection::send_message(Message *m)
              is_blackhole -> blackhole; reduces duplicated code
              protocol->send_message(m) -> send via the protocol layer -> void ProtocolV2::send_message
                const bool can_fast_prepare = messenger->ms_can_fast_dispatch(m)
                out_queue[m->get_priority()].emplace_back -> enqueue on the out_queue send queue
                ...
      cond.wait(l, [&done] { return done;}) -> wait for the OP to complete: handle_osd_op_reply -> cond.notify_all()
      set_sync_op_version(ver)


Callback / reply / response path:
Objecter::ms_dispatch -> case CEPH_MSG_OSD_OPREPLY -> void Objecter::handle_osd_op_reply
  op->trace.event("osd op reply")
  m->get_result()
  if (op->onfinish)
    onfinish = op->onfinish
  _finish_op(op, 0) -> void Objecter::_finish_op
    put_op_budget_bytes(op->budget)
    _session_op_remove(op->session, op)
    op->put()
  if (onfinish) 
    onfinish->complete(rc) -> finish(r) -> void finish(int r) override
      *done = true
      cond.notify_all() -> wake the threads waiting on this condition variable -> wakes cond.wait(l, [&done] { return done;})
  m->put()
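The synchronous-write rendezvous between operate() and handle_osd_op_reply() boils down to the classic predicate-protected condition-variable handshake. A minimal standalone sketch (a stand-in for C_SafeCond, not the actual ceph class):

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Waiter side (IoCtxImpl::operate) blocks until the completion fires;
// completer side (handle_osd_op_reply -> finish) stores the result,
// flips `done` and wakes the waiter.
struct SafeCond {
  std::mutex m;
  std::condition_variable cv;
  bool done = false;
  int r = 0;
  void finish(int result) {
    std::lock_guard<std::mutex> l(m);
    r = result;
    done = true;
    cv.notify_all();                       // wakes cv.wait(l, [&]{ return done; })
  }
  int wait() {
    std::unique_lock<std::mutex> l(m);
    cv.wait(l, [this] { return done; });   // predicate guards spurious wakeups
    return r;
  }
};
```

The predicate form of wait() is what makes the handshake safe even if notify_all() runs before the waiter reaches wait(): the flag is re-checked under the mutex.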





handle_osd_op_reply stack:
#0  C_SafeCond::finish (this=0x586520, r=0) at /home/xb/project/ceph/xb/ceph/src/common/Cond.h:66
#1  0x00007ffff7bc5c49 in Context::complete (this=0x586520, r=0) at /home/xb/project/ceph/xb/ceph/src/include/Context.h:77
#2  0x00007ffff7c73cf3 in Objecter::handle_osd_op_reply (this=0x4e5490, m=0x7fffd4013c60) at /home/xb/project/ceph/xb/ceph/src/osdc/Objecter.cc:3558
#3  0x00007ffff7c5e6f7 in Objecter::ms_dispatch (this=0x4e5490, m=0x7fffd4013c60) at /home/xb/project/ceph/xb/ceph/src/osdc/Objecter.cc:996
#4  0x00007ffff7c966aa in Objecter::ms_fast_dispatch (this=0x4e5490, m=0x7fffd4013c60) at /home/xb/project/ceph/xb/ceph/src/osdc/Objecter.h:2195
#5  0x00007ffff7c44c3a in Dispatcher::ms_fast_dispatch2 (this=0x4e5498, m=...) at /home/xb/project/ceph/xb/ceph/src/msg/Dispatcher.h:84
#6  0x00007fffee7fd18c in Messenger::ms_fast_dispatch (this=0x4de6a0, m=...) at /home/xb/project/ceph/xb/ceph/src/msg/Messenger.h:676
#7  0x00007fffee7fb13a in DispatchQueue::fast_dispatch (this=0x4de9f8, m=...) at /home/xb/project/ceph/xb/ceph/src/msg/DispatchQueue.cc:72
#8  0x00007fffee942ab3 in DispatchQueue::fast_dispatch (this=0x4de9f8, m=0x7fffd4013c60) at /home/xb/project/ceph/xb/ceph/src/msg/DispatchQueue.h:203
#9  0x00007fffee9a1825 in ProtocolV2::handle_message (this=0x585920) at /home/xb/project/ceph/xb/ceph/src/msg/async/ProtocolV2.cc:1479
#10 0x00007fffee99dc09 in ProtocolV2::handle_read_frame_dispatch (this=0x585920) at /home/xb/project/ceph/xb/ceph/src/msg/async/ProtocolV2.cc:1137
#11 0x00007fffee99fc0b in ProtocolV2::_handle_read_frame_epilogue_main (this=0x585920) at /home/xb/project/ceph/xb/ceph/src/msg/async/ProtocolV2.cc:1325
#12 0x00007fffee99fa44 in ProtocolV2::handle_read_frame_epilogue_main(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int) (this=0x585920, 
    buffer=<unknown type in /home/xb/project/ceph/xb/ceph/build/lib/libceph-common.so.2, CU 0x2845a97, DIE 0x294b353>, r=0) at /home/xb/project/ceph/xb/ceph/src/msg/async/ProtocolV2.cc:1300
#13 0x00007fffee9cc2ba in CtRxNode<ProtocolV2>::call (this=0x585cc8, foo=0x585920) at /home/xb/project/ceph/xb/ceph/src/msg/async/Protocol.h:67
#14 0x00007fffee98f816 in ProtocolV2::run_continuation (this=0x585920, continuation=...) at /home/xb/project/ceph/xb/ceph/src/msg/async/ProtocolV2.cc:47
#15 0x00007fffee998265 in operator() (__closure=0x58c610, buffer=0x7fffd400d970 "\021\002)", r=0) at /home/xb/project/ceph/xb/ceph/src/msg/async/ProtocolV2.cc:755
#16 0x00007fffee9b6d73 in std::__invoke_impl<void, ProtocolV2::read(CONTINUATION_RXBPTR_TYPE<ProtocolV2>&, rx_buffer_t&&)::<lambda(char*, int)>&, char*, long int>(std::__invoke_other, struct {...} &) (__f=...)
    at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/invoke.h:61
#17 0x00007fffee9b694f in std::__invoke_r<void, ProtocolV2::read(CONTINUATION_RXBPTR_TYPE<ProtocolV2>&, rx_buffer_t&&)::<lambda(char*, int)>&, char*, long int>(struct {...} &) (__fn=...) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/invoke.h:111
#18 0x00007fffee9b60cb in std::_Function_handler<void(char*, long int), ProtocolV2::read(CONTINUATION_RXBPTR_TYPE<ProtocolV2>&, rx_buffer_t&&)::<lambda(char*, int)> >::_M_invoke(const std::_Any_data &, <unknown type in /home/xb/project/ceph/xb/ceph/build/lib/libceph-common.so.2, CU 0x2845a97, DIE 0x292f28b>, <unknown type in /home/xb/project/ceph/xb/ceph/build/lib/libceph-common.so.2, CU 0x2845a97, DIE 0x292f29b>) (__functor=..., 
    __args#0=<unknown type in /home/xb/project/ceph/xb/ceph/build/lib/libceph-common.so.2, CU 0x2845a97, DIE 0x292f28b>, __args#1=<unknown type in /home/xb/project/ceph/xb/ceph/build/lib/libceph-common.so.2, CU 0x2845a97, DIE 0x292f29b>)
    at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/std_function.h:290
#19 0x00007fffee94454f in std::function<void (char*, long)>::operator()(char*, long) const (this=0x58c610, __args#0=0x7fffd400d970 "\021\002)", __args#1=0) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/std_function.h:590
#20 0x00007fffee93e40b in AsyncConnection::process (this=0x58c290) at /home/xb/project/ceph/xb/ceph/src/msg/async/AsyncConnection.cc:458
#21 0x00007fffee94345e in C_handle_read::do_request (this=0x4b2b30, fd_or_id=21) at /home/xb/project/ceph/xb/ceph/src/msg/async/AsyncConnection.cc:71
#22 0x00007fffee9d0368 in EventCenter::process_events (this=0x519ab0, timeout_microseconds=30000000, working_dur=0x7fffe3ffd230) at /home/xb/project/ceph/xb/ceph/src/msg/async/Event.cc:406
#23 0x00007fffee9ddc23 in operator() (__closure=0x573b48) at /home/xb/project/ceph/xb/ceph/src/msg/async/Stack.cc:53
#24 0x00007fffee9df78e in std::__invoke_impl<void, NetworkStack::add_thread(unsigned int)::<lambda()>&>(std::__invoke_other, struct {...} &) (__f=...) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/invoke.h:61
#25 0x00007fffee9df675 in std::__invoke_r<void, NetworkStack::add_thread(unsigned int)::<lambda()>&>(struct {...} &) (__fn=...) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/invoke.h:111
#26 0x00007fffee9df55c in std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/std_function.h:290
#27 0x00007fffee9dd3f4 in std::function<void ()>::operator()() const (this=0x573b48) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/std_function.h:590
#28 0x00007fffee9dd3a4 in std::__invoke_impl<void, std::function<void ()>>(std::__invoke_other, std::function<void ()>&&) (__f=<unknown type in /home/xb/project/ceph/xb/ceph/build/lib/libceph-common.so.2, CU 0x2a65ef0, DIE 0x2ae0546>)
    at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/invoke.h:61
#29 0x00007fffee9dd359 in std::__invoke<std::function<void ()>>(std::function<void ()>&&) (__fn=<unknown type in /home/xb/project/ceph/xb/ceph/build/lib/libceph-common.so.2, CU 0x2a65ef0, DIE 0x2ae0af2>)
    at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/invoke.h:96
#30 0x00007fffee9dd306 in std::thread::_Invoker<std::tuple<std::function<void ()> > >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0x573b48) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/std_thread.h:253
#31 0x00007fffee9dd2da in std::thread::_Invoker<std::tuple<std::function<void ()> > >::operator()() (this=0x573b48) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/std_thread.h:260
#32 0x00007fffee9dd2be in std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::function<void ()> > > >::_M_run() (this=0x573b40) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/std_thread.h:211
#33 0x00007fffeef0a7c4 in execute_native_thread_routine () from /home/xb/project/ceph/xb/ceph/build/lib/libceph-common.so.2
#34 0x00007fffec27bea5 in start_thread () from /lib64/libpthread.so.0
#35 0x00007ffff78b5b0d in clone () from /lib64/libc.so.6




rados_write stack:
#0  Objecter::ms_dispatch (this=0x4e5490, m=0x7fffd4013c60) at /home/xb/project/ceph/xb/ceph/src/osdc/Objecter.cc:991
#1  0x00007ffff7c966aa in Objecter::ms_fast_dispatch (this=0x4e5490, m=0x7fffd4013c60) at /home/xb/project/ceph/xb/ceph/src/osdc/Objecter.h:2195
#2  0x00007ffff7c44c3a in Dispatcher::ms_fast_dispatch2 (this=0x4e5498, m=...) at /home/xb/project/ceph/xb/ceph/src/msg/Dispatcher.h:84
#3  0x00007fffee7fd18c in Messenger::ms_fast_dispatch (this=0x4de6a0, m=...) at /home/xb/project/ceph/xb/ceph/src/msg/Messenger.h:676
#4  0x00007fffee7fb13a in DispatchQueue::fast_dispatch (this=0x4de9f8, m=...) at /home/xb/project/ceph/xb/ceph/src/msg/DispatchQueue.cc:72
#5  0x00007fffee942ab3 in DispatchQueue::fast_dispatch (this=0x4de9f8, m=0x7fffd4013c60) at /home/xb/project/ceph/xb/ceph/src/msg/DispatchQueue.h:203
#6  0x00007fffee9a1825 in ProtocolV2::handle_message (this=0x585920) at /home/xb/project/ceph/xb/ceph/src/msg/async/ProtocolV2.cc:1479
#7  0x00007fffee99dc09 in ProtocolV2::handle_read_frame_dispatch (this=0x585920) at /home/xb/project/ceph/xb/ceph/src/msg/async/ProtocolV2.cc:1137
#8  0x00007fffee99fc0b in ProtocolV2::_handle_read_frame_epilogue_main (this=0x585920) at /home/xb/project/ceph/xb/ceph/src/msg/async/ProtocolV2.cc:1325
#9  0x00007fffee99fa44 in ProtocolV2::handle_read_frame_epilogue_main(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int) (this=0x585920, 
    buffer=<unknown type in /home/xb/project/ceph/xb/ceph/build/lib/libceph-common.so.2, CU 0x2845a97, DIE 0x294b353>, r=0) at /home/xb/project/ceph/xb/ceph/src/msg/async/ProtocolV2.cc:1300
#10 0x00007fffee9cc2ba in CtRxNode<ProtocolV2>::call (this=0x585cc8, foo=0x585920) at /home/xb/project/ceph/xb/ceph/src/msg/async/Protocol.h:67
#11 0x00007fffee98f816 in ProtocolV2::run_continuation (this=0x585920, continuation=...) at /home/xb/project/ceph/xb/ceph/src/msg/async/ProtocolV2.cc:47
#12 0x00007fffee998265 in operator() (__closure=0x58c610, buffer=0x7fffd400d970 "\021\002)", r=0) at /home/xb/project/ceph/xb/ceph/src/msg/async/ProtocolV2.cc:755
#13 0x00007fffee9b6d73 in std::__invoke_impl<void, ProtocolV2::read(CONTINUATION_RXBPTR_TYPE<ProtocolV2>&, rx_buffer_t&&)::<lambda(char*, int)>&, char*, long int>(std::__invoke_other, struct {...} &) (__f=...)
    at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/invoke.h:61
#14 0x00007fffee9b694f in std::__invoke_r<void, ProtocolV2::read(CONTINUATION_RXBPTR_TYPE<ProtocolV2>&, rx_buffer_t&&)::<lambda(char*, int)>&, char*, long int>(struct {...} &) (__fn=...) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/invoke.h:111
#15 0x00007fffee9b60cb in std::_Function_handler<void(char*, long int), ProtocolV2::read(CONTINUATION_RXBPTR_TYPE<ProtocolV2>&, rx_buffer_t&&)::<lambda(char*, int)> >::_M_invoke(const std::_Any_data &, <unknown type in /home/xb/project/ceph/xb/ceph/build/lib/libceph-common.so.2, CU 0x2845a97, DIE 0x292f28b>, <unknown type in /home/xb/project/ceph/xb/ceph/build/lib/libceph-common.so.2, CU 0x2845a97, DIE 0x292f29b>) (__functor=..., 
    __args#0=<unknown type in /home/xb/project/ceph/xb/ceph/build/lib/libceph-common.so.2, CU 0x2845a97, DIE 0x292f28b>, __args#1=<unknown type in /home/xb/project/ceph/xb/ceph/build/lib/libceph-common.so.2, CU 0x2845a97, DIE 0x292f29b>)
    at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/std_function.h:290
#16 0x00007fffee94454f in std::function<void (char*, long)>::operator()(char*, long) const (this=0x58c610, __args#0=0x7fffd400d970 "\021\002)", __args#1=0) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/std_function.h:590
#17 0x00007fffee93e40b in AsyncConnection::process (this=0x58c290) at /home/xb/project/ceph/xb/ceph/src/msg/async/AsyncConnection.cc:458
#18 0x00007fffee94345e in C_handle_read::do_request (this=0x4b2b30, fd_or_id=21) at /home/xb/project/ceph/xb/ceph/src/msg/async/AsyncConnection.cc:71
#19 0x00007fffee9d0368 in EventCenter::process_events (this=0x519ab0, timeout_microseconds=30000000, working_dur=0x7fffe3ffd230) at /home/xb/project/ceph/xb/ceph/src/msg/async/Event.cc:406
#20 0x00007fffee9ddc23 in operator() (__closure=0x573b48) at /home/xb/project/ceph/xb/ceph/src/msg/async/Stack.cc:53
#21 0x00007fffee9df78e in std::__invoke_impl<void, NetworkStack::add_thread(unsigned int)::<lambda()>&>(std::__invoke_other, struct {...} &) (__f=...) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/invoke.h:61
#22 0x00007fffee9df675 in std::__invoke_r<void, NetworkStack::add_thread(unsigned int)::<lambda()>&>(struct {...} &) (__fn=...) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/invoke.h:111
#23 0x00007fffee9df55c in std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/std_function.h:290
#24 0x00007fffee9dd3f4 in std::function<void ()>::operator()() const (this=0x573b48) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/std_function.h:590
#25 0x00007fffee9dd3a4 in std::__invoke_impl<void, std::function<void ()>>(std::__invoke_other, std::function<void ()>&&) (__f=<unknown type in /home/xb/project/ceph/xb/ceph/build/lib/libceph-common.so.2, CU 0x2a65ef0, DIE 0x2ae0546>)
    at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/invoke.h:61
#26 0x00007fffee9dd359 in std::__invoke<std::function<void ()>>(std::function<void ()>&&) (__fn=<unknown type in /home/xb/project/ceph/xb/ceph/build/lib/libceph-common.so.2, CU 0x2a65ef0, DIE 0x2ae0af2>)
    at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/invoke.h:96
#27 0x00007fffee9dd306 in std::thread::_Invoker<std::tuple<std::function<void ()> > >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0x573b48) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/std_thread.h:253
#28 0x00007fffee9dd2da in std::thread::_Invoker<std::tuple<std::function<void ()> > >::operator()() (this=0x573b48) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/std_thread.h:260
#29 0x00007fffee9dd2be in std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::function<void ()> > > >::_M_run() (this=0x573b40) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/std_thread.h:211
#30 0x00007fffeef0a7c4 in execute_native_thread_routine () from /home/xb/project/ceph/xb/ceph/build/lib/libceph-common.so.2
#31 0x00007fffec27bea5 in start_thread () from /lib64/libpthread.so.0
#32 0x00007ffff78b5b0d in clone () from /lib64/libc.so.6
(gdb) 




(gdb) p op->target
$9 = {
  flags = 32, 
  epoch = 0, 
  base_oid = {
    name = "neo-obj"
  }, 
  base_oloc = {
    pool = 1, 
    key = "", 
    nspace = "", 
    hash = -1
  }, 
  target_oid = {
    name = ""
  }, 
  target_oloc = {
    pool = -1, 
    key = "", 
    nspace = "", 
    hash = -1
  }, 
  precalc_pgid = false, 
  pool_ever_existed = false, 
  base_pgid = {
    m_pool = 0, 
    m_seed = 0, 
    static calc_name_buf_size = 36 '$'
  }, 
  pgid = {
    m_pool = 0, 
    m_seed = 0, 
    static calc_name_buf_size = 36 '$'
  }, 
  actual_pgid = {
    pgid = {
      m_pool = 0, 
      m_seed = 0, 
      static calc_name_buf_size = 36 '$'
    }, 
    shard = {
      id = -1 '\377', 
      static NO_SHARD = {
        id = -1 '\377', 
        static NO_SHARD = <same as static member of an already seen type>
      }
    }, 
    static calc_name_buf_size = 40 '('
  }, 
  pg_num = 0, 
  pg_num_mask = 0, 
  pg_num_pending = 0, 
  up = std::vector of length 0, capacity 0, 
  acting = std::vector of length 0, capacity 0, 
  up_primary = -1, 
  acting_primary = -1, 
  size = -1, 
  min_size = -1, 
  sort_bitwise = false, 
  recovery_deletes = false, 
  used_replica = false, 
  paused = false, 
  osd = -1, 
  last_force_resend = 0
}

        
