rocksdb写流程DBImpl::WriteImpl()源代码分析


1. WriteThread的状态

写线程WriteThread会有几种状态,在写memtable的过程中通这几种状态来决定当前w当前执行到哪儿了,下一步该怎么执行。

  enum State : uint8_t {
   
    // The initial state of a writer.  This is a Writer that is
    // waiting in JoinBatchGroup.  This state can be left when another
    // thread informs the waiter that it has become a group leader
    // (-> STATE_GROUP_LEADER), when a leader that has chosen to be
    // non-parallel informs a follower that its writes have been committed
    // (-> STATE_COMPLETED), or when a leader that has chosen to perform
    // updates in parallel and needs this Writer to apply its batch (->
    // STATE_PARALLEL_FOLLOWER).
    STATE_INIT = 1,

    // The state used to inform a waiting Writer that it has become the
    // leader, and it should now build a write batch group.  Tricky:
    // this state is not used if newest_writer_ is empty when a writer
    // enqueues itself, because there is no need to wait (or even to
    // create the mutex and condvar used to wait) in that case.  This is
    // a terminal state unless the leader chooses to make this a parallel
    // batch, in which case the last parallel worker to finish will move
    // the leader to STATE_COMPLETED.
    STATE_GROUP_LEADER = 2,

    // The state used to inform a waiting writer that it has become the
    // leader of memtable writer group. The leader will either write
    // memtable for the whole group, or launch a parallel group write
    // to memtable by calling LaunchParallelMemTableWrite.
    STATE_MEMTABLE_WRITER_LEADER = 4,

    // The state used to inform a waiting writer that it has become a
    // parallel memtable writer. It can be the group leader who launch the
    // parallel writer group, or one of the followers. The writer should then
    // apply its batch to the memtable concurrently and call
    // CompleteParallelMemTableWriter.
    STATE_PARALLEL_MEMTABLE_WRITER = 8,

    // A follower whose writes have been applied, or a parallel leader
    // whose followers have all finished their work.  This is a terminal
    // state.
    STATE_COMPLETED = 16,

    // A state indicating that the thread may be waiting using StateMutex()
    // and StateCondVar()
    STATE_LOCKED_WAITING = 32,
  };

2. write_group结构

在这里插入图片描述

3. 写过程WriteImpl()函数源码分析

// The main write queue. This is the only write queue that updates LastSequence.
// When using one write queue, the same sequence also indicates the last
// published sequence.
Status DBImpl::WriteImpl(const WriteOptions& write_options,
                         WriteBatch* my_batch, WriteCallback* callback,
                         uint64_t* log_used, uint64_t log_ref,
                         bool disable_memtable, uint64_t* seq_used,
                         size_t batch_cnt,
                         PreReleaseCallback* pre_release_callback) {
   
  assert(!seq_per_batch_ || batch_cnt != 0);
  if (my_batch == nullptr) {
   
    return Status::Corruption("Batch is nullptr!");
  }
  if (tracer_) {
   
    InstrumentedMutexLock lock(&trace_mutex_);
    if (tracer_) {
   
      tracer_->Write(my_batch);
    }
  }
  if (write_options.sync && write_options.disableWAL) {
   
    return Status::InvalidArgument("Sync writes has to enable WAL.");
  }
  if (two_write_queues_ && immutable_db_options_.enable_pipelined_write) {
   
    return Status::NotSupported(
        "pipelined_writes is not compatible with concurrent prepares");
  }
  if (seq_per_batch_ && immutable_db_options_.enable_pipelined_write) {
   
    // TODO(yiwu): update pipeline write with seq_per_batch and batch_cnt
    return Status::NotSupported(
        "pipelined_writes is not compatible with seq_per_batch");
  }
  if (immutable_db_options_.unordered_write &&
      immutable_db_options_.enable_pipelined_write) {
   
    return Status::NotSupported(
        "pipelined_writes is not compatible with unordered_write");
  }
  // Otherwise IsLatestPersistentState optimization does not make sense
  assert(!WriteBatchInternal::IsLatestPersistentState(my_batch) ||
         disable_memtable);

  Status status;
  IOStatus io_s;
  if (write_options.low_pri) {
   
    status = ThrottleLowPriWritesIfNeeded(write_options, my_batch);
    if (!status.ok()) {
   
      return status;
    }
  }

  /* 
  只写WAL不写memtable
  two_wrote_queues_:
        当设置时,我们使用一个单独的队列来处理不写入memtable的写操作。
        在2PC中,这些是准备阶段的写操作。
  */
  if (two_write_queues_ && disable_memtable) {
   
    AssignOrder assign_order =
        seq_per_batch_ ? kDoAssignOrder : kDontAssignOrder;
    // Otherwise it is WAL-only Prepare batches in WriteCommitted policy and
    // they don't consume sequence.
    return WriteImplWALOnly(&nonmem_write_thread_, write_options, my_batch,
                            callback, log_used, log_ref, seq_used, batch_cnt,
                            pre_release_callback, assign_order,
                            kDontPublishLastSeq, disable_memtable);
  }
  
  // 将unordered_w
  • 1
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 2
    评论
根据Valgrind提供的信息,可以得出以下分析: 这段Valgrind信息表示在程序运行结束时,有24字节的内存块是明确丢失的。这是在294条记录中的第68条记录。 这个内存块的分配是通过`operator new`函数进行的,具体是在`vg_replace_malloc.c`文件的`operator new(unsigned long, std::nothrow_t const&)`函数中进行的。这个函数用于分配内存,并且使用了`std::nothrow_t`参数,表示在分配失败时不抛出异常。 这个内存块的丢失发生在`libstdc++.so.6.0.19`库文件中的`__cxa_thread_atexit`函数中。这个函数是C++标准库中的一个线程退出钩子函数,用于在线程退出时执行清理操作。 进一步跟踪,这个内存块的丢失是在`librocksdb.so.6.20.3`库文件中的`rocksdb::InstrumentedMutex::Lock()`函数中发生的。这个函数是RocksDB数据库引擎的一个锁操作函数,用于获取互斥锁。 在调用堆栈中,可以看到这个内存块丢失是在RocksDB数据库引擎的后台合并线程(Background Compaction)中发生的。具体是在`rocksdb::DBImpl::BackgroundCallCompaction()`和`rocksdb::DBImpl::BGWorkCompaction()`函数中进行的合并操作。 最后,从调用堆栈中可以看到,这个内存块的丢失是在后台线程中发生的。这是在`librocksdb.so.6.20.3`库文件中的`rocksdb::ThreadPoolImpl::Impl::BGThread()`和`rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper()`函数中执行的。 综上所述,根据Valgrind的信息分析,这段代码中存在一个明确的内存泄漏问题,24字节的内存块在后台合并线程中丢失。需要进一步检查代码,确保在合适的时机释放这些内存块,以避免资源泄漏和潜在的问题。
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值