intrusive_ptr in PyTorch

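Preface

boost::intrusive_ptr, like std::unique_ptr and std::shared_ptr, is a smart pointer.

What sets boost::intrusive_ptr apart is that, as the name suggests, it is intrusive: the object it points to must carry its own reference count as a member variable. The object's type must also provide two functions, intrusive_ptr_add_ref and intrusive_ptr_release, which modify that count; boost::intrusive_ptr calls them to increase or decrease the reference count of the object it points to.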

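With circular references, shared_ptr runs into the problem that the objects are never destructed. weak_ptr solves this: because a weak_ptr does not increase the reference count of the object it points to, the cycle no longer keeps the counts above zero and the objects can be freed (the problem is a memory leak, not a deadlock). weak_ptr exists largely to solve this problem, which is why it must be used together with shared_ptr.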
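A minimal sketch of the difference, using only the standard library. The types StrongA, StrongB, WeakA, WeakB and the counter destroyed are hypothetical names made up for this illustration. The first function builds a cycle of two shared_ptr members and observes that no destructor runs; the second replaces one edge with a weak_ptr.

```cpp
#include <cassert>
#include <memory>
#include <utility>

static int destroyed = 0;  // counts destructor calls (illustration only)

struct StrongB;
struct StrongA {
  std::shared_ptr<StrongB> other;  // strong edge A -> B
  ~StrongA() { ++destroyed; }
};
struct StrongB {
  std::shared_ptr<StrongA> other;  // strong edge B -> A completes the cycle
  ~StrongB() { ++destroyed; }
};

struct WeakB;
struct WeakA {
  std::shared_ptr<WeakB> other;    // strong edge A -> B
  ~WeakA() { ++destroyed; }
};
struct WeakB {
  std::weak_ptr<WeakA> other;      // weak edge B -> A does not keep A alive
  ~WeakB() { ++destroyed; }
};

// Returns how many destructors ran when a strong/strong cycle left its scope.
int destroyed_with_strong_cycle() {
  destroyed = 0;
  StrongA* leaked = nullptr;
  {
    auto a = std::make_shared<StrongA>();
    auto b = std::make_shared<StrongB>();
    a->other = b;
    b->other = a;
    leaked = a.get();
  }  // both counts drop from 2 to 1, never to 0
  int seen = destroyed;
  // Break the cycle by hand so that this sketch itself does not leak.
  std::shared_ptr<StrongB> keep = std::move(leaked->other);
  return seen;
}

// Returns how many destructors ran when one edge of the cycle is weak.
int destroyed_with_weak_edge() {
  destroyed = 0;
  {
    auto a = std::make_shared<WeakA>();
    auto b = std::make_shared<WeakB>();
    a->other = b;
    b->other = a;  // a's use_count stays 1
  }
  return destroyed;
}
```

Which edge to make weak is a design decision: the weak side has to tolerate the other object being gone.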

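When several pointers need to point to the same object, constructing two shared_ptr instances independently from the same raw pointer goes wrong. The reference count lives in a control block, and every shared_ptr constructed from a raw pointer allocates a control block of its own. After the first shared_ptr destroys the object, the second one's count is still 1, so it tries to destroy the object again: a double destruction. With intrusive_ptr the reference count is stored in the pointed-to object itself, so an object has exactly one count and double destruction does not occur.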
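The separate counts can be observed without triggering the double destruction. The sketch below is an illustration under one assumption: a no-op deleter (the hypothetical lambda no_delete) is used so that neither shared_ptr actually frees the object.

```cpp
#include <cassert>
#include <memory>

// Two shared_ptr built independently from one raw pointer each believe
// they are the sole owner.
bool independent_owners_do_not_share_a_count() {
  int* raw = new int(42);
  bool separate = false;
  {
    auto no_delete = [](int*) {};  // no-op deleter: avoids the double free
    std::shared_ptr<int> first(raw, no_delete);
    std::shared_ptr<int> second(raw, no_delete);  // second control block
    separate = first.use_count() == 1 && second.use_count() == 1;
  }
  delete raw;  // freed exactly once, by hand
  return separate;
}

// Copying an existing shared_ptr shares its control block instead.
long shared_count_after_copy() {
  auto first = std::make_shared<int>(42);
  std::shared_ptr<int> second = first;
  return first.use_count();
}
```

With the default deleter, the first function would delete the int twice, which is the crash the demo later in this post produces.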

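The drawback of boost::intrusive_ptr is that it has no weak_ptr counterpart, so it is not suited to scenarios with circular references.

PyTorch has a similar c10::intrusive_ptr, whose pointee must inherit from c10::intrusive_ptr_target. As with boost, two functions, incref and decref, manage the reference count. In PyTorch these are free functions in the namespace c10::raw::intrusive_ptr rather than members of the c10::intrusive_ptr class, and they ultimately operate on refcount_, a member variable of c10::intrusive_ptr_target.

This post collects notes from reading the code around c10::intrusive_ptr, looks at how c10::intrusive_ptr is actually used inside PyTorch, and ends with a demo that compares c10::intrusive_ptr with std::shared_ptr and boost::intrusive_ptr.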

c10::intrusive_ptr_target

c10::intrusive_ptr_target is the base class in PyTorch that provides reference counting.

Reference count members

c10/util/intrusive_ptr.h

class C10_API intrusive_ptr_target {
  // Note [Weak references for intrusive refcounting]
  // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  // Here's the scheme:
  //
  //  - refcount == number of strong references to the object
  //    weakcount == number of weak references to the object,
  //      plus one more if refcount > 0
  //    An invariant: refcount > 0  =>  weakcount > 0
  //
  //  - c10::StorageImpl stays live as long as there are any strong
  //    or weak pointers to it (weakcount > 0, since strong
  //    references count as a +1 to weakcount)
  //
  //  - finalizers are called and data_ptr is deallocated when refcount == 0
  //
  //  - Once refcount == 0, it can never again be > 0 (the transition
  //    from > 0 to == 0 is monotonic)
  //
  //  - When you access c10::StorageImpl via a weak pointer, you must
  //    atomically increment the use count, if it is greater than 0.
  //    If it is not, you must report that the storage is dead.
  //
  mutable std::atomic<size_t> refcount_;
  mutable std::atomic<size_t> weakcount_;
};

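We already know that an object pointed to by boost::intrusive_ptr must have a reference count member, hence refcount_ here. But where does weakcount_ come from?

Because boost::intrusive_ptr cannot destruct objects caught in a circular reference, PyTorch's intrusive_ptr is designed to cover the roles of both intrusive_ptr and weak_ptr (the weak side being weak_intrusive_ptr). So besides refcount_ there is also a weakcount_ member.

Note: refcount_ and weakcount_ are private member variables. The mutable specifier means that even if an intrusive_ptr_target object is declared const, these two members can still be modified; they can also be modified from const member functions of intrusive_ptr_target.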
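A small sketch of what mutable buys here. The class Counted is a hypothetical stand-in for intrusive_ptr_target, reduced to a single counter; it is not PyTorch code.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

class Counted {
 public:
  // A const member function may still modify a mutable member.
  std::size_t retain() const {
    return count_.fetch_add(1, std::memory_order_acq_rel) + 1;
  }
  std::size_t count() const { return count_.load(); }

 private:
  mutable std::atomic<std::size_t> count_{0};
};

// Increments the counter twice through a const object.
std::size_t retain_through_const_object() {
  const Counted object{};  // the object itself is const
  object.retain();
  object.retain();
  return object.count();
}
```

Without mutable, the call to fetch_add inside retain() would not compile, and a pointer-to-const could never take a reference to its target.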

constructors

  constexpr intrusive_ptr_target() noexcept : refcount_(0), weakcount_(0) {}

  // intrusive_ptr_target supports copy and move: but refcount and weakcount
  // don't participate (since they are intrinsic properties of the memory
  // location)
  intrusive_ptr_target(intrusive_ptr_target&& /*other*/) noexcept
      : intrusive_ptr_target() {}

  intrusive_ptr_target& operator=(intrusive_ptr_target&& /*other*/) noexcept {
    return *this;
  }

  intrusive_ptr_target(const intrusive_ptr_target& /*other*/) noexcept
      : intrusive_ptr_target() {}

  intrusive_ptr_target& operator=(
      const intrusive_ptr_target& /*other*/) noexcept {
    return *this;
  }
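  • default constructor: sets both refcount_ and weakcount_ to 0
  • copy constructor: intrusive_ptr_target(const intrusive_ptr_target& /*other*/) ignores its argument and delegates to the default constructor, setting both refcount_ and weakcount_ to 0
  • move constructor: intrusive_ptr_target(intrusive_ptr_target&& /*other*/) ignores its argument and delegates to the default constructor, setting both refcount_ and weakcount_ to 0
  • copy and move assignment: both return *this and leave the counts untouched

Note: when an object is copied or moved, the newly created object is treated as distinct from the source object, so its refcount_ and weakcount_ start at zero.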
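The effect can be sketched with a hypothetical miniature (the struct Target and its payload member are invented for illustration; the count is a plain integer rather than an atomic to keep the example short):

```cpp
#include <cassert>
#include <cstddef>

struct Target {
  std::size_t refcount = 0;
  int payload = 0;

  Target() = default;
  explicit Target(int p) : payload(p) {}
  // The payload is copied, the count is not: the copy is a new object.
  Target(const Target& other) : refcount(0), payload(other.payload) {}
  // Assignment changes the payload but leaves this object's count alone.
  Target& operator=(const Target& other) {
    payload = other.payload;
    return *this;
  }
};

bool copy_starts_with_zero_count() {
  Target original(7);
  original.refcount = 3;  // pretend three pointers refer to it
  Target copy(original);
  return copy.payload == 7 && copy.refcount == 0 && original.refcount == 3;
}

bool assignment_keeps_own_count() {
  Target a(1);
  Target b(2);
  a.refcount = 5;  // pretend five pointers refer to a
  a = b;
  return a.payload == 2 && a.refcount == 5;
}
```

The count describes how many pointers refer to one particular memory location, so it would be wrong for it to travel with the value.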

destructor

 protected:
  // protected destructor. We never want to destruct intrusive_ptr_target*
  // directly.
  virtual ~intrusive_ptr_target() {
// Disable -Wterminate and -Wexceptions so we're allowed to use assertions
// (i.e. throw exceptions) in a destructor.
// We also have to disable -Wunknown-warning-option and -Wpragmas, because
// some other compilers don't know about -Wterminate or -Wexceptions and
// will show a warning about unknown warning options otherwise.
#if defined(_MSC_VER) && !defined(__clang__)
#pragma warning(push)
#pragma warning( \
    disable : 4297) // function assumed not to throw an exception but does
#else
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpragmas"
#pragma GCC diagnostic ignored "-Wunknown-warning-option"
#pragma GCC diagnostic ignored "-Wterminate"
#pragma GCC diagnostic ignored "-Wexceptions"
#endif
    TORCH_INTERNAL_ASSERT_DEBUG_ONLY(
        // Second condition is there to accommodate
        // unsafe_adapt_non_heap_allocated: since we are doing our own
        // deallocation in that case, it is correct for each
        // expected_decref to have happened (some user code tried to
        // decref and thus free the object, but it didn't happen right
        // away) or not (no user code tried to free the object, and
        // now it's getting destroyed through whatever mechanism the
        // caller of unsafe_adapt_non_heap_allocated wanted to
        // use). We choose our reference count such that the count
        // will not dip below INT_MAX regardless.
        refcount_.load() == 0 || refcount_.load() >= INT_MAX,
        "Tried to destruct an intrusive_ptr_target that still has intrusive_ptr to it; refcount was ",
        refcount_.load());
    TORCH_INTERNAL_ASSERT_DEBUG_ONLY(
        // See ~intrusive_ptr for optimization that will frequently result in 1
        // at destruction time.
        weakcount_.load() == 1 || weakcount_.load() == 0 ||
            weakcount_.load() == INT_MAX - 1 || weakcount_.load() == INT_MAX,
        "Tried to destruct an intrusive_ptr_target that still has weak_intrusive_ptr to it");
#if defined(_MSC_VER) && !defined(__clang__)
#pragma warning(pop)
#else
#pragma GCC diagnostic pop
#endif
  }

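When the destructor runs, refcount_ must be 0 and weakcount_ must be 1 or 0 (the destructor of c10::intrusive_ptr, covered later, shows why an object may be destroyed while weakcount_ is 1). The alternatives involving INT_MAX in the assertions are there to accommodate unsafe_adapt_non_heap_allocated, as the source comment explains.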

release_resources

class C10_API intrusive_ptr_target {
 private:
  /**
   * This is called when refcount reaches zero.
   * You can override this to release expensive resources.
   * There might still be weak references, so your object might not get
   * destructed yet, but you can assume the object isn't used anymore,
   * i.e. no more calls to methods or accesses to members (we just can't
   * destruct it yet because we need the weakcount accessible).
   *
   * If there are no weak references (i.e. your class is about to be
   * destructed), this function WILL NOT be called.
   */
  virtual void release_resources() {}
};

release_resources releases resources. It is a virtual function whose body is supplied by the subclasses of intrusive_ptr_target. One such subclass is TensorImpl, whose release_resources is shown below.

TensorImpl::release_resources

torch/include/c10/core/TensorImpl.h

Before looking at release_resources, here is a rough view of TensorImpl's member variables; these are the "resources" that release_resources will free:

struct C10_API TensorImpl : public c10::intrusive_ptr_target {
 // ...
 protected:
  Storage storage_;

 private:
  // ...
  std::unique_ptr<c10::AutogradMetaInterface> autograd_meta_ = nullptr;
  // ...

 protected:
  // ...
  impl::PyObjectSlot pyobj_slot_;
  // ...
}

TensorImpl::release_resources is as follows:

c10/core/TensorImpl.cpp

void TensorImpl::release_resources() {
  autograd_meta_.reset();
  if (storage_) {
    storage_ = {};
  }
  pyobj_slot_.destroy_pyobj_if_needed();
}

So release_resources simply clears out these member variables.

friend classes/functions

class C10_API intrusive_ptr_target {
  // ...
  template <typename T, typename NullType>
  friend class intrusive_ptr;
  friend inline void raw::intrusive_ptr::incref(intrusive_ptr_target* self);

  template <typename T, typename NullType>
  friend class weak_intrusive_ptr;
  friend inline void raw::weak_intrusive_ptr::incref(
      intrusive_ptr_target* self);

  template <typename T>
  friend struct ExclusivelyOwnedTensorTraits;
};

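Here c10::raw::intrusive_ptr::incref is declared a friend function, which is why, as shown later, incref can freely access the private member refcount_. The class templates intrusive_ptr and weak_intrusive_ptr are declared friends as well, so their member functions have the same access.

For more on friend functions, see the section "2. Member Function of Another Class as Friend Function" of "Friend Class and Function in C++".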

c10::intrusive_ptr

target_

template <
    class TTarget,
    class NullType = detail::intrusive_target_default_null_type<TTarget>>
class intrusive_ptr final {
 private:
//  the following static assert would be nice to have but it requires
//  the target class T to be fully defined when intrusive_ptr<T> is instantiated
//  this is a problem for classes that contain pointers to themselves
//  static_assert(
//      std::is_base_of<intrusive_ptr_target, TTarget>::value,
//      "intrusive_ptr can only be used for classes that inherit from
//      intrusive_ptr_target.");
#ifndef _WIN32
  // This static_assert triggers on MSVC
  //  error C2131: expression did not evaluate to a constant
  static_assert(
      NullType::singleton() == NullType::singleton(),
      "NullType must have a constexpr singleton() method");
#endif
  static_assert(
      std::is_base_of<
          TTarget,
          typename std::remove_pointer<decltype(NullType::singleton())>::type>::
          value,
      "NullType::singleton() must return a element_type* pointer");

  TTarget* target_;
  // ...

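c10::intrusive_ptr is a class template; target_ has type TTarget*, a pointer to the template parameter TTarget.

target_ is a private member variable of intrusive_ptr: the raw pointer that the intrusive_ptr manages. The commented-out static_assert notes that TTarget must inherit from intrusive_ptr_target.

incref

incref, in the namespace c10::raw::intrusive_ptr, is the function used to increase the reference count of the object that a c10::intrusive_ptr manages.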

incref

namespace c10 {
// ...

namespace raw {

namespace intrusive_ptr {

// WARNING: Unlike the reclaim() API, it is NOT valid to pass
// NullType::singleton to this function
inline void incref(intrusive_ptr_target* self) {
  if (self) {
    detail::atomic_refcount_increment(self->refcount_);
  }
}

// ...

} // namespace intrusive_ptr

// ...

} // namespace raw

} // namespace c10

Because intrusive_ptr_target declares c10::raw::intrusive_ptr::incref a friend function, incref can access refcount_, a private member variable of intrusive_ptr_target.

detail::atomic_refcount_increment

incref calls detail::atomic_refcount_increment:

// Increment needs to be acquire-release to make use_count() and
// unique() reliable.
inline size_t atomic_refcount_increment(std::atomic<size_t>& refcount) {
  return refcount.fetch_add(1, std::memory_order_acq_rel) + 1;
}

From the reference for std::atomic::fetch_add:

Atomically replaces the current value with the result of arithmetic addition of the value and arg. That is, it performs atomic post-increment. The operation is a read-modify-write operation. Memory is affected according to the value of order.

In short, fetch_add is the atomic version of i++.

atomic_refcount_increment returns fetch_add's return value plus 1. So what does fetch_add return?

Return value
The value immediately preceding the effects of this function in the modification order of *this.

Since fetch_add returns the value the variable held before the operation, that return value plus 1 is the value after the increment, and this is what atomic_refcount_increment returns.
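The return values can be checked with the standard library alone. The helper atomic_refcount_increment is reproduced from the excerpt above so that the sketch compiles on its own; the function fetch_add_returns_previous_value is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Same shape as the function quoted above.
inline std::size_t atomic_refcount_increment(std::atomic<std::size_t>& refcount) {
  return refcount.fetch_add(1, std::memory_order_acq_rel) + 1;
}

bool fetch_add_returns_previous_value() {
  std::atomic<std::size_t> count{5};
  // fetch_add hands back the old value (5); count becomes 6.
  std::size_t before = count.fetch_add(1);
  // The helper adds 1 to the old value (6), so it reports the new value 7.
  std::size_t after = atomic_refcount_increment(count);
  return before == 5 && after == 7 && count.load() == 7;
}
```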

decref

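decref is the function used to decrease the reference count of the object that a c10::intrusive_ptr manages.

Three functions are involved: release, decref and reclaim. decref calls reclaim, and the pointer passed to reclaim must have been produced by release. So we start with release.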

release

template <
    class TTarget,
    class NullType = detail::intrusive_target_default_null_type<TTarget>>
class intrusive_ptr final {
//...
 public:
  //...
  /**
   * Returns an owning (!) pointer to the underlying object and makes the
   * intrusive_ptr instance invalid. That means the refcount is not decreased.
   * You *must* put the returned pointer back into a intrusive_ptr using
   * intrusive_ptr::reclaim(ptr) to properly destruct it.
   * This is helpful for C APIs.
   */
  TTarget* release() noexcept {
    // NOLINTNEXTLINE(clang-analyzer-core.uninitialized.Assign)
    TTarget* result = target_;
    target_ = NullType::singleton();
    return result;
  }

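First, target_, the raw pointer to the underlying object, is saved in result, which is returned at the end of the function.

Next, target_ is set to the null value, which invalidates the intrusive_ptr. Note that the reference count is not decreased here and the underlying object is not destructed.

As the comment says, the raw pointer returned by this function must later be passed to reclaim so that the object is destructed properly.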

decref

namespace c10 {
// ...

namespace raw {

namespace intrusive_ptr {

// WARNING: Unlike the reclaim() API, it is NOT valid to pass
// NullType::singleton to this function
inline void decref(intrusive_ptr_target* self) {
  // Let it die
  c10::intrusive_ptr<intrusive_ptr_target>::reclaim(self);
  // NB: Caller still has 'self' pointer, but it's now invalid.
  // If you want more safety, used the actual c10::intrusive_ptr class
}

// ...

} // namespace intrusive_ptr

// ...

} // namespace raw

} // namespace c10

This is just a wrapper around reclaim.

reclaim

  /**
   * Takes an owning pointer to TTarget* and creates an intrusive_ptr that takes
   * over ownership. That means the refcount is not increased.
   * This is the counter-part to intrusive_ptr::release() and the pointer
   * passed in *must* have been created using intrusive_ptr::release().
   */
  static intrusive_ptr reclaim(TTarget* owning_ptr) {
    TORCH_INTERNAL_ASSERT_DEBUG_ONLY(
        owning_ptr == NullType::singleton() ||
            owning_ptr->refcount_.load() == 0 || owning_ptr->weakcount_.load(),
        "TTarget violates the invariant that refcount > 0  =>  weakcount > 0");
    return intrusive_ptr(owning_ptr, raw::DontIncreaseRefcount{});
  }

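As mentioned, the owning_ptr passed to reclaim must have been produced by release.

The function constructs an intrusive_ptr that takes over ownership of owning_ptr without changing the reference count: the reference that release handed out as a raw pointer is now held by the new intrusive_ptr.

In decref the intrusive_ptr returned by reclaim is a temporary that is discarded right away, so its destructor ~intrusive_ptr runs at the end of that statement, and that is where the reference count is actually decremented.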
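The release/reclaim round trip can be sketched with a hypothetical, single-threaded miniature. Node, NodePtr and the tag type DontIncrease are invented for this illustration and use a plain integer count; this is not PyTorch's implementation.

```cpp
#include <cassert>
#include <cstddef>

struct Node {
  std::size_t refcount = 0;
};

struct DontIncrease {};  // tag: "adopt the pointer, leave the count alone"

class NodePtr {
 public:
  explicit NodePtr(Node* n) : target_(n) {
    if (target_) ++target_->refcount;
  }
  NodePtr(Node* n, DontIncrease) : target_(n) {}
  NodePtr(const NodePtr&) = delete;
  NodePtr& operator=(const NodePtr&) = delete;
  NodePtr(NodePtr&& rhs) noexcept : target_(rhs.target_) { rhs.target_ = nullptr; }
  ~NodePtr() {
    if (target_ && --target_->refcount == 0) delete target_;
  }

  // Give up ownership without decrementing.
  Node* release() {
    Node* result = target_;
    target_ = nullptr;
    return result;
  }
  // Take ownership back without incrementing.
  static NodePtr reclaim(Node* owning) { return NodePtr(owning, DontIncrease{}); }

 private:
  Node* target_;
};

// Same idea as raw::intrusive_ptr::decref: the temporary returned by
// reclaim dies at the end of the statement and drops one reference.
inline void decref(Node* owning) { NodePtr::reclaim(owning); }

bool release_then_reclaim_keeps_count() {
  NodePtr p(new Node);                // refcount 1
  Node* raw = p.release();            // p is empty, refcount still 1
  bool unchanged = raw->refcount == 1;
  NodePtr q = NodePtr::reclaim(raw);  // q owns it, refcount still 1
  return unchanged && raw->refcount == 1;
}  // q's destructor drops the count to 0 and deletes the node

bool decref_drops_one_reference() {
  NodePtr p(new Node);  // refcount 1
  Node* raw = p.release();
  ++raw->refcount;      // pretend a second owner exists: refcount 2
  decref(raw);          // back to 1, object still alive
  bool alive_with_one = raw->refcount == 1;
  decref(raw);          // 0: the node is deleted here
  return alive_with_one;
}
```

The raw pointer returned by release stands for exactly one reference, which is why it must be reclaimed exactly once.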

The constructors and the destructor of intrusive_ptr are examined next.

constructor

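A constructor that does not increase the reference count

intrusive_ptr has eight constructors. The version below only sets target_ and does not touch the reference count: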

public:
 // This constructor will not increase the ref counter for you.
  // We use the tagged dispatch mechanism to explicitly mark this constructor
  // to not increase the refcount
  explicit intrusive_ptr(TTarget* target, raw::DontIncreaseRefcount) noexcept
      : target_(target) {}

raw::DontIncreaseRefcount is shown below; it is just an empty struct:

namespace c10 {
// ...
namespace raw {
// ...
// constructor tag used by intrusive_ptr constructors
struct DontIncreaseRefcount {};
} // namespace raw

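Its presence gives this constructor a signature distinct from the others (tag dispatch), so a call such as intrusive_ptr(NullType::singleton(), raw::DontIncreaseRefcount{}) states its intent explicitly and is very readable.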
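Tag dispatch in isolation looks like the sketch below. Counter and Handle are hypothetical types; only the idea of an empty tag struct selecting an overload is taken from the excerpt above.

```cpp
#include <cassert>

struct Counter {
  int refs = 0;
};

struct DontIncreaseRefcount {};  // empty tag type, carries no data

class Handle {
 public:
  // Default behaviour: constructing a handle takes a reference.
  explicit Handle(Counter* c) : counter_(c) { ++counter_->refs; }
  // Tagged overload: adopt the pointer and leave the count alone.
  Handle(Counter* c, DontIncreaseRefcount) : counter_(c) {}

 private:
  Counter* counter_;
};

int refs_after_plain_construction() {
  Counter c;
  Handle h(&c);
  return c.refs;
}

int refs_after_tagged_construction() {
  Counter c;
  Handle h(&c, DontIncreaseRefcount{});
  return c.refs;
}
```

Compared with a bool parameter, the tag makes the choice visible at the call site and is resolved at compile time.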

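The private constructor

The constructor below takes only target. It first delegates to the raw::DontIncreaseRefcount constructor and then initializes the reference count members itself.

Note that it is private.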

private:
 // This constructor will increase the ref counter for you.
  // This constructor will be used by the make_intrusive(), and also pybind11,
  // which wrap the intrusive_ptr holder around the raw pointer and incref
  // correspondingly (pybind11 requires raw pointer constructor to incref by
  // default).
  explicit intrusive_ptr(TTarget* target)
      : intrusive_ptr(target, raw::DontIncreaseRefcount{}) {
    if (target_ != NullType::singleton()) {
      // We just created result.target_, so we know no other thread has
      // access to it, so we know we needn't care about memory ordering.
      // (On x86_64, a store with memory_order_relaxed generates a plain old
      // `mov`, whereas an atomic increment does a lock-prefixed `add`, which is
      // much more expensive: https://godbolt.org/z/eKPzj8.)
      TORCH_INTERNAL_ASSERT_DEBUG_ONLY(
          target_->refcount_ == 0 && target_->weakcount_ == 0,
          "intrusive_ptr: Newly-created target had non-zero refcounts. Does its "
          "constructor do something strange like incref or create an "
          "intrusive_ptr from `this`?");
      target_->refcount_.store(1, std::memory_order_relaxed);
      target_->weakcount_.store(1, std::memory_order_relaxed);
    }
  }

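refcount_ is set to 1 because exactly one intrusive_ptr points to the object: this intrusive_ptr itself.

weakcount_ is set to 1 because of a convention PyTorch sets for itself: while refcount_ is greater than 0, weakcount_ is one larger than the actual number of weak references.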

  //  - refcount == number of strong references to the object
  //    weakcount == number of weak references to the object,
  //      plus one more if refcount > 0
  //    An invariant: refcount > 0  =>  weakcount > 0

destructor

destructor

  ~intrusive_ptr() noexcept {
    reset_();
  }

The destructor calls reset_.

reset_

 private:
  // ...
  void reset_() noexcept {
    if (target_ != NullType::singleton() &&
        detail::atomic_refcount_decrement(target_->refcount_) == 0) {
      // See comment above about weakcount. As long as refcount>0,
      // weakcount is one larger than the actual number of weak references.
      // So we need to decrement it here.
      bool should_delete =
          target_->weakcount_.load(std::memory_order_acquire) == 1;
      if (!should_delete) {
        // justification for const_cast: release_resources is basically a
        // destructor and a destructor always mutates the object, even for const
        // objects. NOLINTNEXTLINE(cppcoreguidelines-pro-type-const-cast)
        const_cast<std::remove_const_t<TTarget>*>(target_)->release_resources();
        should_delete =
            detail::atomic_weakcount_decrement(target_->weakcount_) == 0;
      }
      if (should_delete) {
        delete target_;
      }
    }
  }

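First refcount_ is decremented. Only if it reaches 0, meaning that no strong pointer refers to the object any more, does the rest of the function run.

Keep in mind that while refcount_ is greater than 0, weakcount_ is one larger than the actual number of weak references. Checking whether weakcount_ equals 1 therefore really checks whether the number of weak references is 0. If so, should_delete is set to true: the underlying object has reached the end of its life.

If weak references remain (should_delete is false), release_resources is called to free the resources held by the underlying object, while the object itself (target_) is kept. Then weakcount_ is decremented, after which it equals the actual number of weak references. If that number is now 0, should_delete is set to true.

If should_delete is true (no weak references remain), delete target_ destroys the underlying object.

In summary, looking only at refcount_, reset_ decrements it by 1; if it reaches 0, the pointed-to object is destroyed, or, when weak references remain, only its resources are released.

One question arises here: when refcount_ is 0 and weakcount_ is greater than 1, the object's resources are released while the object itself is kept. Why is it safe to release the resources first? The comments quoted earlier give the answer. Once refcount_ is 0 it can never become positive again, and a weak pointer may only increment the use count if it is greater than 0; otherwise it must report the object as dead. So after release_resources nobody calls the object's methods or reads its members; the memory is kept only so that weakcount_ stays accessible.

Another question: decref, reclaim, ~intrusive_ptr and reset_ are not individually declared friend functions of intrusive_ptr_target, so how can the private member refcount_ be modified here? The friend declarations shown earlier answer this as well: the whole class template intrusive_ptr is a friend of intrusive_ptr_target, so its member functions reclaim, ~intrusive_ptr and reset_ all have access, and decref only calls reclaim without touching refcount_ itself.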
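The two-counter scheme can be played through with a hypothetical single-threaded model. Target, drop_strong, drop_weak and try_lock are invented names; drop_strong follows the steps of reset_ above, while drop_weak and try_lock are assumptions about what the weak side does, based only on the comments quoted earlier, not on code shown in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

static int deleted = 0;  // counts Target destructions (illustration only)

struct Target {
  std::size_t refcount = 1;   // one strong pointer
  std::size_t weakcount = 1;  // the "+1 while refcount > 0"
  std::string resource = "payload";
  void release_resources() { resource.clear(); }
  ~Target() { ++deleted; }
};

// Dropping a strong reference. Returns true if the object was deleted.
bool drop_strong(Target* t) {
  if (--t->refcount != 0) return false;
  bool should_delete = (t->weakcount == 1);  // no real weak references
  if (!should_delete) {
    t->release_resources();                  // free the payload early
    should_delete = (--t->weakcount == 0);   // remove the "+1"
  }
  if (should_delete) delete t;
  return should_delete;
}

// Assumed behaviour of dropping a weak reference in this model.
bool drop_weak(Target* t) {
  if (--t->weakcount == 0) {
    delete t;
    return true;
  }
  return false;
}

// A weak reference can only be promoted while refcount > 0.
bool try_lock(Target* t) {
  if (t->refcount == 0) return false;
  ++t->refcount;
  return true;
}

bool no_weak_refs_deletes_immediately() {
  deleted = 0;
  Target* t = new Target;
  return drop_strong(t) && deleted == 1;
}

bool weak_ref_delays_deletion() {
  deleted = 0;
  Target* t = new Target;
  ++t->weakcount;              // one real weak reference: weakcount 2
  bool gone = drop_strong(t);  // resources released, memory kept
  bool kept = !gone && deleted == 0 && t->resource.empty() && t->weakcount == 1;
  bool dead = !try_lock(t);    // the object cannot be revived
  bool freed = drop_weak(t) && deleted == 1;
  return kept && dead && freed;
}
```

In the second scenario the object is a husk between drop_strong and drop_weak: its payload is gone and only the counters are still meaningful.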

detail::atomic_refcount_decrement

reset_ calls detail::atomic_refcount_decrement:

// Both decrements need to be acquire-release for correctness. See
// e.g. std::shared_ptr implementation.
inline size_t atomic_refcount_decrement(std::atomic<size_t>& refcount) {
  return refcount.fetch_sub(1, std::memory_order_acq_rel) - 1;
}

From the reference for std::atomic::fetch_sub:

Atomically replaces the current value with the result of arithmetic subtraction of the value and arg. That is, it performs atomic post-decrement. The operation is read-modify-write operation. Memory is affected according to the value of order.

In short, fetch_sub is the atomic version of i--.

atomic_refcount_decrement returns fetch_sub's return value minus 1. So what does fetch_sub return?

Return value
The value immediately preceding the effects of this function in the modification order of *this.

Since fetch_sub returns the value the variable held before the operation, that return value minus 1 is the value after the decrement, and this is what atomic_refcount_decrement returns.

detail::atomic_weakcount_decrement

reset_ also calls detail::atomic_weakcount_decrement:

inline size_t atomic_weakcount_decrement(std::atomic<size_t>& weakcount) {
  return weakcount.fetch_sub(1, std::memory_order_acq_rel) - 1;
}

It works like detail::atomic_refcount_decrement, except that it operates on weakcount_.

Use case 1 - c10::make_intrusive

at::detail::_empty_generic

aten/src/ATen/EmptyTensor.cpp

First, see how intrusive_ptr is used in the function at::detail::_empty_generic in aten/src/ATen/EmptyTensor.cpp:

  auto storage_impl = c10::make_intrusive<StorageImpl>(
      c10::StorageImpl::use_byte_size_t(),
      size_bytes,
      allocator,
      /*resizeable=*/true);

It calls c10::make_intrusive with StorageImpl as the template argument and passes four arguments (the ones the StorageImpl constructor requires).

c10::make_intrusive

c10/util/intrusive_ptr.h

template <
    class TTarget,
    class NullType = detail::intrusive_target_default_null_type<TTarget>,
    class... Args>
inline intrusive_ptr<TTarget, NullType> make_intrusive(Args&&... args) {
  return intrusive_ptr<TTarget, NullType>::make(std::forward<Args>(args)...);
}

The template parameter TTarget is StorageImpl. The four arguments, of types use_byte_size_t, SymInt (size_bytes), at::Allocator* and bool, are forwarded to c10::intrusive_ptr::make.

c10::intrusive_ptr::make

c10/util/intrusive_ptr.h

  /**
   * Allocate a heap object with args and wrap it inside a intrusive_ptr and
   * incref. This is a helper function to let make_intrusive() access private
   * intrusive_ptr constructors.
   */
  template <class... Args>
  static intrusive_ptr make(Args&&... args) {
    return intrusive_ptr(new TTarget(std::forward<Args>(args)...));
  }

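The template parameter TTarget is StorageImpl; the same four arguments (use_byte_size_t, SymInt size_bytes, at::Allocator*, bool) are forwarded on to the TTarget constructor.

Once new TTarget has produced a pointer to the StorageImpl object, the private constructor of intrusive_ptr is called. Because make is itself a member function of c10::intrusive_ptr, it is free to call the private constructor.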
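The same pattern, a public static factory that perfect-forwards its arguments to a private constructor, can be sketched without PyTorch. Widget, Owner and make_owner are hypothetical names, and Owner is a plain single-owner pointer without any reference counting.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct Widget {
  std::string name;
  int size;
  Widget(std::string n, int s) : name(std::move(n)), size(s) {}
};

template <class T>
class Owner {
 public:
  // Public static factory: as a member it may call the private constructor.
  template <class... Args>
  static Owner make(Args&&... args) {
    return Owner(new T(std::forward<Args>(args)...));
  }
  Owner(Owner&& rhs) noexcept : ptr_(rhs.ptr_) { rhs.ptr_ = nullptr; }
  Owner(const Owner&) = delete;
  Owner& operator=(const Owner&) = delete;
  ~Owner() { delete ptr_; }
  T* operator->() const { return ptr_; }

 private:
  explicit Owner(T* p) : ptr_(p) {}  // reachable only through make()
  T* ptr_;
};

// Free function wrapper, playing the role make_intrusive plays above.
template <class T, class... Args>
Owner<T> make_owner(Args&&... args) {
  return Owner<T>::make(std::forward<Args>(args)...);
}

bool factory_forwards_arguments() {
  Owner<Widget> w = make_owner<Widget>("storage", 4);
  return w->name == "storage" && w->size == 4;
}
```

Keeping the raw-pointer constructor private means user code cannot wrap an arbitrary pointer by accident; every object enters through the factory.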

c10::intrusive_ptr::intrusive_ptr

Looking back at c10::intrusive_ptr, there is one requirement: TTarget must inherit from intrusive_ptr_target.

c10::StorageImpl

Check in c10/core/StorageImpl.h whether StorageImpl meets this requirement:

struct C10_API StorageImpl : public c10::intrusive_ptr_target {
    //...
};

Use case 2 - c10::Storage

StorageImpl inherits from c10::intrusive_ptr_target, so c10::intrusive_ptr can be used with StorageImpl.

Likewise, TensorImpl inherits from c10::intrusive_ptr_target, and TensorBase accesses its TensorImpl object through the member variable c10::intrusive_ptr<TensorImpl, UndefinedTensorImpl> impl_;.

Here we start from c10::StorageImpl:

c10::StorageImpl

c10/core/StorageImpl.h

c10::StorageImpl is a concrete class that inherits from c10::intrusive_ptr_target:

struct C10_API StorageImpl : public c10::intrusive_ptr_target {
    //...
};

Next, see how c10::intrusive_ptr_target works together with c10::intrusive_ptr.

c10::Storage constructor

torch/include/c10/core/Storage.h

c10/core/Storage.h

struct C10_API Storage {
  // ...
  Storage(c10::intrusive_ptr<StorageImpl> ptr)
      : storage_impl_(std::move(ptr)) {}
  // ...
 protected:
  c10::intrusive_ptr<StorageImpl> storage_impl_;
}

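The Storage class has a member variable storage_impl_, a StorageImpl wrapped in an intrusive_ptr. Recall that StorageImpl is a subclass of c10::intrusive_ptr_target, which matches the rule stated above that intrusive_ptr must be used with intrusive_ptr_target.

Note that storage_impl_ is initialized with std::move, that is, through the move constructor of c10::intrusive_ptr.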

c10::intrusive_ptr move constructor

c10/util/intrusive_ptr.h

The move constructor of c10::intrusive_ptr is shown below. It takes over rhs's target_ and then sets rhs.target_ to the null value:

  intrusive_ptr(intrusive_ptr&& rhs) noexcept : target_(rhs.target_) {
    rhs.target_ = NullType::singleton();
  }

The converting move constructor below accepts an rhs of a different type and adds a matching type check: it compiles only if From* is convertible to TTarget* (the type of target_):

  template <class From, class FromNullType>
  /* implicit */ intrusive_ptr(intrusive_ptr<From, FromNullType>&& rhs) noexcept
      : target_(
            detail::assign_ptr_<TTarget, NullType, FromNullType>(rhs.target_)) {
    static_assert(
        std::is_convertible<From*, TTarget*>::value,
        "Type mismatch. intrusive_ptr move constructor got pointer of wrong type.");
    rhs.target_ = FromNullType::singleton();
  }

Why use the move constructor at all?

According to "Why would I std::move an std::shared_ptr?":

std::shared_ptr reference count is atomic. increasing or decreasing the reference count requires atomic increment or decrement. This is hundred times slower than non-atomic increment/decrement, not to mention that if we increment and decrement the same counter we wind up with the exact number, wasting a ton of time and resources in the process.

By moving the shared_ptr instead of copying it, we "steal" the atomic reference count and we nullify the other shared_ptr. "stealing" the reference count is not atomic, and it is hundred times faster than copying the shared_ptr (and causing atomic reference increment or decrement).

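With the copy constructor, the smart pointer's reference count has to be incremented (and later decremented) atomically, which is comparatively expensive. The move constructor avoids these atomic operations and saves that cost.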
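The difference is visible in the counts themselves. The sketch below uses std::shared_ptr because it needs no extra dependencies; the function names are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Copying adds a second owner, so the shared count goes up.
long count_after_copy() {
  auto a = std::make_shared<int>(1);
  std::shared_ptr<int> b = a;  // atomic increment of the shared count
  return a.use_count();
}

// Moving hands the pointer over: the count stays at 1 and the source is empty.
bool move_transfers_without_counting() {
  auto a = std::make_shared<int>(1);
  std::shared_ptr<int> b = std::move(a);  // no increment, no later decrement
  return b.use_count() == 1 && a == nullptr;
}
```

This is why a sink parameter taken by value, like the one in the Storage constructor above, is usually moved into its member.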

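Use case 3 - THPPointer<c10::StorageImpl>::free

The comment says that the owning_ptr passed to reclaim (and therefore to decref) must have been produced by release, but searching the PyTorch code did not turn up a case where release and decref are used together. Since release returns a raw pointer, my guess is that reclaim (and decref) only needs a raw pointer that owns one reference, which matches how THPPointer::free uses it here.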

THPPointer

torch/csrc/utils/object_ptr.h

template <class T>
class THPPointer {
 public:
  THPPointer() : ptr(nullptr){};
  explicit THPPointer(T* ptr) noexcept : ptr(ptr){};
  THPPointer(THPPointer&& p) noexcept {
    free();
    ptr = p.ptr;
    p.ptr = nullptr;
  };

  ~THPPointer() {
    free();
  };
  // ...
  T* release() {
    T* tmp = ptr;
    ptr = nullptr;
    return tmp;
  }
  // ...

 private:
  void free();
  T* ptr = nullptr;
};

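THPPointer is PyTorch's own RAII smart pointer, implemented as a class template.

Its private member variable ptr is a raw pointer to an object of type T.

free releases the raw pointer and is called from the destructor. Its definition is not in this file; it is provided through template specialization of THPPointer, one definition per T.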

THPPointer<c10::StorageImpl>::free

torch/csrc/Storage.cpp

template <>
void THPPointer<c10::StorageImpl>::free() {
  if (ptr) {
    c10::raw::intrusive_ptr::decref(ptr);
  }
}

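THPPointer<c10::StorageImpl> is a specialization of THPPointer, and this is where its free function is implemented.

When the raw pointer ptr is non-null, decref decrements the reference count of the underlying object by 1; if the count reaches 0, the object is destroyed.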
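The mechanism of declaring free in the class template and defining it separately for each T can be sketched as follows. Guard and the two counters are hypothetical names; the bodies simply delete the pointer instead of calling decref.

```cpp
#include <cassert>

static int int_frees = 0;     // how often Guard<int>::free released something
static int double_frees = 0;  // same for Guard<double>

template <class T>
class Guard {
 public:
  explicit Guard(T* p) : ptr(p) {}
  Guard(const Guard&) = delete;
  Guard& operator=(const Guard&) = delete;
  ~Guard() { free(); }

 private:
  void free();  // declared here, defined only in the specializations below
  T* ptr;
};

// Explicit specialization of the member function for T = int.
template <>
void Guard<int>::free() {
  if (ptr) {
    delete ptr;
    ++int_frees;
  }
}

// Explicit specialization of the member function for T = double.
template <>
void Guard<double>::free() {
  if (ptr) {
    delete ptr;
    ++double_frees;
  }
}

bool each_type_uses_its_own_free() {
  int_frees = 0;
  double_frees = 0;
  {
    Guard<int> a(new int(1));
    Guard<double> b(new double(2.0));
    Guard<int> empty(nullptr);  // null pointer: free() does nothing
  }
  return int_frees == 1 && double_frees == 1;
}
```

Using Guard with a type that has no specialization fails at link time, which is how the pattern restricts the wrapper to supported types.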

demo

Edit intrusive_ptr.cpp as follows:

#include <torch/torch.h>
#include <iostream>
#include <vector>
#include <memory> //shared_ptr
#include <cassert> // assert
#include <boost/intrusive_ptr.hpp>
#include <boost/detail/atomic_count.hpp> // boost::detail::atomic_count
#include <boost/checked_delete.hpp> // boost::checked_delete

// #define use_weak

// #define use_shared
// #define use_boost
// #define use_c10

#ifdef use_c10
#define smart_ptr c10::intrusive_ptr
#define make_ptr c10::make_intrusive
#define weak_smart_ptr c10::weak_intrusive_ptr
#elif defined(use_boost)
#define smart_ptr boost::intrusive_ptr
#elif defined(use_shared)
#define smart_ptr std::shared_ptr
#define make_ptr std::make_shared
#define weak_smart_ptr std::weak_ptr
#endif

#ifdef use_boost
template<class T>
class intrusive_ptr_base {
public:
    /**
    * Default constructor
    */
    intrusive_ptr_base(): ref_count(0) {
        // std::cout << "intrusive_ptr_base default constructor" << std::endl;
    }
     
    /**
    * Copying the object does not copy its reference count (the copy starts at 0); to share an object, construct one intrusive_ptr from another
    */
    intrusive_ptr_base(intrusive_ptr_base<T> const&): ref_count(0) {
        std::cout << "intrusive_ptr_base copy constructor" << std::endl;
    }

    ~intrusive_ptr_base(){
        std::cout << "intrusive_ptr_base destructor" << std::endl;
    }
     
    /**
    * Assignment leaves the reference count untouched
    */
    intrusive_ptr_base& operator=(intrusive_ptr_base const& rhs) {
        std::cout << "Assignment operator" << std::endl;
        return *this;
    }
     
    /**
    * Increment the reference count (defined in the base class so that the compiler can find it; otherwise it would have to live in namespace boost)
    */
    friend void intrusive_ptr_add_ref(intrusive_ptr_base<T> const* s) {
        std::cout << "intrusive_ptr_base add ref" << std::endl;
        assert(s != 0);  // check for null before dereferencing
        assert(s->ref_count >= 0);
        ++s->ref_count;
    }
 
    /**
    * Decrement the reference count
    */
    friend void intrusive_ptr_release(intrusive_ptr_base<T> const* s) {
        std::cout << "intrusive_ptr_base release" << std::endl;
        assert(s != 0);  // check for null before dereferencing
        assert(s->ref_count > 0);
        if (--s->ref_count == 0)
            boost::checked_delete(static_cast<T const*>(s));  // the actual type of s is T; intrusive_ptr_base<T> is its base class
    }
     
    /**
    * Similar to shared_from_this()
    */
    boost::intrusive_ptr<T> self() {
        return boost::intrusive_ptr<T>((T*)this);
    }
     
    boost::intrusive_ptr<const T> self() const {
        return boost::intrusive_ptr<const T>((T const*)this);
    }
     
    int refcount() const {
        return ref_count;
    }
     
private:
    ///should be modifiable even from const intrusive_ptr objects
    mutable boost::detail::atomic_count ref_count;
 
};
#endif

#ifdef use_c10
class MyVector : public c10::intrusive_ptr_target {
#elif defined(use_boost)
class MyVector : public intrusive_ptr_base<MyVector> {
#elif defined(use_shared)
class MyVector {
#endif
public:
  MyVector(const std::vector<int>& d) : data(d) {
    std::cout << "MyVector constructor" << std::endl;
  }
  ~MyVector() {
    std::cout << "MyVector destructor" << std::endl;
  }

  std::vector<int> data;
};

class A;
class B;

#ifdef use_c10
class A : public c10::intrusive_ptr_target {
#elif defined(use_boost)
class A : public intrusive_ptr_base<A> {
#elif defined(use_shared)
class A {
#endif
public:
  A() {
    // std::cout << "A constructor" << std::endl;
  }

  ~A() {
    std::cout << "A destructor" << std::endl;
  }

#ifdef use_weak
  weak_smart_ptr<B> pointer;
#else
  smart_ptr<B> pointer;
#endif
};

#ifdef use_c10
class B : public c10::intrusive_ptr_target {
#elif defined(use_boost)
class B : public intrusive_ptr_base<B> {
#elif defined(use_shared)
class B {
#endif
public:
  B() {
    // std::cout << "B constructor" << std::endl;
  }

  ~B() {
    std::cout << "B destructor" << std::endl;
  }

#ifdef use_weak
  weak_smart_ptr<A> pointer;
#else
  smart_ptr<A> pointer;
#endif
};

int main() {
  {
    // Multiple pointers to the same object
    std::cout << "Multiple smart pointer point to the same object" << std::endl;
    std::vector<int> vec({1,2,3});
    
    MyVector* raw_ptr = new MyVector(vec);
    smart_ptr<MyVector> ip, ip2;
#if defined(use_c10)
    std::cout << "Create 1st smart pointer" << std::endl;
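    // intrusive_ptr has no public constructor taking a raw pointer, nor one taking an object
    // make_intrusive, on the other hand, creates a new object
    // so the reclaim function, which takes a raw pointer, is borrowed here
    // (reclaim is static: its return value is discarded here, so ip itself is not set)
    ip.reclaim(raw_ptr);
    // but the error below says a c10::intrusive_ptr cannot be created from a raw pointer with a non-zero refcount
    // so intrusive_ptr does not seem to let several pointers refer to the same object this way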
    /*
    terminate called after throwing an instance of 'c10::Error'
      what():  owning_ptr == NullType::singleton() || owning_ptr->refcount_.load() == 0 || owning_ptr->weakcount_.load() INTERNAL ASSERT FAILED at "/root/Documents/installation/libtorch/include/c10/util/intrusive_ptr.h":471, please report a bug to PyTorch. TTarget violates the invariant that refcount > 0  =>  weakcount > 0
    Exception raised from reclaim at /xxx/libtorch/include/c10/util/intrusive_ptr.h:471 (most recent call first):
    */
    // std::cout << "Create 2nd smart pointer" << std::endl;
    // ip2.reclaim(raw_ptr);
#else
    std::cout << "Create 1st smart pointer" << std::endl;
    ip = smart_ptr<MyVector>(raw_ptr);
    std::cout << "Create 2nd smart pointer" << std::endl;
    ip2 = smart_ptr<MyVector>(raw_ptr);
#endif
    // shared_ptr: MyVector's destructor is called twice, ending in Segmentation fault (core dumped)
    // boost::intrusive_ptr: the destructor is called only once
  }

  std::cout << std::endl;

  {
    // 循環引用
    std::cout << "Circular reference" << std::endl;
#if defined(use_c10)
    smart_ptr<A> a_ptr = make_ptr<A>();
    smart_ptr<B> b_ptr = make_ptr<B>();
#else
    A* a_raw_ptr = new A();
    B* b_raw_ptr = new B();
    std::cout << "Create A's smart pointer" << std::endl;
    smart_ptr<A> a_ptr(a_raw_ptr);
    std::cout << "Create B's smart pointer" << std::endl;
    smart_ptr<B> b_ptr(b_raw_ptr);
#endif

#if !defined(use_boost)
    std::cout << "A ref count: " << a_ptr.use_count() << std::endl;
    std::cout << "B ref count: " << b_ptr.use_count() << std::endl;
#else
    std::cout << "A ref count: " << a_ptr->refcount() << std::endl;
    std::cout << "B ref count: " << b_ptr->refcount() << std::endl;
#endif

    std::cout << "A's smart pointer references to B" << std::endl;
    a_ptr->pointer = b_ptr;
    std::cout << "B's smart pointer references to A" << std::endl;
    b_ptr->pointer = a_ptr;

#if !defined(use_boost)
    std::cout << "A ref count: " << a_ptr.use_count() << std::endl;
    std::cout << "B ref count: " << b_ptr.use_count() << std::endl;
#else
    std::cout << "A ref count: " << a_ptr->refcount() << std::endl;
    std::cout << "B ref count: " << b_ptr->refcount() << std::endl;
#endif
    // shared_ptr, boost::intrusive_ptr: both reference counts go from 1 to 2, and no destructor is called at the end
  }
  return 0;
}

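An object pointed to by boost::intrusive_ptr must implement reference counting itself. Here boost::intrusive_ptr points to MyVector, which inherits from the intrusive_ptr_base class. Besides ref_count, the member variable holding the reference count, intrusive_ptr_base implements intrusive_ptr_add_ref and intrusive_ptr_release so that boost::intrusive_ptr can call them directly.

Compile and run with the following command: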

rm -rf * && cmake -DCMAKE_PREFIX_PATH="<libtorch_installation_path>;<boost_installation_path>" .. && make && ./intrusive_ptr

Multiple pointers to the same object

std::shared_ptr

MyVector's destructor is called twice, ending in Segmentation fault (core dumped):

Multiple smart pointer point to the same object
MyVector constructor
Create 1st smart pointer
Create 2nd smart pointer
MyVector destructor
MyVector destructor
Segmentation fault (core dumped)

boost::intrusive_ptr

MyVector's destructor is called only once and the object is destructed successfully:

Multiple smart pointer point to the same object
MyVector constructor
Create 1st smart pointer
intrusive_ptr_base add ref
Create 2nd smart pointer
intrusive_ptr_base add ref
intrusive_ptr_base release
intrusive_ptr_base release
MyVector destructor
intrusive_ptr_base destructor

c10::intrusive_ptr

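In intrusive_ptr the constructor that takes a raw pointer is private, and there is no constructor that takes an object; make_intrusive always creates a new object. So the reclaim function, which takes a raw pointer, is borrowed here in an attempt to make several intrusive_ptr instances point to the same object. Note that reclaim is a static function that expects a pointer which already owns a reference, and the demo discards its return value, so ip itself is never set; the usual way to share an object is to copy an existing intrusive_ptr.
However, the error below reports that a c10::intrusive_ptr cannot be created from a raw pointer whose refcount is non-zero while its weakcount is zero: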

    terminate called after throwing an instance of 'c10::Error'
      what():  owning_ptr == NullType::singleton() || owning_ptr->refcount_.load() == 0 || owning_ptr->weakcount_.load() INTERNAL ASSERT FAILED at "/root/Documents/installation/libtorch/include/c10/util/intrusive_ptr.h":471, please report a bug to PyTorch. TTarget violates the invariant that refcount > 0  =>  weakcount > 0
    Exception raised from reclaim at /xxx/libtorch/include/c10/util/intrusive_ptr.h:471 (most recent call first):

So this attempt failed; only the first intrusive_ptr could be created:

Multiple smart pointer point to the same object
MyVector constructor
Create 1st smart pointer

Circular references

std::shared_ptr

After the circular reference is set up, both reference counts go from 1 to 2, and in the end neither destructor is called:

Circular reference
Create A's smart pointer
Create B's smart pointer
A ref count: 1
B ref count: 1
A's smart pointer references to B
B's smart pointer references to A
A ref count: 2
B ref count: 2

If the members of A and B are changed to std::weak_ptr:

Circular reference
Create A's smart pointer
Create B's smart pointer
A ref count: 1
B ref count: 1
A's smart pointer references to B
B's smart pointer references to A
A ref count: 1
B ref count: 1
B destructor
A destructor

Their reference counts do not increase after the circular reference is set up, and the destructors of both A and B are called when the scope is left.

boost::intrusive_ptr

Circular reference
Create A's smart pointer
intrusive_ptr_base add ref
Create B's smart pointer
intrusive_ptr_base add ref
A ref count: 1
B ref count: 1
A's smart pointer references to B
intrusive_ptr_base add ref
B's smart pointer references to A
intrusive_ptr_base add ref
A ref count: 2
B ref count: 2
intrusive_ptr_base release
intrusive_ptr_base release

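intrusive_ptr_add_ref is called twice each for A and B, bringing both reference counts to 2. But intrusive_ptr_release is called only once for each, so both counts drop to 1 and never reach 0, and the destructors are never called.

boost::intrusive_ptr cannot be combined with std::weak_ptr, so circular references remain a problem for boost::intrusive_ptr.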

c10::intrusive_ptr

If A and B refer to each other through c10::intrusive_ptr, the result is the same as with std::shared_ptr: nothing is destructed:

Circular reference
A ref count: 1
B ref count: 1
A's smart pointer references to B
B's smart pointer references to A
A ref count: 2
B ref count: 2

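Switching the members to c10::weak_intrusive_ptr fails to compile, because it has no default constructor:

error: no matching function for call to 'c10::weak_intrusive_ptr<B>::weak_intrusive_ptr()'

So this attempt failed: as written, the demo cannot use c10::weak_intrusive_ptr to break the circular reference.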

References

【C++11新特性】 C++11智能指针之weak_ptr

boost::intrusive_ptr原理介绍

Smart Ptr 一點訣 (1):使用 intrusive_ptr
