intrusive_ptr in PyTorch

keineahnung2345

Last modified 2023-11-19 20:37:29

First published 2023-10-24 22:23:22

Article link: https://blog.csdn.net/keineahnung2345/article/details/134023307

Preface
c10::intrusive_ptr_target
c10::intrusive_ptr
Use case 1 - c10::make_intrusive
Use case 2 - c10::Storage
Use case 3 - THPPointer<c10::StorageImpl>::free
- THPPointer
- THPPointer<c10::StorageImpl>::free
demo
References

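Preface

boost::intrusive_ptr is a smart pointer, like std::unique_ptr and std::shared_ptr.

What makes boost::intrusive_ptr special is that, as the name suggests, it is intrusive. The object it points to must itself hold a member variable for the reference count, so that the object keeps track of its own count. You also have to implement two functions that modify the count, intrusive_ptr_add_ref and intrusive_ptr_release; boost::intrusive_ptr calls them to increase or decrease the reference count of the object it points to.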

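With circular references, objects held by shared_ptr can never be destructed; weak_ptr solves this. Because weak_ptr does not increase the reference count of the object it points to, the counts in a cycle can still drop to zero, so the cycle no longer keeps the objects alive. weak_ptr exists essentially to solve this problem, which is why it has to be used together with shared_ptr.

Now consider several pointers to the same object. If two std::shared_ptr instances are built independently from the same raw pointer, each one allocates its own control block with its own reference count. When the first shared_ptr destroys the object, the second one still sees a count of 1, so it tries to destroy the object as well and the object is destructed twice. (Copying an existing shared_ptr shares the control block and is safe.) With intrusive_ptr the count is stored in the pointed-to object, and an object has only one count, so double destruction does not occur.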

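A drawback of boost::intrusive_ptr is that it has no weak_ptr counterpart, so it cannot be used to break cycles in circular-reference scenarios.

PyTorch has a similar c10::intrusive_ptr; the object it points to must inherit from c10::intrusive_ptr_target. As with Boost, reference counting goes through two functions, incref and decref. In PyTorch they are free functions in the c10::raw::intrusive_ptr namespace, and they operate on refcount_, a member variable of c10::intrusive_ptr_target.

This post records my notes from reading the code around c10::intrusive_ptr, together with real uses of c10::intrusive_ptr inside PyTorch, and ends with a demo that compares c10::intrusive_ptr with std::shared_ptr and boost::intrusive_ptr.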

c10::intrusive_ptr_target

c10::intrusive_ptr_target is the base class in PyTorch that provides reference counting.

Reference count members

c10/util/intrusive_ptr.h

class C10_API intrusive_ptr_target {
  // Note [Weak references for intrusive refcounting]
  // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  // Here's the scheme:
  //
  //  - refcount == number of strong references to the object
  //    weakcount == number of weak references to the object,
  //      plus one more if refcount > 0
  //    An invariant: refcount > 0  =>  weakcount > 0
  //
  //  - c10::StorageImpl stays live as long as there are any strong
  //    or weak pointers to it (weakcount > 0, since strong
  //    references count as a +1 to weakcount)
  //
  //  - finalizers are called and data_ptr is deallocated when refcount == 0
  //
  //  - Once refcount == 0, it can never again be > 0 (the transition
  //    from > 0 to == 0 is monotonic)
  //
  //  - When you access c10::StorageImpl via a weak pointer, you must
  //    atomically increment the use count, if it is greater than 0.
  //    If it is not, you must report that the storage is dead.
  //
  mutable std::atomic<size_t> refcount_;
  mutable std::atomic<size_t> weakcount_;
};

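We already know that an object pointed to by boost::intrusive_ptr must have a reference count member, so there is a refcount_ member here as well. But where does weakcount_ come from?

boost::intrusive_ptr cannot destruct objects that refer to each other in a cycle. To avoid that situation, PyTorch's design covers both the intrusive_ptr role and the weak_ptr role, so the target carries a weakcount_ member in addition to refcount_.

Note: refcount_ and weakcount_ are private member variables. The mutable qualifier means they can still be modified on an intrusive_ptr_target object that was declared const, and they can also be modified from const member functions of intrusive_ptr_target.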

constructors

  constexpr intrusive_ptr_target() noexcept : refcount_(0), weakcount_(0) {}

  // intrusive_ptr_target supports copy and move: but refcount and weakcount
  // don't participate (since they are intrinsic properties of the memory
  // location)
  intrusive_ptr_target(intrusive_ptr_target&& /*other*/) noexcept
      : intrusive_ptr_target() {}

  intrusive_ptr_target& operator=(intrusive_ptr_target&& /*other*/) noexcept {
    return *this;
  }

  intrusive_ptr_target(const intrusive_ptr_target& /*other*/) noexcept
      : intrusive_ptr_target() {}

  intrusive_ptr_target& operator=(
      const intrusive_ptr_target& /*other*/) noexcept {
    return *this;
  }

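default constructor: sets both refcount_ and weakcount_ to 0
copy constructor: intrusive_ptr_target(const intrusive_ptr_target& /*other*/) ignores its argument and delegates to the default constructor, which sets both refcount_ and weakcount_ to 0
move constructor: intrusive_ptr_target(intrusive_ptr_target&& /*other*/) ignores its argument and delegates to the default constructor, which sets both refcount_ and weakcount_ to 0
copy and move assignment: return *this without touching either count

Note: when an object is copied or moved, the newly created object is treated as a different object from the one passed in, so its refcount_ and weakcount_ start from zero.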

destructor

 protected:
  // protected destructor. We never want to destruct intrusive_ptr_target*
  // directly.
  virtual ~intrusive_ptr_target() {
// Disable -Wterminate and -Wexceptions so we're allowed to use assertions
// (i.e. throw exceptions) in a destructor.
// We also have to disable -Wunknown-warning-option and -Wpragmas, because
// some other compilers don't know about -Wterminate or -Wexceptions and
// will show a warning about unknown warning options otherwise.
#if defined(_MSC_VER) && !defined(__clang__)
#pragma warning(push)
#pragma warning( \
    disable : 4297) // function assumed not to throw an exception but does
#else
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpragmas"
#pragma GCC diagnostic ignored "-Wunknown-warning-option"
#pragma GCC diagnostic ignored "-Wterminate"
#pragma GCC diagnostic ignored "-Wexceptions"
#endif
    TORCH_INTERNAL_ASSERT_DEBUG_ONLY(
        // Second condition is there to accommodate
        // unsafe_adapt_non_heap_allocated: since we are doing our own
        // deallocation in that case, it is correct for each
        // expected_decref to have happened (some user code tried to
        // decref and thus free the object, but it didn't happen right
        // away) or not (no user code tried to free the object, and
        // now it's getting destroyed through whatever mechanism the
        // caller of unsafe_adapt_non_heap_allocated wanted to
        // use). We choose our reference count such that the count
        // will not dip below INT_MAX regardless.
        refcount_.load() == 0 || refcount_.load() >= INT_MAX,
        "Tried to destruct an intrusive_ptr_target that still has intrusive_ptr to it; refcount was ",
        refcount_.load());
    TORCH_INTERNAL_ASSERT_DEBUG_ONLY(
        // See ~intrusive_ptr for optimization that will frequently result in 1
        // at destruction time.
        weakcount_.load() == 1 || weakcount_.load() == 0 ||
            weakcount_.load() == INT_MAX - 1 || weakcount_.load() == INT_MAX,
        "Tried to destruct an intrusive_ptr_target that still has weak_intrusive_ptr to it");
#if defined(_MSC_VER) && !defined(__clang__)
#pragma warning(pop)
#else
#pragma GCC diagnostic pop
#endif
  }

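Before the destructor runs, refcount_ must be 0 and weakcount_ must be 1 or 0 (the destructor of c10::intrusive_ptr, covered later, shows why a target whose weakcount_ is 1 may be destroyed). The assertions also accept a refcount_ of at least INT_MAX and a weakcount_ of INT_MAX - 1 or INT_MAX, which, according to the comment, accommodates unsafe_adapt_non_heap_allocated.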

release_resources

class C10_API intrusive_ptr_target {
 private:
  /**
   * This is called when refcount reaches zero.
   * You can override this to release expensive resources.
   * There might still be weak references, so your object might not get
   * destructed yet, but you can assume the object isn't used anymore,
   * i.e. no more calls to methods or accesses to members (we just can't
   * destruct it yet because we need the weakcount accessible).
   *
   * If there are no weak references (i.e. your class is about to be
   * destructed), this function WILL NOT be called.
   */
  virtual void release_resources() {}
};

release_resources releases resources. It is a virtual function whose body is supplied by the subclasses of intrusive_ptr_target. One such subclass is TensorImpl, whose release_resources is implemented as follows.

TensorImpl::release_resources

torch/include/c10/core/TensorImpl.h

Before looking at release_resources, take a quick look at the member variables of TensorImpl; they are the "resources" that release_resources is about to release:

struct C10_API TensorImpl : public c10::intrusive_ptr_target {
 // ...
 protected:
  Storage storage_;

 private:
  // ...
  std::unique_ptr<c10::AutogradMetaInterface> autograd_meta_ = nullptr;
  // ...

 protected:
  // ...
  impl::PyObjectSlot pyobj_slot_;
  // ...
}

TensorImpl::release_resources is as follows:

c10/core/TensorImpl.cpp

void TensorImpl::release_resources() {
  autograd_meta_.reset();
  if (storage_) {
    storage_ = {};
  }
  pyobj_slot_.destroy_pyobj_if_needed();
}

So release_resources simply clears out the member variables.

friend classes/functions

class C10_API intrusive_ptr_target {
  // ...
  template <typename T, typename NullType>
  friend class intrusive_ptr;
  friend inline void raw::intrusive_ptr::incref(intrusive_ptr_target* self);

  template <typename T, typename NullType>
  friend class weak_intrusive_ptr;
  friend inline void raw::weak_intrusive_ptr::incref(
      intrusive_ptr_target* self);

  template <typename T>
  friend struct ExclusivelyOwnedTensorTraits;
};

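Here raw::intrusive_ptr::incref is declared as a friend function, so, as we will see shortly, incref can freely access the private member variable refcount_. The intrusive_ptr and weak_intrusive_ptr class templates are declared as friends too, which gives all of their member functions the same access.

For more on friend functions, see the chapter "2. Member Function of Another Class as Friend Function" in "Friend Class and Function in C++".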

c10::intrusive_ptr

target_

template <
    class TTarget,
    class NullType = detail::intrusive_target_default_null_type<TTarget>>
class intrusive_ptr final {
 private:
//  the following static assert would be nice to have but it requires
//  the target class T to be fully defined when intrusive_ptr<T> is instantiated
//  this is a problem for classes that contain pointers to themselves
//  static_assert(
//      std::is_base_of<intrusive_ptr_target, TTarget>::value,
//      "intrusive_ptr can only be used for classes that inherit from
//      intrusive_ptr_target.");
#ifndef _WIN32
  // This static_assert triggers on MSVC
  //  error C2131: expression did not evaluate to a constant
  static_assert(
      NullType::singleton() == NullType::singleton(),
      "NullType must have a constexpr singleton() method");
#endif
  static_assert(
      std::is_base_of<
          TTarget,
          typename std::remove_pointer<decltype(NullType::singleton())>::type>::
          value,
      "NullType::singleton() must return a element_type* pointer");

  TTarget* target_;
  // ...

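c10::intrusive_ptr is a class template, and target_ has type TTarget*, a pointer to the template parameter TTarget.

target_ is a private member variable of intrusive_ptr and is the raw pointer that intrusive_ptr manages. The commented-out static_assert states that TTarget must inherit from intrusive_ptr_target.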

incref

c10::raw::intrusive_ptr::incref is the function used to increase the reference count of the underlying object.

incref

namespace c10 {
// ...

namespace raw {

namespace intrusive_ptr {

// WARNING: Unlike the reclaim() API, it is NOT valid to pass
// NullType::singleton to this function
inline void incref(intrusive_ptr_target* self) {
  if (self) {
    detail::atomic_refcount_increment(self->refcount_);
  }
}

// ...

} // namespace intrusive_ptr

// ...

} // namespace raw

} // namespace c10

Because intrusive_ptr_target declares c10::raw::intrusive_ptr::incref as a friend function, c10::raw::intrusive_ptr::incref can access refcount_, a private member variable of intrusive_ptr_target.

detail::atomic_refcount_increment

incref calls detail::atomic_refcount_increment:

// Increment needs to be acquire-release to make use_count() and
// unique() reliable.
inline size_t atomic_refcount_increment(std::atomic<size_t>& refcount) {
  return refcount.fetch_add(1, std::memory_order_acq_rel) + 1;
}

See std::atomic::fetch_add:

Atomically replaces the current value with the result of arithmetic addition of the value and arg. That is, it performs atomic post-increment. The operation is a read-modify-write operation. Memory is affected according to the value of order.

Put simply, fetch_add is the atomic version of i++.

atomic_refcount_increment returns the return value of fetch_add plus 1. So what does fetch_add return?

Return value
The value immediately preceding the effects of this function in the modification order of *this.

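Because fetch_add returns the value the variable held before the operation, the return value of fetch_add plus 1, which is what atomic_refcount_increment returns, is the value after the fetch_add operation.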

decref

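decref is the function used to decrease the reference count of the underlying object.

Three functions are involved: release, decref and reclaim. decref calls reclaim, and the pointer passed to reclaim must have been produced by release, so we start with release.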

release

template <
    class TTarget,
    class NullType = detail::intrusive_target_default_null_type<TTarget>>
class intrusive_ptr final {
//...
 public:
  //...
  /**
   * Returns an owning (!) pointer to the underlying object and makes the
   * intrusive_ptr instance invalid. That means the refcount is not decreased.
   * You *must* put the returned pointer back into a intrusive_ptr using
   * intrusive_ptr::reclaim(ptr) to properly destruct it.
   * This is helpful for C APIs.
   */
  TTarget* release() noexcept {
    // NOLINTNEXTLINE(clang-analyzer-core.uninitialized.Assign)
    TTarget* result = target_;
    target_ = NullType::singleton();
    return result;
  }

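First, target_, the raw pointer to the underlying object, is saved in result, which is returned at the end of the function.

Next, target_ is set to the null value, which invalidates the intrusive_ptr. Note that the reference count is not decreased here and the underlying object is not destructed yet.

As the comment says, the raw pointer result returned by this function must be passed to reclaim so that the object is destructed properly.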

decref

namespace c10 {
// ...

namespace raw {

namespace intrusive_ptr {

// WARNING: Unlike the reclaim() API, it is NOT valid to pass
// NullType::singleton to this function
inline void decref(intrusive_ptr_target* self) {
  // Let it die
  c10::intrusive_ptr<intrusive_ptr_target>::reclaim(self);
  // NB: Caller still has 'self' pointer, but it's now invalid.
  // If you want more safety, used the actual c10::intrusive_ptr class
}

// ...

} // namespace intrusive_ptr

// ...

} // namespace raw

} // namespace c10

It looks like just a wrapper around reclaim.

reclaim

  /**
   * Takes an owning pointer to TTarget* and creates an intrusive_ptr that takes
   * over ownership. That means the refcount is not increased.
   * This is the counter-part to intrusive_ptr::release() and the pointer
   * passed in *must* have been created using intrusive_ptr::release().
   */
  static intrusive_ptr reclaim(TTarget* owning_ptr) {
    TORCH_INTERNAL_ASSERT_DEBUG_ONLY(
        owning_ptr == NullType::singleton() ||
            owning_ptr->refcount_.load() == 0 || owning_ptr->weakcount_.load(),
        "TTarget violates the invariant that refcount > 0  =>  weakcount > 0");
    return intrusive_ptr(owning_ptr, raw::DontIncreaseRefcount{});
  }

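As mentioned, the owning_ptr passed to reclaim must have been produced by release.

This function creates an intrusive_ptr that takes over ownership of owning_ptr. It uses the constructor that does not increase the reference count, so the count stays as it was: the reference previously held through the raw pointer now belongs to the new intrusive_ptr.

In decref the returned intrusive_ptr is a temporary that is not stored anywhere, so its lifetime ends immediately and its destructor ~intrusive_ptr is called.

Next we look more closely at the constructors and the destructor of intrusive_ptr.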

constructor

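Constructor that does not increase the reference count

intrusive_ptr has eight constructors. The following one only sets target_ and does not touch the reference count: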

 public:
  // This constructor will not increase the ref counter for you.
  // We use the tagged dispatch mechanism to explicitly mark this constructor
  // to not increase the refcount
  explicit intrusive_ptr(TTarget* target, raw::DontIncreaseRefcount) noexcept
      : target_(target) {}

raw::DontIncreaseRefcount is shown below; it is just an empty struct:

namespace c10 {
// ...
namespace raw {
// ...
// constructor tag used by intrusive_ptr constructors
struct DontIncreaseRefcount {};
} // namespace raw

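Its presence gives the intrusive_ptr constructor above a signature that differs from the other constructors, so a call can be written as intrusive_ptr(NullType::singleton(), raw::DontIncreaseRefcount{}), which is very readable.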

Private constructor

The following constructor takes only target. It first delegates to the raw::DontIncreaseRefcount constructor and then sets the reference count members itself.

Note that it is private.

 private:
  // This constructor will increase the ref counter for you.
  // This constructor will be used by the make_intrusive(), and also pybind11,
  // which wrap the intrusive_ptr holder around the raw pointer and incref
  // correspondingly (pybind11 requires raw pointer constructor to incref by
  // default).
  explicit intrusive_ptr(TTarget* target)
      : intrusive_ptr(target, raw::DontIncreaseRefcount{}) {
    if (target_ != NullType::singleton()) {
      // We just created result.target_, so we know no other thread has
      // access to it, so we know we needn't care about memory ordering.
      // (On x86_64, a store with memory_order_relaxed generates a plain old
      // `mov`, whereas an atomic increment does a lock-prefixed `add`, which is
      // much more expensive: https://godbolt.org/z/eKPzj8.)
      TORCH_INTERNAL_ASSERT_DEBUG_ONLY(
          target_->refcount_ == 0 && target_->weakcount_ == 0,
          "intrusive_ptr: Newly-created target had non-zero refcounts. Does its "
          "constructor do something strange like incref or create an "
          "intrusive_ptr from `this`?");
      target_->refcount_.store(1, std::memory_order_relaxed);
      target_->weakcount_.store(1, std::memory_order_relaxed);
    }
  }

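refcount_ is set to 1 here because one intrusive_ptr points to the target (namely this intrusive_ptr itself).

weakcount_ is set to 1 because of a rule PyTorch sets for itself: while refcount_ is greater than 0, weakcount_ is one larger than the actual number of weak references.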

  //  - refcount == number of strong references to the object
  //    weakcount == number of weak references to the object,
  //      plus one more if refcount > 0
  //    An invariant: refcount > 0  =>  weakcount > 0

destructor

  ~intrusive_ptr() noexcept {
    reset_();
  }

The destructor calls reset_.

reset_

 private:
  // ...
  void reset_() noexcept {
    if (target_ != NullType::singleton() &&
        detail::atomic_refcount_decrement(target_->refcount_) == 0) {
      // See comment above about weakcount. As long as refcount>0,
      // weakcount is one larger than the actual number of weak references.
      // So we need to decrement it here.
      bool should_delete =
          target_->weakcount_.load(std::memory_order_acquire) == 1;
      if (!should_delete) {
        // justification for const_cast: release_resources is basically a
        // destructor and a destructor always mutates the object, even for const
        // objects. NOLINTNEXTLINE(cppcoreguidelines-pro-type-const-cast)
        const_cast<std::remove_const_t<TTarget>*>(target_)->release_resources();
        should_delete =
            detail::atomic_weakcount_decrement(target_->weakcount_) == 0;
      }
      if (should_delete) {
        delete target_;
      }
    }
  }

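It first decrements refcount_. Only if the count becomes 0, meaning that no strong pointer refers to the object any more, does it go on.

Keep in mind that while refcount_ is greater than 0, weakcount_ is one larger than the actual number of weak references. Checking whether weakcount_ equals 1 therefore checks whether the real weak count is 0. If it is, should_delete is set to true: the underlying object has reached the end of its life.

If the weak count is still greater than 0 (should_delete is false), release_resources is called to free the resources held by the underlying object, while the target_ pointer itself is kept. weakcount_ is then decremented, after which it equals the real weak count. If that is 0, should_delete is set to true.

If should_delete is true (the weak count is 0), delete target_ destroys the underlying object for good.

To summarize: if we only look at refcount_, reset_ decrements it by 1, and if it reaches 0 afterwards (and no weak references remain), the pointed-to object is destroyed.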

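One question here: when refcount_ is 0 and weakcount_ is greater than 1, the resources held by the underlying object are released while the target_ pointer is kept. Why is it safe to release the resources first, and why does a later access not go wrong? The comments quoted earlier give the answer: once refcount reaches 0 it can never become positive again, and anyone going through a weak pointer must atomically increment the use count only if it is greater than 0, otherwise report that the object is dead. After release_resources nobody is supposed to call methods or access members; the object is kept only so that weakcount_ remains accessible.

Another question: decref, reclaim, ~intrusive_ptr and reset_ are not individually declared as friend functions of intrusive_ptr_target, so why can they modify its private member refcount_? The friend declarations shown earlier answer this as well: the whole intrusive_ptr class template is a friend of intrusive_ptr_target, so its members reclaim, ~intrusive_ptr and reset_ all have access. raw::intrusive_ptr::decref never touches refcount_ itself; it only calls reclaim.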

detail::atomic_refcount_decrement

reset_ calls detail::atomic_refcount_decrement:

// Both decrements need to be acquire-release for correctness. See
// e.g. std::shared_ptr implementation.
inline size_t atomic_refcount_decrement(std::atomic<size_t>& refcount) {
  return refcount.fetch_sub(1, std::memory_order_acq_rel) - 1;
}

See std::atomic::fetch_sub:

Atomically replaces the current value with the result of arithmetic subtraction of the value and arg. That is, it performs atomic post-decrement. The operation is read-modify-write operation. Memory is affected according to the value of order.

Put simply, fetch_sub is the atomic version of i--.

atomic_refcount_decrement returns the return value of fetch_sub minus 1. So what does fetch_sub return?

Return value
The value immediately preceding the effects of this function in the modification order of *this.

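Because fetch_sub returns the value the variable held before the operation, the return value of fetch_sub minus 1, which is what atomic_refcount_decrement returns, is the value after the fetch_sub operation.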

detail::atomic_weakcount_decrement

reset_ also calls detail::atomic_weakcount_decrement:

inline size_t atomic_weakcount_decrement(std::atomic<size_t>& weakcount) {
  return weakcount.fetch_sub(1, std::memory_order_acq_rel) - 1;
}

It works like detail::atomic_refcount_decrement, except that it operates on weakcount_.

Use case 1 - c10::make_intrusive

at::detail::_empty_generic

aten/src/ATen/EmptyTensor.cpp

First, let's see how intrusive_ptr is used in the at::detail::_empty_generic function in aten/src/ATen/EmptyTensor.cpp:

  auto storage_impl = c10::make_intrusive<StorageImpl>(
      c10::StorageImpl::use_byte_size_t(),
      size_bytes,
      allocator,
      /*resizeable=*/true);

It calls c10::make_intrusive with StorageImpl as the template argument and passes four arguments (the ones the StorageImpl constructor needs).

c10::make_intrusive

c10/util/intrusive_ptr.h

template <
    class TTarget,
    class NullType = detail::intrusive_target_default_null_type<TTarget>,
    class... Args>
inline intrusive_ptr<TTarget, NullType> make_intrusive(Args&&... args) {
  return intrusive_ptr<TTarget, NullType>::make(std::forward<Args>(args)...);
}

The template parameter TTarget is StorageImpl. The four arguments, of types use_byte_size_t, SymInt (size_bytes), at::Allocator* and bool, are forwarded to c10::intrusive_ptr::make.

c10::intrusive_ptr::make

c10/util/intrusive_ptr.h

  /**
   * Allocate a heap object with args and wrap it inside a intrusive_ptr and
   * incref. This is a helper function to let make_intrusive() access private
   * intrusive_ptr constructors.
   */
  template <class... Args>
  static intrusive_ptr make(Args&&... args) {
    return intrusive_ptr(new TTarget(std::forward<Args>(args)...));
  }

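The template parameter TTarget is StorageImpl. The same four arguments, of types use_byte_size_t, SymInt (size_bytes), at::Allocator* and bool, are forwarded on to the TTarget constructor.

Once new TTarget has produced a pointer to a StorageImpl object, the private constructor of intrusive_ptr is called. Because make is itself a member function of c10::intrusive_ptr, it is free to call the private constructor.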

c10::intrusive_ptr::intrusive_ptr

Looking back at the c10::intrusive_ptr class, there is a requirement: TTarget must inherit from intrusive_ptr_target.

c10::StorageImpl

Check in c10/core/StorageImpl.h whether StorageImpl meets this requirement:

struct C10_API StorageImpl : public c10::intrusive_ptr_target {
    //...
};

Use case 2 - c10::Storage

StorageImpl inherits from c10::intrusive_ptr_target, so c10::intrusive_ptr can be used together with StorageImpl.

Likewise, TensorImpl inherits from c10::intrusive_ptr_target, and TensorBase accesses its TensorImpl object through the member variable c10::intrusive_ptr<TensorImpl, UndefinedTensorImpl> impl_;.

Here we start from c10::StorageImpl:

c10::StorageImpl

c10/core/StorageImpl.h

c10::StorageImpl is a concrete class that inherits from c10::intrusive_ptr_target:

struct C10_API StorageImpl : public c10::intrusive_ptr_target {
    //...
};

Next, let's see how c10::intrusive_ptr_target is used together with c10::intrusive_ptr.

c10::Storage constructor

torch/include/c10/core/Storage.h

c10/core/Storage.h

struct C10_API Storage {
  // ...
  Storage(c10::intrusive_ptr<StorageImpl> ptr)
      : storage_impl_(std::move(ptr)) {}
  // ...
 protected:
  c10::intrusive_ptr<StorageImpl> storage_impl_;
}

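The Storage class has a member variable storage_impl_, which is a StorageImpl wrapped in an intrusive_ptr. Recall that StorageImpl is a subclass of c10::intrusive_ptr_target, which matches the rule above that intrusive_ptr must be used with intrusive_ptr_target.

Note that storage_impl_ is initialized with std::move, which invokes the move constructor of c10::intrusive_ptr.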

c10::intrusive_ptr move constructor

c10/util/intrusive_ptr.h

The move constructor of c10::intrusive_ptr is shown below. It takes over the target_ of rhs and then sets rhs.target_ to the null value:

  intrusive_ptr(intrusive_ptr&& rhs) noexcept : target_(rhs.target_) {
    rhs.target_ = NullType::singleton();
  }

The following move constructor accepts an rhs of a different type and adds a matching type check: it compiles only if From* can be converted to TTarget* (the type of target_):

  template <class From, class FromNullType>
  /* implicit */ intrusive_ptr(intrusive_ptr<From, FromNullType>&& rhs) noexcept
      : target_(
            detail::assign_ptr_<TTarget, NullType, FromNullType>(rhs.target_)) {
    static_assert(
        std::is_convertible<From*, TTarget*>::value,
        "Type mismatch. intrusive_ptr move constructor got pointer of wrong type.");
    rhs.target_ = FromNullType::singleton();
  }

So why use the move constructor?

According to "Why would I std::move an std::shared_ptr?":

std::shared_ptr reference count is atomic. increasing or decreasing the reference count requires atomic increment or decrement. This is hundred times slower than non-atomic increment/decrement, not to mention that if we increment and decrement the same counter we wind up with the exact number, wasting a ton of time and resources in the process.

By moving the shared_ptr instead of copying it, we "steal" the atomic reference count and we nullify the other shared_ptr. "stealing" the reference count is not atomic, and it is hundred times faster than copying the shared_ptr (and causing atomic reference increment or decrement).

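With the copy constructor, the smart pointer's reference count has to be incremented and later decremented atomically, which is comparatively expensive. Using the move constructor avoids these atomic operations and saves time.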

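Use case 3 - THPPointer<c10::StorageImpl>::free

The comment says that the owning_ptr passed to reclaim (and therefore to decref) must have been produced by release, but when I searched the PyTorch code base I did not find release and decref used together. Since release returns a raw pointer, my guess is that any raw pointer that still carries an owned reference is acceptable as the argument of reclaim (decref), which matches how THPPointer::free uses it here.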

THPPointer

torch/csrc/utils/object_ptr.h

template <class T>
class THPPointer {
 public:
  THPPointer() : ptr(nullptr){};
  explicit THPPointer(T* ptr) noexcept : ptr(ptr){};
  THPPointer(THPPointer&& p) noexcept {
    free();
    ptr = p.ptr;
    p.ptr = nullptr;
  };

  ~THPPointer() {
    free();
  };
  // ...
  T* release() {
    T* tmp = ptr;
    ptr = nullptr;
    return tmp;
  }
  // ...

 private:
  void free();
  T* ptr = nullptr;
};

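THPPointer is PyTorch's own RAII smart pointer, implemented as a class template.

Its private member variable ptr is a raw pointer to an object of type T.

The free function releases the raw pointer and is called from the destructor. Its definition is not in this file; it is provided through template specializations of THPPointer.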

THPPointer<c10::StorageImpl>::free

torch/csrc/Storage.cpp

template <>
void THPPointer<c10::StorageImpl>::free() {
  if (ptr) {
    c10::raw::intrusive_ptr::decref(ptr);
  }
}

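THPPointer<c10::StorageImpl> is a specialization of THPPointer, and this is where free is implemented.

When the raw pointer ptr is non-null, decref decrements the reference count of the underlying object by 1; if the count is 0 afterwards, the object is destroyed.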

demo

Edit intrusive_ptr.cpp as follows:

#include <torch/torch.h>
#include <iostream>
#include <vector>
#include <cassert> // assert
#include <memory> // shared_ptr
#include <boost/intrusive_ptr.hpp>
#include <boost/detail/atomic_count.hpp> // boost::detail::atomic_count
#include <boost/checked_delete.hpp> // boost::checked_delete

// #define use_weak

// #define use_shared
// #define use_boost
// #define use_c10

#ifdef use_c10
#define smart_ptr c10::intrusive_ptr
#define make_ptr c10::make_intrusive
#define weak_smart_ptr c10::weak_intrusive_ptr
#elif defined(use_boost)
#define smart_ptr boost::intrusive_ptr
#elif defined(use_shared)
#define smart_ptr std::shared_ptr
#define make_ptr std::make_shared
#define weak_smart_ptr std::weak_ptr
#endif

#ifdef use_boost
template<class T>
class intrusive_ptr_base {
public:
    /**
    * Default constructor
    */
    intrusive_ptr_base(): ref_count(0) {
        // std::cout << "intrusive_ptr_base default constructor" << std::endl;
    }
     
    /**
    * Copying does not copy the count; use an intrusive_ptr to create another intrusive_ptr
    */
    intrusive_ptr_base(intrusive_ptr_base<T> const&): ref_count(0) {
        std::cout << "intrusive_ptr_base copy constructor" << std::endl;
    }

    ~intrusive_ptr_base(){
        std::cout << "intrusive_ptr_base destructor" << std::endl;
    }
     
    /**
    * Assignment leaves the reference count untouched
    */
    intrusive_ptr_base& operator=(intrusive_ptr_base const& rhs) {
        std::cout << "Assignment operator" << std::endl;
        return *this;
    }
     
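    /**
    * Increments the reference count (defined in the base class so the compiler
    * can find it; otherwise it would have to be placed in the boost namespace)
    */
    friend void intrusive_ptr_add_ref(intrusive_ptr_base<T> const* s) {
        std::cout << "intrusive_ptr_base add ref" << std::endl;
        assert(s != 0);  // check for null before dereferencing
        assert(s->ref_count >= 0);
        ++s->ref_count;
    }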
 
    /**
    * Decrements the reference count
    */
    friend void intrusive_ptr_release(intrusive_ptr_base<T> const* s) {
        std::cout << "intrusive_ptr_base release" << std::endl;
        assert(s != 0);  // check for null before dereferencing
        assert(s->ref_count > 0);
        if (--s->ref_count == 0)
            boost::checked_delete(static_cast<T const*>(s));  // the actual type of s is T; intrusive_ptr_base<T> is its base class
    }
     
    /**
    * Similar to shared_from_this()
    */
    boost::intrusive_ptr<T> self() {
        return boost::intrusive_ptr<T>((T*)this);
    }
     
    boost::intrusive_ptr<const T> self() const {
        return boost::intrusive_ptr<const T>((T const*)this);
    }
     
    int refcount() const {
        return ref_count;
    }
     
private:
    ///should be modifiable even from const intrusive_ptr objects
    mutable boost::detail::atomic_count ref_count;
 
};
#endif

#ifdef use_c10
class MyVector : public c10::intrusive_ptr_target {
#elif defined(use_boost)
class MyVector : public intrusive_ptr_base<MyVector> {
#elif defined(use_shared)
class MyVector {
#endif
public:
  MyVector(const std::vector<int>& d) : data(d) {
    std::cout << "MyVector constructor" << std::endl;
  }
  ~MyVector() {
    std::cout << "MyVector destructor" << std::endl;
  }

  std::vector<int> data;
};

class A;
class B;

#ifdef use_c10
class A : public c10::intrusive_ptr_target {
#elif defined(use_boost)
class A : public intrusive_ptr_base<A> {
#elif defined(use_shared)
class A {
#endif
public:
  A() {
    // std::cout << "A constructor" << std::endl;
  }

  ~A() {
    std::cout << "A destructor" << std::endl;
  }

#ifdef use_weak
  weak_smart_ptr<B> pointer;
#else
  smart_ptr<B> pointer;
#endif
};

#ifdef use_c10
class B : public c10::intrusive_ptr_target {
#elif defined(use_boost)
class B : public intrusive_ptr_base<B> {
#elif defined(use_shared)
class B {
#endif
public:
  B() {
    // std::cout << "B constructor" << std::endl;
  }

  ~B() {
    std::cout << "B destructor" << std::endl;
  }

#ifdef use_weak
  weak_smart_ptr<A> pointer;
#else
  smart_ptr<A> pointer;
#endif
};

int main() {
  {
    // Multiple pointers to the same object
    std::cout << "Multiple smart pointer point to the same object" << std::endl;
    std::vector<int> vec({1,2,3});
    
    MyVector* raw_ptr = new MyVector(vec);
    smart_ptr<MyVector> ip, ip2;
#if defined(use_c10)
    std::cout << "Create 1st smart pointer" << std::endl;
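    // intrusive_ptr has no public constructor that takes a raw pointer or an object,
    // and make_intrusive always creates a new object,
    // so this borrows reclaim, which takes a raw pointer.
    // Note: reclaim is static and its return value is discarded here, so ip stays null.
    ip.reclaim(raw_ptr);
    // The second call below fails with the assertion quoted in the comment,
    // so reclaim cannot be used this way to make several pointers share one object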
    /*
    terminate called after throwing an instance of 'c10::Error'
      what():  owning_ptr == NullType::singleton() || owning_ptr->refcount_.load() == 0 || owning_ptr->weakcount_.load() INTERNAL ASSERT FAILED at "/root/Documents/installation/libtorch/include/c10/util/intrusive_ptr.h":471, please report a bug to PyTorch. TTarget violates the invariant that refcount > 0  =>  weakcount > 0
    Exception raised from reclaim at /xxx/libtorch/include/c10/util/intrusive_ptr.h:471 (most recent call first):
    */
    // std::cout << "Create 2nd smart pointer" << std::endl;
    // ip2.reclaim(raw_ptr);
#else
    std::cout << "Create 1st smart pointer" << std::endl;
    ip = smart_ptr<MyVector>(raw_ptr);
    std::cout << "Create 2nd smart pointer" << std::endl;
    ip2 = smart_ptr<MyVector>(raw_ptr);
#endif
    // shared_ptr: MyVector's destructor is called twice, then Segmentation fault (core dumped)
    // boost::intrusive_ptr: the destructor is called only once
  }

  std::cout << std::endl;

  {
    // Circular reference
    std::cout << "Circular reference" << std::endl;
#if defined(use_c10)
    smart_ptr<A> a_ptr = make_ptr<A>();
    smart_ptr<B> b_ptr = make_ptr<B>();
#else
    A* a_raw_ptr = new A();
    B* b_raw_ptr = new B();
    std::cout << "Create A's smart pointer" << std::endl;
    smart_ptr<A> a_ptr(a_raw_ptr);
    std::cout << "Create B's smart pointer" << std::endl;
    smart_ptr<B> b_ptr(b_raw_ptr);
#endif

#if !defined(use_boost)
    std::cout << "A ref count: " << a_ptr.use_count() << std::endl;
    std::cout << "B ref count: " << b_ptr.use_count() << std::endl;
#else
    std::cout << "A ref count: " << a_ptr->refcount() << std::endl;
    std::cout << "B ref count: " << b_ptr->refcount() << std::endl;
#endif

    std::cout << "A's smart pointer references to B" << std::endl;
    a_ptr->pointer = b_ptr;
    std::cout << "B's smart pointer references to A" << std::endl;
    b_ptr->pointer = a_ptr;

#if !defined(use_boost)
    std::cout << "A ref count: " << a_ptr.use_count() << std::endl;
    std::cout << "B ref count: " << b_ptr.use_count() << std::endl;
#else
    std::cout << "A ref count: " << a_ptr->refcount() << std::endl;
    std::cout << "B ref count: " << b_ptr->refcount() << std::endl;
#endif
    // shared_ptr, boost::intrusive_ptr: both reference counts go from 1 to 2, and no destructor is called at the end
  }
  return 0;
}

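An object pointed to by boost::intrusive_ptr must implement reference counting itself. Here boost::intrusive_ptr points to MyVector, and MyVector inherits from the intrusive_ptr_base class. Besides the ref_count member variable that holds the reference count, intrusive_ptr_base also implements intrusive_ptr_add_ref and intrusive_ptr_release, which boost::intrusive_ptr calls directly.

Build and run with the following command: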

rm -rf * && cmake -DCMAKE_PREFIX_PATH="<libtorch_installation_path>;<boost_installation_path>" .. && make && ./intrusive_ptr

Multiple pointers to the same object

std::shared_ptr

MyVector's destructor is called twice, followed by Segmentation fault (core dumped):

Multiple smart pointer point to the same object
MyVector constructor
Create 1st smart pointer
Create 2nd smart pointer
MyVector destructor
MyVector destructor
Segmentation fault (core dumped)

boost::intrusive_ptr

MyVector's destructor is called only once and the object is destructed successfully:

Multiple smart pointer point to the same object
MyVector constructor
Create 1st smart pointer
intrusive_ptr_base add ref
Create 2nd smart pointer
intrusive_ptr_base add ref
intrusive_ptr_base release
intrusive_ptr_base release
MyVector destructor
intrusive_ptr_base destructor

c10::intrusive_ptr

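The intrusive_ptr constructor that takes a raw pointer is private, and there is no constructor that takes an object; make_intrusive always creates a new object. So the demo borrows reclaim, which takes a raw pointer, to try to make several intrusive_ptr instances point to the same object.
With the second reclaim call enabled, the assertion inside reclaim fails. A likely cause, judging from the code quoted earlier: raw_ptr comes straight from new, so both counts start at 0, and the temporary returned by the first reclaim is discarded, so its destructor decrements refcount_ from 0 and leaves refcount_ non-zero while weakcount_ is still 0: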

    terminate called after throwing an instance of 'c10::Error'
      what():  owning_ptr == NullType::singleton() || owning_ptr->refcount_.load() == 0 || owning_ptr->weakcount_.load() INTERNAL ASSERT FAILED at "/root/Documents/installation/libtorch/include/c10/util/intrusive_ptr.h":471, please report a bug to PyTorch. TTarget violates the invariant that refcount > 0  =>  weakcount > 0
    Exception raised from reclaim at /xxx/libtorch/include/c10/util/intrusive_ptr.h:471 (most recent call first):

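So this attempt failed and only the first reclaim call can run. (Copying an existing intrusive_ptr, as the circular-reference test does with a_ptr->pointer = b_ptr, does let several pointers share one object.)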

Multiple smart pointer point to the same object
MyVector constructor
Create 1st smart pointer

Circular reference

std::shared_ptr

After the circular reference is set up, both reference counts go from 1 to 2, and in the end neither destructor is called:

Circular reference
Create A's smart pointer
Create B's smart pointer
A ref count: 1
B ref count: 1
A's smart pointer references to B
B's smart pointer references to A
A ref count: 2
B ref count: 2

If the members of A and B are changed to std::weak_ptr:

Circular reference
Create A's smart pointer
Create B's smart pointer
A ref count: 1
B ref count: 1
A's smart pointer references to B
B's smart pointer references to A
A ref count: 1
B ref count: 1
B destructor
A destructor

After the circular reference is set up the reference counts do not increase, and the destructors of both A and B are called when the scope is left.

boost::intrusive_ptr

Circular reference
Create A's smart pointer
intrusive_ptr_base add ref
Create B's smart pointer
intrusive_ptr_base add ref
A ref count: 1
B ref count: 1
A's smart pointer references to B
intrusive_ptr_base add ref
B's smart pointer references to A
intrusive_ptr_base add ref
A ref count: 2
B ref count: 2
intrusive_ptr_base release
intrusive_ptr_base release

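For both A and B, intrusive_ptr_add_ref is called twice, so each reference count becomes 2. At the end, however, intrusive_ptr_release is called only once for each, so both counts drop to 1 and can never reach 0, and the destructors are never called.

boost::intrusive_ptr cannot be combined with std::weak_ptr, so circular references remain a problem for boost::intrusive_ptr.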

c10::intrusive_ptr

If A and B refer to each other through c10::intrusive_ptr, the result is the same as with std::shared_ptr and the objects cannot be destructed:

Circular reference
A ref count: 1
B ref count: 1
A's smart pointer references to B
B's smart pointer references to A
A ref count: 2
B ref count: 2

Switching to c10::weak_intrusive_ptr produces the following error, because it has no default constructor:

error: no matching function for call to 'c10::weak_intrusive_ptr<B>::weak_intrusive_ptr()'

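So this attempt failed as written: with a member that has to be default-constructed, the demo could not use c10::weak_intrusive_ptr to break the cycle.

References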

【C++11新特性】 C++11智能指针之weak_ptr

boost::intrusive_ptr原理介绍

Smart Ptr 一點訣 (1)：使用 intrusive_ptr
