intrusive_ptr in PyTorch
Preface
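boost::intrusive_ptr, like std::unique_ptr and std::shared_ptr, is a smart pointer. What sets boost::intrusive_ptr apart is that, as its name says, it is an intrusive pointer: the object it points to must carry a member variable holding the reference count, so that the object counts its own references. In addition, the two functions intrusive_ptr_add_ref and intrusive_ptr_release must be implemented to modify that count; boost::intrusive_ptr calls them to increase or decrease the reference count of the object it points to.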
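With shared_ptr, a circular reference means the objects involved are never destroyed. This can be solved with weak_ptr: because weak_ptr does not increase the reference count of the object it points to, the cycle of strong references is broken and the counts can drop to zero, so the objects are not leaked. weak_ptr exists largely to solve this problem, which is why it must be used together with shared_ptr.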
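When several smart pointers should point to the same object, shared_ptr runs into trouble if each of them is constructed independently from the same raw pointer. The reference count is not stored in the object but in a control block allocated by shared_ptr, so each of those shared_ptrs gets its own control block with a count of 1. When the first one destroys the object, the second one's count is still 1, so it also tries to destroy the object, causing a double destruction. With intrusive_ptr the reference count is stored inside the object itself, so an object has exactly one count no matter how many pointers refer to it, and no double destruction occurs.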
The drawback of boost::intrusive_ptr is that it has no weak_ptr counterpart, so it cannot be used in scenarios with circular references.
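To make the single-count argument concrete, here is a minimal sketch of the intrusive idea in plain C++, with no Boost or PyTorch dependency. The names Counted, add_ref, release_ref, Handle and Widget are hypothetical and only illustrate the mechanism; add_ref and release_ref play the role of boost's intrusive_ptr_add_ref and intrusive_ptr_release. Because the count lives in the pointee, two handles built independently from the same raw pointer agree on one count, whereas two std::shared_ptr built that way would each allocate their own control block.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

// Hypothetical base class: the reference count lives inside the object itself.
struct Counted {
  std::atomic<int> refs{0};
  virtual ~Counted() = default;
};

// Hypothetical counterparts of boost's intrusive_ptr_add_ref / intrusive_ptr_release.
inline void add_ref(Counted* p) { p->refs.fetch_add(1, std::memory_order_relaxed); }
inline void release_ref(Counted* p) {
  // fetch_sub returns the old value: 1 means this was the last reference.
  if (p->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) delete p;
}

// Minimal intrusive handle: every handle built from a raw pointer bumps the
// single count stored in the pointee.
template <class T>
class Handle {
 public:
  explicit Handle(T* p = nullptr) : p_(p) { if (p_) add_ref(p_); }
  Handle(const Handle& o) : Handle(o.p_) {}
  Handle& operator=(Handle o) { std::swap(p_, o.p_); return *this; }
  ~Handle() { if (p_) release_ref(p_); }
  T* get() const { return p_; }

 private:
  T* p_;
};

// Test type that reports its destruction through a flag.
struct Widget : Counted {
  explicit Widget(bool* destroyed) : destroyed_(destroyed) {}
  ~Widget() override { *destroyed_ = true; }
  bool* destroyed_;
};
```

Wrapping one new Widget in two Handle<Widget> objects leaves its count at 2, and the object is destroyed exactly once, when the second handle goes away.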
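PyTorch has a similar c10::intrusive_ptr, and the object it points to must inherit from c10::intrusive_ptr_target. As with boost, the reference count is managed through two functions, incref and decref. In PyTorch these are free functions in the namespace c10::raw::intrusive_ptr (the class c10::intrusive_ptr itself uses private member functions such as reset_ for the same job), and they operate on refcount_, a member variable of c10::intrusive_ptr_target.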
This post collects my notes from reading the code around c10::intrusive_ptr, some real uses of c10::intrusive_ptr inside PyTorch, and finally a demo exploring how c10::intrusive_ptr differs from std::shared_ptr and boost::intrusive_ptr.
c10::intrusive_ptr_target
In PyTorch, c10::intrusive_ptr_target is a base class that provides reference counting.
Reference-count members
c10/util/intrusive_ptr.h
class C10_API intrusive_ptr_target {
// Note [Weak references for intrusive refcounting]
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Here's the scheme:
//
// - refcount == number of strong references to the object
// weakcount == number of weak references to the object,
// plus one more if refcount > 0
// An invariant: refcount > 0 => weakcount > 0
//
// - c10::StorageImpl stays live as long as there are any strong
// or weak pointers to it (weakcount > 0, since strong
// references count as a +1 to weakcount)
//
// - finalizers are called and data_ptr is deallocated when refcount == 0
//
// - Once refcount == 0, it can never again be > 0 (the transition
// from > 0 to == 0 is monotonic)
//
// - When you access c10::StorageImpl via a weak pointer, you must
// atomically increment the use count, if it is greater than 0.
// If it is not, you must report that the storage is dead.
//
mutable std::atomic<size_t> refcount_;
mutable std::atomic<size_t> weakcount_;
};
We already know that the object pointed to by boost::intrusive_ptr must own a reference-count member, so here too there is a refcount_ member. Where, then, does weakcount_ come from?
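Because boost::intrusive_ptr cannot destroy objects caught in a circular reference, PyTorch avoids this situation by letting intrusive_ptr_target support both strong references (c10::intrusive_ptr) and weak references (c10::weak_intrusive_ptr, the counterpart of weak_ptr). That is why, besides refcount_, it also has a weakcount_ member.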
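Note: refcount_ and weakcount_ are private member variables. The mutable qualifier in front of them means that even if we declare a const intrusive_ptr_target object, these two members can still be modified; it also allows const member functions of intrusive_ptr_target to modify them.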
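A tiny sketch of what mutable buys here, assuming a hypothetical class Target that keeps only a refcount_ counter: a const member function, called on a const object, can still bump the counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical class mimicking how intrusive_ptr_target marks its counters mutable.
class Target {
 public:
  // Allowed in a const member function only because refcount_ is mutable.
  void bump() const { refcount_.fetch_add(1, std::memory_order_relaxed); }
  std::size_t count() const { return refcount_.load(std::memory_order_relaxed); }

 private:
  mutable std::atomic<std::size_t> refcount_{0};
};
```

Removing mutable makes bump() fail to compile, since fetch_add is not a const operation on std::atomic.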
constructors
constexpr intrusive_ptr_target() noexcept : refcount_(0), weakcount_(0) {}
// intrusive_ptr_target supports copy and move: but refcount and weakcount
// don't participate (since they are intrinsic properties of the memory
// location)
intrusive_ptr_target(intrusive_ptr_target&& /*other*/) noexcept
: intrusive_ptr_target() {}
intrusive_ptr_target& operator=(intrusive_ptr_target&& /*other*/) noexcept {
return *this;
}
intrusive_ptr_target(const intrusive_ptr_target& /*other*/) noexcept
: intrusive_ptr_target() {}
intrusive_ptr_target& operator=(
const intrusive_ptr_target& /*other*/) noexcept {
return *this;
}
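- default constructor: sets both refcount_ and weakcount_ to 0
- copy constructor intrusive_ptr_target(const intrusive_ptr_target& /*other*/): ignores its argument and delegates to the default constructor, so refcount_ and weakcount_ are both 0
- move constructor intrusive_ptr_target(intrusive_ptr_target&& /*other*/): likewise ignores its argument and delegates to the default constructor, so both counts are 0
- copy and move assignment: simply return *this, leaving the counts of the assigned-to object untouched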
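Note: when an object is copied or moved, the newly created object is a different object from the argument (in the words of the comment, the counts are intrinsic properties of the memory location), so its refcount_ and weakcount_ start from zero.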
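The copy/move behavior can be sketched with a hypothetical RefCountedBase that mirrors the constructors and assignment operators above (only a refcount_ is modeled; weakcount_ is left out for brevity). The payload of a derived class is copied, but every new object starts with its own count at 0, and assignment leaves the target's count alone.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical base mirroring intrusive_ptr_target: the counter belongs to the
// memory location, so copies and moves never transfer it.
class RefCountedBase {
 public:
  RefCountedBase() noexcept : refcount_(0) {}
  RefCountedBase(const RefCountedBase&) noexcept : RefCountedBase() {}
  RefCountedBase(RefCountedBase&&) noexcept : RefCountedBase() {}
  RefCountedBase& operator=(const RefCountedBase&) noexcept { return *this; }
  RefCountedBase& operator=(RefCountedBase&&) noexcept { return *this; }

  void incref() { refcount_.fetch_add(1, std::memory_order_relaxed); }
  std::size_t use_count() const { return refcount_.load(std::memory_order_relaxed); }

 private:
  std::atomic<std::size_t> refcount_;
};

// Hypothetical derived class with some payload that *is* copied.
struct Payload : RefCountedBase {
  int value = 0;
};
```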
destructor
protected:
// protected destructor. We never want to destruct intrusive_ptr_target*
// directly.
virtual ~intrusive_ptr_target() {
// Disable -Wterminate and -Wexceptions so we're allowed to use assertions
// (i.e. throw exceptions) in a destructor.
// We also have to disable -Wunknown-warning-option and -Wpragmas, because
// some other compilers don't know about -Wterminate or -Wexceptions and
// will show a warning about unknown warning options otherwise.
#if defined(_MSC_VER) && !defined(__clang__)
#pragma warning(push)
#pragma warning( \
disable : 4297) // function assumed not to throw an exception but does
#else
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpragmas"
#pragma GCC diagnostic ignored "-Wunknown-warning-option"
#pragma GCC diagnostic ignored "-Wterminate"
#pragma GCC diagnostic ignored "-Wexceptions"
#endif
TORCH_INTERNAL_ASSERT_DEBUG_ONLY(
// Second condition is there to accommodate
// unsafe_adapt_non_heap_allocated: since we are doing our own
// deallocation in that case, it is correct for each
// expected_decref to have happened (some user code tried to
// decref and thus free the object, but it didn't happen right
// away) or not (no user code tried to free the object, and
// now it's getting destroyed through whatever mechanism the
// caller of unsafe_adapt_non_heap_allocated wanted to
// use). We choose our reference count such that the count
// will not dip below INT_MAX regardless.
refcount_.load() == 0 || refcount_.load() >= INT_MAX,
"Tried to destruct an intrusive_ptr_target that still has intrusive_ptr to it; refcount was ",
refcount_.load());
TORCH_INTERNAL_ASSERT_DEBUG_ONLY(
// See ~intrusive_ptr for optimization that will frequently result in 1
// at destruction time.
weakcount_.load() == 1 || weakcount_.load() == 0 ||
weakcount_.load() == INT_MAX - 1 || weakcount_.load() == INT_MAX,
"Tried to destruct an intrusive_ptr_target that still has weak_intrusive_ptr to it");
#if defined(_MSC_VER) && !defined(__clang__)
#pragma warning(pop)
#else
#pragma GCC diagnostic pop
#endif
}
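The destructor's debug-only assertions (TORCH_INTERNAL_ASSERT_DEBUG_ONLY) require that, by the time it runs, refcount_ is 0 and weakcount_ is 1 or 0 (the values near INT_MAX only apply to objects adapted with unsafe_adapt_non_heap_allocated). We will see in the destructor of c10::intrusive_ptr why weakcount_ may still be 1 when the object is destroyed.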
release_resources
class C10_API intrusive_ptr_target {
private:
/**
* This is called when refcount reaches zero.
* You can override this to release expensive resources.
* There might still be weak references, so your object might not get
* destructed yet, but you can assume the object isn't used anymore,
* i.e. no more calls to methods or accesses to members (we just can't
* destruct it yet because we need the weakcount accessible).
*
* If there are no weak references (i.e. your class is about to be
* destructed), this function WILL NOT be called.
*/
virtual void release_resources() {}
};
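release_resources releases resources. It is a virtual function with an empty default implementation, and subclasses of intrusive_ptr_target override it with their own logic. One such subclass is TensorImpl, whose release_resources is implemented as follows.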
TensorImpl::release_resources
torch/include/c10/core/TensorImpl.h
Before looking at release_resources, let's take a quick look at TensorImpl's member variables; they are the "resources" that release_resources will release:
struct C10_API TensorImpl : public c10::intrusive_ptr_target {
// ...
protected:
Storage storage_;
private:
// ...
std::unique_ptr<c10::AutogradMetaInterface> autograd_meta_ = nullptr;
// ...
protected:
// ...
impl::PyObjectSlot pyobj_slot_;
// ...
};
TensorImpl::release_resources is as follows:
c10/core/TensorImpl.cpp
void TensorImpl::release_resources() {
autograd_meta_.reset();
if (storage_) {
storage_ = {};
}
pyobj_slot_.destroy_pyobj_if_needed();
}
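In other words, release_resources empties the members: it destroys autograd_meta_, resets storage_ (which drops this TensorImpl's reference to its StorageImpl), and destroys the associated Python object if needed.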
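A simplified sketch of the release_resources pattern, assuming hypothetical classes TargetBase and FakeTensorImpl (not PyTorch types): the base provides an empty virtual hook, and the subclass drops its expensive member early while the object itself stays alive. In PyTorch the hook is private and invoked by intrusive_ptr; here it is public so it can be called directly.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical base: an empty virtual hook, like intrusive_ptr_target::release_resources.
class TargetBase {
 public:
  virtual ~TargetBase() = default;
  virtual void release_resources() {}
};

// Hypothetical subclass that frees its heavy member early, in the spirit of
// TensorImpl::release_resources; the object itself remains valid memory.
class FakeTensorImpl : public TargetBase {
 public:
  FakeTensorImpl() : storage_(std::make_unique<std::vector<float>>(1024, 0.0f)) {}
  void release_resources() override { storage_.reset(); }
  bool has_storage() const { return storage_ != nullptr; }

 private:
  std::unique_ptr<std::vector<float>> storage_;
};
```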
friend classes/functions
class C10_API intrusive_ptr_target {
// ...
template <typename T, typename NullType>
friend class intrusive_ptr;
friend inline void raw::intrusive_ptr::incref(intrusive_ptr_target* self);
template <typename T, typename NullType>
friend class weak_intrusive_ptr;
friend inline void raw::weak_intrusive_ptr::incref(
intrusive_ptr_target* self);
template <typename T>
friend struct ExclusivelyOwnedTensorTraits;
};
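Here intrusive_ptr and weak_intrusive_ptr are declared friend classes, and raw::intrusive_ptr::incref and raw::weak_intrusive_ptr::incref friend functions. As we will see shortly, that is why incref can freely access the private member refcount_.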
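For friend functions, see section "2. Member Function of Another Class as Friend Function" of Friend Class and Function in C++. (Here raw::intrusive_ptr is a namespace, not a class, so incref is a namespace-scope function befriended through its qualified name.)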
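The access rules can be reproduced in a small standalone sketch. Target, Ptr and raw::incref are hypothetical names modeled on intrusive_ptr_target, intrusive_ptr and raw::intrusive_ptr::incref: the counter is private, and only the befriended class template and the befriended free function touch it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

class Target;
namespace raw {
inline void incref(Target* self);  // must be declared before it can be befriended
}
template <class T>
class Ptr;

// Hypothetical class whose counter is private; only the listed friends may touch it.
class Target {
 private:
  std::atomic<std::size_t> refcount_{0};

  template <class T>
  friend class Ptr;                       // every Ptr<T> specialization is a friend
  friend void raw::incref(Target* self);  // one specific free function is a friend
};

namespace raw {
inline void incref(Target* self) {
  if (self) self->refcount_.fetch_add(1, std::memory_order_relaxed);  // OK: friend
}
}  // namespace raw

template <class T>
class Ptr {
 public:
  static std::size_t use_count(const T& t) { return t.refcount_.load(); }  // OK: friend class
};
```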
c10::intrusive_ptr
target_
template <
class TTarget,
class NullType = detail::intrusive_target_default_null_type<TTarget>>
class intrusive_ptr final {
private:
// the following static assert would be nice to have but it requires
// the target class T to be fully defined when intrusive_ptr<T> is instantiated
// this is a problem for classes that contain pointers to themselves
// static_assert(
// std::is_base_of<intrusive_ptr_target, TTarget>::value,
// "intrusive_ptr can only be used for classes that inherit from
// intrusive_ptr_target.");
#ifndef _WIN32
// This static_assert triggers on MSVC
// error C2131: expression did not evaluate to a constant
static_assert(
NullType::singleton() == NullType::singleton(),
"NullType must have a constexpr singleton() method");
#endif
static_assert(
std::is_base_of<
TTarget,
typename std::remove_pointer<decltype(NullType::singleton())>::type>::
value,
"NullType::singleton() must return a element_type* pointer");
TTarget* target_;
// ...
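c10::intrusive_ptr is a class template, and target_ is of type TTarget*, a pointer to the template parameter TTarget. target_ is a private member of intrusive_ptr and is the raw pointer that intrusive_ptr manages. The comment states that TTarget must inherit from intrusive_ptr_target; the static_assert that would check this is commented out, because it would require TTarget to be fully defined whenever intrusive_ptr<TTarget> is instantiated.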
incref
c10::raw::intrusive_ptr::incref increments the reference count of an object managed by c10::intrusive_ptr, given a raw pointer to it.
incref
namespace c10 {
// ...
namespace raw {
namespace intrusive_ptr {
// WARNING: Unlike the reclaim() API, it is NOT valid to pass
// NullType::singleton to this function
inline void incref(intrusive_ptr_target* self) {
if (self) {
detail::atomic_refcount_increment(self->refcount_);
}
}
// ...
} // namespace intrusive_ptr
// ...
} // namespace raw
} // namespace c10
Because intrusive_ptr_target declares c10::raw::intrusive_ptr::incref a friend function, c10::raw::intrusive_ptr::incref can access refcount_, a private member of intrusive_ptr_target.
detail::atomic_refcount_increment
incref calls detail::atomic_refcount_increment:
// Increment needs to be acquire-release to make use_count() and
// unique() reliable.
inline size_t atomic_refcount_increment(std::atomic<size_t>& refcount) {
return refcount.fetch_add(1, std::memory_order_acq_rel) + 1;
}
Atomically replaces the current value with the result of arithmetic addition of the value and arg. That is, it performs atomic post-increment. The operation is a read-modify-write operation. Memory is affected according to the value of order.
Simply put, fetch_add is the atomic version of i++.
atomic_refcount_increment returns the return value of fetch_add plus 1. So what does fetch_add return?
Return value
The value immediately preceding the effects of this function in the modification order of *this.
Because fetch_add returns the value the variable held before this operation, adding 1 to it gives the value after the fetch_add, which is what atomic_refcount_increment returns.
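A quick check of this arithmetic, reusing the same one-line helper shown above in a standalone file (std::atomic only, no PyTorch):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Same shape as detail::atomic_refcount_increment: fetch_add yields the old
// value, so adding 1 reports the value after the increment.
inline std::size_t atomic_refcount_increment(std::atomic<std::size_t>& refcount) {
  return refcount.fetch_add(1, std::memory_order_acq_rel) + 1;
}
```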
decref
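decref is the function used to decrease the reference count of the object managed by a c10::intrusive_ptr.
Three functions are involved: release, decref and reclaim. decref calls reclaim, and the pointer passed to reclaim must have been produced by release, so we start with release.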
release
template <
class TTarget,
class NullType = detail::intrusive_target_default_null_type<TTarget>>
class intrusive_ptr final {
//...
public:
//...
/**
* Returns an owning (!) pointer to the underlying object and makes the
* intrusive_ptr instance invalid. That means the refcount is not decreased.
* You *must* put the returned pointer back into a intrusive_ptr using
* intrusive_ptr::reclaim(ptr) to properly destruct it.
* This is helpful for C APIs.
*/
TTarget* release() noexcept {
// NOLINTNEXTLINE(clang-analyzer-core.uninitialized.Assign)
TTarget* result = target_;
target_ = NullType::singleton();
return result;
}
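First, the raw pointer to the underlying object, target_, is saved in result, which is returned at the end of the function.
Then target_ is set to null (NullType::singleton()), invalidating the intrusive_ptr. Note that the reference count is not decremented here and the underlying object is not destroyed.
As the comment says, the raw pointer result returned by this function must be passed to reclaim so that the object can be destroyed properly.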
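The release contract can be sketched with a hypothetical single-type handle (Obj and Handle are illustrative names, and the counting is simplified). After release() the handle is empty and its destructor does nothing, while the reference it owned now belongs to whoever holds the returned raw pointer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical target; a freshly created object is owned by one handle.
struct Obj {
  std::atomic<std::size_t> refcount{1};
};

// Hypothetical owning handle with a release() modeled on intrusive_ptr::release().
class Handle {
 public:
  explicit Handle(Obj* p) noexcept : target_(p) {}  // adopts one reference
  Handle(const Handle&) = delete;
  Handle& operator=(const Handle&) = delete;
  ~Handle() {
    if (target_ && target_->refcount.fetch_sub(1, std::memory_order_acq_rel) == 1)
      delete target_;
  }
  // Hand the owned reference to the caller: the count is NOT decremented.
  Obj* release() noexcept {
    Obj* result = target_;
    target_ = nullptr;  // c10 uses NullType::singleton() here
    return result;
  }
  Obj* get() const noexcept { return target_; }

 private:
  Obj* target_;
};
```

In c10 the returned pointer would go back through reclaim(); in this sketch the caller simply deletes it once it is done.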
decref
namespace c10 {
// ...
namespace raw {
namespace intrusive_ptr {
// WARNING: Unlike the reclaim() API, it is NOT valid to pass
// NullType::singleton to this function
inline void decref(intrusive_ptr_target* self) {
// Let it die
c10::intrusive_ptr<intrusive_ptr_target>::reclaim(self);
// NB: Caller still has 'self' pointer, but it's now invalid.
// If you want more safety, used the actual c10::intrusive_ptr class
}
// ...
} // namespace intrusive_ptr
// ...
} // namespace raw
} // namespace c10
It is merely a wrapper around reclaim.
reclaim
/**
* Takes an owning pointer to TTarget* and creates an intrusive_ptr that takes
* over ownership. That means the refcount is not increased.
* This is the counter-part to intrusive_ptr::release() and the pointer
* passed in *must* have been created using intrusive_ptr::release().
*/
static intrusive_ptr reclaim(TTarget* owning_ptr) {
TORCH_INTERNAL_ASSERT_DEBUG_ONLY(
owning_ptr == NullType::singleton() ||
owning_ptr->refcount_.load() == 0 || owning_ptr->weakcount_.load(),
"TTarget violates the invariant that refcount > 0 => weakcount > 0");
return intrusive_ptr(owning_ptr, raw::DontIncreaseRefcount{});
}
As mentioned above, the owning_ptr passed to reclaim must have been created by release.
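This function creates an intrusive_ptr that takes over ownership of owning_ptr without incrementing its reference count: the strong reference that owning_ptr carried is simply transferred to the new intrusive_ptr, and reclaim returns it to the caller.
In decref, however, that return value is discarded, so the temporary intrusive_ptr is destroyed at the end of the statement, and its destructor ~intrusive_ptr is what actually decrements the reference count.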
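The interplay between reclaim and decref can be sketched as follows. Obj, Ptr and the global destroyed_count are hypothetical; the destructor applies the same decrement-then-delete-at-zero rule as reset_, but ignores weak references.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

int destroyed_count = 0;  // lets the example observe destruction

struct Obj {
  std::atomic<std::size_t> refcount{0};
  ~Obj() { ++destroyed_count; }
};

struct DontIncreaseRefcount {};  // tag, as in raw::DontIncreaseRefcount

class Ptr {
 public:
  Ptr(Obj* p, DontIncreaseRefcount) noexcept : target_(p) {}
  Ptr(Ptr&& o) noexcept : target_(o.target_) { o.target_ = nullptr; }
  ~Ptr() {
    // Same test as reset_(): decrement, delete when the new value is 0.
    if (target_ && target_->refcount.fetch_sub(1, std::memory_order_acq_rel) - 1 == 0)
      delete target_;
  }
  // Adopts the reference owned by owning_ptr; the count is NOT increased.
  static Ptr reclaim(Obj* owning_ptr) { return Ptr(owning_ptr, DontIncreaseRefcount{}); }
  Obj* get() const noexcept { return target_; }

 private:
  Obj* target_;
};

// Like raw::intrusive_ptr::decref: the Ptr returned by reclaim() is a discarded
// temporary, destroyed at the end of this statement, and its destructor decrements.
inline void decref(Obj* self) { Ptr::reclaim(self); }
```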
Next, let's look more closely at intrusive_ptr's constructors and destructor.
constructor
Constructor that does not increase the reference count
intrusive_ptr has eight constructors. The following one only sets target_ and does not touch the reference count:
public:
// This constructor will not increase the ref counter for you.
// We use the tagged dispatch mechanism to explicitly mark this constructor
// to not increase the refcount
explicit intrusive_ptr(TTarget* target, raw::DontIncreaseRefcount) noexcept
: target_(target) {}
raw::DontIncreaseRefcount, shown below, is just an empty struct:
namespace c10 {
// ...
namespace raw {
// ...
// constructor tag used by intrusive_ptr constructors
struct DontIncreaseRefcount {};
} // namespace raw
// ...
} // namespace c10
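Its existence gives the intrusive_ptr constructor above a signature distinct from the other constructors (tag dispatch), so a call can be written like intrusive_ptr(NullType::singleton(), raw::DontIncreaseRefcount{}), which is very readable.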
The private constructor
The following constructor takes only target. It first delegates to the raw::DontIncreaseRefcount version and then sets the reference-count members itself.
Note that it is private.
private:
// This constructor will increase the ref counter for you.
// This constructor will be used by the make_intrusive(), and also pybind11,
// which wrap the intrusive_ptr holder around the raw pointer and incref
// correspondingly (pybind11 requires raw pointer constructor to incref by
// default).
explicit intrusive_ptr(TTarget* target)
: intrusive_ptr(target, raw::DontIncreaseRefcount{}) {
if (target_ != NullType::singleton()) {
// We just created result.target_, so we know no other thread has
// access to it, so we know we needn't care about memory ordering.
// (On x86_64, a store with memory_order_relaxed generates a plain old
// `mov`, whereas an atomic increment does a lock-prefixed `add`, which is
// much more expensive: https://godbolt.org/z/eKPzj8.)
TORCH_INTERNAL_ASSERT_DEBUG_ONLY(
target_->refcount_ == 0 && target_->weakcount_ == 0,
"intrusive_ptr: Newly-created target had non-zero refcounts. Does its "
"constructor do something strange like incref or create an "
"intrusive_ptr from `this`?");
target_->refcount_.store(1, std::memory_order_relaxed);
target_->weakcount_.store(1, std::memory_order_relaxed);
}
}
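refcount_ is set to 1 because exactly one intrusive_ptr points to the object (this intrusive_ptr itself).
weakcount_ is set to 1 because of a convention PyTorch adopts: while refcount_ is greater than 0, weakcount_ is one larger than the actual number of weak references.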
// - refcount == number of strong references to the object
// weakcount == number of weak references to the object,
// plus one more if refcount > 0
// An invariant: refcount > 0 => weakcount > 0
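The two constructors can be sketched together in a hypothetical, simplified Ptr class whose destructor ignores weak references. The public constructor is selected by the empty tag type and only stores the pointer; the private one delegates to it and then establishes refcount = 1 and weakcount = 1. adopt_new is a made-up static member standing in for make().

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical target with the two counters of intrusive_ptr_target.
struct Target {
  std::atomic<std::size_t> refcount{0};
  std::atomic<std::size_t> weakcount{0};
};

// Empty tag type, playing the role of raw::DontIncreaseRefcount.
struct DontIncreaseRefcount {};

class Ptr {
 public:
  // Tagged constructor: only stores the pointer, counts are left alone.
  explicit Ptr(Target* t, DontIncreaseRefcount) noexcept : target_(t) {}
  Ptr(Ptr&& o) noexcept : target_(o.target_) { o.target_ = nullptr; }
  ~Ptr() {
    // Simplified: weak references are ignored here.
    if (target_ && target_->refcount.fetch_sub(1, std::memory_order_acq_rel) == 1)
      delete target_;
  }
  // Made-up stand-in for make(): as a member it may use the private constructor.
  static Ptr adopt_new(Target* fresh) { return Ptr(fresh); }
  Target* get() const noexcept { return target_; }

 private:
  // Untagged constructor: delegate, then set refcount = 1 (this Ptr)
  // and weakcount = 1 (the "+1 while refcount > 0" convention).
  explicit Ptr(Target* t) : Ptr(t, DontIncreaseRefcount{}) {
    if (target_) {
      assert(target_->refcount.load() == 0 && target_->weakcount.load() == 0);
      target_->refcount.store(1, std::memory_order_relaxed);
      target_->weakcount.store(1, std::memory_order_relaxed);
    }
  }
  Target* target_;
};
```

Relaxed stores are enough in that constructor for the same reason the PyTorch comment gives: the object was just created, so no other thread can see it yet.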
destructor
destructor
~intrusive_ptr() noexcept {
reset_();
}
The destructor calls reset_.
reset_
private:
// ...
void reset_() noexcept {
if (target_ != NullType::singleton() &&
detail::atomic_refcount_decrement(target_->refcount_) == 0) {
// See comment above about weakcount. As long as refcount>0,
// weakcount is one larger than the actual number of weak references.
// So we need to decrement it here.
bool should_delete =
target_->weakcount_.load(std::memory_order_acquire) == 1;
if (!should_delete) {
// justification for const_cast: release_resources is basically a
// destructor and a destructor always mutates the object, even for const
// objects. NOLINTNEXTLINE(cppcoreguidelines-pro-type-const-cast)
const_cast<std::remove_const_t<TTarget>*>(target_)->release_resources();
should_delete =
detail::atomic_weakcount_decrement(target_->weakcount_) == 0;
}
if (should_delete) {
delete target_;
}
}
}
First, refcount_ is decremented. Only if it reaches 0, meaning no strong references remain, does the rest happen.
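Keep in mind that while refcount_ is greater than 0, weakcount_ is one larger than its real value. So checking whether weakcount_ is 1 really checks whether the weak count is 0; if so, should_delete is set to true, meaning the underlying object has reached the end of its life.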
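If the weak count is still greater than 0 (should_delete is false), release_resources is called to free the resources held by the underlying object, but the object itself, and thus target_, is kept. Then weakcount_ is decremented; after that, weakcount_ equals the actual weak count. If it is 0, should_delete is set to true.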
If should_delete is true (i.e., the weak count is 0), delete target_ finally destroys the underlying object.
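To sum up, looking only at refcount_, reset_ decrements it by one, and if refcount_ becomes 0, the object it points to is destroyed (immediately if there are no weak references, otherwise once the last weak reference goes away).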
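One question here: when refcount_ reaches 0 while weakcount_ is greater than 1, the resources held by the object are released but target_ is kept. Why can the resources be released first without breaking later accesses to the object? The Note quoted at the top of intrusive_ptr_target answers this: once refcount_ is 0 it can never become greater than 0 again, and an access through a weak pointer must first atomically increment the use count, which only succeeds if it is greater than 0, and otherwise report the object as dead. So after release_resources nobody can legitimately use the object anymore; its memory survives only so that weakcount_ remains accessible.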
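Another question: decref, reclaim, ~intrusive_ptr and reset_ are not friend functions of intrusive_ptr_target, so how can they modify its private member refcount_? The friend declarations shown earlier explain it: intrusive_ptr_target declares every intrusive_ptr<T, NullType> specialization a friend class, so all member functions of intrusive_ptr, including reclaim, reset_ and the destructor, may access refcount_. decref itself never touches refcount_; it only calls reclaim.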
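The decision sequence in reset_ can be replayed in a standalone sketch. Target, strong_reset, weak_reset and try_lock are hypothetical names, and this is a simplified model rather than PyTorch's implementation. The counters follow the convention described above (weakcount is one larger than the number of weak references while refcount > 0), and try_lock follows the rule from the Note at the top of intrusive_ptr_target: increment refcount only while it is still greater than 0.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical target that reports release_resources() and destruction via flags.
struct Target {
  std::atomic<std::size_t> refcount{1};   // as right after make(): one strong ref
  std::atomic<std::size_t> weakcount{1};  // weak refs + 1 while refcount > 0
  bool* released;
  bool* deleted;
  Target(bool* r, bool* d) : released(r), deleted(d) {}
  ~Target() { *deleted = true; }
  void release_resources() { *released = true; }
};

// Dropping a strong reference, following the same steps as reset_().
inline void strong_reset(Target* t) {
  if (t->refcount.fetch_sub(1, std::memory_order_acq_rel) - 1 != 0) return;
  // weakcount == 1 means "no real weak references".
  bool should_delete = t->weakcount.load(std::memory_order_acquire) == 1;
  if (!should_delete) {
    t->release_resources();  // memory stays, only resources go
    should_delete = t->weakcount.fetch_sub(1, std::memory_order_acq_rel) - 1 == 0;
  }
  if (should_delete) delete t;
}

// Dropping a weak reference (simplified): the last one frees the memory.
inline void weak_reset(Target* t) {
  if (t->weakcount.fetch_sub(1, std::memory_order_acq_rel) - 1 == 0) delete t;
}

// Upgrading a weak reference: increment refcount only while it is still > 0.
inline bool try_lock(Target* t) {
  std::size_t n = t->refcount.load(std::memory_order_relaxed);
  while (n != 0) {
    if (t->refcount.compare_exchange_weak(n, n + 1, std::memory_order_acq_rel)) return true;
  }
  return false;
}
```

With a weak reference outstanding, dropping the last strong reference releases the resources but keeps the memory, and the object can no longer be revived; without weak references, should_delete is already true and release_resources is skipped entirely.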
detail::atomic_refcount_decrement
reset_ calls detail::atomic_refcount_decrement:
// Both decrements need to be acquire-release for correctness. See
// e.g. std::shared_ptr implementation.
inline size_t atomic_refcount_decrement(std::atomic<size_t>& refcount) {
return refcount.fetch_sub(1, std::memory_order_acq_rel) - 1;
}
Atomically replaces the current value with the result of arithmetic subtraction of the value and arg. That is, it performs atomic post-decrement. The operation is read-modify-write operation. Memory is affected according to the value of order.
Simply put, fetch_sub is the atomic version of i--.
atomic_refcount_decrement returns the return value of fetch_sub minus 1. So what does fetch_sub return?
Return value
The value immediately preceding the effects of this function in the modification order of *this.
Because fetch_sub returns the value the variable held before this operation, subtracting 1 from it gives the value after the fetch_sub, which is what atomic_refcount_decrement returns.
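The same check for the decrement helper, again as a standalone copy. A return value of 0 tells the caller that it dropped the last reference. Because the counter is an unsigned size_t, decrementing a count that is already 0 wraps around to the maximum value instead of going negative, which the test also shows.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Same shape as detail::atomic_refcount_decrement: fetch_sub yields the old
// value, so subtracting 1 reports the value after the decrement.
inline std::size_t atomic_refcount_decrement(std::atomic<std::size_t>& refcount) {
  return refcount.fetch_sub(1, std::memory_order_acq_rel) - 1;
}
```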
detail::atomic_weakcount_decrement
reset_ also calls detail::atomic_weakcount_decrement:
inline size_t atomic_weakcount_decrement(std::atomic<size_t>& weakcount) {
return weakcount.fetch_sub(1, std::memory_order_acq_rel) - 1;
}
It works like detail::atomic_refcount_decrement, except that it operates on weakcount_.
Use case 1 - c10::make_intrusive
at::detail::_empty_generic
aten/src/ATen/EmptyTensor.cpp
First, let's see how intrusive_ptr is used in the function at::detail::_empty_generic in aten/src/ATen/EmptyTensor.cpp:
auto storage_impl = c10::make_intrusive<StorageImpl>(
c10::StorageImpl::use_byte_size_t(),
size_bytes,
allocator,
/*resizeable=*/true);
It calls c10::make_intrusive with StorageImpl as the template argument and passes four arguments, which are the ones StorageImpl's constructor needs.
c10::make_intrusive
c10/util/intrusive_ptr.h
template <
class TTarget,
class NullType = detail::intrusive_target_default_null_type<TTarget>,
class... Args>
inline intrusive_ptr<TTarget, NullType> make_intrusive(Args&&... args) {
return intrusive_ptr<TTarget, NullType>::make(std::forward<Args>(args)...);
}
The template parameter TTarget is StorageImpl, and four arguments of types use_byte_size_t, SymInt (size_bytes), at::Allocator* and bool are passed in; they are forwarded in turn to c10::intrusive_ptr::make.
c10::intrusive_ptr::make
c10/util/intrusive_ptr.h
/**
* Allocate a heap object with args and wrap it inside a intrusive_ptr and
* incref. This is a helper function to let make_intrusive() access private
* intrusive_ptr constructors.
*/
template <class... Args>
static intrusive_ptr make(Args&&... args) {
return intrusive_ptr(new TTarget(std::forward<Args>(args)...));
}
TTarget is StorageImpl, and here the four arguments of types use_byte_size_t, SymInt (size_bytes), at::Allocator* and bool are forwarded to TTarget's constructor.
After new TTarget yields a pointer to a StorageImpl object, the private intrusive_ptr constructor is called. Because make is itself a member function of c10::intrusive_ptr, it can call the private constructor freely.
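The forwarding chain make_intrusive -> make -> private constructor can be sketched with the hypothetical names Ptr, make_ptr and FakeStorageImpl. To keep the focus on perfect forwarding and access control, this Ptr is a single-owner handle with no reference counting at all.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

template <class T>
class Ptr {
 public:
  // Static member, so it may call the private constructor below.
  template <class... Args>
  static Ptr make(Args&&... args) {
    return Ptr(new T(std::forward<Args>(args)...));
  }
  Ptr(Ptr&& o) noexcept : target_(o.target_) { o.target_ = nullptr; }
  ~Ptr() { delete target_; }  // simplified: single owner, no reference counting
  T* operator->() const noexcept { return target_; }

 private:
  explicit Ptr(T* t) noexcept : target_(t) {}  // reachable only through make()
  T* target_;
};

// Free function forwarding to the static member, like c10::make_intrusive.
template <class T, class... Args>
Ptr<T> make_ptr(Args&&... args) {
  return Ptr<T>::make(std::forward<Args>(args)...);
}

// Hypothetical stand-in for StorageImpl with a two-argument constructor.
struct FakeStorageImpl {
  FakeStorageImpl(std::size_t bytes, bool resizable_flag)
      : size_bytes(bytes), resizable(resizable_flag) {}
  std::size_t size_bytes;
  bool resizable;
};
```

Outside the class, Ptr<FakeStorageImpl>(raw) does not compile because that constructor is private; make_ptr works because it goes through the public static make.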
c10::intrusive_ptr::intrusive_ptr
Looking back at the definition of c10::intrusive_ptr, there is a requirement (stated in the comment above target_): TTarget must inherit from intrusive_ptr_target.
c10::StorageImpl
Let's check in c10/core/StorageImpl.h whether StorageImpl satisfies this requirement:
struct C10_API StorageImpl : public c10::intrusive_ptr_target {
//...
};
Use case 2 - c10::Storage
StorageImpl inherits from c10::intrusive_ptr_target, so c10::intrusive_ptr can be used with StorageImpl.
Likewise, TensorImpl also inherits from c10::intrusive_ptr_target, and TensorBase accesses its TensorImpl object through the member variable c10::intrusive_ptr<TensorImpl, UndefinedTensorImpl> impl_;.
Here we start from c10::StorageImpl:
c10::StorageImpl
c10/core/StorageImpl.h
c10::StorageImpl is a concrete class that inherits from c10::intrusive_ptr_target:
struct C10_API StorageImpl : public c10::intrusive_ptr_target {
//...
};
Next, let's see how c10::intrusive_ptr_target is used together with c10::intrusive_ptr.
c10::Storage constructor
c10/core/Storage.h (installed as torch/include/c10/core/Storage.h)
struct C10_API Storage {
// ...
Storage(c10::intrusive_ptr<StorageImpl> ptr)
: storage_impl_(std::move(ptr)) {}
// ...
protected:
c10::intrusive_ptr<StorageImpl> storage_impl_;
};
The Storage class has a member variable storage_impl_, a StorageImpl wrapped in intrusive_ptr. Recall that StorageImpl is a subclass of c10::intrusive_ptr_target, which confirms the rule stated earlier that intrusive_ptr must be used with intrusive_ptr_target.
Note that std::move is used to initialize storage_impl_, which means the move constructor of c10::intrusive_ptr is called.
c10::intrusive_ptr move constructor
c10/util/intrusive_ptr.h
The move constructor of c10::intrusive_ptr is shown below. It takes over rhs's target_ and then sets rhs.target_ to null:
intrusive_ptr(intrusive_ptr&& rhs) noexcept : target_(rhs.target_) {
rhs.target_ = NullType::singleton();
}
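The following move constructor accepts an rhs of a different type and adds a type check: it only compiles if From* (a pointer to the template parameter From) can be converted to TTarget* (the type of target_):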
template <class From, class FromNullType>
/* implicit */ intrusive_ptr(intrusive_ptr<From, FromNullType>&& rhs) noexcept
: target_(
detail::assign_ptr_<TTarget, NullType, FromNullType>(rhs.target_)) {
static_assert(
std::is_convertible<From*, TTarget*>::value,
"Type mismatch. intrusive_ptr move constructor got pointer of wrong type.");
rhs.target_ = FromNullType::singleton();
}
Why use the move constructor at all?
According to "Why would I std::move an std::shared_ptr?":
std::shared_ptr reference count is atomic. increasing or decreasing the reference count requires atomic increment or decrement. This is hundred times slower than non-atomic increment/decrement, not to mention that if we increment and decrement the same counter we wind up with the exact number, wasting a ton of time and resources in the process.
By moving the shared_ptr instead of copying it, we "steal" the atomic reference count and we nullify the other shared_ptr. "stealing" the reference count is not atomic, and it is hundred times faster than copying the shared_ptr (and causing atomic reference increment or decrement).
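With the copy constructor, the smart pointer's reference count has to be incremented and later decremented atomically, which is comparatively expensive; the move constructor avoids these atomic operations entirely and saves that time.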
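To see the difference, here is a sketch with a hypothetical handle that counts its atomic read-modify-write operations in a global counter (g_atomic_ops is instrumentation added only for this illustration). Holder is a made-up stand-in for Storage that takes the handle by value and moves it into its member, as Storage's constructor does.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <utility>

// Instrumentation for this illustration only: counts atomic read-modify-write ops.
static std::size_t g_atomic_ops = 0;

struct Target {
  std::atomic<std::size_t> refcount{1};  // starts owned by exactly one Ptr
};

class Ptr {
 public:
  explicit Ptr(Target* t) noexcept : target_(t) {}  // adopts the existing reference
  // Copy: shares ownership, so the count must be bumped atomically.
  Ptr(const Ptr& o) noexcept : target_(o.target_) {
    if (target_) {
      target_->refcount.fetch_add(1, std::memory_order_acq_rel);
      ++g_atomic_ops;
    }
  }
  // Move: steals the pointer; the count does not change, no atomic op happens.
  Ptr(Ptr&& o) noexcept : target_(o.target_) { o.target_ = nullptr; }
  Ptr& operator=(const Ptr&) = delete;
  Ptr& operator=(Ptr&&) = delete;
  ~Ptr() {
    if (target_) {
      ++g_atomic_ops;
      if (target_->refcount.fetch_sub(1, std::memory_order_acq_rel) == 1) delete target_;
    }
  }
  Target* get() const noexcept { return target_; }

 private:
  Target* target_;
};

// Made-up stand-in for c10::Storage: by-value parameter, then std::move into the member.
struct Holder {
  explicit Holder(Ptr p) : impl_(std::move(p)) {}
  Ptr impl_;
};
```

Passing an rvalue costs no atomic operation at all; passing an lvalue costs exactly one increment, paid when the parameter is copied, because the member is then move-initialized from it and the moved-from parameter's destructor does nothing.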
Use case 3 - THPPointer<c10::StorageImpl>::free
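The comment says that the owning_ptr passed to reclaim (and thus to decref) must have been created by release, but searching the PyTorch code I found no case where release and decref are used together. Since release returns a raw pointer, my guess is that what reclaim (decref) really requires is a raw pointer that owns one reference, not necessarily one obtained from release, which matches how THPPointer::free uses it here.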
THPPointer
torch/csrc/utils/object_ptr.h
template <class T>
class THPPointer {
public:
THPPointer() : ptr(nullptr){};
explicit THPPointer(T* ptr) noexcept : ptr(ptr){};
THPPointer(THPPointer&& p) noexcept {
free();
ptr = p.ptr;
p.ptr = nullptr;
};
~THPPointer() {
free();
};
// ...
T* release() {
T* tmp = ptr;
ptr = nullptr;
return tmp;
}
// ...
private:
void free();
T* ptr = nullptr;
};
THPPointer is PyTorch's own RAII smart pointer, implemented as a class template.
Its private member ptr is a raw pointer to an object of type T.
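free releases the raw pointer and is called from the destructor (and from the move constructor, before taking over p.ptr). Its definition cannot be found in this file; instead, it is supplied for each T through an explicit specialization of THPPointer's member function (template specialization).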
THPPointer<c10::StorageImpl>::free
torch/csrc/Storage.cpp
template <>
void THPPointer<c10::StorageImpl>::free() {
if (ptr) {
c10::raw::intrusive_ptr::decref(ptr);
}
}
THPPointer<c10::StorageImpl>::free is such an explicit specialization of THPPointer, and it implements free for T = c10::StorageImpl.
When the raw pointer ptr is non-null, decref decrements the underlying object's reference count; if it drops to 0, the object is destroyed.
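The member-function specialization can be sketched with hypothetical stand-ins: OwnedRaw for THPPointer, Impl for c10::StorageImpl, and a simplified decref. The class template only declares free(), and the explicit specialization for OwnedRaw<Impl> supplies the body.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for THPPointer: an RAII owner of a raw pointer whose
// free() is only declared here and supplied per T by explicit specialization.
template <class T>
class OwnedRaw {
 public:
  OwnedRaw() : ptr(nullptr) {}
  explicit OwnedRaw(T* p) noexcept : ptr(p) {}
  OwnedRaw(const OwnedRaw&) = delete;
  OwnedRaw& operator=(const OwnedRaw&) = delete;
  ~OwnedRaw() { free(); }
  T* release() {
    T* tmp = ptr;
    ptr = nullptr;
    return tmp;
  }

 private:
  void free();
  T* ptr = nullptr;
};

// Hypothetical stand-in for c10::StorageImpl: just an intrusive counter.
struct Impl {
  std::atomic<std::size_t> refcount{1};
};

// Simplified decref: drop one reference, delete the object when it reaches 0.
inline void decref(Impl* p) {
  if (p->refcount.fetch_sub(1, std::memory_order_acq_rel) == 1) delete p;
}

// Explicit specialization of the member function for T = Impl,
// analogous to THPPointer<c10::StorageImpl>::free.
template <>
inline void OwnedRaw<Impl>::free() {
  if (ptr) decref(ptr);
}
```

Instantiating the destructor of OwnedRaw<U> for some other type U without providing a specialization of free() would fail at link time, which is why each supported type needs its own specialization.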
demo
Edit intrusive_ptr.cpp as follows:
#include <torch/torch.h>
#include <iostream>
#include <vector>
#include <memory> //shared_ptr
#include <boost/intrusive_ptr.hpp>
#include <boost/detail/atomic_count.hpp> // boost::detail::atomic_count
#include <boost/checked_delete.hpp> // boost::checked_delete
#include <cassert> // assert
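// Uncomment exactly one of use_shared / use_boost / use_c10 below,
// optionally together with use_weak.
// #define use_weak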
// #define use_shared
// #define use_boost
// #define use_c10
#ifdef use_c10
#define smart_ptr c10::intrusive_ptr
#define make_ptr c10::make_intrusive
#define weak_smart_ptr c10::weak_intrusive_ptr
#elif defined(use_boost)
#define smart_ptr boost::intrusive_ptr
#elif defined(use_shared)
#define smart_ptr std::shared_ptr
#define make_ptr std::make_shared
#define weak_smart_ptr std::weak_ptr
#endif
#ifdef use_boost
template<class T>
class intrusive_ptr_base {
public:
/**
* Default constructor
*/
intrusive_ptr_base(): ref_count(0) {
// std::cout << "intrusive_ptr_base default constructor" << std::endl;
}
/**
* Copy constructor: the copy gets its own reference count starting at 0; to share an object, copy the boost::intrusive_ptr rather than the object
*/
intrusive_ptr_base(intrusive_ptr_base<T> const&): ref_count(0) {
std::cout << "intrusive_ptr_base copy constructor" << std::endl;
}
~intrusive_ptr_base(){
std::cout << "intrusive_ptr_base destructor" << std::endl;
}
/**
* Assignment leaves the reference count of *this unchanged
*/
intrusive_ptr_base& operator=(intrusive_ptr_base const& rhs) {
std::cout << "Assignment operator" << std::endl;
return *this;
}
/**
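* Increment the reference count (defined as a friend in the base class so the compiler can find it through argument-dependent lookup; otherwise it would have to be placed in the boost namespace)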
*/
friend void intrusive_ptr_add_ref(intrusive_ptr_base<T> const* s) {
std::cout << "intrusive_ptr_base add ref" << std::endl;
assert(s->ref_count >= 0);
assert(s != 0);
++s->ref_count;
}
/**
* Decrement the reference count; delete the object when it reaches 0
*/
friend void intrusive_ptr_release(intrusive_ptr_base<T> const* s) {
std::cout << "intrusive_ptr_base release" << std::endl;
assert(s->ref_count > 0);
assert(s != 0);
if (--s->ref_count == 0)
boost::checked_delete(static_cast<T const*>(s)); // the dynamic type of s is T; intrusive_ptr_base<T> is its base class
}
/**
* Similar to shared_from_this()
*/
boost::intrusive_ptr<T> self() {
return boost::intrusive_ptr<T>((T*)this);
}
boost::intrusive_ptr<const T> self() const {
return boost::intrusive_ptr<const T>((T const*)this);
}
int refcount() const {
return ref_count;
}
private:
///should be modifiable even from const intrusive_ptr objects
mutable boost::detail::atomic_count ref_count;
};
#endif
#ifdef use_c10
class MyVector : public c10::intrusive_ptr_target {
#elif defined(use_boost)
class MyVector : public intrusive_ptr_base<MyVector> {
#elif defined(use_shared)
class MyVector {
#endif
public:
MyVector(const std::vector<int>& d) : data(d) {
std::cout << "MyVector constructor" << std::endl;
}
~MyVector() {
std::cout << "MyVector destructor" << std::endl;
}
std::vector<int> data;
};
class A;
class B;
#ifdef use_c10
class A : public c10::intrusive_ptr_target {
#elif defined(use_boost)
class A : public intrusive_ptr_base<A> {
#elif defined(use_shared)
class A {
#endif
public:
A() {
// std::cout << "A constructor" << std::endl;
}
~A() {
std::cout << "A destructor" << std::endl;
}
#ifdef use_weak
weak_smart_ptr<B> pointer;
#else
smart_ptr<B> pointer;
#endif
};
#ifdef use_c10
class B : public c10::intrusive_ptr_target {
#elif defined(use_boost)
class B : public intrusive_ptr_base<B> {
#elif defined(use_shared)
class B {
#endif
public:
B() {
// std::cout << "B constructor" << std::endl;
}
~B() {
std::cout << "B destructor" << std::endl;
}
#ifdef use_weak
weak_smart_ptr<A> pointer;
#else
smart_ptr<A> pointer;
#endif
};
int main() {
{
// Multiple smart pointers pointing to the same object
std::cout << "Multiple smart pointer point to the same object" << std::endl;
std::vector<int> vec({1,2,3});
MyVector* raw_ptr = new MyVector(vec);
smart_ptr<MyVector> ip, ip2;
#if defined(use_c10)
std::cout << "Create 1st smart pointer" << std::endl;
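// intrusive_ptr has no public constructor taking a raw pointer (that one is private),
// and no constructor taking an object; make_intrusive always creates a new object.
// So borrow reclaim, which takes a raw pointer as its argument.
// Note: reclaim is a static member function; the intrusive_ptr it returns is a
// temporary that is discarded here, so ip itself stays empty.
ip.reclaim(raw_ptr);
// However, the following error says a c10::intrusive_ptr cannot be created from a raw
// pointer whose refcount is nonzero, so it seems intrusive_ptr cannot be used
// to make several pointers point to the same object this way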
/*
terminate called after throwing an instance of 'c10::Error'
what(): owning_ptr == NullType::singleton() || owning_ptr->refcount_.load() == 0 || owning_ptr->weakcount_.load() INTERNAL ASSERT FAILED at "/root/Documents/installation/libtorch/include/c10/util/intrusive_ptr.h":471, please report a bug to PyTorch. TTarget violates the invariant that refcount > 0 => weakcount > 0
Exception raised from reclaim at /xxx/libtorch/include/c10/util/intrusive_ptr.h:471 (most recent call first):
*/
// std::cout << "Create 2nd smart pointer" << std::endl;
// ip2.reclaim(raw_ptr);
#else
std::cout << "Create 1st smart pointer" << std::endl;
ip = smart_ptr<MyVector>(raw_ptr);
std::cout << "Create 2nd smart pointer" << std::endl;
ip2 = smart_ptr<MyVector>(raw_ptr);
#endif
// shared_ptr: MyVector's destructor is called twice, causing Segmentation fault (core dumped)
// boost::intrusive_ptr: the destructor is called only once
}
std::cout << std::endl;
{
// Circular reference
std::cout << "Circular reference" << std::endl;
#if defined(use_c10)
smart_ptr<A> a_ptr = make_ptr<A>();
smart_ptr<B> b_ptr = make_ptr<B>();
#else
A* a_raw_ptr = new A();
B* b_raw_ptr = new B();
std::cout << "Create A's smart pointer" << std::endl;
smart_ptr<A> a_ptr(a_raw_ptr);
std::cout << "Create B's smart pointer" << std::endl;
smart_ptr<B> b_ptr(b_raw_ptr);
#endif
#if !defined(use_boost)
std::cout << "A ref count: " << a_ptr.use_count() << std::endl;
std::cout << "B ref count: " << b_ptr.use_count() << std::endl;
#else
std::cout << "A ref count: " << a_ptr->refcount() << std::endl;
std::cout << "B ref count: " << b_ptr->refcount() << std::endl;
#endif
std::cout << "A's smart pointer references to B" << std::endl;
a_ptr->pointer = b_ptr;
std::cout << "B's smart pointer references to A" << std::endl;
b_ptr->pointer = a_ptr;
#if !defined(use_boost)
std::cout << "A ref count: " << a_ptr.use_count() << std::endl;
std::cout << "B ref count: " << b_ptr.use_count() << std::endl;
#else
std::cout << "A ref count: " << a_ptr->refcount() << std::endl;
std::cout << "B ref count: " << b_ptr->refcount() << std::endl;
#endif
// shared_ptr, boost::intrusive_ptr: both reference counts go from 1 to 2, and the destructors are never called
}
return 0;
}
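boost::intrusive_ptr requires the object it points to to implement reference counting itself. Here boost::intrusive_ptr points to MyVector, which inherits from the class intrusive_ptr_base. Besides ref_count, the member variable holding the reference count, intrusive_ptr_base also implements intrusive_ptr_add_ref and intrusive_ptr_release so that boost::intrusive_ptr can call them directly.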
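Compile and run with the following command, from inside a dedicated build directory (the command starts with rm -rf *):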
rm -rf * && cmake -DCMAKE_PREFIX_PATH="<libtorch_installation_path>;<boost_installation_path>" .. && make && ./intrusive_ptr
Multiple pointers to the same object
std::shared_ptr
MyVector's destructor is called twice, resulting in Segmentation fault (core dumped):
Multiple smart pointer point to the same object
MyVector constructor
Create 1st smart pointer
Create 2nd smart pointer
MyVector destructor
MyVector destructor
Segmentation fault (core dumped)
boost::intrusive_ptr
MyVector's destructor is called only once, and the object is destroyed successfully:
Multiple smart pointer point to the same object
MyVector constructor
Create 1st smart pointer
intrusive_ptr_base add ref
Create 2nd smart pointer
intrusive_ptr_base add ref
intrusive_ptr_base release
intrusive_ptr_base release
MyVector destructor
intrusive_ptr_base destructor
c10::intrusive_ptr
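intrusive_ptr's constructor that takes a raw pointer is private, and there is no constructor taking an object; make_intrusive always creates a new object. So here I borrowed reclaim, which takes a raw pointer as its argument, to try to make several intrusive_ptrs point to the same object.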
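But it fails with the following error, which says the raw pointer violates the invariant refcount > 0 => weakcount > 0, i.e. a c10::intrusive_ptr cannot be created this way from a raw pointer whose refcount is nonzero: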
terminate called after throwing an instance of 'c10::Error'
what(): owning_ptr == NullType::singleton() || owning_ptr->refcount_.load() == 0 || owning_ptr->weakcount_.load() INTERNAL ASSERT FAILED at "/root/Documents/installation/libtorch/include/c10/util/intrusive_ptr.h":471, please report a bug to PyTorch. TTarget violates the invariant that refcount > 0 => weakcount > 0
Exception raised from reclaim at /xxx/libtorch/include/c10/util/intrusive_ptr.h:471 (most recent call first):
So this attempt failed; only the first intrusive_ptr could be created:
Multiple smart pointer point to the same object
MyVector constructor
Create 1st smart pointer
Circular reference
std::shared_ptr
After the circular reference is formed, both reference counts go from 1 to 2, and in the end neither destructor is called:
Circular reference
Create A's smart pointer
Create B's smart pointer
A ref count: 1
B ref count: 1
A's smart pointer references to B
B's smart pointer references to A
A ref count: 2
B ref count: 2
If the members of A and B are changed to std::weak_ptr:
Circular reference
Create A's smart pointer
Create B's smart pointer
A ref count: 1
B ref count: 1
A's smart pointer references to B
B's smart pointer references to A
A ref count: 1
B ref count: 1
B destructor
A destructor
After the circular reference, their reference counts do not increase, and on leaving the scope the destructors of both A and B are called.
boost::intrusive_ptr
Circular reference
Create A's smart pointer
intrusive_ptr_base add ref
Create B's smart pointer
intrusive_ptr_base add ref
A ref count: 1
B ref count: 1
A's smart pointer references to B
intrusive_ptr_base add ref
B's smart pointer references to A
intrusive_ptr_base add ref
A ref count: 2
B ref count: 2
intrusive_ptr_base release
intrusive_ptr_base release
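We can see that intrusive_ptr_add_ref is called twice for each of A and B, raising each count to 2. But intrusive_ptr_release is called only once for each at the end, leaving both counts at 1 instead of 0, so the destructors never run.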
boost::intrusive_ptr cannot be used with std::weak_ptr, so circular references remain a problem for boost::intrusive_ptr.
c10::intrusive_ptr
If A and B refer to each other through c10::intrusive_ptr, the result is the same as with std::shared_ptr: neither is destroyed:
Circular reference
A ref count: 1
B ref count: 1
A's smart pointer references to B
B's smart pointer references to A
A ref count: 2
B ref count: 2
Switching to c10::weak_intrusive_ptr fails to compile, because it has no default constructor:
error: no matching function for call to 'c10::weak_intrusive_ptr<B>::weak_intrusive_ptr()'
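So this attempt failed as well: in this demo, the circular reference built with c10::intrusive_ptr could not be broken with c10::weak_intrusive_ptr, because the members A::pointer and B::pointer cannot be default-constructed.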