STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA and STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA in PyTorch
Preface
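When reading the source code of PyTorch operators, you will occasionally come across two macros, STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA and STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA. Both are defined in c10/macros/Macros.h. In short, they give a class or struct a static constant string member (a const char*) together with its initial value. The relevant part of Macros.h reads: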
#if defined(__CUDA_ARCH__)
#if defined(_MSC_VER) && defined(__CUDACC__)
#define CONSTEXPR_EXCEPT_WIN_CUDA const
#define C10_HOST_CONSTEXPR_EXCEPT_WIN_CUDA __host__
// Note [static constexpr char* members for windows NVCC]
// The Windows NVCC compiler doesn't handle static constexpr class members,
// although it's fixed in a later version.
// (see
// https://developercommunity.visualstudio.com/t/intellisense-error-c11-static-constexpr-member-ini/245425)
//
// If we want to ensure that our field is static under all builds, then we need
// to work around it specifically for windows NVCC by making it (a) const, (b)
// defined outside of the class definition We need to define it outside of the
// class definition because of the C++ standard; char* is not an integral type
// (see
// https://stackoverflow.com/questions/24278473/intellisense-a-member-of-type-const-char-const-cannot-have-an-in-class-in)
//
// So instead of this:
// struct Foo {
// static constexpr const char* name = "foo";
// }
// In Windows NVCC, we end up with this:
// struct Foo {
// static const char* name;
// }
// const char* Foo::name = "foo";
//
// This gives us a small perf hit for any code that wants to access these field
// members, but right now it isn't used in any perf-critical code paths.
#define STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(field, val) \
static const char* field;
#define STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA(cls, field, val) \
const char* cls::field = val;
#else
#define CONSTEXPR_EXCEPT_WIN_CUDA constexpr
#define C10_HOST_CONSTEXPR_EXCEPT_WIN_CUDA __host__
#define STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(field, val) \
static constexpr const char* field = val;
#define STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA(cls, field, val)
#endif
#else
#if defined(_MSC_VER) && defined(__CUDACC__)
#define CONSTEXPR_EXCEPT_WIN_CUDA const
#define C10_HOST_CONSTEXPR_EXCEPT_WIN_CUDA
#define STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(field, val) \
static const char* field;
#define STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA(cls, field, val) \
const char* cls::field = val;
#else
#define CONSTEXPR_EXCEPT_WIN_CUDA constexpr
#define C10_HOST_CONSTEXPR_EXCEPT_WIN_CUDA constexpr
#define STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(field, val) \
static constexpr const char* field = val;
#define STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA(cls, field, val)
#endif
#endif
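Depending on whether the code is being compiled by NVCC on Windows (that is, whether both _MSC_VER and __CUDACC__ are defined), STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA and STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA take one of two forms. The outer __CUDA_ARCH__ check does not change these two macros; it only affects C10_HOST_CONSTEXPR_EXCEPT_WIN_CUDA.
On Windows NVCC the macros are defined as follows: the static member is first declared inside the class and then defined, with its value, outside of it: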
STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA
#define STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(field, val) \
static const char* field;
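STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA is normally placed inside a class or struct definition, where it declares a static member named field. The val argument is ignored in this branch.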
STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA
#define STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA(cls, field, val) \
const char* cls::field = val;
STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA is normally placed outside the class or struct definition, where it defines the field member of cls and initializes it with val.
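The Windows NVCC branch can be reproduced outside of PyTorch with a small sketch. The DEMO_WIN_* macros and the Foo struct below are hypothetical names introduced only for illustration; the macro bodies are copied from the branch shown above.

```cpp
#include <cassert>
#include <cstring>

// Local copies of the Windows NVCC branch, renamed with a DEMO_WIN_ prefix
// (hypothetical names) so that the sketch does not depend on c10.
#define DEMO_WIN_STR_INL(field, val) static const char* field;
#define DEMO_WIN_STR_OUT_OF_LINE(cls, field, val) const char* cls::field = val;

// Hypothetical struct, not part of PyTorch.
struct Foo {
  // Expands to: static const char* name;   (declaration only, val is dropped)
  DEMO_WIN_STR_INL(name, "foo")
};

// Expands to: const char* Foo::name = "foo";   (the single definition)
// In a real project this line belongs in exactly one .cpp file.
DEMO_WIN_STR_OUT_OF_LINE(Foo, name, "foo")

// Reads the member at run time; in this branch it is an ordinary variable,
// not a compile-time constant.
inline bool foo_name_is(const char* expected) {
  return std::strcmp(Foo::name, expected) == 0;
}

// Small self-check that can be called from main().
inline void check_foo() {
  assert(foo_name_is("foo"));
  assert(!foo_name_is("bar"));
}
```

Because the value lives in a single out-of-line definition, omitting the second macro in this branch would leave Foo::name declared but undefined, which typically shows up as a linker error once the member is used.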
On every other platform the member is initialized right where it is declared:
STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA
#define STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(field, val) \
static constexpr const char* field = val;
STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA
#define STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA(cls, field, val)
// expands to nothing
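As a minimal sketch of this second branch, the snippet below uses local copies of the macros under hypothetical DEMO_STD_* names, together with a hypothetical Bar struct and demo_len helper. It illustrates what the constexpr form offers: the string can be inspected at compile time.

```cpp
#include <cstddef>

// Local copies of the non-Windows branch, renamed with a DEMO_STD_ prefix
// (hypothetical names) so that the sketch does not depend on c10.
#define DEMO_STD_STR_INL(field, val) static constexpr const char* field = val;
#define DEMO_STD_STR_OUT_OF_LINE(cls, field, val)

// Hypothetical struct, not part of PyTorch.
struct Bar {
  // Expands to: static constexpr const char* name = "bar";
  DEMO_STD_STR_INL(name, "bar")
};

// Expands to nothing, so no out-of-line definition is emitted here.
DEMO_STD_STR_OUT_OF_LINE(Bar, name, "bar")

// Hypothetical helper: recursive so that it is a valid constexpr function
// in C++11 as well as in later standards.
constexpr std::size_t demo_len(const char* s) {
  return *s ? 1 + demo_len(s + 1) : 0;
}

// Both checks are evaluated by the compiler, not at run time.
static_assert(Bar::name[0] == 'b', "first character is known at compile time");
static_assert(demo_len(Bar::name) == 3, "length is known at compile time");
```

The same static_assert lines would not compile against the Windows NVCC form, where the member is a plain static const char* whose value is supplied in another translation unit.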
As an example, build/aten/src/ATen/Operators.h defines the at::_ops::rand struct:
struct TORCH_API rand {
using schema = at::Tensor (c10::SymIntArrayRef, c10::optional<at::ScalarType>, c10::optional<at::Layout>, c10::optional<at::Device>, c10::optional<bool>);
using ptr_schema = schema*;
// See Note [static constexpr char* members for windows NVCC]
STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(name, "aten::rand")
STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(overload_name, "")
STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(schema_str, "rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor")
static at::Tensor call(c10::SymIntArrayRef size, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory);
static at::Tensor redispatch(c10::DispatchKeySet dispatchKeySet, c10::SymIntArrayRef size, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory);
};
while build/aten/src/ATen/Operators_2.cpp contains:
STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA(rand, name, "aten::rand")
STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA(rand, overload_name, "")
STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA(rand, schema_str, "rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor")
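Taken together, the two pieces give the rand struct three static const char* members (name, overload_name, and schema_str), each with its corresponding initial value. On Windows NVCC the values come from the out-of-line definitions in the .cpp file; elsewhere they come from the in-class constexpr initializers, and the lines in the .cpp file expand to nothing.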
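The header-plus-source pattern can be condensed into one self-contained sketch. Everything prefixed with DEMO_ or demo_ is hypothetical: DEMO_FORCE_OUT_OF_LINE stands in for the defined(_MSC_VER) && defined(__CUDACC__) test, and demo_ops::rand is a simplified stand-in for at::_ops::rand that keeps only two of the string members.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical switch replacing the Windows NVCC detection in Macros.h.
// Define DEMO_FORCE_OUT_OF_LINE when compiling to exercise the other branch.
#if defined(DEMO_FORCE_OUT_OF_LINE)
#define DEMO_STR_INL(field, val) static const char* field;
#define DEMO_STR_OUT(cls, field, val) const char* cls::field = val;
#else
#define DEMO_STR_INL(field, val) static constexpr const char* field = val;
#define DEMO_STR_OUT(cls, field, val)
#endif

namespace demo_ops {

// What the generated header would contain (simplified).
struct rand {
  DEMO_STR_INL(name, "aten::rand")
  DEMO_STR_INL(overload_name, "")
};

// What the generated .cpp file would contain; these two lines expand to
// nothing unless DEMO_FORCE_OUT_OF_LINE is defined.
DEMO_STR_OUT(rand, name, "aten::rand")
DEMO_STR_OUT(rand, overload_name, "")

}  // namespace demo_ops

// Caller code is written once and compiles unchanged under both branches.
inline bool is_rand_default_overload() {
  return std::strcmp(demo_ops::rand::name, "aten::rand") == 0 &&
         demo_ops::rand::overload_name[0] == '\0';
}

// Small self-check that can be called from main().
inline void check_rand_strings() {
  assert(is_rand_default_overload());
}
```

Writing both macro calls everywhere, even where one of them expands to nothing, is what lets the generated header and source stay identical across platforms.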