The Life of a Tensor: the torch.rand Edition

Preface

The article Life of a Tensor traced how torch.rand travels from the Python API all the way down to the C++ internals; it was written in July 2019, against PyTorch 1.1.0. Following that article, this post walks through the same end-to-end call flow, but based on the newer PyTorch 2.0.

First, write a torch_rand.py script as follows:

import torch
torch.rand(3, 4)

If we jump to the definition of torch.rand, we land in torch/_C/_VariableFunctions.pyi.

torch/_C/_VariableFunctions.pyi

The file torch/_C/_VariableFunctions.pyi contains eight rand overloads with different signatures:

@overload
def rand(size: Sequence[Union[_int, SymInt]], *, generator: Optional[Generator], names: Optional[Sequence[Union[str, ellipsis, None]]], dtype: Optional[_dtype] = None, layout: Optional[_layout] = None, device: Optional[Union[_device, str, None]] = None, pin_memory: Optional[_bool] = False, requires_grad: Optional[_bool] = False) -> Tensor: ...
@overload
def rand(*size: _int, generator: Optional[Generator], names: Optional[Sequence[Union[str, ellipsis, None]]], dtype: Optional[_dtype] = None, layout: Optional[_layout] = None, device: Optional[Union[_device, str, None]] = None, pin_memory: Optional[_bool] = False, requires_grad: Optional[_bool] = False) -> Tensor: ...
@overload
def rand(size: Sequence[Union[_int, SymInt]], *, generator: Optional[Generator], out: Optional[Tensor] = None, dtype: Optional[_dtype] = None, layout: Optional[_layout] = None, device: Optional[Union[_device, str, None]] = None, pin_memory: Optional[_bool] = False, requires_grad: Optional[_bool] = False) -> Tensor: ...
@overload
def rand(*size: _int, generator: Optional[Generator], out: Optional[Tensor] = None, dtype: Optional[_dtype] = None, layout: Optional[_layout] = None, device: Optional[Union[_device, str, None]] = None, pin_memory: Optional[_bool] = False, requires_grad: Optional[_bool] = False) -> Tensor: ...
@overload
def rand(size: Sequence[Union[_int, SymInt]], *, out: Optional[Tensor] = None, dtype: Optional[_dtype] = None, layout: Optional[_layout] = None, device: Optional[Union[_device, str, None]] = None, pin_memory: Optional[_bool] = False, requires_grad: Optional[_bool] = False) -> Tensor: ...
@overload
def rand(*size: _int, out: Optional[Tensor] = None, dtype: Optional[_dtype] = None, layout: Optional[_layout] = None, device: Optional[Union[_device, str, None]] = None, pin_memory: Optional[_bool] = False, requires_grad: Optional[_bool] = False) -> Tensor: ...
@overload
def rand(size: Sequence[Union[_int, SymInt]], *, names: Optional[Sequence[Union[str, ellipsis, None]]], dtype: Optional[_dtype] = None, layout: Optional[_layout] = None, device: Optional[Union[_device, str, None]] = None, pin_memory: Optional[_bool] = False, requires_grad: Optional[_bool] = False) -> Tensor: ...
@overload
def rand(*size: _int, names: Optional[Sequence[Union[str, ellipsis, None]]], dtype: Optional[_dtype] = None, layout: Optional[_layout] = None, device: Optional[Union[_device, str, None]] = None, pin_memory: Optional[_bool] = False, requires_grad: Optional[_bool] = False) -> Tensor: ...

torch.rand(3, 4) resolves to the third-from-last rand overload above.

Note 1: For the types that appear here, such as *size, _int, and Sequence, as well as the use of the bare * separator, see the Python typing library and torch.types.

Note 2: torch/_C/_VariableFunctions.pyi is generated by gen_pyi.py from the template torch/_C/_VariableFunctions.pyi.in together with native_functions.yaml. For the details of how it is generated, see "The pyi file generation mechanism in PyTorch".

If we align the parts that are similar and elide the parts that are identical:

@overload
def rand(size: Sequence[Union[_int, SymInt]], *, generator: Optional[Generator], names: Optional[Sequence[Union[str, ellipsis, None]]], ...
@overload
def rand(*size: _int,                            generator: Optional[Generator], names: Optional[Sequence[Union[str, ellipsis, None]]], ...
@overload
def rand(size: Sequence[Union[_int, SymInt]], *, generator: Optional[Generator], out: Optional[Tensor] = None, ...
@overload
def rand(*size: _int,                            generator: Optional[Generator], out: Optional[Tensor] = None, ...
@overload
def rand(size: Sequence[Union[_int, SymInt]], *,                                 out: Optional[Tensor] = None, ...
@overload
def rand(*size: _int,                                                            out: Optional[Tensor] = None, ...
@overload
def rand(size: Sequence[Union[_int, SymInt]], *,                                 names: Optional[Sequence[Union[str, ellipsis, None]]], ...
@overload
def rand(*size: _int,                                                            names: Optional[Sequence[Union[str, ellipsis, None]]], ...

Based on their parameters, they fall into the following four groups:

  • size, generator, names

  • size, generator (out optional)

  • size (out optional)

  • size, names

Each of these four groups comes in two flavors, depending on whether the first parameter (the shape) is taken as *size: _int or as a Sequence; the example below illustrates all four groups.
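As a quick sanity check, the four groups map directly onto how torch.rand can be called from Python. A minimal sketch (the seed and the dimension names 'N' and 'C' are arbitrary illustrative values, and named tensors are still an experimental feature):

import torch

g = torch.Generator().manual_seed(0)

# size, generator, names
a = torch.rand(3, 4, generator=g, names=('N', 'C'))
# size, generator (an out tensor could also be passed here)
b = torch.rand(3, 4, generator=g)
# size only (an out tensor could also be passed here)
c = torch.rand(3, 4)
# size, names
d = torch.rand(3, 4, names=('N', 'C'))

print(a.names, b.shape, c.shape, d.names)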

If we change the code in the Python script to:

torch.rand((3,4))

or:

torch.rand([3,4])

then the call resolves to the fourth-from-last rand overload, the Sequence version (it pairs with the third-from-last, _int version we just saw):

@overload
def rand(size: Sequence[Union[_int, SymInt]], *, out: Optional[Tensor] = None, dtype: Optional[_dtype] = None, layout: Optional[_layout] = None, device: Optional[Union[_device, str, None]] = None, pin_memory: Optional[_bool] = False, requires_grad: Optional[_bool] = False) -> Tensor: ...

backtrace

Next, create a file named gdbrand and fill it with the following content:

python sys.path.append("/usr/share/gcc/python");
set logging file rand.txt
set logging on
set breakpoint pending on
break at::empty
info breakpoints
run torch_rand.py
bt

Launch gdb python and run source gdbrand to execute the script above; this produces the following backtrace:

#0  at::empty (size=..., options=..., memory_format=memory_format@entry=...) at /root/Documents/pytorch/build/aten/src/ATen/Functions.h:2652
#1  0x00007f04620d60b1 in at::native::rand (size=..., generator=..., dtype=..., layout=..., device=..., pin_memory=...)
    at /root/Documents/pytorch/c10/util/Optional.h:204
#2  0x00007f04620d61fc in at::native::rand (size=..., dtype=..., layout=..., layout@entry=..., device=..., pin_memory=...)
    at /root/Documents/pytorch/aten/src/ATen/native/TensorFactories.cpp:781
#3  0x00007f0462dd2c28 in at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeExplicitAutograd__rand (pin_memory=..., device=..., layout=..., 
    dtype=..., size=...) at /root/Documents/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:2214
#4  c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor(c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>), at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeExplicitAutograd__rand>, at::Tensor, c10::guts::typelist::typelist<c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > >::operator() (args#4=..., args#3=..., args#2=..., args#1=..., args#0=..., this=<optimized out>)
    at /root/Documents/pytorch/aten/src/ATen/core/boxing/impl/WrapFunctionIntoFunctor.h:13
#5  c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor(c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>), at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeExplicitAutograd__rand>, at::Tensor, c10::guts::typelist::typelist<c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > >, at::Tensor(c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)>::call(c10::OperatorKernel *, c10::DispatchKeySet, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) (functor=<optimized out>, args#0=..., args#1=..., args#2=..., 
    args#3=..., args#4=...) at /root/Documents/pytorch/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:463
#6  0x00007f0462889309 in c10::callUnboxedKernelFunction<at::Tensor, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > (dispatchKeySet=..., functor=<optimized out>, unboxed_kernel_func=<optimized out>)
    at /root/Documents/pytorch/aten/src/ATen/core/boxing/KernelFunction_impl.h:50
#7  c10::KernelFunction::call<at::Tensor, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > (dispatchKeySet=..., opHandle=..., this=0x2075db8) at /root/Documents/pytorch/aten/src/ATen/core/boxing/KernelFunction_impl.h:91
#8  c10::Dispatcher::redispatch<at::Tensor, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> >(c10::TypedOperatorHandle<at::Tensor (c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)> const&, c10::DispatchKeySet, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) const (this=<optimized out>, currentDispatchKeySet=..., op=...)
    at /root/Documents/pytorch/aten/src/ATen/core/dispatch/Dispatcher.h:656
#9  c10::TypedOperatorHandle<at::Tensor (c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)>::redispatch(c10::DispatchKeySet, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) const (args#4=..., args#3=..., args#2=..., args#1=..., args#0=..., currentDispatchKeySet=..., this=0x7f046a780150 <at::_ops::rand::redispatch(c10::DispatchKeySet, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)::op>) at /root/Documents/pytorch/aten/src/ATen/core/dispatch/Dispatcher.h:492
#10 at::_ops::rand::redispatch (dispatchKeySet=..., size=..., dtype=..., layout=..., layout@entry=..., device=..., pin_memory=...) at /root/Documents/pytorch/build/aten/src/ATen/Operators_2.cpp:5220
#11 0x00007f0462c05329 in at::(anonymous namespace)::rand (pin_memory=..., device=..., layout=..., dtype=..., size=...) at /root/Documents/pytorch/build/aten/src/ATen/RegisterBackendSelect.cpp:365
#12 c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor(c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>), at::(anonymous namespace)::rand>, at::Tensor, c10::guts::typelist::typelist<c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > >::operator() (args#4=..., args#3=..., args#2=..., args#1=..., args#0=..., this=<optimized out>) at /root/Documents/pytorch/aten/src/ATen/core/boxing/impl/WrapFunctionIntoFunctor.h:13
#13 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor(c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>), at::(anonymous namespace)::rand>, at::Tensor, c10::guts::typelist::typelist<c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > >, at::Tensor(c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)>::call(c10::OperatorKernel *, c10::DispatchKeySet, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) (functor=<optimized out>, args#0=..., args#1=..., args#2=..., args#3=..., args#4=...) at /root/Documents/pytorch/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:463
#14 0x00007f04628e3422 in c10::callUnboxedKernelFunction<at::Tensor, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > (dispatchKeySet=..., functor=<optimized out>, unboxed_kernel_func=<optimized out>) at /root/Documents/pytorch/aten/src/ATen/core/boxing/KernelFunction_impl.h:50
#15 c10::KernelFunction::call<at::Tensor, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> > (dispatchKeySet=..., opHandle=..., this=0x2076638) at /root/Documents/pytorch/aten/src/ATen/core/boxing/KernelFunction_impl.h:91
#16 c10::Dispatcher::call<at::Tensor, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool> >(c10::TypedOperatorHandle<at::Tensor (c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)> const&, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) const (op=..., this=<optimized out>) at /root/Documents/pytorch/aten/src/ATen/core/dispatch/Dispatcher.h:639
#17 c10::TypedOperatorHandle<at::Tensor (c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)>::call(c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) const (args#4=..., args#3=..., args#2=..., args#1=..., args#0=..., this=0x7f046a780170 <at::_ops::rand::call(c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)::op>) at /root/Documents/pytorch/aten/src/ATen/core/dispatch/Dispatcher.h:487
#18 at::_ops::rand::call (size=..., dtype=..., layout=..., layout@entry=..., device=..., pin_memory=...) at /root/Documents/pytorch/build/aten/src/ATen/Operators_2.cpp:5213
#19 0x00007f046abdd595 in at::rand_symint (options=..., size=...) at /root/Documents/pytorch/build/aten/src/ATen/Functions.h:5770
#20 torch::rand_symint (options=..., size=...) at /root/Documents/pytorch/torch/csrc/autograd/generated/variable_factories.h:418
#21 operator() (__closure=<synthetic pointer>, options=..., size=...) at /root/Documents/pytorch/torch/csrc/autograd/generated/python_torch_functions_0.cpp:5256
#22 torch::autograd::THPVariable_rand (self_=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at /root/Documents/pytorch/torch/csrc/autograd/generated/python_torch_functions_0.cpp:5258
#23 0x0000000000508127 in cfunction_call (func=0x7f048932aa90, args=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.9.13/Objects/methodobject.c:543
#24 0x00000000004f0edc in _PyObject_MakeTpCall (tstate=0x1a610a0, callable=0x7f048932aa90, args=<optimized out>, nargs=<optimized out>, keywords=0x0) at /usr/local/src/conda/python-3.9.13/Objects/call.c:191
#25 0x00000000004ed255 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x1abc308, callable=0x7f048932aa90, tstate=<optimized out>) at /usr/local/src/conda/python-3.9.13/Include/cpython/abstract.h:116
#26 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x1abc308, callable=0x7f048932aa90, tstate=<optimized out>) at /usr/local/src/conda/python-3.9.13/Include/cpython/abstract.h:103
#27 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x1abc308, callable=0x7f048932aa90) at /usr/local/src/conda/python-3.9.13/Include/cpython/abstract.h:127
#28 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x1a610a0) at /usr/local/src/conda/python-3.9.13/Python/ceval.c:5077
#29 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x1abc190, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.13/Python/ceval.c:3489
#30 0x00000000004e70ca in _PyEval_EvalFrame (throwflag=0, f=0x1abc190, tstate=0x1a610a0) at /usr/local/src/conda/python-3.9.13/Include/internal/pycore_ceval.h:40
#31 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x0, kwcount=<optimized out>, kwstep=2, defs=0x0, defcount=<optimized out>, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at /usr/local/src/conda/python-3.9.13/Python/ceval.c:4329
#32 0x00000000004e6d57 in _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at /usr/local/src/conda/python-3.9.13/Python/ceval.c:4361
#33 0x00000000004e6d09 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at /usr/local/src/conda/python-3.9.13/Python/ceval.c:4377
#34 0x0000000000594e7b in PyEval_EvalCode (co=co@entry=0x7f049ec05870, globals=globals@entry=0x7f049ebfd780, locals=locals@entry=0x7f049ebfd780) at /usr/local/src/conda/python-3.9.13/Python/ceval.c:828
#35 0x00000000005c2307 in run_eval_code_obj (tstate=0x1a610a0, co=0x7f049ec05870, globals=0x7f049ebfd780, locals=0x7f049ebfd780) at /usr/local/src/conda/python-3.9.13/Python/pythonrun.c:1221
#36 0x00000000005be270 in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x7f049ebfd780, locals=0x7f049ebfd780, flags=<optimized out>, arena=<optimized out>) at /usr/local/src/conda/python-3.9.13/Python/pythonrun.c:1242
#37 0x00000000004563ed in pyrun_file (fp=0x1a5d440, filename=0x7f049eb2d930, start=<optimized out>, globals=0x7f049ebfd780, locals=0x7f049ebfd780, closeit=1, flags=0x7ffecfd9a7f8) at /usr/local/src/conda/python-3.9.13/Python/pythonrun.c:1140
#38 0x00000000005b8062 in pyrun_simple_file (flags=0x7ffecfd9a7f8, closeit=1, filename=0x7f049eb2d930, fp=0x1a5d440) at /usr/local/src/conda/python-3.9.13/Python/pythonrun.c:450
#39 PyRun_SimpleFileExFlags (fp=0x1a5d440, filename=<optimized out>, closeit=1, flags=0x7ffecfd9a7f8) at /usr/local/src/conda/python-3.9.13/Python/pythonrun.c:483
#40 0x00000000005b55ce in pymain_run_file (cf=0x7ffecfd9a7f8, config=0x1a5faa0) at /usr/local/src/conda/python-3.9.13/Modules/main.c:379
#41 pymain_run_python (exitcode=0x7ffecfd9a7f0) at /usr/local/src/conda/python-3.9.13/Modules/main.c:604
#42 Py_RunMain () at /usr/local/src/conda/python-3.9.13/Modules/main.c:683
#43 0x0000000000588ff9 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /usr/local/src/conda/python-3.9.13/Modules/main.c:1129
#44 0x00007f049ed56d90 in __libc_start_call_main (main=main@entry=0x588fb0 <main>, argc=argc@entry=2, argv=argv@entry=0x7ffecfd9aa28) at ../sysdeps/nptl/libc_start_call_main.h:58
#45 0x00007f049ed56e40 in __libc_start_main_impl (main=0x588fb0 <main>, argc=2, argv=0x7ffecfd9aa28, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffecfd9aa18) at ../csu/libc-start.c:392
#46 0x0000000000588eae in _start ()

The backtrace is read from bottom to top: it starts at _start, passes through a dozen or so CPython-internal frames, and the first PyTorch-related frame it reaches is #22: torch::autograd::THPVariable_rand.

Python bindings

torch::autograd::THPVariable_rand

#22 torch::autograd::THPVariable_rand (self_=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at /root/Documents/pytorch/torch/csrc/autograd/generated/python_torch_functions_0.cpp:5258

torch::autograd::THPVariable_rand is defined in torch/csrc/autograd/generated/python_torch_functions_0.cpp; note that the same directory contains other cpp files with similar names but different numeric suffixes. These files (python_torch_functions_i.cpp) are generated while PyTorch is being built by the autograd code-generation scripts (tools/autograd/gen_python_functions.py), driven by native_functions.yaml and the template tools/autograd/templates/python_torch_functions.cpp. For how exactly they are generated, see "The python_torch_functions_i.cpp file generation mechanism in PyTorch".

The declaration of THPVariable_rand looks like this:

static PyObject * THPVariable_rand(PyObject* self_, PyObject* args, PyObject* kwargs);

A PyMethodDef array is defined so that the Python-level rand function maps to the C++ THPVariable_rand:

static PyMethodDef torch_functions_shard[] = {
  //...
  {"rand", castPyCFunctionWithKeywords(THPVariable_rand), METH_VARARGS | METH_KEYWORDS | METH_STATIC, NULL},
  //...
};
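From the Python side we can check that torch.rand really is this binding, re-exported from torch._C._VariableFunctions by torch/__init__.py; a small sketch of the check:

import torch
import torch._C as _C

# torch/__init__.py copies the callables registered under
# torch._C._VariableFunctions (including rand) into the torch namespace.
print(torch.rand is _C._VariableFunctions.rand)  # expected: True
print(type(torch.rand))                          # <class 'builtin_function_or_method'>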

The implementation of THPVariable_rand is as follows:

// generated methods start here

// ...

// rand
static PyObject * THPVariable_rand(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  static PythonArgParser parser({
    "rand(SymIntArrayRef size, *, Generator? generator, DimnameList? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
    "rand(SymIntArrayRef size, *, Generator? generator, Tensor out=None, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
    "rand(SymIntArrayRef size, *, Tensor out=None, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
    "rand(SymIntArrayRef size, *, DimnameList? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
  }, /*traceable=*/true);

  ParsedArgs<8> parsed_args;
  auto _r = parser.parse(nullptr, args, kwargs, parsed_args);
  if(_r.has_torch_function()) {
    return handle_torch_function(_r, nullptr, args, kwargs, THPVariableFunctionsModule, "torch");
  }
  switch (_r.idx) {
    case 0: {
      // aten::rand.generator_with_names(SymInt[] size, *, Generator? generator, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
      auto __names = _r.toDimnameListOptional(2);
      c10::optional<DimnameList> names = __names ? c10::make_optional(DimnameList(__names.value())) : c10::nullopt;
      const auto options = TensorOptions()
          .dtype(_r.scalartypeOptional(3))
          .device(_r.deviceWithDefault(5, torch::tensors::get_default_device()))
          .layout(_r.layoutOptional(4))
          .requires_grad(_r.toBool(7))
          .pinned_memory(_r.toBool(6));
      torch::utils::maybe_initialize_cuda(options);
      
      auto dispatch_rand = [](c10::SymIntArrayRef size, c10::optional<at::Generator> generator, c10::optional<at::DimnameList> names, at::TensorOptions options) -> at::Tensor {
        pybind11::gil_scoped_release no_gil;
        return torch::rand_symint(size, generator, names, options);
      };
      return wrap(dispatch_rand(_r.symintlist(0), _r.generator(1), names, options));
    }
    case 1: {
      if (_r.isNone(2)) {
        // aten::rand.generator(SymInt[] size, *, Generator? generator, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
        const auto options = TensorOptions()
            .dtype(_r.scalartypeOptional(3))
            .device(_r.deviceWithDefault(5, torch::tensors::get_default_device()))
            .layout(_r.layoutOptional(4))
            .requires_grad(_r.toBool(7))
            .pinned_memory(_r.toBool(6));
        torch::utils::maybe_initialize_cuda(options);
        
        auto dispatch_rand = [](c10::SymIntArrayRef size, c10::optional<at::Generator> generator, at::TensorOptions options) -> at::Tensor {
          pybind11::gil_scoped_release no_gil;
          return torch::rand_symint(size, generator, options);
        };
        return wrap(dispatch_rand(_r.symintlist(0), _r.generator(1), options));
      } else {
        // aten::rand.generator_out(SymInt[] size, *, Generator? generator, Tensor(a!) out) -> Tensor(a!)
        check_out_type_matches(_r.tensor(2), _r.scalartypeOptional(3),
                               _r.isNone(3), _r.layoutOptional(4),
                               _r.deviceWithDefault(5, torch::tensors::get_default_device()), _r.isNone(5));
        
        auto dispatch_rand_out = [](at::Tensor out, c10::SymIntArrayRef size, c10::optional<at::Generator> generator) -> at::Tensor {
          pybind11::gil_scoped_release no_gil;
          return at::rand_symint_out(out, size, generator);
        };
        return wrap(dispatch_rand_out(_r.tensor(2), _r.symintlist(0), _r.generator(1)).set_requires_grad(_r.toBool(7)));
      }
    }
    case 2: {
      if (_r.isNone(1)) {
        // aten::rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
        const auto options = TensorOptions()
            .dtype(_r.scalartypeOptional(2))
            .device(_r.deviceWithDefault(4, torch::tensors::get_default_device()))
            .layout(_r.layoutOptional(3))
            .requires_grad(_r.toBool(6))
            .pinned_memory(_r.toBool(5));
        torch::utils::maybe_initialize_cuda(options);
        
        auto dispatch_rand = [](c10::SymIntArrayRef size, at::TensorOptions options) -> at::Tensor {
          pybind11::gil_scoped_release no_gil;
          return torch::rand_symint(size, options);
        };
        return wrap(dispatch_rand(_r.symintlist(0), options));
      } else {
        // aten::rand.out(SymInt[] size, *, Tensor(a!) out) -> Tensor(a!)
        check_out_type_matches(_r.tensor(1), _r.scalartypeOptional(2),
                               _r.isNone(2), _r.layoutOptional(3),
                               _r.deviceWithDefault(4, torch::tensors::get_default_device()), _r.isNone(4));
        
        auto dispatch_rand_out = [](at::Tensor out, c10::SymIntArrayRef size) -> at::Tensor {
          pybind11::gil_scoped_release no_gil;
          return at::rand_symint_out(out, size);
        };
        return wrap(dispatch_rand_out(_r.tensor(1), _r.symintlist(0)).set_requires_grad(_r.toBool(6)));
      }
    }
    case 3: {
      // aten::rand.names(SymInt[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
      auto __names = _r.toDimnameListOptional(1);
      c10::optional<DimnameList> names = __names ? c10::make_optional(DimnameList(__names.value())) : c10::nullopt;
      const auto options = TensorOptions()
          .dtype(_r.scalartypeOptional(2))
          .device(_r.deviceWithDefault(4, torch::tensors::get_default_device()))
          .layout(_r.layoutOptional(3))
          .requires_grad(_r.toBool(6))
          .pinned_memory(_r.toBool(5));
      torch::utils::maybe_initialize_cuda(options);
      
      auto dispatch_rand = [](c10::SymIntArrayRef size, c10::optional<at::DimnameList> names, at::TensorOptions options) -> at::Tensor {
        pybind11::gil_scoped_release no_gil;
        return torch::rand_symint(size, names, options);
      };
      return wrap(dispatch_rand(_r.symintlist(0), names, options));
    }
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}

In the parser definition, rand is split by signature into the following four groups, which together cover six operator variants:

Signature 0 requires size, generator, and names:

    "rand(SymIntArrayRef size, *, Generator? generator, DimnameList? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
// aten::rand.generator_with_names(SymInt[] size, *, Generator? generator, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor

Signature 1 requires size and generator:

    "rand(SymIntArrayRef size, *, Generator? generator, Tensor out=None, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",

Together with the comments, we can see that signature 1 further splits into two APIs; variant 1-1 takes no out:

// aten::rand.generator(SymInt[] size, *, Generator? generator, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor

Variant 1-2 is the form that takes an out tensor:

// aten::rand.generator_out(SymInt[] size, *, Generator? generator, Tensor(a!) out) -> Tensor(a!)

Signature 2 requires only size:

    "rand(SymIntArrayRef size, *, Tensor out=None, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",

Again from the comments, signature 2 further splits into two APIs; variant 2-1 takes no out:

// aten::rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor

Variant 2-2 is the form that takes an out tensor:

// aten::rand.out(SymInt[] size, *, Tensor(a!) out) -> Tensor(a!)

Signature 3 requires size and names:

    "rand(SymIntArrayRef size, *, DimnameList? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? requires_grad=False)",
// aten::rand.names(SymInt[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor

Because our call is torch.rand(3, 4), which passes only size, execution enters case 2 of the switch.

Next, _r.isNone(1) checks whether argument 1 (0-based), i.e. the out argument, is None. Since no out argument was given, the if branch is taken, which corresponds to the size-only aten::rand:

        // aten::rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
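Both branches of case 2 can be exercised from Python; a short sketch:

import torch

# Without out: _r.isNone(1) is true, the if branch runs, and
# dispatch_rand -> torch::rand_symint handles the call (aten::rand).
t = torch.rand(3, 4)

# With out: the else branch runs instead, and
# dispatch_rand_out -> at::rand_symint_out fills the given tensor (aten::rand.out).
buf = torch.empty(3, 4)
torch.rand(3, 4, out=buf)

print(t.shape, buf.shape)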

operator()

#21 operator() (__closure=<synthetic pointer>, options=..., size=...) at /root/Documents/pytorch/torch/csrc/autograd/generated/python_torch_functions_0.cpp:5256

torch/csrc/autograd/generated/python_torch_functions_0.cpp

This frame is nothing more than the lambda defined inside the if branch of case 2 in torch::autograd::THPVariable_rand:

        auto dispatch_rand = [](c10::SymIntArrayRef size, at::TensorOptions options) -> at::Tensor {
          pybind11::gil_scoped_release no_gil;
          return torch::rand_symint(size, options);
        };

Here pybind11::gil_scoped_release no_gil; releases the GIL before torch::rand_symint is invoked. No extra thread is spawned at this point; rather, while this C++ code runs without holding the GIL, other Python threads are free to execute, which is what lets multi-threaded Python code actually overlap around such calls. For details, see "The Python GIL and its release/acquire functions".
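A minimal sketch of what releasing the GIL buys us: while one thread sits inside the C++ part of torch.rand, other Python threads can keep running (the tensor size here is arbitrary, just large enough to make the overlap observable):

import threading
import torch

def worker(i):
    # The Python frame holds the GIL only briefly; the bulk of the work
    # runs in C++ after pybind11::gil_scoped_release drops the GIL, so
    # these workers can overlap instead of strictly serializing.
    t = torch.rand(2000, 2000)
    print(f"worker {i} done, mean={t.mean().item():.4f}")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()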

C++ API

torch::rand_symint

#20 torch::rand_symint (options=..., size=...) at /root/Documents/pytorch/torch/csrc/autograd/generated/variable_factories.h:418

torch/csrc/autograd/generated/variable_factories.h

inline at::Tensor rand_symint(c10::SymIntArrayRef size, at::TensorOptions options = {}) {
  at::AutoDispatchBelowADInplaceOrView guard;
  return autograd::make_variable(at::rand_symint(size, at::TensorOptions(options).requires_grad(c10::nullopt)), /*requires_grad=*/options.requires_grad());
}

torch::rand_symint likewise takes only the size (plus the TensorOptions); internally it first obtains an at::Tensor from at::rand_symint, then wraps it with autograd::make_variable and returns the result.

at::rand_symint returns a plain tensor with no autograd support; autograd::make_variable then attaches the autograd metadata to that tensor, after which it can participate in automatic differentiation. For the difference between torch::rand_symint and at::rand_symint, see "The difference between torch:: and at:: factory functions".
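The effect of make_variable is visible from Python: the returned tensor is a leaf with autograd metadata attached, and requires_grad is applied exactly as options.requires_grad() dictates. A small sketch:

import torch

x = torch.rand(3, 4)
print(x.requires_grad, x.is_leaf, x.grad_fn)   # False True None

# requires_grad from the Python call ends up in options.requires_grad(),
# which torch::rand_symint passes on to autograd::make_variable.
y = torch.rand(3, 4, requires_grad=True)
print(y.requires_grad, y.is_leaf)              # True True

z = (y * 2).sum()
z.backward()
print(y.grad.shape)                            # torch.Size([3, 4])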

Below are the other rand-family functions that appear in the same file:

inline at::Tensor rand(at::IntArrayRef size, c10::optional<at::DimnameList> names, at::TensorOptions options = {}) {
  at::AutoDispatchBelowADInplaceOrView guard;
  return autograd::make_variable(at::rand(size, names, at::TensorOptions(options).requires_grad(c10::nullopt)), /*requires_grad=*/options.requires_grad());
}
//...
inline at::Tensor rand(at::IntArrayRef size, c10::optional<at::Generator> generator, c10::optional<at::DimnameList> names, at::TensorOptions options = {}) {
  at::AutoDispatchBelowADInplaceOrView guard;
  return autograd::make_variable(at::rand(size, generator, names, at::TensorOptions(options).requires_grad(c10::nullopt)), /*requires_grad=*/options.requires_grad());
}
//...
inline at::Tensor rand(at::IntArrayRef size, at::TensorOptions options = {}) {
  at::AutoDispatchBelowADInplaceOrView guard;
  return autograd::make_variable(at::rand(size, at::TensorOptions(options).requires_grad(c10::nullopt)), /*requires_grad=*/options.requires_grad());
}
//...
inline at::Tensor rand(at::IntArrayRef size, c10::optional<at::Generator> generator, at::TensorOptions options = {}) {
  at::AutoDispatchBelowADInplaceOrView guard;
  return autograd::make_variable(at::rand(size, generator, at::TensorOptions(options).requires_grad(c10::nullopt)), /*requires_grad=*/options.requires_grad());
}

They come in four flavors:

  • size, names
  • size, generator, names
  • size
  • size, generator

These correspond one-to-one to the four groups in torch/_C/_VariableFunctions.pyi and the four signatures in torch::autograd::THPVariable_rand.

at::rand_symint

#19 0x00007f046abdd595 in at::rand_symint (options=..., size=...) at /root/Documents/pytorch/build/aten/src/ATen/Functions.h:5770

build/aten/src/ATen/Functions.h

// aten::rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
inline at::Tensor rand_symint(c10::SymIntArrayRef size, at::TensorOptions options={}) {
    return at::_ops::rand::call(size, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
}

The at::rand_symint function likewise takes only the size argument (plus the TensorOptions).

The comment spells out the relationship between at::rand_symint and aten::rand; we will meet aten::rand again later in native_functions.yaml.

Just below it, an at::symint::rand function is also defined, with the same implementation as at::rand_symint.

namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, c10::SymInt>::value>>
  at::Tensor rand(c10::SymIntArrayRef size, at::TensorOptions options={}) {
    return at::_ops::rand::call(size, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
  }
}

Below are the rand-family functions that appear in the same file. They fall into six schemas, aten::rand.names, aten::rand.generator_with_names, aten::rand, aten::rand.generator, aten::rand.out, and aten::rand.generator_out, matching one-to-one the six schemas in the comments of torch::autograd::THPVariable_rand.

Each schema is further split in two by the type of the first parameter, at::IntArrayRef or c10::SymIntArrayRef, and in two again by whether the remaining arguments are passed as separate dtype, layout, device, pin_memory parameters or bundled into an at::TensorOptions (for the out schemas, this second split is instead between the out-first rand_out form and the out-last rand_outf form), giving 6 * 2 * 2 = 24 functions in total.

// aten::rand.names(SymInt[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
inline at::Tensor rand(at::IntArrayRef size, c10::optional<at::DimnameList> names, at::TensorOptions options={}) {
    return at::_ops::rand_names::call(c10::fromIntArrayRefSlow(size), names, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, int64_t>::value>>
  at::Tensor rand(at::IntArrayRef size, c10::optional<at::DimnameList> names, at::TensorOptions options={}) {
    return at::_ops::rand_names::call(c10::fromIntArrayRefSlow(size), names, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
  }
}

// aten::rand.names(SymInt[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
inline at::Tensor rand(at::IntArrayRef size, c10::optional<at::DimnameList> names, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    return at::_ops::rand_names::call(c10::fromIntArrayRefSlow(size), names, dtype, layout, device, pin_memory);
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, int64_t>::value>>
  at::Tensor rand(at::IntArrayRef size, c10::optional<at::DimnameList> names, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    return at::_ops::rand_names::call(c10::fromIntArrayRefSlow(size), names, dtype, layout, device, pin_memory);
  }
}

// aten::rand.names(SymInt[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
inline at::Tensor rand_symint(c10::SymIntArrayRef size, c10::optional<at::DimnameList> names, at::TensorOptions options={}) {
    return at::_ops::rand_names::call(size, names, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, c10::SymInt>::value>>
  at::Tensor rand(c10::SymIntArrayRef size, c10::optional<at::DimnameList> names, at::TensorOptions options={}) {
    return at::_ops::rand_names::call(size, names, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
  }
}

// aten::rand.names(SymInt[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
inline at::Tensor rand_symint(c10::SymIntArrayRef size, c10::optional<at::DimnameList> names, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    return at::_ops::rand_names::call(size, names, dtype, layout, device, pin_memory);
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, c10::SymInt>::value>>
  at::Tensor rand(c10::SymIntArrayRef size, c10::optional<at::DimnameList> names, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    return at::_ops::rand_names::call(size, names, dtype, layout, device, pin_memory);
  }
}

// aten::rand.generator_with_names(SymInt[] size, *, Generator? generator, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
inline at::Tensor rand(at::IntArrayRef size, c10::optional<at::Generator> generator, c10::optional<at::DimnameList> names, at::TensorOptions options={}) {
    return at::_ops::rand_generator_with_names::call(c10::fromIntArrayRefSlow(size), generator, names, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, int64_t>::value>>
  at::Tensor rand(at::IntArrayRef size, c10::optional<at::Generator> generator, c10::optional<at::DimnameList> names, at::TensorOptions options={}) {
    return at::_ops::rand_generator_with_names::call(c10::fromIntArrayRefSlow(size), generator, names, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
  }
}

// aten::rand.generator_with_names(SymInt[] size, *, Generator? generator, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
inline at::Tensor rand(at::IntArrayRef size, c10::optional<at::Generator> generator, c10::optional<at::DimnameList> names, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    return at::_ops::rand_generator_with_names::call(c10::fromIntArrayRefSlow(size), generator, names, dtype, layout, device, pin_memory);
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, int64_t>::value>>
  at::Tensor rand(at::IntArrayRef size, c10::optional<at::Generator> generator, c10::optional<at::DimnameList> names, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    return at::_ops::rand_generator_with_names::call(c10::fromIntArrayRefSlow(size), generator, names, dtype, layout, device, pin_memory);
  }
}

// aten::rand.generator_with_names(SymInt[] size, *, Generator? generator, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
inline at::Tensor rand_symint(c10::SymIntArrayRef size, c10::optional<at::Generator> generator, c10::optional<at::DimnameList> names, at::TensorOptions options={}) {
    return at::_ops::rand_generator_with_names::call(size, generator, names, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, c10::SymInt>::value>>
  at::Tensor rand(c10::SymIntArrayRef size, c10::optional<at::Generator> generator, c10::optional<at::DimnameList> names, at::TensorOptions options={}) {
    return at::_ops::rand_generator_with_names::call(size, generator, names, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
  }
}

// aten::rand.generator_with_names(SymInt[] size, *, Generator? generator, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
inline at::Tensor rand_symint(c10::SymIntArrayRef size, c10::optional<at::Generator> generator, c10::optional<at::DimnameList> names, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    return at::_ops::rand_generator_with_names::call(size, generator, names, dtype, layout, device, pin_memory);
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, c10::SymInt>::value>>
  at::Tensor rand(c10::SymIntArrayRef size, c10::optional<at::Generator> generator, c10::optional<at::DimnameList> names, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    return at::_ops::rand_generator_with_names::call(size, generator, names, dtype, layout, device, pin_memory);
  }
}

// aten::rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
inline at::Tensor rand(at::IntArrayRef size, at::TensorOptions options={}) {
    return at::_ops::rand::call(c10::fromIntArrayRefSlow(size), optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, int64_t>::value>>
  at::Tensor rand(at::IntArrayRef size, at::TensorOptions options={}) {
    return at::_ops::rand::call(c10::fromIntArrayRefSlow(size), optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
  }
}

// aten::rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
inline at::Tensor rand(at::IntArrayRef size, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    return at::_ops::rand::call(c10::fromIntArrayRefSlow(size), dtype, layout, device, pin_memory);
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, int64_t>::value>>
  at::Tensor rand(at::IntArrayRef size, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    return at::_ops::rand::call(c10::fromIntArrayRefSlow(size), dtype, layout, device, pin_memory);
  }
}

// aten::rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
inline at::Tensor rand_symint(c10::SymIntArrayRef size, at::TensorOptions options={}) {
    return at::_ops::rand::call(size, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, c10::SymInt>::value>>
  at::Tensor rand(c10::SymIntArrayRef size, at::TensorOptions options={}) {
    return at::_ops::rand::call(size, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
  }
}

// aten::rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
inline at::Tensor rand_symint(c10::SymIntArrayRef size, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    return at::_ops::rand::call(size, dtype, layout, device, pin_memory);
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, c10::SymInt>::value>>
  at::Tensor rand(c10::SymIntArrayRef size, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    return at::_ops::rand::call(size, dtype, layout, device, pin_memory);
  }
}

// aten::rand.generator(SymInt[] size, *, Generator? generator, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
inline at::Tensor rand(at::IntArrayRef size, c10::optional<at::Generator> generator, at::TensorOptions options={}) {
    return at::_ops::rand_generator::call(c10::fromIntArrayRefSlow(size), generator, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, int64_t>::value>>
  at::Tensor rand(at::IntArrayRef size, c10::optional<at::Generator> generator, at::TensorOptions options={}) {
    return at::_ops::rand_generator::call(c10::fromIntArrayRefSlow(size), generator, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
  }
}

// aten::rand.generator(SymInt[] size, *, Generator? generator, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
inline at::Tensor rand(at::IntArrayRef size, c10::optional<at::Generator> generator, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    return at::_ops::rand_generator::call(c10::fromIntArrayRefSlow(size), generator, dtype, layout, device, pin_memory);
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, int64_t>::value>>
  at::Tensor rand(at::IntArrayRef size, c10::optional<at::Generator> generator, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    return at::_ops::rand_generator::call(c10::fromIntArrayRefSlow(size), generator, dtype, layout, device, pin_memory);
  }
}

// aten::rand.generator(SymInt[] size, *, Generator? generator, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
inline at::Tensor rand_symint(c10::SymIntArrayRef size, c10::optional<at::Generator> generator, at::TensorOptions options={}) {
    return at::_ops::rand_generator::call(size, generator, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, c10::SymInt>::value>>
  at::Tensor rand(c10::SymIntArrayRef size, c10::optional<at::Generator> generator, at::TensorOptions options={}) {
    return at::_ops::rand_generator::call(size, generator, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
  }
}

// aten::rand.generator(SymInt[] size, *, Generator? generator, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
inline at::Tensor rand_symint(c10::SymIntArrayRef size, c10::optional<at::Generator> generator, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    return at::_ops::rand_generator::call(size, generator, dtype, layout, device, pin_memory);
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, c10::SymInt>::value>>
  at::Tensor rand(c10::SymIntArrayRef size, c10::optional<at::Generator> generator, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    return at::_ops::rand_generator::call(size, generator, dtype, layout, device, pin_memory);
  }
}

// aten::rand.out(SymInt[] size, *, Tensor(a!) out) -> Tensor(a!)
inline at::Tensor & rand_out(at::Tensor & out, at::IntArrayRef size) {
    return at::_ops::rand_out::call(c10::fromIntArrayRefSlow(size), out);
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, int64_t>::value>>
  at::Tensor & rand_out(at::Tensor & out, at::IntArrayRef size) {
    return at::_ops::rand_out::call(c10::fromIntArrayRefSlow(size), out);
  }
}

// aten::rand.out(SymInt[] size, *, Tensor(a!) out) -> Tensor(a!)
inline at::Tensor & rand_outf(at::IntArrayRef size, at::Tensor & out) {
    return at::_ops::rand_out::call(c10::fromIntArrayRefSlow(size), out);
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, int64_t>::value>>
  at::Tensor & rand_outf(at::IntArrayRef size, at::Tensor & out) {
    return at::_ops::rand_out::call(c10::fromIntArrayRefSlow(size), out);
  }
}

// aten::rand.out(SymInt[] size, *, Tensor(a!) out) -> Tensor(a!)
inline at::Tensor & rand_symint_out(at::Tensor & out, c10::SymIntArrayRef size) {
    return at::_ops::rand_out::call(size, out);
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, c10::SymInt>::value>>
  at::Tensor & rand_out(at::Tensor & out, c10::SymIntArrayRef size) {
    return at::_ops::rand_out::call(size, out);
  }
}

// aten::rand.out(SymInt[] size, *, Tensor(a!) out) -> Tensor(a!)
inline at::Tensor & rand_symint_outf(c10::SymIntArrayRef size, at::Tensor & out) {
    return at::_ops::rand_out::call(size, out);
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, c10::SymInt>::value>>
  at::Tensor & rand_outf(c10::SymIntArrayRef size, at::Tensor & out) {
    return at::_ops::rand_out::call(size, out);
  }
}

// aten::rand.generator_out(SymInt[] size, *, Generator? generator, Tensor(a!) out) -> Tensor(a!)
inline at::Tensor & rand_out(at::Tensor & out, at::IntArrayRef size, c10::optional<at::Generator> generator) {
    return at::_ops::rand_generator_out::call(c10::fromIntArrayRefSlow(size), generator, out);
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, int64_t>::value>>
  at::Tensor & rand_out(at::Tensor & out, at::IntArrayRef size, c10::optional<at::Generator> generator) {
    return at::_ops::rand_generator_out::call(c10::fromIntArrayRefSlow(size), generator, out);
  }
}

// aten::rand.generator_out(SymInt[] size, *, Generator? generator, Tensor(a!) out) -> Tensor(a!)
inline at::Tensor & rand_outf(at::IntArrayRef size, c10::optional<at::Generator> generator, at::Tensor & out) {
    return at::_ops::rand_generator_out::call(c10::fromIntArrayRefSlow(size), generator, out);
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, int64_t>::value>>
  at::Tensor & rand_outf(at::IntArrayRef size, c10::optional<at::Generator> generator, at::Tensor & out) {
    return at::_ops::rand_generator_out::call(c10::fromIntArrayRefSlow(size), generator, out);
  }
}

// aten::rand.generator_out(SymInt[] size, *, Generator? generator, Tensor(a!) out) -> Tensor(a!)
inline at::Tensor & rand_symint_out(at::Tensor & out, c10::SymIntArrayRef size, c10::optional<at::Generator> generator) {
    return at::_ops::rand_generator_out::call(size, generator, out);
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, c10::SymInt>::value>>
  at::Tensor & rand_out(at::Tensor & out, c10::SymIntArrayRef size, c10::optional<at::Generator> generator) {
    return at::_ops::rand_generator_out::call(size, generator, out);
  }
}

// aten::rand.generator_out(SymInt[] size, *, Generator? generator, Tensor(a!) out) -> Tensor(a!)
inline at::Tensor & rand_symint_outf(c10::SymIntArrayRef size, c10::optional<at::Generator> generator, at::Tensor & out) {
    return at::_ops::rand_generator_out::call(size, generator, out);
}
namespace symint {
  template <typename T, typename = std::enable_if_t<std::is_same<T, c10::SymInt>::value>>
  at::Tensor & rand_outf(c10::SymIntArrayRef size, c10::optional<at::Generator> generator, at::Tensor & out) {
    return at::_ops::rand_generator_out::call(size, generator, out);
  }
}

These operators are generated from native_functions.yaml, and functions generated from native_functions.yaml live in the aten namespace by default, which is why every function in the comments carries the aten:: prefix.
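To see those schemas at their source, you can grep native_functions.yaml directly. A rough sketch, assuming a local PyTorch checkout under ~/Documents/pytorch (adjust the path to your environment):

from pathlib import Path

# Assumed location of a PyTorch source checkout; not part of the installed package.
yaml_path = Path.home() / "Documents/pytorch/aten/src/ATen/native/native_functions.yaml"

for line in yaml_path.read_text().splitlines():
    stripped = line.strip()
    # Every operator entry starts with "- func:"; keep only the rand family.
    if stripped.startswith("- func: rand(") or stripped.startswith("- func: rand."):
        print(stripped)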

Next, at::_ops::rand::call is invoked to perform the dispatch, but before stepping into that function, let's take a look at the definition of at::_ops::rand.

dispatch

at::_ops::rand

torch/include/ATen/Operators.h

build/aten/src/ATen/Operators.h

rand is a struct with two static member functions, call and redispatch, both of which will come up shortly.

struct TORCH_API rand {
  using schema = at::Tensor (c10::SymIntArrayRef, c10::optional<at::ScalarType>, c10::optional<at::Layout>, c10::optional<at::Device>, c10::optional<bool>);
  using ptr_schema = schema*;
  // See Note [static constexpr char* members for windows NVCC]
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(name, "aten::rand")
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(overload_name, "")
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(schema_str, "rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor")
  static at::Tensor call(c10::SymIntArrayRef size, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory);
  static at::Tensor redispatch(c10::DispatchKeySet dispatchKeySet, c10::SymIntArrayRef size, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory);
};

STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA declares three static string members on the at::_ops::rand struct, named name, overload_name, and schema_str; depending on the platform, they are given their initial values either right here or later in build/aten/src/ATen/Operators_2.cpp.

Comparing schema against the call member function, we can see that schema is exactly the type signature of call.

The same file contains six rand-family structs (rand_names, rand_generator_with_names, rand, rand_generator, rand_out, rand_generator_out), which match one-to-one the six schemas in the comments of torch::autograd::THPVariable_rand.

struct TORCH_API rand_names {
  using schema = at::Tensor (c10::SymIntArrayRef, c10::optional<at::DimnameList>, c10::optional<at::ScalarType>, c10::optional<at::Layout>, c10::optional<at::Device>, c10::optional<bool>);
  using ptr_schema = schema*;
  // See Note [static constexpr char* members for windows NVCC]
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(name, "aten::rand")
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(overload_name, "names")
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(schema_str, "rand.names(SymInt[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor")
  static at::Tensor call(c10::SymIntArrayRef size, c10::optional<at::DimnameList> names, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory);
  static at::Tensor redispatch(c10::DispatchKeySet dispatchKeySet, c10::SymIntArrayRef size, c10::optional<at::DimnameList> names, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory);
};

struct TORCH_API rand_generator_with_names {
  using schema = at::Tensor (c10::SymIntArrayRef, c10::optional<at::Generator>, c10::optional<at::DimnameList>, c10::optional<at::ScalarType>, c10::optional<at::Layout>, c10::optional<at::Device>, c10::optional<bool>);
  using ptr_schema = schema*;
  // See Note [static constexpr char* members for windows NVCC]
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(name, "aten::rand")
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(overload_name, "generator_with_names")
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(schema_str, "rand.generator_with_names(SymInt[] size, *, Generator? generator, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor")
  static at::Tensor call(c10::SymIntArrayRef size, c10::optional<at::Generator> generator, c10::optional<at::DimnameList> names, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory);
  static at::Tensor redispatch(c10::DispatchKeySet dispatchKeySet, c10::SymIntArrayRef size, c10::optional<at::Generator> generator, c10::optional<at::DimnameList> names, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory);
};

struct TORCH_API rand {
  using schema = at::Tensor (c10::SymIntArrayRef, c10::optional<at::ScalarType>, c10::optional<at::Layout>, c10::optional<at::Device>, c10::optional<bool>);
  using ptr_schema = schema*;
  // See Note [static constexpr char* members for windows NVCC]
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(name, "aten::rand")
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(overload_name, "")
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(schema_str, "rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor")
  static at::Tensor call(c10::SymIntArrayRef size, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory);
  static at::Tensor redispatch(c10::DispatchKeySet dispatchKeySet, c10::SymIntArrayRef size, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory);
};

struct TORCH_API rand_generator {
  using schema = at::Tensor (c10::SymIntArrayRef, c10::optional<at::Generator>, c10::optional<at::ScalarType>, c10::optional<at::Layout>, c10::optional<at::Device>, c10::optional<bool>);
  using ptr_schema = schema*;
  // See Note [static constexpr char* members for windows NVCC]
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(name, "aten::rand")
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(overload_name, "generator")
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(schema_str, "rand.generator(SymInt[] size, *, Generator? generator, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor")
  static at::Tensor call(c10::SymIntArrayRef size, c10::optional<at::Generator> generator, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory);
  static at::Tensor redispatch(c10::DispatchKeySet dispatchKeySet, c10::SymIntArrayRef size, c10::optional<at::Generator> generator, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory);
};

struct TORCH_API rand_out {
  using schema = at::Tensor & (c10::SymIntArrayRef, at::Tensor &);
  using ptr_schema = schema*;
  // See Note [static constexpr char* members for windows NVCC]
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(name, "aten::rand")
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(overload_name, "out")
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(schema_str, "rand.out(SymInt[] size, *, Tensor(a!) out) -> Tensor(a!)")
  static at::Tensor & call(c10::SymIntArrayRef size, at::Tensor & out);
  static at::Tensor & redispatch(c10::DispatchKeySet dispatchKeySet, c10::SymIntArrayRef size, at::Tensor & out);
};

struct TORCH_API rand_generator_out {
  using schema = at::Tensor & (c10::SymIntArrayRef, c10::optional<at::Generator>, at::Tensor &);
  using ptr_schema = schema*;
  // See Note [static constexpr char* members for windows NVCC]
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(name, "aten::rand")
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(overload_name, "generator_out")
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(schema_str, "rand.generator_out(SymInt[] size, *, Generator? generator, Tensor(a!) out) -> Tensor(a!)")
  static at::Tensor & call(c10::SymIntArrayRef size, c10::optional<at::Generator> generator, at::Tensor & out);
  static at::Tensor & redispatch(c10::DispatchKeySet dispatchKeySet, c10::SymIntArrayRef size, c10::optional<at::Generator> generator, at::Tensor & out);
};

at::_ops::rand::call

#18 at::_ops::rand::call (size=..., dtype=..., layout=..., layout@entry=..., device=..., pin_memory=...) at /root/Documents/pytorch/build/aten/src/ATen/Operators_2.cpp:5213

build/aten/src/ATen/Operators_2.cpp

STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA(rand, name, "aten::rand")
STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA(rand, overload_name, "")
STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA(rand, schema_str, "rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor")

In at::_ops::rand, the three members name, overload_name, and schema_str were declared with STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA. Here, STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA sets rand's name member to "aten::rand", its overload_name member to "", and its schema_str member to "rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor". If the values were already assigned inline in build/aten/src/ATen/Operators.h, the out-of-line definitions here are effectively no-ops.
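To make the division of labor between the two macros more concrete, here is a simplified sketch of the pattern as I understand it (my own reconstruction, not the actual macro expansion in the PyTorch sources): on most toolchains the string is defined inline as a constexpr member, while on Windows NVCC it is only declared in the header and defined out of line.

// Simplified sketch of the pattern (hypothetical struct, not the real macros):
struct rand_like_op {
#if defined(_WIN32) && defined(__CUDACC__)
  static const char* name;                           // declaration only
#else
  static constexpr const char* name = "aten::rand";  // inline definition
#endif
};

#if defined(_WIN32) && defined(__CUDACC__)
// Lives in exactly one translation unit, e.g. a generated Operators_N.cpp.
const char* rand_like_op::name = "aten::rand";
#endif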

at::_ops::rand::call looks like this:

// aten::rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
static C10_NOINLINE c10::TypedOperatorHandle<rand::schema> create_rand_typed_handle() {
  return c10::Dispatcher::singleton()
      .findSchemaOrThrow(rand::name, rand::overload_name)
      .typed<rand::schema>();
}

// aten::rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
at::Tensor rand::call(c10::SymIntArrayRef size, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    
    static auto op = create_rand_typed_handle();
    return op.call(size, dtype, layout, device, pin_memory);
}

Here create_rand_typed_handle is called to obtain an operator handle, and the subsequent op.call performs dispatch by table lookup; the table consulted during dispatch is populated from code generated out of native_functions.yaml.

From the comment we can also see that at::_ops::rand::call corresponds one-to-one to the aten::rand entry we will see shortly in native_functions.yaml.

Note that the template parameter rand::schema of c10::TypedOperatorHandle is exactly the type signature of at::_ops::rand::call.

In the snippet c10::Dispatcher::singleton().findSchemaOrThrow(rand::name, rand::overload_name).typed<rand::schema>();, rand::name and rand::overload_name are used to look up the corresponding c10::TypedOperatorHandle.
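Putting these pieces together, the same lookup-and-call pattern could be written by hand roughly as follows. This is only a minimal sketch: the function name call_rand_via_dispatcher is hypothetical, while Dispatcher::findSchemaOrThrow and TypedOperatorHandle::call are the real APIs used by the generated code above.

#include <vector>
#include <ATen/ATen.h>
#include <ATen/core/dispatch/Dispatcher.h>

// Hand-written version of what at::_ops::rand::call does internally.
at::Tensor call_rand_via_dispatcher() {
  static auto op = c10::Dispatcher::singleton()
      .findSchemaOrThrow("aten::rand", /*overload_name=*/"")
      .typed<at::Tensor(c10::SymIntArrayRef,
                        c10::optional<at::ScalarType>,
                        c10::optional<at::Layout>,
                        c10::optional<at::Device>,
                        c10::optional<bool>)>();
  std::vector<c10::SymInt> size{3, 4};
  // Roughly equivalent, at this level, to torch.rand(3, 4) with all defaults.
  return op.call(c10::SymIntArrayRef(size),
                 c10::nullopt, c10::nullopt, c10::nullopt, c10::nullopt);
}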

native_functions.yaml

aten/src/ATen/native/native_functions.yaml

- func: rand.names(SymInt[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
  device_check: NoCheck
  device_guard: False
  dispatch:
    CompositeExplicitAutograd: rand
  autogen: rand.names_out
  tags: nondeterministic_seeded

- func: rand.generator_with_names(SymInt[] size, *, Generator? generator, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
  device_check: NoCheck
  device_guard: False
  tags: nondeterministic_seeded
  dispatch:
    CompositeExplicitAutograd: rand
  autogen: rand.generator_with_names_out

- func: rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
  tags: [core, nondeterministic_seeded]
  dispatch:
    CompositeExplicitAutograd: rand

- func: rand.generator(SymInt[] size, *, Generator? generator, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
  tags: nondeterministic_seeded
  dispatch:
    CompositeExplicitAutograd: rand

- func: rand.out(SymInt[] size, *, Tensor(a!) out) -> Tensor(a!)
  tags: nondeterministic_seeded
  dispatch:
    CompositeExplicitAutograd: rand_out

- func: rand.generator_out(SymInt[] size, *, Generator? generator, Tensor(a!) out) -> Tensor(a!)
  tags: nondeterministic_seeded

native_functions.yaml has six versions of rand:

  • size, names
  • size, generator, names
  • size
  • size, generator
  • size, out
  • size, generator, out

The six corresponding rand-family structs in torch/include/ATen/Operators.h (build/aten/src/ATen/Operators.h) are generated from exactly these native_functions.yaml entries.

Recall the four categories in torch/_C/_VariableFunctions.pyi: "size, generator, names", "size, generator (out optional)", "size (out optional)", and "size, names". Expanding these four categories gives six: "size, generator, names", "size, generator, out", "size, generator", "size, out", "size", and "size, names", which match the six entries here one-to-one.

Recall the six functions in the THPVariable_rand comment: aten::rand.generator_with_names, aten::rand.generator, aten::rand.generator_out, aten::rand, aten::rand.out, and aten::rand.names. They also correspond one-to-one to the six entries here. Because functions declared after func: in native_functions.yaml live in the aten namespace by default, every function in that comment carries the aten:: prefix.

at::_ops::rand::call (corresponding to aten::rand) calls c10::TypedOperatorHandle<rand::schema>::call to perform the dispatch. From the Python bindings we already know that torch.rand(3, 4) maps to the third entry here, the rand without generator, names, or out, so looking it up in native_functions.yaml:

- func: rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
  tags: [core, nondeterministic_seeded]
  dispatch:
    CompositeExplicitAutograd: rand

the dispatch then lands in at::(anonymous namespace)::rand.

at::(anonymous namespace)::rand

#11 0x00007f0462c05329 in at::(anonymous namespace)::rand (pin_memory=..., device=..., layout=..., dtype=..., size=...) at /root/Documents/pytorch/build/aten/src/ATen/RegisterBackendSelect.cpp:365

build/aten/src/ATen/RegisterBackendSelect.cpp

// aten::rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
C10_ALWAYS_INLINE
at::Tensor rand(c10::SymIntArrayRef size, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
  DispatchKeySet _dk = c10::DispatchKeySet(c10::computeDispatchKey(dtype, layout, device));
  return at::_ops::rand::redispatch(
      _dk, size, dtype, layout, device, pin_memory);
}

This performs a second dispatch (redispatch).
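As a hedged illustration of what the BackendSelect kernel computes here: torch.rand(3, 4) supplies no dtype, layout, or device, so computeDispatchKey falls back to the defaults (float32, strided, CPU). A tiny sketch follows; the function name pick_key_for_default_rand is hypothetical, c10::computeDispatchKey is the real helper used above.

#include <c10/core/TensorOptions.h>

c10::DispatchKey pick_key_for_default_rand() {
  // No dtype/layout/device given, so the defaults apply; on a plain CPU build
  // this is expected to be c10::DispatchKey::CPU (assumption).
  return c10::computeDispatchKey(
      /*dtype=*/c10::nullopt, /*layout=*/c10::nullopt, /*device=*/c10::nullopt);
}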

In the same file there is:

namespace at {

namespace {
//...
TORCH_LIBRARY_IMPL(aten, BackendSelect, m) {
  //...
  m.impl("aten::rand.names", TORCH_FN(rand_names));
  m.impl("aten::rand.generator_with_names", TORCH_FN(rand_generator_with_names));
  m.impl("aten::rand", TORCH_FN(rand));
  m.impl("aten::rand.generator", TORCH_FN(rand_generator));
  //...
}

} // namespace
} // at

m.impl("aten::rand", TORCH_FN(rand));這個欄位中,rand全名為at::(anonymous namespace)::rand函數,它與aten::rand在這裡被關聯起來了。

And the comment above at::_ops::rand::call that we just saw also contains aten::rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor, so at::_ops::rand::call and aten::rand end up indirectly tied together in this way.

redispatch

at::_ops::rand::redispatch

#10 at::_ops::rand::redispatch (dispatchKeySet=..., size=..., dtype=..., layout=..., layout@entry=..., device=..., pin_memory=...) at /root/Documents/pytorch/build/aten/src/ATen/Operators_2.cpp:5220

build/aten/src/ATen/Operators_2.cpp

// aten::rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
at::Tensor rand::redispatch(c10::DispatchKeySet dispatchKeySet, c10::SymIntArrayRef size, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    
    static auto op = create_rand_typed_handle();
    return op.redispatch(dispatchKeySet, size, dtype, layout, device, pin_memory);
}

native_functions.yaml

Back in native_functions.yaml for another lookup, focusing on aten::rand:

- func: rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
  tags: [core, nondeterministic_seeded]
  dispatch:
    CompositeExplicitAutograd: rand

Looking at the dispatch field, it contains a key named CompositeExplicitAutograd, meaning that after the redispatch we end up in the CompositeExplicitAutograd backend, i.e. the wrapper_CompositeExplicitAutograd__rand function we will see shortly. The value is rand, and since the default namespace under dispatch: is at::native, the final destination of the redispatch is the at::native::rand function.

CPU kernel

at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeExplicitAutograd__rand

#3  0x00007f0462dd2c28 in at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeExplicitAutograd__rand (pin_memory=..., device=..., layout=..., 
    dtype=..., size=...) at /root/Documents/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:2214

build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp

namespace {
at::Tensor wrapper_CompositeExplicitAutograd__rand(c10::SymIntArrayRef size, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    // No device check
  // DeviceGuard omitted
  return at::native::rand(C10_AS_INTARRAYREF_SLOW(size), dtype, layout, device, pin_memory);
}
} // anonymous namespace
TORCH_LIBRARY_IMPL(aten, CompositeExplicitAutograd, m) {
  // ...
  m.impl("rand",
         TORCH_FN(wrapper_CompositeExplicitAutograd__rand));
  m.impl("rand.out",
         TORCH_FN(wrapper_CompositeExplicitAutograd_out_rand_out));
  m.impl("rand.generator",
         TORCH_FN(wrapper_CompositeExplicitAutograd_generator_rand));
  // ...
};

This is where aten::rand is associated with wrapper_CompositeExplicitAutograd__rand.

And from the comment above at::_ops::rand::redispatch we can see that it is likewise associated with aten::rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor, which is why the redispatch ends up here.

The wrapper_CompositeExplicitAutograd__rand function is also called within the same file:

at::Tensor rand(at::IntArrayRef size, at::TensorOptions options) {
  return wrapper_CompositeExplicitAutograd__rand(c10::fromIntArrayRefSlow(size), optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
}
at::Tensor rand(at::IntArrayRef size, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
  return wrapper_CompositeExplicitAutograd__rand(c10::fromIntArrayRefSlow(size), dtype, layout, device, pin_memory);
}
at::Tensor rand_symint(c10::SymIntArrayRef size, at::TensorOptions options) {
  return wrapper_CompositeExplicitAutograd__rand(size, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
}
at::Tensor rand_symint(c10::SymIntArrayRef size, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
  return wrapper_CompositeExplicitAutograd__rand(size, dtype, layout, device, pin_memory);
}

Note: in PyTorch 1.14 this was the at::(anonymous namespace)::(anonymous namespace)::wrapper__rand function in build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp.

The rand-family wrappers in the same file:

namespace {
at::Tensor wrapper_CompositeExplicitAutograd_names_rand(c10::SymIntArrayRef size, c10::optional<at::DimnameList> names, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    // No device check
  // DeviceGuard omitted
  return at::native::rand(C10_AS_INTARRAYREF_SLOW(size), names, dtype, layout, device, pin_memory);
}
} // anonymous namespace
namespace {
at::Tensor & wrapper_CompositeExplicitAutograd_names_out_rand_out(c10::SymIntArrayRef size, c10::optional<at::DimnameList> names, at::Tensor & out) {
    // No device check
  // DeviceGuard omitted
  return at::native::rand_names_out_symint(size, names, out);
}
} // anonymous namespace
namespace {
at::Tensor wrapper_CompositeExplicitAutograd_generator_with_names_rand(c10::SymIntArrayRef size, c10::optional<at::Generator> generator, c10::optional<at::DimnameList> names, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    // No device check
  // DeviceGuard omitted
  return at::native::rand(C10_AS_INTARRAYREF_SLOW(size), generator, names, dtype, layout, device, pin_memory);
}
} // anonymous namespace
namespace {
at::Tensor & wrapper_CompositeExplicitAutograd_generator_with_names_out_rand_out(c10::SymIntArrayRef size, c10::optional<at::Generator> generator, c10::optional<at::DimnameList> names, at::Tensor & out) {
    // No device check
  // DeviceGuard omitted
  return at::native::rand_generator_with_names_out_symint(size, generator, names, out);
}
} // anonymous namespace
namespace {
at::Tensor wrapper_CompositeExplicitAutograd__rand(c10::SymIntArrayRef size, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    // No device check
  // DeviceGuard omitted
  return at::native::rand(C10_AS_INTARRAYREF_SLOW(size), dtype, layout, device, pin_memory);
}
} // anonymous namespace
namespace {
at::Tensor & wrapper_CompositeExplicitAutograd_out_rand_out(c10::SymIntArrayRef size, at::Tensor & out) {
    // No device check
  // DeviceGuard omitted
  return at::native::rand_out(C10_AS_INTARRAYREF_SLOW(size), out);
}
} // anonymous namespace
namespace {
at::Tensor wrapper_CompositeExplicitAutograd_generator_rand(c10::SymIntArrayRef size, c10::optional<at::Generator> generator, c10::optional<at::ScalarType> dtype, c10::optional<at::Layout> layout, c10::optional<at::Device> device, c10::optional<bool> pin_memory) {
    // No device check
  // DeviceGuard omitted
  return at::native::rand(C10_AS_INTARRAYREF_SLOW(size), generator, dtype, layout, device, pin_memory);
}
} // anonymous namespace

That is seven in total: names_rand, names_out_rand_out, generator_with_names_rand, generator_with_names_out_rand_out, _rand, out_rand_out, and generator_rand. The variant with both generator and out is missing, presumably because rand.generator_out has no dispatch: entry in the native_functions.yaml listing above, so no CompositeExplicitAutograd wrapper is generated for it.

at::native::rand

#2  0x00007f04620d61fc in at::native::rand (size=..., dtype=..., layout=..., layout@entry=..., device=..., pin_memory=...)
    at /root/Documents/pytorch/aten/src/ATen/native/TensorFactories.cpp:781

aten/src/ATen/native/TensorFactories.cpp

// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ rand ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Tensor rand(IntArrayRef size,
    c10::optional<ScalarType> dtype,
    c10::optional<Layout> layout,
    c10::optional<Device> device,
    c10::optional<bool> pin_memory) {
  return native::rand(size, static_cast<c10::optional<Generator>>(c10::nullopt), dtype, layout, device, pin_memory);
}

It creates a nullopt generator and then calls the at::native::rand overload with the fuller signature in the same file.

at::native::rand

#1  0x00007f04620d60b1 in at::native::rand (size=..., generator=..., dtype=..., layout=..., device=..., pin_memory=...)
    at /root/Documents/pytorch/c10/util/Optional.h:204

In the backtrace we jump from at::native::rand straight into c10/util/Optional.h, which is a little puzzling; what follows is my guess at the full call path.

First, at::native::rand passes a static_cast<c10::optional<Generator>>(c10::nullopt) argument, which involves c10::optional.
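As a small sanity check (my own sketch, not code from the source), the static_cast is just an explicit way of constructing an empty c10::optional<Generator>:

#include <ATen/core/Generator.h>
#include <c10/util/Optional.h>

void optional_generator_sketch() {  // hypothetical helper name
  auto g1 = static_cast<c10::optional<at::Generator>>(c10::nullopt);
  c10::optional<at::Generator> g2 = c10::nullopt;  // the plain, implicit form
  // Both are empty: "no generator supplied, fall back to the default one later".
  bool both_empty = !g1.has_value() && !g2.has_value();
  (void)both_empty;  // silence unused-variable warnings
}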

c10::optional

c10/util/Optional.h

template <class T>
class optional : private OptionalBase<T> {
    // ...
};

c10::OptionalBase

c10/util/Optional.h

template <class T>
using OptionalBase = std::conditional_t<
    detail_::is_arrayref<T>::value,
    arrayref_optional_base<T>,
    std::conditional_t<
        std::is_trivially_destructible<T>::value &&
            C10_IS_TRIVIALLY_COPYABLE(T) &&
            // Avoid using is_trivially_copy_{constructible,assignable}
            // because old GCC versions don't support them. Also,
            // is_trivially_copyable seems not to do what I expect, so check
            // trivially_copyable_optimization_optional_base directly.
            std::is_copy_constructible<
                trivially_copyable_optimization_optional_base<T>>::value &&
            std::is_copy_assignable<
                trivially_copyable_optimization_optional_base<T>>::value,
        trivially_copyable_optimization_optional_base<T>,
        std::conditional_t<
            std::is_trivially_destructible<T>::value, // if possible
            constexpr_optional_base<std::remove_const_t<T>>, // use base with
                                                             // trivial
                                                             // destructor
            optional_base<std::remove_const_t<T>>>>>;

Among these alternatives, constexpr_optional_base is the one used here.
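To make the std::conditional_t chain above more concrete, here is a minimal, self-contained sketch of the same compile-time selection pattern; the two stand-in types are hypothetical placeholders, not the real c10 base classes:

#include <string>
#include <type_traits>

// Hypothetical stand-ins for constexpr_optional_base<T> / optional_base<T>.
struct constexpr_base_standin {};
struct full_base_standin {};

// Same idea as OptionalBase: pick a base type at compile time from traits of T
// (reduced here to a single trait for clarity).
template <class T>
using StorageFor = std::conditional_t<
    std::is_trivially_destructible<T>::value,
    constexpr_base_standin,
    full_base_standin>;

static_assert(std::is_same<StorageFor<double>, constexpr_base_standin>::value,
              "trivially destructible payload -> lightweight base");
static_assert(std::is_same<StorageFor<std::string>, full_base_standin>::value,
              "non-trivial payload -> base with a real destructor");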

c10::constexpr_optional_base

c10/util/Optional.h

template <class T>
struct constexpr_optional_base {
  bool init_;
  constexpr_storage_t<T> storage_;
  // ...
};

constexpr_storage_t

c10/util/Optional.h

template <class T>
union constexpr_storage_t {
  unsigned char dummy_;
  T value_;

#if __cplusplus >= 202002L
  // C++20 lifted the requirement to initialize a union member in order to be
  // constexpr.
  constexpr constexpr_storage_t(trivial_init_t) noexcept {
    new (&dummy_) unsigned char;
  }
#else
  constexpr constexpr_storage_t(trivial_init_t) noexcept : dummy_() {}
#endif

  template <class... Args>
  constexpr constexpr_storage_t(Args&&... args)
      : value_(constexpr_forward<Args>(args)...) {}

  ~constexpr_storage_t() = default;
};

Judging from the backtrace, the #else branch is taken here and execution then goes straight into at::empty; I still do not fully understand the mechanics of this jump.

at::native::rand

Below is the function definition I located by matching the signature at the call site:

aten/src/ATen/native/TensorFactories.cpp

Tensor rand(IntArrayRef size, c10::optional<Generator> generator,
    c10::optional<ScalarType> dtype,
    c10::optional<Layout> layout,
    c10::optional<Device> device,
    c10::optional<bool> pin_memory) {
  // See [Note: hacky wrapper removal for TensorOptions]
  TensorOptions options = TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory(pin_memory);

  auto result = at::empty(size, options);
  return result.uniform_(0, 1, std::move(generator));
}

It first calls at::empty to obtain an at::Tensor, then uses at::Tensor::uniform_ to fill the tensor's elements with samples from a uniform distribution. From here execution moves into at::empty and uniform_, and the journey of the rand function ends.
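As a closing sketch (my own reconstruction of what this body boils down to for torch.rand(3, 4), not code from the PyTorch sources; the function name rand_3x4_sketch is hypothetical, at::empty and Tensor::uniform_ are the real APIs):

#include <ATen/ATen.h>

at::Tensor rand_3x4_sketch() {
  // Allocate an uninitialized 3x4 float tensor ...
  at::Tensor t = at::empty({3, 4}, at::TensorOptions().dtype(at::kFloat));
  // ... then fill it in place with samples from U(0, 1) using the default generator.
  t.uniform_(0.0, 1.0);
  return t;
}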
