Basic implementation:
1) Basic implementation of NestedTensorImpl
1.1) A NestedTensor is simply a Tensor whose dispatch key set contains the NestedTensor key.
1.2) For contiguity, only forward (row-major) contiguity is supported.
1.3) One large contiguous buffer is allocated, and each component Tensor's memory is copied into that buffer tensor. The per-component metadata is recorded in nested_sizes_, nested_strides_, and nested_offsets_, and these same fields are used to decide whether each component Tensor, and the buffer as a whole, is contiguous.
1.4) In opt_sizes_, only the dim-0 entry (the number of component tensors) is always reliable; the entry for any other dim is valid only when every component Tensor has the same size along that dim, and is -1 otherwise.
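The buffer-plus-metadata layout in 1.3) and the opt_sizes_ rule in 1.4) can be sketched in plain Python (no torch); this is a toy model, not the actual C++ implementation, with names chosen to mirror the nested_sizes_ / nested_strides_ / nested_offsets_ / opt_sizes_ fields:

```python
# Toy model of NestedTensorImpl's metadata over one contiguous buffer.
def pack(tensors):
    """tensors: list of (shape, flat_data) pairs, all with the same ndim."""
    buffer, sizes, strides, offsets = [], [], [], []
    for shape, data in tensors:
        offsets.append(len(buffer))      # where this component starts
        buffer.extend(data)              # copy into the shared buffer
        sizes.append(list(shape))
        # forward (row-major) contiguous strides for this component
        stride, s = [], 1
        for d in reversed(shape):
            stride.append(s)
            s *= d
        strides.append(list(reversed(stride)))
    # opt_sizes: dim 0 is the tensor count; any other dim is only
    # meaningful when all components agree on it, else -1
    ndim = len(sizes[0])
    opt_sizes = [len(sizes)]
    for d in range(ndim):
        vals = {s[d] for s in sizes}
        opt_sizes.append(vals.pop() if len(vals) == 1 else -1)
    return buffer, sizes, strides, offsets, opt_sizes
```

For two components of shape (2, 3) and (2, 2), dim 1 agrees (both 2) but dim 2 does not, so opt_sizes comes out as [2, 2, -1].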
2) Basic path from the Python API down to C++:
torch/nested/__init__.py
2.1) torch.nested.as_nested_tensor: constructs a nested tensor, preserving autograd history, from tensor_list, a list of tensors with the same ndim. Tensors in the list are always copied by this function due to current nested tensor semantics.
Calls: torch._nested_tensor_from_tensor_list(tensor_list, dtype, None, device, None)
2.1.1) All Tensor data in the NestedNode is copied into one large buffer tensor; each Tensor in the NestedNode is first made contiguous and then copied. Contiguous, type-consistent CPU tensors can be copied with concurrent memcpy; all other cases fall back to reshape + cat. The buffer is a device-side tensor.
2.1.2) sizes: the result has ndim() + 1 dimensions. The sizes tensor's first dim records the NestedNode's size, i.e. the number of tensors, and its second dim is ndim(), recording each component Tensor's shape. The sizes tensor lives on the CPU.
2.1.3) Exactly how the autograd history is preserved here is still unclear to me.
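The "make contiguous, then copy into one buffer, plus a (num_tensors x ndim) sizes record" flow of 2.1.1) and 2.1.2) can be sketched as follows; this is a plain-Python approximation of the described copy path, not the real _nested_tensor_from_tensor_list, and the view representation (shape, strides, storage) is an assumption for illustration:

```python
def gather_contiguous(shape, strides, storage, offset=0):
    # Walk a (possibly non-contiguous) strided view and emit its elements
    # in row-major order: the "make contiguous first" step before copying.
    if not shape:
        return [storage[offset]]
    out = []
    for i in range(shape[0]):
        out.extend(gather_contiguous(shape[1:], strides[1:], storage,
                                     offset + i * strides[0]))
    return out

def from_tensor_list(views):
    """views: list of (shape, strides, storage) triples with equal ndim."""
    buffer, sizes = [], []
    for shape, strides, storage in views:
        # conceptually reshape(-1) + cat into the shared buffer
        buffer.extend(gather_contiguous(shape, strides, storage))
        # one row per component: sizes record is (num_tensors x ndim)
        sizes.append(list(shape))
    return buffer, sizes
```

For example, a transposed 2x3 matrix (shape (3, 2), strides (1, 3)) is gathered into row-major order before being appended to the buffer, while its recorded size stays (3, 2).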
2.2) torch.nested.to_padded_tensor(nt, 0.0): returns a new (non-nested) Tensor by padding the input nested tensor. It always copies the underlying data, since nested and non-nested tensors differ in memory layout.
Call path: dispatched to the NestedTensor_to_padded_tensor_cuda operator
CUDA kernel entry to examine: add_padding_kernelLauncher
(to be filled in)
CPU implementation to examine: NestedTensor_to_padded_tensor_generic
(to be filled in)
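While the kernel details above are still to be filled in, the semantics of the padding step can be sketched in plain Python: the output shape is the tensor count followed by the elementwise max over component shapes, and every slot not covered by a component holds the padding value. This is a toy model of the behavior, not the generic/CUDA implementation:

```python
def to_padded(components, padding=0.0):
    """components: list of 2-D nested lists (rows x cols, possibly ragged)."""
    max_r = max(len(c) for c in components)
    max_c = max(len(row) for c in components for row in c)
    out = []
    for c in components:
        # start from an all-padding matrix of the common max shape
        mat = [[padding] * max_c for _ in range(max_r)]
        # then copy each component's data into the top-left corner
        for i, row in enumerate(c):
            for j, v in enumerate(row):
                mat[i][j] = v
        out.append(mat)
    return out
```

E.g. padding the components [[1, 2]] (1x2) and [[3], [4]] (2x1) with 0.0 yields two 2x2 matrices, which is why the result always copies: the padded layout has no slots shared with the nested buffer.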
2.3) torch.nested.nested_tensor([a, b], requires_grad=True): constructs a nested tensor with no autograd history from a list in which every element has the same dimensionality.
Call path: torch/csrc/autograd/python_nested_functions_manual.cpp
2.3.1) The nested_tensor_ctor function detaches the original tensors, so gradients are not shared with the inputs.
2.3.2) The dtype, device, etc. passed in do not necessarily take effect; the final values are set from the tensors' own attributes.
2.3.3) The inputs may not themselves be NestedTensors.