# Learning the TH Library (2): A High-Level (Simplified) Look at THTensorApply


The TensorApply family of macros is the core machinery TH uses to implement element-wise tensor operations: it takes an operation defined on scalars and applies it to every element of one or more tensors. On the GPU side, the counterpart is essentially a map operation. The general strategy is to process the memory-contiguous parts first and only then handle the non-contiguous parts, which improves the CPU cache hit rate.

A few notes of my own before reading the original comment:

1. The scan starts from the last (innermost) dimension, because by the usual stride/size relationship, the innermost dimension of a contiguous tensor has stride 1; we keep absorbing dimensions outward for as long as they stay contiguous. Tensor B is then made up of one contiguous chunk A after another.

2. We loop over Tensor B starting from its outermost dimension; every element fetched from B is the start of a contiguous chunk A, which can be processed with a plain linear loop.

For example, narrowing the contiguous 2x4 matrix

    1 2 3 4
    5 6 7 8

to its last three columns gives the non-contiguous view

    2 3 4
    6 7 8

whose rows are still contiguous runs of length 3. Roughly speaking, two helper macros do the bookkeeping: `__TH_TENSOR_APPLYX_PREAMBLE(TYPE, TENSOR, DIM, ALLOW_CONTIGUOUS)` sets up the data pointer, sizes, strides, and counters, and `__TH_TENSOR_APPLYX_UPDATE_COUNTERS(TENSOR, ALWAYS_UPDATE)` advances the counters after each contiguous chunk has been consumed.
For reference, here is the original comment from TH's `THTensorApply.h`, reproduced verbatim:

```c
/*
* The basic strategy for apply is as follows:
*
* 1. Starting with the outermost index, loop until we reach a dimension where the
* data is no longer contiguous, i.e. the stride at that dimension is not equal to
* the size of the tensor defined by the outer dimensions. Let's call this outer
* (contiguous) tensor A. Note that if the Tensor is contiguous, then A is equal
* to the entire Tensor. Let's call the inner tensor B.
*
* 2. We loop through the indices in B, starting at its outermost dimension. For
* example, if B is a 2x2 matrix, then we do:
*
* B[0][0]
* B[0][1]
* B[1][0]
* B[1][1]
*
* We set the offset into the underlying storage as (storageOffset + stride_B * index_B),
* i.e. basically we compute the offset into the storage as we would normally for a
* Tensor. But because we are guaranteed the subsequent data is contiguous in memory, we
* can simply loop for sizeof(A) iterations and perform the operation, without having to
* follow the order described by the strides of A.
*
* 3. As an optimization, we merge dimensions of A that are contiguous in memory. For
* example, if A is a 3x3x3x3 tensor narrowed from a 3x3x4x3 tensor, then the first two
* dimensions can be merged for the purposes of APPLY, reducing the number of nested
* loops.
 */
```


