
TH Library Study (Part 2): A High-Level Look at THTensorApply (Simplified)


The TensorApply family of macros is the core mechanism TH uses to implement element-wise tensor operations: they take an operation defined on scalars and apply it to every element of one or more tensors. On the GPU side the equivalent is a map operation. The general strategy is to process the memory-contiguous portion of the tensor first and only then the non-contiguous portion, which improves the CPU cache hit rate.
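As a rough illustration of this contiguous-first idea (a hypothetical sketch, not the real TH_TENSOR_APPLY macro, which is far more involved), consider applying an operation over a strided 1-D view: when the stride is 1 the elements sit next to each other in memory, so a plain sequential loop maximizes cache hits.

```c
#include <stddef.h>

/* Example scalar op used below (illustrative). */
static double add_one(double x) { return x + 1.0; }

/* Hypothetical sketch, NOT the actual TH macro: apply op() to every
 * element of a strided 1-D view of `data`. */
static void apply1d(double *data, long size, long stride,
                    double (*op)(double)) {
  if (stride == 1) {
    /* Contiguous fast path: sequential access, cache friendly. */
    for (long i = 0; i < size; i++)
      data[i] = op(data[i]);
  } else {
    /* Strided path: hops through memory, worse cache behaviour. */
    for (long i = 0; i < size; i++)
      data[i * stride] = op(data[i * stride]);
  }
}
```

The real macros do the same thing in n dimensions: they first identify how large the contiguous run is, then spend as much time as possible in the stride-1 style of loop.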

/*
My notes on the strategy (summarized; see the original comment below):

1. Start looping from the outermost dimension and move inward until the
   data stops being contiguous. (By the usual stride/size formulas, a
   contiguous tensor's innermost dimension always has stride 1, so for a
   fully contiguous tensor the scan covers the whole thing.)

   Tensor B is made up of one contiguous block A after another.

2. Then we loop over the indices of Tensor B; every element we take from
   Tensor B is the start of a contiguous block A, which can be processed
   with a simple inner loop.

Example: narrowing the contiguous 2x4 tensor

1 2 3 4
5 6 7 8

down to the 2x3 tensor

2 3 4
6 7 8

leaves each row contiguous (an A block of 3 elements) while the tensor
as a whole is no longer contiguous.

The two helper macros involved:

#define __TH_TENSOR_APPLYX_PREAMBLE(TYPE, TENSOR, DIM, ALLOW_CONTIGUOUS)

#define __TH_TENSOR_APPLYX_UPDATE_COUNTERS(TENSOR, ALWAYS_UPDATE)
*/
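The two notes above can be sketched as code. The following is an illustrative C rendering (all names here are made up; the real macros generate this logic inline): the non-contiguous tensor B is walked with an odometer-style counter, and each B index marks the start of a contiguous A block that is processed in a tight loop.

```c
#include <stddef.h>

static double add_one(double x) { return x + 1.0; }  /* example op */

/* Illustrative sketch, not the real TH macro. The `b_dims` outer
 * dimensions form the non-contiguous tensor B (assumed <= 8 here);
 * the innermost `inner_size` elements (tensor A) are known to be
 * contiguous, so the hot loop runs over them directly. */
static void apply_nd(double *storage, long storage_offset,
                     const long *b_sizes, const long *b_strides,
                     int b_dims, long inner_size, double (*op)(double)) {
  long counter[8] = {0};  /* index_B, one slot per outer dimension */
  long total = 1;
  for (int d = 0; d < b_dims; d++) total *= b_sizes[d];

  for (long n = 0; n < total; n++) {
    /* offset = storageOffset + sum_d stride_B[d] * index_B[d] */
    long off = storage_offset;
    for (int d = 0; d < b_dims; d++) off += b_strides[d] * counter[d];

    double *p = storage + off;
    for (long i = 0; i < inner_size; i++)  /* contiguous A block */
      p[i] = op(p[i]);

    /* Advance the counter like an odometer, innermost B dim first. */
    for (int d = b_dims - 1; d >= 0; d--) {
      if (++counter[d] < b_sizes[d]) break;
      counter[d] = 0;
    }
  }
}
```

With the 2x3-narrowed-from-2x4 example above, B has size 2 with stride 4, the storage offset is 1, and each A block is 3 contiguous elements.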
// ##########################################################
/*
* The basic strategy for apply is as follows:
*
* 1. Starting with the outermost index, loop until we reach a dimension where the
* data is no longer contiguous, i.e. the stride at that dimension is not equal to
* the size of the tensor defined by the outer dimensions. Let's call this outer
* (contiguous) tensor A. Note that if the Tensor is contiguous, then A is equal
* to the entire Tensor. Let's call the inner tensor B.
*
* 2. We loop through the indices in B, starting at its outermost dimension. For
* example, if B is a 2x2 matrix, then we do:
*
* B[0][0]
* B[0][1]
* B[1][0]
* B[1][1]
*
* We set the offset into the underlying storage as (storageOffset + stride_B * index_B),
* i.e. basically we compute the offset into the storage as we would normally for a
* Tensor. But because we are guaranteed the subsequent data is contiguous in memory, we
* can simply loop for sizeof(A) iterations and perform the operation, without having to
* follow the order described by the strides of A.
*
* 3. As an optimization, we merge dimensions of A that are contiguous in memory. For
* example, if A is a 3x3x3x3 tensor narrowed from a 3x3x4x3 tensor, then the first two
* dimensions can be merged for the purposes of APPLY, reducing the number of nested
* loops.
*/
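Step 3 above, the dimension-merging optimization, can be sketched as a standalone function (illustrative only; in TH this happens inside the macro preamble). Two adjacent dimensions d and d+1 can be fused exactly when stride[d] == size[d+1] * stride[d+1], i.e. when stepping once along d is the same as walking all the way across d+1.

```c
/* Illustrative sketch of APPLY-style dimension merging.  Fuses adjacent
 * dimensions in place and returns the new number of dimensions.
 * Assumes dims >= 1. */
static int merge_dims(long *sizes, long *strides, int dims) {
  int out = 0;
  for (int d = 1; d < dims; d++) {
    if (strides[out] == sizes[d] * strides[d]) {
      /* Contiguous pair: collapse dim d into dim `out`. */
      sizes[out] *= sizes[d];
      strides[out] = strides[d];
    } else {
      /* Non-contiguous boundary: keep dim d as a separate dimension. */
      out++;
      sizes[out] = sizes[d];
      strides[out] = strides[d];
    }
  }
  return out + 1;
}
```

For the comment's example, a 3x3x3x3 tensor narrowed from a 3x3x4x3 one has strides {36, 12, 3, 1}: the first two dimensions merge (36 == 3*12) and so do the last two (3 == 3*1), leaving a 9x9 view and only two nested loops.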


