Torch Learning Notes (2)

This post covers the basic modules and containers used to build neural networks in Torch, including the forward and backward methods of Module, how gradients are computed and parameters are updated, specific layers such as linear layers, activation functions and Dropout, and Containers such as Sequential and ParallelTable.
  1. Modules are the bricks used to build neural networks. Each is itself a neural network, but can be combined with other networks using containers to create complex neural networks
  2. Given an input and a target, a Criterion computes a loss and a gradient according to a given loss function
  3. Module

    • Module is an abstract class which defines the fundamental methods necessary for training a neural network. Modules are serializable
    • [output] forward(input): Takes an input object and computes the corresponding output of the module. After a forward(), the output state variable should have been updated to the new value. It is not advised to override this function; instead, one should implement the updateOutput(input) function, which forward() in the abstract parent class Module will call
    • [gradInput] backward(input, gradOutput): Performs a backpropagation step through the module, with respect to the given input. It is not advised to override this function in custom classes; it is better to override the updateGradInput(input, gradOutput) and accGradParameters(input, gradOutput, scale) functions.
    • updateOutput(input)
    • [gradInput] updateGradInput(input, gradOutput)
    • accGradParameters(input, gradOutput, scale) : Computes the gradient of the module with respect to its own parameters. scale is a scale factor that is multiplied with the gradParameters before being accumulated. Zeroing this accumulation is achieved with zeroGradParameters(), and updating the parameters according to this accumulation is done with updateParameters(). The module is expected to accumulate the gradients with respect to the parameters in some variable.
    • zeroGradParameters()
    • updateParameters(learningRate): parameters = parameters - learningRate * gradients_wrt_parameters
    • accUpdateGradParameters(input, gradOutput, learningRate): Calculates and accumulates the gradients with respect to the weights after multiplying them by the negative of the learning rate learningRate.
    function Module:accUpdateGradParameters(input, gradOutput, lr)
       -- temporarily alias the gradient buffers to the parameters themselves ...
       local gradWeight = self.gradWeight
       local gradBias = self.gradBias
       self.gradWeight = self.weight
       self.gradBias = self.bias
       -- ... so that accumulating with scale -lr writes
       -- weight <- weight - lr * dL/dweight directly into the parameters
       self:accGradParameters(input, gradOutput, -lr)
       -- restore the real gradient buffers
       self.gradWeight = gradWeight
       self.gradBias = gradBias
    end

    As can be seen, the gradients are accumulated directly into the weights: by temporarily pointing gradWeight/gradBias at the parameters, the accumulation with scale -lr becomes the parameter update itself. This trick relies on the update being a plain accumulation, so the assumption may not hold for a module whose parameter update is a nonlinear operation.
    For the gradient descent algorithm itself, see the earlier post 神经网络模型随机梯度下降法—简单实现与Torch应用 (stochastic gradient descent for neural network models: a simple implementation and application in Torch).
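
    A minimal sketch of how these methods fit together in a hand-written SGD step (the toy data, the Linear layer and the MSECriterion are arbitrary choices for illustration):

    require 'nn'

    local x = torch.randn(10)                -- toy input
    local y = torch.Tensor{1}                -- toy target
    local mlp = nn.Linear(10, 1)
    local criterion = nn.MSECriterion()

    local output = mlp:forward(x)            -- calls updateOutput(input)
    local loss = criterion:forward(output, y)
    local gradOutput = criterion:backward(output, y)
    mlp:zeroGradParameters()                 -- clear previously accumulated gradients
    mlp:backward(x, gradOutput)              -- updateGradInput + accGradParameters
    mlp:updateParameters(0.01)               -- parameters = parameters - 0.01 * gradients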

    • share(mlp,s1,s2,…,sn)
    • clone(mlp,…)
    • type(type[, tensorCache])
    • float([tensorCache])
    • double([tensorCache])
    • cuda([tensorCache])
    • State Variables
      • output
      • gradInput
    • Parameters and gradients w.r.t parameters
      • [{weights}, {gradWeights}] parameters()
      • [flatParameters, flatGradParameters] getParameters(): returns all of the module's parameters and their gradients flattened into two contiguous Tensors (see the sketch after this list)
      • training()
      • evaluate()
      • findModules(typename)
      • listModules()
      • clearState()
      • apply(function)
      • replace(function)
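
    A minimal sketch of getParameters(), which flattens all parameters and gradients into two contiguous Tensors (the network here is arbitrary; this flattened form is what optimization code typically works with):

    require 'nn'

    local mlp = nn.Sequential()
    mlp:add(nn.Linear(10, 25))
    mlp:add(nn.Tanh())
    mlp:add(nn.Linear(25, 1))

    -- all weights/biases and their gradients, flattened into two Tensors
    local flatParameters, flatGradParameters = mlp:getParameters()
    print(flatParameters:nElement())   -- 10*25 + 25 + 25*1 + 1 = 301

    mlp:training()   -- switch to training behaviour (affects e.g. nn.Dropout)
    mlp:evaluate()   -- switch to evaluation behaviour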
  4. Containers

    • Container
      • add(module)
      • get(index)
      • size()
    • Sequential

      • Creating a multi-layer perceptron with one hidden layer:
      mlp = nn.Sequential()
      mlp:add( nn.Linear(10, 25) ) -- 10 input, 25 hidden units
      mlp:add( nn.Tanh() ) -- some hyperbolic tangent transfer function
      mlp:add( nn.Linear(25, 1) ) -- 1 output
      
      print(mlp:forward(torch.randn(10)))

      The resulting network:

      nn.Sequential {
        [input -> (1) -> (2) -> (3) -> output]
        (1): nn.Linear(10 -> 25)
        (2): nn.Tanh
        (3): nn.Linear(25 -> 1)
      }

      Result (the exact value depends on the random input):

      -0.1815
      [torch.Tensor of dimension 1]
    • remove([index])
    • insert(module, [index])
    • module = nn.Concat(dim)
    • module = nn.DepthConcat(dim)
    • module = nn.ConcatTable(): applies each member module to the same input and outputs a table of the results (see the sketch after this list)
    • module = nn.ParallelTable(): applies the i-th member module to the i-th element of the input table and outputs a table of the results (see the sketch after this list)
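
      A minimal sketch of ConcatTable versus ParallelTable (the layer sizes are arbitrary):

      require 'nn'

      -- ConcatTable: every branch receives the same input, output is a table
      local ct = nn.ConcatTable()
      ct:add(nn.Linear(5, 2))
      ct:add(nn.Linear(5, 3))
      local out = ct:forward(torch.randn(5))     -- {Tensor of size 2, Tensor of size 3}

      -- ParallelTable: the i-th branch receives the i-th input Tensor
      local pt = nn.ParallelTable()
      pt:add(nn.Linear(5, 2))
      pt:add(nn.Linear(3, 2))
      local out2 = pt:forward({torch.randn(5), torch.randn(3)})  -- {Tensor of size 2, Tensor of size 2}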
  5. Transfer functions
    • HardTanh
    • HardShrink
    • SoftShrink
    • SoftMax
    • SoftMin
    • SoftPlus
    • SoftSign
    • LogSigmoid
    • LogSoftMax
    • Sigmoid
    • Tanh
    • ReLU
    • ReLU6
    • PReLU
    • RReLU
    • ELU
    • LeakyReLU
    • SpatialSoftMax
    • AddConstant
    • MulConstant
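
    These transfer functions are themselves Modules applied element-wise; a small sketch (the input is random, so the printed numbers will vary):

    require 'nn'

    local x = torch.randn(4)
    print(nn.Tanh():forward(x))           -- element-wise tanh
    print(nn.ReLU():forward(x))           -- element-wise max(0, x)
    print(nn.AddConstant(1):forward(x))   -- element-wise x + 1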
  6. Simple layers
    • Parameterized Modules
      • Linear
      • SparseLinear
      • Bilinear
      • PartialLinear
      • Add
      • Mul
      • CMul
      • Euclidean
      • WeightedEuclidean
      • Cosine
    • Modules that adapt basic Tensor methods
      • Copy
      • Narrow
      • Replicate
      • Reshape
      • View
      • Select
      • MaskedSelect
      • Index
      • Squeeze
      • Unsqueeze
      • Transpose
    • Modules that adapt mathematical Tensor methods
      • AddConstant
      • MulConstant
      • Max
      • Min
      • Mean
      • Sum
      • Exp
      • Log
      • Abs
      • Power
      • Square
      • Sqrt
      • Clamp
      • Normalize
      • MM
    • Miscellaneous Module
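
    A few of these simple layers in action (the sizes are arbitrary choices for illustration):

    require 'nn'

    local x = torch.randn(2, 10)
    print(nn.Linear(10, 3):forward(x))                 -- affine map, 2x10 -> 2x3
    print(nn.Reshape(5, 2):forward(torch.randn(10)))   -- 10 -> 5x2
    print(nn.View(-1):forward(torch.randn(2, 5)))      -- flatten to 10 elements
    print(nn.Mean(1):forward(x))                       -- mean over dimension 1, 2x10 -> 10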
  7. Table layers
    • Table Container Modules encapsulate sub-Modules
      • ConcatTable
      • ParallelTable
    • Table Conversion Modules convert between tables and Tensors (or between tables)
      • SplitTable
      • JoinTable
      • MixtureTable
      • SelectTable
      • NarrowTable
      • FlattenTable
    • Pair Modules compute a measure like distance or similarity from a pair (table) of input Tensors
      • PairwiseDistance
      • DotProduct
      • CosineDistance
    • CMath Modules perform element-wise operations on a table of Tensors
      • CAddTable
      • CSubTable
      • CMulTable
      • CDivTable
    • Table of Criteria
      • CriterionTable
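
    A small sketch of a few table layers (the sizes are arbitrary):

    require 'nn'

    local a, b = torch.randn(5), torch.randn(5)

    print(nn.CAddTable():forward({a, b}))          -- element-wise a + b
    print(nn.CosineDistance():forward({a, b}))     -- cosine similarity of the pair

    local t = nn.SplitTable(1):forward(torch.randn(2, 5))  -- 2x5 Tensor -> table of two 5-element Tensors
    print(nn.JoinTable(1):forward(t))                       -- table -> 10-element Tensor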
  8. Convolution layers
  9. Criterions