- Modules are the bricks used to build neural networks. Each module is itself a neural network, but modules can be combined with other networks using containers to create complex neural networks.
- Criterions compute a loss value and a gradient according to a given loss function, given an input and a target.
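A minimal sketch of how a criterion is driven (assuming the nn package is installed; nn.MSECriterion is used purely as an example):

```lua
require 'nn'

local crit = nn.MSECriterion()          -- mean squared error, as an example
local input = torch.randn(5)            -- e.g. the output of some module
local target = torch.randn(5)

local loss = crit:forward(input, target)        -- scalar loss value
local gradInput = crit:backward(input, target)  -- gradient of the loss w.r.t. the input
print(loss, gradInput:size(1))
```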
Module
- Module is an abstract class which defines the fundamental methods necessary for training a neural network. Modules are serializable.
- [output] forward(input): Takes an input object and computes the corresponding output of the module. After a forward(), the output state variable should have been updated to the new value. It is not advised to override this function. Instead, one should implement updateOutput(input); forward() in the abstract parent class Module will call updateOutput(input).
- [gradInput] backward(input, gradOutput): Performs a backpropagation step through the module, with respect to the given input. It is not advised to override this function in custom classes. It is better to override the updateGradInput(input, gradOutput) and accGradParameters(input, gradOutput, scale) functions.
- updateOutput(input)
- [gradInput] updateGradInput(input, gradOutput)
- accGradParameters(input, gradOutput, scale): Computes the gradient of the module with respect to its own parameters. scale is a scale factor that is multiplied with the gradParameters before being accumulated. Zeroing this accumulation is achieved with zeroGradParameters(), and updating the parameters according to this accumulation is done with updateParameters(). The module is expected to accumulate the gradients with respect to the parameters in some variable.
- zeroGradParameters()
- updateParameters(learningRate): parameters = parameters - learningRate * gradients_wrt_parameters
- accUpdateGradParameters(input, gradOutput, learningRate): Calculates and accumulates the gradients with respect to the weights after multiplying them with the negative of the learning rate learningRate. The default implementation in Module is:
```lua
function Module:accUpdateGradParameters(input, gradOutput, lr)
   local gradWeight = self.gradWeight
   local gradBias = self.gradBias
   self.gradWeight = self.weight
   self.gradBias = self.bias
   self:accGradParameters(input, gradOutput, -lr)
   self.gradWeight = gradWeight
   self.gradBias = gradBias
end
```
As can be seen, the gradients are accumulated directly into the weights: gradWeight and gradBias are temporarily pointed at weight and bias, so accGradParameters(input, gradOutput, -lr) adds -lr times the parameter gradient straight into the parameters, i.e. it performs one SGD step in place. This relies on the gradient computation not depending on the parameters being overwritten; that assumption may not hold for a module whose parameter gradient is a nonlinear function of its own parameters.
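A minimal sketch of this equivalence (assuming the nn package and a plain nn.Linear layer): one fused accUpdateGradParameters() step should leave the same weights as an explicit accGradParameters() followed by updateParameters().

```lua
require 'nn'

local lr = 0.1
local input = torch.randn(10)
local gradOutput = torch.randn(1)

local lin1 = nn.Linear(10, 1)
local lin2 = lin1:clone()               -- identical copy of weights and bias

-- explicit two-step update
lin1:zeroGradParameters()
lin1:forward(input)
lin1:accGradParameters(input, gradOutput, 1)
lin1:updateParameters(lr)

-- fused update: gradients scaled by -lr are written straight into the weights
lin2:forward(input)
lin2:accUpdateGradParameters(input, gradOutput, lr)

print((lin1.weight - lin2.weight):abs():max())  -- ~0: both paths agree
```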
For background on the gradient-descent algorithm itself, see 神经网络模型随机梯度下降法—简单实现与Torch应用.
- share(mlp, s1, s2, …, sn)
- clone(mlp,…)
- type(type[, tensorCache])
- float([tensorCache])
- double([tensorCache])
- cuda([tensorCache])
- State Variables
- output
- gradInput
- Parameters and gradients w.r.t parameters
- [{weights}, {gradWeights}] parameters()
- [flatParameters, flatGradParameters] getParameters(): returns all of the module's parameters flattened into one contiguous Tensor, together with a matching flat view of their gradients (see the sketch after this list)
- training()
- evaluate()
- findModules(typename)
- listModules()
- clearState()
- apply(function)
- replace(function)
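A minimal sketch of what getParameters() returns (assuming the nn package; the layer sizes are only illustrative):

```lua
require 'nn'

local lin = nn.Linear(10, 25)
local params, gradParams = lin:getParameters()
print(params:nElement())        -- 275 = 10*25 weights + 25 biases, in one flat vector

-- The flat vector is a view onto the module's own storage,
-- so writing through it changes the module's weights:
params:zero()
print(lin.weight:sum(), lin.bias:sum())   -- 0  0
```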
Containers
- Container
- add(module)
- get(index)
- size()
Sequential
- Creating a one-hidden-layer multi-layer perceptron:

```lua
mlp = nn.Sequential()
mlp:add(nn.Linear(10, 25))  -- 10 inputs, 25 hidden units
mlp:add(nn.Tanh())          -- hyperbolic tangent transfer function
mlp:add(nn.Linear(25, 1))   -- 1 output
print(mlp:forward(torch.randn(10)))
```
network:

```
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> output]
  (1): nn.Linear(10 -> 25)
  (2): nn.Tanh
  (3): nn.Linear(25 -> 1)
}
```

Result:

```
-0.1815
[torch.Tensor of dimension 1]
```
- remove([index])
- insert(module, [index])
- module = nn.Concat(dim)
- module = nn.DepthConcat(dim)
- module = nn.ConcatTable(): applies each member module to the same input and returns a table collecting all the outputs (see the sketch below)
- module = nn.ParallelTable(): applies its i-th member module to the i-th element of the input table and returns a table of the corresponding outputs
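A minimal sketch of the difference (assuming the nn package; the layer sizes are arbitrary): ConcatTable feeds the same input to every branch, while ParallelTable pairs branches with the entries of an input table.

```lua
require 'nn'

local ct = nn.ConcatTable()
ct:add(nn.Linear(5, 2))
ct:add(nn.Linear(5, 3))
local outs = ct:forward(torch.randn(5))
print(#outs, outs[1]:size(1), outs[2]:size(1))   -- 2  2  3

local pt = nn.ParallelTable()
pt:add(nn.Linear(5, 2))
pt:add(nn.Linear(4, 3))
local outs2 = pt:forward({ torch.randn(5), torch.randn(4) })
print(outs2[1]:size(1), outs2[2]:size(1))        -- 2  3
```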
- Transfer functions
- HardTanh
- HardShrink
- SoftShrink
- SoftMax
- SoftMin
- SoftPlus
- SoftSign
- LogSigmoid
- LogSoftMax
- Sigmoid
- Tanh
- ReLU
- ReLU6
- PReLU
- RReLU
- ELU
- LeakyReLU
- SpatialSoftMax
- AddConstant
- MulConstant
- Simple layers
- Parameterized Modules
- Linear
- SparseLinear
- Bilinear
- PartialLinear
- Add
- Mul
- CMul
- Euclidean
- WeightedEuclidean
- Cosine
- Modules that adapt basic Tensor methods
- Copy
- Narrow
- Replicate
- Reshape
- View
- Select
- MaskedSelect
- Index
- Squeeze
- Unsqueeze
- Transpose
- Modules that adapt mathematical Tensor methods
- AddConstant
- MulConstant
- Max
- Min
- Mean
- Sum
- Exp
- Log
- Abs
- Power
- Square
- Sqrt
- Clamp
- Normalize
- MM
- Miscellaneous Module
- BatchNormalization
- Identity
- Dropout
- SpatialDropout
- VolumetricDropout
- Padding
- L1Penalty
- GradientReversal
- Regarding dropout, see Deep learning:四十一(Dropout简单理解).
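A minimal sketch of Dropout's train/test behaviour (assuming the nn package and p = 0.5):

```lua
require 'nn'

local drop = nn.Dropout(0.5)
local x = torch.ones(6)

drop:training()
print(drop:forward(x))   -- roughly half the entries zeroed, the rest scaled to 2

drop:evaluate()
print(drop:forward(x))   -- all ones: dropout is a no-op at test time
```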
- Table layers
- table Container Modules encapsulate sub-Modules
- ConcatTable
- ParallelTable
- Table Conversion Modules convert between tables and Tensors or tables (see the sketch at the end of this section)
- SplitTable
- JoinTable
- MixtureTable
- SelectTable
- NarrowTable
- FlattenTable
- Pair Modules compute a measure like distance or similarity from a pair (table) of input Tensors
- PairwiseDistance
- DotProduct
- CosineDistance
- CMath Modules perform element-wise operations on a table of Tensors
- CAddTable
- CSubTable
- CMulTable
- CDivTable
- Table of Criteria
- CriterionTable
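A minimal sketch of a few of these table modules (assuming the nn package; shapes chosen arbitrarily):

```lua
require 'nn'

local x = torch.randn(3, 4)

-- SplitTable(1): a 3x4 Tensor becomes a table of three 4-vectors
local rows = nn.SplitTable(1):forward(x)
print(#rows, rows[1]:size(1))                   -- 3  4

-- JoinTable(1): concatenate a table of Tensors along dimension 1
print(nn.JoinTable(1):forward(rows):size(1))    -- 12

-- CAddTable: element-wise sum over the entries of a table
print(nn.CAddTable():forward(rows):size(1))     -- 4
```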
- Convolution layers
- Criterions