Torch Learning Notes (2)

This post covers the basic modules and containers used to build neural networks in Torch, including the forward and backward methods of Module, how gradients are computed and parameters are updated, specific layers such as linear layers, activation functions and Dropout, and Containers such as Sequential and ParallelTable.
  1. Modules are the bricks used to build neural networks. Each is itself a neural network, but can be combined with other networks using containers to create complex neural networks
  2. Given an input and a target, a Criterion computes a loss and a gradient according to a given loss function
  3. Module

    • Module is an abstract class which defines the fundamental methods necessary for training a neural network. Modules are serializable
    • [output] forward(input): Takes an input object and computes the corresponding output of the module. After a forward(), the output state variable should have been updated to the new value. It is not advised to override this function; instead, one should implement the updateOutput(input) function, which forward() in the abstract parent class Module will call
    • [gradInput] backward(input, gradOutput): Performs a backpropagation step through the module, with respect to the given input. It is not advised to override this function in custom classes; it is better to override the updateGradInput(input, gradOutput) and accGradParameters(input, gradOutput, scale) functions.
    • updateOutput(input)
    • [gradInput] updateGradInput(input, gradOutput)
    • accGradParameters(input, gradOutput, scale) : Computes the gradient of the module with respect to its own parameters. scale is a scale factor that is multiplied with the gradParameters before being accumulated. Zeroing this accumulation is achieved with zeroGradParameters(), and updating the parameters according to this accumulation is done with updateParameters(). The module is expected to accumulate the gradients with respect to the parameters in some variable.
    • zeroGradParameters()
    • updateParameters(learningRate): parameters = parameters - learningRate * gradients_wrt_parameters
    • accUpdateGradParameters(input, gradOutput, learningRate): Calculates and accumulates the gradients with respect to the weights after multiplying them by the negative of the learning rate learningRate.
    function Module:accUpdateGradParameters(input, gradOutput, lr)
       -- temporarily alias the gradient buffers to the parameters themselves ...
       local gradWeight = self.gradWeight
       local gradBias = self.gradBias
       self.gradWeight = self.weight
       self.gradBias = self.bias
       -- ... so that accumulating with scale -lr writes
       -- weight <- weight - lr * dL/dweight directly into the parameters
       self:accGradParameters(input, gradOutput, -lr)
       -- restore the real gradient buffers
       self.gradWeight = gradWeight
       self.gradBias = gradBias
    end

    As can be seen, the gradients are accumulated directly into the weights: by temporarily pointing gradWeight/gradBias at the parameters, the accumulation with scale -lr becomes the parameter update itself. This trick relies on the update being a plain accumulation, so the assumption may not hold for a module whose parameter update is a nonlinear operation.
    For the gradient descent algorithm itself, see the earlier post 神经网络模型随机梯度下降法—简单实现与Torch应用 (stochastic gradient descent for neural network models: a simple implementation and application in Torch).
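
    A minimal sketch of how these methods fit together in a hand-written SGD step (the toy data, the Linear layer and the MSECriterion are arbitrary choices for illustration):

    require 'nn'

    local x = torch.randn(10)                -- toy input
    local y = torch.Tensor{1}                -- toy target
    local mlp = nn.Linear(10, 1)
    local criterion = nn.MSECriterion()

    local output = mlp:forward(x)            -- calls updateOutput(input)
    local loss = criterion:forward(output, y)
    local gradOutput = criterion:backward(output, y)
    mlp:zeroGradParameters()                 -- clear previously accumulated gradients
    mlp:backward(x, gradOutput)              -- updateGradInput + accGradParameters
    mlp:updateParameters(0.01)               -- parameters = parameters - 0.01 * gradients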

    • share(mlp,s1,s2,…,sn)
    • clone(mlp,…)
    • type(type[, tensorCache])
    • float([tensorCache])
    • double([tensorCache])
    • cuda([tensorCache])
    • State Variables
      • output
      • gradInput
    • Parameters and gradients w.r.t parameters
      • [{weights}, {gradWeights}] parameters()
      • [flatParameters, flatGradParameters] getParameters(): returns all of the module's parameters and their gradients flattened into two contiguous Tensors (see the sketch after this list)
      • training()
      • evaluate()
      • findModules(typename)
      • listModules()
      • clearState()
      • apply(function)
      • replace(function)
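
    A minimal sketch of getParameters(), which flattens all parameters and gradients into two contiguous Tensors (the network here is arbitrary; this flattened form is what optimization code typically works with):

    require 'nn'

    local mlp = nn.Sequential()
    mlp:add(nn.Linear(10, 25))
    mlp:add(nn.Tanh())
    mlp:add(nn.Linear(25, 1))

    -- all weights/biases and their gradients, flattened into two Tensors
    local flatParameters, flatGradParameters = mlp:getParameters()
    print(flatParameters:nElement())   -- 10*25 + 25 + 25*1 + 1 = 301

    mlp:training()   -- switch to training behaviour (affects e.g. nn.Dropout)
    mlp:evaluate()   -- switch to evaluation behaviour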
  4. Containers

    • Container
      • add(module)
      • get(index)
      • size()
    • Sequential

      • Creating a multi-layer perceptron with one hidden layer:
      mlp = nn.Sequential()
      mlp:add( nn.Linear(10, 25) ) -- 10 input, 25 hidden units
      mlp:add( nn.Tanh() ) -- some hyperbolic tangent transfer function
      mlp:add( nn.Linear(25, 1) ) -- 1 output
      
      print(mlp:forward(torch.randn(10)))

      The resulting network:

      nn.Sequential {
        [input -> (1) -> (2) -> (3) -> output]
        (1): nn.Linear(10 -> 25)
        (2): nn.Tanh
        (3): nn.Linear(25 -> 1)
      }

      Result (the exact value depends on the random input):

      -0.1815
      [torch.Tensor of dimension 1]
    • remove([index])
    • insert(module, [index])
    • module = nn.Concat(dim)
    • module = nn.DepthConcat(dim)
    • module = nn.ConcatTable(): applies each member module to the same input and outputs a table of the results (see the sketch after this list)
    • module = nn.ParallelTable(): applies the i-th member module to the i-th element of the input table and outputs a table of the results (see the sketch after this list)
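
      A minimal sketch of ConcatTable versus ParallelTable (the layer sizes are arbitrary):

      require 'nn'

      -- ConcatTable: every branch receives the same input, output is a table
      local ct = nn.ConcatTable()
      ct:add(nn.Linear(5, 2))
      ct:add(nn.Linear(5, 3))
      local out = ct:forward(torch.randn(5))     -- {Tensor of size 2, Tensor of size 3}

      -- ParallelTable: the i-th branch receives the i-th input Tensor
      local pt = nn.ParallelTable()
      pt:add(nn.Linear(5, 2))
      pt:add(nn.Linear(3, 2))
      local out2 = pt:forward({torch.randn(5), torch.randn(3)})  -- {Tensor of size 2, Tensor of size 2}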
  5. Transfer functions
    • HardTanh
    • HardShrink
    • SoftShrink
    • SoftMax
    • SoftMin
    • SoftPlus
    • SoftSign
    • LogSigmoid
    • LogSoftMax
    • Sigmoid
    • Tanh
    • ReLU
    • ReLU6
    • PReLU
    • RReLU
    • ELU
    • LeakyReLU
    • SpatialSoftMax
    • AddConstant
    • MulConstant
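
    These transfer functions are themselves Modules applied element-wise; a small sketch (the input is random, so the printed numbers will vary):

    require 'nn'

    local x = torch.randn(4)
    print(nn.Tanh():forward(x))           -- element-wise tanh
    print(nn.ReLU():forward(x))           -- element-wise max(0, x)
    print(nn.AddConstant(1):forward(x))   -- element-wise x + 1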
  6. Simple layers
    • Parameterized Modules
      • Linear
      • SparseLinear
      • Bilinear
      • PartialLinear
      • Add
      • Mul
      • CMul
      • Euclidean
      • WeightedEuclidean
      • Cosine
    • Modules that adapt basic Tensor methods
      • Copy
      • Narrow
      • Replicate
      • Reshape
      • View
      • Select
      • MaskedSelect
      • Index
      • Squeeze
      • Unsqueeze
      • Transpose
    • Modules that adapt mathematical Tensor methods
      • AddConstant
      • MulConstant
      • Max
      • Min
      • Mean
      • Sum
      • Exp
      • Log
      • Abs
      • Power
      • Square
      • Sqrt
      • Clamp
      • Normalize
      • MM
    • Miscellaneous Module
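
    A few of these simple layers in action (the sizes are arbitrary choices for illustration):

    require 'nn'

    local x = torch.randn(2, 10)
    print(nn.Linear(10, 3):forward(x))                 -- affine map, 2x10 -> 2x3
    print(nn.Reshape(5, 2):forward(torch.randn(10)))   -- 10 -> 5x2
    print(nn.View(-1):forward(torch.randn(2, 5)))      -- flatten to 10 elements
    print(nn.Mean(1):forward(x))                       -- mean over dimension 1, 2x10 -> 10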
  7. Table layers
    • Table Container Modules encapsulate sub-Modules
      • ConcatTable
      • ParallelTable
    • Table Conversion Modules convert between tables and Tensors (or between tables)
      • SplitTable
      • JoinTable
      • MixtureTable
      • SelectTable
      • NarrowTable
      • FlattenTable
    • Pair Modules compute a measure like distance or similarity from a pair (table) of input Tensors
      • PairwiseDistance
      • DotProduct
      • CosineDistance
    • CMath Modules perform element-wise operations on a table of Tensors
      • CAddTable
      • CSubTable
      • CMulTable
      • CDivTable
    • Table of Criteria
      • CriterionTable
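
    A small sketch of a few table layers (the sizes are arbitrary):

    require 'nn'

    local a, b = torch.randn(5), torch.randn(5)

    print(nn.CAddTable():forward({a, b}))          -- element-wise a + b
    print(nn.CosineDistance():forward({a, b}))     -- cosine similarity of the pair

    local t = nn.SplitTable(1):forward(torch.randn(2, 5))  -- 2x5 Tensor -> table of two 5-element Tensors
    print(nn.JoinTable(1):forward(t))                       -- table -> 10-element Tensor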
  8. Convolution layers
  9. Criterions