When building neural networks with shared parameters in Lua/Torch, the official documentation distinguishes two cases (see https://github.com/torch/nn/blob/master/doc/overview.md). In the first, parameters are updated via :updateParameters(); in the second, you obtain the flattened parameters via :getParameters() and update them manually. The former only requires sharing the parameters of the network, while the latter requires sharing both the parameters and their gradients. The original text reads:
A Note on Sharing Parameters
By using :share(…) and the Container Modules, one can easily create very complex architectures. In order to make sure that the network is going to train properly, one need to pay attention to the way the sharing is applied, because it might depend on the optimization procedure.
- If you are using an optimization algorithm that iterates over the modules of your network (by calling :updateParameters for example), only the parameters of the network should be shared.
- If you use the flattened parameter tensor to optimize the network, obtained by calling :getParameters, for example for the package optim, then you need to share both the parameters and the gradParameters.
Example of the first case:
-- our optimization procedure will iterate over the modules, so only share
-- the parameters
mlp = nn.Sequential()
linear = nn.Linear(2,2)
linear_clone = linear:clone('weight','bias') -- clone sharing the parameters
mlp:add(linear)
mlp:add(linear_clone)
function gradUpdate(mlp, x, y, criterion, learningRate)
   local pred = mlp:forward(x)
   local err = criterion:forward(pred, y)
   local gradCriterion = criterion:backward(pred, y)
   mlp:zeroGradParameters()
   mlp:backward(x, gradCriterion)
   mlp:updateParameters(learningRate) -- update: iterates over the modules
end
Example of the second case:
-- our optimization procedure will use all the parameters at once, because
-- it requires the flattened parameters and gradParameters Tensors. Thus,
-- we need to share both the parameters and the gradParameters
mlp = nn.Sequential()
linear = nn.Linear(2,2)
-- need to share the parameters and the gradParameters as well
linear_clone = linear:clone('weight','bias','gradWeight','gradBias')
mlp:add(linear)
mlp:add(linear_clone)
params, gradParams = mlp:getParameters() -- notice!
function gradUpdate(mlp, x, y, criterion, learningRate, params, gradParams)
   local pred = mlp:forward(x)
   local err = criterion:forward(pred, y)
   local gradCriterion = criterion:backward(pred, y)
   mlp:zeroGradParameters()
   mlp:backward(x, gradCriterion)
   -- adds the gradients to all the parameters at once
   params:add(-learningRate, gradParams) -- update
end
I have already flagged the differences between the two in the code comments above.
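The difference can also be checked directly. A minimal sketch (my own sanity check, not from the official docs, assuming torch7 and nn are installed): after `:clone('weight','bias')`, the clone's weight shares storage with the original, but its gradWeight does not, which is exactly why the `:getParameters()`-based approach also needs `'gradWeight'` and `'gradBias'` in the clone call.

```lua
require 'nn'

local linear = nn.Linear(2, 2)
-- share only the parameters, as in the first example
local clone = linear:clone('weight', 'bias')

-- the weight tensors share the same underlying storage
assert(torch.pointer(linear.weight:storage()) ==
       torch.pointer(clone.weight:storage()))

-- so an in-place update through one module is visible through the other
linear.weight:fill(0.5)
assert(clone.weight[1][1] == 0.5)

-- the gradient buffers were NOT shared here, so they are independent;
-- flattening with :getParameters() would treat them as separate gradients
assert(torch.pointer(linear.gradWeight:storage()) ~=
       torch.pointer(clone.gradWeight:storage()))
```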