When building neural networks with shared parameters in Lua/Torch, the official documentation distinguishes two cases (see https://github.com/torch/nn/blob/master/doc/overview.md). In the first, parameters are updated via :updateParameters(); in the second, you obtain the flattened parameters via :getParameters() and update them manually. The former only requires sharing the parameters of the network, while the latter requires sharing both the parameters and their gradients. The original text reads:
A Note on Sharing Parameters
By using :share(…) and the Container Modules, one can easily create very complex architectures. In order to make sure that the network is going to train properly, one need to pay attention to the way the sharing is applied, because it might depend on the optimization procedure.
- If you are using an optimization algorithm that iterates over the modules of your network (by calling :updateParameters for example), only the parameters of the network should be shared.
- If you use the flattened parameter tensor to optimize the network, obtained by calling :getParameters, for example for the package optim, then you need to share both the parameters and the gradParameters.
Example of the first case:
-- our optimization procedure will iterate over the modules, so only share
-- the parameters
mlp = nn.Sequential()
linear = nn.Linear(2,2)
linear_clone = linear:clone('weight','bias') -- clone sharing the parameters
mlp:add(linear)
mlp:add(linear_clone)
function gradUpdate(mlp, x, y, criterion, learningRate)
   local pred = mlp:forward(x)
   local err = criterion:forward(pred, y)
   local gradCriterion = criterion:backward(pred, y)
   mlp:zeroGradParameters()
   mlp:backward(x, gradCriterion)
   mlp:updateParameters(learningRate) -- update: iterates over the modules
end
Example of the second case:
-- our optimization procedure will use all the parameters at once, because
-- it requires the flattened parameters and gradParameters Tensors. Thus,
-- we need to share both the parameters and the gradParameters
mlp = nn.Sequential()
linear = nn.Linear(2,2)
-- need to share the parameters and the gradParameters as well
linear_clone = linear:clone('weight','bias','gradWeight','gradBias')
mlp:add(linear)
mlp:add(linear_clone)
params, gradParams = mlp:getParameters() -- notice!
function gradUpdate(mlp, x, y, criterion, learningRate, params, gradParams)
   local pred = mlp:forward(x)
   local err = criterion:forward(pred, y)
   local gradCriterion = criterion:backward(pred, y)
   mlp:zeroGradParameters()
   mlp:backward(x, gradCriterion)
   -- adds the gradients to all the parameters at once
   params:add(-learningRate, gradParams) -- update
end
I have already flagged the differences between the two in the code comments above.
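The difference can also be checked directly. A minimal sketch (my own sanity check, not from the official docs, assuming torch7 and nn are installed): after `:clone('weight','bias')`, the clone's weight shares storage with the original, but its gradWeight does not, which is exactly why the `:getParameters()`-based approach also needs `'gradWeight'` and `'gradBias'` in the clone call.

```lua
require 'nn'

local linear = nn.Linear(2, 2)
-- share only the parameters, as in the first example
local clone = linear:clone('weight', 'bias')

-- the weight tensors share the same underlying storage
assert(torch.pointer(linear.weight:storage()) ==
       torch.pointer(clone.weight:storage()))

-- so an in-place update through one module is visible through the other
linear.weight:fill(0.5)
assert(clone.weight[1][1] == 0.5)

-- the gradient buffers were NOT shared here, so they are independent;
-- flattening with :getParameters() would treat them as separate gradients
assert(torch.pointer(linear.gradWeight:storage()) ~=
       torch.pointer(clone.gradWeight:storage()))
```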