torch.nn: a small pitfall and a question

When building a neural network with shared parameters in Lua/Torch, the official documentation (see https://github.com/torch/nn/blob/master/doc/overview.md) distinguishes two cases. In one, the parameters are updated via :updateParameters(); in the other, the flattened parameters are obtained via :getParameters() and updated by hand. The former only requires sharing the parameters of the network, while the latter requires sharing both the parameters and their gradients. The original wording:

A Note on Sharing Parameters

By using :share(…) and the Container Modules, one can easily create very complex architectures. In order to make sure that the network is going to train properly, one needs to pay attention to the way the sharing is applied, because it might depend on the optimization procedure.

  • If you are using an optimization algorithm that iterates over the modules of your network (by calling :updateParameters for example), only the parameters of the network should be shared.
  • If you use the flattened parameter tensor to optimize the network, obtained by calling :getParameters, for example for the package optim, then you need to share both the parameters and the gradParameters.

An example of the first case:

-- our optimization procedure will iterate over the modules, so only share
-- the parameters
require 'nn'

mlp = nn.Sequential()
linear = nn.Linear(2,2)
linear_clone = linear:clone('weight','bias') -- clone sharing the parameters
mlp:add(linear)
mlp:add(linear_clone)
function gradUpdate(mlp, x, y, criterion, learningRate) 
  local pred = mlp:forward(x)
  local err = criterion:forward(pred, y)
  local gradCriterion = criterion:backward(pred, y)
  mlp:zeroGradParameters()
  mlp:backward(x, gradCriterion)
  mlp:updateParameters(learningRate) --update
end
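
For concreteness, here is a minimal way to drive the function above. The criterion, toy data, and learning rate are my own assumptions, not part of the documented example:

-- minimal usage sketch (criterion and data are assumed, not from the nn docs)
criterion = nn.MSECriterion()
x = torch.randn(2) -- input matching nn.Linear(2,2)
y = torch.randn(2) -- target
for i = 1, 100 do
  gradUpdate(mlp, x, y, criterion, 0.01)
end
-- the clone shares its weight/bias storage with the original layer,
-- so after :updateParameters() both modules still hold identical values
print(linear.weight)
print(linear_clone.weight)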

An example of the second case:

-- our optimization procedure will use all the parameters at once, because
-- it requires the flattened parameters and gradParameters Tensors. Thus,
-- we need to share both the parameters and the gradParameters
require 'nn'

mlp = nn.Sequential()
linear = nn.Linear(2,2)
-- need to share the parameters and the gradParameters as well
linear_clone = linear:clone('weight','bias','gradWeight','gradBias')
mlp:add(linear)
mlp:add(linear_clone)
params, gradParams = mlp:getParameters() --notice!
function gradUpdate(mlp, x, y, criterion, learningRate, params, gradParams)
  local pred = mlp:forward(x)
  local err = criterion:forward(pred, y)
  local gradCriterion = criterion:backward(pred, y)
  mlp:zeroGradParameters()
  mlp:backward(x, gradCriterion)
  -- adds the gradients to all the parameters at once
  params:add(-learningRate, gradParams) --update
end
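
A matching usage sketch for the flattened-parameter version; again the criterion, toy data, and learning rate are assumptions of mine:

-- minimal usage sketch (criterion and data are assumed, not from the nn docs)
criterion = nn.MSECriterion()
x = torch.randn(2)
y = torch.randn(2)
for i = 1, 100 do
  gradUpdate(mlp, x, y, criterion, 0.01, params, gradParams)
end
-- because gradWeight/gradBias are shared as well, both layers accumulate
-- their gradient contributions into the same storage that gradParams views,
-- so the single params:add(...) update is correct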

Sharp-eyed as I am, I have already flagged the differences in the code comments.

But can anyone tell me where a network with shared parameters like this would actually be used?
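
One well-known answer is the siamese network: two branches that must apply exactly the same transformation to two inputs, e.g. for similarity or metric learning. Below is a minimal sketch, assuming training would go through :getParameters(), so the clone shares its gradients too; the layer sizes and the nn.ParallelTable / nn.PairwiseDistance combination are my own choices, not from this post:

require 'nn'

-- one branch; its clone will share weights so both inputs see the same mapping
branch = nn.Sequential()
branch:add(nn.Linear(10, 5))
branch:add(nn.Tanh())

-- share parameters and gradParameters (needed when optimizing via :getParameters())
branch_clone = branch:clone('weight', 'bias', 'gradWeight', 'gradBias')

siamese = nn.Sequential()
siamese:add(nn.ParallelTable():add(branch):add(branch_clone))
siamese:add(nn.PairwiseDistance(2)) -- distance between the two embeddings

dist = siamese:forward({torch.randn(10), torch.randn(10)})

A network like this is typically trained with something like nn.HingeEmbeddingCriterion on the output distance.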
