Caffe: after adding a fully-connected layer with shared parameters inside someone else's modified layer, the trained model contains only the layer's name and no actual tensor

1. Problem description: In Caffe I am working on reinforcement-learning image captioning, and I want to add an extra fully-connected layer to the network (the layer has to be added to three nets in total: the net that trains the baseline, the scst net used for reinforcement-learning training, and the decode_net shared by both). I added the fully-connected layer when training the baseline: the network learns that layer's parameters, and decoding also produces results. Training the scst net likewise produces a model without reporting any error, but during decoding I found that the added fully-connected layer never receives values, so the loaded model parameters cannot be passed into the decode network. Printing the model's parameters shows that the added layer has only a name and no actual values (did training never produce them, or were they just not saved?). Here fc_atten is the name of the fully-connected layer I added; the printout shows only its name, with no parameter shape. All the other parameters are shared as well, and each of their first occurrences does have a concrete tensor.
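For reference, a minimal sketch of how such a printout can be produced with pycaffe; the prototxt and caffemodel file names below are placeholders, not the actual files from this project:
```python
import caffe

caffe.set_mode_cpu()

# Placeholder paths; substitute the actual deploy prototxt and trained weights.
net = caffe.Net('decode_net.prototxt', 'scst.caffemodel', caffe.TEST)

# net.params maps each layer that owns blobs to its list of parameter blobs
# (weights first, then bias if present). A layer that registered no blobs,
# or whose blobs were never filled from the caffemodel, stands out here.
for layer_name, blobs in net.params.items():
    print(layer_name, [tuple(b.data.shape) for b in blobs])
```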
2. Relevant code.
The net used to train the baseline and the decode net share essentially the same structure; the layer is added like this:
```
layer {
  name: "fc_atten_0"
  type: "InnerProduct"
  bottom: "lstm0_hidden0"
  top: "fc_atten_0"
  param {
    name: "fc_param_0"
  }
  inner_product_param {
    num_output: 1000
    bias_term: true
    weight_filler {
      type: "gaussian"
      std: 0.00999999977648
    }
  }
}
```
The layer is added in the same way in the scst training net. What makes scst different from the baseline net is that scst has an extra BeamSearch layer and loads the model parameters trained by the baseline net (the two structures are otherwise basically the same). Since I am training a language model, there are 20 LSTM time steps in total. The author's BeamSearch layer embeds the complete network for the first step inside the layer itself, and all 20 time steps share that layer's parameters.
```
layer {
  name: "beam"
  type: "BeamSearch"
  bottom: "num_boxes"
  bottom: "spatial_features"
  bottom: "fc"
  bottom: "context"
  top: "caption"
  top: "log_prob"
  top: "log_prob_sequence"
  param {
    name: "embed_param"
  }
  param {
    name: "lstm0_param_0"
  }
  param {
    name: "lstm0_param_1"
  }
  param {
    name: "hidden_att_param_0"
  }
  param {
    name: "predict_att_param_0"
  }
  param {
    name: "lstm1_param_0"
  }
  param {
    name: "lstm1_param_1"
  }
  param {
    name: "fc_param_0"
  }
  param {
    name: "predict_param_0"
  }
  param {
    name: "predict_param_1"
  }
  beam_search_param {
    net_param {
      layer {
        name: "input"
        type: "Input"
        top: "num_boxes"
        top: "spatial_features"
        top: "fc"
        top: "context"
        top: "input"
        input_param {
          shape {
            dim: 12
            dim: 1
          }
          shape {
            dim: 12
            dim: 100
            dim: 2048
          }
          shape {
            dim: 12
            dim: 100
            dim: 512
          }
          shape {
            dim: 12
            dim: 2048
          }
          shape {
            dim: 12
            dim: 1
          }
        }
      }
      layer {
        name: "lstm0_hidden_prev"
        type: "DummyData"
        top: "lstm0_hidden_prev"
        dummy_data_param {
          shape {
            dim: 12
            dim: 1000
          }
        }
      }
      layer {
        name: "lstm0_mem_cell_prev"
        type: "DummyData"
        top: "lstm0_mem_cell_prev"
        dummy_data_param {
          shape {
            dim: 12
            dim: 1000
          }
        }
      }
      layer {
        name: "lstm1_hidden_prev"
        type: "DummyData"
        top: "lstm1_hidden_prev"
        dummy_data_param {
          shape {
            dim: 12
            dim: 1000
          }
        }
      }
      layer {
        name: "lstm1_mem_cell_prev"
        type: "DummyData"
        top: "lstm1_mem_cell_prev"
        dummy_data_param {
          shape {
            dim: 12
            dim: 1000
          }
        }
      }
      layer {
        name: "embedding"
        type: "Embed"
        bottom: "input"
        top: "embedding"
        param {
          name: "embed_param"
        }
        propagate_down: false
        embed_param {
          num_output: 1000
          input_dim: 10010
          bias_term: false
          weight_filler {
            type: "gaussian"
            std: 0.00999999977648
          }
        }
      }
      layer {
        name: "concat0_t0"
        type: "Concat"
        bottom: "embedding"
        bottom: "context"
        bottom: "lstm1_hidden_prev"
        bottom: "lstm0_hidden_prev"
        top: "concat0_t0"
      }
      layer {
        name: "lstm1"
        type: "LSTMNode"
        bottom: "concat0_t0"
        bottom: "lstm0_mem_cell_prev"
        top: "lstm0_hidden0"
        top: "lstm0_mem_cell0"
        param {
          name: "lstm0_param_0"
        }
        param {
          name: "lstm0_param_1"
        }
        propagate_down: true
        propagate_down: false
        lstm_param {
          num_cells: 1000
          input_weight_filler {
            type: "gaussian"
            std: 0.00999999977648
          }
          input_gate_weight_filler {
            type: "gaussian"
            std: 0.00999999977648
          }
          forget_gate_weight_filler {
            type: "gaussian"
            std: 0.00999999977648
          }
          output_gate_weight_filler {
            type: "gaussian"
            std: 0.00999999977648
          }
          input_bias_filler {
            type: "constant"
            value: 0.0
          }
          input_gate_bias_filler {
            type: "constant"
            value: 0.0
          }
          forget_gate_bias_filler {
            type: "constant"
            value: 1.0
          }
          output_gate_bias_filler {
            type: "constant"
            value: 0.0
          }
        }
      }
      layer {
        name: "hidden_att_0"
        type: "InnerProduct"
        bottom: "lstm0_hidden0"
        top: "hidden_att_0"
        param {
          name: "hidden_att_param_0"
        }
        inner_product_param {
          num_output: 512
          bias_term: false
          weight_filler {
            type: "gaussian"
            std: 0.00999999977648
          }
        }
      }
      layer {
        name: "tile_hidden_att_0"
        type: "Tile"
        bottom: "hidden_att_0"
        top: "tile_hidden_att_0"
        tile_param {
          axis: 1
          tiles: 100
        }
      }
      layer {
        name: "tile_hidden_reshape_0"
        type: "Reshape"
        bottom: "tile_hidden_att_0"
        top: "tile_hidden_reshape_0"
        reshape_param {
          shape {
            dim: 0
            dim: -1
            dim: 512
          }
        }
      }
      layer {
        name: "sum_hidden_att_0"
        type: "Eltwise"
        bottom: "fc"
        bottom: "tile_hidden_reshape_0"
        top: "sum_hidden_att_0"
        eltwise_param {
          operation: SUM
        }
      }
      layer {
        name: "hidden_tanh_0"
        type: "TanH"
        bottom: "sum_hidden_att_0"
        top: "sum_hidden_att_0"
      }
      layer {
        name: "predict_att_0"
        type: "InnerProduct"
        bottom: "sum_hidden_att_0"
        top: "predict_att_0"
        param {
          name: "predict_att_param_0"
        }
        inner_product_param {
          num_output: 1
          bias_term: false
          weight_filler {
            type: "gaussian"
            std: 0.00999999977648
          }
          axis: 2
        }
      }
      layer {
        name: "reshape_predict_att_0"
        type: "Reshape"
        bottom: "predict_att_0"
        top: "reshape_predict_att_0"
        reshape_param {
          shape {
            dim: 0
            dim: -1
          }
        }
      }
      layer {
        name: "att_weight_0"
        type: "Softmax"
        bottom: "reshape_predict_att_0"
        bottom: "num_boxes"
        top: "att_weight_0"
        softmax_param {
          engine: CAFFE
          axis: 1
        }
      }
      layer {
        name: "att_product_0"
        type: "Scale"
        bottom: "spatial_features"
        bottom: "att_weight_0"
        top: "att_product_0"
        scale_param {
          axis: 0
        }
      }
      layer {
        name: "permute_att_0"
        type: "Permute"
        bottom: "att_product_0"
        top: "permute_att_0"
        permute_param {
          order: 0
          order: 2
          order: 1
        }
      }
      layer {
        name: "fc8_0"
        type: "Reduction"
        bottom: "permute_att_0"
        top: "fc8_0"
        reduction_param {
          axis: 2
        }
      }
      layer {
        name: "concat1_t0"
        type: "Concat"
        bottom: "lstm0_hidden0"
        bottom: "fc8_0"
        bottom: "lstm1_hidden_prev"
        top: "concat1_t0"
      }
      layer {
        name: "lstm2"
        type: "LSTMNode"
        bottom: "concat1_t0"
        bottom: "lstm1_mem_cell_prev"
        top: "lstm1_hidden0"
        top: "lstm1_mem_cell0"
        param {
          name: "lstm1_param_0"
        }
        param {
          name: "lstm1_param_1"
        }
        propagate_down: true
        propagate_down: false
        lstm_param {
          num_cells: 1000
          input_weight_filler {
            type: "gaussian"
            std: 0.00999999977648
          }
          input_gate_weight_filler {
            type: "gaussian"
            std: 0.00999999977648
          }
          forget_gate_weight_filler {
            type: "gaussian"
            std: 0.00999999977648
          }
          output_gate_weight_filler {
            type: "gaussian"
            std: 0.00999999977648
          }
          input_bias_filler {
            type: "constant"
            value: 0.0
          }
          input_gate_bias_filler {
            type: "constant"
            value: 0.0
          }
          forget_gate_bias_filler {
            type: "constant"
            value: 1.0
          }
          output_gate_bias_filler {
            type: "constant"
            value: 0.0
          }
        }
      }
      layer {
        name: "fc_atten_0"
        type: "InnerProduct"
        bottom: "lstm0_hidden0"
        top: "fc_atten_0"
        param {
          name: "fc_param_0"
        }
        inner_product_param {
          num_output: 1000
          bias_term: true
          weight_filler {
            type: "gaussian"
            std: 0.00999999977648
          }
        }
      }
      layer {
        name: "concat_lstm0_lstm1_0"
        type: "Concat"
        bottom: "lstm1_hidden0"
        bottom: "fc_atten_0"
        top: "concat_lstm0_lstm1_0"
      }
      layer {
        name: "predict"
        type: "InnerProduct"
        bottom: "concat_lstm0_lstm1_0"
        top: "predict"
        param {
          name: "predict_param_0"
          lr_mult: 1.0
          decay_mult: 1.0
        }
        param {
          name: "predict_param_1"
          lr_mult: 2.0
          decay_mult: 0.0
        }
        inner_product_param {
          num_output: 10010
          weight_filler {
            type: "gaussian"
            std: 0.00999999977648
          }
          bias_filler {
            type: "constant"
            value: 0.0
          }
          axis: 1
        }
      }
      layer {
        name: "probs_0"
        type: "Softmax"
        bottom: "predict"
        top: "probs_0"
        softmax_param {
          axis: 1
        }
      }
      layer {
        name: "logp_0"
        type: "Log"
        bottom: "probs_0"
        top: "logp_0"
      }
    }
    sequence_length: 20
    beam_size: 5
    end_of_sequence: 0
    recurrent_connection {
      src: "lstm0_hidden0"
      dest: "lstm0_hidden_prev"
    }
    recurrent_connection {
      src: "lstm0_mem_cell0"
      dest: "lstm0_mem_cell_prev"
    }
    recurrent_connection {
      src: "lstm1_hidden0"
      dest: "lstm1_hidden_prev"
    }
    recurrent_connection {
      src: "lstm1_mem_cell0"
      dest: "lstm1_mem_cell_prev"
    }
    beam_search_connection {
      src: "logp_0"
      dest: "input"
    }
    allowed_multiple: 2
    allowed_multiple: 5
    allowed_multiple: 4
    allowed_multiple: 15
    allowed_multiple: 3
    allowed_multiple: 6
    allowed_multiple: 8
    allowed_multiple: 7
    allowed_multiple: 9
    allowed_multiple: 13
    allowed_multiple: 277
    allowed_multiple: 11
    allowed_multiple: 30
    allowed_multiple: 16
    allowed_multiple: 19
    allowed_multiple: 27
    allowed_multiple: 25
    allowed_multiple: 119
    allowed_multiple: 48
  }
}
layer {
  name: "silence_bs"
  type: "Silence"
  bottom: "log_prob"
  bottom: "log_prob_sequence"
}
```
fc_atten is the layer I added by imitating this network. I configured parameter sharing in the same way and registered the parameter name in the BeamSearch layer's param list further up. The following two snippets are exactly the modifications I made to the original code:
```
param {
  name: "fc_param_0"
}
```
```
layer {
  name: "fc_atten_0"
  type: "InnerProduct"
  bottom: "lstm0_hidden0"
  top: "fc_atten_0"
  param {
    name: "fc_param_0"
  }
  inner_product_param {
    num_output: 1000
    bias_term: true
    weight_filler {
      type: "gaussian"
      std: 0.00999999977648
    }
  }
}
```
Nothing else was changed. Training now runs without errors, but the saved model contains only the name of the fc_atten layer, with no corresponding tensor.
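To narrow down whether the blob was never created or simply not saved, one can diff what the two caffemodels actually store. A minimal sketch with placeholder file names, assuming the snapshots use the current (non-V1) layer format:
```python
from caffe.proto import caffe_pb2

def saved_param_shapes(path):
    """Return layer name -> shapes of the blobs stored in a .caffemodel."""
    net = caffe_pb2.NetParameter()
    with open(path, 'rb') as f:
        net.ParseFromString(f.read())
    return {l.name: [tuple(b.shape.dim) for b in l.blobs] for l in net.layer}

# Placeholder file names; substitute the actual snapshots.
baseline = saved_param_shapes('baseline.caffemodel')
scst = saved_param_shapes('scst.caffemodel')

# Print every layer whose stored blobs differ between the two models;
# fc_atten_0 appearing with shapes in one and empty in the other
# localizes where the parameter was lost.
for name in sorted(set(baseline) | set(scst)):
    if baseline.get(name) != scst.get(name):
        print(name, '| baseline:', baseline.get(name), '| scst:', scst.get(name))
```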
3. What I have tried: I read through the author's beam_search.cpp and .hpp, and also read and experimented with modifying caffe.proto, trying to find a solution. But perhaps the author's code is just too clever and my own level is not up to it.
Here is a link to the source code: https://github.com/peteanderson80/caffe/tree/631806541c68658248ffdbbbde659f478fac4113
My two guesses: (1) the author may have hard-coded the parameters passed into beam_search so that only nine are accepted, and my newly added one cannot get in (though so far I have found no evidence of this); (2) since both the baseline net and the decode net can produce results, the problem probably lies in the author's beam_search_layer.cpp: https://github.com/peteanderson80/caffe/blob/631806541c68658248ffdbbbde659f478fac4113/src/caffe/layers/beam_search_layer.cpp. A quick empirical test of guess (1) is sketched below.
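The idea, as a minimal sketch with placeholder file names: load the scst training net through pycaffe and count how many parameter blobs the BeamSearch layer actually registered. If it is fewer than the ten `param { name: ... }` entries declared on the layer in the prototxt above, the extra shared parameter was silently dropped.
```python
import caffe

caffe.set_mode_cpu()

# Placeholder paths; substitute the actual scst train prototxt and the
# baseline weights it is initialized from.
net = caffe.Net('scst_train.prototxt', 'baseline.caffemodel', caffe.TRAIN)

# The prototxt above declares 10 shared params on the "beam" layer
# (embed, lstm0 x2, hidden_att, predict_att, lstm1 x2, fc, predict x2).
# pycaffe omits layers with zero blobs from net.params, hence the default.
beam_blobs = net.params.get('beam', [])
print('beam registered', len(beam_blobs), 'parameter blobs')
for i, blob in enumerate(beam_blobs):
    print(i, tuple(blob.data.shape))
```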
4. Question: why does the parameter fail to train in scst, and how can I get the model to train properly and then decode results?