LRCN (4): the solver's train_test_lstm_RGB.prototxt

lgy_keira

Category: code reproduction

Article link: https://blog.csdn.net/u013608336/article/details/77201532

Data layer

The network's data layer is a Python layer, imported from sequence_input_layer.py.

name: "lstm_joints"
layer {
  name: "data"
  type: "Python"
  top: "data"
  top: "label"
  top: "clip_markers" # flags whether each frame continues the previous one
  python_param {
    module: "sequence_input_layer"
    layer: "videoReadTrain_RGB" # input layer for the training phase; a class defined in the .py file
  }
  include: { phase: TRAIN }
}
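
To make the clip_markers top concrete, here is a minimal NumPy sketch of what such a blob could look like. It assumes 3 clips of 16 frames each and a timestep-major batch layout (all clips' frame 0 first, then all clips' frame 1, and so on), which is what the 16*3 reshape used later in this file suggests. The function name make_clip_markers is hypothetical and is not taken from sequence_input_layer.py.

```python
import numpy as np

def make_clip_markers(num_frames, num_videos):
    """Return per-frame continuation flags, flattened in timestep-major order."""
    per_video = np.ones(num_frames, dtype=np.float32)
    per_video[0] = 0.0  # the first frame of a clip starts a new sequence
    # One column per video, flattened row by row: frame 0 of every video
    # comes first, then frame 1 of every video, and so on.
    return np.tile(per_video[:, None], (1, num_videos)).reshape(-1)

markers = make_clip_markers(num_frames=16, num_videos=3)
print(markers.shape)  # (48,)
print(markers[:6])    # [0. 0. 0. 1. 1. 1.]
```

A 0 tells the LSTM to start from a fresh state at that frame, and a 1 tells it to carry the state over from the previous time step.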

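Network structure visualization
[Figure: visualization of the network structure]
top: denotes an output
bottom: denotes an input
data and label: a data layer has at least one top (output) named data. If there is a second top, it is usually named label. This (data, label) pairing is what a classification model requires.

type: the layer type. Some data layers use ImageData; here the data layer has type Python and is implemented by a class in a Python file.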

fc6

[Figure: the fc6 stage]
fc6 involves three processing steps:
- 1. inner product (InnerProduct)
- 2. ReLU
- 3. dropout
The output is passed to the reshape-data layer.

layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"  # the layer name and the top are both fc6, so ReLU and Dropout can then run in place on it
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"  # input: fc6
  top: "fc6"     # output: fc6 (in place)
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"    # input: fc6
  top: "fc6"       # output: fc6 (in place)
  dropout_param {
    dropout_ratio: 0.9
  }
}
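
The three steps above can be sketched in NumPy as follows. This is an illustration under assumptions, not Caffe's implementation: the function name fc6_forward and the small sizes (32 inputs, 64 outputs instead of 4096) are chosen only to keep the example light. The dropout scaling by 1 / (1 - dropout_ratio) at training time follows how Caffe's Dropout layer behaves.

```python
import numpy as np

def fc6_forward(x, weights, bias, dropout_ratio=0.9, train=True, rng=None):
    """InnerProduct -> ReLU -> Dropout, each step overwriting the same array."""
    y = x @ weights.T + bias            # inner product; weights: (num_output, in_dim)
    np.maximum(y, 0.0, out=y)           # ReLU in place (top == bottom)
    if train:
        if rng is None:
            rng = np.random.default_rng()
        keep = rng.random(y.shape) >= dropout_ratio   # keep ~10% of units at ratio 0.9
        y *= keep / (1.0 - dropout_ratio)             # rescale the surviving units
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((48, 32))                  # 48 frames, 32 input features
weights = rng.normal(0.0, 0.01, size=(64, 32))     # gaussian filler, std 0.01
bias = np.full(64, 0.1)                            # constant filler, value 0.1

out_train = fc6_forward(x, weights, bias, train=True, rng=rng)
out_test = fc6_forward(x, weights, bias, train=False)
print(out_train.shape, out_test.shape)
```

With dropout_ratio 0.9 only about one unit in ten survives each training pass, which is a strong regularizer on the features fed to the LSTM.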

reshape-data

fc6 outputs a 4096-dimensional feature per frame (48 frames per batch, i.e. 48*4096); this layer reshapes it to 16*3*4096.


layer {
  name: "reshape-data"
  type: "Reshape"
  bottom: "fc6"
  top: "fc6-reshape"
  reshape_param{
    shape{
      dim: 16
      dim: 3
      dim: 4096
    }
  }
}

fc6-reshape is the output of reshape-data.
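
As a rough illustration of what this reshape does to the data, the NumPy sketch below reshapes a 48*4096 array to 16*3*4096. Like Caffe's Reshape layer, a row-major reshape only reinterprets the dimensions and does not move any values, so row t*N + n of fc6 becomes entry [t, n] of fc6-reshape. The variable names are illustrative.

```python
import numpy as np

T, N, D = 16, 3, 4096                       # time steps, independent clips, feature size
fc6 = np.arange(T * N * D, dtype=np.float32).reshape(T * N, D)

fc6_reshape = fc6.reshape(T, N, D)          # same memory order, new dimensions
print(fc6.shape, "->", fc6_reshape.shape)   # (48, 4096) -> (16, 3, 4096)

# Row t*N + n of the flat batch is frame t of clip n after the reshape.
t, n = 5, 2
print(np.array_equal(fc6_reshape[t, n], fc6[t * N + n]))  # True
```

This is why the batch has to be ordered timestep-major before it reaches the reshape: the layer itself never reorders frames.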

reshape-cm

Reshapes clip_markers to 16*3. reshape-data and reshape-cm must be used as a pair, so that their leading dimensions match.

layer {
  name: "reshape-cm"
  type: "Reshape"
  bottom: "clip_markers"
  top: "reshape-cm"
  reshape_param{
    shape{
      dim: 16
      dim: 3
    }
  }
}

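LSTM layer

[Figure: the LSTM layer]

N is the number of independent streams the LSTM processes at the same time; in this experiment it is the number of mutually independent videos fed to the LSTM. Taking the test network of this experiment as an example, N=3.
T is the total number of time steps the LSTM layer processes; in this experiment it is the number of frames in any one independent video fed to the LSTM. Taking the test network of this experiment as an example, T=16.
4096 is the dimension of the fully connected layer in AlexNet, i.e. the dimension of the CNN feature.
The output of reshape-cm has dimension T×N, i.e. every frame has one flag indicating whether it continues the previous frame.
reshape-label also has dimension T×N.
Reference: the LSTM implementation in Caffe and notes on understanding LSTMLayer.

In Caffe, a single LSTMLayer is a complete LSTM network. The CNN feature dimension is 4096 and the LSTM feature dimension is 256.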

layer {
  name: "lstm1"
  type: "LSTM"
  bottom: "fc6-reshape"
  bottom: "reshape-cm"
  top: "lstm1"
  recurrent_param {
    num_output: 256
    weight_filler {
      type: "uniform"
      min: -0.01
      max: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
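
To show how the two bottoms interact, here is a simplified NumPy sketch of an LSTM forward pass over a T×N×D input in which the clip markers reset the state. It is a sketch under assumptions rather than Caffe's code: the function lstm_forward, the weight names, the gate ordering, and the small sizes (8 input features and 4 hidden units instead of 4096 and 256) are all chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, cont, W_x, W_h, b):
    """x: (T, N, D); cont: (T, N), 0 where a new sequence starts. Returns (T, N, H)."""
    T, N, _ = x.shape
    H = W_h.shape[1]
    h = np.zeros((N, H))
    c = np.zeros((N, H))
    outputs = np.empty((T, N, H))
    for t in range(T):
        keep = cont[t][:, None]                   # (N, 1); 0 wipes the carried state
        h = h * keep
        c = c * keep
        gates = x[t] @ W_x.T + h @ W_h.T + b      # (N, 4H)
        i, f, o, g = np.split(gates, 4, axis=1)   # input, forget, output, candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs[t] = h
    return outputs

rng = np.random.default_rng(0)
T, N, D, H = 16, 3, 8, 4
x = rng.standard_normal((T, N, D))
cont = np.ones((T, N))
cont[0] = 0                                       # every clip starts at t = 0
W_x = rng.uniform(-0.01, 0.01, size=(4 * H, D))   # uniform filler in [-0.01, 0.01]
W_h = rng.uniform(-0.01, 0.01, size=(4 * H, H))
b = np.zeros(4 * H)                               # constant filler, value 0

lstm1 = lstm_forward(x, cont, W_x, W_h, b)
print(lstm1.shape)  # (16, 3, 4)
```

The output keeps the T×N leading dimensions of the input, with the feature dimension replaced by num_output.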

lstm1-drop

layer {
  name: "lstm1-drop"
  type: "Dropout"
  bottom: "lstm1"
  top: "lstm1-drop"
  dropout_param {
    dropout_ratio: 0.5
  }
}

Appendix

Convolution layer

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
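
As a quick check on what these convolution_param values imply, the sketch below computes the spatial output size and the parameter count of conv1. The output-size formula is the one Caffe uses for convolution (integer division, no padding here). The 227-pixel input crop and the 3 input channels are assumptions for illustration; the input size is not stated in this post.

```python
def conv_output_size(input_size, kernel_size, stride=1, pad=0):
    """Spatial output size of a convolution, using floor division."""
    return (input_size + 2 * pad - kernel_size) // stride + 1

# Assumed input: 227x227 crop with 3 channels (RGB).
out_size = conv_output_size(227, kernel_size=7, stride=2)
print(out_size)  # 111

num_output, in_channels, kernel_size = 96, 3, 7
num_weights = num_output * in_channels * kernel_size * kernel_size
num_biases = num_output
print(num_weights, num_biases)  # 14112 96
```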

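lr_mult: multiplier applied to the solver's base learning rate for this parameter blob (the first param block is the weights, the second is the bias)
decay_mult: multiplier applied to the solver's weight decay for this parameter blob
num_output: number of outputs of the layer (the number of filters for a convolution layer)
type: "constant": a filler that initializes every value to the given constant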
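
The effect of lr_mult and decay_mult can be sketched as a plain SGD step with L2 weight decay. This is a simplification under assumptions: momentum is left out, and the base_lr and weight_decay values below are hypothetical rather than taken from this network's solver file.

```python
def sgd_step(param, grad, base_lr, weight_decay, lr_mult, decay_mult):
    """One SGD step with per-parameter learning-rate and decay multipliers."""
    local_lr = base_lr * lr_mult              # effective learning rate for this blob
    local_decay = weight_decay * decay_mult   # effective L2 decay for this blob
    return param - local_lr * (grad + local_decay * param)

base_lr, weight_decay = 0.001, 0.005          # hypothetical solver settings

# Weights: lr_mult 1, decay_mult 1.
new_weight = sgd_step(0.5, 0.2, base_lr, weight_decay, lr_mult=1, decay_mult=1)
# Bias: lr_mult 2 (learns twice as fast), decay_mult 0 (no weight decay).
new_bias = sgd_step(0.1, 0.2, base_lr, weight_decay, lr_mult=2, decay_mult=0)
print(new_weight, new_bias)
```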

Appendix

# python draw_net.py <prototxt file> <output image file> --rankdir=<direction>, e.g. TB draws top to bottom (BT is bottom to top)
python draw_net.py lenet_train_test.prototxt mnist\lenet_train_test.png --rankdir=LR

stage: "test-on-test" (a field used inside include)
