《A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation》
DispNet paper notes, part 1 | link
Caffe-based DispNet model file download | link
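Layer overview
The image-augmentation layers are skipped here; we start directly at the network's input (the red arrow in the figure below).
The data-input part of train.prototxt is visualized below:
Basic layers
The layers in train.prototxt are walked through below (the image-augmentation part is skipped):
Custom data layer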
layer {
  name: "CustomData1"
  type: "CustomData"       # custom data layer that feeds the input data
  top: "blob0"             # top = output blob
  top: "blob1"
  top: "blob2"
  include {
    phase: TRAIN
  }
  data_param {
    source: "../../../data/FlyingThings3D_release_TRAIN_lmdb"
    batch_size: 4
    backend: LMDB          # storage format
    rand_permute: true     # shuffle the samples
    rand_permute_seed: 77
    slice_point: 3         # split points for the three outputs: channels 0-2, 3-5, and 6
    slice_point: 6
    encoding: UINT8        # encoding of each of the three outputs, in order
    encoding: UINT8
    encoding: UINT16FLOW
    verbose: true
  }
}
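The two slice_point values cut each record along the channel axis into three blobs. A minimal sketch of that partitioning follows, assuming 7 channels per sample as implied by the comment "0 1 2; 3 4 5; 6". The helper `split_channels` is hypothetical and not part of Caffe.

```python
def split_channels(num_channels, slice_points):
    """Return the (start, stop) channel ranges produced by the slice points."""
    bounds = [0] + list(slice_points) + [num_channels]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


# Assumption: 7 channels in total, split at 3 and 6 as in data_param above.
ranges = split_channels(7, [3, 6])
for name, (start, stop) in zip(["blob0", "blob1", "blob2"], ranges):
    print(name, "-> channels", list(range(start, stop)))
```

With these settings blob0 and blob1 each carry three channels and blob2 carries the single remaining channel.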
Concat layer
layer {
  name: "Concat1"
  type: "Concat"       # concatenate blobs along a given dimension
  bottom: "blob2"      # bottom = input blob
  bottom: "blob9"
  top: "blob10"
  concat_param {
    concat_dim: 1
  }
}
layer {
  name: "Concat2"
  type: "Concat"       # concatenate the augmented left and right views along the channel axis to form the network input
  bottom: "img0_aug"
  bottom: "img1_aug"
  top: "input"
  concat_param {
    axis: 1
  }
}
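Concatenating along axis 1 stacks channels while leaving the batch and spatial dimensions untouched. The sketch below only computes the resulting shape. `concat_shape` is a hypothetical helper, and the 384x768 crop size is an assumed example value, not taken from the prototxt shown here.

```python
def concat_shape(shapes, axis=1):
    """Shape of the blob obtained by concatenating blobs along `axis`."""
    first = shapes[0]
    for shape in shapes[1:]:
        # Every dimension except the concat axis has to match.
        same_rank = len(shape) == len(first)
        if not same_rank or any(
            a != b for i, (a, b) in enumerate(zip(first, shape)) if i != axis
        ):
            raise ValueError("shapes differ outside the concat axis")
    out = list(first)
    out[axis] = sum(shape[axis] for shape in shapes)
    return tuple(out)


# (N, C, H, W): batch_size 4 from data_param; H and W are assumed values.
left = (4, 3, 384, 768)
right = (4, 3, 384, 768)
print(concat_shape([left, right], axis=1))  # (4, 6, 384, 768)
```

The 6-channel result matches the 6 input channels expected by the first convolution in the PyTorch version.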
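Convolution layer + activation function
layer {
  name: "conv1"
  type: "Convolution"    # convolution layer
  bottom: "input"
  top: "conv1"
  param {
    lr_mult: 1           # learning-rate multiplier for the weights; multiplied by the base learning rate set in the solver file
    decay_mult: 1        # weight-decay multiplier for the weights
  }
  param {
    lr_mult: 1           # learning-rate multiplier for the bias
    decay_mult: 0        # weight-decay multiplier for the bias (0 = no decay on the bias)
  }
  convolution_param {
    num_output: 64
    pad: 3
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "msra"       # weight initialization strategy
    }
    bias_filler {
      type: "constant"
    }
    engine: CUDNN
  }
}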
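The pad, kernel_size and stride values determine the spatial size of the output. A small sketch of the standard convolution output-size formula, floor((in + 2*pad - kernel) / stride) + 1, applied to conv1 follows. `conv_output_size` is a hypothetical helper, and the 384x768 input size is an assumed example value.

```python
def conv_output_size(in_size, kernel_size, stride, pad):
    """Spatial output size of a convolution (floor division, no dilation)."""
    return (in_size + 2 * pad - kernel_size) // stride + 1


# conv1: kernel_size 7, stride 2, pad 3 halves each spatial dimension.
height, width = 384, 768  # assumed input size
print(conv_output_size(height, 7, 2, 3), conv_output_size(width, 7, 2, 3))  # 192 384

# A 3x3 convolution with stride 1 and pad 1 keeps the size unchanged.
print(conv_output_size(48, 3, 1, 1))  # 48
```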
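layer {
  name: "ReLU1"
  type: "ReLU"             # activation function
  bottom: "conv1"
  top: "conv1"             # same blob as bottom, so the activation is computed in place
  relu_param {
    negative_slope: 0.1    # a negative slope of 0.1 turns the ReLU into a leaky ReLU
  }
}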
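With negative_slope set, the layer computes f(x) = x for x > 0 and f(x) = 0.1 * x otherwise. A scalar sketch (the function name `leaky_relu` is chosen here for illustration):

```python
def leaky_relu(x, negative_slope=0.1):
    """Leaky ReLU: identity for positive inputs, scaled-down for the rest."""
    return x if x > 0 else negative_slope * x


for value in (-2.0, 0.0, 3.0):
    print(value, "->", leaky_relu(value))
```

Unlike a plain ReLU, negative inputs keep a small non-zero gradient (0.1).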
Convolution producing the predicted disparity
layer {
  name: "Convolution5"
  type: "Convolution"    # convolution layer that outputs the prediction
  bottom: "concat4"
  top: "predict_flow4"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 1
    decay_mult: 0
  }
  convolution_param {
    num_output: 1        # a single output channel: the disparity map
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
    }
    engine: CUDNN
  }
}
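Downsampling layer
layer {
  name: "Downsample3"
  type: "Downsample"         # downsampling layer
  bottom: "disp_gt_aug"      # first input: the ground-truth disparity
  bottom: "predict_flow4"    # second input: the predicted disparity, which supplies the target size
  top: "blob38"              # ground truth downsampled to the size of the prediction
  propagate_down: false      # no gradient is propagated to either input
  propagate_down: false
}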
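The idea of the layer is to resize the ground truth to the resolution of the prediction so the two can be compared. The interpolation used by the custom Downsample layer is not shown in this post, so the sketch below swaps in plain nearest-neighbour sampling as a stand-in and leaves the disparity values unscaled; both choices are assumptions. `downsample_nearest` is a hypothetical helper.

```python
def downsample_nearest(grid, out_h, out_w):
    """Nearest-neighbour downsampling of a 2D list (stand-in for the real layer)."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


# Toy 4x4 "ground truth" reduced to the 2x2 size of a toy prediction.
disp_gt = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
print(downsample_nearest(disp_gt, 2, 2))  # [[0, 2], [8, 10]]
```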
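Loss layer
layer {
  name: "flow_loss4"
  type: "L1Loss"             # loss layer, L1 loss
  bottom: "predict_flow4"    # predicted disparity
  bottom: "blob38"           # ground-truth disparity at the same scale
  top: "flow_loss4"
  loss_weight: 0.2           # weight of this loss in the total loss
  l1_loss_param {
    l2_per_location: false
    normalize_by_num_entries: true    # average over all entries; the loss is a scalar
  }
}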
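Read literally, these settings describe a mean absolute error scaled by the loss weight. The sketch below assumes normalize_by_num_entries divides the summed absolute differences by the number of entries; the custom layer's exact behaviour is not shown in this post. `weighted_l1_loss` is a hypothetical helper.

```python
def weighted_l1_loss(pred, target, loss_weight=0.2):
    """Mean absolute difference over all entries, scaled by loss_weight."""
    diffs = [
        abs(p - t)
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    ]
    return loss_weight * sum(diffs) / len(diffs)


pred = [[1.0, 2.0], [3.0, 4.0]]
target = [[1.5, 2.0], [2.0, 6.0]]
# Absolute differences: 0.5, 0.0, 1.0, 2.0 -> mean 0.875 -> weighted 0.175
print(weighted_l1_loss(pred, target))
```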
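Model initialization
Weight initialization: msra
When only the number of inputs n (the fan-in) is taken into account, MSRA initialization draws the weights from a Gaussian with mean 0 and variance 2/n, that is, a standard deviation of sqrt(2/n).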
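As a worked example, the fan-in of a convolution is in_channels * kernel_h * kernel_w, so for conv1 (6 input channels, 7x7 kernel) n = 294. The sketch below uses only the standard library; `msra_std` and `msra_sample` are hypothetical helpers, not Caffe or PyTorch functions.

```python
import math
import random


def msra_std(in_channels, kernel_h, kernel_w):
    """Standard deviation sqrt(2 / n) with n = fan-in of the convolution."""
    fan_in = in_channels * kernel_h * kernel_w
    return math.sqrt(2.0 / fan_in)


def msra_sample(in_channels, kernel_h, kernel_w, count, seed=0):
    """Draw `count` weights from a zero-mean Gaussian with the MSRA std."""
    rng = random.Random(seed)
    std = msra_std(in_channels, kernel_h, kernel_w)
    return [rng.gauss(0.0, std) for _ in range(count)]


# conv1: 6 input channels, 7x7 kernel -> n = 294
weights = msra_sample(6, 7, 7, count=20000)
empirical_var = sum(w * w for w in weights) / len(weights)
print("target variance:", 2.0 / 294, "empirical variance:", empirical_var)
```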
PyTorch implementation
PyTorch implementation of the DispNetSimple (DispNetS) architecture
import torch
import torch.nn as nn

class DispNetS(nn.Module):
    """
    PyTorch re-implementation of the Caffe-based DispNetS from the paper.
    """
    def __init__(self):
        super(DispNetS, self).__init__()
        # the extraction part (encoder); trailing comments give the resolution relative to the input
        self.conv1 = self.conv2d_leakyrelu(6, 64, 7, 2, 3)  # 1/2
        self.conv2 = self.conv2d_leakyrelu(64, 128, 5, 2, 2)  # 1/4
        self.conv3a = self.conv2d_leakyrelu(128, 256, 5, 2, 2)  # 1/8
        self.conv3b = self.conv2d_leakyrelu(256, 256, 3, 1, 1)
        self.conv4a = self.conv2d_leakyrelu(256, 512, 3, 2, 1)  # 1/16
        self.conv4b = self.conv2d_leakyrelu(512, 512, 3, 1, 1)
        self.conv5a = self.conv2d_leakyrelu(512, 512, 3, 2, 1)  # 1/32
        self.conv5b = self.conv2d_leakyrelu(512, 512, 3, 1, 1)
        self.conv6a = self.conv2d_leakyrelu(512, 1024, 3, 2, 1)  # 1/64
        self.conv6b = self.conv2d_leakyrelu(1024, 1024, 3,