【DispNet_CVPR_2016】Paper Reading (Part 2)

DispNet Paper Reading (Part 1) | link
Caffe model files for DispNet (download) | link

Layer overview

The image-augmentation layers are skipped here for now; we start directly from the network's input part (the red arrow in the figure below).
The data-input part of the train.prototxt file is visualized as follows:

[Figure: visualization of the data-input part of train.prototxt]

Basic layers

The train.prototxt file is walked through below (the image-augmentation part is skipped!):

Custom data layer

layer {
  name: "CustomData1"
  type: "CustomData"  # custom data layer that feeds the input data
  top: "blob0"  # "top" denotes a layer output
  top: "blob1"
  top: "blob2"
  include {
    phase: TRAIN
  }
  data_param {
    source: "../../../data/FlyingThings3D_release_TRAIN_lmdb"
    batch_size: 4
    backend: LMDB  # storage backend
    rand_permute: true  # randomly permute the samples
    rand_permute_seed: 77
    slice_point: 3  # split each sample's channels into the three tops: 0-2, 3-5, 6
    slice_point: 6
    encoding: UINT8  # encoding of each of the three outputs, in order
    encoding: UINT8
    encoding: UINT16FLOW
    verbose: true
  }
}
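
For intuition, the two slice_point values cut the channel axis of each decoded LMDB sample at indices 3 and 6, yielding the three tops. A minimal PyTorch sketch of that split (the 7-channel layout and the 384x768 resolution are assumptions for illustration, not what the Caffe layer literally does internally):

import torch

# hypothetical decoded sample: 7 channels at an assumed 384x768 resolution
sample = torch.zeros(7, 384, 768)

# slice_point: 3 and slice_point: 6 cut the channel axis at indices 3 and 6
blob0 = sample[0:3]  # left image   (encoding UINT8,      channels 0-2)
blob1 = sample[3:6]  # right image  (encoding UINT8,      channels 3-5)
blob2 = sample[6:7]  # disparity    (encoding UINT16FLOW, channel 6)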

Concat layers

layer {
  name: "Concat1"
  type: "Concat"  # concatenate blobs along a given dimension
  bottom: "blob2"  # "bottom" denotes a layer input
  bottom: "blob9"
  top: "blob10"
  concat_param {
    concat_dim: 1
  }
}

layer {
  name: "Concat2"
  type: "Concat"  # concatenate the augmented left and right views along the channel axis to form the network input
  bottom: "img0_aug"
  bottom: "img1_aug"
  top: "input"
  concat_param {
    axis: 1
  }
}
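
In PyTorch, Concat2 corresponds to torch.cat along dim=1; a minimal sketch with illustrative tensor shapes:

import torch

img0_aug = torch.randn(4, 3, 384, 768)  # augmented left view  (N, C, H, W)
img1_aug = torch.randn(4, 3, 384, 768)  # augmented right view

# axis: 1 in the prototxt == dim=1 (the channel dimension) in PyTorch
net_input = torch.cat([img0_aug, img1_aug], dim=1)  # shape (4, 6, 384, 768)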

Convolution layer + activation function

layer {
  name: "conv1"
  type: "Convolution"  # convolution layer
  bottom: "input"
  top: "conv1"
  param {
    lr_mult: 1  # learning-rate multiplier for the weights; multiplied by the base learning rate in the solver file
    decay_mult: 1  # weight-decay multiplier for the weights
  }
  param {
    lr_mult: 1  # learning-rate multiplier for the bias
    decay_mult: 0  # weight-decay multiplier for the bias (no decay)
  }
  convolution_param {
    num_output: 64
    pad: 3
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "msra"  # weight initialization scheme
    }
    bias_filler {
      type: "constant"
    }
    engine: CUDNN
  }
}
layer {
  name: "ReLU1"
  type: "ReLU"  # activation function
  bottom: "conv1"
  top: "conv1"
  relu_param {
    negative_slope: 0.1  # negative slope of 0.1, i.e. LeakyReLU
  }
}
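
As a sanity check, the conv1 + ReLU1 pair corresponds to the following PyTorch module (the 384x768 input size is an assumption for illustration):

import torch
import torch.nn as nn

conv1 = nn.Sequential(
    nn.Conv2d(6, 64, kernel_size=7, stride=2, padding=3),  # num_output / kernel_size / stride / pad
    nn.LeakyReLU(0.1),  # negative_slope: 0.1
)

x = torch.randn(4, 6, 384, 768)
print(conv1(x).shape)  # torch.Size([4, 64, 192, 384]) -- stride 2 halves H and W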

Convolution producing the predicted disparity

layer {
  name: "Convolution5"
  type: "Convolution"  # convolution producing the prediction
  bottom: "concat4"
  top: "predict_flow4"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 1
    decay_mult: 0
  }
  convolution_param {
    num_output: 1  # a single output channel (the disparity map)
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
    }
    engine: CUDNN
  }
}
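
In PyTorch this prediction head is simply a 3x3 convolution down to one channel, with no activation afterwards; the input channel count of concat4 below is an assumed placeholder, not a value taken from the prototxt:

import torch.nn as nn

in_ch = 769  # channels of concat4 -- an assumption, check the actual prototxt

# predict_flow4: 3x3 conv, stride 1, pad 1, a single disparity channel
predict_flow4 = nn.Conv2d(in_ch, 1, kernel_size=3, stride=1, padding=1)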

Downsample layer

layer {
  name: "Downsample3"
  type: "Downsample"  # downsampling layer
  bottom: "disp_gt_aug"  # ground-truth disparity labels
  bottom: "predict_flow4"  # predicted disparity (defines the target size)
  top: "blob38"  # labels downsampled to the same resolution as the prediction
  propagate_down: false
  propagate_down: false
}
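
A rough functional equivalent, assuming nearest-neighbor resampling (the interpolation used by the actual Caffe Downsample layer may differ):

import torch
import torch.nn.functional as F

disp_gt_aug   = torch.randn(4, 1, 384, 768)  # full-resolution disparity labels
predict_flow4 = torch.randn(4, 1, 24, 48)    # prediction at 1/16 resolution

# resize the labels to the prediction's spatial size (the "blob38" top)
blob38 = F.interpolate(disp_gt_aug, size=predict_flow4.shape[-2:], mode='nearest')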

Loss layer

layer {
  name: "flow_loss4"
  type: "L1Loss"  # L1 loss layer
  bottom: "predict_flow4"  # predicted disparity
  bottom: "blob38"  # ground-truth disparity at the same scale
  top: "flow_loss4"
  loss_weight: 0.2  # weight of this scale's loss in the total objective
  l1_loss_param {
    l2_per_location: false
    normalize_by_num_entries: true  # average over entries so the loss is a scalar
  }
}
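
With normalize_by_num_entries: true the loss reduces to a mean absolute error; a minimal sketch of how this scale contributes to the training objective:

import torch

def l1_flow_loss(pred, target):
    # normalize_by_num_entries: true -> mean over all entries, a scalar loss
    return (pred - target).abs().mean()

predict_flow4 = torch.randn(4, 1, 24, 48, requires_grad=True)
blob38 = torch.randn(4, 1, 24, 48)  # downsampled labels, as above
loss = 0.2 * l1_flow_loss(predict_flow4, blob38)  # loss_weight: 0.2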

Model initialization

The model weights are initialized with the "msra" (He) filler.

Considering only the number of inputs $n$ (the fan-in), MSRA initialization draws each weight from a zero-mean Gaussian with variance $2/n$: $W \sim \mathcal{N}(0,\, 2/n)$.
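
In PyTorch the same scheme is exposed as Kaiming (He) initialization; a minimal sketch:

import torch.nn as nn

conv = nn.Conv2d(6, 64, kernel_size=7, stride=2, padding=3)

# Kaiming/He normal init == Caffe's "msra" filler: N(0, 2 / fan_in)
nn.init.kaiming_normal_(conv.weight, mode='fan_in', nonlinearity='relu')
nn.init.constant_(conv.bias, 0.0)  # matches bias_filler { type: "constant" }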

PyTorch implementation

A PyTorch implementation of the DispNetS (DispNetSimple) architecture:

import torch.nn as nn


class DispNetS(nn.Module):
    """PyTorch re-implementation of the Caffe-based DispNetS from the paper."""

    def __init__(self):
        super(DispNetS, self).__init__()
        # the contracting (feature-extraction) part
        self.conv1 = self.conv2d_leakyrelu(6, 64, 7, 2, 3)       # 1/2
        self.conv2 = self.conv2d_leakyrelu(64, 128, 5, 2, 2)     # 1/4
        self.conv3a = self.conv2d_leakyrelu(128, 256, 5, 2, 2)   # 1/8
        self.conv3b = self.conv2d_leakyrelu(256, 256, 3, 1, 1)
        self.conv4a = self.conv2d_leakyrelu(256, 512, 3, 2, 1)   # 1/16
        self.conv4b = self.conv2d_leakyrelu(512, 512, 3, 1, 1)
        self.conv5a = self.conv2d_leakyrelu(512, 512, 3, 2, 1)   # 1/32
        self.conv5b = self.conv2d_leakyrelu(512, 512, 3, 1, 1)
        self.conv6a = self.conv2d_leakyrelu(512, 1024, 3, 2, 1)  # 1/64
        self.conv6b = self.conv2d_leakyrelu(1024, 1024, 3, 1, 1)
        # ... (expanding part with upconvolutions and per-scale disparity
        # predictions omitted)

    @staticmethod
    def conv2d_leakyrelu(in_ch, out_ch, kernel_size, stride, padding):
        # Conv2d + LeakyReLU(0.1), mirroring the conv + ReLU pairs in the prototxt
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding),
            nn.LeakyReLU(0.1, inplace=True),
        )