Deep optical flow notes: SpyNet, PWC-Net, RAFT, UFlow, UpFlow, Back to Basics, UnFlow, 8-base homography flow

1. SpyNet

[figure: SpyNet architecture]

https://github.com/sniklaus/pytorch-spynet/blob/master/run.py
The same author also has clean implementations of UnFlow and PWC-Net.
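
SpyNet estimates flow coarse-to-fine: each pyramid level warps the second image with the upsampled flow from the coarser level and predicts only a residual flow. A minimal sketch of this loop (assumptions: nets is a list of per-level CNNs, coarsest first; warp is a backward-warping helper, cf. the stn() in section 5.6; pyr1/pyr2 are image pyramids, coarsest first; none of these names come from the repository above):

import torch
import torch.nn.functional as F

# Minimal sketch of SpyNet's coarse-to-fine residual estimation.
def spynet_forward(nets, pyr1, pyr2, warp):
    b, _, h, w = pyr1[0].shape
    flow = torch.zeros(b, 2, h, w, device=pyr1[0].device)  # zero flow at the coarsest level
    for level, net in enumerate(nets):
        if level > 0:  # upsample the previous flow and scale its magnitude with resolution
            flow = 2.0 * F.interpolate(flow, size=pyr1[level].shape[2:],
                                       mode='bilinear', align_corners=False)
        warped2 = warp(pyr2[level], flow)  # warp I2 towards I1 with the current flow
        flow = flow + net(torch.cat([pyr1[level], warped2, flow], dim=1))  # residual update
    return flow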

2. PWC-Net

The main components it introduces are a pyramid feature extractor, an optical flow estimator, and a context network.

[figure: the traditional image pyramid (left) vs. the learnable feature pyramid proposed in the paper (right)]

Network structure:

https://github.com/NVlabs/PWC-Net/blob/master/PyTorch/models/PWCNet.py
https://github.com/Willianwatch/pwcnet-pytorch/blob/main/methods/models/pwcnet.py
https://zhuanlan.zhihu.com/p/589163667
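
At each pyramid level, PWC-Net warps the second image's features with the upsampled flow and correlates them against the first image's features over a small search window. A minimal sketch of this local cost volume (assumptions: feat1 and the already-warped feat2 are (N, C, H, W); d = 4 gives the paper's 81-channel volume; the function name and signature are illustrative, not taken from the repositories above):

import torch
import torch.nn.functional as F

# Minimal sketch of a PWC-style local cost volume.
def cost_volume(feat1, feat2_warped, d=4):
    n, c, h, w = feat1.shape
    padded = F.pad(feat2_warped, (d, d, d, d))
    cost = []
    for dy in range(2 * d + 1):
        for dx in range(2 * d + 1):
            shifted = padded[:, :, dy:dy + h, dx:dx + w]
            cost.append((feat1 * shifted).mean(dim=1, keepdim=True))  # correlation at this offset
    return torch.cat(cost, dim=1)  # (N, (2d+1)^2, H, W)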

3. RAFT

https://blog.csdn.net/qq_39546227/article/details/115005833

The key idea is computing a global (all-pairs) similarity volume between the two feature maps.

level i: (n, 5x5, h, w) -> (24, 1) subtraction -> boxfilter (n, 24, h, w) -> top position (x, y) -> flow
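
A minimal sketch of the all-pairs correlation volume at the heart of RAFT (assumptions: fmap1 and fmap2 are (N, D, H, W) feature maps from a shared encoder; RAFT further pools this volume into a 4-level pyramid and looks it up locally in each iterative update):

import torch

# Minimal sketch of RAFT's all-pairs correlation; the function name is illustrative.
def all_pairs_correlation(fmap1, fmap2):
    n, d, h, w = fmap1.shape
    f1 = fmap1.view(n, d, h * w)                 # (N, D, HW)
    f2 = fmap2.view(n, d, h * w)                 # (N, D, HW)
    corr = torch.matmul(f1.transpose(1, 2), f2)  # (N, HW, HW): every pixel against every pixel
    return corr / d ** 0.5                       # scaled by sqrt(D), as in the paper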

4. UFlow

https://github.com/sniklaus/pytorch-unflow (note: this repository is a reimplementation of UnFlow, covered in section 6, rather than UFlow)

5. Back to Basics: Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness

https://arxiv.org/pdf/1608.05842
https://github.com/ily-R/Unsupervised-Optical-Flow

5.1 Photometric loss

The photometric loss penalizes the brightness-constancy error with the generalized Charbonnier penalty $\rho(x) = (x^2 + \epsilon^2)^\alpha$ (reconstructed; cf. photometric_loss() in 5.6):

$\ell_{\text{photometric}}(\mathbf{u}, \mathbf{v}) = \sum_{i,j} \rho\big(I_1(i, j) - I_2(i + u_{i,j},\, j + v_{i,j})\big)$

5.2 Smoothness loss

$\ell_{\text{smooth}}(\mathbf{u}, \mathbf{v}) = \sum_{i,j} \big[\rho(u_{i,j} - u_{i+1,j}) + \rho(u_{i,j} - u_{i,j+1}) + \rho(v_{i,j} - v_{i+1,j}) + \rho(v_{i,j} - v_{i,j+1})\big]$

(reconstructed; cf. smoothness_loss() in 5.6, which applies the same Charbonnier penalty to neighbour differences)

5.3 Network structure

[figure: network architecture (a FlowNetS-style encoder-decoder; see the code in 5.6)]

5.4 Data augmentation

The photometric augmentations comprise additive Gaussian noise applied to each image, contrast changes, multiplicative colour changes to the RGB channels, gamma changes, and additive brightness changes (a minimal sketch follows below).

The geometric transformations comprise 2D translations, left-right flipping, rotations, and scalings.
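
As a concrete illustration of the photometric set, a minimal sketch (assumptions: img is a float tensor in [0, 1] of shape (3, H, W); the ranges are illustrative, not the paper's values; the geometric transforms would be applied jointly to both frames and the flow):

import torch

# Minimal sketch of the listed photometric augmentations; ranges are made up.
def photometric_aug(img):
    img = img + 0.02 * torch.randn_like(img)                       # additive Gaussian noise
    mean = img.mean()
    img = (img - mean) * torch.empty(1).uniform_(0.8, 1.2) + mean  # contrast change
    img = img * torch.empty(3, 1, 1).uniform_(0.9, 1.1)            # multiplicative colour change per RGB channel
    img = img.clamp(0, 1) ** torch.empty(1).uniform_(0.7, 1.5)     # gamma change
    img = img + torch.empty(1).uniform_(-0.1, 0.1)                 # additive brightness change
    return img.clamp(0, 1)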

5.5 Training hyperparameter settings

[figure: training hyperparameter settings]

5.6 Code

Network structure: FlowNetS

import torch
import torch.nn as nn
import torch.nn.functional as F


def conv(in_channels, out_channels, kernel_size=3, stride=2):
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding=(kernel_size - 1) // 2, bias=False),
        nn.ReLU(inplace=True))


def predict_flow(in_channels):
    return nn.Conv2d(in_channels, 2, 5, stride=1, padding=2, bias=False)


def upconv(in_channels, out_channels):
    return nn.Sequential(nn.ConvTranspose2d(in_channels, out_channels, kernel_size=4, stride=2, padding=1, bias=False),
                         nn.ReLU(inplace=True))


def concatenate(tensor1, tensor2, tensor3):
    _, _, h1, w1 = tensor1.shape
    _, _, h2, w2 = tensor2.shape
    _, _, h3, w3 = tensor3.shape
    h, w = min(h1, h2, h3), min(w1, w2, w3)
    return torch.cat((tensor1[:, :, :h, :w], tensor2[:, :, :h, :w], tensor3[:, :, :h, :w]), 1)


class FlowNetS(nn.Module):
    def __init__(self):
        super(FlowNetS, self).__init__()

        self.conv1 = conv(6, 64, kernel_size=7)
        self.conv2 = conv(64, 128, kernel_size=5)
        self.conv3 = conv(128, 256, kernel_size=5)
        self.conv3_1 = conv(256, 256, stride=1)
        self.conv4 = conv(256, 512)
        self.conv4_1 = conv(512, 512, stride=1)
        self.conv5 = conv(512, 512)
        self.conv5_1 = conv(512, 512, stride=1)
        self.conv6 = conv(512, 1024)

        self.predict_flow6 = predict_flow(1024)  # conv6 output
        self.predict_flow5 = predict_flow(1026)  # upconv5 + 2 + conv5_1
        self.predict_flow4 = predict_flow(770)  # upconv4 + 2 + conv4_1
        self.predict_flow3 = predict_flow(386)  # upconv3 + 2 + conv3_1
        self.predict_flow2 = predict_flow(194)  # upconv2 + 2 + conv2

        self.upconv5 = upconv(1024, 512)
        self.upconv4 = upconv(1026, 256)
        self.upconv3 = upconv(770, 128)
        self.upconv2 = upconv(386, 64)

        self.upconvflow6 = nn.ConvTranspose2d(2, 2, 4, 2, 1, bias=False)
        self.upconvflow5 = nn.ConvTranspose2d(2, 2, 4, 2, 1, bias=False)
        self.upconvflow4 = nn.ConvTranspose2d(2, 2, 4, 2, 1, bias=False)
        self.upconvflow3 = nn.ConvTranspose2d(2, 2, 4, 2, 1, bias=False)

    def forward(self, x):

        out_conv2 = self.conv2(self.conv1(x))
        out_conv3 = self.conv3_1(self.conv3(out_conv2))
        out_conv4 = self.conv4_1(self.conv4(out_conv3))
        out_conv5 = self.conv5_1(self.conv5(out_conv4))
        out_conv6 = self.conv6(out_conv5)

        flow6 = self.predict_flow6(out_conv6)
        up_flow6 = self.upconvflow6(flow6)
        out_upconv5 = self.upconv5(out_conv6)
        concat5 = concatenate(out_upconv5, out_conv5, up_flow6)

        flow5 = self.predict_flow5(concat5)
        up_flow5 = self.upconvflow5(flow5)
        out_upconv4 = self.upconv4(concat5)
        concat4 = concatenate(out_upconv4, out_conv4, up_flow5)

        flow4 = self.predict_flow4(concat4)
        up_flow4 = self.upconvflow4(flow4)
        out_upconv3 = self.upconv3(concat4)
        concat3 = concatenate(out_upconv3, out_conv3, up_flow4)

        flow3 = self.predict_flow3(concat3)
        up_flow3 = self.upconvflow3(flow3)
        out_upconv2 = self.upconv2(concat3)
        concat2 = concatenate(out_upconv2, out_conv2, up_flow3)

        finalflow = self.predict_flow2(concat2)

        if self.training:
            return finalflow, flow3, flow4, flow5, flow6
        else:
            return finalflow,

# The unsupervised wrapper below, unlike the supervised model, also outputs the warped images
def generate_grid(B, H, W, device):
    xx = torch.arange(0, W).view(1, -1).repeat(H, 1)
    yy = torch.arange(0, H).view(-1, 1).repeat(1, W)
    xx = xx.view(1, 1, H, W).repeat(B, 1, 1, 1)
    yy = yy.view(1, 1, H, W).repeat(B, 1, 1, 1)
    grid = torch.cat((xx, yy), 1).float()
    grid = torch.transpose(grid, 1, 2)
    grid = torch.transpose(grid, 2, 3)
    grid = grid.to(device)
    return grid


class Unsupervised(nn.Module):
    def __init__(self, conv_predictor="flownet"):
        super(Unsupervised, self).__init__()

        if "light" in conv_predictor:
            self.predictor = LightFlowNet()
        elif "pwc" in conv_predictor:
            self.predictor = PWC_Net()
        else:
            self.predictor = FlowNetS()

    def stn(self, flow, frame):
        b, _, h, w = flow.shape
        frame = F.interpolate(frame, size=(h, w), mode='bilinear', align_corners=True)
        flow = torch.transpose(flow, 1, 2)
        flow = torch.transpose(flow, 2, 3)

        grid = flow + generate_grid(b, h, w, flow.device)

        factor = torch.FloatTensor([[[[2 / w, 2 / h]]]]).to(flow.device)
        grid = grid * factor - 1
        warped_frame = F.grid_sample(frame, grid)  # frame: (n, 3, h, w), grid: (n, h, w, 2)

        return warped_frame

    def forward(self, x):

        flow_predictions = self.predictor(x)
        frame2 = x[:, 3:, :, :]
        warped_images = [self.stn(flow, frame2) for flow in flow_predictions]

        return flow_predictions, warped_images

Loss functions: both a supervised loss and the unsupervised losses

import torch
import torch.nn.functional as F

# EPE and AAE metrics, usable as a supervised loss or for evaluation
def EPE(flow_pred, flow_true, real=False):  # both (n, 2, h, w); L2 norm of the difference over dim 1, then mean
    # by default real=False, i.e. the gt flow is rescaled to match flow_pred
    if real:
        batch_size, _, h, w = flow_true.shape
        flow_pred = F.interpolate(flow_pred, (h, w), mode='bilinear', align_corners=False)
    else:
        batch_size, _, h, w = flow_pred.shape
        flow_true = F.interpolate(flow_true, (h, w), mode='area')
    return torch.norm(flow_pred - flow_true, 2, 1).mean()  # L2 norm over dim 1


def EPE_all(flows_pred, flow_true, weights=(0.005, 0.01, 0.02, 0.08, 0.32)):  # one weight per pyramid level of predicted flow

    if len(flows_pred) < 5:
        weights = [0.005]*len(flows_pred)
    loss = 0

    for i in range(len(weights)):
        loss += weights[i] * EPE(flows_pred[i], flow_true, real=False)

    return loss


def AAE(flow_pred, flow_true):  # average angular error; cos = (u1*u2 + v1*v2 + 1) / (|a| * |b|) in homogeneous form
    batch_size, _, h, w = flow_true.shape
    flow_pred = F.interpolate(flow_pred, (h, w), mode='bilinear', align_corners=False)
    numerator = torch.sum(torch.mul(flow_pred, flow_true), dim=1) + 1
    denominator = torch.sqrt(torch.sum(flow_pred ** 2, dim=1) + 1) * torch.sqrt(torch.sum(flow_true ** 2, dim=1) + 1)
    result = torch.clamp(torch.div(numerator, denominator), min=-1.0, max=1.0)

    return torch.acos(result).mean()


def evaluate(flow_pred, flow_true):

    epe = EPE(flow_pred, flow_true, real=True)
    aae = AAE(flow_pred, flow_true)
    return epe, aae

# Below: the unsupervised photometric (pixel) loss and flow smoothness loss
def charbonnier(x, alpha=0.25, epsilon=1.e-9):
    return torch.pow(torch.pow(x, 2) + epsilon**2, alpha)


def smoothness_loss(flow):
    b, c, h, w = flow.size()
    v_translated = torch.cat((flow[:, :, 1:, :], torch.zeros(b, c, 1, w, device=flow.device)), dim=-2)
    h_translated = torch.cat((flow[:, :, :, 1:], torch.zeros(b, c, h, 1, device=flow.device)), dim=-1)
    s_loss = charbonnier(flow - v_translated) + charbonnier(flow - h_translated)  # difference to the vertically / horizontally shifted flow
    s_loss = torch.sum(s_loss, dim=1) / 2

    return torch.sum(s_loss)/b  # loss summed over all h*w positions, averaged over the batch


def photometric_loss(wraped, frame1):
    h, w = wraped.shape[2:]
    frame1 = F.interpolate(frame1, (h, w), mode='bilinear', align_corners=False)
    p_loss = charbonnier(wraped - frame1)  # (b, c, h, w)
    p_loss = torch.sum(p_loss, dim=1)/3   # (b, h, w), averaged over the 3 channels
    return torch.sum(p_loss)/frame1.size(0)  # scalar: loss summed over all h*w positions, averaged over the batch


def unsup_loss(pred_flows, wraped_imgs, frame1, weights=(0.005, 0.01, 0.02, 0.08, 0.32)):
    if len(pred_flows) < 5:
        weights = [0.005]*len(pred_flows)
    bce = 0
    smooth = 0
    lamda = 0.1
    for i in range(len(weights)):
        bce += weights[i] * photometric_loss(wraped_imgs[i], frame1)
        smooth += weights[i] * smoothness_loss(pred_flows[i])
    smooth = lamda* smooth
    loss = bce + smooth
    return loss, bce, smooth

Training

At training time the model input is two images concatenated along the channel axis: (batch_size, 6, h, w). The model outputs the predicted flows and the warped second image (which the photometric loss compares against the first frame).

model = Unsupervised(conv_predictor='flownets')
criterion = unsup_loss

...
pred_flows, wraped_imgs = model(imgs)
loss, bce_loss, smooth_loss = criterion(pred_flows, wraped_imgs, imgs[:, :3, :, :])
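
A minimal sketch of the surrounding training loop (assumptions: loader yields (B, 6, H, W) concatenated image pairs; the Adam optimizer and learning rate are illustrative, see the hyperparameter settings in 5.5):

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
for imgs in loader:                               # imgs: (B, 6, H, W), frame1 = channels 0-2
    pred_flows, wraped_imgs = model(imgs)
    loss, bce_loss, smooth_loss = criterion(pred_flows, wraped_imgs, imgs[:, :3, :, :])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()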

Summary

When actually training on my own images, the predicted flow stayed all zeros and the loss plateaued; people report a similar problem in the GitHub issues. Why does the network fail to learn anything? Possibly it is dataset-dependent. Reducing the smoothness weight avoids this failure mode.

6. UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss

AAAI 2018
https://arxiv.org/pdf/1711.07837

6.1 Network framework

[figure: UnFlow network framework]

6.2 Improvements

The changes are mainly in the loss function:

  1. bidirectional: the flow is estimated in both the forward and backward directions
  2. occlusion-aware loss: occlusion is explicitly taken into account

6.3 Detecting occlusion

Based on the assumption that for a non-occluded pixel, the forward flow should be opposite to the backward flow at the corresponding position in the other frame, i.e. the two should sum to (approximately) zero.

A pixel $\mathbf{x}$ is marked forward-occluded when the forward-backward mismatch is too large (reconstructed from the paper):

$|\mathbf{w}^f(\mathbf{x}) + \mathbf{w}^b(\mathbf{x} + \mathbf{w}^f(\mathbf{x}))|^2 > \alpha_1 \big(|\mathbf{w}^f(\mathbf{x})|^2 + |\mathbf{w}^b(\mathbf{x} + \mathbf{w}^f(\mathbf{x}))|^2\big) + \alpha_2$

Evaluating this condition in both directions yields the masks of the forward-occluded and backward-occluded regions.
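
The check can be written in a few lines. A sketch (assumptions: flow_fw and flow_bw are (N, 2, H, W); warp is a hypothetical backward-warping helper that samples flow_bw at the forward-displaced positions; alpha1/alpha2 are commonly used values, check the paper for the exact ones):

import torch

# Sketch of the forward-backward occlusion check; `warp` is an assumed helper.
def occlusion_mask(flow_fw, flow_bw, warp, alpha1=0.01, alpha2=0.5):
    flow_bw_warped = warp(flow_bw, flow_fw)              # w^b(x + w^f(x))
    mismatch = (flow_fw + flow_bw_warped).pow(2).sum(1)  # |w^f + w^b|^2, ~0 where visible
    mag = flow_fw.pow(2).sum(1) + flow_bw_warped.pow(2).sum(1)
    return (mismatch > alpha1 * mag + alpha2).float()    # 1 = occluded in the forward direction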

6.4 Occlusion-aware loss

Pixel loss

The photometric (pixel) loss is applied only in non-occluded regions; in addition, a regularization term on the occlusion masks prevents the network from trivially enlarging the occluded regions.

[formula: occlusion-masked data loss]

Smoothness loss

Second-order smoothness, i.e. a penalty on the discrete second derivative of the flow (a sketch follows below).

[formula: second-order smoothness loss]
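
A sketch of a discrete second-order smoothness penalty (assumptions: flow is (N, 2, H, W); reuses the charbonnier() helper from section 5.6; this illustrates the general idea, not the paper's exact formulation):

# Second differences along x and y, penalized with the Charbonnier function.
def second_order_smoothness(flow):
    d2x = flow[:, :, :, :-2] - 2 * flow[:, :, :, 1:-1] + flow[:, :, :, 2:]
    d2y = flow[:, :, :-2, :] - 2 * flow[:, :, 1:-1, :] + flow[:, :, 2:, :]
    return charbonnier(d2x).mean() + charbonnier(d2y).mean()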

Consistency loss

Enforces consistency between the forward and backward flows; pixels that are inconsistent (large mismatch) are judged occluded and excluded from this term.

[formula: forward-backward consistency loss]

Final loss

The final loss is a weighted sum of the terms above.

[formula: total loss]

7. 8-base flow: Motion Basis Learning for Unsupervised Deep Homography Estimation with Subspace Projection

7.1 Idea

Typical warp estimation predicts a set of corresponding points; this paper instead predicts a homography flow.

[figure: homography flow illustration]

A homography flow is of course much more constrained than a general optical flow.

Three key components:

  1. homography flow
  2. an LRR block to reduce the rank of the features
  3. a feature identity loss to stabilize the optimization process

7.2 The 8 base flows

How these 8 orthogonal base flows are obtained is described in the original paper; the network predicts weights that combine them (a sketch of the idea follows below).

[figure: the 8 base flows]
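
For intuition, a hypothetical sketch of one way to build such a basis (assumption: perturb each of the 8 degrees of freedom of the identity homography, turn each perturbation into a flow field, and orthonormalize with QR; the paper describes its own construction):

import torch

# Hypothetical construction of 8 orthogonal homography base flows.
def make_base_flows(h, w, eps=1e-2):
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    pts = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(-1, 3)  # (HW, 3) homogeneous

    flows = []
    for k in range(8):                                   # one basis per homography DoF
        dh = torch.zeros(9)
        dh[k] = eps                                      # perturb a single parameter
        H = torch.eye(3) + dh.reshape(3, 3)
        warped = pts @ H.t()
        warped = warped[:, :2] / warped[:, 2:3]          # perspective divide
        flows.append((warped - pts[:, :2]).reshape(-1))  # flattened flow field, (2*HW,)

    Q, _ = torch.linalg.qr(torch.stack(flows, dim=1))    # orthonormalize the 8 columns
    return Q.t().reshape(8, h, w, 2)                     # 8 orthogonal base flows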

7.3 Network

[figure: overall network structure]

7.4 Warp-equivariant feature extractor

f is a small feature-extraction network that is equivariant to warping, i.e. (reconstructed notation)

$f(\mathcal{W}(I)) = \mathcal{W}(f(I))$

where $\mathcal{W}$ denotes the warp operation.

7.5 Homography estimator with LRR blocks

This network is essentially a ResNet-34, with LRR (low-rank representation) blocks inserted.

7.6 Summary

The authors provide the dataset, the code, and pretrained models, which makes it convenient to verify their results experimentally. If you retrain from scratch and want results close to the authors', some hyperparameters still need tuning.
