Table of Contents
- SPyNet, PWC-Net, RAFT, UFlow, UPFlow, Back to Basics, UnFlow, homography 8-base flow
- 1. SPyNet
- 2. PWC-Net
- 3. RAFT
- 4. UFlow
- 5. Back to Basics: Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness
- 6. UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss
- 7. 8-base flow: Motion Basis Learning for Unsupervised Deep Homography Estimation with Subspace Projection
SPyNet, PWC-Net, RAFT, UFlow, UPFlow, Back to Basics, UnFlow, homography 8-base flow
1. SPyNet
https://github.com/sniklaus/pytorch-spynet/blob/master/run.py
The same author also provides concise implementations of UFlow and PWC-Net.
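SPyNet's core idea is coarse-to-fine residual estimation: at each pyramid level the flow from the coarser level is upsampled and doubled, the second image is warped by it, and a small per-level CNN predicts only a residual flow. A minimal sketch of that loop (`warp`, `spynet_forward`, and the per-level networks are my illustrative names, not the repo's API):

```python
import torch
import torch.nn.functional as F


def warp(img, flow):
    # Backward-warp img with flow given in pixel units (sketch only).
    b, _, h, w = img.shape
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xx, yy), 0).float().unsqueeze(0)  # 1,2,h,w
    coords = grid + flow
    gx = 2 * coords[:, 0] / (w - 1) - 1   # normalize x to [-1, 1]
    gy = 2 * coords[:, 1] / (h - 1) - 1   # normalize y to [-1, 1]
    return F.grid_sample(img, torch.stack((gx, gy), -1), align_corners=True)


def spynet_forward(im1_pyramid, im2_pyramid, level_nets):
    # Pyramids are ordered coarse -> fine; each level predicts a residual.
    b = im1_pyramid[0].shape[0]
    flow = torch.zeros(b, 2, *im1_pyramid[0].shape[2:])
    for im1, im2, net in zip(im1_pyramid, im2_pyramid, level_nets):
        if flow.shape[2:] != im1.shape[2:]:
            # Upsample the coarser flow and double its magnitude.
            flow = 2.0 * F.interpolate(flow, size=im1.shape[2:],
                                       mode="bilinear", align_corners=True)
        warped = warp(im2, flow)
        flow = flow + net(torch.cat((im1, warped, flow), 1))  # residual flow
    return flow
```

Because each level only has to model a small residual, the per-level networks can be tiny, which is what makes SPyNet so lightweight.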
2. PWC-Net
The paper mainly introduces a pyramid feature extractor, an optical flow estimator, and a context network.
On the left is the image pyramid; on the right is the feature pyramid proposed in this paper.
Network structure
https://github.com/NVlabs/PWC-Net/blob/master/PyTorch/models/PWCNet.py
https://github.com/Willianwatch/pwcnet-pytorch/blob/main/methods/models/pwcnet.py
https://zhuanlan.zhihu.com/p/589163667
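The heart of PWC-Net's per-level estimator is a partial (local) cost volume: features of the warped second image are correlated with the first image's features only within a limited search radius, which keeps the model small. A sketch of that step (the function name and the mean-over-channels normalization are my assumptions):

```python
import torch
import torch.nn.functional as F


def local_cost_volume(feat1, feat2, radius=4):
    # Correlate feat1 with shifted copies of feat2 inside a
    # (2*radius+1) x (2*radius+1) window, as in a PWC-style cost volume.
    b, c, h, w = feat1.shape
    padded = F.pad(feat2, [radius] * 4)  # zero-pad h and w by `radius`
    vols = []
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = padded[:, :, dy:dy + h, dx:dx + w]
            vols.append((feat1 * shifted).mean(dim=1))  # b,h,w
    return torch.stack(vols, dim=1)  # b,(2r+1)^2,h,w
```

In the real network `feat2` has already been warped by the upsampled coarser flow, so a small radius (4 in the paper) suffices at every level.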
3. RAFT
https://blog.csdn.net/qq_39546227/article/details/115005833
The key idea is computing a global (all-pairs) similarity.
level i: n,5x5,h,w -> 24,1, subtract -> boxfilter n,24,h,w -> top position x,y -> flow
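Unlike PWC-Net's local cost volume, RAFT correlates every pixel with every other pixel, producing a 4D all-pairs correlation volume that is average-pooled into a pyramid and repeatedly looked up by a recurrent update operator. A sketch of the volume construction (the 1/sqrt(c) scaling follows the paper; the function name is my own):

```python
import torch
import torch.nn.functional as F


def all_pairs_correlation(feat1, feat2, num_levels=4):
    # Global 4D correlation: every pixel of feat1 against every pixel of
    # feat2, reshaped so the last pyramid can be pooled over feat2's grid.
    b, c, h, w = feat1.shape
    f1 = feat1.view(b, c, h * w)
    f2 = feat2.view(b, c, h * w)
    corr = torch.einsum("bci,bcj->bij", f1, f2) / c ** 0.5  # b,hw,hw
    corr = corr.reshape(b * h * w, 1, h, w)
    pyramid = [corr]
    for _ in range(num_levels - 1):
        corr = F.avg_pool2d(corr, 2)  # pool only over feat2's dimensions
        pyramid.append(corr)
    return pyramid
```

The update operator then indexes a small neighbourhood of this pyramid around each pixel's current flow estimate, so the expensive volume is built once and reused at every iteration.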
4. UFlow
https://github.com/sniklaus/pytorch-unflow
5. Back to Basics: Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness
https://arxiv.org/pdf/1608.05842
https://github.com/ily-R/Unsupervised-Optical-Flow
5.1 Photometric loss
5.2 Smoothness loss
5.3 Network structure
5.4 Data augmentation
The photometric augmentations comprise:
- additive Gaussian noise applied to each image
- contrast changes
- multiplicative colour changes to the RGB channels
- gamma adjustments
- additive brightness

The geometric transformations comprise:
- 2D translations
- left-right flipping
- rotations
- scalings
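A sketch of what the photometric part could look like in PyTorch; all ranges and magnitudes below are my illustrative assumptions, not the paper's values. The geometric transforms (translation, flip, rotation, scaling) are kept out of this sketch because they must be applied consistently to both frames and to the flow target.

```python
import torch


def photometric_augment(img, g=None):
    # img: b,3,h,w in [0, 1]. All ranges are illustrative assumptions.
    if g is None:
        g = torch.Generator().manual_seed(0)
    noise = 0.02 * torch.randn(img.shape, generator=g)               # additive Gaussian noise
    contrast = 1 + 0.4 * (torch.rand(1, generator=g) - 0.5)          # contrast
    color = 1 + 0.2 * (torch.rand(1, 3, 1, 1, generator=g) - 0.5)    # per-channel colour
    gamma = 1 + 0.4 * (torch.rand(1, generator=g) - 0.5)             # gamma
    brightness = 0.1 * (torch.rand(1, generator=g) - 0.5)            # additive brightness
    out = ((img + noise - 0.5) * contrast + 0.5) * color
    out = out.clamp(0, 1) ** gamma + brightness
    return out.clamp(0, 1)
```

Note that photometric augmentation can safely differ between the two frames (it even helps robustness), whereas a geometric transform applied to the images must also remap the flow vectors.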
5.5 Training hyperparameter settings
5.6 Code
Network structure: FlowNetS
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv(in_channels, out_channels, kernel_size=3, stride=2):
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size, stride,
                  padding=(kernel_size - 1) // 2, bias=False),
        nn.ReLU(inplace=True))


def predict_flow(in_channels):
    return nn.Conv2d(in_channels, 2, 5, stride=1, padding=2, bias=False)


def upconv(in_channels, out_channels):
    return nn.Sequential(
        nn.ConvTranspose2d(in_channels, out_channels, kernel_size=4,
                           stride=2, padding=1, bias=False),
        nn.ReLU(inplace=True))


def concatenate(tensor1, tensor2, tensor3):
    # Crop all three tensors to the smallest common spatial size,
    # then concatenate along channels.
    _, _, h1, w1 = tensor1.shape
    _, _, h2, w2 = tensor2.shape
    _, _, h3, w3 = tensor3.shape
    h, w = min(h1, h2, h3), min(w1, w2, w3)
    return torch.cat((tensor1[:, :, :h, :w], tensor2[:, :, :h, :w],
                      tensor3[:, :, :h, :w]), 1)


class FlowNetS(nn.Module):
    def __init__(self):
        super(FlowNetS, self).__init__()
        self.conv1 = conv(6, 64, kernel_size=7)
        self.conv2 = conv(64, 128, kernel_size=5)
        self.conv3 = conv(128, 256, kernel_size=5)
        self.conv3_1 = conv(256, 256, stride=1)
        self.conv4 = conv(256, 512)
        self.conv4_1 = conv(512, 512, stride=1)
        self.conv5 = conv(512, 512)
        self.conv5_1 = conv(512, 512, stride=1)
        self.conv6 = conv(512, 1024)

        self.predict_flow6 = predict_flow(1024)  # conv6 output
        self.predict_flow5 = predict_flow(1026)  # upconv5 + 2 + conv5_1
        self.predict_flow4 = predict_flow(770)   # upconv4 + 2 + conv4_1
        self.predict_flow3 = predict_flow(386)   # upconv3 + 2 + conv3_1
        self.predict_flow2 = predict_flow(194)   # upconv2 + 2 + conv2

        self.upconv5 = upconv(1024, 512)
        self.upconv4 = upconv(1026, 256)
        self.upconv3 = upconv(770, 128)
        self.upconv2 = upconv(386, 64)

        self.upconvflow6 = nn.ConvTranspose2d(2, 2, 4, 2, 1, bias=False)
        self.upconvflow5 = nn.ConvTranspose2d(2, 2, 4, 2, 1, bias=False)
        self.upconvflow4 = nn.ConvTranspose2d(2, 2, 4, 2, 1, bias=False)
        self.upconvflow3 = nn.ConvTranspose2d(2, 2, 4, 2, 1, bias=False)

    def forward(self, x):
        out_conv2 = self.conv2(self.conv1(x))
        out_conv3 = self.conv3_1(self.conv3(out_conv2))
        out_conv4 = self.conv4_1(self.conv4(out_conv3))
        out_conv5 = self.conv5_1(self.conv5(out_conv4))
        out_conv6 = self.conv6(out_conv5)

        flow6 = self.predict_flow6(out_conv6)
        up_flow6 = self.upconvflow6(flow6)
        out_upconv5 = self.upconv5(out_conv6)
        concat5 = concatenate(out_upconv5, out_conv5, up_flow6)
        flow5 = self.predict_flow5(concat5)

        up_flow5 = self.upconvflow5(flow5)
        out_upconv4 = self.upconv4(concat5)
        concat4 = concatenate(out_upconv4, out_conv4, up_flow5)
        flow4 = self.predict_flow4(concat4)

        up_flow4 = self.upconvflow4(flow4)
        out_upconv3 = self.upconv3(concat4)
        concat3 = concatenate(out_upconv3, out_conv3, up_flow4)
        flow3 = self.predict_flow3(concat3)

        up_flow3 = self.upconvflow3(flow3)
        out_upconv2 = self.upconv2(concat3)
        concat2 = concatenate(out_upconv2, out_conv2, up_flow3)
        finalflow = self.predict_flow2(concat2)

        if self.training:
            return finalflow, flow3, flow4, flow5, flow6
        else:
            return finalflow,  # 1-tuple, so callers can always iterate
```
Below is the model for unsupervised training; compared with the supervised setting, the model additionally outputs the warped image.
```python
def generate_grid(B, H, W, device):
    # Pixel-coordinate grid of shape B,H,W,2 (x, y order).
    xx = torch.arange(0, W).view(1, -1).repeat(H, 1)
    yy = torch.arange(0, H).view(-1, 1).repeat(1, W)
    xx = xx.view(1, 1, H, W).repeat(B, 1, 1, 1)
    yy = yy.view(1, 1, H, W).repeat(B, 1, 1, 1)
    grid = torch.cat((xx, yy), 1).float()
    grid = torch.transpose(grid, 1, 2)
    grid = torch.transpose(grid, 2, 3)
    grid = grid.to(device)
    return grid


class Unsupervised(nn.Module):
    def __init__(self, conv_predictor="flownet"):
        super(Unsupervised, self).__init__()
        if "light" in conv_predictor:
            self.predictor = LightFlowNet()
        elif "pwc" in conv_predictor:
            self.predictor = PWC_Net()
        else:
            self.predictor = FlowNetS()

    def stn(self, flow, frame):
        # Spatial transformer: backward-warp `frame` with `flow`.
        b, _, h, w = flow.shape
        frame = F.interpolate(frame, size=(h, w), mode='bilinear',
                              align_corners=True)
        flow = torch.transpose(flow, 1, 2)
        flow = torch.transpose(flow, 2, 3)  # b,h,w,2
        grid = flow + generate_grid(b, h, w, flow.device)
        factor = torch.FloatTensor([[[[2 / w, 2 / h]]]]).to(flow.device)
        grid = grid * factor - 1  # normalize to [-1, 1]
        warped_frame = F.grid_sample(frame, grid)  # n,3,h,w and n,h,w,2
        return warped_frame

    def forward(self, x):
        flow_predictions = self.predictor(x)
        frame2 = x[:, 3:, :, :]
        warped_images = [self.stn(flow, frame2) for flow in flow_predictions]
        return flow_predictions, warped_images
```
Loss functions: both supervised and unsupervised losses.
```python
import torch
import torch.nn.functional as F


# EPE and AAE: usable as supervised losses or as evaluation metrics.
def EPE(flow_pred, flow_true, real=False):
    # n,2,h,w each: L2 norm of the difference over dim 1, then the mean.
    # Default real=False: the gt flow is resized to match flow_pred;
    # real=True: the prediction is resized to the gt resolution instead.
    if real:
        batch_size, _, h, w = flow_true.shape
        flow_pred = F.interpolate(flow_pred, (h, w), mode='bilinear',
                                  align_corners=False)
    else:
        batch_size, _, h, w = flow_pred.shape
        flow_true = F.interpolate(flow_true, (h, w), mode='area')
    return torch.norm(flow_pred - flow_true, 2, 1).mean()  # L2 norm, dim 1


def EPE_all(flows_pred, flow_true,
            weights=(0.005, 0.01, 0.02, 0.08, 0.32)):
    # Five weights, one per level of predicted flow.
    if len(flows_pred) < 5:
        weights = [0.005] * len(flows_pred)
    loss = 0
    for i in range(len(weights)):
        loss += weights[i] * EPE(flows_pred[i], flow_true, real=False)
    return loss


def AAE(flow_pred, flow_true):
    # Average angular error between the flows:
    # cos = (a . b + 1) / (sqrt(|a|^2 + 1) * sqrt(|b|^2 + 1))
    batch_size, _, h, w = flow_true.shape
    flow_pred = F.interpolate(flow_pred, (h, w), mode='bilinear',
                              align_corners=False)
    numerator = torch.sum(torch.mul(flow_pred, flow_true), dim=1) + 1
    denominator = (torch.sqrt(torch.sum(flow_pred ** 2, dim=1) + 1)
                   * torch.sqrt(torch.sum(flow_true ** 2, dim=1) + 1))
    result = torch.clamp(torch.div(numerator, denominator), min=-1.0, max=1.0)
    return torch.acos(result).mean()


def evaluate(flow_pred, flow_true):
    epe = EPE(flow_pred, flow_true, real=True)
    aae = AAE(flow_pred, flow_true)
    return epe, aae


# Below: the photometric loss and flow smoothness loss for unsupervised training.
def charbonnier(x, alpha=0.25, epsilon=1.e-9):
    return torch.pow(torch.pow(x, 2) + epsilon ** 2, alpha)


def smoothness_loss(flow):
    b, c, h, w = flow.size()
    v_translated = torch.cat((flow[:, :, 1:, :],
                              torch.zeros(b, c, 1, w, device=flow.device)), dim=-2)
    h_translated = torch.cat((flow[:, :, :, 1:],
                              torch.zeros(b, c, h, 1, device=flow.device)), dim=-1)
    # Difference to the vertically and horizontally shifted flow.
    s_loss = charbonnier(flow - v_translated) + charbonnier(flow - h_translated)
    s_loss = torch.sum(s_loss, dim=1) / 2
    return torch.sum(s_loss) / b  # sum of the loss over all h*w points


def photometric_loss(warped, frame1):
    h, w = warped.shape[2:]
    frame1 = F.interpolate(frame1, (h, w), mode='bilinear', align_corners=False)
    p_loss = charbonnier(warped - frame1)  # b,c,h,w
    p_loss = torch.sum(p_loss, dim=1) / 3  # b,h,w
    return torch.sum(p_loss) / frame1.size(0)  # sum over all h*w points


def unsup_loss(pred_flows, warped_imgs, frame1,
               weights=(0.005, 0.01, 0.02, 0.08, 0.32)):
    if len(pred_flows) < 5:
        weights = [0.005] * len(pred_flows)
    bce = 0
    smooth = 0
    lamda = 0.1
    for i in range(len(weights)):
        bce += weights[i] * photometric_loss(warped_imgs[i], frame1)
        smooth += weights[i] * smoothness_loss(pred_flows[i])
    smooth = lamda * smooth
    loss = bce + smooth
    return loss, bce, smooth
```
Training
At training time, the model input is two images concatenated along channels: batch_size, 6, h, w.
It outputs the flow for the second image, and the warp of the second image.
```python
model = Unsupervised(conv_predictor='flownets')
criterion = unsup_loss
...
pred_flows, warped_imgs = model(imgs)
loss, bce_loss, smooth_loss = criterion(pred_flows, warped_imgs, imgs[:, :3, :, :])
```
Summary
When training on my own images, the predicted flow stayed all zeros and the loss stopped decreasing; similar reports appear in the GitHub issues. Why does it fail to learn anything? Possibly it depends on the dataset. Reducing the smoothness weight avoids this behaviour.
6. UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss
AAAI 2018
https://arxiv.org/pdf/1711.07837
6.1 Network framework
6.2 Improvements
The changes are mainly in the loss function:
- bidirectional: both forward and backward flow are estimated
- occlusion-aware loss: occlusion is taken into account
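The census loss is the headline change: instead of comparing raw intensities, both images are compared through a (soft) census transform of each pixel's neighbourhood, which is invariant to additive and multiplicative illumination changes. A sketch in the spirit of the paper (the 7x7 patch and the soft-threshold constants follow common implementations; treat them as assumptions):

```python
import torch
import torch.nn.functional as F


def census_transform(img, patch=7):
    # Compare each pixel to its patch x patch neighbourhood; the soft
    # census descriptor is robust to brightness and contrast changes.
    gray = img.mean(dim=1, keepdim=True) * 255
    k = patch * patch
    weights = torch.eye(k).view(k, 1, patch, patch)
    neighbors = F.conv2d(gray, weights, padding=patch // 2)  # b,k,h,w
    diff = neighbors - gray
    return diff / torch.sqrt(0.81 + diff ** 2)  # soft threshold


def census_loss(img1, img2_warped):
    t1, t2 = census_transform(img1), census_transform(img2_warped)
    d = (t1 - t2) ** 2
    dist = d / (0.1 + d)  # per-neighbour soft Hamming distance
    return dist.mean()
```

In the full method this term replaces the plain brightness-constancy loss inside the occlusion-masked data term.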
6.3 How occlusion is detected
Based on the assumption that for a non-occluded pixel, the forward flow should be opposite to the backward flow at the corresponding position in the other frame, i.e. their sum should be close to zero.
With this criterion we obtain masks for the forward-occluded and backward-occluded regions.
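This check can be sketched directly: sample the backward flow at the forward-displaced positions and flag pixels where the two flows do not cancel. The thresholds α1 = 0.01, α2 = 0.5 are the values reported in the paper, but the helper below is my own sketch, not the authors' code:

```python
import torch
import torch.nn.functional as F


def occlusion_mask(flow_fw, flow_bw, alpha1=0.01, alpha2=0.5):
    # A pixel counts as occluded when the forward flow plus the backward
    # flow sampled at the forward-displaced position does not cancel out.
    b, _, h, w = flow_fw.shape
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xx, yy), 0).float().unsqueeze(0)  # 1,2,h,w
    coords = grid + flow_fw
    gx = 2 * coords[:, 0] / (w - 1) - 1
    gy = 2 * coords[:, 1] / (h - 1) - 1
    bw_warped = F.grid_sample(flow_bw, torch.stack((gx, gy), -1),
                              align_corners=True)
    sq_sum = (flow_fw ** 2 + bw_warped ** 2).sum(1)
    mismatch = ((flow_fw + bw_warped) ** 2).sum(1)
    occluded = mismatch > alpha1 * sq_sum + alpha2
    return (~occluded).float()  # 1 = non-occluded
```

Swapping the roles of the two flows gives the backward occlusion mask.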
6.4 Occlusion-aware loss
Pixel loss
The pixel loss is computed only over non-occluded regions; in addition, a regularizer is applied to the occlusion mask so the network does not simply learn to enlarge the occluded regions.
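In code form, a masked data term plus the mask regularizer might look like this; the non-occlusion mask is just an input tensor here, and `lambda_p` and the robust-loss epsilon are assumed values, not the paper's:

```python
import torch


def robust_l1(x, eps=1e-3):
    # Charbonnier-style robust penalty.
    return torch.sqrt(x ** 2 + eps ** 2)


def occlusion_aware_photometric(img1, img2_warped, noc_mask, lambda_p=0.1):
    # Photometric error counted only where noc_mask == 1 (non-occluded),
    # plus a constant penalty per occluded pixel so the network cannot
    # trivially mark everything as occluded. lambda_p is an assumed weight.
    diff = robust_l1(img1 - img2_warped).mean(dim=1)  # b,h,w
    data = (diff * noc_mask).sum() / noc_mask.sum().clamp(min=1)
    penalty = lambda_p * (1 - noc_mask).mean()
    return data + penalty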
Smoothness loss
Second-order smoothness.
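First-order smoothness prefers piecewise-constant flow, while the second-order version penalizes second differences, so linearly varying flow (e.g. from rotation or zoom) is not penalized. A minimal sketch:

```python
import torch


def second_order_smoothness(flow):
    # Penalize second differences (curvature) in x and y; a linear ramp
    # of flow values incurs zero cost, unlike first-order smoothness.
    d2x = flow[:, :, :, :-2] - 2 * flow[:, :, :, 1:-1] + flow[:, :, :, 2:]
    d2y = flow[:, :, :-2, :] - 2 * flow[:, :, 1:-1, :] + flow[:, :, 2:, :]
    return d2x.abs().mean() + d2y.abs().mean()
```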
Consistency loss
Forward-backward consistency of the two flows: pixels where the flows are inconsistent, or differ greatly, are judged to be occluded.
Final loss
7. 8-base flow: Motion Basis Learning for Unsupervised Deep Homography Estimation with Subspace Projection
7.1 Principle
A typical warp-prediction network predicts a set of corresponding points; this paper instead predicts a homography flow.
A homography flow naturally comes with more constraints than a general flow.
Three features:
- homography flow
- an LRR block to reduce the rank of the features
- a feature identity loss to stabilize the optimization process
7.2 The 8 base flows
How these 8 orthogonal base flows are obtained is described in the original paper.
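One way to construct such a basis, consistent with the paper's description (the code itself is my sketch, not the authors'): perturb each of the 8 homography parameters around the identity, record the induced flow field, and orthonormalize the flattened fields with a QR decomposition. The network then only has to predict 8 coefficients whose weighted sum gives the homography flow.

```python
import torch


def homography_basis_flows(h, w):
    # Perturb each of the 8 homography parameters around the identity,
    # record the induced flow field, and QR-orthonormalize the columns.
    yy, xx = torch.meshgrid(torch.arange(h).float(), torch.arange(w).float(),
                            indexing="ij")
    ones = torch.ones_like(xx)
    flows = []
    for i in range(8):
        p = torch.zeros(8)
        p[i] = 1e-3  # small perturbation of parameter i
        H = torch.eye(3) + torch.cat((p, torch.zeros(1))).view(3, 3)
        pts = torch.stack((xx, yy, ones), -1) @ H.t()  # h,w,3
        warped = pts[..., :2] / pts[..., 2:3]          # projective divide
        flows.append((warped - torch.stack((xx, yy), -1)).reshape(-1))
    basis, _ = torch.linalg.qr(torch.stack(flows, -1))  # (2hw, 8)
    return basis  # orthonormal columns


def compose_flow(basis, weights):
    # weights: the 8 coefficients the network predicts.
    return (basis @ weights).view(-1)  # flattened homography flow
```

Restricting the prediction to this 8-dimensional subspace is exactly what enforces "homography-ness" on the output flow.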
7.3 Network
7.4 Warp-equivariant feature extractor
f is a small feature-extraction network that is equivariant to warping, i.e. warping the image and then extracting features matches extracting features and then warping.
7.5 Homography estimator with LRR blocks
This network is basically a ResNet-34, but with LRR blocks introduced.
7.6 Summary
The authors provide the dataset, code, and a pretrained model, which makes experimental verification convenient.
If you retrain from scratch and want results close to the authors', some hyperparameters still need tuning.