Study content
1. Paper
2. PyTorch official tutorials
Study period
2024.02.23 to 2024.02.29
Study notes
FreeReg
Abstract
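- Intermediate features of a depth-to-image diffusion model (diffusion features) are semantically consistent between images and point clouds.
- Coarse matching built on these features is accurate and robust.
- Performance improves markedly (inlier ratio, registration recall).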
Introduction
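- Point clouds and images represent different kinds of information, so forcing their features to align works poorly (low robustness, limited generalization).
- Pretrained large-scale models (depth-to-image diffusion models, monocular depth estimators) are used to unify the modalities of images and point clouds, which avoids training on the I2P task.
- A depth map is generated from the point cloud, and nearest-neighbor selection between the semantic features of the depth map and those of the image establishes robust, sparse, coarse correspondences.
- Although the point cloud recovered from the image is distorted relative to the input point cloud, its local geometry still provides useful information.
depth-to-image diffusion models: project the point cloud onto the image plane to obtain a depth map, then generate an image conditioned on that depth map
monocular depth estimators: predict per-pixel depth for the input image and recover a point cloud from the resulting depth map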
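The "point cloud to depth map" step above can be sketched with a pinhole camera model. This is only an illustration, not FreeReg's code: the function name `project_to_depth`, the intrinsics matrix `K`, and the assumption that the points are already expressed in the camera frame are all introduced here.

```python
import numpy as np

def project_to_depth(points, K, height, width):
    """Project camera-frame points (N, 3) to a sparse depth map (height, width).

    Pixels that receive no point are left at 0. Hypothetical helper for illustration.
    """
    z = points[:, 2]
    front = z > 0  # keep only points in front of the camera
    x, y, z = points[front, 0], points[front, 1], z[front]
    # Pinhole projection: u = fx * x / z + cx, v = fy * y / z + cy
    u = np.round(K[0, 0] * x / z + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * y / z + K[1, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    # Z-buffer: when several points hit one pixel, keep the nearest one
    depth = np.full((height, width), np.inf)
    np.minimum.at(depth, (v[inside], u[inside]), z[inside])
    depth[np.isinf(depth)] = 0.0
    return depth

# Toy intrinsics and points (assumed values)
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 2.0],    # hits the principal point, far
                   [0.0, 0.0, 1.0],    # same pixel, nearer: this one wins
                   [0.1, 0.0, 1.0],    # ten pixels to the right
                   [0.0, 0.0, -1.0]])  # behind the camera: dropped
depth = project_to_depth(points, K, height=48, width=64)
print(depth[24, 32], depth[24, 42])
```

A real point cloud yields a sparse depth map, which is why the notes later mention refining the depth map before it is passed to ControlNet.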
Related Work
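- Many earlier methods rely on cross-modal datasets for training and generalize poorly; FreeReg needs no training and generalizes well (both indoor and outdoor datasets).
- Other methods treat I2P as an optimization problem and regress the pose by progressively aligning keypoints. They are limited to specific scenes and depend heavily on an accurate initial pose to avoid local minima. FreeReg does not need an accurate initialization; it builds correspondences by matching features, so it can handle large pose changes.
- Intermediate features extracted from diffusion models (diffusion features) perform well; FreeReg extracts diffusion features from both RGB images and depth maps.
- The SoTA metric depth estimator Zoe-Depth is used to recover a metric-scale point cloud corresponding to the RGB image.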
Method
1
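- Stable Diffusion (SD; a diffusion model with a forward process that adds noise and a reverse process that removes it) and ControlNet (CN; adds an encoder that processes the depth map and uses the extracted depth features to guide SD's reverse process) are used to extract cross-modal features for matching.
- An image generated from the depth map can look quite different from the input image, so the intermediate features, not the generated image, are used for cross-modal matching.
- Image branch: add noise to the image and pass it through SD. Depth branch: refine the depth map and feed it to CN, which guides SD's reverse process (whose input is pure Gaussian noise).
- Features from different layers are candidates; the paper picks the layers (0, 4, 6) with the highest similarity and combines them (concatenation plus further processing) into the final diffusion feature.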
2
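- Zoe-Depth predicts per-pixel depth $D_z$ for the input image, and a point cloud is recovered from this depth map.
- FCGF extracts per-point features, which serve as the geometric features of the corresponding pixels in image I.
- Because the estimated depth is distorted, the geometric features produce many outliers.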
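Recovering a point cloud from a predicted depth map is the inverse of the pinhole projection. The sketch below assumes known intrinsics `K` and a dense depth array; `unproject_depth` is a hypothetical helper, and the depth values are toy numbers, not Zoe-Depth output.

```python
import numpy as np

def unproject_depth(depth, K):
    """Back-project a depth map (H, W) to camera-frame points.

    Returns (points, pixels): points is (M, 3), pixels is (M, 2) as (u, v),
    so each 3D point keeps a link to the pixel it came from.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]  # v = row index, u = column index
    z = depth
    # Invert u = fx * x / z + cx and v = fy * y / z + cy
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    pixels = np.stack([u, v], axis=-1).reshape(-1, 2)
    valid = points[:, 2] > 0  # skip pixels without a depth value
    return points[valid], pixels[valid]

# Toy intrinsics and depth map (assumed values)
K = np.array([[2.0, 0.0, 3.0],
              [0.0, 2.0, 2.0],
              [0.0, 0.0, 1.0]])
depth_map = np.zeros((4, 6))
depth_map[0, 0] = 1.0
depth_map[2, 3] = 2.0  # pixel at the principal point
cloud, pix = unproject_depth(depth_map, K)
print(cloud)
print(pix)
```

Keeping the pixel index next to each point is what lets a per-point feature (FCGF in the notes) be assigned back to its pixel.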
3
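- Fuse the diffusion feature and the geometric feature into a fused feature.
- Apply mutual nearest-neighbor selection between the features of the depth map and those of the RGB image to obtain pixel-to-point correspondences.
- For estimating the SE(3) transformation, the two options (PnP and Kabsch) each have trade-offs.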
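The fusion and mutual nearest-neighbor steps can be sketched as follows. The notes only say that fusion is a concatenation, so the weight `w`, the L2 normalization, and the function names `fuse` and `mutual_nearest_neighbors` are assumptions made for this illustration.

```python
import numpy as np

def l2_normalize(x):
    """Scale every row to unit length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def fuse(diffusion_feat, geometric_feat, w=0.5):
    """Weighted concatenation of two per-pixel feature sets (assumed form)."""
    return np.concatenate(
        [w * l2_normalize(diffusion_feat), (1.0 - w) * l2_normalize(geometric_feat)],
        axis=1,
    )

def mutual_nearest_neighbors(feat_a, feat_b):
    """Return index pairs (i, j) where i and j are each other's best match."""
    sim = feat_a @ feat_b.T          # similarity between every a and every b
    nn_ab = sim.argmax(axis=1)       # best b for each a
    nn_ba = sim.argmax(axis=0)       # best a for each b
    idx_a = np.arange(feat_a.shape[0])
    keep = nn_ba[nn_ab] == idx_a     # a -> b -> a returns to the start
    return np.stack([idx_a[keep], nn_ab[keep]], axis=1)

# Toy features: the third row of feat_a is a weaker match and gets filtered out
feat_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
feat_b = np.array([[0.0, 1.0], [1.0, 0.0]])
matches = mutual_nearest_neighbors(feat_a, feat_b)
print(matches)

fused = fuse(np.ones((2, 4)), np.ones((2, 3)))
print(fused.shape)
```

The mutual check discards one-sided matches, which keeps the correspondence set sparse before a pose solver such as PnP or Kabsch is applied.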
Experiments
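- Evaluated on three datasets: 3DMatch, ScanNet, and Kitti-DC
- Metrics: FMR (feature matching recall), IR (inlier ratio), RR (registration recall)
Disadvantages
- The diffusion feature and the geometric feature are fused into a fused feature, but the fusion is a crude concatenation, and both features suffer from distortion.
- No convincing reason is given for the choice of time step t.
- Runtime is fairly short, but the memory requirement seems large (unverified).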
PyTorch
Transforms
transform modifies the features; target_transform modifies the labels.
import torch
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda
ds = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
    target_transform=Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1))
)
out:
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz
Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz
Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz
Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw
Lambda Transforms
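Build a one-hot vector: first create a zero tensor of size 10 (the number of labels in our dataset), then call scatter_, which assigns value=1 at the index given by the label y.
target_transform = Lambda(lambda y: torch.zeros(
    10, dtype=torch.float).scatter_(dim=0, index=torch.tensor(y), value=1))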
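To see what the lambda produces, the same expression can be applied to a single label and compared with `torch.nn.functional.one_hot`. The helper name `to_one_hot` and the label value 3 are chosen here for illustration only.

```python
import torch
import torch.nn.functional as F

def to_one_hot(y, num_classes=10):
    """Same expression as the Lambda transform, wrapped in a named function."""
    return torch.zeros(num_classes, dtype=torch.float).scatter_(
        dim=0, index=torch.tensor(y), value=1
    )

encoded = to_one_hot(3)
print(encoded)

# F.one_hot returns an integer tensor, so cast it before comparing
reference = F.one_hot(torch.tensor(3), num_classes=10).float()
print(torch.equal(encoded, reference))
```

`F.one_hot` is the shorter option when an integer tensor is acceptable; the `scatter_` form yields a float tensor directly.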
Build the Neural Network
The torch.nn namespace provides all the building blocks needed to build your own neural network. Every module in PyTorch subclasses nn.Module.
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
Use an accelerator device when one is available:
device = (
"cuda"
if torch.cuda.is_available()
else "mps"
if torch.backends.mps.is_available()
else "cpu"
)
print(f"Using {device} device")
out:
Using cuda device
Define the network class and create an instance:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )
    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
model = NeuralNetwork().to(device)
print(model)
out:
NeuralNetwork(
(flatten): Flatten(start_dim=1, end_dim=-1)
(linear_relu_stack): Sequential(
(0): Linear(in_features=784, out_features=512, bias=True)
(1): ReLU()
(2): Linear(in_features=512, out_features=512, bias=True)
(3): ReLU()
(4): Linear(in_features=512, out_features=10, bias=True)
)
)
Calling the model:
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")
out:
Predicted class: tensor([7], device='cuda:0')
Use nn.Flatten to flatten each image into a 1-D vector (the batch dimension is kept):
input_image = torch.rand(3, 28, 28)
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())
out:
torch.Size([3, 784])
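As a quick check of what `nn.Flatten` does with its default arguments, the sketch below compares it with an explicit `reshape` that keeps dimension 0. The tensor name `batch` is introduced here for illustration.

```python
import torch
from torch import nn

batch = torch.rand(3, 28, 28)  # a minibatch of three 28x28 images

flat = nn.Flatten()(batch)  # defaults: start_dim=1, end_dim=-1
manual = batch.reshape(batch.size(0), -1)  # keep dim 0, merge the rest

print(flat.size())
print(torch.equal(flat, manual))
```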
Use nn.Linear to apply a linear transformation to the input with the layer's stored weights and biases:
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())
out:
torch.Size([3, 20])
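The linear layer computes `x @ W.T + b`, where `W` has shape (out_features, in_features). The sketch below reproduces the layer's output by hand; the variable names are chosen here and the input is random.

```python
import torch
from torch import nn

torch.manual_seed(0)  # only to make the run repeatable
layer = nn.Linear(in_features=28 * 28, out_features=20)
x = torch.rand(3, 28 * 28)

out = layer(x)
manual = x @ layer.weight.T + layer.bias  # same computation, written out

print(layer.weight.shape, layer.bias.shape)
print(torch.allclose(out, manual, atol=1e-5))
```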
ReLU activation:
print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")
out:
Before ReLU: tensor([[ 0.4158, -0.0130, -0.1144, 0.3960, 0.1476, -0.0690, -0.0269, 0.2690,
0.1353, 0.1975, 0.4484, 0.0753, 0.4455, 0.5321, -0.1692, 0.4504,
0.2476, -0.1787, -0.2754, 0.2462],
[ 0.2326, 0.0623, -0.2984, 0.2878, 0.2767, -0.5434, -0.5051, 0.4339,
0.0302, 0.1634, 0.5649, -0.0055, 0.2025, 0.4473, -0.2333, 0.6611,
0.1883, -0.1250, 0.0820, 0.2778],
[ 0.3325, 0.2654, 0.1091, 0.0651, 0.3425, -0.3880, -0.0152, 0.2298,
0.3872, 0.0342, 0.8503, 0.0937, 0.1796, 0.5007, -0.1897, 0.4030,
0.1189, -0.3237, 0.2048, 0.4343]], grad_fn=<AddmmBackward0>)
After ReLU: tensor([[0.4158, 0.0000, 0.0000, 0.3960, 0.1476, 0.0000, 0.0000, 0.2690, 0.1353,
0.1975, 0.4484, 0.0753, 0.4455, 0.5321, 0.0000, 0.4504, 0.2476, 0.0000,
0.0000, 0.2462],
[0.2326, 0.0623, 0.0000, 0.2878, 0.2767, 0.0000, 0.0000, 0.4339, 0.0302,
0.1634, 0.5649, 0.0000, 0.2025, 0.4473, 0.0000, 0.6611, 0.1883, 0.0000,
0.0820, 0.2778],
[0.3325, 0.2654, 0.1091, 0.0651, 0.3425, 0.0000, 0.0000, 0.2298, 0.3872,
0.0342, 0.8503, 0.0937, 0.1796, 0.5007, 0.0000, 0.4030, 0.1189, 0.0000,
0.2048, 0.4343]], grad_fn=<ReluBackward0>)
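The output above shows the pattern: negative entries become 0 and the rest pass through unchanged. A small sketch using the first four values of the printed tensor, checking that ReLU matches `clamp(min=0)`:

```python
import torch
from torch import nn

x = torch.tensor([[0.4158, -0.0130, -0.1144, 0.3960]])

relu_out = nn.ReLU()(x)
clamped = x.clamp(min=0)  # elementwise max(x, 0)

print(relu_out)
print(torch.equal(relu_out, clamped))
```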
nn.Sequential is an ordered container of modules; here it wraps the layers defined above:
seq_modules = nn.Sequential(
flatten,
layer1,
nn.ReLU(),
nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)
nn.Softmax turns the logits into predicted class probabilities:
softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
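With `dim=1`, softmax normalizes each row so that its entries lie in [0, 1] and sum to 1, and the largest logit keeps the largest probability. The logit values below are made up for illustration.

```python
import torch
from torch import nn

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 2.5, 1.0]])

probs = nn.Softmax(dim=1)(logits)
# Same result from the definition: exp(z_i) / sum_j exp(z_j) per row
manual = logits.exp() / logits.exp().sum(dim=1, keepdim=True)

print(probs.sum(dim=1))     # each row sums to 1 (up to rounding)
print(probs.argmax(dim=1))  # predicted class per row
```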
Iterate over every parameter and print its size and a preview of its values:
print(f"Model structure: {model}\n\n")
for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")
out:
Model structure: NeuralNetwork(
(flatten): Flatten(start_dim=1, end_dim=-1)
(linear_relu_stack): Sequential(
(0): Linear(in_features=784, out_features=512, bias=True)
(1): ReLU()
(2): Linear(in_features=512, out_features=512, bias=True)
(3): ReLU()
(4): Linear(in_features=512, out_features=10, bias=True)
)
)
Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[ 0.0273, 0.0296, -0.0084, ..., -0.0142, 0.0093, 0.0135],
[-0.0188, -0.0354, 0.0187, ..., -0.0106, -0.0001, 0.0115]],
device='cuda:0', grad_fn=<SliceBackward0>)
Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([-0.0155, -0.0327], device='cuda:0', grad_fn=<SliceBackward0>)
Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[ 0.0116, 0.0293, -0.0280, ..., 0.0334, -0.0078, 0.0298],
[ 0.0095, 0.0038, 0.0009, ..., -0.0365, -0.0011, -0.0221]],
device='cuda:0', grad_fn=<SliceBackward0>)
Layer: linear_relu_stack.2.bias | Size: torch.Size([512]) | Values : tensor([ 0.0148, -0.0256], device='cuda:0', grad_fn=<SliceBackward0>)
Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512]) | Values : tensor([[-0.0147, -0.0229, 0.0180, ..., -0.0013, 0.0177, 0.0070],
[-0.0202, -0.0417, -0.0279, ..., -0.0441, 0.0185, -0.0268]],
device='cuda:0', grad_fn=<SliceBackward0>)
Layer: linear_relu_stack.4.bias | Size: torch.Size([10]) | Values : tensor([ 0.0070, -0.0411], device='cuda:0', grad_fn=<SliceBackward0>)
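The sizes printed above also give the total number of trainable parameters. The sketch below rebuilds the same layer stack as a plain `nn.Sequential` (an equivalent stand-in for the `NeuralNetwork` class, used here so the block is self-contained) and counts them.

```python
import torch
from torch import nn

# Same layers as NeuralNetwork, written as one Sequential for brevity
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

total = sum(p.numel() for p in net.parameters())
# weights + biases per Linear layer, from the sizes printed above
expected = (784 * 512 + 512) + (512 * 512 + 512) + (512 * 10 + 10)

print(total, expected)
```

Only the three Linear layers contribute; Flatten and ReLU have no parameters.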