Weekly Report - 240301

Study Content

1. Paper reading

2. PyTorch official website tutorials


Study Period

2024.02.23 — 2024.02.29


Study Notes

FreeReg

Abstract

  • Intermediate features, i.e. diffusion features, extracted from a depth-to-image diffusion model are semantically consistent between images and point clouds
  • Coarse matching is accurate and highly robust
  • The performance gains are significant (inlier ratio, registration recall)

Introduction

  • Point clouds and images represent different information; forcing them into direct alignment performs poorly (weak robustness, limited generalization)
  • Pre-trained large-scale models (depth-to-image diffusion models, monocular depth estimators) unify the modalities of images and point clouds, avoiding any training on the I2P task
  • A depth map is generated from the point cloud, and its semantic features are matched to the image's semantic features by nearest-neighbor selection, establishing robust, sparse, coarse-grained matches
  • Although the point cloud generated from the image is distorted relative to the input point cloud, the local geometry still provides useful information

depth-to-image diffusion models: project the point cloud onto the image plane to obtain a depth map, then convert that depth map into an image

monocular depth estimators: generate per-pixel depth for the input image and recover a point cloud from the depth map (a back-projection sketch follows)
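
As a reference for the depth-to-point-cloud step, below is a minimal back-projection sketch assuming a pinhole camera model; the intrinsics (fx, fy, cx, cy) and the random depth map are illustrative placeholders, not values from the paper.

import torch

def depth_to_points(depth, fx, fy, cx, cy):
    # Back-project an (H, W) depth map to an (N, 3) point cloud
    # using an assumed pinhole camera model.
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = torch.stack((x, y, depth), dim=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels without valid depth

# Hypothetical usage with made-up intrinsics:
pts = depth_to_points(torch.rand(480, 640) * 5.0,
                      fx=525.0, fy=525.0, cx=319.5, cy=239.5)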

Related Work

  • Many earlier methods rely on cross-modal datasets for training and generalize poorly; FreeReg requires no training and generalizes well (across both indoor and outdoor datasets)
  • Other methods treat I2P as an optimization problem, regressing the pose by progressively aligning keypoints to escape local minima; they are therefore restricted to specific scenes and depend heavily on a correct initial pose. FreeReg needs no strictly accurate initialization and handles large pose changes by matching features to build correspondences
  • Intermediate features extracted by diffusion models (diffusion features) perform strongly; FreeReg extracts diffusion features from both RGB images and depth maps
  • The SoTA metric depth estimator Zoe-Depth is used to recover a point cloud in the same metric scale as the RGB image

Method

1. Diffusion feature extraction

  • Stable Diffusion (a diffusion model with a forward process that adds noise and a reverse process that removes it) is combined with ControlNet (which adds an encoder for depth maps and uses the extracted depth features to guide SD's reverse process) to extract cross-modal features for matching
  • Images generated from depth maps are depth-inconsistent with the input image, so intermediate features are used for cross-modal feature matching instead
  • Noise is added to the image, which is passed through SD; the depth map is refined and fed to CN, which guides SD's reverse process (whose input is pure Gaussian noise)
  • Features from several layers are available for selection; the paper picks the layers with the highest similarity (0, 4, 6), and the selected features are concatenated into the final diffusion feature (see the sketch after this list)
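
A minimal sketch of the hook-based feature extraction idea, assuming the Hugging Face diffusers library and a depth ControlNet checkpoint; the checkpoint names and block indices are assumptions, not the paper's exact configuration.

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Assumed checkpoints; FreeReg's exact setup may differ.
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet
)

features = {}

def save_output(name):
    def hook(module, inputs, output):
        features[name] = output  # cache this block's activation
    return hook

# Hook a few UNet decoder (up) blocks; the indices are illustrative.
for i in [0, 1, 2]:
    pipe.unet.up_blocks[i].register_forward_hook(save_output(f"up_{i}"))

# After one denoising step, `features` holds intermediate activations that
# can be upsampled to a common resolution and concatenated into a single
# diffusion feature map.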

2. Geometric feature extraction

  • Zoe-Depth takes the image as input, produces per-pixel depth D_z, and a point cloud is recovered from the generated depth map
  • FCGF extracts per-point features, which serve as the geometric features of their corresponding pixels in image I
  • Because of distortion in the depth estimate, the geometric features contain many outliers

3. Feature fusion and pose estimation

  • The diffusion feature and the geometric feature are fused into a fuse feature
  • Mutual nearest-neighbor selection between the depth-map features and the RGB features yields pixel-to-point correspondences (see the sketch after this list)
  • When estimating the SE(3) pose, the two solvers (PnP and Kabsch) each have strengths and weaknesses
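
Below is a minimal sketch of mutual nearest-neighbor matching and a Kabsch-style rigid fit in plain PyTorch, assuming L2 feature distances; it illustrates the two standard techniques rather than FreeReg's exact implementation.

import torch

def mutual_nn(feat_a, feat_b):
    # Index pairs (i, j) where a[i] and b[j] are each other's
    # nearest neighbor in feature space (L2 distance).
    dist = torch.cdist(feat_a, feat_b)  # (Na, Nb) pairwise distances
    nn_ab = dist.argmin(dim=1)          # nearest b for each a
    nn_ba = dist.argmin(dim=0)          # nearest a for each b
    idx_a = torch.arange(feat_a.shape[0])
    mutual = nn_ba[nn_ab] == idx_a      # round trip returns to a
    return idx_a[mutual], nn_ab[mutual]

def kabsch(src, dst):
    # Least-squares rigid transform (R, t) aligning src to dst.
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = torch.linalg.svd(src_c.T @ dst_c)
    S = torch.eye(3)
    S[2, 2] = torch.sign(torch.linalg.det(Vt.T @ U.T))  # avoid reflections
    R = Vt.T @ S @ U.T
    t = dst.mean(0) - R @ src.mean(0)
    return R, t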

Experiments

  • Tested on three datasets: 3DMatch, ScanNet, and Kitti-DC
  • Evaluation metrics: Feature Matching Recall (FMR), Inlier Ratio (IR), and Registration Recall (RR); a sketch of IR follows this list
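
To make the metrics concrete, here is a minimal sketch of Inlier Ratio under an assumed distance threshold; the threshold value and tensor shapes are illustrative.

import torch

def inlier_ratio(pts_src, pts_dst, R, t, tau=0.1):
    # Fraction of putative correspondences whose residual under the
    # ground-truth pose (R, t) falls below the threshold tau (meters).
    residual = (pts_src @ R.T + t - pts_dst).norm(dim=1)
    return (residual < tau).float().mean()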

Disadvantages

  • The diffusion feature and geometric feature are fused into the fuse feature, but the fusion is just a naive concatenation (a sketch follows this list), and both features suffer from distortion
  • No good justification is given for the choice of the time step t
  • Runtime is fairly short, but a large amount of memory is required (to be verified)
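
For reference, a minimal sketch of the kind of weighted concatenation fusion the first point refers to; the L2 normalization and the weight w are assumptions, not the paper's exact formula.

import torch
import torch.nn.functional as F

def fuse(diff_feat, geo_feat, w=0.5):
    # Normalize each per-pixel feature and concatenate along the
    # channel dimension, scaling the geometric part by w.
    return torch.cat([F.normalize(diff_feat, dim=-1),
                      w * F.normalize(geo_feat, dim=-1)], dim=-1)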

PyTorch

Transforms

transform is used to modify the features, and target_transform is used to modify the labels

import torch
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

ds = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
    target_transform=Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1))
)

out:

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz

100%|##########| 26421880/26421880 [00:01<00:00, 18219876.54it/s]
Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz

100%|##########| 29515/29515 [00:00<00:00, 327656.58it/s]
Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz

100%|##########| 4422102/4422102 [00:00<00:00, 6090776.97it/s]
Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz

100%|##########| 5148/5148 [00:00<00:00, 32765215.47it/s]
Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Lambda Transforms

Creating a one-hot vector: first create a zero tensor of size 10 (the number of labels in our dataset), then call scatter_, which assigns value=1 at the index given by the label y.

target_transform = Lambda(lambda y: torch.zeros(
    10, dtype=torch.float).scatter_(dim=0, index=torch.tensor(y), value=1))
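
For example, applying this transform to a hypothetical label 3 yields the one-hot vector:

print(target_transform(3))

out:

tensor([0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])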

Build the Neural Network

The torch.nn namespace provides all the building blocks you need to build your own neural network. Every module in PyTorch subclasses nn.Module

import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

Run the model on an accelerator device if one is available:

device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

out:

Using cuda device

Subclass nn.Module and instantiate the network's layers:

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
    
model = NeuralNetwork().to(device)
print(model)

out:

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)

Calling and using the model:

X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

out:

Predicted class: tensor([7], device='cuda:0')

Use nn.Flatten to flatten each image into a 1-D array (the minibatch dimension at dim=0 is kept):

input_image = torch.rand(3, 28, 28)  # a minibatch of 3 dummy 28x28 images
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

out:

torch.Size([3, 784])

Use nn.Linear to apply a linear transformation to the input data:

layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

out:

torch.Size([3, 20])

The ReLU activation function:

print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")

out:

Before ReLU: tensor([[ 0.4158, -0.0130, -0.1144,  0.3960,  0.1476, -0.0690, -0.0269,  0.2690,
          0.1353,  0.1975,  0.4484,  0.0753,  0.4455,  0.5321, -0.1692,  0.4504,
          0.2476, -0.1787, -0.2754,  0.2462],
        [ 0.2326,  0.0623, -0.2984,  0.2878,  0.2767, -0.5434, -0.5051,  0.4339,
          0.0302,  0.1634,  0.5649, -0.0055,  0.2025,  0.4473, -0.2333,  0.6611,
          0.1883, -0.1250,  0.0820,  0.2778],
        [ 0.3325,  0.2654,  0.1091,  0.0651,  0.3425, -0.3880, -0.0152,  0.2298,
          0.3872,  0.0342,  0.8503,  0.0937,  0.1796,  0.5007, -0.1897,  0.4030,
          0.1189, -0.3237,  0.2048,  0.4343]], grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.4158, 0.0000, 0.0000, 0.3960, 0.1476, 0.0000, 0.0000, 0.2690, 0.1353,
         0.1975, 0.4484, 0.0753, 0.4455, 0.5321, 0.0000, 0.4504, 0.2476, 0.0000,
         0.0000, 0.2462],
        [0.2326, 0.0623, 0.0000, 0.2878, 0.2767, 0.0000, 0.0000, 0.4339, 0.0302,
         0.1634, 0.5649, 0.0000, 0.2025, 0.4473, 0.0000, 0.6611, 0.1883, 0.0000,
         0.0820, 0.2778],
        [0.3325, 0.2654, 0.1091, 0.0651, 0.3425, 0.0000, 0.0000, 0.2298, 0.3872,
         0.0342, 0.8503, 0.0937, 0.1796, 0.5007, 0.0000, 0.4030, 0.1189, 0.0000,
         0.2048, 0.4343]], grad_fn=<ReluBackward0>)

nn.Sequential is used to chain the layers in order:

seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)

nn.Softmax converts the logits into predicted probabilities for each class:

softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)

Iterate over the model's parameters and print each one's size and a preview of its values:

print(f"Model structure: {model}\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

out:

Model structure: NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[ 0.0273,  0.0296, -0.0084,  ..., -0.0142,  0.0093,  0.0135],
        [-0.0188, -0.0354,  0.0187,  ..., -0.0106, -0.0001,  0.0115]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([-0.0155, -0.0327], device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[ 0.0116,  0.0293, -0.0280,  ...,  0.0334, -0.0078,  0.0298],
        [ 0.0095,  0.0038,  0.0009,  ..., -0.0365, -0.0011, -0.0221]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.bias | Size: torch.Size([512]) | Values : tensor([ 0.0148, -0.0256], device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512]) | Values : tensor([[-0.0147, -0.0229,  0.0180,  ..., -0.0013,  0.0177,  0.0070],
        [-0.0202, -0.0417, -0.0279,  ..., -0.0441,  0.0185, -0.0268]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.4.bias | Size: torch.Size([10]) | Values : tensor([ 0.0070, -0.0411], device='cuda:0', grad_fn=<SliceBackward0>)
