Weekly Report - 240301

Gypsophila_01

Tags: pytorch, 3d, deep learning

Article link: https://blog.csdn.net/Gypsophila_01/article/details/136384633

Study Content

1. Papers

2. PyTorch official tutorials

Study Period

2024.02.23 - 2024.02.29

Study Notes

FreeReg

Abstract

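Intermediate features (diffusion features) extracted from depth-to-image diffusion models are semantically consistent between images and point clouds
Coarse matching is accurate and highly robust
Results improve significantly (inlier ratio, registration recall)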

Introduction

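Point clouds and images represent different information, so forcing them into alignment gives poor results (low robustness, limited generalization)
Pretrained large-scale models (depth-to-image diffusion models, monocular depth estimators) are used to unify the image and point cloud modalities, which avoids training on the I2P task
Depth maps are generated from the point cloud, and the semantic features of the depth map are matched to the semantic features of the image by nearest-neighbor selection, giving robust, sparse, coarse correspondences
Although the point cloud generated from the image is distorted relative to the input point cloud, its local geometry still provides useful information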

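depth-to-image diffusion models: the point cloud is projected onto the image plane to obtain a depth map, and the depth map is then converted into an image

monocular depth estimators: generate per-pixel depth for the input image, then recover a point cloud from the depth map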
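The two geometric conversions described above (point cloud to depth map, and depth map back to point cloud) can be sketched with a pinhole camera model. This is only an illustrative sketch, not FreeReg's implementation: the function names, the intrinsic matrix `K`, and the assumption that the points are already in the camera frame are all hypothetical.

```python
import numpy as np

def project_to_depth(points, K, height, width):
    """Project camera-frame 3D points (N, 3) to a sparse depth map (H, W)."""
    pts = points[points[:, 2] > 0]                 # keep points in front of the camera
    uvw = pts @ K.T                                # rows are [u*z, v*z, z]
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth = np.full((height, width), np.inf)
    # z-buffer: when several points hit the same pixel, keep the nearest one
    np.minimum.at(depth, (v[inside], u[inside]), pts[inside, 2])
    depth[np.isinf(depth)] = 0.0                   # 0 marks pixels with no depth
    return depth

def backproject_depth(depth, K):
    """Recover camera-frame 3D points (M, 3) from the valid pixels of a depth map."""
    v, u = np.nonzero(depth > 0)
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)

# Hypothetical intrinsics: fx = fy = 100, principal point (32, 24)
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 2.0],
                   [0.1, -0.05, 1.0],
                   [0.0, 0.0, 3.0]])               # hidden behind the first point
depth = project_to_depth(points, K, height=48, width=64)
recovered = backproject_depth(depth, K)
print(recovered)
```

The third point lands on the same pixel as the first but is farther away, so it is dropped by the z-buffer and only two points are recovered.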

Related Work

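Many earlier methods rely on cross-modal datasets for training and generalize poorly, whereas FreeReg needs no training and generalizes well (to both indoor and outdoor datasets)
Other methods treat I2P as an optimization problem and regress the pose by progressively aligning keypoints. To avoid local minima they depend heavily on an accurate initial pose, which limits them to specific scenes. FreeReg does not need a strictly accurate initialization; it handles large pose changes by matching features to build correspondences
Intermediate features extracted from diffusion models (diffusion features) perform well; FreeReg extracts diffusion features from both RGB images and depth maps
The SoTA metric depth estimator Zoe-Depth is used to recover a point cloud, at the same metric scale, corresponding to the RGB image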

Method

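Stable Diffusion (SD, a diffusion model with a forward process that adds noise and a reverse process that removes it) and ControlNet (CN, which adds an encoder for depth maps and uses the extracted depth features to guide SD's reverse process) are used to extract cross-modal features for feature matching
An image generated from the depth map is not consistent in appearance with the input image, so intermediate features are used for cross-modal matching instead
For the image: add noise and pass it through SD. For the depth map: refine it and use CN to guide SD's reverse process (whose input is pure Gaussian noise)
Features from different layers are available to choose from; the paper selects the ones with the highest similarity (layers 0, 4, 6) to obtain the diffusion features, then concatenates and post-processes them into the final diffusion feature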
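Combining several intermediate feature maps into one per-pixel descriptor can be sketched as follows. This is a sketch under stated assumptions, not FreeReg's code: random tensors stand in for the SD activations, the channel counts and resolutions are made up, and `build_diffusion_feature` is a hypothetical helper. The only step taken from the notes is the concatenation.

```python
import torch
import torch.nn.functional as F

def build_diffusion_feature(layer_feats, out_size):
    """Upsample each feature map to out_size, L2-normalize it, and concatenate."""
    resized = []
    for feat in layer_feats:                       # each feat: (B, C_i, H_i, W_i)
        feat = F.interpolate(feat, size=out_size, mode="bilinear",
                             align_corners=False)
        resized.append(F.normalize(feat, dim=1))   # unit norm over the channel axis
    return torch.cat(resized, dim=1)               # (B, sum(C_i), H, W)

# Random stand-ins for three intermediate activations (shapes are hypothetical)
layer_feats = [
    torch.randn(1, 32, 8, 8),
    torch.randn(1, 16, 16, 16),
    torch.randn(1, 8, 32, 32),
]
diffusion_feature = build_diffusion_feature(layer_feats, out_size=(32, 32))
print(diffusion_feature.shape)  # torch.Size([1, 56, 32, 32])
```

Normalizing each map before concatenation keeps one layer with large activations from dominating the similarity computation.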

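Zoe-Depth takes the input image and generates a per-pixel depth $D_{z}$; a point cloud is then recovered from this depth map
FCGF extracts a feature for each point, which serves as the geometric feature of the corresponding pixel in image I
Because the estimated depth is distorted, the geometric features contain many outliers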

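The diffusion feature and the geometric feature are fused into a fused feature
Mutual nearest-neighbor selection over the depth-map features and the RGB features yields pixel-to-point correspondences
For estimating the SE(3) transformation, the two methods (PnP and Kabsch) each have advantages and drawbacks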
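The three steps above (fusion, mutual nearest-neighbor matching, and pose estimation with Kabsch) can be sketched in NumPy. Treat this as a rough illustration: the fusion weight `w` and all function names are assumptions, and the Kabsch routine here is the textbook SVD solution for 3D-to-3D correspondences rather than anything specific to FreeReg.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def fuse(diff_feat, geo_feat, w=0.5):
    """Weighted concatenation of two normalized feature sets (w is hypothetical)."""
    return np.concatenate([w * l2_normalize(diff_feat),
                           (1.0 - w) * l2_normalize(geo_feat)], axis=1)

def mutual_nearest_neighbors(feat_a, feat_b):
    """Return index pairs (i, j) where i and j are each other's nearest neighbor."""
    sim = feat_a @ feat_b.T                # similarity matrix, (Na, Nb)
    nn_ab = sim.argmax(axis=1)             # best match in b for every a
    nn_ba = sim.argmax(axis=0)             # best match in a for every b
    idx_a = np.arange(len(feat_a))
    keep = nn_ba[nn_ab] == idx_a           # keep only mutual matches
    return np.stack([idx_a[keep], nn_ab[keep]], axis=1)

def kabsch(src, dst):
    """Least-squares R, t such that R @ src_i + t is close to dst_i."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)    # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T)) # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

rng = np.random.default_rng(0)
# Matching demo: feat_b is a shuffled copy of feat_a
feat_a = fuse(rng.normal(size=(5, 8)), rng.normal(size=(5, 4)))
perm = np.array([2, 0, 4, 1, 3])
matches = mutual_nearest_neighbors(feat_a, feat_a[perm])

# Pose demo: recover a known rotation about z and a translation
c, s = np.cos(0.7), np.sin(0.7)
R_gt = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_gt = np.array([0.5, -1.0, 2.0])
src = rng.normal(size=(20, 3))
R_est, t_est = kabsch(src, src @ R_gt.T + t_gt)
print(matches)
print(np.round(R_est, 3), np.round(t_est, 3))
```

Kabsch needs 3D points on both sides, so it applies when depth is available for the image pixels; PnP works directly on 2D pixel to 3D point correspondences.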

Experiments

Tested on three datasets: 3DMatch, ScanNet, Kitti-DC
Evaluation metrics: FMR (Feature Matching Recall), IR (Inlier Ratio), RR (Registration Recall)
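As a reminder of how these three metrics relate, here is a minimal sketch using their usual definitions in the registration literature. The thresholds and function names are placeholders chosen for illustration, not the values used in the paper.

```python
import numpy as np

def inlier_ratio(residuals, tau):
    """IR: fraction of correspondences whose error under the GT pose is below tau."""
    return float(np.mean(np.asarray(residuals) < tau))

def feature_matching_recall(inlier_ratios, min_ir):
    """FMR: fraction of image/point-cloud pairs whose IR exceeds min_ir."""
    return float(np.mean(np.asarray(inlier_ratios) > min_ir))

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Rotation error in degrees and translation error in the units of t."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    rot_deg = float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    trans = float(np.linalg.norm(t_est - t_gt))
    return rot_deg, trans

def registration_recall(errors, max_rot_deg, max_trans):
    """RR: fraction of pairs whose pose error is within both thresholds."""
    ok = [(r <= max_rot_deg) and (t <= max_trans) for r, t in errors]
    return float(np.mean(ok))

# Placeholder numbers, purely for illustration
ir = inlier_ratio([0.1, 0.2, 0.5, 1.0], tau=0.3)
fmr = feature_matching_recall([0.5, 0.02, 0.3], min_ir=0.05)
R_90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
err = pose_errors(np.eye(3), np.zeros(3), R_90, np.array([0.0, 3.0, 4.0]))
rr = registration_recall([(5.0, 0.1), (30.0, 0.1), (5.0, 2.0), (1.0, 0.2)],
                         max_rot_deg=20.0, max_trans=0.5)
print(ir, fmr, err, rr)
```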

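Disadvantages

The diffusion feature and the geometric feature are fused into a fused feature, but the fusion is a crude concatenation formula, and both features are affected by distortion
No convincing reason is given for the choice of time step t
Runtime is fairly short, but a large amount of memory/storage space is required (not verified)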

PyTorch

Transforms

transform modifies the features; target_transform modifies the labels

import torch
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

ds = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
    target_transform=Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1))
)

out:

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz

Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz

Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz

Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz

Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Lambda Transforms

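Creating a one-hot vector: first create a zero tensor of size 10 (the number of labels in our dataset), then call scatter_, which assigns value=1 at the index given by the label y.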

target_transform = Lambda(lambda y: torch.zeros(
    10, dtype=torch.float).scatter_(dim=0, index=torch.tensor(y), value=1))
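To see what the lambda produces, the sketch below applies the same `scatter_` call to a single label and compares it with `torch.nn.functional.one_hot`. The helper name `one_hot_scatter` and the example label 3 are only for illustration.

```python
import torch
import torch.nn.functional as F

def one_hot_scatter(y, num_classes=10):
    """Same operation as the Lambda transform: write 1 at index y of a zero vector."""
    return torch.zeros(num_classes, dtype=torch.float).scatter_(
        dim=0, index=torch.tensor(y), value=1)

encoded = one_hot_scatter(3)
reference = F.one_hot(torch.tensor(3), num_classes=10).float()
print(encoded)    # tensor([0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])
print(torch.equal(encoded, reference))  # True
```

`F.one_hot` returns an integer tensor, hence the `.float()` before comparing.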

Build the Neural Network

The torch.nn namespace provides all the building blocks needed to build your own neural network. Every module in PyTorch subclasses nn.Module

import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

Run on an accelerator device when one is available:

device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

out:

Using cuda device

Define the network class and instantiate it:

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
    
model = NeuralNetwork().to(device)
print(model)

out:

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)

Calling and using the model:

X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

out:

Predicted class: tensor([7], device='cuda:0')

Use nn.Flatten to flatten each 28x28 image into a 1-D array of 784 values (the batch dimension is kept)

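# a minibatch of 3 random 28x28 images to push through the layers
input_image = torch.rand(3, 28, 28)
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())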

out:

torch.Size([3, 784])
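With its default arguments (`start_dim=1`, `end_dim=-1`), `nn.Flatten` behaves like a reshape that keeps the first dimension. A quick check, using a random batch as a stand-in for real images:

```python
import torch
from torch import nn

batch = torch.rand(3, 28, 28)                 # stand-in minibatch
flat_module = nn.Flatten()(batch)             # flattens dims 1..-1
flat_manual = batch.reshape(batch.size(0), -1)
print(flat_module.shape)                      # torch.Size([3, 784])
print(torch.equal(flat_module, flat_manual))  # True
```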

Use nn.Linear to apply a linear transformation to the input using its stored weights and biases

layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

out:

torch.Size([3, 20])
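What `nn.Linear` computes is `x @ W.T + b`, where `W` has shape `(out_features, in_features)`. The sketch below checks this on a random input; the variable names are chosen for the example only.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(in_features=28 * 28, out_features=20)
x = torch.rand(3, 28 * 28)

out_module = layer(x)
out_manual = x @ layer.weight.T + layer.bias   # weight: (20, 784), bias: (20,)
print(layer.weight.shape, layer.bias.shape)
print(torch.allclose(out_module, out_manual, atol=1e-5))  # True
```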

The ReLU activation function:

print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")

out:

Before ReLU: tensor([[ 0.4158, -0.0130, -0.1144,  0.3960,  0.1476, -0.0690, -0.0269,  0.2690,
          0.1353,  0.1975,  0.4484,  0.0753,  0.4455,  0.5321, -0.1692,  0.4504,
          0.2476, -0.1787, -0.2754,  0.2462],
        [ 0.2326,  0.0623, -0.2984,  0.2878,  0.2767, -0.5434, -0.5051,  0.4339,
          0.0302,  0.1634,  0.5649, -0.0055,  0.2025,  0.4473, -0.2333,  0.6611,
          0.1883, -0.1250,  0.0820,  0.2778],
        [ 0.3325,  0.2654,  0.1091,  0.0651,  0.3425, -0.3880, -0.0152,  0.2298,
          0.3872,  0.0342,  0.8503,  0.0937,  0.1796,  0.5007, -0.1897,  0.4030,
          0.1189, -0.3237,  0.2048,  0.4343]], grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.4158, 0.0000, 0.0000, 0.3960, 0.1476, 0.0000, 0.0000, 0.2690, 0.1353,
         0.1975, 0.4484, 0.0753, 0.4455, 0.5321, 0.0000, 0.4504, 0.2476, 0.0000,
         0.0000, 0.2462],
        [0.2326, 0.0623, 0.0000, 0.2878, 0.2767, 0.0000, 0.0000, 0.4339, 0.0302,
         0.1634, 0.5649, 0.0000, 0.2025, 0.4473, 0.0000, 0.6611, 0.1883, 0.0000,
         0.0820, 0.2778],
        [0.3325, 0.2654, 0.1091, 0.0651, 0.3425, 0.0000, 0.0000, 0.2298, 0.3872,
         0.0342, 0.8503, 0.0937, 0.1796, 0.5007, 0.0000, 0.4030, 0.1189, 0.0000,
         0.2048, 0.4343]], grad_fn=<ReluBackward0>)
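As the output shows, ReLU replaces every negative entry with 0 and leaves the rest unchanged, i.e. it computes max(0, x) element-wise. A small check on hand-picked values:

```python
import torch
from torch import nn

x = torch.tensor([-1.0, 0.0, 2.5, -0.3])
relu_out = nn.ReLU()(x)
clamp_out = torch.clamp(x, min=0)     # same element-wise max(0, x)
print(relu_out)                       # tensor([0.0000, 0.0000, 2.5000, 0.0000])
print(torch.equal(relu_out, clamp_out))  # True
```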

nn.Sequential is an ordered container of modules; the data passes through the modules in the order they are defined:

seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)
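Calling an `nn.Sequential` is equivalent to calling its modules one after another. The sketch below rebuilds the same small stack (layer sizes copied from the snippet above, variable names illustrative) and compares the two ways of running it:

```python
import torch
from torch import nn

torch.manual_seed(0)
flatten = nn.Flatten()
layer1 = nn.Linear(28 * 28, 20)
relu = nn.ReLU()
layer2 = nn.Linear(20, 10)
seq = nn.Sequential(flatten, layer1, relu, layer2)

x = torch.rand(3, 28, 28)
out_seq = seq(x)                                  # container call
out_manual = layer2(relu(layer1(flatten(x))))     # explicit chain
print(out_seq.shape)                              # torch.Size([3, 10])
print(torch.allclose(out_seq, out_manual))        # True
```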

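nn.Softmax scales the logits to values in [0, 1] that represent the predicted probability of each class; dim is the dimension along which the values sum to 1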

softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
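A short check of these properties, using random logits as a stand-in for model output: each row of the softmax output sums to 1, it matches the formula exp(x) / sum(exp(x)), and the argmax is unchanged by softmax.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(3, 10)                      # stand-in for model output
probs = nn.Softmax(dim=1)(logits)
manual = torch.exp(logits) / torch.exp(logits).sum(dim=1, keepdim=True)

print(probs.sum(dim=1))                          # each row sums to 1
print(torch.allclose(probs, manual))             # True
print(torch.equal(probs.argmax(1), logits.argmax(1)))  # True
```

Because the argmax is preserved, the predicted class can be read from the logits directly; softmax is only needed when the probabilities themselves matter.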

Iterate over every parameter and print its size and a preview of its values:

print(f"Model structure: {model}\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

out:

Model structure: NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[ 0.0273,  0.0296, -0.0084,  ..., -0.0142,  0.0093,  0.0135],
        [-0.0188, -0.0354,  0.0187,  ..., -0.0106, -0.0001,  0.0115]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([-0.0155, -0.0327], device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[ 0.0116,  0.0293, -0.0280,  ...,  0.0334, -0.0078,  0.0298],
        [ 0.0095,  0.0038,  0.0009,  ..., -0.0365, -0.0011, -0.0221]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.bias | Size: torch.Size([512]) | Values : tensor([ 0.0148, -0.0256], device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512]) | Values : tensor([[-0.0147, -0.0229,  0.0180,  ..., -0.0013,  0.0177,  0.0070],
        [-0.0202, -0.0417, -0.0279,  ..., -0.0441,  0.0185, -0.0268]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.4.bias | Size: torch.Size([10]) | Values : tensor([ 0.0070, -0.0411], device='cuda:0', grad_fn=<SliceBackward0>)
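The sizes printed above can also be summed to get the total parameter count. The sketch below rebuilds the same layer stack as a plain `nn.Sequential` (so that it is self-contained) and counts the parameters:

```python
import torch
from torch import nn

# Same layers as NeuralNetwork, written as a plain Sequential for brevity
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)
per_tensor = {name: p.numel() for name, p in net.named_parameters()}
total = sum(per_tensor.values())
trainable = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(per_tensor)
print(total, trainable)  # 669706 669706
```

The total is 784*512 + 512 + 512*512 + 512 + 512*10 + 10 = 669706; Flatten and ReLU contribute no parameters.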