PointNet++论文介绍和代码实现

yufei-coder

于 2024-09-27 09:58:11 发布

阅读量734

点赞数 22

文章标签： python 计算机视觉深度学习 pytorch 图像处理神经网络人工智能

本文链接：https://blog.csdn.net/qq_45728381/article/details/142586008

版权

一、PointNet++论文详细介绍

1. 背景与动机

点云数据的重要性：在3D计算机视觉和图形学中，点云是一种常见的数据表示方式，广泛应用于3D扫描、自动驾驶、机器人导航等领域。
PointNet的局限性：PointNet是处理点云的开创性工作，但由于其直接对全局点集进行特征学习，无法有效捕捉局部特征，尤其是在处理具有复杂几何结构的场景时。

2. 主要贡献

引入层次化的特征学习架构：PointNet++通过构建层次化的神经网络架构，逐步捕捉从局部到全局的特征。
局部区域特征学习：利用度量空间中的距离信息，将点云划分为局部区域，在这些区域内应用PointNet，学习局部特征。
自适应区域大小：考虑到点云的非均匀采样问题，PointNet++引入了多尺度特征聚合和自适应采样策略。

3. 网络架构

整个网络主要由两种模块构成：Set Abstraction（SA）层和Feature Propagation（FP）层。

3.1 Set Abstraction（SA）层

目的：对点云进行下采样和特征提取，逐步构建层次化的特征表示。
步骤：
1. 采样（Sampling）：使用FPS（Farthest Point Sampling）算法，从输入点集选取代表性点，形成下一级的点集。
2. 分组（Grouping）：对于每个采样点，基于一定的半径或K近邻，在原始点集中找到其邻域点。
3. 特征提取（PointNet Layer）：在每个局部邻域内，使用PointNet对局部点集进行特征提取，生成局部特征。

3.2 Feature Propagation（FP）层

目的：在点云上进行特征上采样，将低分辨率的特征逐步传播回原始高分辨率的点集。
步骤：
1. 插值（Interpolation）：使用加权插值方法，将上一级的特征映射到更多的点上。
2. 特征融合：将插值得到的特征与对应级别的特征进行拼接，形成丰富的点特征表示。
3. MLP映射：通过多层感知机（MLP）对融合后的特征进行非线性变换。

4. 关键技术细节

多尺度特征聚合：为了解决点云密度不均的问题，PointNet++在SA层中采用多种尺度的邻域，提取不同尺度下的特征，然后进行融合。
自适应采样密度：在稀疏区域和密集区域，采用不同的采样策略，确保特征提取的有效性

5. 实验结果

分类任务：在ModelNet40数据集上，PointNet++取得了比PointNet更高的分类准确率。
分割任务：在ShapeNet和S3DIS等数据集上，PointNet++在语义分割和实例分割任务中表现出色。

6. 总结

PointNet++通过引入层次化的特征学习框架，克服了PointNet无法捕捉局部特征的缺陷，实现了对点云数据更深入和细致的理解。

二、使用Python和PyTorch实现PointNet++

下面将介绍如何使用Python和PyTorch从零开始实现PointNet++，包括关键模块的代码示例。

1. 环境准备

Python版本：建议使用Python 3.7或以上版本。
PyTorch版本：1.7或以上版本。
其他库：numpy、torchvision、h5py等

pip install torch torchvision numpy h5py

2. 关键模块实现

2.1 Farthest Point Sampling（FPS）

FPS用于从点云中选择具有代表性的点。

import torch

def farthest_point_sampling(xyz, npoint):
    """
    输入：
        xyz: 输入点云，形状为[B, N, 3]
        npoint: 采样点的数量
    输出：
        centroids: 采样点的索引，形状为[B, npoint]
    """
    device = xyz.device
    B, N, _ = xyz.shape
    centroids = torch.zeros(B, npoint, dtype=torch.long).to(device)
    distance = torch.ones(B, N).to(device) * 1e10
    batch_indices = torch.arange(B, dtype=torch.long).to(device)

    # 随机选择初始点
    farthest = torch.randint(0, N, (B,), dtype=torch.long).to(device)
    for i in range(npoint):
        centroids[:, i] = farthest
        centroid = xyz[batch_indices, farthest, :].view(B, 1, 3)
        dist = torch.sum((xyz - centroid) ** 2, -1)
        mask = dist < distance
        distance[mask] = dist[mask]
        farthest = torch.max(distance, -1)[1]
    return centroids

2.2 球形邻域查询（Ball Query）

在指定半径内查找每个采样点的邻域点。

def ball_query(radius, nsample, xyz, new_xyz):
    """
    输入：
        radius: 邻域半径
        nsample: 每个邻域的最大点数
        xyz: 所有点的坐标，形状为[B, N, 3]
        new_xyz: 采样点的坐标，形状为[B, npoint, 3]
    输出：
        group_idx: 邻域内点的索引，形状为[B, npoint, nsample]
    """
    device = xyz.device
    B, N, _ = xyz.shape
    _, S, _ = new_xyz.shape
    group_idx = torch.arange(N, dtype=torch.long).to(device).view(1, 1, N).repeat([B, S, 1])
    sqrdists = torch.sum((new_xyz.unsqueeze(2) - xyz.unsqueeze(1)) ** 2, -1)
    group_idx[sqrdists > radius ** 2] = N
    group_idx = group_idx.sort(dim=-1)[0][:, :, :nsample]
    group_first = group_idx[:, :, 0].view(B, S, 1).repeat([1, 1, nsample])
    mask = group_idx == N
    group_idx[mask] = group_first[mask]
    return group_idx

2.3 特征提取（PointNet Layer）

在每个局部邻域内，使用共享的MLP进行特征提取。

import torch.nn as nn

class PointNetSetAbstraction(nn.Module):
    def __init__(self, npoint, radius, nsample, in_channel, mlp):
        super(PointNetSetAbstraction, self).__init__()
        self.npoint = npoint
        self.radius = radius
        self.nsample = nsample
        
        layers = []
        last_channel = in_channel + 3  # 加上坐标维度
        for out_channel in mlp:
            layers.append(nn.Conv2d(last_channel, out_channel, 1))
            layers.append(nn.BatchNorm2d(out_channel))
            layers.append(nn.ReLU())
            last_channel = out_channel
        self.mlp = nn.Sequential(*layers)
        
    def forward(self, xyz, points):
        """
        输入：
            xyz: 原始点云坐标，形状为[B, N, 3]
            points: 输入特征，形状为[B, N, D]，如果没有特征则为None
        输出：
            new_xyz: 下采样点的坐标，形状为[B, npoint, 3]
            new_points: 下采样点的特征，形状为[B, npoint, mlp[-1]]
        """
        B, N, C = xyz.shape
        S = self.npoint
        # 1. 采样
        fps_idx = farthest_point_sampling(xyz, S)
        new_xyz = index_points(xyz, fps_idx)
        # 2. 分组
        idx = ball_query(self.radius, self.nsample, xyz, new_xyz)
        grouped_xyz = index_points(xyz, idx) - new_xyz.unsqueeze(-2)
        if points is not None:
            grouped_points = index_points(points, idx)
            grouped_points = torch.cat([grouped_xyz, grouped_points], dim=-1)
        else:
            grouped_points = grouped_xyz
        # 3. 特征提取
        grouped_points = grouped_points.permute(0, 3, 2, 1)  # [B, D+C, nsample, npoint]
        new_points = self.mlp(grouped_points)
        new_points = torch.max(new_points, 2)[0]
        new_points = new_points.permute(0, 2, 1)  # [B, npoint, mlp[-1]]
        return new_xyz, new_points

2.4 特征传播（Feature Propagation）

用于将特征上采样，恢复到高分辨率的点集。

class PointNetFeaturePropagation(nn.Module):
    def __init__(self, in_channel, mlp):
        super(PointNetFeaturePropagation, self).__init__()
        layers = []
        last_channel = in_channel
        for out_channel in mlp:
            layers.append(nn.Conv1d(last_channel, out_channel, 1))
            layers.append(nn.BatchNorm1d(out_channel))
            layers.append(nn.ReLU())
            last_channel = out_channel
        self.mlp = nn.Sequential(*layers)
        
    def forward(self, xyz1, xyz2, points1, points2):
        """
        输入：
            xyz1: 上一级的点坐标，形状为[B, N, 3]
            xyz2: 下一级的点坐标，形状为[B, S, 3]
            points1: 上一级的点特征，形状为[B, N, D1]
            points2: 下一级的点特征，形状为[B, S, D2]
        输出：
            new_points: 插值后的特征，形状为[B, N, mlp[-1]]
        """
        B, N, _ = xyz1.shape
        _, S, _ = xyz2.shape

        # 如果下一级没有特征，则直接插值
        if S == 1:
            interpolated_points = points2.repeat(1, N, 1)
        else:
            dists = square_distance(xyz1, xyz2)
            dists, idx = dists.sort(dim=-1)
            dists, idx = dists[:, :, :3], idx[:, :, :3]  # 取最近的3个点
            dist_recip = 1.0 / (dists + 1e-8)
            norm = torch.sum(dist_recip, dim=2, keepdim=True)
            weight = dist_recip / norm
            interpolated_points = torch.sum(index_points(points2, idx) * weight.unsqueeze(-1), dim=2)
        if points1 is not None:
            new_points = torch.cat([points1, interpolated_points], dim=-1)
        else:
            new_points = interpolated_points
        new_points = new_points.permute(0, 2, 1)
        new_points = self.mlp(new_points)
        new_points = new_points.permute(0, 2, 1)
        return new_points

2.5 完整的PointNet++网络

将上述模块组合，构建PointNet++模型。

class PointNetPlusPlus(nn.Module):
    def __init__(self, num_classes):
        super(PointNetPlusPlus, self).__init__()
        self.sa1 = PointNetSetAbstraction(npoint=1024, radius=0.1, nsample=32, in_channel=0, mlp=[32, 32, 64])
        self.sa2 = PointNetSetAbstraction(npoint=256, radius=0.2, nsample=32, in_channel=64, mlp=[64, 64, 128])
        self.sa3 = PointNetSetAbstraction(npoint=64, radius=0.4, nsample=32, in_channel=128, mlp=[128, 128, 256])
        self.sa4 = PointNetSetAbstraction(npoint=16, radius=0.8, nsample=32, in_channel=256, mlp=[256, 256, 512])
        
        self.fp4 = PointNetFeaturePropagation(in_channel=768, mlp=[256, 256])
        self.fp3 = PointNetFeaturePropagation(in_channel=384, mlp=[256, 256])
        self.fp2 = PointNetFeaturePropagation(in_channel=320, mlp=[256, 128])
        self.fp1 = PointNetFeaturePropagation(in_channel=128, mlp=[128, 128, 128])
        
        self.classifier = nn.Sequential(
            nn.Conv1d(128, 128, 1),
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Conv1d(128, num_classes, 1)
        )
        
    def forward(self, xyz):
        B, N, C = xyz.shape
        l0_points = None
        l0_xyz = xyz
        
        # Set Abstraction layers
        l1_xyz, l1_points = self.sa1(l0_xyz, l0_points)
        l2_xyz, l2_points = self.sa2(l1_xyz, l1_points)
        l3_xyz, l3_points = self.sa3(l2_xyz, l2_points)
        l4_xyz, l4_points = self.sa4(l3_xyz, l3_points)
        
        # Feature Propagation layers
        l3_points = self.fp4(l3_xyz, l4_xyz, l3_points, l4_points)
        l2_points = self.fp3(l2_xyz, l3_xyz, l2_points, l3_points)
        l1_points = self.fp2(l1_xyz, l2_xyz, l1_points, l2_points)
        l0_points = self.fp1(l0_xyz, l1_xyz, l0_points, l1_points)
        
        # Classification head
        x = l0_points.permute(0, 2, 1)  # [B, D, N]
        x = self.classifier(x)
        x = x.permute(0, 2, 1)  # [B, N, num_classes]
        return x

3. 模型训练与测试

3.1 数据集准备

可以使用公开的点云数据集，如ModelNet40或自制数据集。需要将点云数据处理为适合输入网络的格式。

3.2 损失函数与优化器

import torch.optim as optim

# 定义交叉熵损失函数
criterion = nn.CrossEntropyLoss()
# 定义优化器
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)

3.3 训练循环

for epoch in range(num_epochs):
    model.train()
    for data, label in train_loader:
        data = data.to(device)
        label = label.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output.view(-1, num_classes), label.view(-1))
        loss.backward()
        optimizer.step()
    # 验证模型
    model.eval()
    # ...（验证代码）