【代码阅读】PointNet++中的FPS的CUDA实现

最新推荐文章于 2024-06-08 00:43:05 发布

麒麒哈尔

最新推荐文章于 2024-06-08 00:43:05 发布

阅读量3k

点赞数 5

分类专栏：代码阅读文章标签： PointNet++ 点云 CUDA

本文链接：https://blog.csdn.net/wqwqqwqw1231/article/details/107451089

版权

文章目录

Pytorch的接口
cpp
cu

本文为PointNet++ CUDA代码阅读系列的第二部分，其他详见：
（一）PointNet++代码梳理
（二）PointNet++中的FPS的CUDA实现
（三）PointNet++中ball query的CUDA实现
（四）PointNet++中的Three_nn的CUDA实现

之前只是使用PointNet++，也没有想过是怎么实现的。之前学了一下cuda编程，这里就来详解一个示例。

本文使用的代码是PointRCNN中PointNet++的实现。

Pytorch的接口

FPS的实现是用c和cu实现的，所以先看一下pytorch中的定义。在pointnet2/pointnet2_utils.py中

class FurthestPointSampling(Function):
    @staticmethod
    def forward(ctx, xyz: torch.Tensor, npoint: int) -> torch.Tensor:
        """
        Uses iterative furthest point sampling to select a set of npoint features that have the largest
        minimum distance
        :param ctx:
        :param xyz: (B, N, 3) where N > npoint
        :param npoint: int, number of features in the sampled set
        :return:
             output: (B, npoint) tensor containing the set
        """
        assert xyz.is_contiguous()

        B, N, _ = xyz.size()
        output = torch.cuda.IntTensor(B, npoint)
        temp = torch.cuda.FloatTensor(B, N).fill_(1e10)

        pointnet2.furthest_point_sampling_wrapper(B, N, npoint, xyz, temp, output)
        return output

    @staticmethod
    def backward(xyz, a=None):
        return None, None


furthest_point_sample = FurthestPointSampling.apply

核心函数是furthest_point_sampling_wrapper，这个使用c++写成的。具体怎么链接到cpp，以及这个怎么再变成一个pytorch兼容的函数，具体可见我的另外一篇博客。

cpp

代码在pointnet2/src/sampling.cpp中

int furthest_point_sampling_wrapper(int b, int n, int m, 
    at::Tensor points_tensor, at::Tensor temp_tensor, at::Tensor idx_tensor) {
   

    const float *points = points_tensor.data<float>();
    float *temp = temp_tensor.data<float>();
    int *idx = idx_tensor.data<int>();

    cudaStream_t stream = THCState_getCurrentStream(state);
    furthest_point_sampling_kernel_launcher(b, n, m, points, temp, idx, stream);
    return 1;
}

可以看到，在cpp中，接收由python函数传入的变量，然后调用cu中的kernel_launcher函数

cu

kernel_launcher函数做的也不多，首先确定开的线程的数量

void furthest_point_sampling_kernel_launcher(int b, int n, int m, 
    const float *dataset, float *temp, int *idxs, cudaStream_t stream) {
   
    // dataset: (B, N, 3)
    // tmp: (B, N)
    // output:
    //      idx: (B, M)

    cudaError_t err;

最低0.47元/天解锁文章

麒麒哈尔

关注

5
点赞
踩
15

收藏

觉得还不错? 一键收藏
2
评论
【代码阅读】PointNet++中的FPS的CUDA实现

文章目录Pytorch的接口cppcu之前只是使用PointNet++，也没有想过是怎么实现的。之前学了一下cuda编程，这里就来详解一个示例。本文使用的代码是PointRCNN中PointNet++的实现。Pytorch的接口FPS的实现是用c和cu实现的，所以先看一下pytorch中的定义。在pointnet2/pointnet2_utils.py中class FurthestPointSampling(Function): @staticmethod def forward(
复制链接

扫一扫

专栏目录