[Paper Series] Waffle Iron

Guycynnnnn

Article link: https://blog.csdn.net/Guycynnnnn/article/details/129115596

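Paper: https://arxiv.org/pdf/2301.10100v1.pdf

Code: https://github.com/valeoai/WaffleIron

Results:

Method: points + projections (MLP + 2D convolutions)

(1) Per-point features are extracted with an MLP; see the "High-level description" part of the paper.

(2) Projection: the points are projected in turn onto the three planes (x, y), (x, z), and (y, z).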

we propose to repeatedly project along each main axis. Concretely, we sequentially project on planes (x, y), (x, z) and (y, z) at layer l = 1, l = 2, and l = 3, respectively, and repeat this sequence until layer l = L.
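
The schedule quoted above can be written down in a few lines. This is only a sketch of the cycling rule; `PLANES` and `plane_for_layer` are hypothetical names introduced here and do not come from the paper or the repository.

```python
# Hypothetical helper: which plane a given layer projects onto,
# assuming the (x, y) -> (x, z) -> (y, z) cycle quoted from the paper.
PLANES = [("x", "y"), ("x", "z"), ("y", "z")]


def plane_for_layer(layer):
    """Return the projection plane for a 1-indexed layer number."""
    if layer < 1:
        raise ValueError("layers are numbered from 1")
    return PLANES[(layer - 1) % len(PLANES)]


# With L = 6 the three-plane sequence is traversed exactly twice
schedule = [plane_for_layer(l) for l in range(1, 7)]
print(schedule)
```

When L is a multiple of 3, every plane is visited the same number of times, which is presumably why the paper ties L to the length of the cycle.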

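So there are two parameters to tune:

① L, the number of layers, chosen as a multiple of 3. The authors' reason is that the points are projected onto (x, y), (x, z), and (y, z) in turn. However, the paper does not explain what benefit projecting onto all three planes brings, and in the code I only found the (x, y) projection, i.e. processing of the BEV map. (Note that the WaffleIron class shown later picks grids_shape[d % len(grids_shape)] for layer d, so which planes are actually used depends on how grids_shape is configured.)

② The resolution of the 2D grid (grid_shape).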
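
To make the role of the 2D grid resolution concrete, here is a minimal sketch of how a 3D point could be mapped to a flattened cell index on one plane. The function `cell_index`, the scene bounds `LO`/`HI`, the grid sizes, and the choice to clamp out-of-range points are all assumptions made for illustration; the repository may quantize points differently.

```python
import math


def cell_index(point, plane, grid_shape, lo, hi):
    """Flattened 2D cell index of a 3D point projected onto `plane`.

    point: (x, y, z); plane: the two axis indices that are kept,
    e.g. (0, 1) for the (x, y) plane; grid_shape: (rows, cols);
    lo / hi: assumed per-axis bounds of the scene.
    """
    idx = []
    for axis, n_cells in zip(plane, grid_shape):
        # Normalise the coordinate to [0, 1] and quantise it
        t = (point[axis] - lo[axis]) / (hi[axis] - lo[axis])
        cell = int(math.floor(t * n_cells))
        # Clamp points on or beyond the boundary (an assumption)
        idx.append(min(max(cell, 0), n_cells - 1))
    row, col = idx
    return row * grid_shape[1] + col


# Hypothetical scene bounds and grid resolutions
LO, HI = (-50.0, -50.0, -5.0), (50.0, 50.0, 3.0)
bev = cell_index((0.0, 0.0, 0.0), (0, 1), (256, 256), LO, HI)
side = cell_index((0.0, 0.0, -1.0), (0, 2), (256, 32), LO, HI)
print(bev, side)  # 32896 4112
```

A finer grid separates nearby points into different cells but makes the 2D feature map larger, so the resolution trades spatial detail against memory and compute.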

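Embedding layer: Embedding (x: B×C_in×N; neighbors: B×K×N; output: B×C_out×N)

BN (BatchNorm1d): normalizes the input point features

Point Embedding (Conv1d): embeds each point on its own

Neighbors Embedding (Conv2d): embeds the features of each point's neighbors, taken relative to the center point, and max-pools over the neighbors

Finally, the point embedding and the neighbor embedding are concatenated along the channel dimension and merged by a last Conv1d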

import torch
import torch.nn as nn


class Embedding(nn.Module):
    def __init__(self, channels_in, channels_out):
        super().__init__()

        # Normalize inputs
        self.norm = nn.BatchNorm1d(channels_in)

        # Point Embedding
        self.conv1 = nn.Conv1d(channels_in, channels_out, 1)

        # Neighborhood embedding
        self.conv2 = nn.Sequential(
            nn.BatchNorm2d(channels_in),
            nn.Conv2d(channels_in, channels_out, 1, bias=False),
            nn.BatchNorm2d(channels_out),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels_out, channels_out, 1, bias=False),
        )

        # Merge point and neighborhood embeddings
        self.final = nn.Conv1d(2 * channels_out, channels_out, 1, bias=True, padding=0)

    def forward(self, x, neighbors):
        """x: B x C_in x N. neighbors: B x K x N. Output: B x C_out x N"""
        # Normalize input
        x = self.norm(x)

        # Point embedding
        point_emb = self.conv1(x)

        # Neighborhood embedding
        gather = []
        # Gather neighbors around each center point
        for ind_nn in range(
            1, neighbors.shape[1]
        ):  # Remove first neighbors which is the center point
            temp = neighbors[:, ind_nn : ind_nn + 1, :].expand(-1, x.shape[1], -1)
            gather.append(torch.gather(x, 2, temp).unsqueeze(-1))
        # Relative coordinates
        neigh_emb = torch.cat(gather, -1) - x.unsqueeze(-1)  # Size: B x C x N x (K - 1)
        # Embedding
        neigh_emb = self.conv2(neigh_emb).max(-1)[0]

        # Merge both embeddings
        return self.final(torch.cat((point_emb, neigh_emb), dim=1))
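
The neighbor-gathering loop in `forward` is the least obvious step, so here it is in isolation on a toy input. The tensors below are made up for illustration; only the indexing pattern is taken from the class above.

```python
import torch

# Toy input: B=1 sample, C=2 channels, N=4 points
x = torch.tensor([[[0.0, 1.0, 2.0, 3.0],
                   [10.0, 11.0, 12.0, 13.0]]])        # B x C x N
# K=3 neighbour indices per point; row 0 is the point itself
neighbors = torch.tensor([[[0, 1, 2, 3],
                           [1, 0, 3, 2],
                           [2, 3, 0, 1]]])             # B x K x N

gathered = []
for k in range(1, neighbors.shape[1]):  # skip k=0, the center point
    # Repeat the index over channels so it matches x's shape
    idx = neighbors[:, k:k + 1, :].expand(-1, x.shape[1], -1)   # B x C x N
    gathered.append(torch.gather(x, 2, idx).unsqueeze(-1))      # B x C x N x 1

# Neighbour features relative to the center point
rel = torch.cat(gathered, -1) - x.unsqueeze(-1)  # B x C x N x (K - 1)
print(rel.shape)         # torch.Size([1, 2, 4, 2])
print(rel[0, 0].tolist())  # [[1.0, 2.0], [-1.0, 2.0], [1.0, -2.0], [-1.0, -2.0]]
```

Because the center point is skipped, the last dimension has K - 1 entries, and the subsequent `max(-1)` pools over those neighbors.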

Backbone:

Channel mixing (one block per layer, looping over depth)

> BN

> MLP: Conv1d + ReLU + Conv1d

> scale: Conv1d(groups = channels)

> token + scale(mlp(norm(token)))
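
The channel-mixing recipe above can be sketched as a small module. `ChannelMixSketch` is a hypothetical name, and details such as keeping the hidden width equal to `channels` and dropping the bias in the scale layer are assumptions; the repository's `ChannelMix` may differ.

```python
import torch
import torch.nn as nn


class ChannelMixSketch(nn.Module):
    """Residual per-point MLP: tokens + scale(mlp(norm(tokens)))."""

    def __init__(self, channels):
        super().__init__()
        self.norm = nn.BatchNorm1d(channels)
        # Kernel size 1: each point is processed independently
        self.mlp = nn.Sequential(
            nn.Conv1d(channels, channels, 1),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels, channels, 1),
        )
        # groups=channels makes this one learnable scale per channel
        self.scale = nn.Conv1d(channels, channels, 1, bias=False, groups=channels)

    def forward(self, tokens):
        """tokens: B x C x N. Output: B x C x N"""
        return tokens + self.scale(self.mlp(self.norm(tokens)))


tokens = torch.randn(2, 16, 100)
out = ChannelMixSketch(16)(tokens)
print(out.shape)  # torch.Size([2, 16, 100])
```

Since every convolution has kernel size 1, this block mixes information across channels only; mixing across points is left to the spatial block.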

Spatial mixing (looping over each BEV grid in grids_shape)

> BN

> ffn: Conv2d + ReLU + Conv2d

> scale

> token + scale(ffn(BN(token)))
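
A rough sketch of the spatial-mixing recipe follows. It assumes that point features are averaged per 2D cell before the convolutions and copied back to the points afterwards; `SpatialMixSketch`, the 3×3 kernels, and the dense scatter/gather projection are illustrative choices, not the repository's implementation.

```python
import torch
import torch.nn as nn


class SpatialMixSketch(nn.Module):
    """Residual 2D block: tokens + scale(ffn(BN(tokens))) computed on a grid."""

    def __init__(self, channels, grid_shape):
        super().__init__()
        self.H, self.W = grid_shape
        self.norm = nn.BatchNorm1d(channels)
        self.ffn = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.scale = nn.Conv1d(channels, channels, 1, bias=False, groups=channels)

    def forward(self, tokens, cell_ind):
        """tokens: B x C x N. cell_ind: B x N flattened cell index per point."""
        B, C, N = tokens.shape
        idx = cell_ind.unsqueeze(1).expand(-1, C, -1)            # B x C x N
        x = self.norm(tokens)
        # Point -> grid: average the features of the points in each cell
        grid = x.new_zeros(B, C, self.H * self.W).scatter_add(2, idx, x)
        count = x.new_zeros(B, 1, self.H * self.W).scatter_add(
            2, cell_ind.unsqueeze(1), x.new_ones(B, 1, N)
        )
        grid = grid / count.clamp(min=1)
        # 2D convolutions mix information between neighbouring cells
        grid = self.ffn(grid.reshape(B, C, self.H, self.W)).reshape(B, C, -1)
        # Grid -> point: each point reads back the feature of its cell
        return tokens + self.scale(torch.gather(grid, 2, idx))


tokens = torch.randn(2, 8, 50)
cell_ind = torch.randint(0, 4 * 4, (2, 50))
out = SpatialMixSketch(8, (4, 4))(tokens, cell_ind)
print(out.shape)  # torch.Size([2, 8, 50])
```

The input and output stay at point resolution; the 2D grid only exists inside the block, so different layers can use different planes and resolutions.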

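import numpy as np
import torch
import torch.nn as nn

# ChannelMix, SpatialMix and build_proj_matrix are not shown in this post;
# they are assumed to be defined alongside this class in the repository.


class WaffleIron(nn.Module):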
    def __init__(self, channels, depth, grids_shape):
        super().__init__()
        self.grids_shape = grids_shape
        self.channel_mix = nn.ModuleList([ChannelMix(channels) for _ in range(depth)])
        self.spatial_mix = nn.ModuleList(
            [
                SpatialMix(channels, grids_shape[d % len(grids_shape)])
                for d in range(depth)
            ]
        )

    def forward(self, tokens, cell_ind, occupied_cell):

        # Build projection matrices
        batch_size, num_points = tokens.shape[0], tokens.shape[-1]
        point_ind = (
            torch.arange(num_points, device=tokens.device)
            .unsqueeze(0)
            .expand(batch_size, -1)
            .reshape(1, -1)
        )
        batch_ind = (
            torch.arange(batch_size, device=tokens.device)
            .unsqueeze(1)
            .expand(-1, num_points)
            .reshape(1, -1)
        )
        non_zeros_ind = []
        for i in range(cell_ind.shape[1]):
            # One (batch, cell, point) index triplet per point, for grid i
            non_zeros_ind.append(
                torch.cat((batch_ind, cell_ind[:, i].reshape(1, -1), point_ind), dim=0)
            )
        # One projection matrix per 2D grid
        sp_mat = [
            build_proj_matrix(ind, occupied_cell, batch_size, np.prod(sh))
            for ind, sh in zip(non_zeros_ind, self.grids_shape)
        ]

        # Actual backbone
        for d, (smix, cmix) in enumerate(zip(self.spatial_mix, self.channel_mix)):
            tokens = smix(tokens, sp_mat[d % len(sp_mat)])
            tokens = cmix(tokens)

        return tokens
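
The body of `build_proj_matrix` is not shown in this post. As an illustration of what a point-to-cell projection matrix does, here is a dense, single-sample sketch; the names `inflate` and `flatten` and the averaging rule are assumptions, and the repository presumably uses batched sparse matrices instead.

```python
import torch
import torch.nn.functional as F

cell_ind = torch.tensor([0, 2, 2, 3, 0, 0])  # N=6 points, one cell index each
num_cells = 4

# inflate: N x num_cells one-hot matrix (point i sits in cell cell_ind[i])
inflate = F.one_hot(cell_ind, num_cells).float()
# flatten: num_cells x N, normalised so that each cell averages its points
counts = inflate.sum(dim=0).clamp(min=1)      # points per cell
flatten = (inflate / counts).t()

feats = torch.tensor([[1.0, 3.0, 5.0, 7.0, 2.0, 3.0]]).t()  # N x C with C=1
cells = flatten @ feats   # num_cells x C, per-cell means: 2, 0, 4, 7 (cell 1 is empty)
back = inflate @ cells    # N x C, each point receives the mean of its cell
print(cells.squeeze(1), back.squeeze(1))
```

The (batch, cell, point) triplets collected in `non_zeros_ind` are exactly the positions of the non-zero entries of such a matrix, which is why a sparse representation is a natural fit for large grids.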