Paper: https://arxiv.org/pdf/2301.10100v1.pdf
Code: https://github.com/valeoai/WaffleIron
Results:
![](https://i-blog.csdnimg.cn/blog_migrate/a9d891112b34f676a4218f234effb106.png)
![](https://i-blog.csdnimg.cn/blog_migrate/2fb730c2fecabea12ed68f9d45259f9f.png)
Method: points + projection (MLP + 2D convolutions)
![](https://i-blog.csdnimg.cn/blog_migrate/266a7897246d6e58c06bec2e74497b2a.png)
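(1) Point features are extracted with an MLP; see the "High-level description" part of the paper.
(2) Projection: the points are projected onto the three planes (x, y), (x, z) and (y, z). Quoting the paper:
"we propose to repeatedly project along each main axis. Concretely, we sequentially project on planes (x, y), (x, z) and (y, z) at layer l = 1, l = 2, and l = 3, respectively, and repeat this sequence until layer l = L."
So there are two parameters to tune: ① the depth L, which should be a multiple of 3. The authors' reason is that the points are projected onto (x, y), (x, z) and (y, z) in turn, but they do not explain what benefit projecting onto these three planes brings, and in the code I only found the (x, y) projection, i.e. the processing of the BEV map. ② the resolution of the 2D grid (grid_shape).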
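To make the schedule in the quote concrete, here is a minimal plain-Python sketch of how a layer index could be mapped to a projection plane, and how a 3D point could be quantized into a cell of that plane. The names `plane_for_layer` and `cell_index`, the field-of-view bounds and the grid sizes are all assumptions made for illustration; none of them come from the WaffleIron repository.

```python
# Hypothetical sketch, not the repository's code: cycle through the three
# projection planes and quantize one point into a 2D grid cell.

PLANES = [(0, 1), (0, 2), (1, 2)]  # axis pairs for (x, y), (x, z), (y, z)


def plane_for_layer(layer):
    """Return the axis pair used at a 1-based layer index."""
    return PLANES[(layer - 1) % len(PLANES)]


def cell_index(point, axes, fov_min, fov_max, grid_shape):
    """Flattened index of the 2D cell containing `point`, or None if outside."""
    idx = []
    for axis, size in zip(axes, grid_shape):
        # Normalize the coordinate to [0, 1) over the assumed field of view
        t = (point[axis] - fov_min[axis]) / (fov_max[axis] - fov_min[axis])
        if not 0.0 <= t < 1.0:
            return None
        idx.append(int(t * size))
    # Row-major flattening of the 2D index
    return idx[0] * grid_shape[1] + idx[1]


# Assumed field of view and one grid resolution per plane
fov_min, fov_max = (-64.0, -64.0, -4.0), (64.0, 64.0, 4.0)
grid_shapes = [(128, 128), (128, 8), (128, 8)]
point = (10.5, -5.5, 1.0)

for layer in range(1, 7):
    axes = plane_for_layer(layer)
    shape = grid_shapes[(layer - 1) % len(grid_shapes)]
    print(layer, axes, cell_index(point, axes, fov_min, fov_max, shape))
```

With L = 6 the loop visits each plane exactly twice, which is one way to read the "multiple of 3" recommendation: every plane gets the same number of layers.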
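Embedding layer (x: B×C_in×N; neighbors: B×K×N; output: B×C_out×N)
BN (batch normalization): normalizes the input point features
Point embedding (Conv1d): embeds each point individually
Neighbors embedding (Conv2d): embeds the neighboring points around each point
Finally, the point embedding and the neighbor embedding are concatenated and merged by a last Conv1d.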
```python
import torch
import torch.nn as nn


class Embedding(nn.Module):
    def __init__(self, channels_in, channels_out):
        super().__init__()

        # Normalize inputs
        self.norm = nn.BatchNorm1d(channels_in)

        # Point Embedding
        self.conv1 = nn.Conv1d(channels_in, channels_out, 1)

        # Neighborhood embedding
        self.conv2 = nn.Sequential(
            nn.BatchNorm2d(channels_in),
            nn.Conv2d(channels_in, channels_out, 1, bias=False),
            nn.BatchNorm2d(channels_out),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels_out, channels_out, 1, bias=False),
        )

        # Merge point and neighborhood embeddings
        self.final = nn.Conv1d(2 * channels_out, channels_out, 1, bias=True, padding=0)

    def forward(self, x, neighbors):
        """x: B x C_in x N. neighbors: B x K x N. Output: B x C_out x N"""
        # Normalize input
        x = self.norm(x)

        # Point embedding
        point_emb = self.conv1(x)

        # Neighborhood embedding
        gather = []
        # Gather neighbors around each center point
        for ind_nn in range(
            1, neighbors.shape[1]
        ):  # Remove first neighbors which is the center point
            temp = neighbors[:, ind_nn : ind_nn + 1, :].expand(-1, x.shape[1], -1)
            gather.append(torch.gather(x, 2, temp).unsqueeze(-1))
        # Relative coordinates
        neigh_emb = torch.cat(gather, -1) - x.unsqueeze(-1)  # Size: (B x C x N) x K
        # Embedding
        neigh_emb = self.conv2(neigh_emb).max(-1)[0]

        # Merge both embeddings
        return self.final(torch.cat((point_emb, neigh_emb), dim=1))
```
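The neighborhood branch of `forward` can be read as three steps: gather the features of each point's neighbors, subtract the center point's own features, then max-pool over the neighbors. The plain-Python sketch below shows only that data flow for a single scalar feature per point. It deliberately leaves out the learned `conv2` layers (in the real code the max is taken after `conv2`), it stores neighbors per point rather than in the B×K×N layout, and the function name `relative_max` is hypothetical.

```python
# Simplified illustration of "relative features + max over neighbors".
# Assumption: one scalar feature per point, no batch dimension, no learned layers.


def relative_max(features, neighbors):
    """features: list of N floats.
    neighbors: list of N index lists; neighbors[n][0] is the point itself
    and is skipped, mirroring the loop that starts at index 1 above."""
    pooled = []
    for n, neigh in enumerate(neighbors):
        # Feature of each neighbor relative to the center point
        rel = [features[j] - features[n] for j in neigh[1:]]
        # Max-pool over the neighbors
        pooled.append(max(rel))
    return pooled


features = [1.0, 4.0, 2.0]
neighbors = [[0, 1, 2], [1, 0, 2], [2, 1, 0]]
print(relative_max(features, neighbors))
```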
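Backbone (each layer applies spatial mixing first, then channel mixing):
Channel mixing (one block per layer, i.e. looping over depth)
> BN
> MLP: Conv1d + ReLU + Conv1d
> scale: Conv1d(groups = channels)
> token + scale(mlp(norm(token)))
Spatial mixing (one block per layer, cycling through the BEV grids in grids_shape)
> BN
> ffn: Conv2d + ReLU + Conv2d
> scale
> token + scale(ffn(BN(token)))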
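As a rough illustration of the channel-mixing steps listed above, the block could be written along the following lines. This is a sketch reconstructed from the list, not a copy of the repository's `ChannelMix`: the class name `ChannelMixSketch`, the hidden width (kept equal to `channels`) and the bias settings are assumptions.

```python
import torch
import torch.nn as nn


class ChannelMixSketch(nn.Module):
    """Hypothetical residual channel-mixing block: token + scale(mlp(norm(token)))."""

    def __init__(self, channels):
        super().__init__()
        # BN over the channel dimension of B x C x N tokens
        self.norm = nn.BatchNorm1d(channels)
        # Point-wise MLP: Conv1d + ReLU + Conv1d with kernel size 1
        self.mlp = nn.Sequential(
            nn.Conv1d(channels, channels, 1),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels, channels, 1),
        )
        # Per-channel learnable scale, written as a depthwise 1x1 convolution
        self.scale = nn.Conv1d(channels, channels, 1, bias=False, groups=channels)

    def forward(self, tokens):
        """tokens: B x C x N. Output: B x C x N."""
        return tokens + self.scale(self.mlp(self.norm(tokens)))


tokens = torch.randn(2, 8, 5)  # B=2, C=8, N=5
out = ChannelMixSketch(8)(tokens)
print(out.shape)
```

Because every convolution has kernel size 1, each point is processed independently here; information only moves between points in the spatial-mixing block.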
```python
import numpy as np
import torch
import torch.nn as nn

# ChannelMix, SpatialMix and build_proj_matrix are defined elsewhere in the repository.


class WaffleIron(nn.Module):
    def __init__(self, channels, depth, grids_shape):
        super().__init__()
        self.grids_shape = grids_shape
        self.channel_mix = nn.ModuleList([ChannelMix(channels) for _ in range(depth)])
        # Layer d uses grid number d % len(grids_shape)
        self.spatial_mix = nn.ModuleList(
            [
                SpatialMix(channels, grids_shape[d % len(grids_shape)])
                for d in range(depth)
            ]
        )

    def forward(self, tokens, cell_ind, occupied_cell):
        # Build projection matrices
        batch_size, num_points = tokens.shape[0], tokens.shape[-1]
        point_ind = (
            torch.arange(num_points, device=tokens.device)
            .unsqueeze(0)
            .expand(batch_size, -1)
            .reshape(1, -1)
        )
        batch_ind = (
            torch.arange(batch_size, device=tokens.device)
            .unsqueeze(1)
            .expand(-1, num_points)
            .reshape(1, -1)
        )
        # One (batch, cell, point) index triplet per point and per grid
        non_zeros_ind = []
        for i in range(cell_ind.shape[1]):
            non_zeros_ind.append(
                torch.cat((batch_ind, cell_ind[:, i].reshape(1, -1), point_ind), dim=0)
            )
        sp_mat = [
            build_proj_matrix(ind, occupied_cell, batch_size, np.prod(sh))
            for ind, sh in zip(non_zeros_ind, self.grids_shape)
        ]

        # Actual backbone
        for d, (smix, cmix) in enumerate(zip(self.spatial_mix, self.channel_mix)):
            tokens = smix(tokens, sp_mat[d % len(sp_mat)])
            tokens = cmix(tokens)

        return tokens
```
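`build_proj_matrix` is not shown here, so the following is only a mental model of what the projection matrices in `sp_mat` are used for. It assumes that projecting to the grid means averaging the features of all points that fall into the same 2D cell, and that projecting back means copying each cell's feature to its points. The helper names `flatten_mean` and `inflate` are hypothetical, and the sketch uses plain Python lists with one scalar feature per point instead of sparse tensors.

```python
# Hypothetical dense sketch of point -> grid -> point projection.
# Assumption: mean pooling per cell; one scalar feature per point; no batch.


def flatten_mean(point_feats, cell_ind, num_cells):
    """Average the features of all points that share a cell; empty cells get 0."""
    sums = [0.0] * num_cells
    counts = [0] * num_cells
    for feat, cell in zip(point_feats, cell_ind):
        sums[cell] += feat
        counts[cell] += 1
    return [s / c if c > 0 else 0.0 for s, c in zip(sums, counts)]


def inflate(cell_feats, cell_ind):
    """Copy each cell's feature back to every point inside that cell."""
    return [cell_feats[cell] for cell in cell_ind]


point_feats = [1.0, 3.0, 5.0, 7.0]
cell_ind = [0, 0, 2, 1]  # flattened 2D cell index of each point
cells = flatten_mean(point_feats, cell_ind, num_cells=4)
print(cells)
print(inflate(cells, cell_ind))
```

In this picture, a spatial-mixing layer would run its 2D convolutions on `cells` (reshaped to the grid) between the two calls, so that points in nearby cells exchange information.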