One-dimensional convolution shows up constantly in deep learning on point clouds. I read plenty of explanations but still couldn't quite pin it down, so I decided to reproduce the Conv1d computation by hand.
Creating the point cloud
import torch

# 4 points, each with (x, y, z) coordinates; use floats so Conv1d accepts them
points = torch.tensor([
    [1., 2., 3.],
    [4., 5., 6.],
    [7., 8., 9.],
    [10., 11., 12.]
])
# points.shape
# torch.Size([4, 3])
points = points.unsqueeze(0)  # add a batch dimension: batch = 1
# points.shape
# torch.Size([1, 4, 3])
points = points.permute(0, 2, 1)  # Conv1d expects (batch, channels, length)
# points.shape
# torch.Size([1, 3, 4])
Creating the Conv1d layer
conv1d = torch.nn.Conv1d(in_channels=3, out_channels=5, kernel_size=1)
# inspect the randomly initialized parameters
for k, v in conv1d.named_parameters():
    print(k)
    print(v)
    print(v.shape)
'''
weight
Parameter containing:
tensor([[[-0.0527],
         [-0.4512],
         [ 0.1197]],

        [[ 0.4636],
         [ 0.4103],
         [-0.3061]],

        [[-0.0603],
         [ 0.4536],
         [ 0.5522]],

        [[-0.2800],
         [ 0.1951],
         [ 0.5738]],

        [[-0.3367],
         [ 0.4430],
         [ 0.0143]]], requires_grad=True)
torch.Size([5, 3, 1])
bias
Parameter containing:
tensor([ 0.2423, -0.1086, -0.0243,  0.2874, -0.1059], requires_grad=True)
torch.Size([5])
'''
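The weight has shape (out_channels, in_channels, kernel_size) = (5, 3, 1). Since the kernel dimension is 1, it can be viewed as a plain 5 × 3 matrix with one row of 3 coefficients per output channel, plus one bias per channel; this is exactly the weights matrix used in the walkthrough below. A quick look (a sketch; squeeze just drops the size-1 kernel axis):

# (out_channels=5, in_channels=3, kernel_size=1) -> (5, 3)
print(conv1d.weight.squeeze(-1).shape)  # torch.Size([5, 3])
print(conv1d.bias.shape)                # torch.Size([5])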
Running the computation and inspecting the result
out = conv1d(points)
'''
>>> out
tensor([[[-0.3538, -1.5065, -2.6592, -3.8118],
         [ 0.2573,  1.9608,  3.6644,  5.3679],
         [ 2.4792,  5.3157,  8.1521, 10.9886],
         [ 2.1188,  3.5853,  5.0519,  6.5184],
         [ 0.4864,  0.8483,  1.2102,  1.5721]]],
       grad_fn=<ConvolutionBackward0>)
>>> out.shape
torch.Size([1, 5, 4])
'''
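The shape went from (1, 3, 4) to (1, 5, 4): still 4 points, but each is now described by 5 channels instead of 3. One property worth checking before the walkthrough: with kernel_size=1, every point (every column) is transformed independently, so feeding a single point through the layer reproduces the corresponding output column. A small sanity check (a sketch; first_point is my own name):

# a single point, shape (1, 3, 1): batch of 1, 3 channels, length 1
first_point = points[:, :, 0:1]
print(conv1d(first_point))  # matches out[:, :, 0:1], the first output column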
Walkthrough
Ignoring the batch dimension, the input matrix is (the three rows, from top to bottom, hold the x, y, and z coordinates; each column is one point):
$$points = \begin{pmatrix} 1 & 4 & 7 & 10 \\ 2 & 5 & 8 & 11 \\ 3 & 6 & 9 & 12 \end{pmatrix}$$
The bias is:
$$bias = \begin{pmatrix} 0.2423 \\ -0.1086 \\ -0.0243 \\ 0.2874 \\ -0.1059 \end{pmatrix}$$
The weights are:
$$weights = \begin{pmatrix} -0.0527 & -0.4512 & 0.1197 \\ 0.4636 & 0.4103 & -0.3061 \\ -0.0603 & 0.4536 & 0.5522 \\ -0.2800 & 0.1951 & 0.5738 \\ -0.3367 & 0.4430 & 0.0143 \end{pmatrix}$$
The output is:
$$output = \begin{pmatrix} -0.3538 & -1.5065 & -2.6592 & -3.8118 \\ 0.2573 & 1.9608 & 3.6644 & 5.3679 \\ 2.4792 & 5.3157 & 8.1521 & 10.9886 \\ 2.1188 & 3.5853 & 5.0519 & 6.5184 \\ 0.4864 & 0.8483 & 1.2102 & 1.5721 \end{pmatrix}$$
Each row of $points$, $bias$, $weights$, and $output$ is one channel. The key to the analysis is understanding these points:
- every channel of $output$ is independent of the others
- each channel of $output$ corresponds to the matching channel of $weights$ and $bias$
- computing each channel of $output$ uses the data of all the points

So we only need to trace how the first rows of $bias$, $weights$, and $output$ relate; the other rows follow the same pattern (see the sketch right below).
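Here is that first row computed in code, before writing out the arithmetic (a sketch; w0, b0, and row0 are my own names):

w0 = conv1d.weight[0].squeeze(-1)  # weight row of output channel 0, shape (3,)
b0 = conv1d.bias[0]                # bias of output channel 0
row0 = points[0].T @ w0 + b0       # dot each point (column) with w0, add b0
print(row0)                        # matches out[0, 0], the first output row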
At this point, just from the shapes of these variables, we can already infer how $output$ must be computed:
$$\begin{aligned} -0.3538 &\approx 1 \times (-0.0527) + 2 \times (-0.4512) + 3 \times 0.1197 + 0.2423 \\ -1.5065 &\approx 4 \times (-0.0527) + 5 \times (-0.4512) + 6 \times 0.1197 + 0.2423 \\ -2.6592 &\approx 7 \times (-0.0527) + 8 \times (-0.4512) + 9 \times 0.1197 + 0.2423 \\ -3.8118 &\approx 10 \times (-0.0527) + 11 \times (-0.4512) + 12 \times 0.1197 + 0.2423 \end{aligned}$$
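In other words, with kernel_size=1 the whole layer reduces to $output = weights \cdot points + bias$: a single matrix multiply with the bias broadcast across the points. A minimal sketch to confirm this against the layer's own output (continuing the session above; W, b, and manual are my own names):

W = conv1d.weight.squeeze(-1)          # (5, 3): one weight row per output channel
b = conv1d.bias.unsqueeze(-1)          # (5, 1): broadcasts across the 4 points
manual = W @ points[0] + b             # (5, 4): matrix multiply plus bias
print(torch.allclose(manual, out[0]))  # True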