Convolution
A 2D convolution slides a kernel over the input matrix row by row and column by column; at each position the kernel is multiplied element-wise with the patch it covers and the products are summed, producing a new matrix. The kernel is also called a filter, and the region of the input the filter covers at each step is called the receptive field. In the figure below, suppose the kernel is the 3×3 matrix [[1,0,1],[0,1,0],[1,0,1]]. Taking the first element of the output matrix, 4, as an example, the computation is 1×1+1×0+1×1+0×0+1×1+1×0+0×1+0×0+1×1 = 4.
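The sliding-window computation above can be sketched in a few lines of NumPy. Note the 4×4 input matrix here is a made-up example (the full input from the figure is not in the text); only its top-left 3×3 patch is taken from the worked computation above, so the first output element comes out to 4:

```python
import numpy as np

def conv2d(x, k):
    """Naive 2D convolution (cross-correlation), stride 1, no padding."""
    f = k.shape[0]
    out = np.zeros((x.shape[0] - f + 1, x.shape[1] - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # element-wise product of the covered patch and the kernel, then sum
            out[i, j] = np.sum(x[i:i + f, j:j + f] * k)
    return out

kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
x = np.array([[1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
print(conv2d(x, kernel))  # 2x2 output; top-left element is 4
```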
In 2D convolution we assume a single channel by default. When convolving a 3D (or higher-dimensional) image, the kernel must have the same number of channels as the input image. For example, convolving a (6,6,3) image with a (3,3,3) kernel yields an output of shape (4,4,1). In practice a single kernel can rarely extract enough features from a complex image, so we use multiple kernels and stack their results: with 2 kernels of shape (3,3,3), the output has shape (4,4,2).
Determining the convolution output size
Suppose the input matrix is (n,n) and the kernel is (f,f); then the output matrix has size (n-f+1, n-f+1). This formula leaves two issues:
- Stride. The formula above assumes the kernel moves one step at a time; with stride s, the output size becomes ((n-f)/s+1, (n-f)/s+1);
- Shrinkage. Each convolution shrinks the image, so after many layers it would collapse toward (1,1). Moreover, pixels at the edges and corners of the input are covered only once while pixels near the center are covered many times, so edge features are under-extracted. Padding solves both problems. If we pad p pixels on each side, n becomes n+2p. For the output to keep the input size we need (n+2p-f)/s+1 = n, i.e. p = (f-n+s·n-s)/2; when s=1 this reduces to p = (f-1)/2, which is why f is usually odd.
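The two formulas above can be captured in a pair of small helpers (a sketch; the function names are illustrative):

```python
def conv_out_dim(n, f, s=1, p=0):
    """Output size of an (n,n) input with an (f,f) kernel: (n + 2p - f) // s + 1."""
    return (n + 2 * p - f) // s + 1

def same_padding(f, s, n):
    """Padding p that keeps the output size equal to n, from (n+2p-f)/s + 1 = n."""
    return (f - n + s * n - s) / 2

print(conv_out_dim(6, 3))        # 4
print(conv_out_dim(6, 3, p=1))   # 6, a "same" convolution
print(same_padding(3, 1, 6))     # 1.0, matching (f-1)/2 for s=1
```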
import numpy as np

a = np.random.randn(2, 2, 3)
# first argument: the array to pad
# second argument: padding widths per axis, as (before, after) pairs
# third argument: padding mode
# see https://blog.csdn.net/hustqb/article/details/77726660
np.pad(a, ((0, 0), (1, 1), (1, 1)), 'constant')
Pooling
In short, a pooling layer shrinks the model, speeds up computation, and makes the extracted features more robust. There are two kinds: max pooling and average pooling. Like convolution, a pooling operation is parameterized by a window size f and a stride s.
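Both kinds of pooling can be sketched with the same sliding window as convolution, replacing the weighted sum with a max or a mean (the input values below are an arbitrary example):

```python
import numpy as np

def pool2d(x, f, s, mode="max"):
    """Max or average pooling on a 2D array with window f and stride s."""
    out_h = (x.shape[0] - f) // s + 1
    out_w = (x.shape[1] - f) // s + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * s:i * s + f, j * s:j * s + f]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

x = np.array([[1., 3., 2., 1.],
              [4., 6., 5., 0.],
              [7., 2., 8., 9.],
              [0., 1., 3., 4.]])
print(pool2d(x, 2, 2))           # [[6. 5.] [7. 9.]]
print(pool2d(x, 2, 2, "avg"))    # [[3.5 2. ] [2.5 6. ]]
```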
Backpropagation of the convolutional layer
Forward propagation of the convolutional layer
Single channel
Assume the input X is (3,3) and the kernel K is (2,2); the output is $Y = X \ \mathrm{conv}\ K$, i.e.

$$\begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{pmatrix} \mathrm{conv} \begin{pmatrix} k_{11} & k_{12} \\ k_{21} & k_{22} \end{pmatrix} = \begin{pmatrix} y_{11} & y_{12} \\ y_{21} & y_{22} \end{pmatrix}$$
Rearranging, we get

$$\begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{pmatrix} = \begin{pmatrix} x_{11}k_{11}+x_{12}k_{12}+x_{21}k_{21}+x_{22}k_{22}\\ x_{12}k_{11}+x_{13}k_{12}+x_{22}k_{21}+x_{23}k_{22}\\ x_{21}k_{11}+x_{22}k_{12}+x_{31}k_{21}+x_{32}k_{22}\\ x_{22}k_{11}+x_{23}k_{12}+x_{32}k_{21}+x_{33}k_{22} \end{pmatrix} = \begin{pmatrix} x_{11} & x_{12} & x_{21} & x_{22} \\ x_{12} & x_{13} & x_{22} & x_{23}\\ x_{21} & x_{22} & x_{31} & x_{32}\\ x_{22} & x_{23} & x_{32} & x_{33} \end{pmatrix} \cdot \begin{pmatrix} k_{11} \\ k_{12} \\ k_{21} \\ k_{22} \end{pmatrix}$$
This turns the convolution into a matrix multiplication. Writing the transformed X, K, Y as XC, KC, YC, we have $YC = XC \cdot KC$. Y and K only need a reshape to become YC and KC; X needs special treatment, called im2col (image to column): each convolution window is flattened into one row, giving $k^2$ columns per row and (X.w-k+1)*(X.h-k+1) rows.
# A simple im2col implementation: easy to understand, but slow
import numpy as np

def im2col(image, ksize, stride=1):
    # image is a 4d tensor (batchsize, width, height, channel)
    image_col = []
    for i in range(0, image.shape[1] - ksize + 1, stride):
        for j in range(0, image.shape[2] - ksize + 1, stride):
            # flatten the current window into a 1d array
            # (note: with batchsize > 1 this flattens the batch into the row as well)
            col = image[:, i:i + ksize, j:j + ksize, :].reshape([-1])
            image_col.append(col)
    # stack the rows into a numpy array
    image_col = np.array(image_col)
    return image_col
Multiple channels
Assume X.shape = (B, H, W, $C_{in}$). As in the single-channel case, we transform X into XC of shape (B, (H-k+1)(W-k+1), $k^2 \cdot C_{in}$). Since the number of kernels equals the number of output channels, the kernel tensor K of shape (k, k, $C_{in}$, $C_{out}$) is reshaped into KC of shape ($k^2 \cdot C_{in}$, $C_{out}$). Then YC has shape (B, (H-k+1)(W-k+1), $C_{out}$), and reshaping YC gives Y of shape (B, H-k+1, W-k+1, $C_{out}$).
In the figure below, the batch size B is 1 and X is (3,3,3), i.e. 3 input channels; with 2 output channels we use two (2,2,3) kernels, and the output is (2,2,2). After the transform, XC is (4,12), KC is (12,2), and YC is (4,2). The last row of the figure is exactly the forward pass of the convolutional layer. In real code a bias b and the batch dimension are added as well.
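The shapes in this example can be checked with a short script. A condensed im2col is repeated here so the snippet runs standalone; the final assertion-style comparison confirms that one matmul row really equals a direct sliding-window convolution:

```python
import numpy as np

def im2col(image, ksize, stride=1):
    # image: (B, H, W, C); each row is one flattened window
    cols = []
    for i in range(0, image.shape[1] - ksize + 1, stride):
        for j in range(0, image.shape[2] - ksize + 1, stride):
            cols.append(image[:, i:i + ksize, j:j + ksize, :].reshape(-1))
    return np.array(cols)

X = np.random.randn(1, 3, 3, 3)   # B=1, 3x3 image, C_in=3
K = np.random.randn(2, 2, 3, 2)   # k=2, C_in=3, C_out=2
XC = im2col(X, 2)                 # (4, 12)
KC = K.reshape(-1, 2)             # (12, 2); row-major flatten matches im2col's order
YC = XC @ KC                      # (4, 2)
Y = YC.reshape(1, 2, 2, 2)        # (B, H-k+1, W-k+1, C_out)

# first output element equals the direct window computation
direct = np.sum(X[0, 0:2, 0:2, :] * K[:, :, :, 0])
print(np.allclose(Y[0, 0, 0, 0], direct))  # True
```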
Backpropagation of the convolutional layer
For brevity we derive backpropagation for the single-channel case. Write

$$\delta=\begin{pmatrix}\delta_{11}\\\delta_{12}\\\delta_{21}\\\delta_{22}\end{pmatrix}=\triangledown YC=\begin{pmatrix}\triangledown y_{11}\\\triangledown y_{12}\\\triangledown y_{21}\\\triangledown y_{22}\end{pmatrix}$$
Since $YC = XC \cdot KC$, we have $\triangledown KC = XC^T \cdot \triangledown YC$, and a reshape gives $\triangledown K$. $\triangledown XC$ can be obtained the same way, but turning it back into $\triangledown X$ is quite difficult, because overlapping windows mean each $x_{ij}$ appears in several rows of XC; we therefore take a different route.
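The relation $\triangledown KC = XC^T \cdot \triangledown YC$ can be checked numerically. This is a minimal single-channel sketch (the helper name im2col2d is illustrative); since the loss is linear in KC, the finite-difference estimate should match the analytic gradient up to floating-point error:

```python
import numpy as np

def im2col2d(x, f):
    # single-channel im2col: one flattened f x f window per row
    rows = []
    for i in range(x.shape[0] - f + 1):
        for j in range(x.shape[1] - f + 1):
            rows.append(x[i:i + f, j:j + f].reshape(-1))
    return np.array(rows)

np.random.seed(0)
X = np.random.randn(3, 3)
K = np.random.randn(2, 2)
dYC = np.random.randn(4, 1)      # upstream gradient delta, flattened

XC = im2col2d(X, 2)              # (4, 4)
dKC = XC.T @ dYC                 # analytic gradient wrt KC
dK = dKC.reshape(2, 2)

# numeric check on k11, with loss L = sum(YC * dYC)
eps = 1e-6
def loss(k):
    return float((im2col2d(X, 2) @ k.reshape(-1, 1) * dYC).sum())
Kp = K.copy(); Kp[0, 0] += eps
num = (loss(Kp) - loss(K)) / eps
print(abs(num - dK[0, 0]) < 1e-4)  # True
```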
From the forward pass

$$\begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{pmatrix} = \begin{pmatrix} x_{11}k_{11}+x_{12}k_{12}+x_{21}k_{21}+x_{22}k_{22}\\ x_{12}k_{11}+x_{13}k_{12}+x_{22}k_{21}+x_{23}k_{22}\\ x_{21}k_{11}+x_{22}k_{12}+x_{31}k_{21}+x_{32}k_{22}\\ x_{22}k_{11}+x_{23}k_{12}+x_{32}k_{21}+x_{33}k_{22} \end{pmatrix}$$
we can compute the derivative with respect to each $x_{ij}$:
$$\begin{aligned}
\triangledown x_{11} &= k_{11}\delta_{11}\\
\triangledown x_{12} &= k_{12}\delta_{11}+k_{11}\delta_{12}\\
\triangledown x_{13} &= k_{12}\delta_{12}\\
\triangledown x_{21} &= k_{21}\delta_{11}+k_{11}\delta_{21}\\
\triangledown x_{22} &= k_{22}\delta_{11}+k_{21}\delta_{12}+k_{12}\delta_{21}+k_{11}\delta_{22}\\
\triangledown x_{23} &= k_{22}\delta_{12}+k_{12}\delta_{22}\\
\triangledown x_{31} &= k_{21}\delta_{21}\\
\triangledown x_{32} &= k_{22}\delta_{21}+k_{21}\delta_{22}\\
\triangledown x_{33} &= k_{22}\delta_{22}
\end{aligned}$$
Therefore

$$\begin{pmatrix} \triangledown x_{11} \\ \triangledown x_{12} \\ \triangledown x_{13} \\ \triangledown x_{21} \\ \triangledown x_{22} \\ \triangledown x_{23} \\ \triangledown x_{31} \\ \triangledown x_{32} \\ \triangledown x_{33} \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & \delta_{11} \\ 0 & 0 & \delta_{11} & \delta_{12} \\ 0 & 0 & \delta_{12} & 0 \\ 0 & \delta_{11} & 0 & \delta_{21} \\ \delta_{11} & \delta_{12} & \delta_{21} & \delta_{22} \\ \delta_{12} & 0 & \delta_{22} & 0 \\ 0 & \delta_{21} & 0 & 0 \\ \delta_{21} & \delta_{22} & 0 & 0 \\ \delta_{22} & 0 & 0 & 0 \end{pmatrix}\cdot\begin{pmatrix} k_{22}\\k_{21}\\k_{12}\\k_{11} \end{pmatrix}$$
Converting back, we get

$$\triangledown X = \begin{pmatrix} \triangledown x_{11} & \triangledown x_{12} & \triangledown x_{13} \\ \triangledown x_{21} & \triangledown x_{22} & \triangledown x_{23} \\ \triangledown x_{31} & \triangledown x_{32} & \triangledown x_{33} \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 0\\ 0 & \delta_{11} & \delta_{12} & 0 \\ 0 & \delta_{21} & \delta_{22} & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \mathrm{conv} \begin{pmatrix} k_{22} & k_{21} \\ k_{12} & k_{11} \end{pmatrix}$$
So computing $\triangledown X$ amounts to zero-padding $\triangledown Y$ and convolving it with the kernel rotated by 180°.
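This pad-then-convolve rule can be verified against a finite-difference estimate. A minimal sketch (conv2d here is a naive stride-1 cross-correlation, as used throughout, and the loss is taken to be $L = \sum Y \odot \triangledown Y$):

```python
import numpy as np

def conv2d(x, k):
    # naive stride-1 cross-correlation, no padding
    f = k.shape[0]
    out = np.zeros((x.shape[0] - f + 1, x.shape[1] - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + f, j:j + f] * k)
    return out

np.random.seed(1)
X = np.random.randn(3, 3)
K = np.random.randn(2, 2)
dY = np.random.randn(2, 2)     # upstream gradient (delta)

# grad wrt X: zero-pad dY by k-1 = 1 on each side, convolve with K rotated 180 degrees
dX = conv2d(np.pad(dY, 1, 'constant'), np.rot90(K, 2))

# numeric check on one element, with loss L = sum(Y * dY)
eps = 1e-6
Xp = X.copy(); Xp[0, 0] += eps
num = (np.sum(conv2d(Xp, K) * dY) - np.sum(conv2d(X, K) * dY)) / eps
print(abs(dX[0, 0] - num) < 1e-4)  # True
```

The analytic element $\triangledown x_{22}$ derived above ($k_{22}\delta_{11}+k_{21}\delta_{12}+k_{12}\delta_{21}+k_{11}\delta_{22}$) falls out of the same formula as dX[1, 1].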