This note covers only the two-dimensional case, that is, channel = 1 and strides = 1, and only the computation of this single layer.
ConvNet forward propagation
filter’s shape: [height, length, 1]
$$y_{i,j} = \sum_{n=0}^{length}\sum_{m=0}^{height} W_{m,n}\,x_{i+m,j+n}$$
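The forward formula can be sketched in NumPy as a plain sliding-window sum. This is only a minimal illustration: the name `conv2d_forward` is made up for this note, the kernel is assumed to have `kh` by `kw` taps with 0-based indices (so `m` runs over `0..kh-1` and `n` over `0..kw-1`), and the input is assumed to be unpadded, so the output shrinks by `kh - 1` rows and `kw - 1` columns.

```python
import numpy as np

def conv2d_forward(x, W):
    """Compute y[i, j] = sum over m, n of W[m, n] * x[i + m, j + n]."""
    kh, kw = W.shape                  # assumed kernel size (illustrative names)
    out_h = x.shape[0] - kh + 1       # no padding, stride 1
    out_w = x.shape[1] - kw + 1
    y = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the kernel with the window anchored at (i, j) and sum.
            y[i, j] = np.sum(W * x[i:i + kh, j:j + kw])
    return y

x = np.arange(16, dtype=float).reshape(4, 4)
W = np.array([[1.0, 0.0],
              [0.0, -1.0]])
y = conv2d_forward(x, W)
print(y.shape)
print(y)
```

With this kernel each output is `x[i, j] - x[i + 1, j + 1]`, which makes the result easy to check by hand.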
ConvNet backpropagation
For this layer, $\frac{\partial L}{\partial y_{i,j}}$ is already known; we rename it $\delta y_{i,j}$.
First, express the total differential $dy_{i,j}$ in terms of $W$ and $x$:
$$dy_{i,j} = \sum_{n=0}^{length}\sum_{m=0}^{height} dW_{m,n}\,x_{i+m,j+n} + \sum_{n=0}^{length}\sum_{m=0}^{height} W_{m,n}\,dx_{i+m,j+n}$$
From this expression we can read off:
$$\frac{\partial y_{i,j}}{\partial x_{i+m,j+n}} = W_{m,n}$$
Then, by the chain rule:
$$\frac{\partial L}{\partial x_{i,j}} = \sum_{h,k} \frac{\partial y_{i-h,j-k}}{\partial x_{i,j}}\frac{\partial L}{\partial y_{i-h,j-k}} = \sum_{h,k} W_{h,k} \frac{\partial L}{\partial y_{i-h,j-k}}$$
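This equation shows that backpropagation through a ConvNet layer is itself a ConvNet-style convolution over $\frac{\partial L}{\partial y}$. Because the index is $i-h, j-k$ rather than $i+h, j+k$, the filter is applied flipped, and $\frac{\partial L}{\partial y}$ has to be zero-padded; the padding amount is [height, length]. This result is probably what the element-wise operation in many implementations corresponds to.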
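The chain-rule result above can be sketched in NumPy as follows. This is a hedged illustration, not code from any particular framework: the function names are invented here, and the kernel is assumed to have `kh` by `kw` taps with 0-based indices, in which case the zero padding on each side is `kh - 1` rows and `kw - 1` columns.

```python
import numpy as np

def conv2d_forward(x, W):
    """Sliding-window sum: y[i, j] = sum over m, n of W[m, n] * x[i + m, j + n]."""
    kh, kw = W.shape
    y = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            y[i, j] = np.sum(W * x[i:i + kh, j:j + kw])
    return y

def conv2d_backward_input(dy, W):
    """dx[i, j] = sum over h, k of W[h, k] * dy[i - h, j - k]."""
    kh, kw = W.shape
    # Zero-pad dy so that out-of-range entries dy[i - h, j - k] count as 0.
    dy_pad = np.pad(dy, ((kh - 1, kh - 1), (kw - 1, kw - 1)), mode="constant")
    # Flipping W on both axes turns the (i - h, j - k) indexing into the
    # (i + m, j + n) indexing that conv2d_forward already implements.
    return conv2d_forward(dy_pad, W[::-1, ::-1])

W = np.array([[1.0, 2.0],
              [3.0, 4.0]])
dy = np.ones((2, 2))                  # upstream gradient for a 3x3 input
dx = conv2d_backward_input(dy, W)
print(dx.shape)                       # same shape as the layer input
print(dx)
```

The returned gradient has the shape of the layer input, since padding adds `2 * (kh - 1)` rows and the sliding window removes `kh - 1` of them again.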
Question: what if strides != 1?
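Answer (I am not sure this is correct): if strides = (stride1, stride2), I think one solution is to insert (stride1-1, stride2-1) zeros between adjacent elements of the output matrix Y (that is, of its gradient $\delta y$) and then proceed as in the strides = 1 case.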
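The zero-insertion idea can be tried out numerically. The sketch below is only an experiment under stated assumptions, not a confirmed answer: all function names are invented for this note, the strided forward pass is assumed to be `y[i, j] = sum W[m, n] * x[i * s1 + m, j * s2 + n]` without padding, and the reference gradient is built by scattering `W * dy[i, j]` back onto each input window.

```python
import numpy as np

def dilate(dy, strides):
    """Insert (s1 - 1, s2 - 1) zeros between adjacent elements of dy."""
    s1, s2 = strides
    oh, ow = dy.shape
    dz = np.zeros(((oh - 1) * s1 + 1, (ow - 1) * s2 + 1))
    dz[::s1, ::s2] = dy
    return dz

def full_conv(dz, W):
    """out[i, j] = sum over h, k of W[h, k] * dz[i - h, j - k], zero outside dz."""
    kh, kw = W.shape
    pad = np.pad(dz, ((kh - 1, kh - 1), (kw - 1, kw - 1)), mode="constant")
    Wf = W[::-1, ::-1]
    out = np.zeros((pad.shape[0] - kh + 1, pad.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(Wf * pad[i:i + kh, j:j + kw])
    return out

def strided_backward_input(dy, W, strides, x_shape):
    """Gradient w.r.t. x obtained by dilating dy, then using the stride-1 rule."""
    g = full_conv(dilate(dy, strides), W)
    dx = np.zeros(x_shape)
    # Trailing input rows/columns never covered by a window keep zero gradient.
    dx[:g.shape[0], :g.shape[1]] = g
    return dx

def scatter_reference(dy, W, strides, x_shape):
    """Direct accumulation: each dy[i, j] sends W * dy[i, j] to its window."""
    s1, s2 = strides
    kh, kw = W.shape
    dx = np.zeros(x_shape)
    for i in range(dy.shape[0]):
        for j in range(dy.shape[1]):
            dx[i * s1:i * s1 + kh, j * s2:j * s2 + kw] += W * dy[i, j]
    return dx

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))
dy = rng.normal(size=(3, 3))          # output of an 8x8 input with strides (2, 3)
dx = strided_backward_input(dy, W, (2, 3), (8, 8))
print(np.allclose(dx, scatter_reference(dy, W, (2, 3), (8, 8))))
```

Under these assumptions the two gradients agree, which is consistent with the zero-insertion suggestion, although it does not by itself settle how a given framework handles strides.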
There is also $\frac{\partial L}{\partial W_{i,j}}$.
Reading $\frac{\partial y_{h,k}}{\partial W_{i,j}} = x_{h+i,k+j}$ off the total differential, the chain rule gives:
$$\frac{\partial L}{\partial W_{i,j}} = \sum_{h,k} \frac{\partial y_{h,k}}{\partial W_{i,j}}\frac{\partial L}{\partial y_{h,k}} = \sum_{h,k} x_{h+i,k+j} \frac{\partial L}{\partial y_{h,k}}$$
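The weight gradient can be checked numerically as well. From the total differential, the coefficient of `dW[m, n]` in `dy[i, j]` is `x[i + m, j + n]`, so the sketch below sums `x[i + m, j + n] * dy[i, j]` over all output positions. The function names are illustrative only, and the kernel size is inferred from the shapes of `x` and `dy` on the assumption of no padding and strides = 1.

```python
import numpy as np

def conv2d_forward(x, W):
    """Sliding-window sum: y[i, j] = sum over m, n of W[m, n] * x[i + m, j + n]."""
    kh, kw = W.shape
    y = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            y[i, j] = np.sum(W * x[i:i + kh, j:j + kw])
    return y

def conv2d_backward_weight(x, dy):
    """dW[m, n] = sum over i, j of x[i + m, j + n] * dy[i, j]."""
    oh, ow = dy.shape
    kh = x.shape[0] - oh + 1          # kernel size implied by the shapes
    kw = x.shape[1] - ow + 1
    dW = np.zeros((kh, kw))
    for m in range(kh):
        for n in range(kw):
            # Slide dy over x: the same kind of windowed sum as the forward pass.
            dW[m, n] = np.sum(x[m:m + oh, n:n + ow] * dy)
    return dW

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 6))
W = rng.normal(size=(3, 2))
dy = rng.normal(size=(3, 5))          # stands in for the known dL/dy

dW = conv2d_backward_weight(x, dy)

# Finite-difference check on the surrogate loss L = sum(y * dy).
eps = 1e-6
numeric = np.zeros_like(W)
base = np.sum(conv2d_forward(x, W) * dy)
for m in range(W.shape[0]):
    for n in range(W.shape[1]):
        Wp = W.copy()
        Wp[m, n] += eps
        numeric[m, n] = (np.sum(conv2d_forward(x, Wp) * dy) - base) / eps
print(np.allclose(dW, numeric, atol=1e-4))
```

The weight gradient therefore has the same structure as the forward pass, with $\delta y$ playing the role of the filter and $x$ the role of the input.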