Preface
In the perceptron post we optimized the parameters with gradient descent (see the earlier hand-written perceptron implementation). A single perceptron, however, is not up to more complex problems, so we need the multilayer perceptron, i.e. a neural network. At that point the gradients for gradient descent have to be computed by backpropagation.
The simplest backward passes
The simplest operations a perceptron performs are addition and multiplication, so we first implement the backward pass for the multiplication and addition layers.
Multiplication layer
Formula
Suppose x · y = z and the loss function is L. Taking the partial derivatives of z with respect to x and y gives
\frac{\partial z}{\partial x}=y
\frac{\partial z}{\partial y}=x
So for a multiplication layer, the partial derivative with respect to either input is simply the other input: the two factors swap places.
Then, by the chain rule,
\frac{\partial L}{\partial x}=\frac{\partial L}{\partial z}\frac{\partial z}{\partial x}=\frac{\partial L}{\partial z}\cdot y
\frac{\partial L}{\partial y}=\frac{\partial L}{\partial z}\frac{\partial z}{\partial y}=\frac{\partial L}{\partial z}\cdot x
Code implementation
The backward pass must follow the chain rule: each local partial derivative is multiplied by the derivative dout propagated back from the next layer, and only that product is passed on to the previous layer. The same convention applies throughout.
class MulLayer:
    def __init__(self):
        self.x = None
        self.y = None

    def forward(self, x, y):
        # Cache both inputs; the backward pass needs them.
        self.x = x
        self.y = y
        out = x * y
        return out

    def backward(self, dout):
        # "Swap rule": multiply dout by the *other* input.
        dx = dout * self.y
        dy = dout * self.x
        return dx, dy
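To see the swap rule in action, here is a quick sanity check with made-up values: for z = x · y with x = 2 and y = 3, backpropagating dout = 1 should give dx = 3 and dy = 2.

mul = MulLayer()
z = mul.forward(2.0, 3.0)   # 6.0
dx, dy = mul.backward(1.0)  # upstream derivative dout = 1
print(z, dx, dy)            # 6.0 3.0 2.0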
Addition layer
Formula
Suppose x + y = z and the loss function is L. Taking the partial derivatives of z with respect to x and y gives
\frac{\partial z}{\partial x}=1
\frac{\partial z}{\partial y}=1
Then, by the chain rule,
\frac{\partial L}{\partial x}=\frac{\partial L}{\partial z}\frac{\partial z}{\partial x}=\frac{\partial L}{\partial z}
\frac{\partial L}{\partial y}=\frac{\partial L}{\partial z}\frac{\partial z}{\partial y}=\frac{\partial L}{\partial z}
Code implementation
class AddLayer:
    def __init__(self):
        pass  # addition needs no cached state

    def forward(self, x, y):
        out = x + y
        return out

    def backward(self, dout):
        # Addition passes the upstream derivative through unchanged.
        dx = dout * 1
        dy = dout * 1
        return dx, dy
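The two layers can already be chained into a tiny computational graph. Here is a sketch with made-up values, computing out = (a · b) + c; backpropagating dout = 1 should give da = b, db = a and dc = 1.

mul, add = MulLayer(), AddLayer()
ab = mul.forward(2.0, 4.0)   # a * b = 8.0
out = add.forward(ab, 3.0)   # (a * b) + c = 11.0
dab, dc = add.backward(1.0)  # addition passes dout straight through
da, db = mul.backward(dab)   # swap rule: da = b, db = a
print(out, da, db, dc)       # 11.0 4.0 2.0 1.0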
Backward passes for activation functions
ReLU layer
Formula
y=\begin{cases} x, & x>0 \\ 0, & x\le 0 \end{cases}
\frac{dy}{dx}=\begin{cases} 1, & x>0 \\ 0, & x\le 0 \end{cases}
Then, by the chain rule,
\frac{dL}{dx}=\frac{dL}{dy}\frac{dy}{dx}=\begin{cases} \frac{dL}{dy}, & x>0 \\ 0, & x\le 0 \end{cases}
Code
class Relu:
    def __init__(self):
        self.mask = None

    def forward(self, x):
        # x is expected to be a NumPy array; remember where x <= 0.
        self.mask = (x <= 0)
        out = x.copy()
        out[self.mask] = 0
        return out

    def backward(self, dout):
        # The gradient flows only where the input was positive.
        dout[self.mask] = 0
        dx = dout
        return dx
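A quick check with a made-up input array; positive entries pass through, non-positive ones are zeroed in both directions:

import numpy as np

relu = Relu()
x = np.array([[1.0, -0.5], [-2.0, 3.0]])
print(relu.forward(x))                 # [[1. 0.] [0. 3.]]
print(relu.backward(np.ones_like(x)))  # [[1. 0.] [0. 1.]]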
Sigmoid layer
Formula
y=\frac{1}{1+e^{-x}}
\frac{dy}{dx}=\frac{e^{-x}}{(1+e^{-x})^2}=\frac{1}{1+e^{-x}}\cdot\frac{e^{-x}}{1+e^{-x}}=\frac{1}{1+e^{-x}}\cdot\left(1-\frac{1}{1+e^{-x}}\right)=y\cdot(1-y)
Then, by the chain rule,
\frac{dL}{dx}=\frac{dL}{dy}\frac{dy}{dx}=\frac{dL}{dy}\cdot y\cdot(1-y)
Code
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

class Sigmoid:
    def __init__(self):
        self.out = None

    def forward(self, x):
        # Cache the output y: the backward pass needs only y, not x.
        out = sigmoid(x)
        self.out = out
        return out

    def backward(self, dout):
        # dL/dx = dL/dy * y * (1 - y)
        dx = dout * (1.0 - self.out) * self.out
        return dx
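The analytic derivative y(1 − y) is easy to verify numerically. Here is a sketch with a made-up point x0, comparing backward() against a centered finite difference:

import numpy as np

layer = Sigmoid()
x0 = np.array([0.5])
eps = 1e-5
layer.forward(x0)
analytic = layer.backward(np.array([1.0]))
numeric = (sigmoid(x0 + eps) - sigmoid(x0 - eps)) / (2 * eps)
print(analytic, numeric)  # the two should agree to ~1e-10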
Softmax layer with cross-entropy error
Formula
In essence, the backward pass of this layer propagates back the gap between the predicted values and the true labels.
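In symbols, with the softmax output and cross-entropy loss defined in the standard way,

y_k=\frac{e^{x_k}}{\sum_i e^{x_i}},\qquad L=-\sum_k t_k \log y_k

the combined derivative for a one-hot label vector t simplifies to the well-known

\frac{\partial L}{\partial x_k}=y_k-t_k

The code below additionally divides by the batch size, because the loss is averaged over a mini-batch.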
Code
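The forward pass below calls a softmax helper that is not shown in this section; a minimal, numerically stable sketch (my own stand-in, not necessarily the original implementation) is:

import numpy as np

def softmax(x):
    # Subtracting the row-wise maximum keeps exp() from overflowing;
    # it does not change the result.
    x = x - np.max(x, axis=-1, keepdims=True)
    return np.exp(x) / np.sum(np.exp(x), axis=-1, keepdims=True)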
def cross_entropy_error(y, t):
    if y.ndim == 1:
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)
    # If t is one-hot, convert it to class indices.
    if t.size == y.size:
        t = t.argmax(axis=1)
    batch_size = y.shape[0]
    # The 1e-7 term guards against log(0).
    return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size

class SoftmaxWithLoss:
    def __init__(self):
        self.loss = None
        self.y = None  # output of softmax
        self.t = None  # teacher labels

    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)
        self.loss = cross_entropy_error(self.y, self.t)
        return self.loss

    def backward(self, dout=1):
        batch_size = self.t.shape[0]
        if self.t.size == self.y.size:
            # One-hot labels: (y - t), averaged over the batch.
            dx = (self.y - self.t) / batch_size
        else:
            # Integer labels: subtract 1 at each sample's correct class.
            dx = self.y.copy()
            dx[np.arange(batch_size), self.t] -= 1
            dx = dx / batch_size
        return dx
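An end-to-end check with made-up scores and a one-hot label; the backward pass returns (y − t) / batch_size:

import numpy as np

layer = SoftmaxWithLoss()
x = np.array([[2.0, 1.0, 0.1]])  # raw scores for one sample
t = np.array([[1, 0, 0]])        # one-hot label
loss = layer.forward(x, t)
dx = layer.backward()
print(loss, dx)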