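For a single sample, the forward-propagation graph computes, in order:
$$
z = w_1x_1 + w_2x_2 + b \;\rightarrow\; a=\sigma(z) \;\rightarrow\; L(a,y)
$$
The backward pass is computed next.
Assume the logistic regression loss and the activation function are as follows: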
$$
L(a,y)=-(y\log (a)+(1-y)\log (1-a)) \\
a=\sigma (z)=\frac{1}{1+e^{-z}} \\
a'=\sigma (z)(1-\sigma (z)) = a(1-a)
$$
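As a quick sanity check on these definitions, the minimal Python sketch below implements the sigmoid and the loss and compares the analytic derivative $a(1-a)$ with a central finite difference. The function names (`sigmoid`, `logistic_loss`), the test point `z = 0.5`, and the step size `eps` are illustrative assumptions, not part of the original derivation.

```python
import math

def sigmoid(z):
    """a = 1 / (1 + e^(-z))"""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_loss(a, y):
    """L(a, y) = -(y*log(a) + (1 - y)*log(1 - a))"""
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

z = 0.5      # arbitrary test point (assumption)
eps = 1e-6   # finite-difference step (assumption)

a = sigmoid(z)
analytic = a * (1 - a)                                        # a' = a(1 - a)
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)   # central difference
print(analytic, numeric)
```

The two printed values should agree to many decimal places, which is what the identity $a' = a(1-a)$ predicts.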
Step 1:
$$
da=\frac{dL(a,y)}{da} =-\frac{y}{a}+\frac{1-y}{1-a}
$$
Step 2:
$$
dz=\frac{dL}{dz}=\frac{dL}{da}\frac{da}{dz}
$$
$$
\begin{aligned}
dz&=\left(-\frac{y}{a}+\frac{1-y}{1-a}\right)(a(1-a)) \\
&=a-y
\end{aligned}
$$
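The simplification to $a-y$ skips a line of algebra. Multiplying each term in the bracket by $a(1-a)$ gives:

$$
\begin{aligned}
\left(-\frac{y}{a}+\frac{1-y}{1-a}\right)a(1-a) &= -y(1-a) + (1-y)a \\
&= -y + ay + a - ay \\
&= a - y
\end{aligned}
$$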
Step 3:
$$
dw_1=x_1dz \\
dw_2=x_2dz \\
db=dz
$$
So the final gradient-descent update is ($\alpha$ is the learning rate):
$$
w_1=w_1-\alpha dw_1 \\
w_2=w_2-\alpha dw_2 \\
b=b-\alpha db
$$
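The three steps and the update can be sketched for one sample in plain Python. The sample values, initial parameters, and learning rate below are made up for illustration, and the helper `loss` is a hypothetical name. The sketch also checks `dw1` against a finite difference and confirms that one update lowers the loss on this sample.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w1, w2, b, x1, x2, y):
    """Forward pass for one sample: z -> a -> L(a, y)."""
    a = sigmoid(w1 * x1 + w2 * x2 + b)
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

# Hypothetical sample, parameters, and learning rate (all assumptions).
x1, x2, y = 1.0, 2.0, 1.0
w1, w2, b = 0.1, -0.2, 0.0
alpha = 0.1

# Forward pass
a = sigmoid(w1 * x1 + w2 * x2 + b)

# Backward pass: steps 1 and 2 collapse to dz = a - y, step 3 gives dw and db
dz = a - y
dw1, dw2, db = x1 * dz, x2 * dz, dz

# Finite-difference check of dw1
eps = 1e-6
dw1_numeric = (loss(w1 + eps, w2, b, x1, x2, y)
               - loss(w1 - eps, w2, b, x1, x2, y)) / (2 * eps)

# One gradient-descent update
loss_before = loss(w1, w2, b, x1, x2, y)
w1, w2, b = w1 - alpha * dw1, w2 - alpha * dw2, b - alpha * db
loss_after = loss(w1, w2, b, x1, x2, y)
print(dw1, dw1_numeric, loss_before, loss_after)
```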
For $m$ samples, the cost is the average of the per-sample losses:
$$
J(w,b)=\frac{1}{m}\sum^m_{i=1}L(a^{(i)},y^{(i)})
$$
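The single-sample derivation carries over unchanged: each per-sample quantity simply gets a superscript $(i)$, and the gradients are averaged over the $m$ samples.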
Pseudocode (vectorization can be used to reduce the explicit loops):
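J = 0
dw_1 = 0; dw_2 = 0; db = 0
for i = 1 to m:
    z[i] = w_1*x_1[i] + w_2*x_2[i] + b
    a[i] = sigmoid(z[i])
    J += logistic(a[i], y[i])    # the loss L(a, y)
    dz[i] = a[i] - y[i]
    # With more features, accumulating every dw_j still needs a nested loop here
    dw_1 += x_1[i]*dz[i]
    dw_2 += x_2[i]*dz[i]
    db += dz[i]
J /= m
dw_1 /= m; dw_2 /= m; db /= m
# alpha is the learning rate (distinct from the activation a[i])
w_1 = w_1 - alpha*dw_1
w_2 = w_2 - alpha*dw_2
b = b - alpha*db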
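The loop above can be vectorized so that neither the loop over samples nor the loop over features is written explicitly. The following is a minimal sketch assuming NumPy is available. The function name `gradient_step`, the array shapes, the toy dataset, and the learning rate are illustrative assumptions rather than anything fixed by the derivation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(w, b, X, y, alpha):
    """One vectorized gradient-descent step.

    X: (m, n) feature matrix, y: (m,) labels in {0, 1},
    w: (n,) weights, b: scalar bias. Returns updated w, b and the
    cost J evaluated before the update.
    """
    m = X.shape[0]
    a = sigmoid(X @ w + b)                        # forward pass, all samples
    J = -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))
    dz = a - y                                    # shape (m,)
    dw = X.T @ dz / m                             # shape (n,), replaces dw_1, dw_2, ...
    db = dz.mean()
    return w - alpha * dw, b - alpha * db, J

# Tiny made-up dataset (assumption): the label equals the second feature.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])
w, b = np.zeros(2), 0.0

costs = []
for _ in range(100):
    w, b, J = gradient_step(w, b, X, y, alpha=0.5)
    costs.append(J)
print(costs[0], costs[-1])
```

Here `X.T @ dz / m` computes every averaged `dw_j` at once, which is what removes the nested loop over features. The only loop left is the one over gradient-descent iterations.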