Disclaimer: These are my notes from Andrew Ng's deep learning course; they will be updated as my study progresses.
Neural Network Basics
1. Sigmoid Function
Output: $\hat{y} = \sigma(w^T x + b)$
$\sigma(z) = \frac{1}{1 + e^{-z}}$
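As a quick illustration (my own sketch, not code from the course), the forward computation above can be written with NumPy as follows; the values of w, x, and b are made-up placeholders:
import numpy as np

def sigmoid(z):
    # element-wise logistic function 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([[0.5], [-0.3]])   # weights, shape (n_x, 1)
x = np.array([[1.0], [2.0]])    # one input example, shape (n_x, 1)
b = 0.1                         # bias, a scalar

y_hat = sigmoid(np.dot(w.T, x) + b)   # predicted probability, shape (1, 1)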
2. Loss (error) Function:
$L(\hat{y}, y) = -\left[\, y\log\hat{y} + (1-y)\log(1-\hat{y}) \,\right]$
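A one-line NumPy sketch of this loss for a single example (illustrative only; it reuses the NumPy import from the sigmoid sketch above):
def loss(y_hat, y):
    # binary cross-entropy for a single example
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))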
3. Gradient Descent:
$J(w, b) = \frac{1}{m}\sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)})$
$w = w - \alpha\frac{\partial J(w, b)}{\partial w}$
$b = b - \alpha\frac{\partial J(w, b)}{\partial b}$
Note: α is the learning rate. Whatever the initial values of w and b, J keeps moving toward its minimum as the iterations proceed.
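To see the update rule in isolation, here is a toy example of my own (not the logistic-regression cost): gradient descent on $J(w) = (w-3)^2$, whose gradient is $2(w-3)$, converges to the minimizer $w = 3$ from any starting point:
# toy example: minimize J(w) = (w - 3)^2
w = -10.0        # arbitrary starting value
alpha = 0.1      # learning rate
for _ in range(100):
    dw = 2 * (w - 3)       # dJ/dw
    w = w - alpha * dw     # gradient-descent update
print(w)                   # ends up very close to 3.0, the minimizer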
4. Computation Graph
Backward propagation means computing partial derivatives backward through the graph, from the output toward the inputs. For example, in the course's graph $J = 3v$, $v = a + u$, $u = bc$:
$\frac{\partial J}{\partial v} = 3$
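A minimal sketch of that example graph with the backward pass written out by hand via the chain rule (the input values 5, 3, 2 are just for illustration):
# forward pass through the graph J = 3v, v = a + u, u = b*c
a, b, c = 5.0, 3.0, 2.0
u = b * c          # u = 6
v = a + u          # v = 11
J = 3 * v          # J = 33

# backward pass: partial derivatives taken from right to left
dJ_dv = 3.0                 # J = 3v
dJ_du = dJ_dv * 1.0         # v = a + u  =>  dv/du = 1
dJ_da = dJ_dv * 1.0         # dv/da = 1
dJ_db = dJ_du * c           # u = b*c    =>  du/db = c
dJ_dc = dJ_du * b           # du/dc = b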
5. Logistic Regression Derivatives (how gradient descent is used in logistic regression)
$da = \frac{\partial L(a, y)}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a}$
$dz = \frac{\partial L}{\partial a}\times\frac{\partial a}{\partial z} = \left(-\frac{y}{a} + \frac{1-y}{1-a}\right)\times a(1-a) = a - y$
$dw_1 = \frac{\partial L}{\partial w_1} = x_1\times dz = x_1(a-y)$
$dw_2 = \frac{\partial L}{\partial w_2} = x_2\times dz = x_2(a-y)$
$db = dz$
$w_1 = w_1 - \alpha\times dw_1$
$w_2 = w_2 - \alpha\times dw_2$
$b = b - \alpha\times db$
Working through the lines above from top to bottom is one iteration of gradient descent for a single example with two features.
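A minimal runnable sketch of one such iteration for a single example with two features (all numbers below are placeholders I made up):
import math

x1, x2, y = 1.0, 2.0, 1.0       # features and label of one example (placeholders)
w1, w2, b = 0.01, 0.02, 0.0     # current parameters
alpha = 0.01                    # learning rate

# forward pass
z = w1 * x1 + w2 * x2 + b
a = 1.0 / (1.0 + math.exp(-z))  # sigmoid(z)

# backward pass
dz = a - y
dw1 = x1 * dz
dw2 = x2 * dz
db = dz

# parameter update
w1 = w1 - alpha * dw1
w2 = w2 - alpha * dw2
b = b - alpha * db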
6. Applying NumPy
Given a vector v, compute u (with the plain math library this would need an explicit for loop):
$v = [v_1 \dots v_n]^T$
$u = [e^{v_1} \dots e^{v_n}]^T$
# pseudocode: assumes v is an (n, 1) NumPy array; do not run as-is
import numpy as np
import math
# NumPy's vectorized functions compute the whole vector at once,
# avoiding the time cost of an explicit for loop
u = np.exp(v)
# equivalent for-loop version
u = np.zeros((n, 1))
for i in range(n):
    u[i] = math.exp(v[i])
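To make the time saving concrete, here is a rough timing comparison of my own (exact numbers depend on the machine; the vectorized call is usually dramatically faster):
import time
import math
import numpy as np

n = 1_000_000
v = np.random.rand(n, 1)

t0 = time.time()
u_vec = np.exp(v)                  # vectorized version
t1 = time.time()

u_loop = np.zeros((n, 1))
for i in range(n):                 # explicit for-loop version
    u_loop[i] = math.exp(v[i, 0])
t2 = time.time()

print("vectorized:", t1 - t0, "seconds")
print("for loop:  ", t2 - t1, "seconds")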
# similarly, when updating the w_i during gradient descent, one NumPy vector
# operation replaces the per-feature loop (dw1 += x1*dz, dw2 += x2*dz, ...)
dw = np.zeros((n_x, 1))
for i in range(m):
    dw += X[:, i:i+1] * dZ[0, i]   # accumulate over the m examples
dw /= m
# this optimizes away the for loop over features used earlier
# vectorized logistic-regression gradients (pseudocode!)
Z = np.dot(w.T, X) + b      # adding the scalar b relies on Python/NumPy broadcasting
A = sigmoid(Z)              # element-wise sigmoid
dZ = A - Y
dw = (1 / m) * np.dot(X, dZ.T)
db = (1 / m) * np.sum(dZ)
w = w - alpha * dw
b = b - alpha * db
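Putting the pieces together, here is a self-contained training-loop sketch on randomly generated toy data (everything here, including the data and hyperparameters, is my own illustration rather than the course's code):
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy data: m examples with n_x features; labels follow a simple known rule
n_x, m = 2, 200
np.random.seed(0)
X = np.random.randn(n_x, m)                     # shape (n_x, m)
Y = (X[0:1, :] + X[1:2, :] > 0).astype(float)   # shape (1, m)

w = np.zeros((n_x, 1))
b = 0.0
alpha = 0.1

for _ in range(1000):
    Z = np.dot(w.T, X) + b         # forward pass
    A = sigmoid(Z)
    dZ = A - Y                     # backward pass
    dw = np.dot(X, dZ.T) / m
    db = np.sum(dZ) / m
    w = w - alpha * dw             # gradient-descent update
    b = b - alpha * db

preds = (sigmoid(np.dot(w.T, X) + b) > 0.5).astype(float)
print("training accuracy:", np.mean(preds == Y))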