[Notes] Coursera Deep Learning Notes: The Typical Training Process of Logistic Regression

The typical steps for training an image classifier with logistic regression. These are notes from the first two weeks of Course 1 of Andrew Ng's Deep Learning specialization on Coursera.

1. Prepare the Data

1.1. Vectorization

Flatten each image's height, width, and RGB channels into a single column vector; the final X has shape (height*width*3, m).
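
A minimal sketch of the flattening step; the array name train_x_orig and the concrete sizes are assumptions for illustration, not fixed by the assignment:

import numpy as np

# Hypothetical batch of m=209 RGB images, shape (m, height, width, 3)
train_x_orig = np.random.randint(0, 256, size=(209, 64, 64, 3))

# Flatten each image into one column; X ends up (height*width*3, m)
X = train_x_orig.reshape(train_x_orig.shape[0], -1).T
print(X.shape)  # (12288, 209)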

1.2. Feature Normalization

For general data, use standardization: z_i = (x_i - mean) / std, where mean and std are the mean and standard deviation of X. The resulting features have zero mean and unit variance.

For images, Min-Max scaling can be applied directly: divide every feature by 255 (each pixel's R, G, B values range from 0 to 255), so the values land in [0, 1].
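
A sketch of both options, assuming X is the flattened matrix from 1.1 (cast to float first so the division does not truncate; add a small epsilon to sigma if any feature is constant):

# Standardization: per-feature mean and standard deviation
X = X.astype(np.float64)
mu = X.mean(axis=1, keepdims=True)
sigma = X.std(axis=1, keepdims=True)
X_std = (X - mu) / sigma

# Min-Max scaling for images: pixel values 0~255 -> [0, 1]
X_img = X / 255.0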

2. Initialize Parameters

In general, w and b are initialized randomly; the assignment sets them to 0 for simplicity.
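
A sketch of both initializations, where dim is the feature count of the flattened X from section 1.1:

dim = X.shape[0]  # height*width*3 features per example
w = np.zeros((dim, 1))  # zero init, as in the assignment
b = 0.0
# Random alternative: w = np.random.randn(dim, 1) * 0.01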

3. Gradient Descent

Train on the training set starting from w and b. The number of iterations and the learning rate need to be set beforehand.

The following steps run inside the main loop (repeated for the chosen number of iterations):

3.1. Compute the Cost Function

For $x^{(i)} \in X$, we have

$z^{(i)} = w^T x^{(i)} + b$

$a^{(i)} = \hat{y}^{(i)} = sigmoid(z^{(i)}) = \sigma(z^{(i)}) = \dfrac{1}{1 + e^{-z^{(i)}}}$

$\mathcal{L}(a^{(i)}, y^{(i)}) = -y^{(i)} \log(a^{(i)}) - (1-y^{(i)}) \log(1-a^{(i)})$

$A = (a^{(1)}, a^{(2)}, \ldots, a^{(m-1)}, a^{(m)}) = \sigma(w^T X + b) = \dfrac{1}{1+e^{-(w^T X + b)}}$

$J = \dfrac{1}{m} \sum^{m}_{i=1} \mathcal{L}(a^{(i)}, y^{(i)}) = -\dfrac{1}{m} \sum^{m}_{i=1} \left( y^{(i)} \log(a^{(i)}) + (1-y^{(i)}) \log(1-a^{(i)}) \right)$

def sigmoid(z):
    # Logistic function, applied elementwise
    return 1 / (1 + np.exp(-z))

# Activations for all m training examples at once
A = sigmoid(w.T.dot(X) + b)
# Cross-entropy cost averaged over the m examples
cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
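
One practical caveat the course notes do not mention: if A reaches exactly 0 or 1, np.log produces -inf. Clipping the activations before taking the log is a common guard (the 1e-15 bound is an arbitrary choice):

# Keep A strictly inside (0, 1) so np.log stays finite
A = np.clip(A, 1e-15, 1 - 1e-15)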

3.2. Compute the Gradients for Backpropagation

That is, compute the derivatives.

Note: here $L(a, y)$ stands for the $\mathcal{L}(a^{(i)}, y^{(i)})$ defined above, i.e.
$L(a, y) = -y \log(a) - (1-y) \log(1-a)$.
The superscripts are omitted in the formulas below.

$\dfrac{\partial L}{\partial a} = \dfrac{\partial L(a, y)}{\partial a} = -\dfrac{y}{a} + \dfrac{1-y}{1-a}$

$\dfrac{da}{dz} = \left(\dfrac{1}{1+e^{-z}}\right)' = \dfrac{e^{-z}}{(1+e^{-z})^2} = \dfrac{1}{1+e^{-z}} - \dfrac{1}{(1+e^{-z})^2} = a - a^2 = a \cdot (1-a)$

$\dfrac{\partial L}{\partial z} = \dfrac{\partial L}{\partial a} \dfrac{da}{dz} = \left(-\dfrac{y}{a} + \dfrac{1-y}{1-a}\right) \cdot a \cdot (1-a) = a - y$

$\dfrac{\partial L}{\partial w} = \dfrac{\partial L}{\partial z} \dfrac{\partial z}{\partial w} = (a-y) \cdot x$

$\dfrac{\partial L}{\partial b} = \dfrac{\partial L}{\partial z} \dfrac{\partial z}{\partial b} = a - y$

Averaging over the training set with $J = \dfrac{1}{m} \sum L(a, y)$, we finally obtain:

$\dfrac{\partial J}{\partial w} = \dfrac{1}{m} \sum^{m}_{i=1} \dfrac{\partial L}{\partial w} = \dfrac{1}{m} X(A-Y)^T$

$\dfrac{\partial J}{\partial b} = \dfrac{1}{m} \sum^{m}_{i=1} (a^{(i)} - y^{(i)})$

# Vectorized gradients from the formulas above
dw = X.dot((A - Y).T) / m
db = np.sum(A - Y) / m
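
The chain-rule result above can be sanity-checked numerically with centered finite differences. This sketch (the helper cost_fn and the step eps are illustrative, not part of the assignment) compares the analytic dw with an approximation; the loop is slow for large dim, so in practice check only a few coordinates:

def cost_fn(w, b):
    A = sigmoid(w.T.dot(X) + b)
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

eps = 1e-7
dw_approx = np.zeros_like(w)
for j in range(w.shape[0]):  # one coordinate at a time
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[j] += eps
    w_minus[j] -= eps
    dw_approx[j] = (cost_fn(w_plus, b) - cost_fn(w_minus, b)) / (2 * eps)

print(np.max(np.abs(dw - dw_approx)))  # should be tiny, around 1e-7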

3.3. Update w, b

w = w - learning_rate * dw
b = b - learning_rate * db
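
Putting 3.1 through 3.3 together, a minimal sketch of the whole loop; the function name optimize and the default hyperparameters are illustrative choices, not prescribed by the course:

def optimize(w, b, X, Y, num_iterations=2000, learning_rate=0.005):
    m = X.shape[1]
    for i in range(num_iterations):
        A = sigmoid(w.T.dot(X) + b)                 # 3.1 forward pass
        cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
        dw = X.dot((A - Y).T) / m                   # 3.2 gradients
        db = np.sum(A - Y) / m
        w = w - learning_rate * dw                  # 3.3 update
        b = b - learning_rate * db
    return w, b, cost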

4. Predict on the Test Set

Using the learned w and b, compute y_pred = sigmoid(wx + b) on the test set to obtain the predicted probabilities, then threshold them: for example, with a 0.7 threshold, predict 'yes' when the probability exceeds 0.7 and 'no' otherwise. The Coursera assignment uses a 0.5 threshold, which is just rounding, so the built-in round function works.
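
A sketch of the prediction step, assuming X_test has been flattened and scaled the same way as the training set:

# Predicted probabilities for every test example
A_test = sigmoid(w.T.dot(X_test) + b)
# 0.5 threshold, as in the assignment; equivalent to rounding
Y_pred = (A_test > 0.5).astype(int)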
