Andrew Ng Deep Learning: Complete Bilibili Lecture Notes (Weeks 1-2)

Preface

This blog summarizes the complete 192-lecture Andrew Ng Deep Learning course on Bilibili.
These notes were written to make the material easier to understand and review.
Section titles correspond to the video numbers.

2.1 Binary Classification (Cat)

Given a 64×64 image,

let $x$ be a column vector storing the image's RGB data:

$$x=\begin{bmatrix} 255\\ 255\\ \vdots\\ 255\\ 255\\ \vdots\\ 255\\ 255\\ \vdots \end{bmatrix}$$
The three blocks correspond to the red, green, and blue channels respectively; the column vector has dimension $n_x = 64 \times 64 \times 3 = 12288$.

Let $(x,y)$, where $x \in R^{n_x}$ ($x$ is an $n_x$-dimensional vector) and $y \in \{0,1\}$, be one training example.

Let $m$ be the total number of training examples.
Training set: $[(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\cdots,(x^{(m)},y^{(m)})]$
where $(x^{(i)},y^{(i)})$ denotes the $i$-th example.


$$X=\begin{bmatrix} | & & | \\ x^{(1)} &\cdots &x^{(m)} \\ | & & | \end{bmatrix} \tag{2.1.1}$$
Since each $x$ is a column vector, $X$ is an $(n_x, m)$ matrix: $n_x$ rows and $m$ columns.


$$Y = \begin{bmatrix} y^{(1)} & \cdots & y^{(m)} \end{bmatrix} \tag{2.1.2}$$
Here each $y^{(i)}$ is a single number (which is also, trivially, a column vector); $Y$ is a $(1,m)$ matrix, i.e. Y.shape = (1, m).

Tip: arranging numbers or column vectors as the columns of a matrix (lining them up in a row) is very useful.
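As a concrete illustration, here is a minimal sketch (with made-up random data standing in for real images) of flattening $m$ 64×64 RGB images into the column vectors $x^{(i)}$ and stacking them into the $(n_x, m)$ matrix $X$ of (2.1.1):

```python
import numpy as np

m = 5
# Stand-in images; a real dataset would be loaded from disk.
images = np.random.randint(0, 256, size=(m, 64, 64, 3))

# Transpose to channels-first so the flattened vector lists all red values,
# then all green, then all blue, matching the layout of x above.
X = np.stack([img.transpose(2, 0, 1).flatten() for img in images], axis=1)
Y = np.random.randint(0, 2, size=(1, m))  # labels in {0, 1}

print(X.shape)  # (12288, 5) -> n_x = 64*64*3 rows, m columns
print(Y.shape)  # (1, 5)
```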

2.2 Logistic Regression

Given $x$, compute $\hat{y}=P(y=1 \mid x)$, where $0 \leq \hat{y} \leq 1$:


$$\hat{y}= \sigma(w^Tx+b)$$
Here $\sigma$ is the sigmoid activation function; see the separate blog post on the sigmoid function.
Parameters: $w \in R^{n_x}$ ($w$ is an $n_x$-dimensional vector) and $b \in R$.

Loss function

For the training set $[(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\cdots,(x^{(m)},y^{(m)})]$,

define the loss function: $L(\hat{y},y)=-(y\log \hat{y}+(1-y)\log(1-\hat{y}))$
Why this loss function makes sense: when $y=1$, $L=-\log \hat{y}$, so minimizing $L$ pushes $\hat{y}$ toward 1; when $y=0$, $L=-\log(1-\hat{y})$, pushing $\hat{y}$ toward 0.

Define the cost function: $J(w,b)=\frac{1}{m} \sum_{i=1}^m L(\hat{y}^{(i)},y^{(i)})$
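A minimal numpy sketch of these definitions (the function names are mine, not from the course); `cost` averages the per-example losses exactly as in $J(w,b)$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(y_hat, y):
    # L(y_hat, y) = -(y*log(y_hat) + (1-y)*log(1-y_hat))
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def cost(w, b, X, Y):
    # X: (n_x, m), Y: (1, m); J(w, b) = (1/m) * sum of per-example losses
    y_hat = sigmoid(np.dot(w.T, X) + b)
    return np.mean(loss(y_hat, Y))

# Quick check with made-up data: with w = 0, b = 0 every y_hat is 0.5,
# so J = log(2) ≈ 0.693 regardless of the labels.
nx, m = 3, 4
w, b = np.zeros((nx, 1)), 0.0
X = np.random.randn(nx, m)
Y = np.random.randint(0, 2, size=(1, m))
print(cost(w, b, X, Y))
```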

2.4 Gradient Descent

$$w:=w - \alpha \frac{dJ(w)}{dw}$$
Why iterating this update approaches the optimum: the derivative points in the direction in which $J$ increases, so stepping against it decreases $J$ at each iteration.
With two parameters, the same update written with partial derivatives:
$$w:=w - \alpha \frac{\partial J(w,b)}{\partial w}$$
The choice of derivative notation ($d$ vs. $\partial$) does not change the meaning.
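A toy sketch of the update rule (the function $J(w)=(w-3)^2$ is an assumed example, not from the course; its minimum is at $w=3$):

```python
# Repeatedly step against the derivative: w := w - alpha * dJ/dw.
w = 0.0
alpha = 0.1
for _ in range(100):
    dw = 2 * (w - 3)    # derivative of J(w) = (w - 3)^2
    w = w - alpha * dw  # step downhill, so J decreases
print(w)                # converges to ~3.0, the minimizer
```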

2.9 Logistic Regression Gradient Descent (Single Example)

Given a single example $(x, y)$,
assume $w$ and $x$ are 2-dimensional vectors, i.e. $n_x=2$:
$$x= \begin{bmatrix} x_1\\ x_2 \end{bmatrix}\quad w= \begin{bmatrix} w_1\\ w_2 \end{bmatrix}$$
where $x_1, x_2, w_1, w_2$ are each a single number.


$$\begin{aligned} &z=w^Tx+b \\ &\hat{y}=a=\sigma(z) \\ &z=w_1x_1+w_2x_2+b \\ &L(a,y) \\ &da=\frac{dL(a,y)}{da} \\ &dz=\frac{dL}{dz}=\frac{dL(a,y)}{da} \cdot \frac{da}{dz}=a-y \\ &dw_1=\frac{\partial L}{\partial w_1}=x_1 \cdot dz \\ &dw_2=x_2 \cdot dz \\ &db=dz \end{aligned} \tag{2.9.1}$$
These follow from the chain rule; in particular $\frac{da}{dz}=a(1-a)$ (see the derivative of the sigmoid function).
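A direct sketch of (2.9.1) for one example with $n_x=2$ (the numbers are made up):

```python
import numpy as np

x1, x2, y = 1.0, 2.0, 1.0      # one example
w1, w2, b = 0.1, -0.2, 0.0     # current parameters

z = w1 * x1 + w2 * x2 + b
a = 1.0 / (1.0 + np.exp(-z))   # a = sigma(z)

dz = a - y                     # dL/dz, using da/dz = a*(1-a)
dw1 = x1 * dz                  # dL/dw1
dw2 = x2 * dz                  # dL/dw2
db = dz                        # dL/db
```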

2.10 Logistic Regression on m Examples

Suppose there are $m$ training examples; $w$ and $x$ are still 2-dimensional vectors (see section 2.9).
Example $(x^{(i)},y^{(i)})$ has its corresponding $z^{(i)},\ a^{(i)},\ x_1^{(i)},\ x_2^{(i)},\ dw_1^{(i)},\ dw_2^{(i)}$.
The parameters $w_1,\ w_2,\ b$ are held fixed within a single gradient-descent step.

$$J(w,b)= \frac{1}{m} \sum_{i=1}^m L(a^{(i)},y^{(i)})$$

$$\begin{aligned} dw_1 &= \frac{1}{m} \sum_{i=1}^m \frac{\partial L(a^{(i)},y^{(i)})}{\partial w_1} = \frac{1}{m} \sum_{i=1}^m dw_1^{(i)} = \frac{1}{m} \sum_{i=1}^{m}x_1^{(i)} \cdot dz^{(i)} \\ dw_2 &= \frac{1}{m} \sum_{i=1}^m \frac{\partial L(a^{(i)},y^{(i)})}{\partial w_2} = \frac{1}{m} \sum_{i=1}^m dw_2^{(i)} = \frac{1}{m} \sum_{i=1}^{m}x_2^{(i)} \cdot dz^{(i)} \end{aligned} \tag{2.10.1}$$
For the derivation, see equation (2.9.1).


$$\begin{aligned} db &= \frac{1}{m} \sum_{i=1}^m \frac{\partial L(a^{(i)},y^{(i)})}{\partial b} = \frac{1}{m} \sum_{i=1}^m dz^{(i)} = \frac{1}{m} \sum_{i=1}^m \left(a^{(i)} - y^{(i)}\right) \end{aligned} \tag{2.10.2}$$
For the derivation, see equation (2.9.1).

We can compute $dw_1,\ dw_2,\ db$ with a single for loop (see the sketch below),
and then apply this gradient-descent step's update to $w_1,\ w_2,\ b$:
$$\begin{aligned} &w_1= w_1- \alpha \cdot dw_1 \\ &w_2=w_2 - \alpha \cdot dw_2 \\ &b=b- \alpha \cdot db \end{aligned}$$
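A sketch of one such step, following (2.10.1) and (2.10.2); the function name and the assumption that $X$ is $(2, m)$ and $Y$ is $(1, m)$ are mine:

```python
import numpy as np

def one_step(w1, w2, b, X, Y, alpha):
    """One gradient-descent step over m examples, with a single for loop."""
    m = X.shape[1]
    J, dw1, dw2, db = 0.0, 0.0, 0.0, 0.0
    for i in range(m):
        z = w1 * X[0, i] + w2 * X[1, i] + b            # z^(i)
        a = 1.0 / (1.0 + np.exp(-z))                   # a^(i) = sigma(z^(i))
        J += -(Y[0, i] * np.log(a) + (1 - Y[0, i]) * np.log(1 - a))
        dz = a - Y[0, i]                               # dz^(i) = a^(i) - y^(i)
        dw1 += X[0, i] * dz
        dw2 += X[1, i] * dz
        db += dz
    J, dw1, dw2, db = J / m, dw1 / m, dw2 / m, db / m  # average over m
    # parameter update for this step; J is returned for monitoring
    return w1 - alpha * dw1, w2 - alpha * dw2, b - alpha * db, J
```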


In the sections above, the dimension of $w$ and $x$ is $n_x=2$; the next section generalizes to $n_x$ dimensions and uses vectorized operations to compute $dw_1,\ dw_2,\ db$.

2.13 Vectorizing Logistic Regression


$$z=w^Tx+b \quad z,b\in R \text{ (scalars)} \qquad w=\begin{bmatrix} w_1\\ \vdots \\ w_{n_x} \end{bmatrix} \quad x=\begin{bmatrix} x_1\\ \vdots \\ x_{n_x} \end{bmatrix} \quad x,w\in R^{n_x}$$
In Python, $z$ takes one line: `z = np.dot(w, x) + b` (this assumes $w$ and $x$ are 1-D numpy arrays; for $(n_x, 1)$ column vectors use `np.dot(w.T, x)`).

$$X=\begin{bmatrix} | & & | \\ x^{(1)}&\cdots& x^{(m)} \\ | & & | \end{bmatrix}$$
Each $x^{(i)}$ in $X$ is an $n_x$-dimensional column vector; see equation (2.1.1).

$$Z=\begin{bmatrix} z^{(1)},z^{(2)},\cdots,z^{(m)} \end{bmatrix}=w^TX+[b,b,\cdots,b] \qquad z^{(i)} \in R,\ Z \text{ is a } (1,m) \text{ matrix}$$
In Python, $Z$ takes one line: `Z = np.dot(w.T, X) + b`
where the scalar b is broadcast to [b, b, …, b], a (1, m) matrix.


$$A=\begin{bmatrix} a^{(1)},a^{(2)},\cdots,a^{(m)} \end{bmatrix}=\sigma(Z) \qquad a^{(i)} \in R,\ A \text{ is a } (1,m) \text{ matrix}$$
In Python, $A$ can be computed with a single function call, as sketched below.
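A minimal sketch of that function (the name `sigmoid` is mine): `np.exp` broadcasts over the array, so $\sigma$ is applied elementwise to all of $Z$ at once:

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))  # elementwise over the whole matrix

Z = np.array([[0.0, 1.5, -2.0]])     # made-up (1, 3) example
A = sigmoid(Z)                       # same shape; every entry in (0, 1)
```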


$$dZ=[dz^{(1)},dz^{(2)},\cdots,dz^{(m)}] \qquad \text{where } dz^{(i)}=a^{(i)}-y^{(i)},\ \text{see (2.9.1)}$$

$$Y=[y^{(1)},y^{(2)},\cdots,y^{(m)}] \qquad \text{each } y^{(i)} \text{ is a single number}$$

Computing the gradients with vectorization

$$dZ=A-Y=[a^{(1)}-y^{(1)},\cdots,a^{(m)}-y^{(m)}]$$
$$db = \frac{1}{m} \sum_{i=1}^m dz^{(i)} = \frac{1}{m} \sum_{i=1}^m \left(a^{(i)} - y^{(i)}\right) = \frac{1}{m}\,\mathtt{np.sum}(dZ)$$


$$dW=\begin{bmatrix} dw_1\\dw_2\\ \vdots \\ dw_{n_x} \end{bmatrix}=\begin{bmatrix} \frac{1}{m} \sum_{i=1}^{m}x_1^{(i)} \cdot dz^{(i)} \\ \frac{1}{m} \sum_{i=1}^{m}x_2^{(i)} \cdot dz^{(i)} \\ \vdots \\ \frac{1}{m} \sum_{i=1}^{m}x_{n_x}^{(i)} \cdot dz^{(i)} \end{bmatrix} \qquad \text{where each } dw_i \in R,\ \text{see (2.10.1)}$$
$$\begin{aligned} dW &=\frac{1}{m}X \cdot dZ^T =\frac{1}{m}\begin{bmatrix} | & & | \\ x^{(1)}&\cdots& x^{(m)} \\ | & & | \end{bmatrix} \cdot \begin{bmatrix} dz^{(1)}\\dz^{(2)}\\ \vdots\\ dz^{(m)} \end{bmatrix} \qquad \text{shapes: } (n_x,m)\cdot(m,1)=(n_x,1) \\ &=\begin{bmatrix} \frac{1}{m} \sum_{i=1}^{m}x_1^{(i)} \cdot dz^{(i)} \\ \frac{1}{m} \sum_{i=1}^{m}x_2^{(i)} \cdot dz^{(i)} \\ \vdots \\ \frac{1}{m} \sum_{i=1}^{m}x_{n_x}^{(i)} \cdot dz^{(i)} \end{bmatrix}=\begin{bmatrix} dw_1\\dw_2\\ \vdots \\ dw_{n_x} \end{bmatrix} \end{aligned}$$
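Putting the whole section together, a sketch of fully vectorized gradient descent (the shapes and data are made up; `np.random.default_rng` just supplies toy inputs):

```python
import numpy as np

nx, m, alpha = 4, 8, 0.05
rng = np.random.default_rng(0)
X = rng.normal(size=(nx, m))            # (nx, m) design matrix
Y = rng.integers(0, 2, size=(1, m))     # (1, m) labels in {0, 1}
w, b = np.zeros((nx, 1)), 0.0

for _ in range(1000):                   # each iteration: one gradient step
    Z = np.dot(w.T, X) + b              # (1, m); b is broadcast
    A = 1.0 / (1.0 + np.exp(-Z))        # elementwise sigmoid
    dZ = A - Y                          # (1, m)
    dW = np.dot(X, dZ.T) / m            # (nx, 1), i.e. dW = (1/m) X dZ^T
    db = np.sum(dZ) / m                 # scalar
    w = w - alpha * dW                  # update, as in section 2.10
    b = b - alpha * db
```

Note this replaces both inner for loops of section 2.10 with matrix operations; only the outer loop over gradient-descent iterations remains.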
