01. Neural Networks and Deep Learning
Week 2: Neural Network Basics
2.1 Binary Classification
- How images are stored in a computer: three channels (red, green, blue)
- $X = [x^{(1)}, x^{(2)}, \cdots, x^{(m)}]$, where each column vector is one training example.
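In code, the training set is typically stored with one example per column, matching the definition above. A minimal sketch (the image size and the value of `m` here are illustrative assumptions, not from the notes):

```python
import numpy as np

m = 100                                    # number of training examples (illustrative)
n_x = 64 * 64 * 3                          # a 64x64 RGB image unrolled into one vector
X = np.random.randn(n_x, m)                # column i is the i-th example x^(i)
Y = np.random.randint(0, 2, size=(1, m))   # binary labels y^(i) in {0, 1}
print(X.shape, Y.shape)                    # (12288, 100) (1, 100)
```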
2.2 Logistic Regression
- Given $x$, the goal is to compute $\hat{y} = P(y = 1 \mid x)$.
- Sigmoid function:
$\sigma(z) = \frac{1}{1 + e^{-z}}$
- Logistic regression:
$\hat{y} = \sigma(w^T x + b)$
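A minimal NumPy sketch of this prediction (the helper names are mine, not from the notes; `w` and `x` are column vectors of shape `(n_x, 1)` and `b` is a scalar):

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + e^{-z}), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, b, x):
    """y_hat = sigma(w^T x + b), the estimated P(y = 1 | x)."""
    return sigmoid(np.dot(w.T, x) + b)
```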
2.3 Logistic Regression Cost Function
- Squared-error loss (rarely used for logistic regression because it makes the optimization problem non-convex):
$L(\hat{y}, y) = \frac{1}{2}(\hat{y} - y)^2$
- Cross-entropy loss:
$L(\hat{y}, y) = -(y \log \hat{y} + (1 - y) \log(1 - \hat{y}))$
- Cost function:
$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)})$
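The cost over the whole training set can be computed in one line. A sketch, assuming `A` holds the predictions $\hat{y}^{(i)}$ and `Y` the labels, both of shape `(1, m)`:

```python
import numpy as np

def cost(A, Y):
    """J(w, b): average cross-entropy loss over the m examples."""
    m = Y.shape[1]
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
```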
2.4 Gradient Descent
- Update rule:
$w := w - \alpha \frac{\partial J(w, b)}{\partial w}$
$b := b - \alpha \frac{\partial J(w, b)}{\partial b}$
2.5 Derivatives
2.6 More Derivative Examples
2.7 Computation Graph
- Example:
$J(a, b, c) = 3(a + bc)$
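Introducing intermediate variables makes the graph's nodes explicit:

$$u = bc, \qquad v = a + u, \qquad J = 3v$$

For instance, with $a = 5$, $b = 3$, $c = 2$: $u = 6$, $v = 11$, $J = 33$.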
2.8 Derivatives with a Computation Graph
- Computation graph
- Chain rule
- Common variable naming in code: `dvar` denotes the derivative of the final output variable of interest with respect to `var`.
2.9 Gradient Descent for Logistic Regression
- Computation graph
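For a single example with two features, running the chain rule backward through this graph gives, in the `dvar` notation above:

$$da = -\frac{y}{a} + \frac{1-y}{1-a}, \qquad dz = a - y, \qquad dw_1 = x_1\,dz, \quad dw_2 = x_2\,dz, \quad db = dz$$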
2.10 Gradient Descent on m Examples
1. Algorithm (explicit loops, for a two-feature input):
$J = 0;\; dw_1 = 0;\; dw_2 = 0;\; db = 0$
$\text{for } i = 1 \text{ to } m:$
$\qquad z^{(i)} = w^T x^{(i)} + b$
$\qquad a^{(i)} = \sigma(z^{(i)})$
$\qquad J \mathrel{+}= -[y^{(i)} \log a^{(i)} + (1 - y^{(i)}) \log(1 - a^{(i)})]$
$\qquad dz^{(i)} = a^{(i)} - y^{(i)}$
$\qquad dw_1 \mathrel{+}= x_1^{(i)} dz^{(i)}$
$\qquad dw_2 \mathrel{+}= x_2^{(i)} dz^{(i)}$
$\qquad db \mathrel{+}= dz^{(i)}$
$J \mathrel{/}= m;\quad dw_1 \mathrel{/}= m;\quad dw_2 \mathrel{/}= m;\quad db \mathrel{/}= m$
2. Problem: the two nested for-loops (over training examples and over features) make this inefficient.
3. Solution: vectorization; a runnable sketch of the loop version follows for comparison.
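A runnable Python version of the pseudocode above (a sketch; the data layout of `X` as `(n_x, m)` with one example per column, `Y` as a length-`m` vector, and the generalization from two features to an `n_x`-vector `dw` are my assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradients_with_loops(X, Y, w, b):
    """Explicit-loop gradients. X: (n_x, m), Y: (m,), w: (n_x,), b: scalar."""
    n_x, m = X.shape
    J, db = 0.0, 0.0
    dw = np.zeros(n_x)
    for i in range(m):                     # outer loop over examples
        z_i = np.dot(w, X[:, i]) + b       # z^(i) = w^T x^(i) + b
        a_i = sigmoid(z_i)
        J += -(Y[i] * np.log(a_i) + (1 - Y[i]) * np.log(1 - a_i))
        dz_i = a_i - Y[i]                  # dz^(i) = a^(i) - y^(i)
        for j in range(n_x):               # inner loop over features
            dw[j] += X[j, i] * dz_i
        db += dz_i
    return J / m, dw / m, db / m
```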
2.11 Vectorization
- What is vectorization? Computing
$z = w^T x + b$
- Non-vectorized code:
`z = 0`
`for i in range(n_x): z += w[i] * x[i]`
`z += b`
- Vectorized code:
`z = np.dot(w, x) + b`
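A quick timing comparison illustrates the speedup (a sketch; the array size is arbitrary and the timings vary by machine):

```python
import time
import numpy as np

n = 1_000_000
w = np.random.rand(n)
x = np.random.rand(n)

tic = time.time()
z = 0.0
for i in range(n):              # explicit for-loop
    z += w[i] * x[i]
print(f"loop:       {1000 * (time.time() - tic):.1f} ms")

tic = time.time()
z = np.dot(w, x)                # vectorized dot product
print(f"vectorized: {1000 * (time.time() - tic):.1f} ms")
```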
2.12 More Vectorization Examples
- Neural network programming guideline:
(1) Whenever possible, avoid explicit for-loops.
- Commonly used NumPy functions: `np.log`, `np.abs`, `np.maximum`, `**` (element-wise power); see the snippet after this list.
- Vectorizing logistic regression's gradient descent
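The element-wise built-ins mentioned above replace whole loops at once. A small illustration (the example vector is arbitrary):

```python
import numpy as np

v = np.array([1.0, -2.0, 3.0])
print(np.exp(v))          # element-wise e^(v_i)
print(np.log(np.abs(v)))  # element-wise log|v_i|
print(np.maximum(v, 0))   # element-wise max(v_i, 0), i.e. ReLU
print(v ** 2)             # element-wise power
```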
2.13 Vectorizing Logistic Regression
- $Z = w^T X + b$
- `Z = np.dot(w.T, X) + b`
- $A = \sigma(Z)$
2.14 Vectorizing Logistic Regression's Gradient Output
- $dZ = [dz^{(1)}, dz^{(2)}, \cdots, dz^{(m)}]$
- $dZ = A - Y$
- $db = \frac{1}{m}\,\texttt{np.sum}(dZ)$
- $dw = \frac{1}{m} X\, dZ^T$
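Putting 2.13 and 2.14 together, one fully vectorized gradient-descent step looks like this (a sketch; `alpha` is a learning rate I chose for illustration, and the shapes `X: (n_x, m)`, `Y: (1, m)`, `w: (n_x, 1)` follow the conventions above):

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def gradient_step(X, Y, w, b, alpha=0.01):
    """One update of (w, b) with no explicit loop over examples."""
    m = X.shape[1]
    Z = np.dot(w.T, X) + b      # (1, m): forward pass for all examples at once
    A = sigmoid(Z)              # (1, m)
    dZ = A - Y                  # (1, m)
    dw = np.dot(X, dZ.T) / m    # (n_x, 1)
    db = np.sum(dZ) / m         # scalar
    return w - alpha * dw, b - alpha * db
```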
2.15 Broadcasting in Python
- Broadcasting makes Python code run more efficiently.
- Example (computing the percentage of calories contributed by each nutrient in each food):
- Code:
`cal = A.sum(axis=0)`
`percentage = 100 * A / cal.reshape(1, 4)`
Adding the `reshape` makes the matrix dimensions explicit and correct (a runnable sketch follows the rules below).
- Broadcasting examples
- General broadcasting rules:
$(m, n) \;\{+, -, *, /\}\; (1, n) \rightarrow (m, n)$
$(m, n) \;\{+, -, *, /\}\; (m, 1) \rightarrow (m, n)$
$(m, 1) + (1, 1) \rightarrow (m, 1)$
$(1, n) + (1, 1) \rightarrow (1, n)$
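A runnable version of the food example above, which uses the first rule, $(m, n)$ op $(1, n)$ (the calorie values are illustrative stand-ins for the lecture's carbs/protein/fat table):

```python
import numpy as np

# Calories from carbs, protein, fat (rows) in four foods (columns)
A = np.array([[56.0,   0.0,  4.4, 68.0],
              [ 1.2, 104.0, 52.0,  8.0],
              [ 1.8, 135.0, 99.0,  0.9]])

cal = A.sum(axis=0)                       # shape (4,): total calories per food
percentage = 100 * A / cal.reshape(1, 4)  # (3,4) / (1,4) broadcasts across rows
print(percentage)
```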
2.16 A Note on Python/NumPy Vectors
- Do not use rank-1 arrays of shape $(n,)$.
Example: replace `a = np.random.randn(5)` with `a = np.random.randn(5, 1)`.
- Use assertions.
Example: `assert(a.shape == (5, 1))`
- Reshape matrices when needed.
Example: `a = a.reshape((5, 1))`
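A short demonstration of why rank-1 arrays are avoided (a sketch):

```python
import numpy as np

a = np.random.randn(5)         # rank-1 array, shape (5,)
print(a.T.shape)               # still (5,): transposing does nothing
print(np.dot(a, a))            # a scalar, not an outer product

a = np.random.randn(5, 1)      # proper column vector
assert a.shape == (5, 1)       # fail fast if a shape is wrong
print(np.dot(a, a.T).shape)    # (5, 5): the expected outer product
```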