Dataset format
In machine learning, a dataset is typically laid out as follows.
The features and label of the $i$-th sample are written as:
$$
x^i = (x_1^i, x_2^i, x_3^i, \ldots, x_d^i)^T \in \mathbb{R}^d \\
y^i \in \mathbb{R}
$$
The complete dataset can be written as:
$$
X = [x^1, x^2, \ldots, x^n]
= \begin{bmatrix}
x^1_1 & x^2_1 & \ldots & x^n_1 \\
x^1_2 & x^2_2 & \ldots & x^n_2 \\
\vdots & \vdots & \ddots & \vdots \\
x^1_d & x^2_d & \ldots & x^n_d
\end{bmatrix} \in \mathbb{R}^{d \times n} \\
y = [y^1, y^2, \ldots, y^n] \in \mathbb{R}^n
$$
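As a rough illustration of this layout, the sketch below builds $X$ and $y$ from a made-up toy dataset using plain Python lists. The sample values and the variable names (`samples`, `labels`) are assumptions for the example only; note that the code indexes from 0 while the text indexes from 1.

```python
# Hypothetical toy data: n = 3 samples, each with d = 2 features.
samples = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # samples[i] plays the role of x^i
labels = [0.0, 1.0, 1.0]                        # labels[i] plays the role of y^i

d = len(samples[0])  # number of features
n = len(samples)     # number of samples

# X has d rows and n columns, so column i holds the features of sample i.
X = [[samples[i][j] for i in range(n)] for j in range(d)]
y = list(labels)

# X is now [[1.0, 3.0, 5.0], [2.0, 4.0, 6.0]]
```

Storing one sample per column matches the $d \times n$ shape used above; many libraries instead store one sample per row, so the transpose may be needed when moving between conventions.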
Binary classification with linear regression + sigmoid
For a single sample:
$$
z = w^T x + b \\
= w_1 x_1 + w_2 x_2 + \ldots + w_d x_d + b
$$
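The linear score for one sample could be computed as in the following sketch. The function name `linear_score` and the numeric values are illustrative assumptions, not part of the original text.

```python
def linear_score(w, x, b):
    """Return z = w^T x + b for a single sample x (w and x are equal-length lists)."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) + b

# Illustrative values with d = 2.
w = [0.5, -1.0]
x = [2.0, 1.0]
b = 0.25
z = linear_score(w, x, b)  # 0.5*2.0 + (-1.0)*1.0 + 0.25
```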
The sigmoid function maps the output into the range between 0 and 1, which makes binary classification possible. The sigmoid function is defined as:
$$
\sigma(z) = \frac{1}{1+e^{-z}} = \frac{e^z}{1+e^z}
$$
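The two equivalent forms of the sigmoid are useful in code as well. The sketch below is one possible implementation, not necessarily the one the author had in mind: it picks the form whose exponent is non-positive, because `math.exp` raises `OverflowError` for large positive arguments.

```python
import math

def sigmoid(z):
    """Sigmoid of a scalar z, choosing the algebraic form that cannot overflow."""
    if z >= 0:
        # 1 / (1 + e^{-z}): the exponent -z is <= 0 here.
        return 1.0 / (1.0 + math.exp(-z))
    # e^{z} / (1 + e^{z}): the exponent z is < 0 here.
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

Both branches return the same value mathematically; the split only affects numerical behavior for inputs of large magnitude.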
Cross-entropy is used as the loss function. For binary classification, the loss of a single sample is:
$$
g(z^i) = -y^i \log(\sigma(z^i)) - (1-y^i) \log(1-\sigma(z^i))
$$
The loss over the whole dataset can then be written as:
$$
L = \frac{1}{n} \sum_{i=1}^{n} g(z^i) = \frac{1}{n} \sum_{i=1}^{n} \left( -y^i \log(\sigma(z^i)) - (1-y^i) \log(1-\sigma(z^i)) \right)
$$
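The per-sample loss $g$ and the averaged loss $L$ translate fairly directly into code. The following is a minimal sketch; the function names `sample_loss` and `mean_loss` are assumptions for illustration. It uses the plain sigmoid formula and does no clipping, so it assumes moderate values of $z$ where $\sigma(z)$ is not rounded to exactly 0 or 1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sample_loss(z, y):
    """Binary cross-entropy g for one sample with score z and label y in {0, 1}."""
    s = sigmoid(z)
    return -y * math.log(s) - (1.0 - y) * math.log(1.0 - s)

def mean_loss(zs, ys):
    """Average of the per-sample losses, i.e. L = (1/n) * sum of g(z^i)."""
    return sum(sample_loss(z, y) for z, y in zip(zs, ys)) / len(zs)
```

For example, a score of $z = 0$ gives $\sigma = 0.5$, so the loss is $\log 2$ regardless of the label.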
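Differentiation with the chain rule
Chain-rule expression
Computing the derivatives with respect to $w$ and $b$ requires the chain rule.
The formulas are as follows: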
$$
\frac{\partial L}{\partial w_i} = \frac{\partial L}{\partial g} \frac{\partial g}{\partial \sigma} \frac{\partial \sigma}{\partial z} \frac{\partial z}{\partial w_i} \\
\frac{\partial L}{\partial b} = \frac{\partial L}{\partial g} \frac{\partial g}{\partial \sigma} \frac{\partial \sigma}{\partial z} \frac{\partial z}{\partial b}
$$
Solving for $\frac{\partial L}{\partial g}$
The derivation considers a single sample ($n = 1$), so $L$ as a function of $g$ reduces to:
$$
L = \frac{1}{n} \sum_{i=1}^{n} g = g
$$
Therefore:
$$
\frac{\partial L}{\partial g} = 1
$$
Solving for $\frac{\partial g}{\partial \sigma}$
As a function of $\sigma$, $g$ can be written as:
$$
g = -y \log(\sigma) - (1-y) \log(1-\sigma)
$$
Differentiating gives:
$$
\frac{\partial g}{\partial \sigma} = \frac{\partial \left( -y \log(\sigma) - (1-y) \log(1-\sigma) \right)}{\partial \sigma} \\
= -y \frac{\partial \log(\sigma)}{\partial \sigma} - (1-y) \frac{\partial \log(1-\sigma)}{\partial \sigma} \\
= -\frac{y}{\sigma} + \frac{1-y}{1-\sigma}
$$
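One way to sanity-check this derivative is to compare it with a central finite difference. The sketch below does that at an arbitrarily chosen point; the values of `sigma`, `y`, and the step `h` are assumptions picked for illustration.

```python
import math

def g(sigma, y):
    """Cross-entropy written directly as a function of sigma."""
    return -y * math.log(sigma) - (1.0 - y) * math.log(1.0 - sigma)

def dg_dsigma(sigma, y):
    """Analytic derivative: -y/sigma + (1 - y)/(1 - sigma)."""
    return -y / sigma + (1.0 - y) / (1.0 - sigma)

sigma, y, h = 0.3, 1.0, 1e-6
numeric = (g(sigma + h, y) - g(sigma - h, y)) / (2 * h)  # central difference
analytic = dg_dsigma(sigma, y)
# The two values should agree up to finite-difference and rounding error.
```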
Solving for $\frac{\partial \sigma}{\partial z}$
As a function of $z$, $\sigma$ can be written as:
$$
\sigma(z) = \frac{1}{1+e^{-z}} = \frac{e^z}{1+e^z}
$$
Then, using $1 - \sigma = \frac{e^{-z}}{1+e^{-z}}$ in the last step:
$$
\frac{\partial \sigma}{\partial z} = \frac{\partial \left( \frac{1}{1+e^{-z}} \right)}{\partial z} \\
= -\frac{1}{(1+e^{-z})^2} \times e^{-z} \times (-1) \\
= \frac{e^{-z}}{(1+e^{-z})^2} \\
= \frac{1}{1+e^{-z}} \times \frac{e^{-z}}{1+e^{-z}} = \sigma(1-\sigma)
$$
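The identity $\frac{\partial \sigma}{\partial z} = \sigma(1-\sigma)$ can also be checked numerically. The sketch below compares it with a central finite difference at one arbitrarily chosen point; the value of `z` and the step `h` are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z, h = 0.7, 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)  # central difference
analytic = sigmoid(z) * (1.0 - sigmoid(z))             # sigma * (1 - sigma)
# The two values should agree up to finite-difference and rounding error.
```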
Solving for $\frac{\partial z}{\partial w}$
As a function of $w$, $z$ is:
$$
z = w^T x + b
$$
which gives:
$$
\frac{\partial z}{\partial w_i} = x_i, \quad i = 1, 2, \ldots, d
$$
Solving for $\frac{\partial z}{\partial b}$
As a function of $b$, $z$ is:
$$
z = w^T x + b
$$
which gives:
$$
\frac{\partial z}{\partial b} = 1
$$
Final expressions
Gradient expressions
$$
\frac{\partial L}{\partial w_i} = \frac{\partial L}{\partial g} \frac{\partial g}{\partial \sigma} \frac{\partial \sigma}{\partial z} \frac{\partial z}{\partial w_i} \\
= 1 \times \left( -\frac{y}{\sigma} + \frac{1-y}{1-\sigma} \right) \times \sigma(1-\sigma) \times x_i \\
= x_i \left( -y(1-\sigma) + \sigma(1-y) \right) \\
= x_i (\sigma - y) \\
\frac{\partial L}{\partial b} = \frac{\partial L}{\partial g} \frac{\partial g}{\partial \sigma} \frac{\partial \sigma}{\partial z} \frac{\partial z}{\partial b} \\
= 1 \times \left( -\frac{y}{\sigma} + \frac{1-y}{1-\sigma} \right) \times \sigma(1-\sigma) \times 1 \\
= -y(1-\sigma) + \sigma(1-y) \\
= \sigma - y
$$
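A quick gradient check can confirm the closed-form results $x_i(\sigma - y)$ and $\sigma - y$. The sketch below compares them with central finite differences of the single-sample loss; the helper names (`loss`, `gradients`) and all numeric values are assumptions made for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Single-sample cross-entropy loss as a function of the parameters."""
    z = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    s = sigmoid(z)
    return -y * math.log(s) - (1.0 - y) * math.log(1.0 - s)

def gradients(w, b, x, y):
    """Closed-form gradients: dL/dw_j = x_j * (sigma - y), dL/db = sigma - y."""
    z = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    err = sigmoid(z) - y
    return [x_j * err for x_j in x], err

w, b, x, y, h = [0.5, -1.0], 0.25, [2.0, 1.0], 1.0, 1e-6
grad_w, grad_b = gradients(w, b, x, y)

# Central finite differences, perturbing one parameter at a time.
numeric_w = []
for j in range(len(w)):
    w_plus = list(w)
    w_minus = list(w)
    w_plus[j] += h
    w_minus[j] -= h
    numeric_w.append((loss(w_plus, b, x, y) - loss(w_minus, b, x, y)) / (2 * h))
numeric_b = (loss(w, b + h, x, y) - loss(w, b - h, x, y)) / (2 * h)
```

If the derivation is right, `numeric_w` and `grad_w` (and likewise `numeric_b` and `grad_b`) should match to several decimal places.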
Gradient update expressions
Update expression for $w$
Since
$$
\frac{\partial L}{\partial w_i} = x_i (\sigma - y)
$$
the gradient update, with learning rate $\eta$, is:
$$
w_i = w_i - \eta \frac{\partial L}{\partial w_i} \\
= w_i - \eta x_i (\sigma - y)
$$
Stacking all $d$ components:
$$
\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_d \end{bmatrix}
= \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_d \end{bmatrix}
- \eta (\sigma - y) \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{bmatrix}
$$
That is, in vector form:
$$
w = w - \eta (\sigma - y) x
$$
Update expression for $b$
Since
$$
\frac{\partial L}{\partial b} = \sigma - y
$$
the gradient update is:
$$
b = b - \eta \frac{\partial L}{\partial b} \\
= b - \eta (\sigma - y)
$$
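Putting the two update rules together, a single-sample (stochastic) gradient descent loop might look like the sketch below. The function name `sgd_step`, the toy data, the learning rate `eta=0.5`, and the 200 epochs are all assumptions chosen for illustration rather than values from the text.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(w, b, x, y, eta):
    """One update on one sample: w <- w - eta*(sigma - y)*x, b <- b - eta*(sigma - y)."""
    z = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    err = sigmoid(z) - y
    new_w = [w_j - eta * err * x_j for w_j, x_j in zip(w, x)]
    new_b = b - eta * err
    return new_w, new_b

# Hypothetical 1-D toy data: negative inputs are class 0, positive inputs are class 1.
xs = [[-2.0], [-1.0], [1.0], [2.0]]
ys = [0.0, 0.0, 1.0, 1.0]

w, b = [0.0], 0.0
for epoch in range(200):
    for x, y in zip(xs, ys):
        w, b = sgd_step(w, b, x, y, eta=0.5)

# Predicted probabilities of class 1 for each training input.
predictions = [sigmoid(w[0] * x[0] + b) for x in xs]
```

Thresholding `predictions` at 0.5 gives the class decision. This loop updates on one sample at a time, matching the single-sample derivation; for the averaged loss over $n$ samples, the per-sample gradients would be averaged before each update.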