Logistic Regression
Given an input $x$, we want the hypothesis function to satisfy $0 \leq h_\theta(x) \leq 1$, with $h_\theta(x) = g(z)$.
To keep the predicted value within $[0, 1]$, choose the sigmoid function:

$g(z) = \frac{1}{1 + e^{-z}}$
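As a minimal sketch, the sigmoid can be written in NumPy as follows (the function name `sigmoid` is our own choice, not from the text):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}): squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))
```

For example, `sigmoid(0.0)` returns 0.5, while large positive or negative `z` push the output toward 1 or 0.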
Loss Function
$L(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$
If $y = 1$, then $L(h_\theta(x), y) = -\log(h_\theta(x))$; we want $h_\theta(x)$ to be larger, so that $\log(h_\theta(x))$ is larger and $-\log(h_\theta(x))$ is smaller, i.e. the loss (error) is minimized;
if $y = 0$, then $L(h_\theta(x), y) = -\log(1 - h_\theta(x))$; we want $h_\theta(x)$ to be smaller, so that $\log(1 - h_\theta(x))$ is larger and $-\log(1 - h_\theta(x))$ is smaller, i.e. the loss (error) is minimized.
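The following sketch checks both cases numerically; the helper name `loss` and the `eps` clipping (to avoid `log(0)`) are implementation assumptions, not part of the formula above:

```python
import numpy as np

def loss(h, y, eps=1e-12):
    """L(h, y) = -y*log(h) - (1-y)*log(1-h) for a single example."""
    h = np.clip(h, eps, 1.0 - eps)  # keep h away from 0 and 1 so log is finite
    return -y * np.log(h) - (1.0 - y) * np.log(1.0 - h)

print(loss(0.9, 1))  # ~0.105: y=1 and h near 1 -> small loss
print(loss(0.1, 1))  # ~2.303: y=1 but h near 0 -> large loss
print(loss(0.1, 0))  # ~0.105: y=0 and h near 0 -> small loss
```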
Cost Function
$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} L(h_\theta(x^{(i)}), y^{(i)}) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$
Here $w$ is the weight parameter, $b$ is the bias parameter, $x$ is the input, $y$ is the true label (from the training data), and $h_\theta(x)$ is the output, i.e. the predicted value.
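A vectorized sketch of this cost over $m$ training examples, under the assumption $z = w^T x + b$ (the function name `cost` and the clipping constant are ours):

```python
import numpy as np

def cost(w, b, X, y):
    """Average cross-entropy J(w, b) over m examples.

    X: (m, n) input matrix; y: (m,) labels in {0, 1};
    w: (n,) weights; b: scalar bias.
    """
    h = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # h_theta(x) = g(w^T x + b)
    h = np.clip(h, 1e-12, 1.0 - 1e-12)      # numerical guard for log
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))
```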
Gradient Descent
Take $x$ as the input data, $w$ as the weight parameter, and $b$ as the bias; then, from left to right, compute the loss (error) through the formulas $z$, $a$, and $L$ in turn, with the goal of minimizing the loss (error) function. Here $z$ is the linear function of the data $x$ and the parameters $w$ and $b$ (i.e. $z = w^T x + b$), $a$ is the activation function (the sigmoid chosen above), and $L$ is the loss function.
Then, from right to left, compute the derivatives of $L$, $a$, $z$, and $w$/$b$ in turn; this is the backpropagation process, which is the basis of BP neural networks. Differentiating from right to left gives the derivative with respect to each weight, from which the weight parameters $w$ and $b$ that minimize the error can be found.
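A minimal sketch of this forward/backward loop, again assuming $z = w^T x + b$; the function name `train`, the learning rate, and the epoch count are illustrative choices, not from the text:

```python
import numpy as np

def train(X, y, lr=0.1, epochs=1000):
    """Logistic regression via gradient descent.

    Forward (left to right):  z = Xw + b -> a = g(z) -> L.
    Backward (right to left): dL/da -> dL/dz -> dL/dw, dL/db.
    """
    m, n = X.shape
    w = np.zeros(n)                        # weight parameters
    b = 0.0                                # bias parameter
    for _ in range(epochs):
        # forward pass
        z = X @ w + b                      # z = w^T x + b
        a = 1.0 / (1.0 + np.exp(-z))       # a = g(z), sigmoid activation
        a = np.clip(a, 1e-12, 1 - 1e-12)   # guard against division by zero
        # backward pass, right to left: L -> a -> z -> w, b
        da = -(y / a) + (1 - y) / (1 - a)  # dL/da
        dz = da * a * (1 - a)              # dL/dz = dL/da * da/dz
        dw = X.T @ dz / m                  # dJ/dw, averaged over the m examples
        db = dz.mean()                     # dJ/db
        # step both parameters toward smaller cost
        w -= lr * dw
        b -= lr * db
    return w, b
```

The intermediate `da` and `dz` steps mirror the right-to-left chain described above; in practice they are usually folded into the single simplified expression `dz = a - y`.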