$$J(\theta)=\sum_{i=1}^{m}\left(y^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right)$$
Earlier we mentioned gradient ascent and Newton's method. So what are these two methods?
Gradient Ascent
Because $J(\theta)$ is fairly complicated, we start by finding the maximum of a simple function. Consider the quadratic function in one variable
$$f(x)=-x^2+4x$$
Its graph is a downward-opening parabola.
From high-school calculus:
1. To find the extremum, first take the derivative of the function:
$$f'(x)=-2x+4$$
2. Set the derivative to 0, which gives
$$x=2$$
This is the point where $f(x)$ attains its maximum, and the maximum value is
$$f(2)=4$$
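The same maximum can also be approached numerically by repeatedly stepping in the direction of the derivative, which is the idea behind gradient ascent. The sketch below is only an illustration and is not taken from the original text: the starting point `x0`, the step size `alpha`, and the number of steps are assumed values chosen for this example.

```python
def f(x):
    """The example function f(x) = -x^2 + 4x."""
    return -x ** 2 + 4 * x


def f_prime(x):
    """Its derivative f'(x) = -2x + 4."""
    return -2 * x + 4


def gradient_ascent(x0=0.0, alpha=0.1, steps=100):
    """Apply x := x + alpha * f'(x) a fixed number of times.

    x0, alpha and steps are illustrative assumptions, not values from the text.
    """
    x = x0
    for _ in range(steps):
        x = x + alpha * f_prime(x)
    return x


x_max = gradient_ascent()
print(x_max, f(x_max))  # close to x = 2 and f(2) = 4
```

With this step size the distance to $x=2$ shrinks by a factor of $0.8$ on every step, so the iterate settles near the analytic answer.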
$$J(\theta)=\sum_{i=1}^{m}\left\{y^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right\}$$
$$h_\theta(x)=g(\theta^T x)=\frac{1}{1+e^{-\theta^T x}}$$
Let:
$$g(z)=\frac{1}{1+e^{-z}}$$
Differentiating:
$$g'(z)=\frac{e^{-z}}{(1+e^{-z})^2}=\frac{1}{1+e^{-z}}\cdot\frac{e^{-z}}{1+e^{-z}}=\frac{1}{1+e^{-z}}\cdot\left(1-\frac{1}{1+e^{-z}}\right)=g(z)\cdot(1-g(z))$$
Hence:
$$g'(\theta^T x)=g(\theta^T x)\cdot\left(1-g(\theta^T x)\right)$$
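The identity $g'(z)=g(z)(1-g(z))$ can be sanity-checked numerically. The following sketch compares it with a central finite difference; the helper `numeric_derivative`, its step size `eps`, and the sample points are assumptions made for this illustration.

```python
import math


def g(z):
    """Sigmoid function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def g_prime(z):
    """Derivative computed from the identity g'(z) = g(z) * (1 - g(z))."""
    return g(z) * (1.0 - g(z))


def numeric_derivative(func, z, eps=1e-6):
    """Central finite difference; eps is an assumed step size."""
    return (func(z + eps) - func(z - eps)) / (2 * eps)


# Largest disagreement between the identity and the numeric estimate.
points = (-3.0, -0.5, 0.0, 1.0, 4.0)
max_gap = max(abs(g_prime(z) - numeric_derivative(g, z)) for z in points)
print(max_gap)
```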
Taking the partial derivative of $J(\theta)$:
$$
\begin{aligned}
\frac{\partial J(\theta)}{\partial \theta_j}
&=\sum_{i=1}^{m}\left(\frac{y^{(i)}}{h_\theta(x^{(i)})}-\frac{1-y^{(i)}}{1-h_\theta(x^{(i)})}\right)\cdot\frac{\partial h_\theta(x^{(i)})}{\partial \theta_j}\\
&=\sum_{i=1}^{m}\left(\frac{y^{(i)}}{g(\theta^T x^{(i)})}-\frac{1-y^{(i)}}{1-g(\theta^T x^{(i)})}\right)\cdot\frac{\partial g(\theta^T x^{(i)})}{\partial \theta_j}\\
&=\sum_{i=1}^{m}\left(\frac{y^{(i)}}{g(\theta^T x^{(i)})}-\frac{1-y^{(i)}}{1-g(\theta^T x^{(i)})}\right)\cdot g(\theta^T x^{(i)})\cdot\left(1-g(\theta^T x^{(i)})\right)\cdot\frac{\partial\,\theta^T x^{(i)}}{\partial \theta_j}
\end{aligned}
$$
where:
$$\frac{\partial\,\theta^T x^{(i)}}{\partial \theta_j}=\frac{\partial\left(\theta_1 x_1^{(i)}+\theta_2 x_2^{(i)}+\theta_3 x_3^{(i)}+\dots+\theta_n x_n^{(i)}\right)}{\partial \theta_j}=x_j^{(i)}$$
Substituting this in, the expression above becomes:
$$
\begin{aligned}
\frac{\partial J(\theta)}{\partial \theta_j}
&=\sum_{i=1}^{m}\left\{y^{(i)}\left(1-g(\theta^T x^{(i)})\right)-(1-y^{(i)})\,g(\theta^T x^{(i)})\right\}\cdot x_j^{(i)}\\
&=\sum_{i=1}^{m}\left(y^{(i)}-g(\theta^T x^{(i)})\right)\cdot x_j^{(i)}
\end{aligned}
$$
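The closed form $\sum_i (y^{(i)}-g(\theta^T x^{(i)}))\,x_j^{(i)}$ can be checked against a finite-difference estimate of $J(\theta)$. The sketch below does this on a small made-up data set; the data, the value of `theta`, and the step `eps` are hypothetical and exist only for this check.

```python
import math


def h(theta, x):
    """Hypothesis h_theta(x) = g(theta^T x) with the sigmoid g."""
    z = sum(t * xj for t, xj in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(theta, X, y):
    """J(theta) = sum_i [ y_i log h(x_i) + (1 - y_i) log(1 - h(x_i)) ]."""
    return sum(
        yi * math.log(h(theta, xi)) + (1 - yi) * math.log(1 - h(theta, xi))
        for xi, yi in zip(X, y)
    )


def gradient(theta, X, y):
    """Analytic result: dJ/dtheta_j = sum_i (y_i - h(x_i)) * x_ij."""
    return [
        sum((yi - h(theta, xi)) * xi[j] for xi, yi in zip(X, y))
        for j in range(len(theta))
    ]


# Hypothetical toy data; the constant last feature acts as an intercept.
X = [[0.5, 1.2, 1.0], [1.5, -0.3, 1.0], [-1.0, 0.8, 1.0], [2.0, 2.0, 1.0]]
y = [0, 1, 0, 1]
theta = [0.1, -0.2, 0.05]

eps = 1e-6  # assumed finite-difference step
analytic = gradient(theta, X, y)
numeric = []
for j in range(len(theta)):
    plus = list(theta)
    minus = list(theta)
    plus[j] += eps
    minus[j] -= eps
    diff = log_likelihood(plus, X, y) - log_likelihood(minus, X, y)
    numeric.append(diff / (2 * eps))

print(analytic)
print(numeric)
```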
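In summary, the gradient ascent update over all $m$ samples is:
$$\theta_j:=\theta_j+\alpha\sum_{i=1}^{m}\left(y^{(i)}-h_\theta(x^{(i)})\right)\cdot x_j^{(i)}$$
Updating with one sample $i$ at a time (stochastic gradient ascent) gives:
$$\theta_j:=\theta_j+\alpha\left(y^{(i)}-h_\theta(x^{(i)})\right)\cdot x_j^{(i)}$$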
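Both update rules can be sketched in a few lines of plain Python. This is an illustrative sketch rather than a reference implementation: the function names, the toy data, the learning rate `alpha`, and the number of epochs are all assumptions, and the samples are visited in a fixed order instead of being shuffled.

```python
import math


def h(theta, x):
    """Hypothesis h_theta(x) = 1 / (1 + e^(-theta^T x))."""
    z = sum(t * xj for t, xj in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))


def batch_gradient_ascent(X, y, alpha=0.1, epochs=500):
    """theta_j := theta_j + alpha * sum_i (y_i - h(x_i)) * x_ij."""
    theta = [0.0] * len(X[0])
    for _ in range(epochs):
        errors = [yi - h(theta, xi) for xi, yi in zip(X, y)]
        theta = [
            t + alpha * sum(e * xi[j] for e, xi in zip(errors, X))
            for j, t in enumerate(theta)
        ]
    return theta


def stochastic_gradient_ascent(X, y, alpha=0.1, epochs=500):
    """theta_j := theta_j + alpha * (y_i - h(x_i)) * x_ij, one sample at a time."""
    theta = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            error = yi - h(theta, xi)
            theta = [t + alpha * error * xij for t, xij in zip(theta, xi)]
    return theta


# Hypothetical 1-D data with a constant intercept feature in the second column.
X = [[-2.0, 1.0], [-1.0, 1.0], [-0.5, 1.0], [0.5, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]

theta_batch = batch_gradient_ascent(X, y)
theta_sgd = stochastic_gradient_ascent(X, y)
predictions = [1 if h(theta_batch, xi) >= 0.5 else 0 for xi in X]
print(theta_batch, theta_sgd, predictions)
```

The batch version touches every sample before each update, while the stochastic version updates after each sample, which is cheaper per step but noisier.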
Newton's Method
Again, we start with a simple example: finding the value of $x$ at which a function equals 0. Newton's method uses the iteration:
$$
\begin{aligned}
x_{n+1}&=x_n-\frac{f(x_n)}{f'(x_n)}\\
x_{n+2}&=x_{n+1}-\frac{f(x_{n+1})}{f'(x_{n+1})}
\end{aligned}
$$
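The iteration means the following: at $x=x_1$, the tangent line through $(x_1,f(x_1))$ intersects the x-axis at $x_2$; the tangent line through $(x_2,f(x_2))$ then intersects the x-axis at $x_3$; and so on, iterating until a point that satisfies the requirement is found.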
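The tangent-line iteration can be sketched as follows. The text does not say which function is being solved, so the example function $f(x)=x^2-2$, the starting point, the tolerance `tol`, and the iteration cap are all assumptions made for illustration.

```python
def newton_root(f, f_prime, x0, tol=1e-12, max_iter=50):
    """Iterate x_{n+1} = x_n - f(x_n) / f'(x_n) until the step is below tol."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / f_prime(x)
        x = x - step
        if abs(step) < tol:
            break
    return x


# Hypothetical example: the positive root of f(x) = x^2 - 2 is sqrt(2).
root = newton_root(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(root)
```

Each pass replaces the current point with the x-intercept of the tangent at that point, as described above. The sketch does not guard against $f'(x_n)=0$.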
For $J(\theta)$, however, we need the point where the first derivative is 0, so the Newton iteration becomes:
$$
\begin{aligned}
x_{n+1}&=x_n-\frac{J'(x_n)}{J''(x_n)}\\
x_{n+2}&=x_{n+1}-\frac{J'(x_{n+1})}{J''(x_{n+1})}
\end{aligned}
$$
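As a quick check of this form of the iteration, the sketch below applies it to the earlier quadratic $f(x)=-x^2+4x$, whose derivative is zero at $x=2$. The function name, the starting point, and the stopping rule are assumptions for illustration.

```python
def newton_stationary_point(d1, d2, x0, tol=1e-12, max_iter=50):
    """Iterate x_{n+1} = x_n - J'(x_n) / J''(x_n) to find where J'(x) = 0."""
    x = x0
    for _ in range(max_iter):
        step = d1(x) / d2(x)
        x = x - step
        if abs(step) < tol:
            break
    return x


# f(x) = -x^2 + 4x, so f'(x) = -2x + 4 and f''(x) = -2.
x_star = newton_stationary_point(lambda x: -2 * x + 4, lambda x: -2.0, x0=10.0)
print(x_star)  # prints 2.0
```

Because the function is quadratic, its derivative is linear and the very first step already lands on $x=2$; the second pass only confirms that the step size is zero.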
Extension
In the multivariate case, $J''(x_n)=H_\ell(\hat{\theta})$
Hessian matrix (here $n$ is the number of samples and $x_{i,k}$ is the $k$-th feature of sample $i$):
$$
H_\ell(\hat{\theta})=
\begin{bmatrix}
\sum_{i=1}^{n}h_\theta(x_i)(1-h_\theta(x_i))\,x_{i,1}x_{i,1} & \sum_{i=1}^{n}h_\theta(x_i)(1-h_\theta(x_i))\,x_{i,1}x_{i,2} & \sum_{i=1}^{n}h_\theta(x_i)(1-h_\theta(x_i))\,x_{i,1}\\
\sum_{i=1}^{n}h_\theta(x_i)(1-h_\theta(x_i))\,x_{i,2}x_{i,1} & \sum_{i=1}^{n}h_\theta(x_i)(1-h_\theta(x_i))\,x_{i,2}x_{i,2} & \sum_{i=1}^{n}h_\theta(x_i)(1-h_\theta(x_i))\,x_{i,2}\\
\sum_{i=1}^{n}h_\theta(x_i)(1-h_\theta(x_i))\,x_{i,1} & \sum_{i=1}^{n}h_\theta(x_i)(1-h_\theta(x_i))\,x_{i,2} & \sum_{i=1}^{n}h_\theta(x_i)(1-h_\theta(x_i))
\end{bmatrix}
$$
$$h_\theta(x_i)=\frac{1}{1+e^{-z}},\qquad z=\theta_1 x_{i,1}+\theta_2 x_{i,2}+\theta_3$$
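First derivative (gradient):
$$\nabla J=-\left\langle \sum_{i=1}^{n}(y_i-h_\theta(x_i))\,x_{i,1},\ \sum_{i=1}^{n}(y_i-h_\theta(x_i))\,x_{i,2},\ \sum_{i=1}^{n}(y_i-h_\theta(x_i)) \right\rangle$$
Note the leading minus sign: in this section the gradient and the Hessian are written for the negative of the log-likelihood $J(\theta)$ defined earlier. The Newton step is unaffected, because the sign change in the gradient and in the Hessian cancel.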
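Putting the gradient and the Hessian together gives the multivariate Newton update $\theta:=\theta-H^{-1}\nabla J$. The sketch below applies it to two features plus an intercept, following the sign convention of the formulas above. It uses only the standard library, so the small linear solver `solve` is hand-written; the function names, the toy data, and the fixed iteration count are all assumptions for illustration. The data deliberately contains repeated points with both labels, since Newton's method does not converge when the classes are perfectly separable.

```python
import math


def h(theta, x):
    """h_theta(x_i) with z = theta_1 * x_i1 + theta_2 * x_i2 + theta_3."""
    z = theta[0] * x[0] + theta[1] * x[1] + theta[2]
    return 1.0 / (1.0 + math.exp(-z))


def gradient(theta, X, y):
    """-sum_i (y_i - h(x_i)) * [x_i1, x_i2, 1], matching the sign used above."""
    g = [0.0, 0.0, 0.0]
    for xi, yi in zip(X, y):
        err = yi - h(theta, xi)
        for j, xij in enumerate((xi[0], xi[1], 1.0)):
            g[j] -= err * xij
    return g


def hessian(theta, X):
    """sum_i h(x_i) * (1 - h(x_i)) * a_i a_i^T with a_i = [x_i1, x_i2, 1]."""
    H = [[0.0] * 3 for _ in range(3)]
    for xi in X:
        p = h(theta, xi)
        a = (xi[0], xi[1], 1.0)
        for r in range(3):
            for c in range(3):
                H[r][c] += p * (1.0 - p) * a[r] * a[c]
    return H


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def newton(X, y, iterations=10):
    """theta := theta - H^{-1} * gradient, repeated a fixed number of times."""
    theta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        step = solve(hessian(theta, X), gradient(theta, X, y))
        theta = [t - s for t, s in zip(theta, step)]
    return theta


# Hypothetical, non-separable toy data: repeated points carry both labels.
X = [(0, 0), (0, 0), (0, 0), (1, 0), (1, 0), (1, 0),
     (0, 1), (0, 1), (1, 1), (1, 1), (1, 1)]
y = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1]

theta_hat = newton(X, y)
print(theta_hat)
print(gradient(theta_hat, X, y))  # each component should be close to 0
```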