Hypothesis function
$$h_\theta(x) = \frac{1}{1+e^{-\theta^T x}}$$
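As a minimal sketch, the hypothesis can be evaluated in plain Python. The names `sigmoid` and `hypothesis` are illustrative and do not come from the original post; `theta` and `x` are assumed to be equal-length lists of floats.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), branched to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def hypothesis(theta, x):
    """h_theta(x) = sigmoid(theta^T x) for equal-length lists theta and x."""
    z = sum(t * v for t, v in zip(theta, x))
    return sigmoid(z)

print(hypothesis([0.0, 0.0], [1.0, 2.0]))  # theta^T x = 0, so this prints 0.5
```

The two branches compute the same value; splitting on the sign of `z` only keeps `math.exp` from overflowing.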
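Why use the sigmoid
There are many articles online about this, but I still do not fully understand them. Roughly, the sigmoid is an increasing function whose values lie between 0 and 1, and it is also tied to the exponential family of distributions.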
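One way to see the exponential-family connection is to write the Bernoulli probability mass function in exponential-family form and solve for $p$. This is sketched here as the standard textbook argument, not something taken from the post, and the symbol $\eta$ (the natural parameter) is introduced only for this illustration.

```latex
\begin{aligned}
P\{X = x\} &= p^{x}(1-p)^{1-x}
            = \exp\!\left(x\log\frac{p}{1-p} + \log(1-p)\right), \quad x \in \{0, 1\} \\
\eta &= \log\frac{p}{1-p}
\;\Longrightarrow\;
p = \frac{e^{\eta}}{1+e^{\eta}} = \frac{1}{1+e^{-\eta}}
\end{aligned}
```

Under the assumption that the natural parameter is linear in the features, $\eta = \theta^T x$, this expression for $p$ is exactly the sigmoid hypothesis.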
Cost function
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i\log\left(h_\theta(x_i)\right) + (1-y_i)\log\left(1-h_\theta(x_i)\right)\right]$$
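The cost can be computed directly from its definition. The sketch below assumes `X` is a list of feature lists and `y` a list of 0/1 labels. The function names and the clipping constant `eps` are illustrative choices, not part of the original post.

```python
import math

def sigmoid(z):
    # Plain form; fine for moderate z, may overflow for very negative z.
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, X, y, eps=1e-12):
    """Average cross-entropy J(theta) over the m samples in X, y."""
    m = len(y)
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        h = min(max(h, eps), 1.0 - eps)  # assumed clipping so log() stays finite
        total += yi * math.log(h) + (1 - yi) * math.log(1.0 - h)
    return -total / m

X = [[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]]  # made-up data
y = [1, 0, 1]
print(cost([0.0, 0.0], X, y))  # h = 0.5 for every sample, so J = log 2, about 0.6931
```

With `theta` at zero every prediction is 0.5, so each sample contributes `log 2` regardless of its label.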
Deriving the cost function
Bernoulli distribution
Find the maximum likelihood estimate of p
$$P\{X = x\} = p^{x}(1-p)^{1-x}, \quad x = 0, 1$$
Let $x_1, x_2, \dots, x_n$ be the given sample values.
The corresponding likelihood function is
$$L(p) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i}, \quad 0 < p < 1$$
We want the point that maximizes $L(p)$.
Take the logarithm
$$\ln L(p) = \sum_{i=1}^{n}\ln\left[p^{x_i}(1-p)^{1-x_i}\right]$$
$$= \sum_{i=1}^{n}\left[x_i\ln p + (1-x_i)\ln(1-p)\right]$$
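Setting the derivative of $\ln L(p)$ to zero gives $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} x_i$, the sample mean. The post does not spell out this closed form, so the sketch below checks it numerically with a simple grid search over made-up sample values. The function name `log_likelihood` is illustrative.

```python
import math

def log_likelihood(p, xs):
    """ln L(p) = sum_i [x_i ln p + (1 - x_i) ln(1 - p)] for 0 < p < 1."""
    return sum(x * math.log(p) + (1 - x) * math.log(1.0 - p) for x in xs)

xs = [1, 0, 1, 1, 0, 1, 1, 0]              # made-up sample values, mean 5/8
grid = [k / 1000 for k in range(1, 1000)]  # candidate p values inside (0, 1)
p_best = max(grid, key=lambda p: log_likelihood(p, xs))

print(p_best, sum(xs) / len(xs))  # both are 0.625
```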
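Substituting into logistic regression
Replace the sample value $x_i$ with the label $y_i$, the parameter $p$ with the prediction $h_\theta(x_i)$, and $n$ with $m$. Multiplying by $-\frac{1}{m}$ then turns maximizing the log-likelihood into minimizing the cost:
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i\log h_\theta(x_i) + (1-y_i)\log\left(1-h_\theta(x_i)\right)\right]$$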
Differentiating the cost function
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i\log h_\theta(x_i) + (1-y_i)\log\left(1-h_\theta(x_i)\right)\right]$$
$$\frac{\partial J(\theta)}{\partial\theta_j} = -\frac{1}{m}\sum_{i=1}^{m}\frac{\partial}{\partial\theta_j}\left[y_i\log h_\theta(x_i) + (1-y_i)\log\left(1-h_\theta(x_i)\right)\right]$$
$$= -\frac{1}{m}\sum_{i=1}^{m}\left\{\left[y_i\log h_\theta(x_i)\right]' + \left[(1-y_i)\log\left(1-h_\theta(x_i)\right)\right]'\right\}$$
(by the sum rule $(u+v)' = u' + v'$, where $'$ denotes $\frac{\partial}{\partial\theta_j}$; next, apply the product rule $(uv)' = u'v + uv'$)
$$= -\frac{1}{m}\sum_{i=1}^{m}\left\{\left[(y_i)'\log h_\theta(x_i) + y_i\left(\log h_\theta(x_i)\right)'\right] + \left[(1-y_i)'\log\left(1-h_\theta(x_i)\right) + (1-y_i)\left(\log\left(1-h_\theta(x_i)\right)\right)'\right]\right\}$$
(next, substitute $h_\theta(x_i) = \frac{1}{1+e^{-\theta^T x_i}}$; the terms containing $(y_i)'$ and $(1-y_i)'$ vanish because $y_i$ does not depend on $\theta$)
$$= -\frac{1}{m}\sum_{i=1}^{m}\left\{y_i\left(\log\frac{1}{1+e^{-\theta^T x_i}}\right)' + (1-y_i)\left(\log\frac{1+e^{-\theta^T x_i}-1}{1+e^{-\theta^T x_i}}\right)'\right\}$$
(next, use $(Cu)' = Cu'$ and $(\log u)' = \frac{1}{u}u'$)
$$= -\frac{1}{m}\sum_{i=1}^{m}\left\{y_i\left(1+e^{-\theta^T x_i}\right)\left(\frac{1}{1+e^{-\theta^T x_i}}\right)' + (1-y_i)\,\frac{1+e^{-\theta^T x_i}}{e^{-\theta^T x_i}}\left(\frac{e^{-\theta^T x_i}}{1+e^{-\theta^T x_i}}\right)'\right\}$$
(next, use the quotient rule $\left(\frac{u}{v}\right)' = \frac{u'v - uv'}{v^2}$ and $(e^{-Cx})' = -Ce^{-Cx}$)
$$= -\frac{1}{m}\sum_{i=1}^{m}\left\{y_i\left(1+e^{-\theta^T x_i}\right)\frac{0-\left(1+e^{-\theta^T x_i}\right)'}{\left(1+e^{-\theta^T x_i}\right)^2} + (1-y_i)\,\frac{1+e^{-\theta^T x_i}}{e^{-\theta^T x_i}}\cdot\frac{\left(e^{-\theta^T x_i}\right)'}{\left(1+e^{-\theta^T x_i}\right)^2}\right\}$$
$$= -\frac{1}{m}\sum_{i=1}^{m}\left\{y_i\,\frac{-\left(1+e^{-\theta^T x_i}\right)'}{1+e^{-\theta^T x_i}} - (1-y_i)\,\frac{x_{i,j}}{1+e^{-\theta^T x_i}}\right\}$$
(next, use $(e^{-Cx})' = -Ce^{-Cx}$ on the first term; $x_{i,j}$ denotes the $j$-th component of $x_i$)
$$= -\frac{1}{m}\sum_{i=1}^{m}\frac{y_i\,x_{i,j}\,e^{-\theta^T x_i} - x_{i,j} + x_{i,j}\,y_i}{1+e^{-\theta^T x_i}}$$
$$= -\frac{1}{m}\sum_{i=1}^{m}\frac{y_i\left(1+e^{-\theta^T x_i}\right) - 1}{1+e^{-\theta^T x_i}}\,x_{i,j}$$
$$= -\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \frac{1}{1+e^{-\theta^T x_i}}\right)x_{i,j}$$
$$= -\frac{1}{m}\sum_{i=1}^{m}\left[y_i - h_\theta(x_i)\right]x_{i,j}$$
where $x_{i,j}$ is the $j$-th component of the sample $x_i$.
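A quick way to sanity-check the final formula is to compare it against a finite-difference approximation of $J(\theta)$. The sketch below does that on made-up data. The function names, the data, and the step size are all assumptions for illustration, not part of the original post.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    return sigmoid(sum(t * v for t, v in zip(theta, x)))

def cost(theta, X, y):
    """J(theta): average cross-entropy over the m samples."""
    m = len(y)
    return -sum(yi * math.log(h(theta, xi)) + (1 - yi) * math.log(1.0 - h(theta, xi))
                for xi, yi in zip(X, y)) / m

def gradient(theta, X, y):
    """Analytic gradient: dJ/dtheta_j = -(1/m) * sum_i (y_i - h(x_i)) * x_i[j]."""
    m = len(y)
    return [-sum((yi - h(theta, xi)) * xi[j] for xi, yi in zip(X, y)) / m
            for j in range(len(theta))]

def numeric_gradient(theta, X, y, step=1e-6):
    """Central finite differences, used only to check the analytic formula."""
    grads = []
    for j in range(len(theta)):
        up = list(theta)
        up[j] += step
        down = list(theta)
        down[j] -= step
        grads.append((cost(up, X, y) - cost(down, X, y)) / (2 * step))
    return grads

# Made-up data; the first column of 1.0 acts as a bias feature.
X = [[1.0, 0.5], [1.0, -1.5], [1.0, 2.0], [1.0, 0.1]]
y = [1, 0, 1, 0]
theta = [0.3, -0.2]

analytic = gradient(theta, X, y)
numeric = numeric_gradient(theta, X, y)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_diff)  # should be very small if the derivation is right
```

If the two gradients agree to within the finite-difference error, the derived expression is consistent with the cost function it was derived from.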