Logistic Regression
The original linear regression hypothesis:
$$h_{\theta}(x) = \theta^T x$$
$\theta^T x = \theta \cdot x$ is the inner product of the two vectors, i.e. their dot product.
Applying the sigmoid function on top of this gives the logistic regression hypothesis:
$$h_{\theta}(x) = \frac{1}{1 + e^{-\theta^T x}}$$
where:
$$h_{\theta}(x) = g(\theta^T x)$$
$$g(z) = \frac{1}{1 + e^{-z}}$$
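The two definitions above can be sketched in a few lines of Python. This is only an illustration using the standard library; the function names `sigmoid` and `hypothesis` are assumptions made for the example, and the branch on the sign of `z` is an added numerical safeguard, not part of the formula itself.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z}).

    The two branches are algebraically identical; splitting on the sign
    of z avoids overflow in math.exp for large |z|.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x) for equal-length sequences theta and x."""
    z = sum(t * xi for t, xi in zip(theta, x))  # dot product theta . x
    return sigmoid(z)

# With theta = 0 the dot product is 0, so the output is g(0) = 0.5.
print(hypothesis([0.0, 0.0, 0.0], [1.0, 2.0, 3.0]))  # 0.5
```

Because the sigmoid maps any real number into the interval (0, 1), the output can be read as a probability.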
Decision Boundary
Suppose the linear function $\theta^T x$ is written as:
$$\theta_0 + \theta_1 x_1 + \theta_2 x_2$$
Let
$$\theta = \begin{bmatrix} -3 \\ 1 \\ 1 \end{bmatrix}$$
With the usual threshold of predicting $y = 1$ when $h_{\theta}(x) \ge 0.5$, which holds exactly when $\theta^T x \ge 0$, the prediction changes where $-3 + x_1 + x_2 = 0$. So
$$x_1 + x_2 = 3$$
is the decision boundary.
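As a quick check of this boundary, the sketch below evaluates the example parameters at a few points. The helper names (`linear_score`, `probability`, `predict`) and the 0.5 threshold are assumptions made for illustration; only the parameter values come from the text.

```python
import math

# Example parameters from the text: theta_0 = -3, theta_1 = 1, theta_2 = 1.
THETA = (-3.0, 1.0, 1.0)

def linear_score(theta, x1, x2):
    """theta_0 + theta_1 * x1 + theta_2 * x2."""
    return theta[0] + theta[1] * x1 + theta[2] * x2

def probability(theta, x1, x2):
    """h_theta(x): the sigmoid of the linear score."""
    return 1.0 / (1.0 + math.exp(-linear_score(theta, x1, x2)))

def predict(theta, x1, x2):
    """Assumed convention: predict 1 when h >= 0.5, i.e. when the score >= 0."""
    return 1 if linear_score(theta, x1, x2) >= 0 else 0

print(predict(THETA, 1.0, 1.0))      # 0   (x1 + x2 = 2 < 3)
print(predict(THETA, 2.0, 2.0))      # 1   (x1 + x2 = 4 > 3)
print(probability(THETA, 1.0, 2.0))  # 0.5 (exactly on the boundary)
```

Points on the line $x_1 + x_2 = 3$ get probability 0.5; points on either side fall into one of the two classes.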
A Compact Form of the Logistic Regression Cost Function
$$\mathrm{Cost}(h_{\theta}(x), y) = -y \log(h_{\theta}(x)) - (1-y) \log(1 - h_{\theta}(x))$$
Note: here log() means ln(), the natural logarithm (base e).
The overall cost function is:
$$J(\theta) = \frac{1}{m} \sum_{i=1}^m \mathrm{Cost}(h_{\theta}(x^{(i)}), y^{(i)})$$
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^m \left[ y^{(i)} \log(h_{\theta}(x^{(i)})) + (1-y^{(i)}) \log(1 - h_{\theta}(x^{(i)})) \right]$$
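The cost function can be evaluated directly from its definition. The following is a minimal sketch, assuming each sample $x^{(i)}$ is stored as a list whose first entry is the constant 1 (for $\theta_0$); the tiny data set and the names `cost`, `X` and `y` are made up for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum(y*log(h) + (1-y)*log(1-h)), natural log.

    Note: math.log raises an error if h is exactly 0 or 1, which can
    happen for extreme scores; this sketch does not guard against it.
    """
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        total += yi * math.log(h) + (1 - yi) * math.log(1 - h)
    return -total / m

# Hypothetical toy data: each row is [1, x1, x2].
X = [[1.0, 1.0, 1.0], [1.0, 2.0, 2.0], [1.0, 0.5, 1.5], [1.0, 3.0, 1.0]]
y = [0, 1, 0, 1]

# With theta = 0 every h is 0.5, so J = -log(0.5) = log(2).
print(cost([0.0, 0.0, 0.0], X, y))
print(cost([-3.0, 1.0, 1.0], X, y))  # smaller: these parameters fit the toy data better
```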
Taking the partial derivative of the expression above gives:
$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)}) x_j^{(i)}$$
The derivation is as follows.
It uses the chain rule for composite functions.
Let:
$$f_1(\theta_j) = y^{(i)} \log(h_{\theta}(x^{(i)}))$$
$$u(\theta_j) = h_{\theta}(x^{(i)}), \qquad g(u) = y^{(i)} \log(u)$$
(In this step $g(u)$ is a helper function for the chain rule, not the sigmoid $g(z)$ defined earlier.)
$$\therefore f_1'(\theta_j) = g'(u) \, u'(\theta_j)$$
$$\therefore f_1'(\theta_j) = \frac{y^{(i)}}{u(\theta_j)} \, u'(\theta_j) = \frac{y^{(i)}}{h_{\theta}(x^{(i)})} \, u'(\theta_j)$$
Because:
$$\log'(x) = \frac{1}{x}$$
$$(e^x)' = e^x$$
and, for the sigmoid $g(z)$:
$$g'(z) = g(z) (1 - g(z))$$
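The sigmoid derivative identity is used here without proof. A short derivation, using only the definition $g(z) = 1/(1+e^{-z})$ and the two rules listed above, might read:

```latex
\begin{aligned}
g'(z) &= \frac{d}{dz}\left(1 + e^{-z}\right)^{-1}
       = \frac{e^{-z}}{\left(1 + e^{-z}\right)^{2}} \\
      &= \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}}
       = g(z)\,\bigl(1 - g(z)\bigr),
\qquad \text{since } 1 - g(z) = \frac{e^{-z}}{1 + e^{-z}}.
\end{aligned}
```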
$$u'(\theta_j) = \frac{e^{-\theta^T x^{(i)}}}{(1 + e^{-\theta^T x^{(i)}})^2} \, x_j^{(i)} = h_{\theta}(x^{(i)}) (1 - h_{\theta}(x^{(i)})) \, x_j^{(i)}$$
$$\therefore f_1'(\theta_j) = y^{(i)} (1 - h_{\theta}(x^{(i)})) \, x_j^{(i)}$$
Similarly, let:
$$f_2(\theta_j) = (1 - y^{(i)}) \log(1 - h_{\theta}(x^{(i)}))$$
Applying the chain rule in the same way gives:
$$f_2'(\theta_j) = (1 - y^{(i)}) \cdot \frac{-h_{\theta}(x^{(i)}) (1 - h_{\theta}(x^{(i)})) \, x_j^{(i)}}{1 - h_{\theta}(x^{(i)})}$$
Cancelling the common factor in the numerator and denominator gives:
$$f_2'(\theta_j) = -(1 - y^{(i)}) \, h_{\theta}(x^{(i)}) \, x_j^{(i)}$$
Therefore, since $J(\theta) = -\frac{1}{m} \sum_{i=1}^m (f_1 + f_2)$:
$$\frac{\partial}{\partial \theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^m (f_1'(\theta_j) + f_2'(\theta_j))$$
$$\frac{\partial}{\partial \theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^m (y^{(i)} - h_{\theta}(x^{(i)})) \, x_j^{(i)}$$
Absorbing the minus sign into the summand gives the final partial derivative:
$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)}) \, x_j^{(i)}$$
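One way to gain confidence in this result is to compare the closed-form gradient with a finite-difference approximation of $J(\theta)$. The sketch below does that on a small made-up data set; the function names, the data, and the step size `eps` are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    return sigmoid(sum(t * v for t, v in zip(theta, x)))

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum(y*log(h) + (1-y)*log(1-h))."""
    total = sum(yi * math.log(h(theta, xi)) + (1 - yi) * math.log(1 - h(theta, xi))
                for xi, yi in zip(X, y))
    return -total / len(X)

def gradient(theta, X, y):
    """Closed form: dJ/dtheta_j = (1/m) * sum((h(x_i) - y_i) * x_ij)."""
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        err = h(theta, xi) - yi
        for j, xij in enumerate(xi):
            grad[j] += err * xij
    return [g / len(X) for g in grad]

def numeric_gradient(theta, X, y, eps=1e-6):
    """Central finite differences of J, one coordinate at a time."""
    out = []
    for j in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[j] += eps
        minus[j] -= eps
        out.append((cost(plus, X, y) - cost(minus, X, y)) / (2 * eps))
    return out

# Hypothetical toy data: each row is [1, x1, x2].
X = [[1.0, 1.0, 1.0], [1.0, 2.0, 2.0], [1.0, 0.5, 1.5], [1.0, 3.0, 1.0]]
y = [0, 1, 0, 1]
theta = [-3.0, 1.0, 1.0]

analytic = gradient(theta, X, y)
numeric = numeric_gradient(theta, X, y)
# The largest difference should be tiny if the derivation is right.
print(max(abs(a - n) for a, n in zip(analytic, numeric)))
```

The same `gradient` function is what a gradient descent update $\theta_j := \theta_j - \alpha \, \partial J / \partial \theta_j$ would call at each step, with a learning rate $\alpha$ chosen by the user.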