The predicted values form a discrete sequence of 0s and 1s. To map $\vec x$ to 0 or 1, we model the output with the sigmoid function.
Hypothesis function:

$$h(\vec{x}) = \frac{1}{1+e^{-\vec{\theta}^T\vec{x}}}$$
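A minimal numeric sketch of this hypothesis (the function and variable names below are illustrative, not from the text; NumPy is assumed):

```python
import numpy as np

def h(theta, x):
    """Hypothesis h(x) = 1 / (1 + exp(-theta^T x))."""
    return 1.0 / (1.0 + np.exp(-theta @ x))

# A large positive theta^T x pushes the output toward 1,
# a large negative one toward 0, and theta^T x = 0 gives exactly 0.5.
theta = np.array([0.0, 2.0])
print(h(theta, np.array([1.0, 5.0])))   # close to 1
print(h(theta, np.array([1.0, -5.0])))  # close to 0
```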
where:

$$\begin{aligned} \vec{x}&=[x_0, x_1, \dots, x_n]^T\in\mathbb R^{(n+1)\times1} \\ \vec{\theta}&=[\theta_0, \theta_1, \dots, \theta_n]^T\in\mathbb R^{(n+1)\times1} \end{aligned}$$

($n$ is the number of features.)
That is, we look for parameters $\vec{\theta}$ such that, as far as possible, $h(\vec x)\rightarrow 0$ when $y=0$ and $h(\vec x)\rightarrow 1$ when $y=1$.
Interpreting $h(\vec x)$ as the probability that $y=1$, the probability that the prediction is correct is:

$$p=h(\vec x)^y\left(1-h(\vec x)\right)^{1-y}$$
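The exponents simply select one of the two factors. A quick check (the name `predict_prob` is illustrative):

```python
def predict_prob(h_x, y):
    """p = h(x)^y * (1 - h(x))^(1-y):
    collapses to h(x) when y = 1, and to 1 - h(x) when y = 0."""
    return h_x**y * (1.0 - h_x)**(1 - y)

print(predict_prob(0.8, 1))  # the h(x) factor survives
print(predict_prob(0.8, 0))  # the 1 - h(x) factor survives
```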
When $y=0$, the probability of a correct prediction (i.e. predicting 0) is $1-h(\vec x)$; when $y=1$, the probability of a correct prediction (i.e. predicting 1) is $h(\vec x)$.
To maximize the probability of predicting all $m$ training samples correctly, we maximize the likelihood:

$$\begin{aligned} \max_{\vec{\theta}}l(\vec{\theta}) &= \max_{\vec{\theta}}\left(p^{(1)}\cdot p^{(2)}\cdots p^{(m)}\right)\\ &= \max_{\vec{\theta}}\prod_{i=1}^{m} h(\vec x^{(i)})^{y^{(i)}}\left(1-h(\vec x^{(i)})\right)^{1-y^{(i)}} \end{aligned}$$
Taking the logarithm of both sides:

$$\begin{aligned} \max_{\vec{\theta}}L(\vec{\theta}) &= \max_{\vec{\theta}}\ln\left(l(\vec{\theta})\right)\\ &= \max_{\vec{\theta}}\sum_{i=1}^{m} y^{(i)}\ln\left(h(\vec x^{(i)})\right)+\left(1-y^{(i)}\right)\ln\left(1-h(\vec x^{(i)})\right) \end{aligned}$$
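Besides simplifying the optimization, taking logarithms matters numerically: a product of many probabilities underflows in floating point, while the log turns it into a stable sum. A small demonstration (assuming NumPy):

```python
import numpy as np

# 2000 samples, each predicted correctly with probability 0.5.
p = np.full(2000, 0.5)

print(np.prod(p))         # the raw product underflows to 0.0 in float64
print(np.sum(np.log(p)))  # the log-likelihood, -2000 * ln(2), stays finite
```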
We therefore define the cost function $J(\vec{\theta})=-\frac{1}{m}L(\vec{\theta})$, which turns the problem into finding the $\vec{\theta}$ that minimizes $J(\vec{\theta})$.
Hence the cost function:

$$J(\vec{\theta}) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\ln\left(h(\vec{x}^{(i)})\right)+\left(1-y^{(i)}\right)\ln\left(1-h(\vec{x}^{(i)})\right)\right]$$
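A direct translation of this cost into code might look like the following sketch (assuming NumPy; `X` is an $m\times(n{+}1)$ matrix whose first column is all ones for $x_0$, and all names are illustrative):

```python
import numpy as np

def cost(theta, X, y):
    """Cross-entropy cost J(theta) = -(1/m) * sum(y*ln(h) + (1-y)*ln(1-h))."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X @ theta))          # h(x^(i)) for every sample
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# With theta = 0 every prediction is 0.5, so J = ln(2) regardless of y.
X = np.array([[1.0, 2.0], [1.0, -1.0]])
y = np.array([1.0, 0.0])
print(cost(np.zeros(2), X, y))
```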
where:

$$\begin{aligned} \vec{y}&=[y^{(1)}, y^{(2)}, \dots, y^{(m)}]^T\in\mathbb R^{m\times1} \\ y^{(i)}&\in\{0, 1\} \end{aligned}$$

($m$ is the number of training samples.)
The cost function can also be interpreted as follows: when $y=0$, the cost of $h(\vec x)\rightarrow 1$ tends to infinity while the cost of $h(\vec x)\rightarrow 0$ tends to zero; when $y=1$, the cost of $h(\vec x)\rightarrow 0$ tends to infinity while the cost of $h(\vec x)\rightarrow 1$ tends to zero.
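The two cases correspond to per-sample costs of $-\ln(h(\vec x))$ when $y=1$ and $-\ln(1-h(\vec x))$ when $y=0$; a tiny sketch makes the asymmetry concrete (names are illustrative):

```python
import numpy as np

def sample_cost(h_x, y):
    """Per-sample cost: -ln(h) if y = 1, else -ln(1 - h)."""
    return -np.log(h_x) if y == 1 else -np.log(1.0 - h_x)

print(sample_cost(0.999, 0))  # confidently wrong: large cost
print(sample_cost(0.001, 0))  # confidently right: near-zero cost
```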
Gradient descent:

$$\theta_j := \theta_j-\alpha\frac{\partial J(\vec{\theta})}{\partial \theta_j}$$
Writing $\left(h(\vec{x}^{(i)})\right)'$ for the partial derivative of $h(\vec{x}^{(i)})$ with respect to $\theta_j$:

$$\begin{aligned} \frac{\partial J(\vec{\theta})}{\partial \theta_j} &= -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\frac{\left(h(\vec{x}^{(i)})\right)'}{h(\vec{x}^{(i)})}+\left(1-y^{(i)}\right)\frac{-\left(h(\vec{x}^{(i)})\right)'}{1-h(\vec{x}^{(i)})}\right) \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left(\frac{y^{(i)}}{h(\vec{x}^{(i)})}-\frac{1-y^{(i)}}{1-h(\vec{x}^{(i)})}\right)\left(h(\vec{x}^{(i)})\right)' \\ &= -\frac{1}{m}\sum_{i=1}^{m}\frac{\left(1+e^{-\vec{\theta}^T\vec{x}^{(i)}}\right)\left(y^{(i)}e^{-\vec{\theta}^T\vec{x}^{(i)}}+y^{(i)}-1\right)}{e^{-\vec{\theta}^T\vec{x}^{(i)}}}\cdot\frac{e^{-\vec{\theta}^T\vec{x}^{(i)}}x_j^{(i)}}{\left(1+e^{-\vec{\theta}^T\vec{x}^{(i)}}\right)^2} \\ &= \frac{1}{m}\sum_{i=1}^{m}\frac{x_j^{(i)}-x_j^{(i)}y^{(i)}\left(1+e^{-\vec{\theta}^T\vec{x}^{(i)}}\right)}{1+e^{-\vec{\theta}^T\vec{x}^{(i)}}} \\ &= \frac{1}{m}\sum_{i=1}^{m}\left(h(\vec{x}^{(i)})-y^{(i)}\right)x_j^{(i)} \end{aligned}$$
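Putting the update rule together with the derived gradient $\frac{1}{m}\sum_i (h(\vec{x}^{(i)})-y^{(i)})x_j^{(i)}$, a batch gradient-descent sketch on a toy separable dataset might look like this (all names, data, and hyperparameters are illustrative):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=5000):
    """Batch gradient descent for logistic regression.
    Vectorized gradient: (1/m) * X^T (h - y), matching the derivation above."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = 1.0 / (1.0 + np.exp(-X @ theta))
        theta -= alpha * (X.T @ (h - y)) / m
    return theta

# Toy 1-D data, separable at x = 0 (leading column of ones for theta_0).
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
preds = (1.0 / (1.0 + np.exp(-X @ theta)) >= 0.5).astype(float)
```

On separable data like this the fitted $\theta_1$ keeps growing with more iterations (the unregularized likelihood has no finite maximizer), but the decision boundary, and hence `preds`, stabilizes quickly.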