The output is a discrete variable that takes one of multiple values:
$$y\in \{ s_1, s_2, ..., s_K \}$$
where $K$ is the number of distinct output values.
1. Converting to multiple binary classifications
Apply the following mapping to the target value $y$, dividing it into $K$ groups:
$$z_k= \begin{cases} 1 & (y = s_k) \\ 0 & (y \ne s_k) \end{cases}$$
where $k = 1, 2, ..., K$.
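As a small illustration, here is a minimal NumPy sketch of this mapping from $y$ to the $K$ binary targets $z_k$ (the label values and array names below are made up for the example):

```python
import numpy as np

# Hypothetical example: labels y drawn from K = 3 classes s_1, s_2, s_3.
y = np.array(["s1", "s2", "s3", "s1", "s2"])
classes = np.array(["s1", "s2", "s3"])          # s_1, ..., s_K

# z[k] is the binary target vector for class s_k: 1 where y == s_k, else 0.
z = np.stack([(y == s).astype(int) for s in classes])
print(z)
# [[1 0 0 1 0]
#  [0 1 0 0 1]
#  [0 0 1 0 0]]
```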
For each $k$, this turns the problem into a binary one with input $\vec{x}$ and output $z_k$. Fitting a binary (logistic) classifier then yields the parameters $\vec{\theta}^{(k)}$.
Substituting into $h(\vec{x})$ gives:
$$h_{(k)}(\vec{x}_t)=\frac{1}{1+e^{- {\vec{\theta}^{(k)}}^T \vec{x}_t}}$$
where $\vec{x}_t$ denotes the input vector of a single test example.
This yields $K$ hypotheses; selecting the hypothesis with the largest $h_{(k)}(\vec{x}_t)$, the predicted output for the test input $\vec{x}_t$ is the corresponding $s_k$.
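A minimal one-vs-all sketch of this selection step, assuming the $K$ parameter vectors $\vec{\theta}^{(k)}$ have already been fitted by binary logistic regression (the `thetas` values and names below are hypothetical):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def predict_one_vs_all(x_t, thetas, classes):
    """x_t: (n+1,) input with x_0 = 1; thetas: (K, n+1), one row per class;
    classes: (K,) labels s_1..s_K. Returns the label with the largest h_k(x_t)."""
    h = sigmoid(thetas @ x_t)          # h_(k)(x_t) for k = 1..K
    return classes[np.argmax(h)]

# Hypothetical fitted parameters for K = 3 classes and n = 2 features.
thetas = np.array([[ 0.5,  2.0, -1.0],
                   [-0.3, -1.5,  0.8],
                   [ 0.1, -0.5,  0.2]])
classes = np.array(["s1", "s2", "s3"])
x_t = np.array([1.0, 0.7, -0.2])       # x_0 = 1 (bias term)
print(predict_one_vs_all(x_t, thetas, classes))
```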
2. Multiclass hypothesis function
Hypothesis function:
$$h(\vec{x}) =\frac{1}{ 1+\sum_{k=1}^{K-1} e^{ -{\vec{\theta}^{(k)}}^T\vec{x}}}$$
where:
$$\begin{aligned} &\vec{x}=[x_0, x_1, ...,x_n]^T\in\mathbb R^{(n+1)\times1} \\ &\vec{\theta}^{(k)}=[\theta_0^{(k)}, \theta_1^{(k)}, ...,\theta_n^{(k)}]^T\in\mathbb R^{(n+1)\times1} \\ &(n \text{ is the number of features}) \end{aligned}$$
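A minimal sketch of evaluating this hypothesis, assuming the $K-1$ parameter vectors are stacked row-wise in a hypothetical `thetas` array of shape $(K-1)\times(n+1)$:

```python
import numpy as np

def h_multiclass(x, thetas):
    """x: (n+1,) with x_0 = 1; thetas: (K-1, n+1), rows theta^(1)..theta^(K-1).
    Returns h(x) = 1 / (1 + sum_{k=1}^{K-1} exp(-theta^(k)^T x))."""
    return 1.0 / (1.0 + np.sum(np.exp(-(thetas @ x))))

thetas = np.array([[0.2, 1.0, -0.5],    # hypothetical theta^(1)
                   [0.4, -0.8, 0.3]])   # hypothetical theta^(2), so K = 3
x = np.array([1.0, 0.5, 1.2])           # x_0 = 1 (bias term)
print(h_multiclass(x, thetas))
```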
For a test input $\vec{x}$, the probability of obtaining the correct prediction is:
$$p=\{h(\vec x)^{P(y=s_1)}(1-h(\vec x))^{P(y\ne s_1)}\}\cdot \{h(\vec x)^{P(y=s_2)}(1-h(\vec x))^{P(y\ne s_2)}\}\cdots \{h(\vec x)^{P(y=s_K)}(1-h(\vec x))^{P(y\ne s_K)}\}$$
Therefore:
$$p=\prod_{k=1}^{K}h(\vec x)^{P(y=s_k)}(1-h(\vec x))^{P(y\ne s_k)}$$
Hence the likelihood function is:
$$l(\vec{\theta})=\prod_{i=1}^{m}\prod_{k=1}^{K}h(\vec x^{(i)})^{P(y^{(i)}=s_k)}\big(1-h(\vec x^{(i)})\big)^{P(y^{(i)}\ne s_k)}$$
Taking the logarithm of both sides, and using $P(y^{(i)}\ne s_k)=1-P(y^{(i)}=s_k)$:
$$L(\vec{\theta}) =\ln\big(l(\vec{\theta})\big)=\sum_{i=1}^{m}\sum_{k=1}^{K}\Big(P(y^{(i)}=s_k)\ln\big(h(\vec x^{(i)})\big)+\big(1-P(y^{(i)}=s_k)\big)\ln\big(1-h(\vec x^{(i)})\big)\Big)$$
Hence the cost function is:
$$J( \vec{\theta}) = -\sum_{i=1}^{m}\sum_{k=1}^{K}\Big(P(y^{(i)}=s_k)\ln\big(h(\vec x^{(i)})\big)+\big(1-P(y^{(i)}=s_k)\big)\ln\big(1-h(\vec x^{(i)})\big)\Big)$$
where:
$$\begin{aligned} &\vec{y}=[y^{(1)}, y^{(2)}, ...,y^{(m)}]^T\\ &y^{(i)}\in \{ s_1, s_2, ..., s_K \} \\ &(m \text{ is the number of training examples}) \end{aligned}$$
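A minimal sketch of evaluating this cost, assuming `X` stores the $m$ inputs row-wise (with $x_0=1$), `y` stores the $m$ labels, and $P(y^{(i)}=s_k)$ is taken as the 0/1 indicator of $y^{(i)}=s_k$; all names are hypothetical:

```python
import numpy as np

def cost_J(thetas, X, y, classes):
    """thetas: (K-1, n+1); X: (m, n+1) with x_0 = 1; y: (m,) labels; classes: (K,).
    Implements J(theta) = -sum_i sum_k [P*ln(h) + (1-P)*ln(1-h)] with P = 1{y_i = s_k}."""
    h = 1.0 / (1.0 + np.sum(np.exp(-(X @ thetas.T)), axis=1))   # h(x^(i)), shape (m,)
    P = (y[:, None] == classes[None, :]).astype(float)          # (m, K) indicators
    # Broadcast h over the K classes of the inner sum.
    return -np.sum(P * np.log(h)[:, None] + (1 - P) * np.log(1 - h)[:, None])
```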
Gradient-descent update:
$$\theta_j^{(t)}:=\theta_j^{(t)}-\alpha \frac{\partial J( \vec{\theta})}{\partial \theta_j^{(t)}}$$
where $t=1,2,...,K-1$.
$$\begin{aligned} \frac{\partial J( \vec{\theta})}{\partial \theta_j^{(t)}} &= -\frac{\partial}{\partial \theta_j^{(t)}}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big(P(y^{(i)}=s_k)\ln\big(h(\vec x^{(i)})\big)+\big(1-P(y^{(i)}=s_k)\big)\ln\big(1-h(\vec x^{(i)})\big)\Big)\\ &=\sum_{i=1}^{m}\sum_{k=1}^{K}\big(h(\vec x^{(i)})-P(y^{(i)}=s_k)\big)\,x_j^{(i)}\,\frac{e^{ -{\vec{\theta}^{(t)}}^T\vec{x}^{(i)}}}{\sum_{u=1}^{K-1} e^{ -{\vec{\theta}^{(u)}}^T\vec{x}^{(i)}}} \end{aligned}$$
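A minimal sketch of the full gradient-descent loop, implementing the update rule and the partial derivative exactly as written above (the inner sum over $k$ is collapsed algebraically inside the code; `alpha` and `n_iters` are hypothetical hyperparameters):

```python
import numpy as np

def gradient_descent(X, y, classes, K, alpha=0.1, n_iters=1000):
    """X: (m, n+1) with x_0 = 1; y: (m,) labels; classes: (K,).
    Returns thetas of shape (K-1, n+1), rows theta^(1)..theta^(K-1)."""
    m, n1 = X.shape
    thetas = np.zeros((K - 1, n1))
    P = (y[:, None] == classes[None, :]).astype(float)      # (m, K) indicators P(y^(i)=s_k)
    for _ in range(n_iters):
        E = np.exp(-(X @ thetas.T))                          # (m, K-1): e^{-theta^(t)^T x^(i)}
        h = 1.0 / (1.0 + E.sum(axis=1))                      # (m,): h(x^(i))
        W = E / E.sum(axis=1, keepdims=True)                 # (m, K-1): the ratio term for each t
        r = np.sum(h[:, None] - P, axis=1)                   # (m,): sum_k (h(x^(i)) - P(y^(i)=s_k))
        # grad[t, j] = sum_i r_i * x_j^(i) * W[i, t], i.e. the derivative written above
        grad = (W * r[:, None]).T @ X                        # (K-1, n+1)
        thetas -= alpha * grad                               # simultaneous update of all theta_j^(t)
    return thetas
```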