Semi-Supervised Learning
Semi-supervised learning targets the setting where labeled samples are scarce, and seeks to make full use of unlabeled samples. It generally rests on two basic assumptions:
- Cluster assumption: samples in the same cluster are more likely to share the same label;
- Manifold assumption: samples lying in a small local neighborhood are more similar, and thus more likely to share the same label.
Generative and Discriminative Model
Discriminative learning models the conditional probability:
$$y^* = \arg\max_y p(y|\pmb x)$$
Generative learning models the joint probability:
$$y^* = \arg\max_y p(y|\pmb x) = \arg\max_y\frac{p(\pmb x|y)p(y)}{p(\pmb x)} = \arg\max_y p(\pmb x|y)p(y)$$
Generative methods assume the samples are drawn from some underlying distribution (which gives the model strong generalization ability), but this requires sufficient and reliable prior knowledge.
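The generative decision rule above can be sketched in a few lines. This is a minimal, hypothetical 1-D example (the priors and Gaussian class-conditionals are illustrative, not from the text):

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Hypothetical setup: class priors p(y) and Gaussian class-conditionals p(x|y).
priors = {0: 0.5, 1: 0.5}
params = {0: (-1.0, 1.0), 1: (1.0, 1.0)}  # (mu, sigma) per class

def predict(x):
    # The evidence p(x) cancels in the arg max, so compare p(x|y) p(y) directly.
    return max(priors, key=lambda y: gauss_pdf(x, *params[y]) * priors[y])
```

With equal priors and symmetric class means, the rule reduces to picking the nearer class mean.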
Likelihood Function of Gaussian Mixture Model
The probability density function of a Gaussian mixture model:
$$p(\pmb x|\Theta) =\sum_k p(\pmb x|\theta_k)p(\theta_k)=\sum_k\alpha_k p(\pmb x|\theta_k)$$
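Evaluating this mixture density is a weighted sum of component densities. A 1-D sketch with illustrative parameters $\alpha_k, \mu_k, \sigma_k$ (assumed values, not from the text):

```python
import numpy as np

# Illustrative 2-component mixture parameters.
alphas = np.array([0.3, 0.7])
mus = np.array([-2.0, 1.0])
sigmas = np.array([1.0, 0.5])

def gmm_density(x):
    # p(x|Theta) = sum_k alpha_k * N(x; mu_k, sigma_k^2)
    comps = np.exp(-(x - mus) ** 2 / (2 * sigmas ** 2)) / (sigmas * np.sqrt(2 * np.pi))
    return float(np.sum(alphas * comps))
```

Since the mixing weights sum to 1 and each component is a proper density, the mixture integrates to 1.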
Predicting the label of $\pmb x$ by maximum a posteriori, with label space $\mathcal Y=\{1, 2, \cdots, K\}$:
$$f(\pmb x)=\arg\max_{y\in\mathcal Y}p(y|\pmb x)=\arg\max_{y\in\mathcal Y}\sum\nolimits_k p(y, \theta_k|\pmb x)=\arg\max_{y\in\mathcal Y}\sum\nolimits_k p(y|\theta_k,\pmb x)p(\theta_k|\pmb x)$$
where
- $p(y|\theta_k,\pmb x)$ is the probability that $\pmb x$ has label $y$ given that it was generated by the $k$-th component; it equals 1 if and only if $y=k$;
- $p(\theta_k|\pmb x)$ is the posterior probability that $\pmb x$ was generated by the $k$-th component; a large amount of unlabeled data can make this estimate more accurate.
If the clusters correspond one-to-one to the true classes, a labeled sample $\pmb x\in D_l$ can only belong to its designated cluster, so
$$p_{D_l}(\pmb x, y=i|\Theta)=\alpha_i p(\pmb x|\theta_i)=\sum_k\alpha_k p(\pmb x|\theta_k)p(y=i|\theta_k,\pmb x)$$
In the expression above, $p(y=i|\theta_k, \pmb x)$ equals 1 only when $i=k$, and 0 otherwise. An unlabeled sample $\pmb x\in D_u$ may belong to any cluster, so
$$p_{D_u}(\pmb x|\Theta)=\sum_k\alpha_k p(\pmb x|\theta_k)$$
The log-likelihood function over $D_l\cup D_u$ is
$$\begin{aligned} L(\Theta|D_l\cup D_u) &=L(\Theta|D_l) + L(\Theta|D_u)\\[1ex] &=\sum_{(\pmb x, y)\in D_l}\ln p(\pmb x, y|\Theta) + \sum_{\pmb x\in D_u}\ln p(\pmb x|\Theta)\\[1ex] &=\sum_{(\pmb x, y)\in D_l}\ln \sum_k\alpha_k p(\pmb x|\theta_k)p(y|\theta_k,\pmb x) + \sum_{\pmb x\in D_u}\ln \sum_k\alpha_k p(\pmb x|\theta_k) \end{aligned}$$
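This decomposition is easy to compute directly: each labeled pair $(\pmb x, y)$ collapses the inner sum to its $y$-th term, while each unlabeled $\pmb x$ keeps the full mixture. A minimal 1-D sketch (illustrative interface, not the text's code):

```python
import numpy as np

def log_likelihood(X_l, y_l, X_u, alphas, mus, sigmas):
    """L(Theta | D_l ∪ D_u) for a 1-D Gaussian mixture.
    A labeled pair (x, y) contributes ln[alpha_y N(x; mu_y, sigma_y^2)];
    an unlabeled x contributes ln[sum_k alpha_k N(x; mu_k, sigma_k^2)]."""
    alphas, mus, sigmas = map(np.asarray, (alphas, mus, sigmas))

    def weighted_comps(x):  # alpha_k * N(x; mu_k, sigma_k^2), shape (K,)
        dens = np.exp(-(x - mus) ** 2 / (2 * sigmas ** 2)) / (sigmas * np.sqrt(2 * np.pi))
        return alphas * dens

    ll = sum(np.log(weighted_comps(x)[y]) for x, y in zip(X_l, y_l))
    ll += sum(np.log(weighted_comps(x).sum()) for x in X_u)
    return float(ll)
```

With a single component ($K=1$), a labeled and an unlabeled sample at the same point contribute identically, since the inner sum has only one term.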
Parameter Estimation
The GMM parameters are estimated with the EM algorithm:
$$\Theta = \arg\max_{\Theta} L(\Theta) = \arg\max_{\Theta}Q(\Theta, \Theta_t) =\arg\max_{\Theta}\sum_j\sum_k P(z_k|\pmb x_j,\Theta_t)\ln p(\pmb x_j|z_k,\Theta)p(z_k|\Theta)$$
where the expectation of the latent variable, i.e. the probability that sample $\pmb x_j$ belongs to the $k$-th component, is given by the E step:
$$\lambda_{jk}= p(z_k|\pmb x_j,\Theta_t) =\frac{\alpha_k p(\pmb x_j|\theta_k)}{\sum_{k'}\alpha_{k'} p(\pmb x_j|\theta_{k'})}$$
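In code, the E step is a row-normalization of the weighted component densities; a 1-D sketch (function name and interface are assumptions for illustration):

```python
import numpy as np

def e_step(X_u, alphas, mus, sigmas):
    """lambda_{jk} = alpha_k N(x_j; mu_k, sigma_k^2)
                     / sum_{k'} alpha_{k'} N(x_j; mu_{k'}, sigma_{k'}^2)."""
    X_u, alphas, mus, sigmas = map(np.asarray, (X_u, alphas, mus, sigmas))
    x = X_u[:, None]                              # shape (n, 1) for broadcasting
    dens = np.exp(-(x - mus) ** 2 / (2 * sigmas ** 2)) / (sigmas * np.sqrt(2 * np.pi))
    num = alphas * dens                           # shape (n, K)
    return num / num.sum(axis=1, keepdims=True)   # each row sums to 1
```

A point sitting on a component mean, far from the others, gets a responsibility close to 1 for that component.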
Let $N_k$ denote the number of labeled samples in class $k$; the M step is then
$$\begin{aligned} \pmb\mu_k &=\frac{\sum_{\pmb x_j\in D_u}\lambda_{jk}\pmb x_j+\sum_{(\pmb x_j, y_j)\in D_l\wedge y_j=k}\pmb x_j}{N_k + \sum_{\pmb x_j \in D_u}\lambda_{jk}}\\ \pmb\sigma_k^2 &=\frac{\sum_{\pmb x_j\in D_u}\lambda_{jk}(\pmb x_j-\pmb\mu_k)(\pmb x_j-\pmb\mu_k)^T+\sum_{(\pmb x_j, y_j)\in D_l\wedge y_j=k}(\pmb x_j-\pmb\mu_k)(\pmb x_j-\pmb\mu_k)^T}{N_k + \sum_{\pmb x_j\in D_u}\lambda_{jk}}\\ \alpha_k &= \frac{1}{N}\left(N_k + \sum_{\pmb x_j \in D_u}\lambda_{jk}\right) \end{aligned}$$
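These M-step updates translate directly into code: labeled points of class $k$ enter their class's sums with weight 1, unlabeled points with weight $\lambda_{jk}$. A 1-D sketch on hypothetical data (the interface is an assumption for illustration):

```python
import numpy as np

def m_step(X_l, y_l, X_u, lam, K):
    """Update mu_k, sigma_k^2, alpha_k from labeled data (X_l, y_l),
    unlabeled data X_u, and E-step responsibilities lam (shape (n_u, K))."""
    X_l, y_l, X_u, lam = map(np.asarray, (X_l, y_l, X_u, lam))
    N = len(X_l) + len(X_u)
    mus, vars_, alphas = np.empty(K), np.empty(K), np.empty(K)
    for k in range(K):
        xk = X_l[y_l == k]              # labeled samples of class k (N_k of them)
        w = lam[:, k]                   # responsibilities of the unlabeled samples
        denom = len(xk) + w.sum()       # N_k + sum_j lambda_{jk}
        mus[k] = (w @ X_u + xk.sum()) / denom
        vars_[k] = (w @ (X_u - mus[k]) ** 2 + ((xk - mus[k]) ** 2).sum()) / denom
        alphas[k] = denom / N           # mixing weights sum to 1 over k
    return mus, vars_, alphas
```

Alternating `e_step` and `m_step` until the log-likelihood stops improving gives the full semi-supervised EM loop.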