Pattern Recognition course notes - Semi-supervised Learning

Latest recommended article published on 2021-04-02 17:53:28

One__Way

Category: Course notes. Tags: Semi-supervised Learning

Article link: https://blog.csdn.net/wangweiwells/article/details/89242616

For personal note-taking only.

A pattern recognition problem

goal

There is a large amount of “labeled” data online, e.g. tweets tagged with a hashtag (#).
Can we use unlabeled data to improve our classifier?

labeled data
unlabeled data

-some applications

image classification (images are easy to obtain, e.g. from Flickr)
protein function prediction
document classification
part of speech tagging

-semi-supervised classification

similar, but with a continuous outcome measure
using some labels to improve a clustering solution
measure how much the unlabeled data helps to improve the classifier

Content

self-learning

One of the earliest studies on SSL (Hartley & Rao 1968):
• Maximum likelihood trying all possible labelings (!)
(the problem with this approach is that the number of possible labelings explodes as the number of unlabeled objects grows)

More feasible suggestion (McLachlan 1975):
• Start with supervised solution
• Label unlabeled objects using this classifier
• Retrain classifier treating labels as true labels

Also known as self-training, self-labeling or pseudo-labeling
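
The three steps above can be sketched in a few lines. This is a minimal illustration only, assuming 1-D features and a nearest-mean classifier as the base learner; the function names `fit_means`, `predict` and `self_train` are hypothetical and not from the course.

```python
def fit_means(xs, ys):
    """Fit a nearest-mean classifier: one mean per class (0 and 1)."""
    means = []
    for c in (0, 1):
        members = [x for x, y in zip(xs, ys) if y == c]
        means.append(sum(members) / len(members))
    return means

def predict(means, xs):
    """Assign each point to the class with the closest mean."""
    return [0 if abs(x - means[0]) <= abs(x - means[1]) else 1 for x in xs]

def self_train(x_lab, y_lab, x_unl, max_iter=20):
    """Self-learning loop (assumes both classes occur in the labeled set)."""
    means = fit_means(x_lab, y_lab)            # 1. start with supervised solution
    pseudo = None
    for _ in range(max_iter):
        new_pseudo = predict(means, x_unl)     # 2. label the unlabeled objects
        if new_pseudo == pseudo:               # stop once the labels are stable
            break
        pseudo = new_pseudo
        # 3. retrain, treating the pseudo-labels as if they were true labels
        means = fit_means(x_lab + x_unl, y_lab + pseudo)
    return means, pseudo
```

A call such as `self_train([0.0, 4.0], [0, 1], [-1.0, 0.5, 1.0, 3.0, 3.5, 5.0])` returns the retrained class means together with the final pseudo-labels. Any classifier could replace the nearest-mean rule here.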

self-learning $\approx$ EXPECTATION MAXIMIZATION

Linear Discriminant Analysis (LDA)
$p(X,y;\theta)=\prod_{i=1}^L[\pi_0N(x_i,\mu_0,\Sigma)]^{1-y_i}[\pi_1N(x_i,\mu_1,\Sigma)]^{y_i}$
Both classes share the covariance $\Sigma$
$N(x_i,\mu_c,\Sigma)$: one Gaussian for each class $c$
LDA + unlabeled data

$p(X,y,X_u,h;\theta)=\prod_{i=1}^L[\pi_0N(x_i,\mu_0,\Sigma)]^{1-y_i}[\pi_1N(x_i,\mu_1,\Sigma)]^{y_i}\\ \times\prod_{i=1}^u[\pi_0N(x_i,\mu_0,\Sigma)]^{1-h_i}[\pi_1N(x_i,\mu_1,\Sigma)]^{h_i}$
But we do not know h… Integrate it out!
$p(X, y, X_u; \theta) = \int_h p(X, y, X_u, h; \theta)dh$
LDA + unlabeled data
$p(X,y,X_u;\theta)=\prod_{i=1}^L[\pi_0N(x_i,\mu_0,\Sigma)]^{1-y_i}[\pi_1N(x_i,\mu_1,\Sigma)]^{y_i}\\ \times\prod_{i=1}^u\sum^1_{c=0}\pi_cN(x_i,\mu_c,\Sigma)$
Like LDA + a gaussian mixture with the same parameters
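
To make the objective concrete, here is a small sketch that evaluates the log of this marginal likelihood. It assumes 1-D features, so the shared covariance $\Sigma$ reduces to a single variance `var`; the names `normal_pdf` and `semi_supervised_loglik` are illustrative, not from the course.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def semi_supervised_loglik(x_lab, y_lab, x_unl, pi, mu, var):
    """Log marginal likelihood: labeled LDA term plus unlabeled mixture term.

    pi and mu are length-2 lists indexed by class; var is the shared variance.
    """
    ll = 0.0
    # Labeled objects: only the factor of the observed class y contributes.
    for x, y in zip(x_lab, y_lab):
        ll += math.log(pi[y] * normal_pdf(x, mu[y], var))
    # Unlabeled objects: the hidden label is summed out, giving a log of a sum.
    for x in x_unl:
        ll += math.log(sum(pi[c] * normal_pdf(x, mu[c], var) for c in (0, 1)))
    return ll
```

The second loop contains the log of a sum, which is the term that has no closed-form maximizer.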

EM algorithm

• Log sum makes optimization difficult
• Change goal: find a local maximum of this function
EM algorithm: finding a lower bound

What we want is to construct a lower bound that touches the objective function exactly at the current parameter estimate, and then maximize that bound to get the best next estimate we can.
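
The resulting alternation can be sketched as follows for the LDA model with unlabeled data. This is a minimal illustration, assuming 1-D features, two classes and a shared variance; the initialisation (every unlabeled object starts half in each class) and all function names are assumptions made here, not part of the course material.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def log_likelihood(x_lab, y_lab, x_unl, pi, mu, var):
    """Labeled LDA term plus the mixture term for the unlabeled objects."""
    ll = sum(math.log(pi[y] * normal_pdf(x, mu[y], var))
             for x, y in zip(x_lab, y_lab))
    ll += sum(math.log(sum(pi[c] * normal_pdf(x, mu[c], var) for c in (0, 1)))
              for x in x_unl)
    return ll

def m_step(xs, r):
    """Weighted ML estimates; r[i] is the responsibility of class 1 for xs[i]."""
    n = len(xs)
    n1 = sum(r)
    n0 = n - n1
    mu0 = sum((1 - ri) * x for x, ri in zip(xs, r)) / n0
    mu1 = sum(ri * x for x, ri in zip(xs, r)) / n1
    var = sum((1 - ri) * (x - mu0) ** 2 + ri * (x - mu1) ** 2
              for x, ri in zip(xs, r)) / n
    return [n0 / n, n1 / n], [mu0, mu1], var

def e_step(x_unl, pi, mu, var):
    """Posterior probability of class 1 for each unlabeled object."""
    r = []
    for x in x_unl:
        p0 = pi[0] * normal_pdf(x, mu[0], var)
        p1 = pi[1] * normal_pdf(x, mu[1], var)
        r.append(p1 / (p0 + p1))
    return r

def em_lda(x_lab, y_lab, x_unl, n_iter=50):
    """Alternate M- and E-steps; returns parameters and log-likelihood trace."""
    xs = x_lab + x_unl
    r_lab = [float(y) for y in y_lab]   # known labels: responsibilities fixed
    r_unl = [0.5] * len(x_unl)          # assumed start: undecided on unlabeled
    history = []
    for _ in range(n_iter):
        pi, mu, var = m_step(xs, r_lab + r_unl)
        history.append(log_likelihood(x_lab, y_lab, x_unl, pi, mu, var))
        r_unl = e_step(x_unl, pi, mu, var)
    return pi, mu, var, history
```

The `history` list should never decrease from one iteration to the next, which is the practical sign that each step maximizes a valid lower bound. Replacing the soft responsibilities in `e_step` by hard 0/1 assignments gives a self-learning style update.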

Jensen’s inequality

If $f(x)$ is concave, then $f(E[X]) \geq E[f(X)]$
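
Applied to the unlabeled term of the log-likelihood, Jensen's inequality gives the lower bound used by EM. The following is a sketch in which $q_i(c)$ is an auxiliary distribution over the class of the unlabeled object $x_i$; the symbol $q_i$ is introduced here for illustration and does not appear in the original notes. Since $\log$ is concave:

$$\log\sum_{c=0}^{1}\pi_cN(x_i,\mu_c,\Sigma)=\log\sum_{c=0}^{1}q_i(c)\frac{\pi_cN(x_i,\mu_c,\Sigma)}{q_i(c)}\geq\sum_{c=0}^{1}q_i(c)\log\frac{\pi_cN(x_i,\mu_c,\Sigma)}{q_i(c)}$$

The bound is tight when $q_i(c)$ equals the posterior of class $c$ given $x_i$ under the current parameters, which is what the E-step computes. Maximizing the right-hand side over $\theta$ with $q_i$ fixed is the M-step.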


Does unlabeled data help?

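$\theta_X \rightarrow X \rightarrow Y \leftarrow \theta_{Y|X}$

In this graphical model, unlabeled data only tells us about $\theta_X$; it can help to predict $Y$ only if $\theta_X$ and $\theta_{Y|X}$ are related.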

Self-learning and EM conclusions

• For generative models:
• Integrate out the missing variables
• Difficult optimization problem can often be “solved” efficiently using expectation maximization
• Only guaranteed to improve performance asymptotically, if the model is correct
• Self-learning is a closely related technique that is applicable to any classifier
• Related: co-training (multi-view learning)
• Use labels predicted by other view(s) as newly labeled objects

Low-density assumption

Low-density assumption conclusion
• “Natural” extension for the SVM
• Local minima may be a problem
• Lots of work on optimization
• My experience: quite sensitive to parameter settings
• Other low-density approaches:
• Entropy Regularization (Grandvalet & Bengio 2005)
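
Entropy regularization penalizes uncertain predictions on the unlabeled objects, which pushes the decision boundary away from dense regions. Below is a minimal sketch of the penalty term only, assuming the classifier already outputs class posteriors; the function names and the weighting of the term in the full training objective are not specified in these notes and are assumptions.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of one predicted class distribution p."""
    return -sum(pc * math.log(pc) for pc in p if pc > 0)

def entropy_regularizer(probs_unl):
    """Average prediction entropy over the unlabeled objects.

    probs_unl is a list of class-posterior vectors, one per unlabeled object.
    This value would be added, with some weight, to the supervised loss.
    """
    return sum(entropy(p) for p in probs_unl) / len(probs_unl)
```

Confident predictions such as `[0.99, 0.01]` give a small penalty, while `[0.5, 0.5]` gives the maximum for two classes.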

manifold assumption


manifold regularization
-consistency regularization

$\Vert f(x;w)-g(x';w^t) \Vert^2$
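
A hedged sketch of how this penalty could be computed on unlabeled data. The notes do not say what $f$, $g$ or the perturbation $x'$ are, so this example assumes a logistic model for both $f$ and $g$, additive Gaussian noise for $x'$, and treats $w^t$ as a fixed second set of weights; all names are hypothetical.

```python
import math
import random

def predict(x, w):
    """Logistic model: probability of class 1 for feature vector x."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def perturb(x, rng, scale):
    """Assumed perturbation: add Gaussian noise to every feature."""
    return [xi + rng.gauss(0.0, scale) for xi in x]

def consistency_loss(x_unl, w, w_t, rng, scale=0.1):
    """Mean squared difference between f(x; w) and g(x'; w_t).

    No labels are needed: the loss only asks that the prediction on x
    and the prediction on its perturbed copy x' agree.
    """
    total = 0.0
    for x in x_unl:
        x_prime = perturb(x, rng, scale)
        total += (predict(x, w) - predict(x_prime, w_t)) ** 2
    return total / len(x_unl)
```

For example, `consistency_loss([[1.0, 2.0]], [0.3, -0.2], [0.3, -0.2], random.Random(0))` measures how much the prediction changes under the noise alone, since both weight vectors are equal.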

Semi-Supervised Conclusion

• Unlabeled data is often available
• Semi-supervised learning attempts to use it to improve the classifier
• Often worthwhile, but it does not come for free
• Modeling time
• Computational cost
• Remember: an unlabeled object is less valuable than a labeled one
• Labeling a few more objects can be more effective
• Remember the goal: transductive or inductive?
