Pattern Recognition Course Notes: Semi-supervised Learning

(For personal notes only.)


A pattern recognition problem

  • Goal

There is a large amount of "labeled" data online, e.g. tweets tagged with hashtags (#).
Can we use unlabeled data to improve our classifier?

  • labeled data
  • unlabeled data
Some applications

  • image classification (easy to obtain images, e.g. from Flickr)
  • protein function prediction
  • document classification
  • part of speech tagging

Semi-supervised classification

Related problems:

  • semi-supervised regression: similar, but with a continuous outcome measure
  • semi-supervised clustering: using some labels to improve a clustering solution
  • measuring how much the unlabeled data actually helps to improve the classifier

Content

Self-learning

One of the earliest studies on SSL (Hartley & Rao 1968):
• Maximum likelihood, trying all possible labelings (!)
(treating the unknown labels as quantities to estimate leads to an explosion in the number of labelings to try)

More feasible suggestion (McLachlan 1975):
• Start with the supervised solution
• Label the unlabeled objects using this classifier
• Retrain the classifier, treating the predicted labels as true labels

Also known as self-training, self-labeling or pseudo-labeling
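
A minimal sketch of this loop (assuming scikit-learn's LogisticRegression as the base learner and a 0.9 confidence threshold; both are illustrative choices, not part of the lecture):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    """Self-training: start supervised, pseudo-label confident unlabeled
    points, retrain treating the pseudo-labels as true labels."""
    clf = LogisticRegression().fit(X_lab, y_lab)      # supervised start
    for _ in range(max_rounds):
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        confident = proba.max(axis=1) >= threshold    # keep only confident predictions
        if not confident.any():
            break
        pseudo = clf.classes_[proba[confident].argmax(axis=1)]
        X_lab = np.vstack([X_lab, X_unlab[confident]])
        y_lab = np.concatenate([y_lab, pseudo])
        X_unlab = X_unlab[~confident]
        clf = LogisticRegression().fit(X_lab, y_lab)  # retrain on the grown set
    return clf
```

(McLachlan's original scheme labels all unlabeled objects at once; the confidence threshold is a common practical variant.)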

Self-learning ≈ expectation maximization (EM)
  • Linear Discriminant Analysis (LDA):
    $$p(X,y;\theta)=\prod_{i=1}^{L}\left[\pi_0 N(x_i,\mu_0,\Sigma)\right]^{1-y_i}\left[\pi_1 N(x_i,\mu_1,\Sigma)\right]^{y_i}$$
    Both classes share the covariance $\Sigma$; $N(x_i,\mu_c,\Sigma)$ is the Gaussian density for class $c$.
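
For concreteness, the labeled-data log-likelihood can be evaluated as follows (a sketch with illustrative names; SciPy supplies the Gaussian density):

```python
import numpy as np
from scipy.stats import multivariate_normal

def lda_loglik(X, y, pi, mu, Sigma):
    """log p(X, y; theta) for two classes with a shared covariance Sigma.
    pi: (2,) priors, mu: (2, d) means, y: (n,) labels in {0, 1}."""
    dens0 = pi[0] * multivariate_normal.pdf(X, mu[0], Sigma)
    dens1 = pi[1] * multivariate_normal.pdf(X, mu[1], Sigma)
    # each point contributes the density of its own class
    return np.sum(np.log(np.where(y == 0, dens0, dens1)))
```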
  • LDA + unlabeled data: introduce hidden labels $h_i$ for the $U$ unlabeled points $X_u$:

$$p(X,y,X_u,h;\theta)=\prod_{i=1}^{L}\left[\pi_0 N(x_i,\mu_0,\Sigma)\right]^{1-y_i}\left[\pi_1 N(x_i,\mu_1,\Sigma)\right]^{y_i}\times\prod_{i=1}^{U}\left[\pi_0 N(x_i,\mu_0,\Sigma)\right]^{1-h_i}\left[\pi_1 N(x_i,\mu_1,\Sigma)\right]^{h_i}$$

But we do not know $h$… Integrate it out!

$$p(X,y,X_u;\theta)=\int_h p(X,y,X_u,h;\theta)\,dh$$

For LDA + unlabeled data this gives

$$\prod_{i=1}^{L}\left[\pi_0 N(x_i,\mu_0,\Sigma)\right]^{1-y_i}\left[\pi_1 N(x_i,\mu_1,\Sigma)\right]^{y_i}\times\prod_{i=1}^{U}\sum_{c=0}^{1}\pi_c N(x_i,\mu_c,\Sigma)$$

Like LDA + a Gaussian mixture with the same parameters.
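
Jumping slightly ahead (EM is developed below), a minimal sketch of the resulting updates for this model; the responsibilities play the role of soft values for $h$, and initialization and iteration count are my own choices:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_lda(X_l, y, X_u, n_iter=50):
    """EM for LDA + unlabeled data: labeled points keep their hard labels,
    unlabeled points get soft responsibilities for the hidden labels h."""
    X = np.vstack([X_l, X_u])
    # resp[i, c] = p(class c | x_i); fixed to 0/1 for labeled points
    resp = np.zeros((len(X), 2))
    resp[np.arange(len(y)), y] = 1.0
    resp[len(y):] = 0.5                      # uninformative start for unlabeled
    for _ in range(n_iter):
        # M-step: weighted MLE of the priors, means, and shared covariance
        Nc = resp.sum(axis=0)
        pi = Nc / Nc.sum()
        mu = (resp.T @ X) / Nc[:, None]
        Sigma = sum(
            (resp[:, c, None] * (X - mu[c])).T @ (X - mu[c]) for c in (0, 1)
        ) / len(X)
        # E-step: update responsibilities of the unlabeled points only
        dens = np.stack(
            [pi[c] * multivariate_normal.pdf(X_u, mu[c], Sigma) for c in (0, 1)],
            axis=1,
        )
        resp[len(y):] = dens / dens.sum(axis=1, keepdims=True)
    return pi, mu, Sigma
```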

EM algorithm

• The log of a sum makes the optimization difficult
• Change the goal: find a local maximum of this function
EM algorithm: finding a lower bound

The idea: construct a lower bound that touches the objective function exactly at the current estimate, i.e. the best (tightest) lower bound we can get.

Jensen’s inequality

If $f(x)$ is concave, then $f(E[X]) \geq E[f(X)]$. (The logarithm is concave, which is what EM exploits.)

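Applying this to the marginal log-likelihood above, with $q(h)$ an arbitrary distribution over the hidden labels (the standard EM derivation, sketched here for completeness):

$$\log p(X,y,X_u;\theta)=\log\sum_h q(h)\,\frac{p(X,y,X_u,h;\theta)}{q(h)}\;\geq\;\sum_h q(h)\log\frac{p(X,y,X_u,h;\theta)}{q(h)}$$

The bound touches the objective when $q(h)=p(h\mid X,y,X_u;\theta)$, which is exactly the E-step; maximizing the bound over $\theta$ is the M-step.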

Does unlabeled data help?


Consider the generative process as a graph: $\theta_X \rightarrow X$, $X \rightarrow Y$, $\theta_{Y|X} \rightarrow Y$. Unlabeled data carries information only about $\theta_X$, so it can help only if the model ties $\theta_X$ to the parameters $\theta_{Y|X}$ that determine the prediction of $Y$, as a generative model does.

Self-learning and EM conclusions

• For generative models:
  • integrate out the missing variables
  • the difficult optimization problem can often be "solved" efficiently using expectation maximization
  • only guaranteed to improve performance asymptotically, and only if the model is correct
• Self-learning is a closely related technique that is applicable to any classifier
• Related: co-training (multi-view learning)
  • use the labels predicted by the other view(s) as newly labeled objects

Low-density assumption

The assumption: the decision boundary should pass through a region of low data density. For the SVM this leads to the semi-supervised SVM (S3VM/TSVM), which also maximizes the margin with respect to the unlabeled points.

Low-density assumption conclusions
• A "natural" extension for the SVM
• Local minima may be a problem
• Lots of work on optimization
• My experience: quite sensitive to parameter settings
• Other low-density approaches:
  • Entropy regularization (Grandvalet & Bengio 2005)
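
A sketch of the entropy-regularization objective (an illustration under my own naming, not code from the lecture): supervised cross-entropy plus $\lambda$ times the prediction entropy on unlabeled points, so that confident predictions push the boundary away from unlabeled data.

```python
import numpy as np

def entropy_regularized_loss(p_lab, y_lab, p_unlab, lam=0.5, eps=1e-12):
    """Supervised cross-entropy + lam * entropy of unlabeled predictions.
    p_lab: (n_l, k) class probabilities for labeled points
    y_lab: (n_l,) integer labels
    p_unlab: (n_u, k) class probabilities for unlabeled points."""
    ce = -np.log(p_lab[np.arange(len(y_lab)), y_lab] + eps).mean()
    ent = -(p_unlab * np.log(p_unlab + eps)).sum(axis=1).mean()
    return ce + lam * ent   # low entropy <=> confident predictions on unlabeled data
```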

Manifold assumption

The assumption: the data lie near a low-dimensional manifold, and the prediction function should vary smoothly along it.


  • manifold regularization
    • consistency regularization: penalize the disagreement between the model's prediction on $x$ and a teacher model's prediction on a perturbed input $x'$ (see the sketch below):
      $$\Vert f(x;w)-g(x';w^t)\Vert^2$$
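
A sketch of this consistency term, assuming (beyond what the note states) that $x'$ is a noisy copy of $x$ and that $g$ is an exponential-moving-average "teacher" copy of $f$, as in the Mean Teacher approach:

```python
import numpy as np

rng = np.random.default_rng(0)

def consistency_loss(f, g, x, noise_scale=0.1):
    """|| f(x; w) - g(x'; w^t) ||^2, with x' a noisy copy of x."""
    x_prime = x + noise_scale * rng.standard_normal(x.shape)   # perturbation
    return np.mean((f(x) - g(x_prime)) ** 2)

def ema_update(w_teacher, w_student, decay=0.99):
    """Teacher weights w^t track the student by exponential moving average
    (the Mean Teacher variant; the Pi-model would instead use g = f)."""
    return decay * w_teacher + (1 - decay) * w_student
```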

Semi-supervised learning conclusions

• Unlabeled data is often available
• Semi-supervised learning attempts to use it to improve the classifier
• Often worthwhile, but it does not come for free:
  • modeling time
  • computational cost
• Remember: an unlabeled object is less valuable than a labeled one
  • labeling a few more objects can be more effective
• Remember the goal: transductive or inductive?
