Paper notes: Co-Forest (a classic 2007 paper on semi-supervised co-training)

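Paper title: Improve Computer-Aided Diagnosis With Machine Learning Techniques Using Undiagnosed Samples
Strengths: several of the concepts were fairly new at the time.
Weaknesses (from today's perspective): the datasets are too small and the method is fairly simple.
It can serve as a comparison algorithm for our own work.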

  1. Why semi-supervised?
    Unlabeled data can help improve the performance of the learner.
  2. Why co-training?
    Different views have their own pros and cons.
  3. How to create different views?
    Natural (image, text explanation)
    Feature selection (100 -> 30, 25, 28)
    What if a view is insufficient? It does not matter. It depends on your objective: theoretical support (ICML), or performance only (ICDM)?
  4. Why ensemble?
    4.1 Why bagging? Simply construct a number of classifiers and vote.
    4.2 Why boosting? Theoretical support, e.g. AdaBoost.
  5. Why CV data sampling?
    It slightly changes the data distribution. More importantly, it removes some unwanted samples (outliers).
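
The "100 -> 30, 25, 28" idea in item 3 can be sketched as follows. This is a minimal illustration, not the paper's procedure: the notes do not say how the features are chosen, so random selection is an assumption here, and `make_views` and `project` are hypothetical helper names.

```python
import random

def make_views(n_features, view_sizes, seed=0):
    """Build one view per requested size.

    Each view is a sorted list of distinct feature indices.
    Random selection is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), size)) for size in view_sizes]

def project(sample, view):
    """Keep only the features of `sample` that belong to `view`."""
    return [sample[i] for i in view]

# 100 original features reduced to three views of 30, 25 and 28 features.
views = make_views(100, [30, 25, 28])
sample = list(range(100))            # toy instance: feature i has value i
projected = [project(sample, v) for v in views]
```

Each classifier would then be trained on its own projected copy of the data. The views may overlap, since each one is drawn independently.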

Basic ideas:

  1. From one dataset to multiple datasets.
    Each dataset serves one classifier.
  2. From one classifier to multiple classifiers.
    2.1 Use different samples.
    2.2 Use different views of the same samples.
  3. How to integrate (ensemble) different classifiers?
    3.1 During training. Select and label unlabeled samples for each other.
    3.2 After prediction. Simple voting or weighted voting.
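
Simple and weighted voting (item 3.2) can be sketched in a few lines. The function name `weighted_vote` and the example weights are assumptions for illustration; the notes do not specify how the weights are obtained.

```python
from collections import defaultdict

def weighted_vote(predictions, weights=None):
    """Return (winning label, its share of the total weight).

    With weights=None every classifier counts equally (simple voting).
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / sum(weights)

# Three classifiers vote on one instance.
simple = weighted_vote(["pos", "neg", "neg"])
weighted = weighted_vote(["pos", "neg", "neg"], weights=[0.9, 0.3, 0.3])
```

With equal weights the majority label "neg" wins; with the example weights the single strong "pos" vote wins. The returned share could also serve item 3.1: during training, the other classifiers might pass an unlabeled sample and its voted label to a classifier only when the share exceeds some threshold. The threshold value is a design choice that these notes do not fix.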

Sampling strategies:

  1. Random sampling with replacement. Repeating it enough times gives good results.
  2. Cross validation. Partition the data into 10 parts and use 9 of them each time.
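
Both strategies can be sketched with the standard library. The helper names `bootstrap_sample` and `k_fold_training_sets` are hypothetical, and the interleaved split is only one possible way to form the 10 parts.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]

def k_fold_training_sets(data, k=10):
    """Split data into k parts; yield k training sets, each omitting one part."""
    folds = [data[i::k] for i in range(k)]
    for held_out in range(k):
        yield [x for i, fold in enumerate(folds) if i != held_out for x in fold]

rng = random.Random(0)
data = list(range(100))

boot = bootstrap_sample(data, rng)                   # strategy 1
train_sets = list(k_fold_training_sets(data, k=10))  # strategy 2
```

A bootstrap sample has the same size as the original data but usually contains duplicates and misses some items. Each cross-validation training set here holds 90 of the 100 items.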

Confidence of an unlabeled instance:

  1. If the classifier is an SVM, the distance to the classification hyperplane.
  2. If the classifier is a decision tree, the purity of the leaf node.
  3. If the classifier is kNN, the purity of its neighbors.
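
The three confidence measures can be sketched as follows. This is a minimal illustration under assumptions: purity is taken to be the fraction of the majority label, the kNN distance is Euclidean, and the hyperplane is given explicitly as `w` and `b`. All function names are hypothetical.

```python
import math
from collections import Counter

def purity(labels):
    """Fraction of labels that agree with the majority label.

    Applies to the training labels in a decision-tree leaf
    or to the labels of the k nearest neighbors.
    """
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

def knn_confidence(query, points, labels, k=3):
    """Purity of the labels of the k training points nearest to `query`."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return purity([labels[i] for i in order[:k]])

def hyperplane_distance(x, w, b):
    """Distance from x to the hyperplane w.x + b = 0 (linear SVM case)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))

leaf_conf = purity(["a", "a", "b"])
knn_conf = knn_confidence((0, 0), [(0, 1), (1, 0), (0, 2), (5, 5)],
                          ["a", "a", "b", "b"], k=3)
svm_conf = hyperplane_distance((3, 4), w=(1, 0), b=0)
```

The purity values lie in (0, 1], while the hyperplane distance is unbounded, so the measures are not directly comparable across classifier types.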