Today I read a paper on semi-supervised learning: Using Weighted Nearest Neighbor to Benefit from Unlabeled Data
Here is a summary of the paper:
一. Introduction
1.Why semi-supervised learning is necessary: "where often the unlabeled examples greatly outnumber the labeled examples". Labeled examples are usually far fewer than unlabeled ones, so we can extract useful information from the unlabeled data to improve the classifier's accuracy.
2.The general flow of semi-supervised learning: "The examples from the unlabeled set are 'pre-labeled' by an initial classifier that is built using the limited available training data. By choosing appropriate weights for this pre-labeled data, the nearest neighbor classifier consistently improves on the original classifier." First, train a classifier on the labeled examples; then use this initial classifier to predict labels for the unlabeled examples; finally, choose appropriate weights for the pre-labeled data and use a nearest neighbor classifier to improve on the initial classifier's accuracy.
3."the key to semi-supervised learning is the prior assumption of consistency, that allows for exploiting the geometric structure of the data distribution."
The key to semi-supervised learning is the prior assumption of consistency, which makes it possible to exploit the geometric structure of the data distribution.
4."Close data points should belong to the same class and decision boundaries should lie in regions of low data density; this is also called the 'cluster assumption'."
Points that lie close together should belong to the same class, and decision boundaries should fall in regions of low data density; this is the "cluster assumption".
5.The two-stage approach proposed in the paper: "In this paper, we introduce a very simple two-stage approach that uses the available unlabeled data to improve on the predictions made when learning only from the labeled examples. In a first stage, it uses an off-the-shelf classifier to build a model based on the small amount of available training data, and in the second stage it uses that model [...]" In the first stage, an off-the-shelf classifier is trained on the small labeled set; in the second stage, that model is used (as described in point 2) to pre-label the unlabeled data for the weighted nearest neighbor classifier.
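The two-stage procedure can be sketched in code. This is only a minimal illustration of the idea, not the authors' exact algorithm: the centroid classifier used as the initial (stage-1) model, the toy 1-D data, and the pre-label weight of 0.3 are all hypothetical choices made for the example.

```python
from collections import Counter

def train_initial(labeled):
    """Stage 1: a trivial centroid classifier built from the labeled data.
    `labeled` is a list of (x, y) pairs with a 1-D feature x and class y."""
    sums, counts = {}, Counter()
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] += 1
    centroids = {y: sums[y] / counts[y] for y in sums}
    # Predict the class whose centroid is closest to x.
    return lambda x: min(centroids, key=lambda y: abs(x - centroids[y]))

def weighted_knn(train, weights, x, k=3):
    """Weighted k-NN: each of the k nearest points votes with its weight."""
    nearest = sorted(zip(train, weights), key=lambda tw: abs(tw[0][0] - x))[:k]
    votes = Counter()
    for (xi, yi), w in nearest:
        votes[yi] += w
    return votes.most_common(1)[0][0]

# Tiny illustrative 1-D data set (hypothetical numbers): two clusters.
labeled = [(0.0, 'a'), (1.0, 'a'), (9.0, 'b'), (10.0, 'b')]
unlabeled = [0.5, 1.5, 2.0, 8.0, 8.5, 9.5]

clf = train_initial(labeled)                   # stage 1: initial classifier
prelabeled = [(x, clf(x)) for x in unlabeled]  # "pre-label" the unlabeled data

# Stage 2: labeled points vote with weight 1.0, pre-labeled points with a
# smaller weight (0.3 here is an arbitrary illustrative value).
train = labeled + prelabeled
weights = [1.0] * len(labeled) + [0.3] * len(prelabeled)

print(weighted_knn(train, weights, 2.5))  # query near the 'a' cluster
print(weighted_knn(train, weights, 7.5))  # query near the 'b' cluster
```

With the pre-labeled points included, the nearest neighbor classifier can classify queries like 2.5 and 7.5 using neighbors that the small labeled set alone does not provide; the down-weighting keeps possibly mislabeled pre-labeled points from outvoting true labels.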