The Difference Between Self-Taught Learning and Semi-Supervised Learning

FacingTheSunCN

From http://deeplearning.stanford.edu/wiki/index.php/Self-Taught_Learning

There are two common unsupervised feature learning settings, depending on what type of unlabeled data you have. The more general and powerful setting is the self-taught learning setting, which does not assume that your unlabeled data $x_u$ has to be drawn from the same distribution as your labeled data $x_l$. The more restrictive setting, where the unlabeled data comes from exactly the same distribution as the labeled data, is sometimes called the semi-supervised learning setting. This distinction is best explained with an example, which we now give.

Suppose your goal is a computer vision task where you'd like to distinguish between images of cars and images of motorcycles; so, each labeled example in your training set is either an image of a car or an image of a motorcycle. Where can we get lots of unlabeled data? The easiest way would be to obtain some random collection of images, perhaps downloaded off the internet. We could then train the autoencoder on this large collection of images, and obtain useful features from them. Because here the unlabeled data is drawn from a different distribution than the labeled data (i.e., perhaps some of our unlabeled images may contain cars/motorcycles, but not every image downloaded is either a car or a motorcycle), we call this self-taught learning.

In contrast, if we happen to have lots of unlabeled images lying around that are all images of either a car or a motorcycle, but where the data is just missing its label (so you don't know which ones are cars, and which ones are motorcycles), then we could use this form of unlabeled data to learn the features. This setting---where each unlabeled example is drawn from the same distribution as your labeled examples---is sometimes called the semi-supervised setting. In practice, we often do not have this sort of unlabeled data (where would you get a database of images where every image is either a car or a motorcycle, but just missing its label?), and so in the context of learning features from unlabeled data, the self-taught learning setting is more broadly applicable.
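The two settings differ only in where the unlabeled pool comes from; the pipeline (learn features from unlabeled data, then train a classifier on the labeled data in that feature space) is the same. The following is a minimal sketch under stated assumptions, not the tutorial's implementation: it uses synthetic vectors in place of images, swaps the autoencoder for PCA (a linear feature learner) to stay dependency-light, and uses a nearest-centroid classifier. The names `make_images`, `learn_features`, `run_setting`, and `demo` are hypothetical helpers introduced for illustration.

```python
import numpy as np

def make_images(rng, n, kind, dim=20):
    """Hypothetical stand-in for image vectors: Gaussian noise plus a class offset."""
    x = rng.normal(size=(n, dim))
    if kind == "car":
        x[:, 0] += 3.0
    elif kind == "motorcycle":
        x[:, 0] -= 3.0
    else:  # any other kind of image found on the internet
        x[:, 1] += 4.0
    return x

def learn_features(x_unlabeled, k):
    """Fit a k-dimensional linear feature map using unlabeled data only (PCA via SVD)."""
    mean = x_unlabeled.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(x_unlabeled - mean, full_matrices=False)
    return mean, vt[:k]

def encode(x, mean, components):
    """Project inputs onto the learned feature directions."""
    return (x - mean) @ components.T

def fit_nearest_centroid(features, labels):
    """Store one mean feature vector per class."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(features, classes, centroids):
    """Assign each example to the class with the closest centroid."""
    d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

def run_setting(unlabeled, x_train, y_train, x_test, y_test, k=5):
    """Learn features from the unlabeled pool, then classify using labeled data."""
    mean, comps = learn_features(unlabeled, k)
    classes, centroids = fit_nearest_centroid(encode(x_train, mean, comps), y_train)
    pred = predict(encode(x_test, mean, comps), classes, centroids)
    return float((pred == y_test).mean())

def demo(seed=0):
    rng = np.random.default_rng(seed)
    # Small labeled set: 0 = car, 1 = motorcycle.
    x_train = np.vstack([make_images(rng, 10, "car"), make_images(rng, 10, "motorcycle")])
    y_train = np.array([0] * 10 + [1] * 10)
    x_test = np.vstack([make_images(rng, 200, "car"), make_images(rng, 200, "motorcycle")])
    y_test = np.array([0] * 200 + [1] * 200)

    # Self-taught: random images, only some of which are cars or motorcycles.
    self_taught_pool = np.vstack([
        make_images(rng, 100, "car"),
        make_images(rng, 100, "motorcycle"),
        make_images(rng, 400, "other"),
    ])
    # Semi-supervised: same distribution as the labeled data, labels missing.
    semi_supervised_pool = np.vstack([
        make_images(rng, 300, "car"),
        make_images(rng, 300, "motorcycle"),
    ])
    return {
        "self_taught": run_setting(self_taught_pool, x_train, y_train, x_test, y_test),
        "semi_supervised": run_setting(semi_supervised_pool, x_train, y_train, x_test, y_test),
    }

print(demo())
```

Only the construction of the unlabeled pool changes between the two calls to `run_setting`, which mirrors the distinction drawn above. The synthetic data is deliberately easy, so the printed accuracies say nothing about real image tasks.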

The figure below is from Andrew Ng's ECCV 2010 tutorial, Feature Learning for Image Classification (http://ufldl.stanford.edu/eccv10-tutorial/).
