Semi-Supervised Learning

u200710

Original link: https://scikit-learn.org/stable/modules/label_propagation.html

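Semi-supervised learning refers to the situation where some of the samples in your training data are unlabeled. The estimators in sklearn.semi_supervised can use this additional unlabeled data to better capture the shape of the underlying data distribution and to generalize better to new samples. These algorithms can perform well when only a small fraction of the data is labeled and the large majority is unlabeled.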
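As a minimal sketch of this idea, the snippet below marks unlabeled samples with -1, which is the convention the estimators in sklearn.semi_supervised expect, and lets LabelPropagation infer the missing labels. The six one-dimensional points and their two clusters are made up for illustration.

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Toy data (hypothetical): two well-separated clusters on a line
X_toy = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])

# Only one sample per cluster is labeled; -1 means "unlabeled"
y_toy = np.array([0, -1, -1, 1, -1, -1])

model = LabelPropagation()  # uses the default RBF kernel
model.fit(X_toy, y_toy)

# transduction_ holds the label assigned to every training sample,
# including the ones that were passed in as -1
print(model.transduction_)
```

Each unlabeled point should end up with the class of the labeled point in its own cluster, because the similarity between the two clusters is negligible compared with the similarity inside a cluster.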

Label Propagation

Label propagation denotes a few variations of semi-supervised graph inference algorithms.

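A few features available in this model:

- Used for classification tasks (LabelPropagation and LabelSpreading are both classifiers)
- Kernel methods to project data into alternate dimensional spaces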
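To illustrate the kernel choice, here is a small sketch that fits LabelSpreading on Iris with each of the two built-in kernels, `rbf` and `knn`. The values `gamma=20` and `n_neighbors=7` are the library defaults written out explicitly; the random seed and the 50% masking rate are arbitrary assumptions, not tuned settings.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)

# Hide roughly half of the labels (assumed rate, for illustration only)
rng = np.random.RandomState(42)
hidden = rng.rand(len(y)) < 0.5
y_partial = np.where(hidden, -1, y)

# rbf: dense, fully connected similarity graph controlled by gamma
rbf_model = LabelSpreading(kernel="rbf", gamma=20).fit(X, y_partial)

# knn: sparse graph that links each sample to its n_neighbors nearest samples
knn_model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y_partial)

for name, model in (("rbf", rbf_model), ("knn", knn_model)):
    # Compare the inferred labels with the true labels that were hidden
    accuracy = np.mean(model.transduction_[hidden] == y[hidden])
    print(f"{name}: accuracy on hidden labels = {accuracy:.3f}")
```

The `rbf` kernel builds a dense matrix over all pairs of samples, so on larger datasets the sparse `knn` graph is usually the cheaper option in memory and time.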

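scikit-learn provides two label propagation models: LabelPropagation and LabelSpreading. They differ in how they modify the similarity matrix. LabelPropagation uses the raw similarity matrix constructed from the data, with no modifications. In contrast, LabelSpreading minimizes a loss function that includes a regularization term, which makes it more robust to noise.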
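One way to observe this difference is to corrupt a few labels on purpose. The sketch below is an illustration under assumptions: the seed, the number of flipped labels, and `alpha=0.8` are arbitrary choices. In scikit-learn, LabelPropagation hard-clamps the labels it is given, whereas the `alpha` parameter of LabelSpreading sets how much label information a sample adopts from its neighbors instead of keeping its initial label.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelPropagation, LabelSpreading

X, y = load_iris(return_X_y=True)
rng = np.random.RandomState(0)

# Keep roughly half of the labels, hide the rest
labeled = rng.rand(len(y)) < 0.5
y_train = np.where(labeled, y, -1)

# Simulated label noise (assumption): move 10 given labels to the next class
noisy_idx = rng.choice(np.flatnonzero(labeled), size=10, replace=False)
y_train[noisy_idx] = (y_train[noisy_idx] + 1) % 3

propagation = LabelPropagation().fit(X, y_train)
# alpha=0.8 is a fairly high value chosen for illustration (default is 0.2)
spreading = LabelSpreading(alpha=0.8).fit(X, y_train)


def fraction_kept(model):
    """Share of the given (possibly noisy) labels the model left unchanged."""
    return np.mean(model.transduction_[labeled] == y_train[labeled])


print("LabelPropagation keeps:", fraction_kept(propagation))
print("LabelSpreading keeps:  ", fraction_kept(spreading))
```

LabelPropagation keeps every given label by construction, noisy or not. LabelSpreading may overrule some of them when the neighboring samples disagree, which is where its robustness to label noise comes from.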

# Decision boundary of label spreading versus SVM on the Iris dataset

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm
from sklearn.semi_supervised import LabelSpreading

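# Fixed seed so that the random masking of labels is reproducible
rng = np.random.RandomState(0)

iris = datasets.load_iris()

# Keep only the first two features (sepal length and sepal width)
# so that the decision regions can be drawn in 2D
X = iris.data[:, :2]
y = iris.target

# Step size of the mesh used to draw the decision regions
h = 0.02

# Mark randomly chosen samples as unlabeled (-1):
# about 30% of the labels are hidden in y_30, about 50% in y_50
y_30 = np.copy(y)
y_30[rng.rand(len(y)) < 0.3] = -1
y_50 = np.copy(y)
y_50[rng.rand(len(y)) < 0.5] = -1

# Fit each model and keep it together with the labels it was trained on;
# the SVM serves as a fully supervised baseline
ls30 = (LabelSpreading().fit(X, y_30), y_30)
ls50 = (LabelSpreading().fit(X, y_50), y_50)
ls100 = (LabelSpreading().fit(X, y), y)
rbf_svc = (svm.SVC(kernel='rbf', gamma=.5).fit(X, y), y)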

x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

titles = ['Label Spreading, 30% unlabeled',
          'Label Spreading, 50% unlabeled',
          'Label Spreading, 0% unlabeled',
          'SVC with rbf kernel']

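# RGB colors for the scatter points; -1 (unlabeled) is drawn in white
color_map = {-1: (1, 1, 1), 0: (0, 0, .9), 1: (1, 0, 0), 2: (.8, .6, 0)}

for i, (clf, y_train) in enumerate((ls30, ls50, ls100, rbf_svc)):
    plt.subplot(2, 2, i + 1)
    # Predict a class for every point of the mesh
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Reshape the predictions to the mesh and draw the decision regions
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
    plt.axis('off')

    # Plot the training points, colored by the label seen during training
    colors = [color_map[label] for label in y_train]
    plt.scatter(X[:, 0], X[:, 1], c=colors, edgecolors='black')

    plt.title(titles[i])

plt.suptitle("Unlabeled points are colored white", y=0.1)
plt.show()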
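# The decision regions above come from calling predict on mesh points that were not in the training set. The same call works for any new sample. The following is a minimal sketch, assuming two made-up measurements (sepal length, sepal width), that also shows the per-class probabilities behind each prediction:

```python
import numpy as np
from sklearn import datasets
from sklearn.semi_supervised import LabelSpreading

iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

# Same masking scheme as in the script: about 30% of the labels are hidden
rng = np.random.RandomState(0)
y_30 = np.copy(y)
y_30[rng.rand(len(y)) < 0.3] = -1

clf = LabelSpreading().fit(X, y_30)

# Two hypothetical new flowers: (sepal length, sepal width) in cm
X_new = np.array([[5.0, 3.5], [6.5, 3.0]])

pred = clf.predict(X_new)         # one class index per new sample
proba = clf.predict_proba(X_new)  # one row of class probabilities per sample

for features, label, p in zip(X_new, pred, proba):
    print(features, "->", iris.target_names[label], np.round(p, 3))
```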