[Paper Reading] Self-paced Multi-view Co-training

Paper download
bib:

@ARTICLE{MaMeng2020SPamCo,
title 		= {Self-Paced Multi-View Co-Training},
author 		= {Fan Ma and Deyu Meng and Xuanyi Dong and Yi Yang},
journal 	= {J. Mach. Learn. Res.},
year 		= {2020},
volume 		= {21},
number 		= {1},
pages 	    = {1--38}
}

1. Abstract

Co-training is a well-known semi-supervised learning approach which trains classifiers on two or more different views and exchanges pseudo labels of unlabeled instances in an iterative way.

The opening sentence states the topic in one line (the standard formulaic opener).

During the co-training process, pseudo labels of unlabeled instances are very likely to be false especially in the initial training, while the standard co-training algorithm adopts a “draw without replacement” strategy and does not remove these wrongly labeled instances from training stages.

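This points out the first shortcoming of existing methods: the initial pseudo-labels are of poor quality, and existing methods never replace (update) pseudo-labels assigned earlier. Notably, a paper usually raises only one shortcoming; this one raises three, which also implies more contributions.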

Besides, most of the traditional co-training approaches are implemented for two-view cases, and their extensions in multi-view scenarios are not intuitive.

These issues not only degenerate their performance as well as available application range but also hamper their fundamental theory.

Second shortcoming: most existing methods target two views and do not extend intuitively to more views.

Moreover, there is no optimization model to explain the objective a co-training process manages to optimize.

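Third shortcoming: there is no optimization model that explains which objective a co-training process is actually trying to optimize.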

To address these issues, in this study we design a unified self-paced multi-view co-training (SPamCo) framework which draws unlabeled instances with replacement.
Two specified co-regularization terms are formulated to develop different strategies for selecting pseudo-labeled instances during training.

The proposed scheme addresses the first shortcoming: pseudo-labels assigned in earlier rounds can be replaced (draws unlabeled instances with replacement).
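The difference between the two drawing strategies can be illustrated with a minimal sketch. The helper names `select_without_replacement` and `select_with_replacement`, the confidence scores, and the per-round budget `k` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np


def select_without_replacement(selected, scores, k):
    """Standard co-training style: add the k most confident samples that are
    not selected yet; samples selected earlier are never dropped."""
    candidates = np.where(~selected)[0]
    top = candidates[np.argsort(-scores[candidates])[:k]]
    new_selected = selected.copy()
    new_selected[top] = True
    return new_selected


def select_with_replacement(scores, k):
    """Re-draw the k most confident samples from the whole unlabeled pool,
    so an earlier pick can be dropped when its confidence falls."""
    selected = np.zeros(scores.shape[0], dtype=bool)
    selected[np.argsort(-scores)[:k]] = True
    return selected


# Round 1: sample 0 looks confident. Round 2: the model no longer trusts it.
scores_round1 = np.array([0.9, 0.2, 0.8, 0.1])
scores_round2 = np.array([0.1, 0.7, 0.9, 0.2])

kept = select_without_replacement(np.zeros(4, dtype=bool), scores_round1, 2)
kept = select_without_replacement(kept, scores_round2, 1)
redrawn = select_with_replacement(scores_round2, 3)

print(kept)     # sample 0 stays selected although its score dropped
print(redrawn)  # sample 0 has been dropped again
```

In this toy run both strategies end up with three pseudo-labeled samples, but only the second one is able to discard the early, now doubtful pick.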

Both forms share the same optimization strategy which is consistent with the iteration process in co-training and can be naturally extended to multi-view scenarios.

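This addresses the second shortcoming: the method extends naturally to multiple views (it is not limited to two). It also implicitly addresses the third shortcoming (optimization strategy).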

A distributed optimization strategy is also introduced to train the classifier of each view in parallel to further improve the efficiency of the algorithm.

An additional parallel optimization scheme.

Furthermore, the SPamCo algorithm is proved to be PAC learnable, supporting its theoretical soundness.

Experiments conducted on synthetic, text categorization, person re-identification, image recognition and object detection data sets substantiate the superiority of the proposed method.

2. Algorithm Description

SPamCo: optimization problem
$$\min_{\Theta, \bm{V}, \widetilde{\bm{Y}}} \sum_{j=1}^{M}\left(\sum_{i=1}^{N_l}\ell_i^{(j)} + \sum_{i=N_l+1}^{N_l+N_u}\left(v_i^{(j)}\ell_i^{(j)} + f(v_i^{(j)},\lambda^{(j)})\right) + \mathcal{R}(\Theta)\right) + \mathcal{R}(\bm{V}) \tag{1}$$

Self-paced Regularization term:

$$f(v_i^{(j)},\lambda^{(j)}) = -\lambda^{(j)}v_i^{(j)} \tag{2}$$
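Without any co-regularization, the per-sample part of the objective in $v_i^{(j)}$ is $v_i^{(j)}\ell_i^{(j)} - \lambda^{(j)}v_i^{(j)}$, which is linear in $v_i^{(j)}$, so over $[0, 1]$ its minimum is attained at 1 when the loss is below $\lambda^{(j)}$ and at 0 otherwise. A minimal sketch of this selection rule follows; the function name `self_paced_weights` and the loss values are made up for illustration.

```python
import numpy as np


def self_paced_weights(losses, lam):
    """Element-wise minimizer of v * loss - lam * v over v in [0, 1]:
    select a sample (v = 1) only if its loss is below the age parameter."""
    return (np.asarray(losses) < lam).astype(float)


losses = np.array([0.1, 0.5, 0.9])
easy_only = self_paced_weights(losses, lam=0.3)
more_samples = self_paced_weights(losses, lam=0.6)

print(easy_only)     # [1. 0. 0.]
print(more_samples)  # [1. 1. 0.]
```

Raising `lam` admits harder samples, which is how the pace of learning is controlled.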

Co-Regularization Term:

  • hard:

$$\mathcal{R}_h(\bm{V}) = -\gamma\sum_{p<q}(\bm{v}^{(p)})^{\mathsf{T}}\bm{v}^{(q)}$$

$$v_i^{(j)*} = \begin{cases} 1, & \ell_i^{(j)} < \lambda_c^{(j)} + \gamma\sum_{p\neq j} v_i^{(p)}; \\ 0, & \text{otherwise}. \end{cases}$$
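The hard rule can be read as a loss threshold that rises by $\gamma$ for every other view that has already selected the sample. The following is a minimal sketch of that rule; the function name `hard_update`, the array layout, and the numbers are assumptions made for illustration.

```python
import numpy as np


def hard_update(losses_j, lam_j, gamma, v_others):
    """Hard co-regularized selection for view j.

    losses_j : (N,) losses of view j on the unlabeled samples
    v_others : (M - 1, N) binary selections of the other views
    """
    threshold = lam_j + gamma * np.sum(v_others, axis=0)
    return (losses_j < threshold).astype(float)


losses = np.array([0.2, 0.6, 0.6, 1.5])
v_other = np.array([[0.0, 0.0, 1.0, 1.0]])  # one other view (M = 2)
v = hard_update(losses, lam_j=0.5, gamma=0.3, v_others=v_other)

print(v)  # [1. 0. 1. 0.]
```

Samples 1 and 2 have the same loss, but only sample 2 is selected, because the other view has already picked it and thereby raised its threshold from 0.5 to 0.8.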

  • soft:

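$$\mathcal{R}_s(\bm{V}) = \gamma\sum_{p<q}(\bm{v}^{(p)}-\bm{v}^{(q)})^{\mathsf{T}}(\bm{v}^{(p)}-\bm{v}^{(q)})$$
The key difference from the hard regularizer: under the hard regularizer a sample is either selected or not, and once another view has selected a sample, it becomes more likely to be selected in the current view as well. Under the soft regularizer, $v_i^{(j)}\in [0, 1]$ is a real number between 0 and 1, and the regularizer pulls the selection weights of the different views toward each other (similar to a mean squared error). Since Eq. (1) is minimized, a positive weight $\gamma$ on the squared difference is what penalizes disagreement between views.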

$$v_i^{(j)*} = \begin{cases} 0, & \ell_i^{(j)} \geq \lambda_c^{(j)} + \gamma\sum_{p\neq j} v_i^{(p)}; \\ 1, & \ell_i^{(j)} \leq \lambda_c^{(j)} + \gamma\sum_{p\neq j}(v_i^{(p)}-1); \\ \frac{1}{M-1}\left(\sum_{p\neq j} v_i^{(p)} + \frac{\lambda_c^{(j)}-\ell_i^{(j)}}{\gamma}\right), & \text{otherwise}. \end{cases}$$
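The three cases of the soft rule collapse into one expression: compute the interior value and clip it to $[0, 1]$, because the interior value is at most 0 exactly when the first condition holds and at least 1 exactly when the loss is at most $\lambda_c^{(j)} + \gamma\sum_{p\neq j}(v_i^{(p)}-1)$. The sketch below assumes this reading; the function name `soft_update` and the numbers are illustrative only.

```python
import numpy as np


def soft_update(losses_j, lam_j, gamma, v_others):
    """Soft co-regularized weights for view j.

    losses_j : (N,) losses of view j on the unlabeled samples
    v_others : (M - 1, N) weights in [0, 1] from the other views
    """
    n_other = v_others.shape[0]  # M - 1
    interior = (np.sum(v_others, axis=0) + (lam_j - losses_j) / gamma) / n_other
    # Clipping reproduces the two boundary cases of the piecewise rule.
    return np.clip(interior, 0.0, 1.0)


losses = np.array([0.2, 0.6, 0.6, 1.5])
v_other = np.array([[0.0, 0.0, 1.0, 0.5]])  # one other view (M = 2)
v = soft_update(losses, lam_j=0.5, gamma=0.5, v_others=v_other)

print(np.round(v, 3))
```

Unlike the hard rule, the result is a graded weight: a low-loss sample that no other view supports gets a weight below 1, and a sample with a slightly too high loss still gets a large weight when another view fully selects it.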

3. Algorithms

3.1. Serial SPamCo Algorithm

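The serial version of SPamCo updates the views one after another.
[Figure: pseudocode of the serial SPamCo algorithm]
Note:
One thing that caught my attention is why $\bm{v}^{(\text{vid})}$ is updated twice in the pseudocode. The main reason is that the model is updated in between, and $\bm{v}^{(\text{vid})}$ depends closely on the model's current predictions. Put differently, the first update of $\bm{v}^{(\text{vid})}$ serves to update the model parameters of the current view, while the second update selects the unlabeled samples this view is confident about and passes them on to the other views.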

3.2. Parallel SPamCo Algorithm

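The parallel version of SPamCo updates all views in parallel: the previous-iteration values $v_{i,t-1}^{(j)}$ of the other views are used to update the current view's $v_{i,t}^{(m)}$ at the current iteration.
[Figure: pseudocode of the parallel SPamCo algorithm]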
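The dependence on the previous iteration only is what makes the per-view updates independent. The sketch below shows just the selection step under the hard rule and leaves out classifier retraining; the function name `parallel_hard_step`, the array layout, and the numbers are assumptions for illustration, not the paper's pseudocode.

```python
import numpy as np


def parallel_hard_step(losses, lams, gamma, v_prev):
    """One selection step for all views at once.

    losses, v_prev : (M, N) arrays; lams : (M,) age parameters.
    Each view reads only v_prev (iteration t - 1), never a value written
    in this step, so the loop body could run in M separate workers.
    """
    total_prev = v_prev.sum(axis=0)
    v_new = np.empty_like(v_prev)
    for j in range(v_prev.shape[0]):
        others = total_prev - v_prev[j]  # sum over views p != j at t - 1
        v_new[j] = (losses[j] < lams[j] + gamma * others).astype(v_prev.dtype)
    return v_new


losses = np.array([[0.2, 0.6, 0.9],
                   [0.7, 0.4, 0.9]])
v_prev = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
v_new = parallel_hard_step(losses, lams=np.array([0.5, 0.5]), gamma=0.3, v_prev=v_prev)

print(v_new)
```

In this toy run each view adopts the sample the other view selected at the previous iteration, while the third sample, whose loss is high in both views, stays unselected.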

4. Toy Example

Github

import copy
import warnings
from itertools import cycle, islice

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

warnings.filterwarnings("ignore")
# matplotlib.rcParams.update({'font.size': 10})


def sel_ids_y(score, add_num=10):
    ids_sort = np.argsort(score)
    add_id = np.zeros(score.shape[0])
    add_id[ids_sort[:add_num]] = -1
    add_id[ids_sort[-add_num:]] = 1
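    # Mark the add_num lowest-scoring samples as negative (-1) and the
    # add_num highest-scoring samples as positive (+1); all others stay 0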
    return add_id


def update_train_untrain(sel_ids, train_data, train_labels, untrain_data, weights=None):
    #  sel_ids = np.array(sel_ids, dtype='bool')
    add_ids = np.where(np.array(sel_ids) != 0)[0]
    untrain_ids = np.where(np.array(sel_ids) == 0)[0]
    add_datas = [d[add_ids] for d in untrain_data]
    new_train_data = [np.concatenate([d1, d2]) for d1, d2 in zip(train_data, add_datas)]
    add_y = [1 if sel_ids[idx] > 0 else 0 for idx in add_ids]
    new_train_y = np.concatenate([train_labels, add_y])
    new_untrain_data = [d[untrain_ids] for d in untrain_data]
    return new_train_data, new_train_y, new_untrain_data


def cotrain(labeled_data, labels, unlabeled_data, iter_step=1):
    lbls = copy.deepcopy(labels)
    for step in range(iter_step):
        scores = []
        add_ids = []
        add_ys = []
        clfs = []
        for view in range(2):
            clfs.append(LinearSVC())
            clfs[view].fit(labeled_data[view], lbls)
            scores.append(clfs[view].decision_function(unlabeled_data[view]))
            add_id = sel_ids_y(scores[view], 6)
            add_ids.append(add_id)
        add_id = sum(add_ids)
        labeled_data, lbls, unlabeled_data = update_train_untrain(add_id, labeled_data, lbls, unlabeled_data)
        if len(unlabeled_data[view]) <= 0:
            break
    return clfs


def update_train(sel_ids, train_data, train_labels, untrain_data, pred_y):
    add_ids = np.where(np.array(sel_ids) != 0)[0]
    add_data = [d[add_ids] for d in untrain_data]
    new_train_data = [np.concatenate([d1, d2]) for d1, d2 in zip(train_data, add_data)]
    add_y = pred_y[add_ids]
    new_train_y = np.concatenate([train_labels, pred_y[add_ids]])
    return new_train_data, new_train_y


def spaco(l_data, lbls, u_data, iter_step=1, gamma=0.5):
    # initiate classifier
    clfs = []
    scores = []
    add_ids = []
    add_num = 6
    clfss = []
    # initial
    for view in range(2):
        clfs.append(LinearSVC())
        clfs[view].fit(l_data[view], lbls)
        scores.append(clfs[view].decision_function(u_data[view]))
        add_ids.append(sel_ids_y(scores[view], add_num))
        # A non-negative score gives a positive pseudo-label; a negative score gives a negative one
        py = [0 if s < 0 else 1 for s in scores[view]]
    score = sum(scores)
    pred_y = np.array([0 if s < 0 else 1 for s in score])
    # for each step
    for step in range(iter_step):
        # for each view
        for view in range(2):
            # Leave the loop if there are not enough unlabeled samples
            if add_num * 2 > u_data[0].shape[0]:
                break
            # update v
            ov = np.where(add_ids[1 - view] != 0)[0]
            scores[view][ov] += add_ids[1 - view][ov] * gamma
            add_ids[view] = sel_ids_y(scores[view], add_num)

            # update w
            nl_data, nlbls = update_train(add_ids[view], l_data, lbls, u_data, pred_y)
            clfs[view].fit(nl_data[view], nlbls)

            # update y, v
            scores[view] = clfs[view].decision_function(u_data[view])
            add_num += 6
            # This step is needed because scores[view] was recomputed above when y was updated.
            # Note that this is not a redundant computation.
            scores[view][ov] += add_ids[1 - view][ov] * gamma
            add_ids[view] = sel_ids_y(scores[view], add_num)

            score = sum(scores)

            pred_y = np.array([0 if s < 0 else 1 for s in score])
            py = [0 if s < 0 else 1 for s in scores[view]]
    return clfs


def main():
    # toy 2
    np.random.seed(4)
    X, y = make_blobs(n_samples=400, centers=2, cluster_std=0.7)
    X[:, 0] -= 9.3
    X[:, 1] -= 2.5

    np.random.seed(1)
    pos_ids = np.where(y == 0)[0]
    neg_ids = np.where(y == 1)[0]
    ids1 = np.random.randint(0, len(pos_ids), 5)
    ids2 = np.random.randint(0, len(neg_ids), 5)
    # Draw five labeled points from each of the two classes
    p1 = pos_ids[ids1]
    p2 = neg_ids[ids2]

    # generate labeled and unlabeled data
    l_ids = np.concatenate((p1, p2))
    u_ids = np.array(list(set(np.arange(X.shape[0])) - set(l_ids)))
    l_data1, l_data2 = X[l_ids, 0].reshape(-1, 1), X[l_ids, 1].reshape(-1, 1)
    u_data1, u_data2 = X[u_ids, 0].reshape(-1, 1), X[u_ids, 1].reshape(-1, 1)
    labels = y[l_ids]

    colors = np.array(list(islice(cycle(['#377eb8', '#ff7f00', '#4daf4a',
                                         '#f781bf', '#a65628', '#984ea3',
                                         '#999999', '#e41a1c', '#dede00']),
                                  int(max(y) + 3))))

    x = [-1.5, 0, 1.5]
    my_xticks = [-2, 0, 2]

    ### parameters
    # steps = 16
    steps = 30
    gamma = 3

    ### original fig
    fig = plt.figure(figsize=(12, 12))
    plt.subplots_adjust(bottom=.05, top=.9, left=.05, right=0.9)

    ax = fig.add_subplot(141)
    plt.scatter(X[:, 0], X[:, 1], marker='o', c=colors[y], s=4)
    plt.scatter(X[p1, 0], X[p1, 1], marker='^', c='#0F0F0F', s=100)
    plt.scatter(X[p2, 0], X[p2, 1], marker='*', c='#0F0F0F', s=100)
    ax.set_xlabel('$x^{(1)}$')
    ax.set_ylabel('$x^{(2)}$')
    plt.xticks(x, my_xticks)

    #### cotrain experiment
    clfs = cotrain([l_data1, l_data2], labels, [u_data1, u_data2], iter_step=steps)
    score1 = clfs[0].decision_function(X[:, 0].reshape(-1, 1))
    score2 = clfs[1].decision_function(X[:, 1].reshape(-1, 1))
    score = score1 + score2
    pred_y = np.array([0 if s < 0 else 1 for s in score])
    print('cotrain:', np.mean(pred_y == y))

    ax = fig.add_subplot(142)
    ax.set_xlabel('$x^{(1)}$')
    ax.set_ylabel('$x^{(2)}$')
    plt.scatter(X[:, 0], X[:, 1], marker='o', c=colors[pred_y], s=4)

    #### spaco experiment1 gamma=3
    clfs = spaco([l_data1, l_data2], labels, [u_data1, u_data2], iter_step=steps, gamma=3)
    score1 = clfs[0].decision_function(X[:, 0].reshape(-1, 1))
    score2 = clfs[1].decision_function(X[:, 1].reshape(-1, 1))
    score = score1 + score2
    pred_y = np.array([0 if s < 0 else 1 for s in score])
    print('spaco experiment(gamma=3): %0.5f' % np.mean(pred_y == y))
    ax = fig.add_subplot(143)
    ax.set_xlabel('$x^{(1)}$')
    ax.set_ylabel('$x^{(2)}$')
    plt.scatter(X[:, 0], X[:, 1], marker='o', c=colors[pred_y], s=4)
    plt.xticks(x, my_xticks)

    #### spaco experiment2 gamma=0.3
    clfs = spaco([l_data1, l_data2], labels, [u_data1, u_data2], iter_step=steps, gamma=0.3)
    score1 = clfs[0].decision_function(X[:, 0].reshape(-1, 1))
    score2 = clfs[1].decision_function(X[:, 1].reshape(-1, 1))
    score = score1 + score2
    pred_y = np.array([0 if s < 0 else 1 for s in score])
    print('spaco experiment(gamma=0.3): %0.5f' % np.mean(pred_y == y))
    ax = fig.add_subplot(144)
    ax.set_xlabel('$x^{(1)}$')
    ax.set_ylabel('$x^{(2)}$')
    plt.scatter(X[:, 0], X[:, 1], marker='o', c=colors[pred_y], s=4)
    plt.xticks(x, my_xticks)

    plt.show()


if __name__ == '__main__':
    main()
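To see what `sel_ids_y` returns and how `cotrain` merges the picks of the two views with `sum(add_ids)`, here is a small stand-alone check. It repeats `sel_ids_y` from the listing above so that it runs on its own; the score values are made up.

```python
import numpy as np


def sel_ids_y(score, add_num=10):
    # Same logic as in the listing above: -1 for the add_num lowest scores,
    # +1 for the add_num highest scores, 0 for everything else.
    ids_sort = np.argsort(score)
    add_id = np.zeros(score.shape[0])
    add_id[ids_sort[:add_num]] = -1
    add_id[ids_sort[-add_num:]] = 1
    return add_id


scores_view1 = np.array([-2.0, -1.0, 0.5, 3.0])
scores_view2 = np.array([2.0, -3.0, 0.1, -0.5])

picks_view1 = sel_ids_y(scores_view1, add_num=1)
picks_view2 = sel_ids_y(scores_view2, add_num=1)
combined = picks_view1 + picks_view2  # what sum(add_ids) computes in cotrain

print(picks_view1)  # [-1.  0.  0.  1.]
print(picks_view2)  # [ 1. -1.  0.  0.]
print(combined)     # [ 0. -1.  0.  1.]
```

Sample 0 receives opposite votes from the two views, so the votes cancel and the sample stays unlabeled in this round.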

5. Summary

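Overall, this is a very complete piece of work, rigorous from top to bottom. Its core is to combine self-paced learning, multi-view learning, and co-training and to express the combination as a clear mathematical optimization objective. The design of the co-regularization term in particular is concise and elegant.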
