Two ICDM 2018 machine-learning papers, from http://mlda.swu.edu.cn/publication.php
- First, notes on the short paper (fPML for short)
- Then the long paper (ML-JMF for short)
- Finally, a summary of their similarities and differences (ongoing)
Feature-induced Partial Multi-label Learning (fPML)
ICDM 2018
Problem
- However, the performance of multi-label learning may be compromised by noisy (or incorrect) labels of training instances.
- The ground-truth labels are concealed in a set of candidate noisy labels, and the number of ground-truth labels is also unknown.
Most relevant
- partial multi-label learning [Xie et al., AAAI, 2018]
- It optimizes the label confidence values and the relevance ordering of labels of each instance by exploiting structural information in the feature and label spaces, and by minimizing a confidence-weighted ranking loss.
- However, it has to simultaneously optimize multiple binary predictors and a very large number of confidence rankings of candidate label pairs; hence it suffers from heavy computational costs.
Motivation
Why
- Since labels are correlated, the label correlation and the ground-truth instance-label association matrices have a linear dependence structure, and thus they are low-rank [Zhu et al., TKDE, 2018; Xu et al., ICDM, 2014].
- The low-rank approximation of a noisy matrix is robust to noise [Konstantinides et al., TIP, 1997; Meng et al., ICCV, 2013].
How
- We seek the ground-truth instance-label association matrix via learning the low-rank approximation of the observed association matrix, which contains noisy associations.
- The labels of an instance depend on its features, and thus the features of instances should be used to estimate noisy labels.
Method
- The main idea is to posit a noise-free label matrix $\widehat{\mathbf{Y}}$ and force it to factorize into low-rank matrices $\mathbf{S}$ and $\mathbf{G}$:

$$\widehat{\mathbf{Y}} \simeq \mathbf{S}\mathbf{G}^{T} \tag{1}$$

Note the dimensions of the two matrices:
- $\mathbf{S} \in \mathbb{R}^{q \times k}$ maps the $q$ labels into $k$ new (latent) labels
- $\mathbf{G} \in \mathbb{R}^{n \times k}$ maps the $n$ instances into the same $k$-dimensional latent space
- The objective at this point is Eq. (2):

$$\min_{\mathbf{S},\mathbf{G}} \left\|\mathbf{Y}-\mathbf{S}\mathbf{G}^{T}\right\|_{F}^{2} \tag{2}$$
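By the Eckart–Young theorem, the exact minimizer of Eq. (2) for a fixed rank $k$ comes from a truncated SVD. A minimal numpy sketch (sizes $q$, $n$, $k$ and the random label matrix are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical sizes: q labels, n instances, latent rank k
q, n, k = 6, 20, 3
rng = np.random.default_rng(0)
Y = (rng.random((q, n)) < 0.3).astype(float)  # observed noisy label matrix

# Best rank-k fit of Eq. (2): truncated SVD (Eckart-Young)
U, sigma, Vt = np.linalg.svd(Y, full_matrices=False)
S = U[:, :k] * sigma[:k]   # q x k: labels -> latent labels
G = Vt[:k, :].T            # n x k: instances -> latent space
Y_hat = S @ G.T            # denoised low-rank approximation of Y

err = np.linalg.norm(Y - Y_hat, 'fro') ** 2  # residual of the fit
```

The paper's actual optimizer handles the additional terms below jointly; this only illustrates the plain rank-$k$ factorization.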
- So far only the label information has been used. The authors' contribution here is to also exploit the feature information of the original data $\mathbf{X}$ to constrain $\mathbf{G}$ (the paper describes this as sharing $\mathbf{G}$), adding a linear transformation with parameter $\mathbf{F}$, which yields Eq. (3):

$$\min_{\mathbf{S},\mathbf{F},\mathbf{G}} \left\|\mathbf{Y}-\mathbf{S}\mathbf{G}^{T}\right\|_{F}^{2}+\lambda_{1}\left\|\mathbf{X}-\mathbf{F}\mathbf{G}^{T}\right\|_{F}^{2} \tag{3}$$
Learning $\mathbf{F} \in \mathbb{R}^{d \times k}$ captures the relationships among features; $\lambda_{1}$ controls the trade-off between the two terms.
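Because $\mathbf{G}$ is shared by both terms, Eq. (3) can be minimized by alternating least squares: fixing $\mathbf{G}$ decouples $\mathbf{S}$ and $\mathbf{F}$ into independent least-squares problems, and fixing $\mathbf{S},\mathbf{F}$ lets $\mathbf{G}$ be solved by stacking the two residuals. A sketch under illustrative sizes and random data (this is not the paper's actual update rule, just one valid solver for the same objective):

```python
import numpy as np

q, n, d, k, lam = 6, 20, 10, 3, 1.0   # illustrative sizes and lambda_1
rng = np.random.default_rng(1)
Y = (rng.random((q, n)) < 0.3).astype(float)  # candidate label matrix
X = rng.standard_normal((d, n))               # feature matrix
S = rng.standard_normal((q, k))
F = rng.standard_normal((d, k))
G = rng.standard_normal((n, k))

def objective(S, F, G):
    return (np.linalg.norm(Y - S @ G.T) ** 2
            + lam * np.linalg.norm(X - F @ G.T) ** 2)

obj0 = objective(S, F, G)
for _ in range(50):
    # Fix G: S and F are independent least-squares problems
    S = np.linalg.lstsq(G, Y.T, rcond=None)[0].T
    F = np.linalg.lstsq(G, X.T, rcond=None)[0].T
    # Fix S, F: stack both residual terms so G solves them jointly
    A = np.vstack([S, np.sqrt(lam) * F])        # (q+d) x k
    B = np.vstack([Y, np.sqrt(lam) * X])        # (q+d) x n
    G = np.linalg.lstsq(A, B, rcond=None)[0].T  # n x k
obj_final = objective(S, F, G)
```

Each alternating step solves its subproblem exactly, so the objective is non-increasing across iterations.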
- Finally, to map predictions back to the label space, a linear operator $\mathbf{W}$ is introduced, turning Eq. (4) into Eq. (5):

$$\min_{\mathbf{W}} \left\|\mathbf{Y}-\mathbf{W}^{T}\mathbf{X}\right\|_{F}^{2} \tag{4}$$

$$\min_{\mathbf{W}} \left\|\mathbf{S}\mathbf{G}^{T}-\mathbf{W}^{T}\mathbf{X}\right\|_{F}^{2} \tag{5}$$
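With $\mathbf{S}$ and $\mathbf{G}$ fixed, Eq. (5) is an ordinary least-squares problem in $\mathbf{W}$ and has a closed-form solution via the normal equations. A sketch with illustrative sizes (the tiny ridge term is my addition to keep $\mathbf{X}\mathbf{X}^{T}$ invertible; it is not part of the paper's formulation):

```python
import numpy as np

q, n, d, k = 6, 20, 10, 3   # illustrative sizes
rng = np.random.default_rng(2)
X = rng.standard_normal((d, n))   # feature matrix
S = rng.standard_normal((q, k))
G = rng.standard_normal((n, k))
B = S @ G.T                       # denoised targets from Eq. (1), q x n

# Normal equations for Eq. (5): (X X^T) W = X (S G^T)^T,
# with a small ridge for numerical stability
ridge = 1e-6
W = np.linalg.solve(X @ X.T + ridge * np.eye(d), X @ B.T)  # d x q

# Predict label scores for a new feature vector x via W^T x
x_new = rng.standard_normal(d)
scores = W.T @ x_new   # length-q vector of label scores
```

At test time, ranking the entries of $\mathbf{W}^{T}\mathbf{x}$ gives the relevance ordering of the $q$ labels for a new instance.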