Paper notes: fPML and ML-JMF

Two ICDM 2018 machine learning papers, from http://mlda.swu.edu.cn/publication.php

  • First, notes on the short paper (fPML for short)
  • Then the long paper (ML-JMF for short)
  • Finally, a summary of their similarities and differences (ongoing)

Feature-induced Partial Multi-label Learning (fPML)

ICDM 2018

Problem

  • The performance of multi-label learning may be compromised by noisy (or incorrect) labels of training instances.

  • The ground-truth labels are concealed in a set of noisy candidate labels, and the number of ground-truth labels is also unknown.

Most relevant

  • Partial multi-label learning [Xie et al., AAAI, 2018]
  • It optimizes the label confidence values and the relevance ordering of the labels of each instance by exploiting structural information in the feature and label spaces, and by minimizing a confidence-weighted ranking loss.
  • However, it has to simultaneously optimize multiple binary predictors and a very large number of confidence rankings over candidate label pairs; hence it suffers from heavy computational costs.

Motivation

Why

  • Since labels are correlated, the label correlation and the ground-truth instance-label association matrices have a linear dependence structure, and are thus low-rank [Zhu et al., TKDE, 2018; Xu et al., ICDM, 2014]
  • The low-rank approximation of a noisy matrix is robust to noise [Konstantinides et al., TIP, 1997; Meng et al., ICCV, 2013]; see the sketch after this list
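
The second point can be sanity-checked numerically. Below is a minimal sketch (mine, not from the paper): by the Eckart-Young theorem, the rank-$k$ truncated SVD is the best rank-$k$ approximation in Frobenius norm, and it recovers a noisy low-rank matrix far better than the raw observation does.

```python
import numpy as np

rng = np.random.default_rng(0)

# A rank-3 "ground-truth" association matrix, observed with additive noise.
n, q, k = 200, 50, 3
Y_true = rng.standard_normal((n, k)) @ rng.standard_normal((k, q))
Y_noisy = Y_true + 0.5 * rng.standard_normal((n, q))

# Rank-k truncated SVD of the noisy observation.
U, s, Vt = np.linalg.svd(Y_noisy, full_matrices=False)
Y_denoised = (U[:, :k] * s[:k]) @ Vt[:k]

print(np.linalg.norm(Y_noisy - Y_true))     # error of the raw observation
print(np.linalg.norm(Y_denoised - Y_true))  # noticeably smaller
```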

How

  • We seek the ground-truth instance-label association matrix by learning a low-rank approximation of the observed association matrix, which contains noisy associations.
  • The labels of an instance depend on its features, and thus the features of instances should be used to estimate noisy labels.

Method

  • The main idea is to posit a noise-free association matrix $\widehat{\mathbf{Y}}$ and force it, via matrix factorization, into two low-rank factors $\mathbf{S}$ and $\mathbf{G}$:

    $$\widehat{\mathbf{Y}} \simeq \mathbf{S}\mathbf{G}^{T} \tag{1}$$

    Note the dimensions of the two factors:

    • $\mathbf{S} \in \mathbb{R}^{q \times k}$ maps the $q$ labels into $k$ latent labels
    • $\mathbf{G} \in \mathbb{R}^{n \times k}$ represents the $n$ instances in the $k$-dimensional latent space
  • The objective at this stage is Eq. (2):

    $$\min_{\mathbf{S}, \mathbf{G}}\left\|\mathbf{Y}-\mathbf{S}\mathbf{G}^{T}\right\|_{F}^{2} \tag{2}$$

  • So far only the label information is used. The authors' contribution here is to also exploit the feature information of the original data $\mathbf{X}$, constraining $\mathbf{G}$ (the paper speaks of "sharing $\mathbf{G}$") through an additional linear transformation with parameter $\mathbf{F}$, which yields Eq. (3):

    $$\min_{\mathbf{S}, \mathbf{F}, \mathbf{G}}\left\|\mathbf{Y}-\mathbf{S}\mathbf{G}^{T}\right\|_{F}^{2}+\lambda_{1}\left\|\mathbf{X}-\mathbf{F}\mathbf{G}^{T}\right\|_{F}^{2} \tag{3}$$

    Learning $\mathbf{F} \in \mathbb{R}^{d \times k}$ captures the correlations among features, and $\lambda_{1}$ balances the two terms.

  • Finally, to map the features back to labels, a linear predictor $\mathbf{W}$ is added, Eq. (4); substituting the denoised labels $\mathbf{S}\mathbf{G}^{T}$ for $\mathbf{Y}$ turns it into Eq. (5). A runnable sketch of the whole pipeline follows this list.

    $$\min_{\mathbf{W}}\left\|\mathbf{Y}-\mathbf{W}^{T}\mathbf{X}\right\|_{F}^{2} \tag{4}$$

    $$\min_{\mathbf{W}}\left\|\mathbf{S}\mathbf{G}^{T}-\mathbf{W}^{T}\mathbf{X}\right\|_{F}^{2} \tag{5}$$
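
Putting the pieces together, below is a minimal sketch of the fPML pipeline, assuming plain alternating gradient steps on Eq. (3) followed by a least-squares fit of $\mathbf{W}$ via Eq. (5). The paper derives dedicated update rules that I am not reproducing here; `fpml_sketch`, the step size `lr`, and the iteration count are illustrative choices of mine, not from the paper.

```python
import numpy as np

def fpml_sketch(X, Y, k, lam1=1.0, lr=1e-3, iters=1000, seed=0):
    """Sketch of the fPML objective (Eq. 3) via alternating gradient
    steps, then the predictor W via Eq. (5).
    X: d x n feature matrix; Y: q x n noisy instance-label matrix."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    q = Y.shape[0]
    S = 0.01 * rng.standard_normal((q, k))  # label factor,   q x k
    F = 0.01 * rng.standard_normal((d, k))  # feature factor, d x k
    G = 0.01 * rng.standard_normal((n, k))  # shared factor,  n x k

    for _ in range(iters):
        Ry = Y - S @ G.T   # residual of the label factorization
        Rx = X - F @ G.T   # residual of the feature factorization
        S += lr * Ry @ G                        # descend on ||Y - S G^T||
        F += lr * lam1 * Rx @ G                 # descend on ||X - F G^T||
        G += lr * (Ry.T @ S + lam1 * Rx.T @ F)  # G is shared by both terms

    # Eq. (5): regress the denoised labels S G^T onto the features X.
    Y_hat = S @ G.T
    W = np.linalg.lstsq(X.T, Y_hat.T, rcond=None)[0]  # W: d x q
    return S, F, G, W

# Toy usage: 100 instances, 20 features, 8 candidate labels, rank 3.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 100))
Y = (rng.random((8, 100)) > 0.7).astype(float)  # noisy candidate labels
S, F, G, W = fpml_sketch(X, Y, k=3)
print((W.T @ X).shape)  # (8, 100): a score for each label/instance pair
```

Sharing $\mathbf{G}$ between the two terms is what couples labels and features: the denoised $\mathbf{S}\mathbf{G}^{T}$ must stay consistent with a latent representation that also reconstructs $\mathbf{X}$.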
