Code
Mutual information estimation between two variables:
import torch
import torch.nn as nn
from torch.nn.functional import softplus

# Auxiliary network for mutual information estimation
class MIEstimator(nn.Module):
    def __init__(self, size1, size2):
        super(MIEstimator, self).__init__()

        # Vanilla MLP
        self.net = nn.Sequential(
            nn.Linear(size1 + size2, 1024),
            nn.ReLU(True),
            nn.Linear(1024, 1024),
            nn.ReLU(True),
            nn.Linear(1024, 1),
        )

    # Gradient for JSD mutual information estimation and EB-based estimation
    def forward(self, x1, x2):
        pos = self.net(torch.cat([x1, x2], 1))  # Positive samples: matched pairs
        # Negative samples: roll x1 by one position along the batch dimension to mismatch the pairs
        neg = self.net(torch.cat([torch.roll(x1, 1, 0), x2], 1))
        return -softplus(-pos).mean() - softplus(neg).mean(), pos.mean() - neg.exp().mean() + 1
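As a quick sanity check, the estimator can be run on random feature batches. A minimal usage sketch; the feature dimension 64 and batch size 128 are hypothetical:

# Hypothetical usage of the MIEstimator above
estimator = MIEstimator(size1=64, size2=64)
x1 = torch.randn(128, 64)  # hypothetical batch of features from view 1
x2 = torch.randn(128, 64)  # hypothetical batch of features from view 2
mi_grad, mi_est = estimator(x1, x2)
# mi_grad: JSD-based surrogate, differentiable, maximized during training
# mi_est: energy-based estimate of the mutual information, used for logging

The two return values match how the trainer below consumes them: the first provides the gradient signal, the second a numeric estimate of I(z1; z2) for logging.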
The Multi-View Information Bottleneck representation-learning model:
from training.multiview_infomax import MVInfoMaxTrainer
from utils.schedulers import ExponentialScheduler
################ MIB Trainer ################
class MIBTrainer(MVInfoMaxTrainer):
    def __init__(self, beta_start_value=1e-3, beta_end_value=1,
                 beta_n_iterations=100000, beta_start_iteration=50000, **params):
        # The neural network architectures and initialization procedure are analogous to Multi-View InfoMax
        super(MIBTrainer, self).__init__(**params)
        # Scheduler to update the value of the regularization coefficient beta over time
        self.beta_scheduler = ExponentialScheduler(start_value=beta_start_value, end_value=beta_end_value,
                                                   n_iterations=beta_n_iterations,
                                                   start_iteration=beta_start_iteration)

    def _compute_loss(self, data):
        # Read the two views v1 and v2 and ignore the label y
        v1, v2, _ = data

        # Encode a batch of data
        p_z1_given_v1 = self.encoder_v1(v1)
        p_z2_given_v2 = self.encoder_v2(v2)

        # Sample from the posteriors with reparametrization
        z1 = p_z1_given_v1.rsample()
        z2 = p_z2_given_v2.rsample()

        # Mutual information estimation
        mi_gradient, mi_estimation = self.mi_estimator(z1, z2)
        mi_gradient = mi_gradient.mean()
        mi_estimation = mi_estimation.mean()

        # Symmetrized Kullback-Leibler divergence between the two posteriors
        kl_1_2 = p_z1_given_v1.log_prob(z1) - p_z2_given_v2.log_prob(z1)
        kl_2_1 = p_z2_given_v2.log_prob(z2) - p_z1_given_v1.log_prob(z2)
        skl = (kl_1_2 + kl_2_1).mean() / 2.

        # Update the value of beta according to the schedule
        beta = self.beta_scheduler(self.iterations)

        # Logging the components
        self._add_loss_item('loss/I_z1_z2', mi_estimation.item())
        self._add_loss_item('loss/SKL_z1_z2', skl.item())
        self._add_loss_item('loss/beta', beta)

        # Computing the loss function
        loss = - mi_gradient + beta * skl
        return loss
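ExponentialScheduler is imported from utils.schedulers but its definition is not shown in this post. A minimal sketch of what such a scheduler might look like, assuming beta is interpolated in log-space from start_value to end_value over n_iterations steps beginning at start_iteration; the repository's actual implementation may differ:

import math

# Hypothetical reconstruction: beta grows exponentially between two endpoints
class ExponentialScheduler:
    def __init__(self, start_value, end_value, n_iterations, start_iteration=0):
        self.start_value = start_value
        self.end_value = end_value
        self.n_iterations = n_iterations
        self.start_iteration = start_iteration

    def __call__(self, iteration):
        # Fraction of the annealing window completed, clamped to [0, 1]
        t = (iteration - self.start_iteration) / self.n_iterations
        t = min(max(t, 0.0), 1.0)
        # Interpolate in log-space so beta moves smoothly across orders of magnitude
        return math.exp((1 - t) * math.log(self.start_value) + t * math.log(self.end_value))

With the default arguments used by MIBTrainer, beta stays near 1e-3 until iteration 50000 and reaches 1 at iteration 150000.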
Paper title: Learning Robust Representations via Multi-View Information Bottleneck

Summary
The paper treats the information shared between the two views as the useful representation and regards the information that the two views do not share as superfluous. By letting the two views learn from each other, it obtains representations that are rich in label information and robust.

Problem Statement
The paper aims to build on information bottleneck theory and construct a loss function that makes the learned representation carry more label information and be more robust.

Method
Through theoretical derivation, two loss functions are obtained: one for learning the representation,
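The representation-learning loss, as implemented in _compute_loss above (loss = - mi_gradient + beta * skl), can be written as follows; here z1 and z2 are the latent codes of views v1 and v2, and the notation is reconstructed from the code rather than quoted from the paper:

$$
\mathcal{L}_{\mathrm{MIB}} = -\, I(z_1; z_2) \;+\; \beta \, D_{\mathrm{SKL}}\big(p(z_1 \mid v_1) \,\|\, p(z_2 \mid v_2)\big),
\qquad
D_{\mathrm{SKL}}(p \,\|\, q) = \tfrac{1}{2}\big(D_{\mathrm{KL}}(p \,\|\, q) + D_{\mathrm{KL}}(q \,\|\, p)\big),
$$

where I(z1; z2) is estimated with the JSD-based MIEstimator and beta is annealed from beta_start_value to beta_end_value by the scheduler. Maximizing I(z1; z2) preserves the information shared across views, while the symmetrized KL term pushes the two posteriors together and thereby discards view-specific, superfluous information.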