Code
Mutual information estimation between two variables:
import torch
import torch.nn as nn
from torch.nn.functional import softplus

# Auxiliary network for mutual information estimation
class MIEstimator(nn.Module):
    def __init__(self, size1, size2):
        super(MIEstimator, self).__init__()

        # Vanilla MLP
        self.net = nn.Sequential(
            nn.Linear(size1 + size2, 1024),
            nn.ReLU(True),
            nn.Linear(1024, 1024),
            nn.ReLU(True),
            nn.Linear(1024, 1),
        )

    # Gradient for JSD mutual information estimation and EB-based estimation
    def forward(self, x1, x2):
        pos = self.net(torch.cat([x1, x2], 1))  # Positive samples: matched pairs
        # Negative samples: roll x1 by one position along the batch dimension to mismatch the pairs
        neg = self.net(torch.cat([torch.roll(x1, 1, 0), x2], 1))
        return -softplus(-pos).mean() - softplus(neg).mean(), pos.mean() - neg.exp().mean() + 1
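As a quick sanity check, the estimator can be run on random feature batches. A minimal usage sketch; the feature dimension 64 and batch size 128 are hypothetical:

# Hypothetical usage of the MIEstimator above
estimator = MIEstimator(size1=64, size2=64)
x1 = torch.randn(128, 64)  # hypothetical batch of features from view 1
x2 = torch.randn(128, 64)  # hypothetical batch of features from view 2
mi_grad, mi_est = estimator(x1, x2)
# mi_grad: JSD-based surrogate, differentiable, maximized during training
# mi_est: energy-based estimate of the mutual information, used for logging

The two return values match how the trainer below consumes them: the first provides the gradient signal, the second a numeric estimate of I(z1; z2) for logging.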
The Multi-View Information Bottleneck representation-learning model:
from training.multiview_infomax import MVInfoMaxTrainer
from utils.schedulers import ExponentialScheduler
################ MIB Trainer ################
class MIBTrainer(MVInfoMaxTrainer):
    def __init__(self, beta_start_value=1e-3, beta_end_value=1,
                 beta_n_iterations=100000, beta_start_iteration=50000, **params):
        # The neural network architectures and initialization procedure are analogous to Multi-View InfoMax
        super(MIBTrainer, self).__init__(**params)
        # Scheduler to update the value of the regularization coefficient beta over time
        self.beta_scheduler = ExponentialScheduler(start_value=beta_start_value, end_value=beta_end_value,
                                                   n_iterations=beta_n_iterations,
                                                   start_iteration=beta_start_iteration)

    def _compute_loss(self, data):
        # Read the two views v1 and v2 and ignore the label y
        v1, v2, _ = data

        # Encode a batch of data
        p_z1_given_v1 = self.encoder_v1(v1)
        p_z2_given_v2 = self.encoder_v2(v2)

        # Sample from the posteriors with reparametrization
        z1 = p_z1_given_v1.rsample()
        z2 = p_z2_given_v2.rsample()

        # Mutual information estimation
        mi_gradient, mi_estimation = self.mi_estimator(z1, z2)
        mi_gradient = mi_gradient.mean()
        mi_estimation = mi_estimation.mean()

        # Symmetrized Kullback-Leibler divergence between the two posteriors
        kl_1_2 = p_z1_given_v1.log_prob(z1) - p_z2_given_v2.log_prob(z1)
        kl_2_1 = p_z2_given_v2.log_prob(z2) - p_z1_given_v1.log_prob(z2)
        skl = (kl_1_2 + kl_2_1).mean() / 2.

        # Update the value of beta according to the schedule
        beta = self.beta_scheduler(self.iterations)

        # Logging the components
        self._add_loss_item('loss/I_z1_z2', mi_estimation.item())
        self._add_loss_item('loss/SKL_z1_z2', skl.item())
        self._add_loss_item('loss/beta', beta)

        # Computing the loss function
        loss = - mi_gradient + beta * skl
        return loss
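ExponentialScheduler is imported from utils.schedulers but its definition is not shown in this post. A minimal sketch of what such a scheduler might look like, assuming beta is interpolated in log-space from start_value to end_value over n_iterations steps beginning at start_iteration; the repository's actual implementation may differ:

import math

# Hypothetical reconstruction: beta grows exponentially between two endpoints
class ExponentialScheduler:
    def __init__(self, start_value, end_value, n_iterations, start_iteration=0):
        self.start_value = start_value
        self.end_value = end_value
        self.n_iterations = n_iterations
        self.start_iteration = start_iteration

    def __call__(self, iteration):
        # Fraction of the annealing window completed, clamped to [0, 1]
        t = (iteration - self.start_iteration) / self.n_iterations
        t = min(max(t, 0.0), 1.0)
        # Interpolate in log-space so beta moves smoothly across orders of magnitude
        return math.exp((1 - t) * math.log(self.start_value) + t * math.log(self.end_value))

With the default arguments used by MIBTrainer, beta stays near 1e-3 until iteration 50000 and reaches 1 at iteration 150000.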
Paper title: Learning Robust Representations via Multi-View Information Bottleneck

Summary
The paper treats the information shared between the two views as the useful representation and regards the information that the two views do not share as superfluous. By letting the two views learn from each other, it obtains representations that are rich in label information and robust.

Problem Statement
The paper aims to build on information bottleneck theory and construct a loss function that makes the learned representation carry more label information and be more robust.

Method
Through theoretical derivation, two loss functions are obtained: one for learning the representation,
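The representation-learning loss, as implemented in _compute_loss above (loss = - mi_gradient + beta * skl), can be written as follows; here z1 and z2 are the latent codes of views v1 and v2, and the notation is reconstructed from the code rather than quoted from the paper:

$$
\mathcal{L}_{\mathrm{MIB}} = -\, I(z_1; z_2) \;+\; \beta \, D_{\mathrm{SKL}}\big(p(z_1 \mid v_1) \,\|\, p(z_2 \mid v_2)\big),
\qquad
D_{\mathrm{SKL}}(p \,\|\, q) = \tfrac{1}{2}\big(D_{\mathrm{KL}}(p \,\|\, q) + D_{\mathrm{KL}}(q \,\|\, p)\big),
$$

where I(z1; z2) is estimated with the JSD-based MIEstimator and beta is annealed from beta_start_value to beta_end_value by the scheduler. Maximizing I(z1; z2) preserves the information shared across views, while the symmetrized KL term pushes the two posteriors together and thereby discards view-specific, superfluous information.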