Paper: Enhancing Diversity in Teacher-Student Networks via Asymmetric branches for Unsupervised person re-identification
Code: https://github.com/chenhao2345/ABMT
Motivation:
The generated pseudo labels are generally very noisy. The noise comes mainly from several inevitable factors, such as strong domain gaps and imperfect clustering. As a result, unsupervised Re-ID naturally turns into two problems: generating pseudo labels and learning from noisy labels.
A popular way to handle this label noise is to train paired networks. However, such paired networks tend to get stuck in a local optimum, and existing remedies mainly rely on different training-sample selection, different initializations, or data augmentation. (To handle noisy labels, one of the most popular approaches is to train paired networks so that each network helps to correct its peer, e.g., two-student networks in Co-teaching [9] and two-teacher-two-student networks in MMT [8]. However, these paired models with identical structure are prone to converge to each other and get stuck in a local minimum. There are several attempts to alleviate this problem, such as Co-teaching+ [28], ACT [27] and MMT [8]. These attempts of keeping divergence between paired models are mainly based on either different training sample selection [28, 27] or different initialization and data augmentation [8].)
Therefore, this paper designs an asymmetric network structure and trains it with both clustering-based hard labels and teacher-based soft labels. (In this paper, we propose a strong alternative by designing an asymmetric neural network structure in the Mean Teacher Model. In this paper, we also use both clustering-based hard labels and teacher-based soft labels in our baseline.)
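Takeaway: the asymmetric structure encourages the networks to stay different from each other, which helps them avoid local optima and addresses the coupling problem (the student and the teacher quickly converge to each other, which prevents them from exploring more diversified information).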
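In the Mean Teacher setup the teacher's weights are an exponential moving average (EMA) of the student's, which is where the coupling comes from. The sketch below is a minimal illustration in plain Python, assuming hypothetical dict-of-lists "parameters" and a hypothetical decay `alpha`: if the student stops changing, the teacher converges to it geometrically, since the gap shrinks by a factor of `alpha` per step.

```python
def ema_update(teacher, student, alpha=0.999):
    """Mean-teacher update: teacher <- alpha * teacher + (1 - alpha) * student.

    teacher/student: hypothetical dicts mapping parameter names to lists of floats.
    """
    for name, s_vals in student.items():
        t_vals = teacher[name]
        teacher[name] = [alpha * t + (1 - alpha) * s for t, s in zip(t_vals, s_vals)]
    return teacher


# Toy run: the student parameter is frozen at 1.0, the teacher starts at 0.0.
teacher = {"w": [0.0]}
student = {"w": [1.0]}
for _ in range(1000):
    ema_update(teacher, student, alpha=0.99)
# After n steps the teacher equals 1 - alpha**n, i.e. it has almost caught up.
print(teacher["w"][0])
```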
Method framework
At the start of every epoch the classifier is re-initialized: its weights are set to the cluster centroids, i.e. the mean feature of each cluster.
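A minimal sketch of this re-initialization in plain Python: each row of the classifier weight matrix becomes the mean feature of one pseudo-label cluster. The function name, the optional L2 normalization and skipping negative (outlier) labels are assumptions for illustration; in the PyTorch code the resulting rows would be copied into the classifier's weight tensor.

```python
import math


def init_classifier_from_clusters(features, pseudo_labels, num_clusters, normalize=True):
    """Return one weight vector per cluster: the mean feature of that cluster.

    features: list of equal-length feature vectors (lists of floats)
    pseudo_labels: cluster id for each feature; negative ids (outliers) are skipped
    normalize: L2-normalize each centroid (an assumption, not taken from the text)
    Assumes every cluster id in [0, num_clusters) has at least one member.
    """
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_clusters)]
    counts = [0] * num_clusters
    for feat, lab in zip(features, pseudo_labels):
        if lab < 0:  # hypothetical outlier label, not assigned to any cluster
            continue
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += feat[d]
    weights = []
    for c in range(num_clusters):
        centroid = [v / counts[c] for v in sums[c]]
        if normalize:
            norm = math.sqrt(sum(v * v for v in centroid)) or 1.0
            centroid = [v / norm for v in centroid]
        weights.append(centroid)
    return weights


feats = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
labels = [0, 0, 1]
W = init_classifier_from_clusters(feats, labels, num_clusters=2, normalize=False)
print(W)  # [[2.0, 0.0], [0.0, 2.0]]
```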
Optimization step (training code):
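```python
# Excerpt from the trainer's per-iteration step (self, target_inputs and optimizer come from the surrounding loop)
inputs_1, inputs_2, targets = self._parse_data(target_inputs)  # targets: clustering-based pseudo labels
# forward: each network returns features f_* and logits p_* for its two branches ("_m" = second branch)
f_out_t1, f_out_t1_m, p_out_t1, p_out_t1_m = self.model_1(inputs_1)                      # student
f_out_t1_ema, f_out_t1_ema_m, p_out_t1_ema, p_out_t1_ema_m = self.model_1_ema(inputs_1)  # mean teacher
# hard-label loss: both student branches against the pseudo labels
loss_ce = (self.criterion_ce(p_out_t1, targets) + self.criterion_ce(p_out_t1_m, targets)) / 2
# soft losses: each student branch is supervised by the teacher's *other* branch
loss_ce_soft = (self.criterion_ce_soft(p_out_t1, p_out_t1_ema_m) + self.criterion_ce_soft(p_out_t1_m, p_out_t1_ema)) / 2
loss_tri_soft = (self.criterion_tri_soft(f_out_t1, f_out_t1_ema_m, targets) + self.criterion_tri_soft(f_out_t1_m, f_out_t1_ema, targets)) / 2
loss = (loss_ce + loss_ce_soft) / 2 + loss_tri_soft
# backward pass and parameter update
optimizer.zero_grad()
loss.backward()
optimizer.step()
```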
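Written out, the objective computed by the snippet above is as follows. The notation is introduced here for illustration and is not the paper's: subscripts 1 and 2 denote the first branch and the second (`_m`) branch, superscripts s and t denote the student (`model_1`) and the mean teacher (`model_1_ema`), p are logits, f are features, \tilde{y} are the clustering pseudo labels, and \ell_{ce}, \ell_{sce}, \ell_{stri} stand for `criterion_ce`, `criterion_ce_soft` and `criterion_tri_soft`.

```latex
\begin{aligned}
\mathcal{L}_{ce} &= \tfrac{1}{2}\big[\ell_{ce}(p^{s}_{1}, \tilde{y}) + \ell_{ce}(p^{s}_{2}, \tilde{y})\big] \\
\mathcal{L}_{ce}^{soft} &= \tfrac{1}{2}\big[\ell_{sce}(p^{s}_{1}, p^{t}_{2}) + \ell_{sce}(p^{s}_{2}, p^{t}_{1})\big] \\
\mathcal{L}_{tri}^{soft} &= \tfrac{1}{2}\big[\ell_{stri}(f^{s}_{1}, f^{t}_{2}; \tilde{y}) + \ell_{stri}(f^{s}_{2}, f^{t}_{1}; \tilde{y})\big] \\
\mathcal{L} &= \tfrac{1}{2}\big(\mathcal{L}_{ce} + \mathcal{L}_{ce}^{soft}\big) + \mathcal{L}_{tri}^{soft}
\end{aligned}
```

Note the cross pairing in the two soft terms: student branch 1 is supervised by teacher branch 2 and vice versa.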
Hard-label loss (cross entropy with label smoothing):
```python
import torch
import torch.nn as nn


class CrossEntropyLabelSmooth(nn.Module):
    """Cross entropy loss with label smoothing regularizer.

    Reference:
        Szegedy et al. Rethinking the Inception Architecture for Computer Vision. CVPR 2016.
    Equation: y = (1 - epsilon) * y + epsilon / K.

    Args:
        num_classes (int): number of classes.
        epsilon (float): smoothing weight.
    """

    def __init__(self, num_classes, epsilon=0.1):
        super(CrossEntropyLabelSmooth, self).__init__()
        self.num_classes = num_classes
        self.epsilon = epsilon
        self.logsoftmax = nn.LogSoftmax(dim=1)

    def forward(self, inputs, targets):
        """
        Args:
            inputs: prediction matrix (before softmax) with shape (batch_size, num_classes)
            targets: ground truth (pseudo) labels with shape (batch_size,)
        """
        log_probs = self.logsoftmax(inputs)
        # one-hot encode the labels
        targets = torch.zeros_like(log_probs).scatter_(1, targets.unsqueeze(1), 1)
        # smooth: (1 - eps) on the true class plus eps / K spread over all K classes
        targets = (1 - self.epsilon) * targets + self.epsilon / self.num_classes
        # average over the batch, sum over classes
        loss = (- targets * log_probs).mean(0).sum()
        return loss
```
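To see what the smoothed target looks like, here is a framework-free sketch of the same formula for a single sample (plain Python; the helper names are hypothetical). With K = 4 classes, ε = 0.1 and true class 2, the target puts 0.9 + 0.1/4 = 0.925 on the true class and 0.025 on each of the others.

```python
import math


def smooth_labels(target, num_classes, epsilon=0.1):
    """One-hot label with label smoothing: (1 - eps) * one_hot + eps / K."""
    return [(1 - epsilon) * (1.0 if k == target else 0.0) + epsilon / num_classes
            for k in range(num_classes)]


def label_smooth_ce(logits, target, epsilon=0.1):
    """Cross entropy of one sample's logits against the smoothed label."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))  # stable log-sum-exp
    log_probs = [x - log_z for x in logits]
    q = smooth_labels(target, len(logits), epsilon)
    return -sum(qk * lp for qk, lp in zip(q, log_probs))


print([round(v, 3) for v in smooth_labels(2, 4)])  # [0.025, 0.025, 0.925, 0.025]
print(label_smooth_ce([0.0, 0.0, 0.0, 0.0], 1))   # uniform logits give log(4) for any target
```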
Soft-label loss (soft cross entropy supervised by the teacher's predictions):
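```python
import torch.nn as nn
import torch.nn.functional as F

class SoftEntropy(nn.Module):
    def __init__(self):
        super(SoftEntropy, self).__init__()
        self.logsoftmax = nn.LogSoftmax(dim=1)

    def forward(self, inputs, targets):
        # inputs: student logits; targets: teacher logits, turned into soft labels
        log_probs = self.logsoftmax(inputs)
        # detach() stops gradients from flowing into the teacher
        loss = (- F.softmax(targets, dim=1).detach() * log_probs).mean(0).sum()
        return loss
```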
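`SoftEntropy` is a cross entropy against the teacher's softmax output. The pure-Python sketch below (hypothetical helper names, one sample) illustrates why it pulls the student toward the teacher: H(q, p) = H(q) + KL(q‖p), so the loss is smallest exactly when the student's distribution p matches the teacher's q, where it equals the teacher's entropy.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def soft_cross_entropy(student_logits, teacher_logits):
    """-sum_k q_k * log p_k with q = softmax(teacher), p = softmax(student)."""
    q = softmax(teacher_logits)
    p = softmax(student_logits)
    return -sum(qk * math.log(pk) for qk, pk in zip(q, p))


def entropy(logits):
    q = softmax(logits)
    return -sum(qk * math.log(qk) for qk in q if qk > 0)


teacher = [2.0, 0.5, -1.0]
# Student == teacher: the KL term vanishes and the loss equals the teacher's entropy.
print(soft_cross_entropy(teacher, teacher) - entropy(teacher))
# Any other student distribution gives a larger loss.
print(soft_cross_entropy([0.0, 0.0, 0.0], teacher) > entropy(teacher))  # True
```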
Soft triplet loss:
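```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# euclidean_dist (pairwise Euclidean distance matrix) and _batch_hard (batch-hard
# positive/negative mining) are helper functions from the ABMT codebase that are
# not included in this excerpt.


class SoftTripletLoss(nn.Module):
    def __init__(self, margin=None, normalize_feature=False):
        super(SoftTripletLoss, self).__init__()
        self.margin = margin  # not used in the forward() shown here
        self.normalize_feature = normalize_feature

    def forward(self, emb1, emb2, label):
        # emb1: student features, emb2: teacher (reference) features, label: pseudo labels
        if self.normalize_feature:
            # equal to cosine similarity
            emb1 = F.normalize(emb1)
            emb2 = F.normalize(emb2)
        mat_dist = euclidean_dist(emb1, emb1)
        assert mat_dist.size(0) == mat_dist.size(1)
        N = mat_dist.size(0)
        # mat_sim[i, j] = 1 if samples i and j share the same pseudo label
        mat_sim = label.expand(N, N).eq(label.expand(N, N).t()).float()
        # batch-hard mining on the student's distances: hardest positive/negative per anchor
        dist_ap, dist_an, ap_idx, an_idx = _batch_hard(mat_dist, mat_sim, indice=True)
        assert dist_an.size(0) == dist_ap.size(0)
        triple_dist = torch.stack((dist_ap, dist_an), dim=1)
        triple_dist = F.log_softmax(triple_dist, dim=1)
        # distances of the same (anchor, positive) and (anchor, negative) pairs in the teacher's space
        mat_dist_ref = euclidean_dist(emb2, emb2)
        dist_ap_ref = torch.gather(mat_dist_ref, 1, ap_idx.view(N, 1).expand(N, N))[:, 0]
        dist_an_ref = torch.gather(mat_dist_ref, 1, an_idx.view(N, 1).expand(N, N))[:, 0]
        triple_dist_ref = torch.stack((dist_ap_ref, dist_an_ref), dim=1)
        triple_dist_ref = F.softmax(triple_dist_ref, dim=1).detach()
        # soft cross entropy between the teacher's and the student's (d_ap, d_an) distributions
        loss = (- triple_dist_ref * triple_dist).mean(0).sum()
        return loss
```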
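The PyTorch class depends on repository helpers, so here is a framework-free sketch of the same logic for a small batch (hypothetical function names, plain Python lists as features): batch-hard mining on the student's distances, then a cross entropy between the teacher's softmax over (d_ap, d_an) and the student's log-softmax over the same pair. It loops per anchor instead of sorting and omits feature normalization.

```python
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard(mat, labels, i):
    """Hardest positive (farthest same label, anchor included) and hardest negative (closest other label)."""
    pos = [j for j in range(len(labels)) if labels[j] == labels[i]]
    neg = [j for j in range(len(labels)) if labels[j] != labels[i]]
    return max(pos, key=lambda j: mat[i][j]), min(neg, key=lambda j: mat[i][j])


def soft_triplet(student, teacher, labels):
    n = len(labels)
    ds = [[dist(student[i], student[j]) for j in range(n)] for i in range(n)]
    dt = [[dist(teacher[i], teacher[j]) for j in range(n)] for i in range(n)]
    total = 0.0
    for i in range(n):
        ap, an = batch_hard(ds, labels, i)  # mined on the student's distances
        s_ap, s_an = ds[i][ap], ds[i][an]
        log_z = math.log(math.exp(s_ap) + math.exp(s_an))  # student log-softmax normalizer
        t_ap, t_an = dt[i][ap], dt[i][an]
        q_ap = math.exp(t_ap) / (math.exp(t_ap) + math.exp(t_an))  # teacher soft target
        total += -(q_ap * (s_ap - log_z) + (1 - q_ap) * (s_an - log_z))
    return total / n  # mean over anchors


labels = [0, 0, 1, 1]
student_feats = [[0.0], [1.0], [3.0], [4.0]]
teacher_feats = [[0.0], [0.5], [3.0], [4.0]]
print(soft_triplet(student_feats, teacher_feats, labels))
```

When the teacher equals the student, the loss reduces to the mean binary entropy of the student's own (d_ap, d_an) softmax.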
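Evidence that the coupling problem is alleviated
The authors measure the Euclidean distance between the networks' features on the training images; a larger distance indicates greater diversity between the networks.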
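One plausible way to compute such a diversity measure is sketched below in plain Python: the average Euclidean distance between the features two networks (for example the student and the teacher) extract from the same training images. The choice of networks, the function name and the toy data are assumptions; the exact measurement protocol of the paper is not described in these notes.

```python
import math


def mean_feature_distance(feats_a, feats_b):
    """Average Euclidean distance between two networks' features of the same images."""
    assert len(feats_a) == len(feats_b)
    total = 0.0
    for fa, fb in zip(feats_a, feats_b):
        total += math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
    return total / len(feats_a)


# Hypothetical features of 2 images from two networks
student_feats = [[0.0, 1.0], [1.0, 0.0]]
teacher_feats = [[0.0, 1.0], [0.0, 0.0]]
print(mean_feature_distance(student_feats, teacher_feats))  # 0.5
```

A value close to 0 would mean the two networks have essentially coupled; a larger value means they still encode different information.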