Contents
17-ICCV-No Fuss Distance Metric Learning using Proxies
20-CVPR-Proxy Anchor Loss for Deep Metric Learning
17-ICCV-No Fuss Distance Metric Learning using Proxies
Heaviside step function
The gradient of L is 0 almost everywhere, so surrogate losses are used instead:
- margin-based triplet loss
hinge function: [x]_+ = max(x, 0)
- Neighborhood Component Analysis (NCA)
Sampling: k classes, |D| = n samples; triplets (x, y, z)
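The two surrogate losses listed above can be sketched for a single triplet as follows. This is a minimal illustration, assuming x is the anchor, y a similar point, z (or the set Z) dissimilar points, and squared Euclidean distance as d; the function names and the margin value are hypothetical and not taken from the papers' code.

```python
import math

def hinge(v):
    """Hinge function [v]_+ = max(v, 0)."""
    return max(v, 0.0)

def sq_dist(a, b):
    """Squared Euclidean distance (an assumed choice for d)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def triplet_margin_loss(x, y, z, margin=1.0):
    """[d(x, y) + margin - d(x, z)]_+ for anchor x, similar y, dissimilar z."""
    return hinge(sq_dist(x, y) + margin - sq_dist(x, z))

def nca_loss(x, y, Z):
    """-log( exp(-d(x, y)) / sum_{z in Z} exp(-d(x, z)) )."""
    denom = sum(math.exp(-sq_dist(x, z)) for z in Z)
    return sq_dist(x, y) + math.log(denom)
```

Because the positive y is not part of the denominator, the NCA loss in this form can go below zero when every z is much farther from x than y is.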
Proxy Ranking Loss
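|P| << |D|; the proxy set P serves as an approximation of all data points.
Proxy of x: the p in P closest to x, p(x) = argmin_{p in P} d(x, p).
Proxy approximation error: the worst approximation over all data points, ε = max_x d(x, p(x)).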
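The nearest-proxy rule and the approximation error can be sketched in a few lines. The Euclidean distance, the toy data and the function names below are assumptions made for illustration only.

```python
import math

def dist(a, b):
    """Euclidean distance (an assumed choice for d)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def nearest_proxy(x, proxies):
    """Index of the proxy closest to x: argmin_p d(x, p)."""
    return min(range(len(proxies)), key=lambda i: dist(x, proxies[i]))

def approximation_error(data, proxies):
    """Worst-case distance from a point to its proxy: max_x d(x, p(x))."""
    return max(dist(x, proxies[nearest_proxy(x, proxies)]) for x in data)

# Toy data: two clusters, one proxy per cluster.
data = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
proxies = [[0.0, 0.5], [5.0, 5.0]]
assignment = [nearest_proxy(x, proxies) for x in data]
error = approximation_error(data, proxies)
```

Here the first two points map to proxy 0 and the last two to proxy 1; the error is set by the point [6.0, 5.0], which lies one unit away from its proxy.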
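Static proxy assignment
Each proxy is associated with a semantic label, and each data point x is assigned the proxy of its label.
No triplets need to be sampled; only the anchor x is sampled.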
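With static assignment, the NCA loss for one anchor only needs the anchor, its label and the proxy table. The sketch below assumes one proxy per class and squared Euclidean distance; `proxy_nca_loss` and the toy proxies are hypothetical names and values, not the paper's implementation.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance (an assumed choice for d)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def proxy_nca_loss(x, label, proxies):
    """-log( exp(-d(x, p_label)) / sum_{c != label} exp(-d(x, p_c)) ).

    proxies[c] is the proxy statically assigned to class c.
    """
    pos = sq_dist(x, proxies[label])
    neg = sum(math.exp(-sq_dist(x, p))
              for c, p in enumerate(proxies) if c != label)
    return pos + math.log(neg)

proxies = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
loss_easy = proxy_nca_loss([0.9, 0.1], 0, proxies)  # anchor near its own proxy
loss_hard = proxy_nca_loss([0.1, 0.9], 0, proxies)  # anchor near a wrong proxy
```

The only sampling step is picking the anchor; the positive and negative terms come from the proxy table rather than from other data points.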
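Dynamic proxy assignment
When there are no semantic labels, x is assigned its nearest proxy.
The proxy-based loss is an upper bound on the triplet loss; it needs no triplet sampling and converges quickly.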
import torch
import torch.nn.functional as F
from torch.nn import Parameter

# binarize_and_smooth_labels is a helper defined in the linked repository;
# it turns integer labels into smoothed one-hot targets.

class ProxyNCA(torch.nn.Module):
    def __init__(self,
        nb_classes,
        sz_embedding,
        smoothing_const = 0.1,
        scaling_x = 1,
        scaling_p = 3
    ):
        super().__init__()
        # initialize proxies s.t. norm of each proxy ~1 through div by 8
        # i.e. proxies.norm(2, dim=1)) should be close to [1,1,...,1]
        # TODO: use norm instead of div 8, because of embedding size
        self.proxies = Parameter(torch.randn(nb_classes, sz_embedding) / 8)
        self.smoothing_const = smoothing_const
        self.scaling_x = scaling_x
        self.scaling_p = scaling_p

    def forward(self, X, T):
        # L2-normalize proxies and embeddings, then rescale them
        P = F.normalize(self.proxies, p = 2, dim = -1) * self.scaling_p
        X = F.normalize(X, p = 2, dim = -1) * self.scaling_x
        # squared Euclidean distances, shape (batch, nb_classes)
        D = torch.cdist(X, P) ** 2
        T = binarize_and_smooth_labels(T, len(P), self.smoothing_const)
        # note that compared to proxy nca, positive included in denominator
        loss = torch.sum(-T * F.log_softmax(-D, -1), -1)
        return loss.mean()
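The forward pass calls `binarize_and_smooth_labels`, which is not shown in these notes. The plain-Python sketch below reproduces the per-sample arithmetic under an assumed smoothing scheme (the true class gets 1 - smoothing_const, the other classes share smoothing_const equally); the repository's helper may differ, and the names `smooth_one_hot` and `soft_label_loss` are hypothetical.

```python
import math

def smooth_one_hot(label, nb_classes, smoothing_const=0.1):
    """Assumed scheme: true class gets 1 - s, the others share s equally."""
    off = smoothing_const / (nb_classes - 1)
    return [1.0 - smoothing_const if c == label else off
            for c in range(nb_classes)]

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]

def soft_label_loss(sq_dists, label, smoothing_const=0.1):
    """sum_c -T[c] * log_softmax(-D)[c] for one sample."""
    targets = smooth_one_hot(label, len(sq_dists), smoothing_const)
    log_probs = log_softmax([-d for d in sq_dists])
    return -sum(t * lp for t, lp in zip(targets, log_probs))

targets = smooth_one_hot(1, 3)
loss_near = soft_label_loss([9.0, 1.0, 9.0], label=1)  # close to own proxy
loss_far = soft_label_loss([1.0, 9.0, 9.0], label=1)   # close to a wrong proxy
```

Since the softmax runs over all proxies, the positive proxy appears in the denominator, which is the difference from the original Proxy-NCA formulation that the code comment points out.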
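Open question: what are scaling_x and scaling_p for? In the code they rescale the L2-normalized embeddings and proxies before the squared distances are computed.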
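One possible reading of the two scaling factors, derived from the forward pass rather than stated by the authors: for unit vectors, ||s_x·x - s_p·p||² = s_x² + s_p² - 2·s_x·s_p·cos(x, p). The constant s_x² + s_p² cancels inside the softmax, so the logits are effectively 2·s_x·s_p·cos(x, p), i.e. cosine similarity with a temperature of 1/(2·s_x·s_p), which is 1/6 for the defaults. The snippet checks the identity numerically; the helper names and the test vectors are hypothetical.

```python
import math

def normalize(v):
    """Scale v to unit L2 norm."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def scaled_sq_dist(x, p, scaling_x=1.0, scaling_p=3.0):
    """Squared distance between the rescaled unit vectors."""
    x = [scaling_x * c for c in normalize(x)]
    p = [scaling_p * c for c in normalize(p)]
    return sum((a - b) ** 2 for a, b in zip(x, p))

def cosine(x, p):
    """Cosine similarity of x and p."""
    return sum(a * b for a, b in zip(normalize(x), normalize(p)))

x, p = [1.0, 2.0], [3.0, -1.0]
lhs = scaled_sq_dist(x, p)
rhs = 1.0 ** 2 + 3.0 ** 2 - 2 * 1.0 * 3.0 * cosine(x, p)
```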
Code: https://github.com/dichotomies/proxy-nca
20-CVPR-Proxy Anchor Loss for Deep Metric Learning
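N-pair loss, Lifted Structure loss: do not use all the data in a batch; tuple sampling introduces hyperparameters that must be tuned.
Proxy-NCA loss: does not exploit data-to-data relations; each data point is associated only with proxies.
s(x, p): cosine similarity between embedding x and proxy p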
LSE Log-Sum-Exp function
Guards against overflow and underflow
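The overflow and underflow issue can be seen in a small plain-Python sketch of the usual max-subtraction trick. The function names are hypothetical; `softplus_lse` shows the log(1 + sum exp) form that appears in the Proxy Anchor loss.

```python
import math

def logsumexp(values):
    """log(sum(exp(v))) computed stably by factoring out the maximum."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def softplus_lse(values):
    """log(1 + sum(exp(v))): Log-Sum-Exp with an extra 0 term."""
    return logsumexp([0.0] + list(values))

# A naive math.exp(1000.0) raises OverflowError.
big = logsumexp([1000.0, 1000.0])
# A naive sum of exp(-1000.0) underflows to 0.0, and math.log(0.0) fails.
small = logsumexp([-1000.0, -1000.0])
```

Both results equal the shared value plus log 2, which the naive formula cannot produce for inputs of this magnitude.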
Proxy Anchor Loss
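Each proxy acts as an anchor and is associated with all data points in the batch.
P denotes the set of all proxies; P+ the set of positive proxies of the data in the batch.
Embedding vectors are pulled together or pushed apart with a strength that depends on how hard each sample is.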
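The hardness-dependent strength can be made concrete by differentiating one proxy's terms with respect to each similarity. The sketch below assumes the log(1 + sum exp) form of the loss with scale alpha and margin mrg, and ignores the averaging over proxies; the function names and the similarity values are hypothetical.

```python
import math

def positive_pull_weights(sims, alpha=32.0, mrg=0.1):
    """Magnitude of d/ds_i of log(1 + sum_j exp(-alpha * (s_j - mrg)))."""
    h = [math.exp(-alpha * (s - mrg)) for s in sims]
    denom = 1.0 + sum(h)
    return [alpha * hi / denom for hi in h]

def negative_push_weights(sims, alpha=32.0, mrg=0.1):
    """d/ds_i of log(1 + sum_j exp(alpha * (s_j + mrg)))."""
    h = [math.exp(alpha * (s + mrg)) for s in sims]
    denom = 1.0 + sum(h)
    return [alpha * hi / denom for hi in h]

# Positives of one proxy, from easy (high similarity) to hard (low similarity).
pull = positive_pull_weights([0.9, 0.5, 0.1])
# Negatives of one proxy, from easy (low similarity) to hard (high similarity).
push = negative_push_weights([-0.5, 0.0, 0.4])
```

In both lists the weight grows with hardness: a positive far from its proxy is pulled much harder than one already close to it, and a negative close to the proxy is pushed much harder than a distant one.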
import torch
import torch.nn as nn
import torch.nn.functional as F

# l2_norm (row-wise L2 normalization) and binarize (integer labels -> one-hot
# matrix) are helpers defined in the linked repository.

class Proxy_Anchor(torch.nn.Module):
    def __init__(self, nb_classes, sz_embed, mrg = 0.1, alpha = 32):
        torch.nn.Module.__init__(self)
        # Proxy Anchor Initialization
        self.proxies = torch.nn.Parameter(torch.randn(nb_classes, sz_embed).cuda())
        nn.init.kaiming_normal_(self.proxies, mode='fan_out')

        self.nb_classes = nb_classes
        self.sz_embed = sz_embed
        self.mrg = mrg
        self.alpha = alpha

    def forward(self, X, T):
        P = self.proxies

        cos = F.linear(l2_norm(X), l2_norm(P))  # Calculate cosine similarity
        P_one_hot = binarize(T = T, nb_classes = self.nb_classes)
        N_one_hot = 1 - P_one_hot

        pos_exp = torch.exp(-self.alpha * (cos - self.mrg))
        neg_exp = torch.exp(self.alpha * (cos + self.mrg))

        with_pos_proxies = torch.nonzero(P_one_hot.sum(dim = 0) != 0).squeeze(dim = 1)  # The set of positive proxies of data in the batch
        num_valid_proxies = len(with_pos_proxies)  # The number of positive proxies

        # Per-proxy sums over its positive and its negative samples in the batch
        P_sim_sum = torch.where(P_one_hot == 1, pos_exp, torch.zeros_like(pos_exp)).sum(dim=0)
        N_sim_sum = torch.where(N_one_hot == 1, neg_exp, torch.zeros_like(neg_exp)).sum(dim=0)

        pos_term = torch.log(1 + P_sim_sum).sum() / num_valid_proxies
        neg_term = torch.log(1 + N_sim_sum).sum() / self.nb_classes
        loss = pos_term + neg_term

        return loss
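Read term by term, the forward pass corresponds to the following expression. The symbols X_p^+ and X_p^- (the batch samples that are positive or negative for proxy p) are notation introduced here for the reconstruction; δ is `mrg` and α is `alpha` in the code.

```latex
\ell(X) = \frac{1}{|P^{+}|} \sum_{p \in P^{+}}
          \log\Bigl(1 + \sum_{x \in X_p^{+}} e^{-\alpha\,(s(x,p) - \delta)}\Bigr)
        + \frac{1}{|P|} \sum_{p \in P}
          \log\Bigl(1 + \sum_{x \in X_p^{-}} e^{\alpha\,(s(x,p) + \delta)}\Bigr)
```

Proxies without any positive sample in the batch contribute log(1 + 0) = 0 to the first sum, which is why the code can sum over all proxies and still divide by the number of positive proxies.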
Code: https://github.com/tjddus9597/Proxy-Anchor-CVPR2020/tree/51db57031e38f75c03f69bbdfad1a3233afd9787