Triplet Loss and a TensorFlow (v1) Implementation

Triplet Loss

Abstract

Triplet Loss - Special applications: Face recognition & Neural style transfer | Coursera

The three elements of the triplet:

  • Anchor
  • Positive
  • Negative

The Anchor should be close to the Positive and far from the Negative.

It is called a triplet loss because computing it requires the three kinds of samples above.

Goal: the distance between A and P, $d(A,P)$, should be smaller than the distance between A and N, $d(A,N)$:

$||f(A)-f(P)||^2 \leq ||f(A)-f(N)||^2$

Here $f(\cdot)$ denotes the embedding vector the network produces for a sample (e.g., $f(A)$ is the embedding of A).

Rearranged:

$d(A,P) - d(A,N) \leq 0$

If the network learned a function that outputs 0 for every input, i.e., $f(\cdot) = 0$, the inequality above would still hold.

A margin is introduced to rule out this trivial solution.

  • 👉 cf. the margin concept in SVMs


The formula then becomes:

$d(A,P) - d(A,N) + \alpha \leq 0, \quad \alpha \text{ a.k.a. the margin}$

That is, the gap between $d(A,P)$ and $d(A,N)$ must be at least the margin.

Loss function

  • For a single triplet:

$\mathcal{L}(A,P,N) = \max(||f(A)-f(P)||^2 - ||f(A)-f(N)||^2 + \alpha,\ 0)$

  • For the whole training set:

$\mathcal{J} = \sum_{i=1}^{m} \mathcal{L}(A^{(i)}, P^{(i)}, N^{(i)})$, where $m$ is the number of training triplets.
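As a quick numeric check of the single-triplet loss (numbers chosen only for illustration): with $\alpha = 0.2$, if $d(A,P) = 0.5$ and $d(A,N) = 0.9$ then $\mathcal{L} = \max(0.5 - 0.9 + 0.2,\ 0) = 0$, while if $d(A,N) = 0.6$ then $\mathcal{L} = \max(0.5 - 0.6 + 0.2,\ 0) = 0.1$ and gradient descent pushes the embeddings further apart.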

Note that there should be more than one (A, P) pair per identity. For example, with data for 1k people you might form 10k (A, P) pairs, which requires several photos of the same person.

🚩 During training, if A, P, N are picked at random, the condition in $\mathcal{L}(\cdot)$ is satisfied too easily and the network learns very little. We therefore need to pick samples for which $d(A,P)$ and $d(A,N)$ are close (hard samples), i.e., $d(A,P) \approx d(A,N)$; only then is the network pushed to decrease $d(A,P)$ and increase $d(A,N)$.

Advance

The following is based on an article describing a TensorFlow implementation; to understand the code better, read the original post.

👇 Check here to see how to implement triplet loss:

omoindrot/tensorflow-triplet-loss

Triplet Loss and Online Triplet Mining in TensorFlow

Triplet loss is known to be difficult to implement, especially if you add the constraints of building a computational graph in TensorFlow.

Triplet loss in this case is a way to learn good embeddings for each face. In the embedding space, faces from the same person should be close together and form well separated clusters.


The goal of the triplet loss is to make sure that:

  • Two examples with the same label have their embeddings close together in the embedding space
  • Two examples with different labels have their embeddings far away.

In other words: increase the inter-class distance and decrease the intra-class distance.

However, we don’t want to push the train embeddings of each label to collapse into very small clusters. The only requirement is that given two positive examples of the same class (i.e., A and P) and one negative example (i.e., N), the negative should be farther away than the positive by some margin. This is very similar to the margin used in SVMs, and here we want the clusters of each class to be separated by the margin.

That is, the inter-class distance is enlarged in an SVM-like way, and that distance is the margin.

🏆Triplet mining

Based on the value of the loss, triplets fall into three categories (see the sketch after the list):

  • easy triplets: loss = 0, because $d(A,P) + margin < d(A,N)$
  • hard triplets: $d(A,N) < d(A,P)$
  • semi-hard triplets: $d(A,P) < d(A,N) < d(A,P) + margin$
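Since each category depends only on the two distances and the margin, it is easy to check in code. A minimal sketch (the helper `categorize_triplet` and the sample values are made up for illustration):

```python
def categorize_triplet(d_ap, d_an, margin):
    """Classify a triplet from its distances (illustrative helper, not from the original repo)."""
    if d_ap + margin < d_an:
        return "easy"       # loss is 0
    if d_an < d_ap:
        return "hard"       # the negative is closer to the anchor than the positive
    return "semi-hard"      # d(A,P) < d(A,N) < d(A,P) + margin


print(categorize_triplet(0.5, 0.9, 0.2))  # easy
print(categorize_triplet(0.5, 0.4, 0.2))  # hard
print(categorize_triplet(0.5, 0.6, 0.2))  # semi-hard
```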


Clearly, the harder the triplet (the closer the negative is to the anchor relative to the positive), the larger the loss and the more the model can learn.

Choosing what kind of triplets we want to train on will greatly impact our metrics. In the original Facenet paper, they pick a random semi-hard negative for every pair of anchor and positive, and train on these triplets.

Since the loss of easy triplets is 0, only semi-hard and hard triplets are worth feeding to the network. So how do we select them?

Offline and online triplet mining

Here, mining = sampling.

offline

At the beginning of each epoch, for instance, we compute all the embeddings on the training set, and then only select hard or semi-hard triplets. We can then train one epoch on these triplets.

Train one epoch, compute all the embeddings, then select the hard and semi-hard triplets for the next epoch. As training progresses, the number of hard and semi-hard triplets should gradually shrink.
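As a rough sketch of this offline procedure (assuming NumPy arrays `embeddings` and integer `labels` for the whole training set; the helper name is made up and the loops are deliberately naive):

```python
import numpy as np

def mine_offline_triplets(embeddings, labels, margin):
    """Illustrative offline mining: return (a, p, n) index triples that are hard or semi-hard."""
    # Pairwise squared distances: ||a||^2 - 2 <a, b> + ||b||^2
    sq_norms = np.sum(embeddings ** 2, axis=1)
    dists = sq_norms[None, :] - 2.0 * embeddings @ embeddings.T + sq_norms[:, None]

    triplets = []
    n_samples = len(labels)
    for a in range(n_samples):
        pos_idx = np.where((labels == labels[a]) & (np.arange(n_samples) != a))[0]
        neg_idx = np.where(labels != labels[a])[0]
        for p in pos_idx:
            for n in neg_idx:
                # keep only triplets with a non-zero loss (hard or semi-hard)
                if dists[a, p] + margin > dists[a, n]:
                    triplets.append((a, p, n))
    return triplets
```

The cost of recomputing every embedding and scanning all pairs is exactly why the online approach below is preferred.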

online

The idea here is to compute useful triplets on the fly, for each batch of inputs. Given a batch of B examples (for instance B images of faces), we compute the B embeddings and we then can find a maximum of B 3 B^3 B3 triplets. Of course, most of these triplets are not valid (i.e. they don’t have 2 positives and 1 negative).

Useful triplets are computed on the fly within each batch of inputs. With a batch of B examples, we can form at most $B^3$ index triples, but most of these are not valid, i.e., they do not contain 2 positives and 1 negative.


Online mining strategies:

In online mining, we have computed a batch of B embeddings from a batch of B inputs. Now we want to generate triplets from these B embeddings.

Given any three indices $i, j, k \in [1, B]$, **if examples $i$ and $j$ are distinct and share the same label while example $k$ has a different label, then $(i, j, k)$ is a valid triplet.** What remains is a good strategy for choosing, among the valid triplets, the ones on which to compute the loss.

Suppose a batch is formed as $B = PK$, i.e., P different persons with K images each (K is typically 4). There are two strategies:

  • **batch all.** Select all the valid triplets, and average the loss over the hard and semi-hard triplets.
    • Easy triplets are not counted: their loss is 0, and including them would drag the average down.
    • This produces $PK(K-1)(PK-K)$ triplets: $PK$ anchors, $K-1$ possible positives per anchor, and $PK-K$ possible negatives per anchor (sampled without replacement; see the worked count after this list).
  • **batch hard.** For each anchor, select the hardest positive (largest $d(a,p)$) and the hardest negative (smallest $d(a,n)$) within the batch.
    • This produces $PK$ triplets (one per anchor).
    • The selected triplets are the hardest within the batch.
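To make the two counts concrete (P = 32 and K = 4 are just example values): the batch size is $B = PK = 128$, there are $B^3 \approx 2.1 \times 10^6$ index triples in total, batch all keeps $PK(K-1)(PK-K) = 128 \times 3 \times 124 = 47{,}616$ valid triplets, and batch hard keeps only $PK = 128$, one per anchor.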

According to the paper cited above, the batch hard strategy yields the best performance:

Additionally, the selected triplets can be considered moderate triplets, since they are the hardest within a small subset of the data, which is exactly what is best for learning with the triplet loss.

Implementation

  • Offline (naive, inefficient) implementation

Compute $d(\cdot)$:

anchor_output = ...    # shape [None, 128]
positive_output = ...  # shape [None, 128]
negative_output = ...  # shape [None, 128]

d_pos = tf.reduce_sum(tf.square(anchor_output - positive_output), 1)
d_neg = tf.reduce_sum(tf.square(anchor_output - negative_output), 1)

Compute the loss:

loss = tf.maximum(0.0, margin + d_pos - d_neg)
loss = tf.reduce_mean(loss)

anchor_output, positive_output, negative_output are the network's embeddings for the three sets of inputs, i.e., B anchors, B positives, and B negatives.
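For context, the three outputs would typically come from one shared embedding network applied to three placeholders. A minimal TF1 sketch, where `build_embedding_net` and the input shapes are hypothetical and not part of the original code:

```python
import tensorflow as tf

def build_embedding_net(images, reuse=False):
    """Toy shared embedding network; any CNN that outputs a 128-d vector would do."""
    with tf.variable_scope("embedding", reuse=reuse):
        flat = tf.layers.flatten(images)
        hidden = tf.layers.dense(flat, 256, activation=tf.nn.relu, name="fc1")
        return tf.layers.dense(hidden, 128, name="fc2")

anchor_images = tf.placeholder(tf.float32, [None, 64, 64, 3])
positive_images = tf.placeholder(tf.float32, [None, 64, 64, 3])
negative_images = tf.placeholder(tf.float32, [None, 64, 64, 3])

# the three "towers" share weights through variable reuse
anchor_output = build_embedding_net(anchor_images)                  # shape [None, 128]
positive_output = build_embedding_net(positive_images, reuse=True)  # shape [None, 128]
negative_output = build_embedding_net(negative_images, reuse=True)  # shape [None, 128]
```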

🏵A better implementation with online triplet mining

Implementation:

omoindrot/tensorflow-triplet-loss

  • 1. Compute the distance matrix

Compute $d(A,P)$ and $d(A,N)$ (L2 distances) using matrix operations. The definition from the source code:

def _pairwise_distances(embeddings, squared=False):
    """Compute the 2D matrix of distances between all the embeddings.

    Args:
        embeddings: tensor of shape (batch_size, embed_dim)
        squared: Boolean. If true, output is the pairwise squared euclidean distance matrix.
                 If false, output is the pairwise euclidean distance matrix.

    Returns:
        pairwise_distances: tensor of shape (batch_size, batch_size)
    """
    # Get the dot product between all embeddings
    # shape (batch_size, batch_size)
    dot_product = tf.matmul(embeddings, tf.transpose(embeddings))
    # [B, D] x [D, B] -> [B, B]

    # Get squared L2 norm for each embedding. We can just take the diagonal of `dot_product`.
    # This also provides more numerical stability (the diagonal of the result will be exactly 0).
    # shape (batch_size,)
    square_norm = tf.diag_part(dot_product)
    # the diagonal holds <x, x>, the squared L2 norm of each embedding

    # Compute the pairwise distance matrix as we have:
    # ||a - b||^2 = ||a||^2  - 2 <a, b> + ||b||^2
    # shape (batch_size, batch_size)
    distances = tf.expand_dims(square_norm, 0) - 2.0 * dot_product + tf.expand_dims(square_norm, 1)
    # dot_product: [B, B]
    # square_norm: [B]; expanding on dim 0 gives [1, B], on dim 1 gives [B, 1], both broadcast to [B, B]

    # Because of computation errors, some distances might be negative so we put everything >= 0.0
    distances = tf.maximum(distances, 0.0)

    if not squared:
        # Because the gradient of sqrt is infinite when distances == 0.0 (ex: on the diagonal)
        # we need to add a small epsilon where distances == 0.0
        mask = tf.to_float(tf.equal(distances, 0.0))
        # a float mask of the same shape: 1.0 where distances == 0.0, else 0.0
        distances = distances + mask * 1e-16

        distances = tf.sqrt(distances)

        # Correct the epsilon added: set the distances on the mask to be exactly 0.0
        distances = distances * (1.0 - mask)

    return distances

In the end, the function returns a matrix with zeros on the diagonal that is symmetric about the diagonal.
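A quick way to sanity-check the function is to run it on a toy batch (TF1 session; values made up):

```python
import numpy as np
import tensorflow as tf

toy_embeddings = tf.constant([[1.0, 0.0],
                              [0.0, 1.0],
                              [1.0, 1.0]])
dist_op = _pairwise_distances(toy_embeddings, squared=True)

with tf.Session() as sess:
    dists = sess.run(dist_op)
    print(dists)                        # zeros on the diagonal
    print(np.allclose(dists, dists.T))  # True: symmetric about the diagonal
```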


Batch All strategy

In this strategy, we want to compute the triplet loss on almost all triplets. In the TensorFlow graph, we want to create a 3D tensor of shape $(B, B, B)$, where the element at index $(i, j, k)$ contains the loss for triplet $(i, j, k)$.

We then get a 3D mask of the valid triplets with the function `_get_triplet_mask`. Here, `mask[i, j, k]` is true if $(i, j, k)$ is a valid triplet.

Finally, we set to 0 the loss of the invalid triplets and take the average over the positive triplets.

def batch_all_triplet_loss(labels, embeddings, margin, squared=False):
    """Build the triplet loss over a batch of embeddings.

    We generate all the valid triplets and average the loss over the positive ones.

    Args:
        labels: labels of the batch, of size (batch_size,)
        embeddings: tensor of shape (batch_size, embed_dim)
        margin: margin for triplet loss
        squared: Boolean. If true, output is the pairwise squared euclidean distance matrix.
                 If false, output is the pairwise euclidean distance matrix.

    Returns:
        triplet_loss: scalar tensor containing the triplet loss
    """
    # Get the pairwise distance matrix
    pairwise_dist = _pairwise_distances(embeddings, squared=squared)  # [B, B]

    # Important: every example in the batch acts as an anchor
    # Compute a 3D tensor of size (batch_size, batch_size, batch_size)
    # triplet_loss[i, j, k] will contain the triplet loss of anchor=i, positive=j, negative=k
    # Uses broadcasting where the 1st argument has shape (batch_size, batch_size, 1)
    # and the 2nd (batch_size, 1, batch_size)
    anchor_positive_dist = tf.expand_dims(pairwise_dist, 2)  # [B, B, 1]
    anchor_negative_dist = tf.expand_dims(pairwise_dist, 1)  # [B, 1, B]
    triplet_loss = anchor_positive_dist - anchor_negative_dist + margin  # [B, B, B]

    # Put to zero the invalid triplets
    # (where label(a) != label(p) or label(n) == label(a) or a == p)
    mask = _get_triplet_mask(labels)
    # the mask filters out invalid triplets; at most PK(K-1)(PK-K) entries stay non-zero
    mask = tf.to_float(mask)
    triplet_loss = tf.multiply(mask, triplet_loss)

    # Remove negative losses (i.e. the easy triplets)
    triplet_loss = tf.maximum(triplet_loss, 0.0)

    # Count number of positive triplets (where triplet_loss > 0)
    valid_triplets = tf.to_float(tf.greater(triplet_loss, 1e-16))
    num_positive_triplets = tf.reduce_sum(valid_triplets)
    num_valid_triplets = tf.reduce_sum(mask)
    fraction_positive_triplets = num_positive_triplets / (num_valid_triplets + 1e-16)

    # Get final mean triplet loss over the positive valid triplets
    triplet_loss = tf.reduce_sum(triplet_loss) / (num_positive_triplets + 1e-16)

    return triplet_loss, fraction_positive_triplets
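The code above calls `_get_triplet_mask`, which is not reproduced in this post. Based on the definition of a valid triplet (distinct indices, label(i) == label(j), label(k) different), a sketch along the lines of the original repo could look like this:

```python
def _get_triplet_mask(labels):
    """3D boolean mask where mask[a, p, n] is True iff (a, p, n) is a valid triplet."""
    # Check that a, p and n are distinct indices
    indices_equal = tf.cast(tf.eye(tf.shape(labels)[0]), tf.bool)
    indices_not_equal = tf.logical_not(indices_equal)
    i_not_equal_j = tf.expand_dims(indices_not_equal, 2)
    i_not_equal_k = tf.expand_dims(indices_not_equal, 1)
    j_not_equal_k = tf.expand_dims(indices_not_equal, 0)
    distinct_indices = tf.logical_and(tf.logical_and(i_not_equal_j, i_not_equal_k), j_not_equal_k)

    # Check the labels: label(a) == label(p) and label(a) != label(n)
    label_equal = tf.equal(tf.expand_dims(labels, 0), tf.expand_dims(labels, 1))
    i_equal_j = tf.expand_dims(label_equal, 2)
    i_equal_k = tf.expand_dims(label_equal, 1)
    valid_labels = tf.logical_and(i_equal_j, tf.logical_not(i_equal_k))

    return tf.logical_and(distinct_indices, valid_labels)
```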

Batch hard strategy

In this strategy, we want to find the hardest positive and negative for each anchor.

hardest positive

maximum distance of valid pairs (a,p)

Method: take the pairwise distance matrix, zero out the entries that are not valid (a, p) pairs with a mask, and then take the maximum distance in each row.

hardest negative

minimum distance of valid pairs (a, n)

Method: to take a row-wise minimum over valid negatives only, first add each row's maximum to the invalid entries (where label(a) == label(n)) so they can never be selected, then take the minimum in each row.

def batch_hard_triplet_loss(labels, embeddings, margin, squared=False):
    """Build the triplet loss over a batch of embeddings.

    For each anchor, we get the hardest positive and hardest negative to form a triplet.

    Args:
        labels: labels of the batch, of size (batch_size,)
        embeddings: tensor of shape (batch_size, embed_dim)
        margin: margin for triplet loss
        squared: Boolean. If true, output is the pairwise squared euclidean distance matrix.
                 If false, output is the pairwise euclidean distance matrix.

    Returns:
        triplet_loss: scalar tensor containing the triplet loss
    """
    # Get the pairwise distance matrix
    pairwise_dist = _pairwise_distances(embeddings, squared=squared)

    # For each anchor, get the hardest positive
    # First, we need to get a mask for every valid positive (they should have same label)
    mask_anchor_positive = _get_anchor_positive_triplet_mask(labels)
    mask_anchor_positive = tf.to_float(mask_anchor_positive)

    # We put to 0 any element where (a, p) is not valid (valid if a != p and label(a) == label(p))
    anchor_positive_dist = tf.multiply(mask_anchor_positive, pairwise_dist)

    # shape (batch_size, 1)
    hardest_positive_dist = tf.reduce_max(anchor_positive_dist, axis=1, keepdims=True)

    # For each anchor, get the hardest negative
    # First, we need to get a mask for every valid negative (they should have different labels)
    mask_anchor_negative = _get_anchor_negative_triplet_mask(labels)
    mask_anchor_negative = tf.to_float(mask_anchor_negative)

    # We add the maximum value in each row to the invalid negatives (label(a) == label(n))
    max_anchor_negative_dist = tf.reduce_max(pairwise_dist, axis=1, keepdims=True)
    anchor_negative_dist = pairwise_dist + max_anchor_negative_dist * (1.0 - mask_anchor_negative)

    # shape (batch_size, 1)
    hardest_negative_dist = tf.reduce_min(anchor_negative_dist, axis=1, keepdims=True)

    # Combine biggest d(a, p) and smallest d(a, n) into final triplet loss
    triplet_loss = tf.maximum(hardest_positive_dist - hardest_negative_dist + margin, 0.0)

    # Get final mean triplet loss
    triplet_loss = tf.reduce_mean(triplet_loss)

    return triplet_loss

The final step is to combine these into the triplet loss:

triplet_loss = tf.maximum(hardest_positive_dist - hardest_negative_dist + margin, 0.0)
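The batch hard code also relies on `_get_anchor_positive_triplet_mask` and `_get_anchor_negative_triplet_mask`, which are not shown above. Hedged sketches following the same logic (same label and distinct indices for positives, different labels for negatives):

```python
def _get_anchor_positive_triplet_mask(labels):
    """2D boolean mask where mask[a, p] is True iff a != p and label(a) == label(p)."""
    indices_not_equal = tf.logical_not(tf.cast(tf.eye(tf.shape(labels)[0]), tf.bool))
    labels_equal = tf.equal(tf.expand_dims(labels, 0), tf.expand_dims(labels, 1))
    return tf.logical_and(indices_not_equal, labels_equal)


def _get_anchor_negative_triplet_mask(labels):
    """2D boolean mask where mask[a, n] is True iff label(a) != label(n)."""
    labels_equal = tf.equal(tf.expand_dims(labels, 0), tf.expand_dims(labels, 1))
    return tf.logical_not(labels_equal)
```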
