Triplet Loss and Its Implementation in TensorFlow (v1)

Triplet Loss

Abstract

Triplet Loss - Special applications: Face recognition & Neural style transfer | Coursera

The three elements:

  • Anchor
  • Positive
  • Negative

The anchor should be close to the positive and far from the negative.

It is called a triplet loss because computing it requires the three kinds of samples above.

Goal: the distance between A and P, d(A,P), should be smaller than the distance between A and N, d(A,N):

$||f(A)-f(P)||^2 \leq ||f(A)-f(N)||^2$

Here, $f(\cdot)$ denotes the embedding vector of a sample (e.g. $f(A)$ is the embedding of A).

Rearranged:

$d(A,P) - d(A,N) \leq 0$

If the network learned a function whose output is always 0, i.e. $f(\cdot)=0$, the inequality above would still hold.

A margin is added to rule out this degenerate solution.

  • 👉 compare the concept of margin in SVMs


The formula then becomes:

$d(A,P) - d(A,N) + \alpha \leq 0$, where $\alpha$ is the margin

In other words, the gap between d(A,P) and d(A,N) must be at least the margin.

Loss function

  • For a single triplet (a small numeric sketch follows this list):

$\mathcal{L}(A,P,N) = \max(||f(A)-f(P)||^2 - ||f(A)-f(N)||^2 + \alpha,\ 0)$

  • For the whole training set (summed over all triplets):

$\mathcal{J} = \sum_{i} \mathcal{L}(A^{(i)},P^{(i)},N^{(i)})$
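As a minimal numeric sketch of the per-triplet formula $\mathcal{L}(A,P,N)$ above (plain NumPy, with made-up toy embeddings, not code from the course):

```python
import numpy as np

def triplet_loss_single(f_a, f_p, f_n, alpha=0.2):
    """max(||f(A)-f(P)||^2 - ||f(A)-f(N)||^2 + alpha, 0) for one triplet."""
    d_ap = np.sum((f_a - f_p) ** 2)  # squared distance d(A, P)
    d_an = np.sum((f_a - f_n) ** 2)  # squared distance d(A, N)
    return max(d_ap - d_an + alpha, 0.0)

# toy 128-d embeddings, values chosen only for illustration
rng = np.random.RandomState(0)
f_a = rng.randn(128)
f_p = f_a + 0.1 * rng.randn(128)   # positive: close to the anchor
f_n = rng.randn(128)               # negative: unrelated
print(triplet_loss_single(f_a, f_p, f_n))  # 0.0 here, since this triplet is easy
```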

Note that there should be more than one (A, P) pair. For example, a dataset of 1k people might contain 10k (A, P) pairs; this requires several different photos of the same person.

🚩 During training, if A, P, N are picked at random, the constraint in $\mathcal{L}(\cdot)$ is satisfied too easily and the network learns very little. We therefore need to pick hard samples where d(A,P) and d(A,N) are close, i.e. $d(A,P) \approx d(A,N)$; only then is the network pushed to decrease d(A,P) and increase d(A,N).

Advanced

The following is based on a TensorFlow implementation write-up; read the original article for a fuller understanding of the code.

👇 Check here to learn how to implement triplet loss

omoindrot/tensorflow-triplet-loss

Triplet Loss and Online Triplet Mining in TensorFlow

Triplet loss is known to be difficult to implement, especially if you add the constraints of building a computational graph in TensorFlow.

Triplet loss in this case is a way to learn good embeddings for each face. In the embedding space, faces from the same person should be close together and form well separated clusters.

That is, in the face-recognition setting, triplet loss is a way to learn a good embedding (feature vector) per face, such that faces of the same person cluster tightly and different people's clusters are well separated.

The goal of the triplet loss is to make sure that:

  • Two examples with the same label have their embeddings close together in the embedding space
  • Two examples with different labels have their embeddings far away.

That is: increase the inter-class distance and decrease the intra-class distance.

However, we don’t want to push the train embeddings of each label to collapse into very small clusters. The only requirement is that given two positive examples of the same class (i.e. A and P) and one negative example (i.e. N), the negative should be farther away than the positive by some margin. This is very similar to the margin used in SVMs, and here we want the clusters of each class to be separated by the margin.

That is, the clusters of different classes are pushed apart in an SVM-like way, and the separation is the margin.

🏆Triplet mining

Based on the value of the loss, triplets fall into three categories:

  • easy triplets: loss = 0, because $d(A,P) + margin < d(A,N)$
  • hard triplets: $d(A,P) > d(A,N)$
  • semi-hard triplets: $d(A,P) < d(A,N) < d(A,P) + margin$


Clearly, the harder the triplet, the closer N is to A relative to P, the larger the loss, and the more the model learns from it.
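A tiny helper (my own sketch, not from the article) that classifies a triplet from its two distances, following the three definitions above:

```python
def triplet_category(d_ap, d_an, margin=0.2):
    """Return 'easy', 'semi-hard' or 'hard' for a triplet with distances d(A,P), d(A,N)."""
    if d_an > d_ap + margin:   # loss = 0
        return "easy"
    if d_an > d_ap:            # d(A,P) < d(A,N) < d(A,P) + margin
        return "semi-hard"
    return "hard"              # d(A,N) <= d(A,P)

print(triplet_category(0.5, 1.0))   # easy
print(triplet_category(0.5, 0.6))   # semi-hard
print(triplet_category(0.5, 0.4))   # hard
```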

Choosing what kind of triplets we want to train on will greatly impact our metrics. In the original Facenet paper, they pick a random semi-hard negative for every pair of anchor and positive, and train on these triplets.

Since easy triplets contribute zero loss, we should only feed semi-hard and hard triplets to the network. So how do we choose them?

Offline and online triplet mining

Here, mining essentially means sampling.

Offline

Triplets are produced offline, at the beginning of each epoch for instance: we compute all the embeddings on the training set and then select only hard or semi-hard triplets. We can then train one epoch on these triplets.

Train for one epoch, compute all embeddings, then select the hard and semi-hard triplets for the next round of training. As training progresses, the number of hard and semi-hard triplets should gradually shrink.
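A brute-force sketch of the offline selection step (NumPy; my own illustration, not the repo's code): after computing all embeddings, keep only the triplets whose loss would currently be non-zero.

```python
import numpy as np

def select_offline_triplets(embeddings, labels, margin=0.2):
    """Return (a, p, n) index triplets that are hard or semi-hard under the
    current embeddings. O(B^3), so only usable for small illustration sets."""
    # pairwise squared distances, shape (B, B)
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    triplets = []
    B = len(labels)
    for a in range(B):
        for p in range(B):
            if p == a or labels[p] != labels[a]:
                continue  # (a, p) must be two distinct samples of the same class
            for n in range(B):
                if labels[n] == labels[a]:
                    continue  # n must come from a different class
                if dist[a, p] + margin > dist[a, n]:
                    triplets.append((a, p, n))  # not easy: keep it
    return triplets

# toy check: 6 samples, 3 identities, 4-d embeddings
rng = np.random.RandomState(0)
emb = rng.randn(6, 4)
lab = np.array([0, 0, 1, 1, 2, 2])
print(len(select_offline_triplets(emb, lab)))
```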

Online

The idea here is to compute useful triplets on the fly, for each batch of inputs. Given a batch of B examples (for instance B images of faces), we compute the B embeddings and we then can find a maximum of $B^3$ triplets. Of course, most of these triplets are not valid (i.e. they don’t have 2 positives and 1 negative).

Useful triplets are computed on the fly for each batch of inputs. Given a batch of B samples, at most $B^3$ triplets can be formed (all ordered index combinations), and of course most of them are invalid, i.e. they do not consist of two samples with the same label plus one with a different label.
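As a toy check of how few of those $B^3$ index combinations are actually valid (my own illustration, with made-up labels):

```python
import numpy as np
from itertools import product

labels = np.array([0, 0, 1, 1, 2, 2])   # toy batch: B = 6, three identities with two images each
B = len(labels)

valid = 0
for i, j, k in product(range(B), repeat=3):
    # valid triplet: i and j distinct with the same label, k with a different label
    if i != j and labels[i] == labels[j] and labels[i] != labels[k]:
        valid += 1

print(B ** 3, valid)   # 216 candidate index triples, only 24 of them are valid
```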


Strategies for online mining:

In online mining, we have computed a batch of $B$ embeddings from a batch of $B$ inputs. Now we want to generate triplets from these $B$ embeddings.

For any three indices $i, j, k \in [1, B]$: **if samples $i$ and $j$ are distinct and have the same label, and sample $k$ has a different label, then $(i, j, k)$ is a valid triplet.** What remains is a good strategy for choosing, among the valid triplets, the ones on which to compute the loss.

Suppose a batch of size $B = PK$ consists of $P$ people with $K$ images each ($K$ typically equals 4). There are two strategies:

  • **batch all.** Select all the valid triplets, and average the loss over the hard and semi-hard triplets.
    • Easy triplets are not counted: their loss is 0, and including them would drag the average down.
    • This produces $PK(K-1)(PK-K)$ triplets: $PK$ anchors, $K-1$ positives per anchor, and $PK-K$ negatives per anchor (a concrete count is worked out in the sketch after this list).
  • **batch hard.** For each anchor, select the hardest positive (biggest distance $d(a,p)$) and the hardest negative among the batch (i.e. take the largest intra-class distance $d(a,p)$ and the smallest inter-class distance $d(a,n)$ as the hardest samples).
    • This produces $PK$ triplets, one per anchor.
    • The selected triplets are the hardest among the batch.
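For concreteness, with a hypothetical batch of P = 32 identities and K = 4 images each, the two strategies yield very different triplet counts per batch:

```python
P, K = 32, 4                 # hypothetical batch composition
B = P * K                    # batch size = 128

batch_all = P * K * (K - 1) * (P * K - K)   # PK anchors x (K-1) positives x (PK-K) negatives
batch_hard = P * K                          # one hardest triplet per anchor

print(batch_all)    # 47616
print(batch_hard)   # 128
```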

According to the paper cited above, the batch hard strategy yields the best performance:

Additionally, the selected triplets can be considered moderate triplets, since they are the hardest within a small subset of the data, which is exactly what is best for learning with the triplet loss.

Implementation

  • Offline (inefficient) implementation

Compute $d(\cdot)$:

anchor_output = ...    # shape [None, 128]
positive_output = ...  # shape [None, 128]
negative_output = ...  # shape [None, 128]

d_pos = tf.reduce_sum(tf.square(anchor_output - positive_output), 1)
d_neg = tf.reduce_sum(tf.square(anchor_output - negative_output), 1)

Then the triplet loss:

loss = tf.maximum(0.0, margin + d_pos - d_neg)
loss = tf.reduce_mean(loss)

anchor_output, positive_output, negative_output are the network's outputs for the three groups of samples, i.e. $B$ anchors, $B$ positives, $B$ negatives.
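A small end-to-end sketch of this offline formulation in TensorFlow 1.x. The two-layer embedding network, input size and optimizer are assumptions made up for the example; only the distance/loss wiring follows the snippet above.

```python
import tensorflow as tf  # TensorFlow 1.x

def embed(x):
    """Hypothetical shared embedding network; any network producing 128-d embeddings works."""
    with tf.variable_scope("embed", reuse=tf.AUTO_REUSE):
        h = tf.layers.dense(x, 256, activation=tf.nn.relu, name="fc1")
        return tf.layers.dense(h, 128, name="fc2")

margin = 0.2
anchor_in   = tf.placeholder(tf.float32, [None, 784])  # hypothetical flattened images
positive_in = tf.placeholder(tf.float32, [None, 784])
negative_in = tf.placeholder(tf.float32, [None, 784])

anchor_output   = embed(anchor_in)    # shape [None, 128]
positive_output = embed(positive_in)  # same weights, via AUTO_REUSE
negative_output = embed(negative_in)

d_pos = tf.reduce_sum(tf.square(anchor_output - positive_output), 1)
d_neg = tf.reduce_sum(tf.square(anchor_output - negative_output), 1)
loss = tf.reduce_mean(tf.maximum(0.0, margin + d_pos - d_neg))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
```

The important point is that the three branches share one set of weights; the triplets themselves still have to be mined offline and fed through the three placeholders.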

🏵A better implementation with online triplet mining

Implementation:

omoindrot/tensorflow-triplet-loss

  • 1. Compute the distance matrix

Compute $d(A,P)$ and $d(A,N)$ with matrix operations (squared L2 distances). Definition from the source code:

def _pairwise_distances(embeddings, squared=False):
    """Compute the 2D matrix of distances between all the embeddings.

    Args:
        embeddings: tensor of shape (batch_size, embed_dim)
        squared: Boolean. If true, output is the pairwise squared euclidean distance matrix.
                 If false, output is the pairwise euclidean distance matrix.

    Returns:
        pairwise_distances: tensor of shape (batch_size, batch_size)
    """
    # Get the dot product between all embeddings
    # shape (batch_size, batch_size)
    dot_product = tf.matmul(embeddings, tf.transpose(embeddings))
    # here the shape goes from [N, M] to [N, N]

    # Get squared L2 norm for each embedding. We can just take the diagonal of `dot_product`.
    # This also provides more numerical stability (the diagonal of the result will be exactly 0).
    # shape (batch_size,)
    square_norm = tf.diag_part(dot_product)
    # the diagonal holds each embedding's squared L2 norm, i.e. <x, x> for x in X

    # Compute the pairwise distance matrix as we have:
    # ||a - b||^2 = ||a||^2  - 2 <a, b> + ||b||^2
    # shape (batch_size, batch_size)
    distances = tf.expand_dims(square_norm, 0) - 2.0 * dot_product + tf.expand_dims(square_norm, 1)
    # dot_product: [N, N]
    # square_norm: [N]; expanding dim 0 gives [1, N], expanding dim 1 gives [N, 1]; both broadcast to [N, N]

    # Because of computation errors, some distances might be negative so we put everything >= 0.0
    distances = tf.maximum(distances, 0.0)

    if not squared:
        # Because the gradient of sqrt is infinite when distances == 0.0 (ex: on the diagonal)
        # we need to add a small epsilon where distances == 0.0
        mask = tf.to_float(tf.equal(distances, 0.0))  # float mask of the same shape, 1.0 where the distance is exactly 0.0
        distances = distances + mask * 1e-16

        distances = tf.sqrt(distances)

        # Correct the epsilon added: set the distances on the mask to be exactly 0.0
        distances = distances * (1.0 - mask)

    return distances

Finally, the function returns a matrix that has zeros on the diagonal and is symmetric about the diagonal.
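As a quick sanity check (my own snippet, not part of the repo), the matrix version can be compared against a naive NumPy computation; it assumes the `_pairwise_distances` function above has been defined:

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x

emb_np = np.random.RandomState(0).randn(5, 3).astype(np.float32)

with tf.Graph().as_default(), tf.Session() as sess:
    dist_tf = sess.run(_pairwise_distances(tf.constant(emb_np), squared=True))

# naive reference computation, one pair at a time
dist_np = np.array([[np.sum((a - b) ** 2) for b in emb_np] for a in emb_np])
print(np.allclose(dist_tf, dist_np, atol=1e-5))   # expected: True
```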


Batch All strategy

In this strategy, we want to compute the triplet loss on almost all triplets. In the TensorFlow graph, we want to create a 3D tensor of shape $(B, B, B)$, where the element at index $(i, j, k)$ contains the loss for triplet $(i, j, k)$.

We then get a 3D mask of the valid triplets with the function `_get_triplet_mask`. Here, `mask[i, j, k]` is true if $(i, j, k)$ is a valid triplet.

Finally, we set to 0 the loss of the invalid triplets and take the average over the positive triplets.
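`_get_triplet_mask` itself is not shown in the excerpt; below is a sketch of what it might look like, reconstructed directly from the validity conditions above ($i \ne j$, $i \ne k$, $j \ne k$, label(i) == label(j), label(i) != label(k)). The repo's own version may differ in details.

```python
def _get_triplet_mask(labels):
    """Return a 3D boolean mask where mask[a, p, n] is True iff (a, p, n) is a valid triplet."""
    # Check that a, p and n are distinct indices
    indices_equal = tf.cast(tf.eye(tf.shape(labels)[0]), tf.bool)
    indices_not_equal = tf.logical_not(indices_equal)
    i_not_equal_j = tf.expand_dims(indices_not_equal, 2)
    i_not_equal_k = tf.expand_dims(indices_not_equal, 1)
    j_not_equal_k = tf.expand_dims(indices_not_equal, 0)
    distinct_indices = tf.logical_and(tf.logical_and(i_not_equal_j, i_not_equal_k), j_not_equal_k)

    # Check that labels[a] == labels[p] and labels[a] != labels[n]
    label_equal = tf.equal(tf.expand_dims(labels, 0), tf.expand_dims(labels, 1))
    i_equal_j = tf.expand_dims(label_equal, 2)
    i_equal_k = tf.expand_dims(label_equal, 1)
    valid_labels = tf.logical_and(i_equal_j, tf.logical_not(i_equal_k))

    return tf.logical_and(distinct_indices, valid_labels)
```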

def batch_all_triplet_loss(labels, embeddings, margin, squared=False):
    """Build the triplet loss over a batch of embeddings.

    We generate all the valid triplets and average the loss over the positive ones.

    Args:
        labels: labels of the batch, of size (batch_size,)
        embeddings: tensor of shape (batch_size, embed_dim)
        margin: margin for triplet loss
        squared: Boolean. If true, output is the pairwise squared euclidean distance matrix.
                 If false, output is the pairwise euclidean distance matrix.

    Returns:
        triplet_loss: scalar tensor containing the triplet loss
    """
    # Get the pairwise distance matrix
    pairwise_dist = _pairwise_distances(embeddings, squared=squared)  # [N, N]

    # Important: every sample in the batch serves as an anchor
    # Compute a 3D tensor of size (batch_size, batch_size, batch_size)
    # triplet_loss[i, j, k] will contain the triplet loss of anchor=i, positive=j, negative=k
    # Uses broadcasting where the 1st argument has shape (batch_size, batch_size, 1)
    # and the 2nd (batch_size, 1, batch_size)
    anchor_positive_dist = tf.expand_dims(pairwise_dist, 2)  # [N, N, 1]
    anchor_negative_dist = tf.expand_dims(pairwise_dist, 1)  # [N, 1, N]
    triplet_loss = anchor_positive_dist - anchor_negative_dist + margin  # [N, N, N]

    # Put to zero the invalid triplets
    # (where label(a) != label(p) or label(n) == label(a) or a == p)
    mask = _get_triplet_mask(labels)  # mask out invalid triplets; at most PK(K-1)(PK-K) entries stay non-zero
    mask = tf.to_float(mask)
    triplet_loss = tf.multiply(mask, triplet_loss)

    # Remove negative losses (i.e. the easy triplets)
    triplet_loss = tf.maximum(triplet_loss, 0.0)

    # Count number of positive triplets (where triplet_loss > 0)
    valid_triplets = tf.to_float(tf.greater(triplet_loss, 1e-16))
    num_positive_triplets = tf.reduce_sum(valid_triplets)
    num_valid_triplets = tf.reduce_sum(mask)
    fraction_positive_triplets = num_positive_triplets / (num_valid_triplets + 1e-16)

    # Get final mean triplet loss over the positive valid triplets
    triplet_loss = tf.reduce_sum(triplet_loss) / (num_positive_triplets + 1e-16)

    return triplet_loss, fraction_positive_triplets

Batch hard strategy

In this strategy, we want to find the hardest positive and negative for each anchor.

hardest positive

maximum distance of valid pairs (a,p)

Method: compute the pairwise distance matrix, use a mask to zero out all entries that are not valid (a, p) pairs, and then take the maximum distance in each row.

hardest negative

minimum distance of valid pairs (a, n)

Method: invalid entries cannot simply be set to 0 (they would then become the row minimum), so the row-wise maximum distance is added to every invalid entry before taking the minimum distance in each row.
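The two mask helpers used below, `_get_anchor_positive_triplet_mask` and `_get_anchor_negative_triplet_mask`, are not shown in the excerpt either. Here is a sketch of what they might look like, following the conditions stated in the code comments (valid positive: a != p and same label; valid negative: different labels); the repo's versions may differ slightly.

```python
def _get_anchor_positive_triplet_mask(labels):
    """2D boolean mask: mask[a, p] is True iff a != p and labels[a] == labels[p]."""
    indices_not_equal = tf.logical_not(tf.cast(tf.eye(tf.shape(labels)[0]), tf.bool))
    labels_equal = tf.equal(tf.expand_dims(labels, 0), tf.expand_dims(labels, 1))
    return tf.logical_and(indices_not_equal, labels_equal)


def _get_anchor_negative_triplet_mask(labels):
    """2D boolean mask: mask[a, n] is True iff labels[a] != labels[n]."""
    labels_equal = tf.equal(tf.expand_dims(labels, 0), tf.expand_dims(labels, 1))
    return tf.logical_not(labels_equal)
```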

def batch_hard_triplet_loss(labels, embeddings, margin, squared=False):
    """Build the triplet loss over a batch of embeddings.

    For each anchor, we get the hardest positive and hardest negative to form a triplet.

    Args:
        labels: labels of the batch, of size (batch_size,)
        embeddings: tensor of shape (batch_size, embed_dim)
        margin: margin for triplet loss
        squared: Boolean. If true, output is the pairwise squared euclidean distance matrix.
                 If false, output is the pairwise euclidean distance matrix.

    Returns:
        triplet_loss: scalar tensor containing the triplet loss
    """
    # Get the pairwise distance matrix
    pairwise_dist = _pairwise_distances(embeddings, squared=squared)

    # For each anchor, get the hardest positive
    # First, we need to get a mask for every valid positive (they should have same label)
    mask_anchor_positive = _get_anchor_positive_triplet_mask(labels)
    mask_anchor_positive = tf.to_float(mask_anchor_positive)

    # We put to 0 any element where (a, p) is not valid (valid if a != p and label(a) == label(p))
    anchor_positive_dist = tf.multiply(mask_anchor_positive, pairwise_dist)

    # shape (batch_size, 1)
    hardest_positive_dist = tf.reduce_max(anchor_positive_dist, axis=1, keepdims=True)

    # For each anchor, get the hardest negative
    # First, we need to get a mask for every valid negative (they should have different labels)
    mask_anchor_negative = _get_anchor_negative_triplet_mask(labels)
    mask_anchor_negative = tf.to_float(mask_anchor_negative)

    # We add the maximum value in each row to the invalid negatives (label(a) == label(n))
    max_anchor_negative_dist = tf.reduce_max(pairwise_dist, axis=1, keepdims=True)
    anchor_negative_dist = pairwise_dist + max_anchor_negative_dist * (1.0 - mask_anchor_negative)

    # shape (batch_size, 1)
    hardest_negative_dist = tf.reduce_min(anchor_negative_dist, axis=1, keepdims=True)

    # Combine biggest d(a, p) and smallest d(a, n) into final triplet loss
    triplet_loss = tf.maximum(hardest_positive_dist - hardest_negative_dist + margin, 0.0)

    # Get final mean triplet loss
    triplet_loss = tf.reduce_mean(triplet_loss)

    return triplet_loss

The final step is to combine these into the triplet loss:

triplet_loss = tf.maximum(hardest_positive_dist - hardest_negative_dist + margin, 0.0)
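Finally, a minimal sketch of how the online batch-hard loss might be wired into a TF1 training step. The embedding network, input shape and hyperparameters here are assumptions for illustration; the only part taken from above is the call to `batch_hard_triplet_loss` (which in turn needs `_pairwise_distances` and the mask helpers).

```python
import tensorflow as tf  # TensorFlow 1.x

images = tf.placeholder(tf.float32, [None, 784])  # hypothetical flattened input images
labels = tf.placeholder(tf.int64, [None])         # integer identity labels

# Hypothetical embedding network; anything producing (batch_size, embed_dim) works.
with tf.variable_scope("embedding"):
    hidden = tf.layers.dense(images, 256, activation=tf.nn.relu)
    embeddings = tf.layers.dense(hidden, 64)

# Online mining: the whole labelled batch goes in, triplets are formed inside the loss.
loss = batch_hard_triplet_loss(labels, embeddings, margin=0.5, squared=False)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Each batch should contain P identities with K images each (e.g. P=32, K=4):
    # sess.run([train_op, loss], feed_dict={images: batch_x, labels: batch_y})
```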
