【人脸识别】FaceNet（一）---Triplet Loss

最新推荐文章于 2024-04-27 09:57:21 发布

1273545169

最新推荐文章于 2024-04-27 09:57:21 发布

阅读量8.1k

点赞数 12

文章标签： facenet

本文链接：https://blog.csdn.net/baidu_27643275/article/details/79222206

版权

本文分以下三个部分来讲解： siamese network、Triplet loss、FaceNet中的Triplet Selection

（一）siamese network孪生网络

对于两个不同的输入，运行相同的卷积神经网络，然后比较它们，这一般叫做 Siamese network。

这里写图片描述
若一张图片经过一般的CNN网络（包括CONV层、POOL层、FC层），最终得到全连接层FC，该FC层可以看成是原始图片的编码，表征了原始图片的关键特征。

图片之间的相似度可以用图片的空间距离来表征，将两张图片的距离定义为这两张图片的编码之差的范数
$d(x_{1},x_{2} )= ||f(x_{1})-f(x_{2})||^{2}$

**如何训练？**利用梯度下降算法，不断更新网络参数，使得属于同一人的图片之间 $d$ 值很小，而不同人的图片之间 $d$ 值很大。

（二）Triplet loss（三元组损失函数）

Triplet Loss需要每个样本包含三张图片：
目标（Anchor）、正例（Positive）、反例（Negative）
这里写图片描述

loss函数推导过程：

$d(A,P)=||f(A)-f(P)||^{2}$ 、 $d(A,N)=||f(A)-f(N)||^{2}$

$d (A, P) 和 d (A, N)$ 之间存在如下关系：

$d(A,P) <d(A,N) \Rightarrow d(A,P)-d(A,N)<0 $

存在一个参数 $\alpha，\alpha>0$

$\Rightarrow d(A,P)-d(A,N) \leq -\alpha \Rightarrow d(A,P)-d(A,N) +\alpha \leq 0$

Triplet Loss定义如下： $l(A,P,N)=max(||f(A)-f(P)||^{2}-||f(A)-f(N)||^{2}+\alpha，0)$

为了更好地训练网络，我们需要选择那些训练有“难度”的三元组，也就是选择的三元组满足：
$d(A,P)\simeq d(A,N)$
人为选择A与P相差较大，A与N相差较小。

最后模型效果：
如果两个图片是同一个人，那么它们的d就会很小，如果两个图片不是同一个人，它们的d会很大。

这里写图片描述

（三）谷歌人脸识别系统FaceNet
FaceNet采用的方法是通过卷积神经网络学习将图像映射到欧几里得空间。空间距离直接和图片相似度相关：同一个人的不同图像在空间距离很小，不同人的图像在空间中有较大的距离。

这里写图片描述

Triplet Selection

在线生成triplets，即在每个mini-batch中进行筛选positive/negative样本。每个mini-batch中，我们对单个个体选择40张图片作为正样本，随机筛选其它图片作为负样本。

样本选取的理想情况：

模型想要达到的效果：

a minimal number of exemplars of any one identity is present in each mini-batch

单个类别的全部样本必须在一个mini-batch中

In our experiments we sample the training data such that around 40 faces are selected per identity per minibatch. Additionally, randomly sampled negative faces are added to each mini-batch.

一个mini-batch中，每个类别有40张图片，先组成A-P对，然后再随机抽取满足条件的N组成（A,P,N)三元组。

Instead of picking the hardest positive, we use all anchor-positive pairs in a mini-batch while still selecting the hard negatives. We found in practice that the all anchor-positive method was more stable and converged slightly faster（收敛更快） at the beginning of training

具体过程：
1、调用sample_people()方法从训练数据集中抽取一组图片
2、计算得到这组图片在当时的网络模型中的embedding，保存在emb_array当中。
3、调用select_triplets()得到（A,P,N）三元组

每个batch有多类图，每类图有多张

（A,N)需满足的条件：

all_neg = np.where(neg_dists_sqr-pos_dist_sqr<alpha)[0] # VGG Face

实现如下：

1、从数据集中选择图片，组成一个batch

#从数据集中进行抽样图片，参数为训练数据集，每一个batch抽样多少人，每个人抽样多少张
def sample_people(dataset, people_per_batch, images_per_person):
    #总共应该抽样多少张 默认：people_per_batch：45  images_per_person：40
    nrof_images = people_per_batch * images_per_person
  
    #数据集中一共有多少人的图像
    nrof_classes = len(dataset)
    #每个人的索引
    class_indices = np.arange(nrof_classes)
    #随机打乱一下
    np.random.shuffle(class_indices)
    
    i = 0
    #保存抽样出来的图像的路径
    image_paths = []
    #抽样的样本是属于哪一个人的，作为label
    num_per_class = []
    sampled_class_indices = []
    # Sample images from these classes until we have enough
    # 不断抽样直到达到指定数量
    while len(image_paths)<nrof_images:
        #从第i个人开始抽样
        class_index = class_indices[i]
        #第i个人有多少张图片
        nrof_images_in_class = len(dataset[class_index])
        #这些图片的索引
        image_indices = np.arange(nrof_images_in_class)
        np.random.shuffle(image_indices)
        #从第i个人中抽样的图片数量
        nrof_images_from_class = min(nrof_images_in_class, images_per_person, nrof_images-len(image_paths))
        idx = image_indices[0:nrof_images_from_class]
        #抽样出来的人的路径
        image_paths_for_class = [dataset[class_index].image_paths[j] for j in idx]
        #图片的label
        sampled_class_indices += [class_index]*nrof_images_from_class
        image_paths += image_paths_for_class
        #第i个人抽样了多少张
        num_per_class.append(nrof_images_from_class)
        i+=1
  
    return image_paths, num_per_class

2、三元组的选取

def select_triplets(embeddings, nrof_images_per_class, image_paths, people_per_batch, alpha):
    """ Select the triplets for training
    """
    trip_idx = 0
    #某个人的图片的embedding在emb_arr中的开始的索引
    emb_start_idx = 0
    num_trips = 0
    triplets = []
    
    # VGG Face: Choosing good triplets is crucial and should strike a balance between
    #  selecting informative (i.e. challenging) examples and swamping training with examples that
    #  are too hard. This is achieve by extending each pair (a, p) to a triplet (a, p, n) by sampling
    #  the image n at random, but only between the ones that violate the triplet loss margin. The
    #  latter is a form of hard-negative mining, but it is not as aggressive (and much cheaper) than
    #  choosing the maximally violating example, as often done in structured output learning.
    #遍历每一个人
    for i in xrange(people_per_batch):
        #这个人有多少张图片
        nrof_images = int(nrof_images_per_class[i])
        #遍历第i个人的所有图片
        for j in xrange(1,nrof_images):
            #第j张图的embedding在emb_arr 中的位置
            a_idx = emb_start_idx + j - 1
            #第j张图跟其他所有图片的欧氏距离
            neg_dists_sqr = np.sum(np.square(embeddings[a_idx] - embeddings), 1)
            #遍历每一对可能的(anchor,postive)图片，记为(a,p)吧
            for pair in xrange(j, nrof_images): # For every possible positive pair.
                #第p张图片在emb_arr中的位置
                p_idx = emb_start_idx + pair
                #(a,p)之前的欧式距离
                pos_dist_sqr = np.sum(np.square(embeddings[a_idx]-embeddings[p_idx]))
                #同一个人的图片不作为negative，所以将距离设为无穷大
                neg_dists_sqr[emb_start_idx:emb_start_idx+nrof_images] = np.NaN
                #all_neg = np.where(np.logical_and(neg_dists_sqr-pos_dist_sqr<alpha, pos_dist_sqr<neg_dists_sqr))[0]  # FaceNet selection
                #其他人的图片中有哪些图片与a之间的距离-p与a之间的距离小于alpha的
                all_neg = np.where(neg_dists_sqr-pos_dist_sqr<alpha)[0] # VGG Face selecction
                #所有可能的negative
                nrof_random_negs = all_neg.shape[0]
                #如果有满足条件的negative
                if nrof_random_negs>0:
                    #从中随机选取一个作为n
                    rnd_idx = np.random.randint(nrof_random_negs)
                    n_idx = all_neg[rnd_idx]
                    # 选到(a,p,n)作为三元组
                    triplets.append((image_paths[a_idx], image_paths[p_idx], image_paths[n_idx]))
                    #print('Triplet %d: (%d, %d, %d), pos_dist=%2.6f, neg_dist=%2.6f (%d, %d, %d, %d, %d)' % 
                    #    (trip_idx, a_idx, p_idx, n_idx, pos_dist_sqr, neg_dists_sqr[n_idx], nrof_random_negs, rnd_idx, i, j, emb_start_idx))
                    trip_idx += 1

                num_trips += 1

        emb_start_idx += nrof_images

    np.random.shuffle(triplets)
    return triplets, num_trips, len(triplets)

1273545169

关注

12
点赞
踩
48

收藏

觉得还不错? 一键收藏
0
评论
【人脸识别】FaceNet（一）---Triplet Loss

样本选取的理想情况：模型想要达到的效果： a minimal number of exemplars of any one identity is present in each mini-batch单个个体的全部样本必须存在于每个mini-batch中 In our experiments we sample the training data su...
复制链接

扫一扫