Course 4 - Convolutional Neural Networks - Week 4 Assignment (Face Recognition)

0- Background

FaceNet uses a neural network to learn a representation of a face image as a 128-dimensional vector. By comparing the similarity of two such vectors, we can determine whether they show the same person.
This post uses the triplet loss function together with a pre-trained model that encodes images as vectors; on top of this, we then build face verification (a 1:1 problem) and face recognition (a 1:N problem).
The dependencies for this post are as follows:

from keras.models import Sequential
from keras.layers import Conv2D, ZeroPadding2D, Activation, Input, concatenate
from keras.models import Model
from keras.layers.normalization import BatchNormalization
from keras.layers.pooling import MaxPooling2D, AveragePooling2D
from keras.layers.merge import Concatenate
from keras.layers.core import Lambda, Flatten, Dense
from keras.initializers import glorot_uniform
from keras.engine.topology import Layer
from keras import backend as K
K.set_image_data_format('channels_first')
import cv2
import os
import numpy as np
from numpy import genfromtxt
import pandas as pd
import tensorflow as tf
from fr_utils import *
from inception_blocks import *

%matplotlib inline
%load_ext autoreload
%autoreload 2

np.set_printoptions(threshold=np.nan)

1- Naive Face Verification

The simplest way to compare two face images is a direct pixel-by-pixel comparison:
[Figure: pixel-by-pixel comparison of two raw face images]
This approach is very sensitive to lighting and head pose, so the results are clearly unsatisfactory. Instead, we generally encode each image with a function f(img) and then compare the similarity of the two encodings. Because the comparison happens only after higher-level image features have been extracted, it is much more robust.
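
For illustration, here is a minimal numpy sketch of this naive pixel-level comparison (the image paths and the helper are hypothetical, not part of the assignment):

import cv2
import numpy as np

def naive_distance(path_a, path_b):
    # Load both images and compare them pixel by pixel with an L2 norm.
    # Assumes both images exist and have identical dimensions.
    img_a = cv2.imread(path_a).astype(np.float64)
    img_b = cv2.imread(path_b).astype(np.float64)
    return np.linalg.norm(img_a - img_b)

# Hypothetical usage: a small distance would suggest the same person, but a
# change in lighting or pose alone is enough to make this distance large.
# naive_distance("images/face_a.jpg", "images/face_b.jpg")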

2- Encoding an image as a 128-dimensional vector

2-1 Encoding with a convolutional neural network

Training the FaceNet model from scratch takes a long time, so we directly load weights that others have already trained. The network architecture (an Inception model) is specified in detail in inception_blocks.py and will not be expanded on here.

The network takes 96x96 RGB images as input, so an input tensor has shape (m, n_C, n_H, n_W) = (m, 3, 96, 96), where m is the batch size. Note: we set the ConvNet activations to "channels first", i.e. the channel dimension comes first; the alternative is "channels last", where the channel dimension comes last ((m, n_H, n_W, n_C)). The network's output has shape (m, 128), i.e. each example is encoded as a 128-dimensional vector.
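
As a sketch of what the "channels first" layout implies for a single image, the following shows one plausible way to build a (1, 3, 96, 96) input batch (the exact preprocessing used by the assignment's helper functions may differ):

import cv2
import numpy as np

img = cv2.imread("images/younes.jpg")       # (96, 96, 3) in BGR, assuming a 96x96 image
img = img[..., ::-1]                        # BGR -> RGB
img = np.transpose(img, (2, 0, 1)) / 255.0  # (3, 96, 96): channels first, scaled to [0, 1]
x = np.expand_dims(img, axis=0)             # (1, 3, 96, 96): a batch with m = 1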

Creating the model:

FRmodel = faceRecoModel(input_shape=(3, 96, 96))
print("Total Params:", FRmodel.count_params())

The output is as follows:

Total Params: 3743280

The last layer of the ConvNet is a fully connected layer with 128 neurons, which ensures that the final output is 128-dimensional. After that, the 128-dimensional encodings of two images can be compared:
[Figure: distance between the 128-dimensional encodings of two images]
When the distance is below some threshold, the two images can be treated as showing the same person.
At the same time, we want images of different people to have a large distance. This motivates the triplet loss: make the distance between encodings of the same person as small as possible, and the distance between encodings of different people as large as possible.
[Figure: desired distances for same-person and different-person pairs]
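
In code, this comparison is just a thresholded L2 distance between the two encodings. A minimal sketch (the threshold 0.7 is the one used later in this post):

import numpy as np

def is_same_person(enc_1, enc_2, threshold=0.7):
    # enc_1, enc_2: the 128-dimensional encodings f(img1) and f(img2)
    dist = np.linalg.norm(enc_1 - enc_2)
    return dist < threshold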

2-2 The Triplet Loss

For an image x, its encoding f(x) is computed as follows:
[Figure: an image x passed through the network to produce its encoding f(x)]
Training uses triplets of images (A, P, N):

  • A is an “Anchor” image–a picture of a person.
  • P is a “Positive” image–a picture of the same person as the Anchor image.
  • N is a “Negative” image–a picture of a different person than the Anchor image.

These triplets are drawn from the training set; (A^(i), P^(i), N^(i)) denotes the i-th training example.
We want image A^(i) to be closer to the positive image P^(i) than to the negative image N^(i) by a margin of at least α:

$$\|f(A^{(i)}) - f(P^{(i)})\|_2^2 + \alpha < \|f(A^{(i)}) - f(N^{(i)})\|_2^2$$

We obtain the final parameters by minimizing the "triplet cost":

$$\mathcal{J} = \sum_{i=1}^{m} \Big[ \underbrace{\|f(A^{(i)}) - f(P^{(i)})\|_2^2}_{(1)} - \underbrace{\|f(A^{(i)}) - f(N^{(i)})\|_2^2}_{(2)} + \alpha \Big]_+ \tag{3}$$

Here "$[z]_+$" is equivalent to $\max(z, 0)$.
We want to make term (1) as small as possible and term (2) as large as possible.
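
To make the two terms concrete, here is a toy numpy version of formula (3), with made-up 3-dimensional encodings standing in for the real 128-dimensional ones:

import numpy as np

alpha = 0.2
# Two made-up triplets (one per row), 3 dimensions instead of 128.
A = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # anchors
P = np.array([[0.9, 0.1, 0.0], [0.0, 0.8, 0.1]])  # positives
N = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])  # negatives

pos_dist = np.sum(np.square(A - P), axis=-1)  # term (1), one value per triplet
neg_dist = np.sum(np.square(A - N), axis=-1)  # term (2), one value per triplet
J = np.sum(np.maximum(pos_dist - neg_dist + alpha, 0.0))  # [z]_+ per triplet, then summed
# Here both triplets already satisfy the margin, so J is 0.0.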

The distances between images of three people are shown below:
[Figure: pairwise encoding distances between images of three people]
The concrete implementation is as follows:

# GRADED FUNCTION: triplet_loss

def triplet_loss(y_true, y_pred, alpha = 0.2):
    """
    Implementation of the triplet loss as defined by formula (3)

    Arguments:
    y_true -- true labels, required when you define a loss in Keras, you don't need it in this function.
    y_pred -- python list containing three objects:
            anchor -- the encodings for the anchor images, of shape (None, 128)
            positive -- the encodings for the positive images, of shape (None, 128)
            negative -- the encodings for the negative images, of shape (None, 128)

    Returns:
    loss -- real number, value of the loss
    """

    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]

    ### START CODE HERE ### (≈ 4 lines)
    # Step 1: Compute the (encoding) distance between the anchor and the positive
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), axis=-1)
    # Step 2: Compute the (encoding) distance between the anchor and the negative
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), axis=-1)
    # Step 3: subtract the two previous distances and add alpha.
    basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)
    # Step 4: Take the maximum of basic_loss and 0.0. Sum over the training examples.
    loss = tf.reduce_sum(tf.maximum(basic_loss, 0.0))
    ### END CODE HERE ###

    return loss

Testing the code:

with tf.Session() as test:
    tf.set_random_seed(1)
    y_true = (None, None, None)
    y_pred = (tf.random_normal([3, 128], mean=6, stddev=0.1, seed = 1),
              tf.random_normal([3, 128], mean=1, stddev=1, seed = 1),
              tf.random_normal([3, 128], mean=3, stddev=4, seed = 1))
    loss = triplet_loss(y_true, y_pred)

    print("loss = " + str(loss.eval()))

The output is as follows:

loss = 528.143

3- Loading the pre-trained model

We load the model by loading its weight parameters:

FRmodel.compile(optimizer = 'adam', loss = triplet_loss, metrics = ['accuracy'])
load_weights_from_FaceNet(FRmodel)
# the weight files are located in the weights folder
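
As a quick sanity check that the weights loaded correctly, you can encode one of the images used below and confirm the output dimension (a minimal sketch; the encoding values themselves depend on the pre-trained weights):

enc = img_to_encoding("images/younes.jpg", FRmodel)
print(enc.shape)  # expect (1, 128), matching the (m, 128) output described above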

4- Using the model

4-1 Face Verification

Each person presents an ID card, and the face verification system then checks whether the ID and the face belong to the same person.
We first need to process every image already in our database once (img_to_encoding(image_path, model)) so that each existing image is encoded as a vector. Note: img_to_encoding is a function defined in fr_utils.py; it is essentially a forward pass of the model.
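
For intuition, here is a hedged sketch of what such a helper might look like, reusing the channels-first preprocessing shown earlier (the actual implementation in fr_utils.py may differ in details such as color handling or rounding):

import cv2
import numpy as np

def img_to_encoding_sketch(image_path, model):
    # Load the image (OpenCV returns BGR), convert to RGB,
    # rearrange to channels first, and scale to [0, 1].
    img = cv2.imread(image_path, 1)[..., ::-1]
    img = np.transpose(img, (2, 0, 1)) / 255.0
    # Forward pass on a batch of one image -> a (1, 128) encoding.
    return model.predict_on_batch(np.array([img]))
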
We encode the images we have collected as follows, as a dictionary whose keys are names and whose values are the encodings of the corresponding images. Here the name plays the role of the ID card.

database = {}
database["danielle"] = img_to_encoding("images/danielle.png", FRmodel)
database["younes"] = img_to_encoding("images/younes.jpg", FRmodel)
database["tian"] = img_to_encoding("images/tian.jpg", FRmodel)
database["andrew"] = img_to_encoding("images/andrew.jpg", FRmodel)
database["kian"] = img_to_encoding("images/kian.jpg", FRmodel)
database["dan"] = img_to_encoding("images/dan.jpg", FRmodel)
database["sebastiano"] = img_to_encoding("images/sebastiano.jpg", FRmodel)
database["bertrand"] = img_to_encoding("images/bertrand.jpg", FRmodel)
database["kevin"] = img_to_encoding("images/kevin.jpg", FRmodel)
database["felix"] = img_to_encoding("images/felix.jpg", FRmodel)
database["benoit"] = img_to_encoding("images/benoit.jpg", FRmodel)
database["arnaud"] = img_to_encoding("images/arnaud.jpg", FRmodel)

When a person arrives at the face verification system, they enter their name (the ID card) and the camera captures their face, from which a 128-dimensional feature vector is computed. The system uses the name to look up the stored vector in the database and compares it against the newly computed vector; if the distance is below some threshold, the two are considered the same person.
The concrete implementation is as follows:

# GRADED FUNCTION: verify

def verify(image_path, identity, database, model):
    """
    Function that verifies if the person on the "image_path" image is "identity".

    Arguments:
    image_path -- path to an image
    identity -- string, name of the person you'd like to verify the identity. Has to be a resident of the Happy house.
    database -- python dictionary mapping names of allowed people's names (strings) to their encodings (vectors).
    model -- your Inception model instance in Keras

    Returns:
    dist -- distance between the image_path and the image of "identity" in the database.
    door_open -- True, if the door should open. False otherwise.
    """

    ### START CODE HERE ###

    # Step 1: Compute the encoding for the image. Use img_to_encoding() see example above. (≈ 1 line)
    encoding = img_to_encoding(image_path, model)

    # Step 2: Compute distance with identity's image (≈ 1 line)
    dist = np.linalg.norm(encoding - database[identity])

    # Step 3: Open the door if dist < 0.7, else don't open (≈ 3 lines)
    if dist < 0.7:
        print("It's " + str(identity) + ", welcome home!")
        door_open = True
    else:
        print("It's not " + str(identity) + ", please go away")
        door_open = False

    ### END CODE HERE ###

    return dist, door_open

Testing:

verify("images/camera_0.jpg", "younes", database, FRmodel)

The test result is as follows:

It's younes, welcome home!

(0.65938449, True)

Now a case where the two images do not show the same person:

verify("images/camera_2.jpg", "kian", database, FRmodel)

The test result is as follows:

It's not kian, please go away

(0.86225712, False)

4-2 Face Recognition

The scenario above requires something like an ID card to be presented in advance. But if the ID card is lost or forgotten, the system is stuck. Face recognition solves this problem: internally, the system iterates over the stored encodings (one per "ID card"), computes the L2 distance between each stored encoding and the encoding of the current input image, takes the minimum, and checks whether that minimum is within the threshold.

The concrete implementation:

# GRADED FUNCTION: who_is_it

def who_is_it(image_path, database, model):
    """
    Implements face recognition for the happy house by finding who is the person on the image_path image.

    Arguments:
    image_path -- path to an image
    database -- database containing image encodings along with the name of the person on the image
    model -- your Inception model instance in Keras

    Returns:
    min_dist -- the minimum distance between image_path encoding and the encodings from the database
    identity -- string, the name prediction for the person on image_path
    """

    ### START CODE HERE ### 

    ## Step 1: Compute the target "encoding" for the image. Use img_to_encoding() see example above. ## (≈ 1 line)
    encoding = img_to_encoding(image_path, model)

    ## Step 2: Find the closest encoding ##

    # Initialize "min_dist" to a large value, say 100 (≈1 line)
    min_dist = 100

    # Loop over the database dictionary's names and encodings.
    for name in database:
    # for (name, db_enc) in database.items() would also work

        # Compute L2 distance between the target "encoding" and the current "emb" from the database. (≈ 1 line)
        dist = np.linalg.norm(encoding - database[name])

        # If this distance is less than the min_dist, then set min_dist to dist, and identity to name. (≈ 3 lines)
        if dist < min_dist:
            min_dist = dist
            identity = name

    ### END CODE HERE ###

    if min_dist > 0.7:
        print("Not in the database.")
    else:
        print ("it's " + str(identity) + ", the distance is " + str(min_dist))

    return min_dist, identity

Test:

who_is_it("images/camera_0.jpg", database, FRmodel)

The test result:

it's younes, the distance is 0.659384

(0.65938449, 'younes')