Face recognition problems commonly fall into two categories:
- Face Verification - “is this the claimed person?”. For example, at some airports, you can pass through customs by letting a system scan your passport and then verifying that you (the person carrying the passport) are the correct person. A mobile phone that unlocks using your face is also using face verification. This is a 1:1 matching problem.
- Face Recognition - “who is this person?”. For example, the video lecture showed a face recognition video of Baidu employees entering the office without needing to otherwise identify themselves. This is a 1:K matching problem.
FaceNet learns a neural network that encodes a face image into a vector of 128 numbers. By comparing two such vectors, you can then determine if two pictures are of the same person.
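As a rough illustration of the comparison step, the sketch below measures the Euclidean (L2) distance between two 128-dimensional vectors. The random vectors are stand-ins for real encodings, and the names `encoding_distance`, `enc_a`, `enc_b`, and `enc_c` are hypothetical, not part of FaceNet.

```python
import numpy as np

def encoding_distance(enc_a, enc_b):
    """Euclidean (L2) distance between two encoding vectors."""
    return float(np.linalg.norm(np.asarray(enc_a) - np.asarray(enc_b)))

rng = np.random.default_rng(0)
enc_a = rng.normal(size=128)                  # stand-in for the encoding of one image
enc_b = enc_a + 0.01 * rng.normal(size=128)   # slightly perturbed copy ("same person")
enc_c = rng.normal(size=128)                  # unrelated vector ("different person")

d_same = encoding_distance(enc_a, enc_b)
d_diff = encoding_distance(enc_a, enc_c)
print(d_same < d_diff)  # True: the perturbed copy is much closer than the unrelated vector
```

A small distance suggests the two pictures show the same person; where to draw the line is a threshold that has to be chosen separately.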
You will:
- Implement the triplet loss function
- Use a pretrained model to map face images into 128-dimensional encodings
- Use these encodings to perform face verification and face recognition
1 - Encoding face images into a 128-dimensional vector
The key things you need to know are:
- This network uses 96x96 RGB images as its input. Specifically, it takes a face image (or batch of m face images) as a tensor of shape (m, n_C, n_H, n_W) = (m, 3, 96, 96)
- It outputs a matrix of shape (m, 128) that encodes each input face image into a 128-dimensional vector
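The input shape above is channels-first, whereas many image libraries return arrays in (height, width, channels) order. The following is a minimal sketch of the reshaping, assuming the image has already been loaded as a (96, 96, 3) array. The helper name `to_model_input` is hypothetical, and any pixel scaling the pretrained model expects is not covered here.

```python
import numpy as np

def to_model_input(image_hwc):
    """Convert one (96, 96, 3) RGB image into a (1, 3, 96, 96) batch."""
    image_hwc = np.asarray(image_hwc, dtype=np.float32)
    if image_hwc.shape != (96, 96, 3):
        raise ValueError(f"expected shape (96, 96, 3), got {image_hwc.shape}")
    image_chw = np.transpose(image_hwc, (2, 0, 1))  # move channels to the front
    return image_chw[np.newaxis, ...]               # add the batch dimension, m = 1

dummy_image = np.zeros((96, 96, 3), dtype=np.uint8)  # placeholder, not a real face
batch = to_model_input(dummy_image)
print(batch.shape)  # (1, 3, 96, 96)
```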
So, an encoding is a good one if:
- The encodings of two images of the same person are quite similar to each other
- The encodings of two images of different persons are very different
FRmodel = faceRecoModel(input_shape=(3, 96, 96))
FRmodel.compile(optimizer='adam', loss=triplet_loss, metrics=['accuracy'])
1.2 - The Triplet Loss
For an image x, we denote its encoding f(x), where f is the function computed by the neural network.
Training will use triplets of images (A, P, N):
- A is an “Anchor” image: a picture of a person.
- P is a “Positive” image: a picture of the same person as the Anchor image.
- N is a “Negative” image: a picture of a different person than the Anchor image.
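The text here does not spell out the loss itself. A common formulation, from the FaceNet paper, asks the anchor-positive distance to be smaller than the anchor-negative distance by at least a margin α, and sums max(||f(A) - f(P)||² - ||f(A) - f(N)||² + α, 0) over all triplets. The NumPy sketch below computes that quantity on precomputed encodings. The name `triplet_loss_np` is hypothetical and is not the `triplet_loss` passed to `FRmodel.compile`, and the default margin `alpha=0.2` is an assumed value.

```python
import numpy as np

def triplet_loss_np(anchor, positive, negative, alpha=0.2):
    """Sum of hinge-style triplet losses over a batch of encodings.

    Each argument has shape (m, d): one encoding per row.
    alpha is the margin (0.2 here is an assumption, not a prescribed value).
    """
    anchor, positive, negative = (
        np.asarray(x, dtype=np.float64) for x in (anchor, positive, negative)
    )
    pos_dist = np.sum((anchor - positive) ** 2, axis=-1)  # ||f(A) - f(P)||^2
    neg_dist = np.sum((anchor - negative) ** 2, axis=-1)  # ||f(A) - f(N)||^2
    basic_loss = pos_dist - neg_dist + alpha
    return float(np.sum(np.maximum(basic_loss, 0.0)))     # clip at zero, then sum

m, d = 2, 128
anchor = np.zeros((m, d))
easy = triplet_loss_np(anchor, np.zeros((m, d)), np.ones((m, d)))  # positives close, negatives far
hard = triplet_loss_np(anchor, np.ones((m, d)), np.zeros((m, d)))  # positives far, negatives close
print(easy, hard)
```

Triplets that already satisfy the margin contribute nothing, so only the violating triplets drive training.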
What you should remember:
- Face verification solves an easier 1:1 matching problem; face recognition addresses a harder 1:K matching problem.
- The triplet loss is an effective loss function for training a neural network to learn an encoding of a face image.
- The same encoding can be used for verification and recognition. Measuring distances between two images’ encodings allows you to determine whether they are pictures of the same person.
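The difference between the two tasks can be sketched with a small in-memory database of encodings. Everything below is illustrative: the names `verify` and `recognize`, the people in `database`, the random stand-in encodings, and the distance threshold of 0.7 are all assumptions rather than values taken from the text.

```python
import numpy as np

def verify(encoding, claimed_name, database, threshold=0.7):
    """1:1 check: is `encoding` close enough to the stored encoding of `claimed_name`?"""
    dist = float(np.linalg.norm(encoding - database[claimed_name]))
    return dist < threshold, dist

def recognize(encoding, database, threshold=0.7):
    """1:K search: return the closest name in the database, or None if none is close enough."""
    best_name, best_dist = None, float("inf")
    for name, stored in database.items():
        dist = float(np.linalg.norm(encoding - stored))
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist >= threshold:
        return None, best_dist
    return best_name, best_dist

rng = np.random.default_rng(1)
database = {name: rng.normal(size=128) for name in ("alice", "bob", "carol")}
probe = database["bob"] + 0.01 * rng.normal(size=128)  # noisy copy of bob's encoding

print(verify(probe, "bob", database)[0])    # True
print(verify(probe, "alice", database)[0])  # False
print(recognize(probe, database)[0])        # bob
```

Verification compares against a single stored encoding, while recognition has to search all K entries, which is one reason the 1:K problem is harder.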