Understanding face alignment in MTCNN

liguiyuan112

Article link: https://blog.csdn.net/u012505617/article/details/89813007

Concepts

The face recognition pipeline: face detection → face alignment → feature extraction → similarity comparison.

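Face alignment is a key step. Depending on the application, it can directly affect the recognition result, because the features extracted from an aligned face differ from those extracted from the same face without alignment.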

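Face alignment (rectification): when a detected face is tilted and its keypoints are not in the standard positions, an alignment step moves them there.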

The figure below compares faces before and after alignment; the aligned results look quite good.

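So how is face alignment carried out? The general idea: fix a set of standard face keypoint positions as the template (`src` in the code), then estimate a similarity transform, made up of rotation, translation and uniform scaling, that maps the detected keypoints (`dst`) onto the template. This gives a 3×3 homogeneous transformation matrix; its top two rows form the 2×3 matrix M, which is passed to an affine warp to produce the aligned face image.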
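To make "rotation, translation, scaling → M" concrete, here is a minimal numpy-only sketch. The helper names `similarity_matrix` and `apply_affine` are hypothetical, chosen for illustration. It builds the 3×3 homogeneous matrix from a scale `s`, a rotation angle `theta` in radians and a translation `(tx, ty)`, then applies its top two rows (the 2×3 part) to a few points.

```python
import numpy as np


def similarity_matrix(s, theta, tx, ty):
    """3x3 homogeneous similarity matrix: scale s, rotation theta (radians), translation (tx, ty)."""
    c, si = np.cos(theta), np.sin(theta)
    return np.array([
        [s * c, -s * si, tx],
        [s * si,  s * c, ty],
        [0.0,     0.0,  1.0],
    ])


def apply_affine(M, points):
    """Apply a 2x3 matrix (or the top of a 3x3 one) to an (N, 2) array of (x, y) points."""
    M = np.asarray(M, dtype=np.float64)[:2, :]          # keep only the 2x3 affine part
    pts = np.asarray(points, dtype=np.float64)
    homog = np.hstack([pts, np.ones((len(pts), 1))])    # (x, y) -> (x, y, 1)
    return homog @ M.T                                   # (N, 3) @ (3, 2) -> (N, 2)


# Rotate by 90 degrees, double the size, then shift by (10, 5)
T = similarity_matrix(2.0, np.pi / 2, 10.0, 5.0)
M = T[:2, :]                                             # the 2x3 matrix an affine warp takes
print(apply_affine(M, [[1.0, 0.0], [0.0, 1.0]]))         # approximately [[10, 7], [8, 5]]
```

Keeping only the top two rows is enough because the last row of a 2D affine matrix is always `[0, 0, 1]`. This is also why the code takes `tform.params[0:2, :]` before calling `cv2.warpAffine`.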

Code

The code below makes this more concrete:

```python
import cv2
import numpy as np
from skimage import transform as trans


def read_image(img_path, **kwargs):
    # Minimal stand-in (assumption): the original read_image helper is not shown here.
    # Loads a 3-channel image with OpenCV (BGR channel order).
    img = cv2.imread(img_path)
    if img is None:
        raise IOError("cannot read image: %s" % img_path)
    return img


def preprocess(img, bbox=None, landmark=None, **kwargs):
    if isinstance(img, str):                       # img is a file path: load it first
        img = read_image(img, **kwargs)
    M = None
    image_size = []
    str_image_size = kwargs.get('image_size', '')
    if len(str_image_size) > 0:                    # parse "112,112" (or "112") into [height, width]
        image_size = [int(x) for x in str_image_size.split(',')]
        if len(image_size) == 1:
            image_size = [image_size[0], image_size[0]]
        assert len(image_size) == 2
        assert image_size[0] == 112                # height must be 112
        assert image_size[1] in (112, 96)          # width must be 112 or 96
    if landmark is not None:                       # landmarks given: estimate M from them
        assert len(image_size) == 2
        # Fixed reference positions of the 5 landmarks in a 96x112 (width x height) template:
        # left eye, right eye, nose tip, left mouth corner, right mouth corner
        src = np.array([
            [30.2946, 51.6963],
            [65.5318, 51.5014],
            [48.0252, 71.7366],
            [33.5493, 92.3655],
            [62.7299, 92.2041]], dtype=np.float32)
        if image_size[1] == 112:                   # 112-wide output: shift the template right by
            src[:, 0] += 8.0                       # (112 - 96) / 2 = 8.0 so it stays centered
        dst = np.asarray(landmark, dtype=np.float32)  # detected landmarks, shape (5, 2) of (x, y)

        tform = trans.SimilarityTransform()
        tform.estimate(dst, src)                   # similarity transform mapping detected -> template
        M = tform.params[0:2, :]                   # top 2 rows of the 3x3 homogeneous matrix (2x3)

    if M is None:                                  # no landmarks: fall back to a bbox crop
        if bbox is None:                           # no bbox either: center crop
            det = np.zeros(4, dtype=np.int32)
            det[0] = int(img.shape[1] * 0.0625)
            det[1] = int(img.shape[0] * 0.0625)
            det[2] = img.shape[1] - det[0]
            det[3] = img.shape[0] - det[1]
        else:                                      # use the given bbox [x1, y1, x2, y2]
            det = bbox
        margin = kwargs.get('margin', 44)          # extra padding (pixels) added around the box
        bb = np.zeros(4, dtype=np.int32)           # padded box [x1, y1, x2, y2], clipped to the image
        bb[0] = np.maximum(det[0] - margin / 2, 0)
        bb[1] = np.maximum(det[1] - margin / 2, 0)
        bb[2] = np.minimum(det[2] + margin / 2, img.shape[1])
        bb[3] = np.minimum(det[3] + margin / 2, img.shape[0])
        ret = img[bb[1]:bb[3], bb[0]:bb[2], :]     # crop the padded box
        if len(image_size) > 0:
            ret = cv2.resize(ret, (image_size[1], image_size[0]))  # cv2 takes (width, height)
        return ret
    else:                                          # align using the landmarks
        assert len(image_size) == 2
        # Affine warp to (width, height); pixels that fall outside the source are set to 0
        warped = cv2.warpAffine(img, M, (image_size[1], image_size[0]), borderValue=0.0)
        return warped
```
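`preprocess` expects `landmark` as a (5, 2) array of (x, y) points in the same order as the template (left eye, right eye, nose tip, left mouth corner, right mouth corner). MTCNN implementations commonly return the five points as ten numbers per face. The sketch below assumes the layout `[x1..x5, y1..y5]`, but this layout differs between implementations, so check your detector's output first. `mtcnn_points_to_landmark` is a hypothetical helper.

```python
import numpy as np


def mtcnn_points_to_landmark(points):
    """Convert one face's 10 landmark values into a (5, 2) array of (x, y).

    Assumption: the layout is [x1, x2, x3, x4, x5, y1, y2, y3, y4, y5].
    """
    points = np.asarray(points, dtype=np.float32).reshape(-1)
    if points.size != 10:
        raise ValueError("expected 10 values, got %d" % points.size)
    # (2, 5): row 0 holds the x values, row 1 the y values; transpose to (5, 2)
    return points.reshape(2, 5).T


landmark = mtcnn_points_to_landmark([38, 74, 56, 42, 70, 52, 51, 72, 92, 92])
print(landmark)  # row i is (x_i, y_i)
```

The result can then be passed as `preprocess(img, bbox, landmark, image_size='112,112')`.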

```python
# Excerpt of SimilarityTransform from an older scikit-image release, shown for reference.
import math

import numpy as np
from skimage.transform._geometric import EuclideanTransform, _umeyama


class SimilarityTransform(EuclideanTransform):
    """2D similarity transformation of the form:

        X = a0 * x - b0 * y + a1 =
          = s * x * cos(rotation) - s * y * sin(rotation) + a1

        Y = b0 * x + a0 * y + b1 =
          = s * x * sin(rotation) + s * y * cos(rotation) + b1

    where ``s`` is a scale factor and the homogeneous transformation matrix is::

        [[a0  -b0  a1]
         [b0   a0  b1]
         [0    0    1]]

    The similarity transformation extends the Euclidean transformation with a
    single scaling factor in addition to the rotation and translation
    parameters.

    Parameters
    ----------
    matrix : (3, 3) array, optional
        Homogeneous transformation matrix.
    scale : float, optional
        Scale factor.
    rotation : float, optional
        Rotation angle in counter-clockwise direction as radians.
    translation : (tx, ty) as array, list or tuple, optional
        x, y translation parameters.

    Attributes
    ----------
    params : (3, 3) array
        Homogeneous transformation matrix.

    """

    def __init__(self, matrix=None, scale=None, rotation=None,
                 translation=None):
        params = any(param is not None
                     for param in (scale, rotation, translation))

        if params and matrix is not None:
            raise ValueError("You cannot specify the transformation matrix and"
                             " the implicit parameters at the same time.")
        elif matrix is not None:
            if matrix.shape != (3, 3):
                raise ValueError("Invalid shape of transformation matrix.")
            self.params = matrix
        elif params:
            if scale is None:
                scale = 1
            if rotation is None:
                rotation = 0
            if translation is None:
                translation = (0, 0)

            self.params = np.array([
                [math.cos(rotation), - math.sin(rotation), 0],
                [math.sin(rotation),   math.cos(rotation), 0],
                [                 0,                    0, 1]
            ])
            self.params[0:2, 0:2] *= scale
            self.params[0:2, 2] = translation
        else:
            # default to an identity transform
            self.params = np.eye(3)

    def estimate(self, src, dst):
        """Estimate the transformation from a set of corresponding points.

        You can determine the over-, well- and under-determined parameters
        with the total least-squares method.

        Number of source and destination coordinates must match.

        Parameters
        ----------
        src : (N, 2) array
            Source coordinates.
        dst : (N, 2) array
            Destination coordinates.

        Returns
        -------
        success : bool
            True, if model estimation succeeds.

        """

        self.params = _umeyama(src, dst, True)

        return True

    @property
    def scale(self):
        if abs(math.cos(self.rotation)) < np.spacing(1):
            # sin(self.rotation) == 1
            scale = self.params[1, 0]
        else:
            scale = self.params[0, 0] / math.cos(self.rotation)
        return scale
```
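The `scale` property above divides by `cos(rotation)` and needs a special case when that cosine is close to zero. Because a0 = s·cos(rotation) and b0 = s·sin(rotation), the parameters can also be read straight off the matrix with `hypot` and `atan2`, which needs no special case. The following is a minimal sketch of that alternative, not scikit-image's code; `decompose_similarity` is a hypothetical name.

```python
import math

import numpy as np


def decompose_similarity(params):
    """Return (scale, rotation, (tx, ty)) from [[a0, -b0, a1], [b0, a0, b1], [0, 0, 1]]."""
    a0, b0 = params[0, 0], params[1, 0]
    scale = math.hypot(a0, b0)          # sqrt(a0^2 + b0^2) = s, since cos^2 + sin^2 = 1
    rotation = math.atan2(b0, a0)       # angle in radians, in (-pi, pi]
    translation = (params[0, 2], params[1, 2])
    return scale, rotation, translation


# A matrix with s = 1.5, rotation = 90 degrees (where cos(rotation) is ~0), translation = (3, -4)
s, r = 1.5, math.pi / 2
params = np.array([
    [s * math.cos(r), -s * math.sin(r), 3.0],
    [s * math.sin(r),  s * math.cos(r), -4.0],
    [0.0, 0.0, 1.0],
])
print(decompose_similarity(params))
```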

The 2D similarity transform has the form:

$\large X = a_0\cdot x - b_0\cdot y + a_1 = s\cdot x\cdot \cos(\mathrm{rotation}) - s\cdot y\cdot \sin(\mathrm{rotation}) + a_1$

$\large Y = b_0\cdot x + a_0\cdot y + b_1 = s\cdot x\cdot \sin(\mathrm{rotation}) + s\cdot y\cdot \cos(\mathrm{rotation}) + b_1$

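where s is the scale factor (so $a_0 = s\cdot\cos(\mathrm{rotation})$ and $b_0 = s\cdot\sin(\mathrm{rotation})$), and the homogeneous transformation matrix is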

$\large \begin{bmatrix} a_0 & -b_0 & a_1 \\ b_0 & a_0 & b_1 \\ 0 & 0 & 1 \end{bmatrix}$
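Because X and Y are linear in the four unknowns a0, b0, a1 and b1, each point pair contributes two linear equations, and with several pairs the parameters can be fitted by ordinary linear least squares. The sketch below does exactly that with numpy. `fit_similarity_lstsq` is a hypothetical name, and this is not necessarily how scikit-image solves the problem. The example moves the 96×112 template by a known similarity transform and recovers it.

```python
import numpy as np


def fit_similarity_lstsq(src, dst):
    """Least-squares fit of [[a0, -b0, a1], [b0, a0, b1], [0, 0, 1]] mapping src -> dst.

    src, dst: (N, 2) arrays of corresponding (x, y) points, N >= 2.
    """
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    x, y = src[:, 0], src[:, 1]
    ones, zeros = np.ones_like(x), np.zeros_like(x)
    # X = a0*x - b0*y + a1 and Y = b0*x + a0*y + b1, unknowns ordered (a0, b0, a1, b1)
    A = np.vstack([
        np.column_stack([x, -y, ones, zeros]),
        np.column_stack([y, x, zeros, ones]),
    ])
    b = np.concatenate([dst[:, 0], dst[:, 1]])
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    a0, b0, a1, b1 = sol
    return np.array([[a0, -b0, a1], [b0, a0, b1], [0.0, 0.0, 1.0]])


template = np.array([[30.2946, 51.6963], [65.5318, 51.5014], [48.0252, 71.7366],
                     [33.5493, 92.3655], [62.7299, 92.2041]])
true = np.array([[0.9, -0.2, 12.0], [0.2, 0.9, -7.0], [0.0, 0.0, 1.0]])
moved = template @ true[:2, :2].T + true[:2, 2]   # apply the known transform to each point
est = fit_similarity_lstsq(template, moved)
print(np.round(est, 4))
```

With the five landmarks used for alignment, this gives 10 equations for 4 unknowns, so the system is over-determined and the least-squares solution averages out small landmark errors.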

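Parameters:

matrix : (3, 3) array, optional; the homogeneous transformation matrix

scale : float, optional; the scale factor

rotation : float, optional; the counter-clockwise rotation angle in radians

translation : (tx, ty) as an array, list or tuple, optional; the x and y translation

Attribute:

params : (3, 3) array; the homogeneous transformation matrix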

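The similarity transform extends the Euclidean transform (rotation plus translation) with a single scale factor. Its parameters are estimated from a set of corresponding points with a total least-squares method, which handles over-determined, well-determined and under-determined cases; the number of source and destination points must match. Each point pair gives two equations for the four unknowns, so the five face landmarks give an over-determined system.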
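In the excerpt, `estimate` delegates to `_umeyama(src, dst, True)`, i.e. Umeyama's closed-form least-squares fit based on an SVD of the cross-covariance of the two centered point sets. The following is a simplified numpy sketch of that algorithm for 2D points. `umeyama_similarity` is a hypothetical name. Unlike scikit-image's version, it omits the handling of rank-deficient inputs, so it assumes the points are not all collinear.

```python
import numpy as np


def umeyama_similarity(src, dst):
    """3x3 similarity matrix minimizing sum ||s * R @ src_i + t - dst_i||^2 (Umeyama's method)."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    n, dim = src.shape
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - src_mean, dst - dst_mean     # center both point sets
    cov = dst_c.T @ src_c / n                         # cross-covariance (dim x dim)
    U, S, Vt = np.linalg.svd(cov)
    d = np.ones(dim)
    if np.linalg.det(cov) < 0:                        # flip one axis to avoid a reflection
        d[-1] = -1.0
    R = U @ np.diag(d) @ Vt                           # optimal rotation
    scale = (S @ d) / src_c.var(axis=0).sum()         # optimal uniform scale
    T = np.eye(dim + 1)
    T[:dim, :dim] = scale * R
    T[:dim, dim] = dst_mean - scale * R @ src_mean    # translation
    return T


template = np.array([[30.2946, 51.6963], [65.5318, 51.5014], [48.0252, 71.7366],
                     [33.5493, 92.3655], [62.7299, 92.2041]])
theta, s = 0.3, 1.2
R_true = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
moved = s * template @ R_true.T + np.array([5.0, -3.0])
print(np.round(umeyama_similarity(template, moved), 4))
```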
