Paper Reading: Generating Talking Face Landmarks from Speech

Latest recommended article published on 2023-04-06 13:43:39

live_for_myself


Category: Paper Reading. Tags: deep learning, pytorch

Article link: https://blog.csdn.net/landing_guy_/article/details/117456876


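Preface

Give civilization to the years, not years to civilization.

Main Text

Abstract

The paper is built around an LSTM network. Video frames are first transformed so that the face sits at a fixed position, and the landmarks are then mapped onto the mean face to remove identity information. The network takes the first and second temporal differences of the log-mel spectrogram as input and predicts the landmarks. The loss is MSE; whether it is also applied to the first and second temporal differences is something I am not sure about.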

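Method

The authors train on the GRID dataset, whose videos have a resolution of 720×576 and are sampled at 25 frames per second; the audio sampling rate is 44.1 kHz.
A 64-bin log-mel spectrogram is computed with a 40 ms Hanning window and no overlap, so that each audio frame lines up with one video frame. The first and second temporal differences of the log-mel spectrogram are then computed and used as the network input (a 128-dimensional feature sequence, since 64 + 64 = 128).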
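As a rough sketch of the feature layout described above, the snippet below assumes the 64-bin log-mel spectrogram has already been computed (one row per 40 ms frame) and only shows the framing arithmetic and the stacking of the two temporal differences. The function name `delta_features` and the use of plain frame-to-frame differences are my assumptions; the paper's code may estimate the differences differently.

```python
import numpy as np

SAMPLE_RATE = 44100   # Hz, from the text
WINDOW_SEC = 0.040    # 40 ms Hanning window, no overlap
N_MELS = 64           # log-mel bins

# With no overlap the hop equals the window, so audio frames match 25 fps video.
samples_per_frame = round(SAMPLE_RATE * WINDOW_SEC)
frames_per_second = SAMPLE_RATE / samples_per_frame


def delta_features(log_mel):
    """Stack first and second temporal differences of a (T, 64) log-mel array.

    Simple frame-to-frame differences are an assumption made for this sketch.
    The first row is zero because frame 0 has no predecessor.
    """
    d1 = np.diff(log_mel, n=1, axis=0, prepend=log_mel[:1])   # (T, 64)
    d2 = np.diff(d1, n=1, axis=0, prepend=d1[:1])             # (T, 64)
    return np.concatenate([d1, d2], axis=1)                   # (T, 128)


# Placeholder spectrogram for 3 seconds of audio (random values, not real data).
log_mel = np.random.default_rng(0).normal(size=(75, N_MELS))
features = delta_features(log_mel)
print(samples_per_frame, frames_per_second, features.shape)
```

The point of the arithmetic is that 44100 × 0.040 = 1764 samples per frame, which gives exactly 25 audio frames per second, one per video frame.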

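Face Landmark Alignment

The two outer eye corners in the first frame of each video are pinned to two fixed image coordinates, (180, 200) and (420, 200).
This is done with a 6-DOF affine transform, and the same transform is then applied to the landmarks of every frame in the video; the details need to be checked against the code.
The assumption is that the head does not move significantly during the video; otherwise a single transform cannot align the faces in different frames.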
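To make the alignment step concrete, here is a minimal sketch under a stated simplification: two point pairs only determine a 4-DOF similarity transform (rotation, uniform scale, translation), so that is what the code computes, not the 6-DOF affine transform mentioned in the paper. The function names and the sample eye-corner coordinates are hypothetical.

```python
LEFT_TARGET = complex(180, 200)    # fixed position of the first outer eye corner
RIGHT_TARGET = complex(420, 200)   # fixed position of the second outer eye corner


def similarity_from_eye_corners(eye_a, eye_b):
    """Return (a, b) such that z -> a*z + b maps the two corners to the targets.

    Points are treated as complex numbers x + iy: `a` encodes rotation and
    uniform scale, `b` encodes translation (4 degrees of freedom in total).
    """
    p, q = complex(*eye_a), complex(*eye_b)
    a = (RIGHT_TARGET - LEFT_TARGET) / (q - p)
    b = LEFT_TARGET - a * p
    return a, b


def apply_transform(points, a, b):
    """Apply z -> a*z + b to a list of (x, y) landmarks."""
    out = []
    for x, y in points:
        z = a * complex(x, y) + b
        out.append((z.real, z.imag))
    return out


# Hypothetical eye corners taken from the FIRST frame of a video.
a, b = similarity_from_eye_corners((100, 120), (220, 120))

# Every frame of that video is transformed with the same (a, b).
frame_landmarks = [(100, 120), (220, 120), (160, 180)]
aligned = apply_transform(frame_landmarks, a, b)
print(aligned)
```

Because (a, b) comes from the first frame only, any head motion later in the video is carried straight into the aligned landmarks, which is why the method assumes a nearly static head.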

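Removing Identity Information from Landmarks

After alignment, faces of different speakers have similar size and rough position, but their shapes and mouth positions still differ, so identity information should be removed from the landmarks before training the network.
The procedure is:

1. Average all aligned landmarks over the whole training set to obtain the mean face shape.
2. For each landmark sequence, compute the transform between the mean face shape and the first frame.
3. Compute the difference between the current frame and the first frame, then (as far as I can tell) multiply this difference by the transform from step 2.
4. Add the result to the mean shape to obtain identity-free landmarks (I did not fully understand this step).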
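One possible reading of the four steps above, written as a sketch rather than as the paper's actual procedure: the "transform" of step 2 is reduced here to per-axis scale factors (the ratio of the bounding-box extents of the mean shape and the first frame), and the speaker's motion relative to the first frame is replayed on the mean face. The function names and this simplification are my assumptions.

```python
import numpy as np


def mean_face(sequences):
    """Step 1: average every aligned frame of every training video.

    sequences: list of arrays shaped (T_i, K, 2). Returns a (K, 2) array.
    """
    return np.concatenate(sequences, axis=0).mean(axis=0)


def remove_identity(seq, mean_shape):
    """Steps 2-4 for one video; seq is (T, K, 2), mean_shape is (K, 2).

    Assumption: the transform between the first frame and the mean shape is
    approximated by per-axis scale factors only.
    """
    first = seq[0]
    # Step 2: how much larger or smaller the mean face is than this speaker's face.
    scale = (mean_shape.max(axis=0) - mean_shape.min(axis=0)) / (
        first.max(axis=0) - first.min(axis=0))
    # Step 3: displacement of every landmark relative to the first frame.
    motion = seq - first
    # Step 4: replay the rescaled motion on the mean face.
    return mean_shape + motion * scale
```

Under this reading, the first frame of every video maps exactly onto the mean face, and only the rescaled motion differs between speakers, which would explain how the shape identity disappears.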

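LSTM Network

Let's take a look at the network.

[Figure: network architecture]
The network has four LSTM layers. Its input is the first and second temporal differences of the log spectrogram for the current frame and the previous N frames. Its output is the x and y coordinates of the face landmarks for the current frame (if no delay is added) or for a previous frame (if a delay is added). The delay ranges from 1 frame (40 ms) to 5 frames (200 ms), since one second is split into 25 frames. The loss is MSE.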
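The effect of the context window and the delay is easiest to see in how input/target pairs are built. The sketch below only prepares the data and does not implement the network; `context` stands for the N previous frames (the note does not give N, so 3 is an arbitrary placeholder), and `make_training_pairs` is a hypothetical helper name.

```python
import numpy as np


def make_training_pairs(features, landmarks, context=3, delay=1):
    """Pair each audio window with the landmark frame `delay` steps earlier.

    features:  (T, 128) audio differences, one row per 40 ms frame.
    landmarks: (T, 2*K) flattened x, y coordinates.
    The input at step t covers frames t-context .. t (zero-padded at the
    start). The target is landmarks[t - delay], so the network has already
    heard `delay` frames of audio beyond the frame it has to predict.
    """
    T, dim = features.shape
    padded = np.concatenate([np.zeros((context, dim)), features], axis=0)
    inputs, targets = [], []
    for t in range(delay, T):
        inputs.append(padded[t:t + context + 1].reshape(-1))
        targets.append(landmarks[t - delay])
    return np.stack(inputs), np.stack(targets)
```

With `delay=0` the target is the current frame; with `delay=5` the prediction lags the audio by 5 × 40 ms = 200 ms.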

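Code Analysis

Some background needed by the code is collected in the appendix.

Appendix

1. Face detection with dlib

A small standalone example illustrates this.
As you can see, it is very simple: detect the faces in an image, then detect the landmark coordinates of each face, and that is all.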

import cv2
import matplotlib.pyplot as plt
import numpy as np
import dlib


image = cv2.imread('../0001.jpeg')  # path to an image that contains a face
if image is None:
    raise FileNotFoundError('could not read ../0001.jpeg')

detector = dlib.get_frontal_face_detector()

# The 68-point model file has to be downloaded separately.
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

# gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

faces = detector(image)  # pass 1 or 2 as a second argument to upsample the image
for face in faces:
    cv2.rectangle(image, (face.left(), face.top()), (face.right(), face.bottom()), (122, 122, 123), 3)
    shape = predictor(image, face)  # the 68 landmark points of this face
    print(shape.parts())

    # Collect the landmarks of this face only, so the list is reset per face.
    pos = [(pt.x, pt.y) for pt in shape.parts()]

    # Indices 48-67 are the mouth in the 68-point layout.
    for position in pos[48:68]:
        print(position)
        cv2.circle(image, position, 3, (123, 123, 0), -1)


plt.imshow(image[:,:,::-1])
plt.axis('off')
plt.show()

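2. Face processing

One piece of the code relies on this technique, so it is introduced first.
This section covers face morphing, i.e. transforming one photo into another.

[Figure: result of naively blending the two photos]
The underlying idea is naive: create the in-between image by blending the two images, as in the formula below:

$$M(x, y) = (1 - \alpha)\, I(x, y) + \alpha\, J(x, y)$$
When $\alpha$ is 0 the result is image $I$; when it is 1 the result is image $J$. The operation is applied per pixel. Of course, the result of doing this is poor, as the figure above shows.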

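The reason for this problem is easy to understand: corresponding pixels do not match. Suppose that for every pixel we could magically find its correspondence in the other image; then each pixel could be handled with the formula below:

$$x_m = (1 - \alpha)\, x_i + \alpha\, x_j, \qquad y_m = (1 - \alpha)\, y_i + \alpha\, y_j$$
$(x_i, y_i)$ is a pixel coordinate in image $I$, $(x_j, y_j)$ is the corresponding pixel coordinate in image $J$, and $(x_m, y_m)$ is the pixel position in the morphed image. For example, if both pixels belong to an eye that sits at different positions in the two images, $\alpha$ determines the eye's new position.

Next we determine the intensity, i.e. the color, of each pixel in the morphed image.

$$M(x_m, y_m) = (1 - \alpha)\, I(x_i, y_i) + \alpha\, J(x_j, y_j)$$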
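Both the position and the intensity of a morphed pixel are the same linear interpolation with weight $\alpha$. A tiny worked example, with a hypothetical helper `lerp` and made-up coordinates and colors:

```python
def lerp(p, q, alpha):
    """Blend two equal-length tuples element-wise: (1 - alpha) * p + alpha * q."""
    return tuple((1 - alpha) * a + alpha * b for a, b in zip(p, q))


alpha = 0.25

# Hypothetical position of the same eye corner in image I and in image J.
eye_in_i, eye_in_j = (100, 120), (140, 110)
eye_in_morph = lerp(eye_in_i, eye_in_j, alpha)      # (110.0, 117.5)

# Hypothetical BGR colors sampled at those two positions.
color_i, color_j = (200, 180, 160), (100, 120, 140)
color_in_morph = lerp(color_i, color_j, alpha)      # (175.0, 165.0, 155.0)

print(eye_in_morph, color_in_morph)
```

With $\alpha = 0.25$ the morphed eye stays closer to its position in image $I$, and its color is three quarters $I$ and one quarter $J$.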

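Of course, finding a correspondence for every single pixel is complicated and unnecessary. Instead, we can fix the positions of a few points and interpolate the remaining pixels. Let's see how this works in practice.
In short, the method triangulates the points and transforms the images triangle by triangle.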

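Face Morphing

Here I follow the original author's description; the source is linked here.
The author first detects 68 points with dlib, then adds 1 point on the ear on the person's right-hand side, 1 point on the neck, 2 points on the left and right shoulders, and 8 points around the image border, for 80 points in total (the more points, the better). See the figure below:
[Figure: the 80 annotated points]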

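Delaunay Triangulation

Delaunay triangulation is one way of triangulating a point set. So what is it?
In a Delaunay triangulation, the triangles are chosen so that no point lies inside the circumcircle of any triangle. In the figure below, C must lie outside the circumcircle of $\Delta ABD$.
An interesting property of Delaunay triangulation is that it dislikes "skinny" triangles (i.e. triangles with one large angle).

[Figure]

In the figure above, points B and D have moved, and to split $\angle BCD$ and keep it from getting too large, the triangulation changes.

[Figure]

The most obvious (but not the most efficient) method is to start from any triangulation and check whether the circumcircle of any triangle contains another point. If it does, flip the shared edge and continue until no circumcircle contains a point. To understand Delaunay triangulation, it helps to first know about the Voronoi diagram.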
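The check at the heart of the flipping method is the in-circle test: does a fourth point lie inside the circumcircle of a triangle? A minimal sketch using the standard determinant form; the function name is hypothetical, and the full flipping loop is not shown.

```python
def in_circumcircle(a, b, c, d):
    """Return True if d lies strictly inside the circumcircle of triangle abc.

    Uses the sign of the 3x3 in-circle determinant, corrected by the
    orientation of (a, b, c) so that the vertex order does not matter.
    """
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # Positive for counter-clockwise triangles, negative for clockwise ones.
    orient = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return det * orient > 0


triangle = ((0, 0), (1, 0), (0, 1))
print(in_circumcircle(*triangle, (0.5, 0.5)))   # True: the circumcenter itself
print(in_circumcircle(*triangle, (2, 2)))       # False: far outside
```

If the test returns True for the opposite vertex of a neighboring triangle, the shared edge violates the Delaunay condition and is the one to flip.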

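Voronoi Diagram

[Figure]

The cell boundaries resemble the perpendicular bisectors between pairs of points. If you connect the points that are adjacent in the Voronoi diagram, you get the Delaunay triangulation, as in the figure below.
"Adjacent" here means that the two cells share a border.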
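The link to perpendicular bisectors follows from the definition of a Voronoi cell: it is the set of locations that are closer to one site than to any other. A short sketch with a hypothetical helper `voronoi_cell` and made-up sites:

```python
import math


def voronoi_cell(point, sites):
    """Index of the site whose Voronoi cell contains `point` (its nearest site).

    Ties, which occur exactly on cell boundaries, go to the lower index.
    """
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))


sites = [(0, 0), (10, 0), (0, 10)]
print(voronoi_cell((1, 1), sites))   # 0
print(voronoi_cell((9, 1), sites))   # 1
print(voronoi_cell((1, 9), sites))   # 2
```

A point such as (5, 0) is equally far from sites 0 and 1, so it sits on the perpendicular bisector of the segment joining them, which is the border shared by the two cells.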

[Figure]

Time for the code!

Use subdiv.getTriangleList to get the list of Delaunay triangles

#!/usr/bin/python

import cv2
import numpy as np
import random

# Check if a point is inside a rectangle
def rect_contains(rect, point) :
    if point[0] < rect[0] :
        return False
    elif point[1] < rect[1] :
        return False
    elif point[0] > rect[2] :
        return False
    elif point[1] > rect[3] :
        return False
    return True

# Draw a point
def draw_point(img, p, color ) :
    cv2.circle( img, p, 2, color, cv2.FILLED, cv2.LINE_AA, 0 )

# Draw delaunay triangles
def draw_delaunay(img, subdiv, delaunay_color ) :

    triangleList = subdiv.getTriangleList()

    size = img.shape
    r = (0, 0, size[1], size[0])


    for t in triangleList :
        print(t)
        pt1 = (int(t[0]), int(t[1]))
        pt2 = (int(t[2]), int(t[3]))
        pt3 = (int(t[4]), int(t[5]))

        if rect_contains(r, pt1) and rect_contains(r, pt2) and rect_contains(r, pt3) :

            cv2.line(img, pt1, pt2, delaunay_color, 1, cv2.LINE_AA, 0)
            cv2.line(img, pt2, pt3, delaunay_color, 1, cv2.LINE_AA, 0)
            cv2.line(img, pt3, pt1, delaunay_color, 1, cv2.LINE_AA, 0)

# Draw voronoi diagram
def draw_voronoi(img, subdiv) :

    ( facets, centers) = subdiv.getVoronoiFacetList([])

    for i in range(0,len(facets)) :
        ifacet_arr = []
        for f in facets[i] :
            ifacet_arr.append(f)

        ifacet = np.array(ifacet_arr, np.int32)  # np.int was removed in NumPy 1.24
        color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))

        cv2.fillConvexPoly(img, ifacet, color, cv2.LINE_AA, 0)
        ifacets = np.array([ifacet])
        cv2.polylines(img, ifacets, True, (0, 0, 0), 1, cv2.LINE_AA, 0)
        cv2.circle(img, (int(centers[i][0]), int(centers[i][1])), 3, (0, 0, 0), cv2.FILLED, cv2.LINE_AA, 0)

if __name__ == '__main__':

    # Define window names
    win_delaunay = "Delaunay Triangulation"
    win_voronoi = "Voronoi Diagram"

    # Turn on animation while drawing triangles
    animate = True

    # Define colors for drawing.
    delaunay_color = (255,255,255)
    points_color = (0, 0, 255)

    # Read in the image.
    img = cv2.imread("ted.jpg")

    # Keep a copy around
    img_orig = img.copy()

    # Rectangle to be used with Subdiv2D
    size = img.shape
    print(size)
    rect = (0, 0, size[1], size[0])

    # Create an instance of Subdiv2D
    subdiv = cv2.Subdiv2D(rect)


    # Create an array of points.
    points = []

    # Read in the points from a text file
    with open("ted_points.txt") as file :
        for line in file :
            x, y = line.split()
            points.append((int(x), int(y)))


    # Insert points into subdiv
    for p in points :
        subdiv.insert(p)


        # Show animation
        if animate :
            img_copy = img_orig.copy()
            # Draw delaunay triangles
            draw_delaunay( img_copy, subdiv, (255, 255, 255) )
            cv2.imshow(win_delaunay, img_copy)
            cv2.waitKey(100)

    # Draw delaunay triangles
    draw_delaunay( img, subdiv, (255, 255, 255) )


    # Draw points
    for p in points :
        draw_point(img, p, (0,0,255))

    # Allocate space for Voronoi Diagram
    img_voronoi = np.zeros(img.shape, dtype = img.dtype)

    # Draw Voronoi diagram
    draw_voronoi(img_voronoi,subdiv)

    # Show results
    cv2.imshow(win_delaunay,img)
    cv2.imshow(win_voronoi,img_voronoi)
    cv2.waitKey(0)

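Back to the main topic

Our goal is to morph images. Now that we have corresponding triangular regions, we can perform the transformation.

First determine the positions of the feature points in the morphed image, using the formula:

$$x_m = (1 - \alpha)\, x_i + \alpha\, x_j, \qquad y_m = (1 - \alpha)\, y_i + \alpha\, y_j$$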

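Computing the affine transforms

We now have the 80 points of image 1 and of image 2, plus the 80 points of the morphed image.
Use OpenCV's getAffineTransform function to compute, for each triangle, the affine transform from image 1 to the morphed image, and likewise from image 2 to the morphed image. The 80 points correspond to 149 triangles.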
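Three point correspondences determine a 2×3 affine matrix uniquely, which is why one triangle pair is enough per transform. The sketch below solves for that matrix directly with NumPy to show what is being computed; `affine_from_triangles` is a hypothetical name and is not the OpenCV function used in the code further down.

```python
import numpy as np


def affine_from_triangles(src_tri, dst_tri):
    """Return the 2x3 matrix M with M @ [x, y, 1] = [x', y'] for all 3 vertex pairs.

    src_tri and dst_tri are sequences of three (x, y) points; the source
    triangle must not be degenerate (its vertices must not be collinear).
    """
    src = np.hstack([np.asarray(src_tri, dtype=np.float64), np.ones((3, 1))])  # (3, 3)
    dst = np.asarray(dst_tri, dtype=np.float64)                                # (3, 2)
    # Solve src @ M.T = dst for the six unknown entries of M.
    return np.linalg.solve(src, dst).T                                         # (2, 3)


M = affine_from_triangles([(0, 0), (1, 0), (0, 1)],
                          [(10, 20), (12, 20), (10, 23)])
print(M)
```

In this example the triangle is stretched by 2 along x and 3 along y and then shifted by (10, 20), so M comes out as [[2, 0, 10], [0, 3, 20]].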

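Warp triangles

The previous step gave us the affine transform matrices. Now all pixels inside a triangle of image 1 can be warped to the corresponding triangle of the morphed image; repeating this for every triangle yields the warped version of image 1, and the same is done for image 2. The OpenCV function for this is warpAffine. However, warpAffine takes an image, not a triangle, so the trick is to compute a bounding box for each triangle, warp all pixels inside the bounding box with warpAffine, and then mask out the pixels that fall outside the triangle. The triangular mask is created with fillConvexPoly. Make sure to call warpAffine with borderMode set to BORDER_REFLECT_101, which hides the seams fairly well.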
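The masking trick can be illustrated without OpenCV. The sketch below builds a triangle mask inside a bounding-box patch with edge functions and uses it to paste a warped patch into the output; both function names are hypothetical, and the mask is a hard 0/1 mask, whereas the OpenCV code below uses an anti-aliased one.

```python
import numpy as np


def triangle_mask(tri, width, height):
    """Boolean (height, width) mask that is True for pixels inside `tri`.

    `tri` holds three (x, y) vertices in patch coordinates. A pixel is inside
    when all three edge functions have the same sign (or are zero).
    """
    (x0, y0), (x1, y1), (x2, y2) = tri
    ys, xs = np.mgrid[0:height, 0:width]

    def edge(ax, ay, bx, by):
        return (bx - ax) * (ys - ay) - (by - ay) * (xs - ax)

    e0, e1, e2 = edge(x0, y0, x1, y1), edge(x1, y1, x2, y2), edge(x2, y2, x0, y0)
    return ((e0 >= 0) & (e1 >= 0) & (e2 >= 0)) | ((e0 <= 0) & (e1 <= 0) & (e2 <= 0))


def paste_triangle(dst_patch, warped_patch, mask):
    """Keep dst_patch outside the triangle and warped_patch inside it."""
    m = mask[..., None].astype(dst_patch.dtype)
    return dst_patch * (1 - m) + warped_patch * m


mask = triangle_mask([(0, 0), (4, 0), (0, 4)], width=5, height=5)
patch = paste_triangle(np.zeros((5, 5, 3), dtype=np.float32),
                       np.ones((5, 5, 3), dtype=np.float32), mask)
print(mask.astype(int))
```

Only the pixels inside the triangle are overwritten, so neighboring triangles that share the same bounding-box area do not clobber each other.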

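Alpha blend warped images
The previous step produced warped versions of image 1 and image 2. These two images can be alpha blended using the formula, which gives the final morphed image.

Here is the code!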

#!/usr/bin/env python

import numpy as np
import cv2
import sys


# Read points from text file
def readPoints(path):
    # Create an array of points.
    points = []
    # Read points
    with open(path) as file:
        for line in file:
            x, y = line.split()
            points.append((int(x), int(y)))

    return points


# Apply affine transform calculated using srcTri and dstTri to src and
# output an image of size.
def applyAffineTransform(src, srcTri, dstTri, size):
    # Given a pair of triangles, find the affine transform.
    warpMat = cv2.getAffineTransform(np.float32(srcTri), np.float32(dstTri))

    # Apply the Affine Transform just found to the src image
    dst = cv2.warpAffine(src, warpMat, (size[0], size[1]), None, flags=cv2.INTER_LINEAR,
                         borderMode=cv2.BORDER_REFLECT_101)

    return dst


# Warps and alpha blends triangular regions from img1 and img2 to img
def morphTriangle(img1, img2, img, t1, t2, t, alpha):
    # Find bounding rectangle for each triangle
    r1 = cv2.boundingRect(np.float32([t1]))
    r2 = cv2.boundingRect(np.float32([t2]))
    r = cv2.boundingRect(np.float32([t]))

    # Offset points by left top corner of the respective rectangles
    t1Rect = []
    t2Rect = []
    tRect = []

    for i in range(0, 3):
        tRect.append(((t[i][0] - r[0]), (t[i][1] - r[1])))
        t1Rect.append(((t1[i][0] - r1[0]), (t1[i][1] - r1[1])))
        t2Rect.append(((t2[i][0] - r2[0]), (t2[i][1] - r2[1])))

    # Get mask by filling triangle
    mask = np.zeros((r[3], r[2], 3), dtype=np.float32)
    cv2.fillConvexPoly(mask, np.int32(tRect), (1.0, 1.0, 1.0), 16, 0)

    # Apply warpImage to small rectangular patches
    img1Rect = img1[r1[1]:r1[1] + r1[3], r1[0]:r1[0] + r1[2]]
    img2Rect = img2[r2[1]:r2[1] + r2[3], r2[0]:r2[0] + r2[2]]

    size = (r[2], r[3])
    warpImage1 = applyAffineTransform(img1Rect, t1Rect, tRect, size)
    warpImage2 = applyAffineTransform(img2Rect, t2Rect, tRect, size)

    # Alpha blend rectangular patches
    imgRect = (1.0 - alpha) * warpImage1 + alpha * warpImage2

    # Copy triangular region of the rectangular patch to the output image
    img[r[1]:r[1] + r[3], r[0]:r[0] + r[2]] = img[r[1]:r[1] + r[3], r[0]:r[0] + r[2]] * (1 - mask) + imgRect * mask


if __name__ == '__main__':

    filename1 = 'hillary.jpg'
    filename2 = 'ted.jpg'
    alpha = 0.5
    # Read images
    img1 = cv2.imread(filename1)
    img2 = cv2.imread(filename2)

    # Convert Mat to float data type
    img1 = np.float32(img1)
    img2 = np.float32(img2)

    # Read array of corresponding points (points1 must belong to img1, points2 to img2)
    points1 = readPoints('hillary.txt')
    points2 = readPoints('ted_points.txt')
    points = []

    # Compute weighted average point coordinates
    for i in range(0, len(points1)):
        x = (1 - alpha) * points1[i][0] + alpha * points2[i][0]
        y = (1 - alpha) * points1[i][1] + alpha * points2[i][1]
        points.append((x, y))

    # Allocate space for final output
    imgMorph = np.zeros(img1.shape, dtype=img1.dtype)

    # Read triangles from tri.txt
    with open("tri.txt") as file:
        for line in file:
            x, y, z = line.split()

            x = int(x)
            y = int(y)
            z = int(z)

            t1 = [points1[x], points1[y], points1[z]]
            t2 = [points2[x], points2[y], points2[z]]
            t = [points[x], points[y], points[z]]

            # Morph one triangle at a time.
            morphTriangle(img1, img2, imgMorph, t1, t2, t, alpha)

    # Display Result
    cv2.imshow("Morphed Face", np.uint8(imgMorph))
    cv2.waitKey(0)