Notes on mesh_renderer, a differentiable renderer for TensorFlow

yuanl15

Category: Computer Vision. Tags: deep learning, tensorflow

Article link: https://blog.csdn.net/yuanl15/article/details/108726220

The mesh_renderer differentiable renderer

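mesh_renderer is a differentiable 3D mesh renderer for the TensorFlow framework. Its source code is available among Google's open-source repositories on GitHub (https://github.com/google/tf_mesh_renderer); note that it is not an official Google product. The core is implemented in C++, with Python interfaces in mesh_renderer.py and rasterize_triangles.py. Importing the TensorFlow ops provided by these two files adds differentiable rendering operations to a model.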

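The rasterizer takes the 3D vertex positions and, for each triangle, the ids of its three vertices. For every pixel of the rendered image it outputs the id of the triangle covering that pixel and the barycentric weights of that triangle's three vertices. The renderer also provides the derivatives of the per-pixel barycentric weights with respect to the vertex positions.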

Installation

Note that this renderer supports only Linux.

Build from source with Bazel: https://github.com/google/tf_mesh_renderer

Or install directly with pip:

pip install mesh_renderer

Python API

camera_utils.py

def look_at(eye, center, world_up):
  """Computes camera viewing matrices.

  Functionality mimes gluLookAt (third_party/GL/glu/include/GLU/glu.h).

  Args:
    eye: 2-D float32 tensor with shape [batch_size, 3] containing the XYZ world
        space position of the camera.
        (Input: the camera's 3-D position in world coordinates.)
    center: 2-D float32 tensor with shape [batch_size, 3] containing a position
        along the center of the camera's gaze.
        (Input: a point straight ahead of the camera along its line of sight.)
    world_up: 2-D float32 tensor with shape [batch_size, 3] specifying the
        world's up direction; the output camera will have no tilt with respect
        to this direction.
        (Input: a vector along the world's up direction.)

  Returns:
    A [batch_size, 4, 4] float tensor containing a right-handed camera
    extrinsics matrix that maps points from world space to points in eye space.
    (Output: the matrix that transforms world coordinates to camera coordinates.)
  """
  # The construction of R and T is omitted in this excerpt; their contents are
  # given by the matrices shown after the function.
  camera_matrices = tf.matmul(R, T)
  return camera_matrices

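The look_at function mimics gluLookAt from OpenGL. It computes the matrix that transforms world coordinates into camera (eye) coordinates, expressed in homogeneous form: the returned camera_matrices is the rotation matrix R multiplied by the translation matrix T.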

$R=\begin{bmatrix} ToSide_x & ToSide_y & ToSide_z& 0\\ CamUp_x& CamUp_y & CamUp_z & 0\\ -forward_x& -forward_y & -forward_z & 0\\ 0& 0 & 0 & 1 \end{bmatrix}$

$T=\begin{bmatrix} 1 & 0 & 0 & -eye_x\\ 0 & 1 & 0 & -eye_y\\ 0 & 0 & 1 & -eye_z\\ 0 & 0 & 0 & 1 \end{bmatrix}$

camera_matrices=R*T
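
As a rough illustration of how R and T combine, here is a minimal, unbatched sketch in plain Python. It is not the library's TensorFlow code: the helper names (normalize, cross, matmul, look_at_single) are made up for this example, and the construction of ToSide and CamUp follows the usual gluLookAt recipe, which is assumed to match what mesh_renderer does.

```python
import math


def normalize(v):
    """Returns v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def look_at_single(eye, center, world_up):
    """Builds R * T for one camera (hypothetical unbatched helper)."""
    forward = normalize([c - e for c, e in zip(center, eye)])
    to_side = normalize(cross(forward, world_up))
    cam_up = cross(to_side, forward)
    # Rows of R are the camera axes; the camera looks down its own -z axis.
    rotation = [to_side + [0.0],
                cam_up + [0.0],
                [-f for f in forward] + [0.0],
                [0.0, 0.0, 0.0, 1.0]]
    # T moves the eye position to the origin.
    translation = [[1.0, 0.0, 0.0, -eye[0]],
                   [0.0, 1.0, 0.0, -eye[1]],
                   [0.0, 0.0, 1.0, -eye[2]],
                   [0.0, 0.0, 0.0, 1.0]]
    return matmul(rotation, translation)


def apply(matrix, point):
    """Applies a 4x4 matrix to an xyz point extended with w = 1."""
    p = list(point) + [1.0]
    return [sum(row[k] * p[k] for k in range(4)) for row in matrix]
```

For a camera at (3, 0, 0) looking at the origin, apply(look_at_single([3, 0, 0], [0, 0, 0], [0, 1, 0]), [0, 0, 0]) places the world origin three units in front of the camera, at z = -3 in eye space.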

def perspective(aspect_ratio, fov_y, near_clip, far_clip):
  """Computes perspective transformation matrices.
  Functionality mimes gluPerspective (third_party/GL/glu/include/GLU/glu.h).
  (Computes the perspective projection matrix.)
  Args:
    aspect_ratio: float value specifying the image aspect ratio (width/height).
    fov_y: 1-D float32 Tensor with shape [batch_size] specifying output vertical
        field of views in degrees.
    near_clip: 1-D float32 Tensor with shape [batch_size] specifying near
        clipping plane distance.
    far_clip: 1-D float32 Tensor with shape [batch_size] specifying far clipping
        plane distance.
  Returns:
    A [batch_size, 4, 4] float tensor that maps from right-handed points in eye
    space to left-handed points in clip space.
    (The matrix that converts right-handed eye-space points into left-handed
    clip-space points.)
  """
  # The construction of perspective_transform is omitted in this excerpt.
  return perspective_transform

The perspective function mimics gluPerspective from OpenGL and returns the perspective projection matrix. With $f = 1/\tan(fov_y/2)$ and $depthRange = farClip - nearClip$:

$P=\begin{bmatrix} f/aspectRatio & 0 & 0 & 0\\ 0 & f & 0 & 0\\ 0 & 0 & -(farClip+nearClip)/depthRange & -2 \cdot farClip \cdot nearClip/depthRange\\ 0 & 0 & -1 & 0 \end{bmatrix}$
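
To make the effect of the projection concrete, the following unbatched sketch builds the matrix from the standard gluPerspective definition and checks where the near and far planes land after the perspective divide. The function names perspective_single and project are hypothetical, and the sketch assumes mesh_renderer follows gluPerspective exactly, as its docstring states.

```python
import math


def perspective_single(aspect_ratio, fov_y_degrees, near_clip, far_clip):
    """gluPerspective-style matrix for one camera (hypothetical helper)."""
    focal_y = 1.0 / math.tan(math.radians(fov_y_degrees) / 2.0)
    depth_range = far_clip - near_clip
    return [
        [focal_y / aspect_ratio, 0.0, 0.0, 0.0],
        [0.0, focal_y, 0.0, 0.0],
        [0.0, 0.0, -(far_clip + near_clip) / depth_range,
         -2.0 * far_clip * near_clip / depth_range],
        # The -1 copies -z_eye into clip-space w for the perspective divide.
        [0.0, 0.0, -1.0, 0.0],
    ]


def project(matrix, point):
    """Maps an eye-space xyz point to normalized device coordinates."""
    x, y, z = point
    clip = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix]
    return [clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3]]
```

With near_clip = 1 and far_clip = 10, a point on the near plane (z = -1 in eye space) projects to depth -1 and a point on the far plane (z = -10) projects to depth +1, which is the range the rasterizer later tests against.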

def transform_homogeneous(matrices, vertices):
  """Applies batched 4x4 homogenous matrix transformations to 3-D vertices.
  The vertices are input and output as row-major, but are interpreted as
  column vectors multiplied on the right-hand side of the matrices. More
  explicitly, this function computes (MV^T)^T.
  Vertices are assumed to be xyz, and are extended to xyzw with w=1.
  Args:
    matrices: a [batch_size, 4, 4] tensor of matrices.
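    (Input: matrices that take the object from world space to camera clip space.)
    vertices: a [batch_size, N, 3] tensor of xyz vertices.
    (Input: the 3D vertex coordinates of the object.)
  Returns:
    a [batch_size, N, 4] tensor of xyzw vertices.
    (The object's homogeneous coordinates in camera clip space.)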
  Raises:
    ValueError: if matrices or vertices have the wrong number of dimensions.
  """
  if len(matrices.shape) != 3:
    raise ValueError(
        'matrices must have 3 dimensions (missing batch dimension?)')
  if len(vertices.shape) != 3:
    raise ValueError(
        'vertices must have 3 dimensions (missing batch dimension?)')
  homogeneous_coord = tf.ones(
      [tf.shape(vertices)[0], tf.shape(vertices)[1], 1], dtype=tf.float32)
  vertices_homogeneous = tf.concat([vertices, homogeneous_coord], 2)
  return tf.matmul(vertices_homogeneous, matrices, transpose_b=True)

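This function multiplies the 3D object coordinates by the transformation matrix and returns the result; all arithmetic uses homogeneous coordinates. homogeneous_coord is the column of ones appended to each vertex to turn xyz into xyzw, and matrices is the combined transform perspective_transform*R*T built from the perspective, rotation, and translation matrices described above.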
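
The row-major convention is easy to get wrong, so here is a small unbatched sketch of the same computation in plain Python. The name transform_homogeneous_single is made up for this example; it multiplies each vertex as a column vector, which gives the same rows as the V M^T product in the TensorFlow code.

```python
def transform_homogeneous_single(matrix, vertices):
    """Applies one 4x4 matrix to a list of xyz vertices, returning xyzw rows."""
    transformed = []
    for x, y, z in vertices:
        v = [x, y, z, 1.0]  # extend xyz to xyzw with w = 1
        # Row i of the result is the dot product of matrix row i with v,
        # i.e. M v, stored back as a row.
        transformed.append([sum(row[k] * v[k] for k in range(4))
                            for row in matrix])
    return transformed
```

For example, a pure translation by (1, 2, 3) sends the vertex (1, 1, 1) to (2, 3, 4, 1).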

rasterize_triangles.py

The rasterize module rasterizes a 3D object and computes the attributes of each pixel after rasterization.

def rasterize(world_space_vertices, attributes, triangles, camera_matrices,
              image_width, image_height, background_value):
  """Rasterizes a mesh and computes interpolated vertex attributes.
  Applies projection matrices and then calls rasterize_clip_space().
  Args:
    world_space_vertices: 3-D float32 tensor of xyz positions with shape
      [batch_size, vertex_count, 3].
    (Input: the vertices of the 3D object in world coordinates.)
    attributes: 3-D float32 tensor with shape [batch_size, vertex_count,
      attribute_count]. Each vertex attribute is interpolated across the
      triangle using barycentric interpolation.
    (Input: per-vertex attributes, interpolated to give each rasterized pixel's value.)
    triangles: 2-D int32 tensor with shape [triangle_count, 3]. Each triplet
      should contain vertex indices describing a triangle such that the
      triangle's normal points toward the viewer if the forward order of the
      triplet defines a clockwise winding of the vertices. Gradients with
      respect to this tensor are not available.
    (Input: the vertex ids of each triangle.)
    camera_matrices: 3-D float tensor with shape [batch_size, 4, 4] containing
      model-view-perspective projection matrices.
    (Input: the combined camera transformation matrices.)
    image_width: int specifying desired output image width in pixels.
    image_height: int specifying desired output image height in pixels.
    background_value: a 1-D float32 tensor with shape [attribute_count]. Pixels
      that lie outside all triangles take this value.
    (The value assigned to background pixels.)
  Returns:
    A 4-D float32 tensor with shape [batch_size, image_height, image_width,
    attribute_count], containing the interpolated vertex attributes at each
    pixel.
  Raises:
    ValueError: An invalid argument to the method is detected.
  """
  clip_space_vertices = camera_utils.transform_homogeneous(
      camera_matrices, world_space_vertices)
  return rasterize_clip_space(clip_space_vertices, attributes, triangles,
                              image_width, image_height, background_value)

rasterize is the rasterization entry point: it rasterizes a 3D object given in world coordinates onto the image plane.

def rasterize_clip_space(clip_space_vertices, attributes, triangles,
                         image_width, image_height, background_value):
  """Rasterizes the input mesh expressed in clip-space (xyzw) coordinates.
  Interpolates vertex attributes using perspective-correct interpolation and
  clips triangles that lie outside the viewing frustum.
  Args:
    clip_space_vertices: 3-D float32 tensor of homogenous vertices (xyzw) with
      shape [batch_size, vertex_count, 4].
    (Input: the 3D vertices in clip space.)
    attributes: 3-D float32 tensor with shape [batch_size, vertex_count,
      attribute_count]. Each vertex attribute is interpolated across the
      triangle using barycentric interpolation.
    (Input: the per-vertex attributes.)
    triangles: 2-D int32 tensor with shape [triangle_count, 3]. Each triplet
      should contain vertex indices describing a triangle such that the
      triangle's normal points toward the viewer if the forward order of the
      triplet defines a clockwise winding of the vertices. Gradients with
      respect to this tensor are not available.
    (Input: the vertex ids of each triangle.)
    image_width: int specifying desired output image width in pixels.
    image_height: int specifying desired output image height in pixels.
    background_value: a 1-D float32 tensor with shape [attribute_count]. Pixels
      that lie outside all triangles take this value.
    (The background color.)
  Returns:
    A 4-D float32 tensor with shape [batch_size, image_height, image_width,
    attribute_count], containing the interpolated vertex attributes at
    each pixel.
    (Returns the attributes of each pixel after rasterization.)
  Raises:
    ValueError: An invalid argument to the method is detected.
  """

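rasterize_clip_space rasterizes a 3D object that has already been transformed to clip space. It loops over the samples in a batch one at a time and calls the C++ op rasterize_triangles, which returns, for every pixel, the barycentric weights barycentric_coords and the triangle index triangle_ids. The pixel's attributes are then the barycentric-weighted sum of the attributes of the triangle's vertices.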
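
The final weighting step can be sketched in plain Python as follows. This is only an illustration under stated assumptions: interpolate_attributes is a hypothetical name, the inputs are flat per-pixel lists rather than tensors, and background pixels are detected by all three weights being zero (the convention documented in the C++ header), whereas the real implementation may blend with the background differently.

```python
def interpolate_attributes(attributes, triangles, triangle_ids,
                           barycentric_coords, background_value):
    """Computes per-pixel attributes from rasterizer outputs.

    attributes: one attribute list per vertex.
    triangles: one [v0, v1, v2] vertex-id triple per triangle.
    triangle_ids: one triangle id per pixel.
    barycentric_coords: one [b0, b1, b2] weight triple per pixel.
    """
    pixels = []
    for triangle_id, weights in zip(triangle_ids, barycentric_coords):
        if all(w == 0.0 for w in weights):
            # No triangle covers this pixel: use the background value.
            pixels.append(list(background_value))
            continue
        corner_ids = triangles[triangle_id]
        pixels.append([
            sum(w * attributes[v][k] for w, v in zip(weights, corner_ids))
            for k in range(len(background_value))
        ])
    return pixels
```

With red, green, and blue assigned to the three vertices of a single triangle, a pixel with weights [0.5, 0.25, 0.25] receives the color [0.5, 0.25, 0.25], and an uncovered pixel receives background_value.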

C++ API

rasterize_triangles_impl.h

#ifndef MESH_RENDERER_KERNELS_RASTERIZE_TRIANGLES_IMPL_H_
#define MESH_RENDERER_KERNELS_RASTERIZE_TRIANGLES_IMPL_H_

namespace tf_mesh_renderer {
// Copied from tensorflow/core/platform/default/integral_types.h
// to avoid making this file depend on tensorflow.
typedef int int32;
typedef long long int64;
// Computes the triangle id, barycentric coordinates, and z-buffer at each pixel
// in the image.
//
// vertices: A flattened 2D array with 4*vertex_count elements.
//     Each contiguous quadruplet is the XYZW location of the vertex with that
//     quadruplet's id. The coordinates are assumed to be OpenGL-style clip-space
//     (i.e., post-projection, pre-divide), where X points right, Y points up,
//     Z points away.
//     (Input: homogeneous vertex positions; a float pointer to
//     4*vertex_count values.)
// triangles: A flattened 2D array with 3*triangle_count elements.
//     Each contiguous triplet is the three vertex ids indexing into vertices
//     describing one triangle with clockwise winding.
//     (Input: the vertex ids of each triangle; an int pointer to
//     3*triangle_count values.)
// triangle_count: The number of triangles stored in the array triangles.
//     (Input: the number of triangles.)
// triangle_ids: A flattened 2D array with image_height*image_width elements.
//     At return, each pixel contains a triangle id in the range
//     [0, triangle_count). The id value is also 0 if there is no triangle
//     at the pixel. The barycentric_coordinates must be checked to
//     distinguish the two cases.
//     (Output: the id of the triangle covering each pixel after rasterization;
//     an int pointer to image_width*image_height values.)
// barycentric_coordinates: A flattened 3D array with
//     image_height*image_width*3 elements. At return, contains the triplet of
//     barycentric coordinates at each pixel in the same vertex ordering as
//     triangles. If no triangle is present, all coordinates are 0.
//     (Output: the barycentric interpolation weights.)
// z_buffer: A flattened 2D array with image_height*image_width elements. At
//     return, contains the normalized device Z coordinates of the rendered
//     triangles.
void RasterizeTrianglesImpl(const float* vertices, const int32* triangles,
                            int32 triangle_count, int32 image_width,
                            int32 image_height, int32* triangle_ids,
                            float* barycentric_coordinates, float* z_buffer);
}  // namespace tf_mesh_renderer
#endif  // MESH_RENDERER_KERNELS_RASTERIZE_TRIANGLES_IMPL_H_

rasterize_triangles_impl.cc

#include <algorithm>
#include <cmath>

#include "rasterize_triangles_impl.h"

namespace tf_mesh_renderer {
namespace {
// Takes the minimum of a, b, and c, rounds down, and converts to an integer
// in the range [low, high].
inline int ClampedIntegerMin(float a, float b, float c, int low, int high) {
  return std::min(
      std::max(static_cast<int>(std::floor(std::min(std::min(a, b), c))), low),
      high);
}

// Takes the maximum of a, b, and c, rounds up, and converts to an integer
// in the range [low, high].
inline int ClampedIntegerMax(float a, float b, float c, int low, int high) {
  return std::min(
      std::max(static_cast<int>(std::ceil(std::max(std::max(a, b), c))), low),
      high);
}


// Computes a 3x3 matrix inverse without dividing by the determinant.
// Instead, makes an unnormalized matrix inverse with the correct sign
// by flipping the sign of the matrix if the determinant is negative.
// By leaving out determinant division, the rows of M^-1 only depend on two out
// of three of the columns of M; i.e., the first row of M^-1 only depends on the
// second and third columns of M, the second only depends on the first and
// third, etc. This means we can compute edge functions for two neighboring
// triangles independently and produce exactly the same numerical result up to
// the sign. This in turn means we can avoid cracks in rasterization without
// using fixed-point arithmetic.
// See http://mathworld.wolfram.com/MatrixInverse.html
void ComputeUnnormalizedMatrixInverse(const float a11, const float a12,
                                      const float a13, const float a21,
                                      const float a22, const float a23,
                                      const float a31, const float a32,
                                      const float a33, float m_inv[9]) {
  m_inv[0] = a22 * a33 - a32 * a23;
  m_inv[1] = a13 * a32 - a33 * a12;
  m_inv[2] = a12 * a23 - a22 * a13;
  m_inv[3] = a23 * a31 - a33 * a21;
  m_inv[4] = a11 * a33 - a31 * a13;
  m_inv[5] = a13 * a21 - a23 * a11;
  m_inv[6] = a21 * a32 - a31 * a22;
  m_inv[7] = a12 * a31 - a32 * a11;
  m_inv[8] = a11 * a22 - a21 * a12;

  // The first column of the unnormalized M^-1 contains intermediate values for
  // det(M).
  const float det = a11 * m_inv[0] + a12 * m_inv[3] + a13 * m_inv[6];

  // Transfer the sign of the determinant.
  if (det < 0.0f) {
    for (int i = 0; i < 9; ++i) {
      m_inv[i] = -m_inv[i];
    }
  }
}

// Computes the edge functions from M^-1 as described by Olano and Greer,
// "Triangle Scan Conversion using 2D Homogeneous Coordinates."
//
// This function combines equations (3) and (4). It first computes
// [a b c] = u_i * M^-1, where u_0 = [1 0 0], u_1 = [0 1 0], etc.,
// then computes edge_i = aX + bY + c
void ComputeEdgeFunctions(const float px, const float py, const float m_inv[9],
                          float values[3]) {
  for (int i = 0; i < 3; ++i) {
    const float a = m_inv[3 * i + 0];
    const float b = m_inv[3 * i + 1];
    const float c = m_inv[3 * i + 2];
    values[i] = a * px + b * py + c;
  }
}

// Determines whether the point p lies inside a front-facing triangle.
// Counts pixels exactly on an edge as inside the triangle, as long as the
// triangle is not degenerate. Degenerate (zero-area) triangles always fail the
// inside test.
bool PixelIsInsideTriangle(const float edge_values[3]) {
  // Check that the edge values are all non-negative and that at least one is
  // positive (triangle is non-degenerate).
  return (edge_values[0] >= 0 && edge_values[1] >= 0 && edge_values[2] >= 0) &&
         (edge_values[0] > 0 || edge_values[1] > 0 || edge_values[2] > 0);
}
}  // namespace

void RasterizeTrianglesImpl(const float* vertices, const int32* triangles,
                            int32 triangle_count, int32 image_width,
                            int32 image_height, int32* triangle_ids,
                            float* barycentric_coordinates, float* z_buffer) {
  const float half_image_width = 0.5 * image_width;
  const float half_image_height = 0.5 * image_height;
  float unnormalized_matrix_inverse[9];
  float b_over_w[3];
  for (int32 triangle_id = 0; triangle_id < triangle_count; ++triangle_id) {
    // Look up the offsets of the triangle's three vertices in the vertex array.
    const int32 v0_x_id = 4 * triangles[3 * triangle_id];
    const int32 v1_x_id = 4 * triangles[3 * triangle_id + 1];
    const int32 v2_x_id = 4 * triangles[3 * triangle_id + 2];
    // Read the clip-space w coordinate of each vertex.
    const float v0w = vertices[v0_x_id + 3];
    const float v1w = vertices[v1_x_id + 3];
    const float v2w = vertices[v2_x_id + 3];
    // Early exit: if all w < 0, triangle is entirely behind the eye.
    if (v0w < 0 && v1w < 0 && v2w < 0) {
      continue;
    }

    const float v0x = vertices[v0_x_id];
    const float v0y = vertices[v0_x_id + 1];
    const float v1x = vertices[v1_x_id];
    const float v1y = vertices[v1_x_id + 1];
    const float v2x = vertices[v2_x_id];
    const float v2y = vertices[v2_x_id + 1];

    ComputeUnnormalizedMatrixInverse(v0x, v1x, v2x, v0y, v1y, v2y, v0w, v1w,
                                     v2w, unnormalized_matrix_inverse);

    // Initialize the bounding box to the entire screen.
    int left = 0, right = image_width, bottom = 0, top = image_height;
    // If the triangle is entirely inside the screen, project the vertices to
    // pixel coordinates and find the triangle bounding box enlarged to the
    // nearest integer and clamped to the image boundaries.

    if (v0w > 0 && v1w > 0 && v2w > 0) {
      // Perspective divide to NDC, then map [-1, 1] to pixel coordinates.
      const float p0x = (v0x / v0w + 1.0) * half_image_width;
      const float p1x = (v1x / v1w + 1.0) * half_image_width;
      const float p2x = (v2x / v2w + 1.0) * half_image_width;
      const float p0y = (v0y / v0w + 1.0) * half_image_height;
      const float p1y = (v1y / v1w + 1.0) * half_image_height;
      const float p2y = (v2y / v2w + 1.0) * half_image_height;
      left = ClampedIntegerMin(p0x, p1x, p2x, 0, image_width);
      right = ClampedIntegerMax(p0x, p1x, p2x, 0, image_width);
      bottom = ClampedIntegerMin(p0y, p1y, p2y, 0, image_height);
      top = ClampedIntegerMax(p0y, p1y, p2y, 0, image_height);
    }

    // Iterate over each pixel in the bounding box.
    for (int iy = bottom; iy < top; ++iy) {
      for (int ix = left; ix < right; ++ix) {
        const float px = ((ix + 0.5) / half_image_width) - 1.0;
        const float py = ((iy + 0.5) / half_image_height) - 1.0;
        const int pixel_idx = iy * image_width + ix;
        ComputeEdgeFunctions(px, py, unnormalized_matrix_inverse, b_over_w);
        if (!PixelIsInsideTriangle(b_over_w)) {
          continue;
        }
        const float one_over_w = b_over_w[0] + b_over_w[1] + b_over_w[2];
        const float b0 = b_over_w[0] / one_over_w;
        const float b1 = b_over_w[1] / one_over_w;
        const float b2 = b_over_w[2] / one_over_w;
        const float v0z = vertices[v0_x_id + 2];
        const float v1z = vertices[v1_x_id + 2];
        const float v2z = vertices[v2_x_id + 2];
        // Since we computed an unnormalized w above, we need to recompute
        // a properly scaled clip-space w value and then divide clip-space z
        // by that.
        const float clip_z = b0 * v0z + b1 * v1z + b2 * v2z;
        const float clip_w = b0 * v0w + b1 * v1w + b2 * v2w;
        const float z = clip_z / clip_w;
        // Skip the pixel if it is farther than the current z-buffer pixel or
        // beyond the near or far clipping plane.
        if (z < -1.0 || z > 1.0 || z > z_buffer[pixel_idx]) {
          continue;
        }
        triangle_ids[pixel_idx] = triangle_id;
        z_buffer[pixel_idx] = z;
        barycentric_coordinates[3 * pixel_idx + 0] = b0;
        barycentric_coordinates[3 * pixel_idx + 1] = b1;
        barycentric_coordinates[3 * pixel_idx + 2] = b2;
      }
    }
  }
}

}  // namespace tf_mesh_renderer

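The rasterization implementation uses the z-buffer algorithm to determine which triangle is visible at each pixel, and the pixel's attribute values are obtained from the triangle's vertices by barycentric interpolation.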
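
The per-pixel inside test and barycentric computation are the least obvious part of the C++ code, so here is a direct Python port of ComputeUnnormalizedMatrixInverse, ComputeEdgeFunctions, and PixelIsInsideTriangle for a single triangle and a single pixel. The function names unnormalized_inverse and barycentrics_at are made up for this sketch, and the z-buffer test and bounding-box loop are left out.

```python
def unnormalized_inverse(a11, a12, a13, a21, a22, a23, a31, a32, a33):
    """Adjugate of a 3x3 matrix, sign-flipped if the determinant is negative."""
    m = [a22 * a33 - a32 * a23, a13 * a32 - a33 * a12, a12 * a23 - a22 * a13,
         a23 * a31 - a33 * a21, a11 * a33 - a31 * a13, a13 * a21 - a23 * a11,
         a21 * a32 - a31 * a22, a12 * a31 - a32 * a11, a11 * a22 - a21 * a12]
    det = a11 * m[0] + a12 * m[3] + a13 * m[6]
    if det < 0.0:
        m = [-value for value in m]
    return m


def barycentrics_at(px, py, v0, v1, v2):
    """Barycentric weights at NDC pixel center (px, py), or None if outside.

    Each vertex is a clip-space (x, y, w) triple.
    """
    m_inv = unnormalized_inverse(v0[0], v1[0], v2[0],
                                 v0[1], v1[1], v2[1],
                                 v0[2], v1[2], v2[2])
    # Edge function i is a*px + b*py + c with [a, b, c] = row i of m_inv.
    b_over_w = [m_inv[3 * i] * px + m_inv[3 * i + 1] * py + m_inv[3 * i + 2]
                for i in range(3)]
    inside = (all(b >= 0 for b in b_over_w) and
              any(b > 0 for b in b_over_w))
    if not inside:
        return None
    # Dividing by the sum normalizes the weights so they add up to 1.
    one_over_w = sum(b_over_w)
    return [b / one_over_w for b in b_over_w]
```

For the triangle with clip-space vertices (-1, -1, 1), (1, -1, 1), and (0, 1, 1), the centroid (0, -1/3) gets weights of one third each, the vertex (0, 1) gets [0, 0, 1], and the point (0.9, 0.9) is rejected as outside.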

Appendix

Transforming world coordinates to view coordinates

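In OpenGL you can call gluLookAt(eye, center, up): the camera sits at eye, the line of sight points toward center, and up gives the camera's upward direction. mesh_renderer follows the same convention.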

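Camera position: C = eye = (Cx, Cy, Cz)

Viewing direction: N = center - eye = (Nx, Ny, Nz)

Camera up vector: V = up (up is a direction, so eye is not subtracted from it)

Camera x-axis direction: U = N x V

V is then recomputed as U x N so that the three axes are mutually orthogonal. After the direction vectors are normalized, C, U, V, and N together define the view coordinate system.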

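The goal is now to convert a point p = (objx, objy, objz) in world coordinates into camera coordinates (obju, objv, objn). The coordinates of p in the uvn system are its projections onto the u, v, and n axes, and the projection of a vector onto a unit vector is given by a dot product. So take the vector from the camera position C = (Cx, Cy, Cz) to p and dot it with u, v, and n in turn.

In homogeneous row-vector form:

$\begin{bmatrix} obj_u & obj_v & obj_n & 1 \end{bmatrix} = \begin{bmatrix} obj_x & obj_y & obj_z & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ -C_x & -C_y & -C_z & 1 \end{bmatrix} \begin{bmatrix} u_x & v_x & n_x & 0\\ u_y & v_y & n_y & 0\\ u_z & v_z & n_z & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}$

Here n points along the viewing direction, whereas the R matrix of look_at uses -forward as its third row, so the two forms differ only in the sign of the third coordinate.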
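
The row-vector product can be checked numerically with a few lines of plain Python. The names row_times_matrix and world_to_view are hypothetical, and u, v, n are assumed to be unit-length and mutually orthogonal already.

```python
def row_times_matrix(row, matrix):
    """Multiplies a 4-element row vector by a 4x4 matrix."""
    return [sum(row[k] * matrix[k][j] for k in range(4)) for j in range(4)]


def world_to_view(point, camera_pos, u, v, n):
    """Converts a world-space point to (obju, objv, objn, 1)."""
    # Row-vector translation matrix: the offset sits in the last row.
    translate = [[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0],
                 [-camera_pos[0], -camera_pos[1], -camera_pos[2], 1.0]]
    # Columns of the rotation matrix are the camera axes u, v, n.
    rotate = [[u[0], v[0], n[0], 0.0],
              [u[1], v[1], n[1], 0.0],
              [u[2], v[2], n[2], 0.0],
              [0.0, 0.0, 0.0, 1.0]]
    homogeneous = list(point) + [1.0]
    return row_times_matrix(row_times_matrix(homogeneous, translate), rotate)
```

For a camera at (0, 0, 5) looking toward the origin, n = (0, 0, -1), v = (0, 1, 0), and u = n x v = (1, 0, 0). The point (1, 2, 0) then has view coordinates (1, 2, 5): the third value is its distance along the viewing direction, exactly the dot product of p - C with n.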

References

Unsupervised Training for 3D Morphable Model Regression. Kyle Genova, Forrester Cole, Aaron Maschinot, Aaron Sarna, Daniel Vlasic, and William T. Freeman. CVPR 2018, pp. 8377-8386.

Triangle Scan Conversion using 2D Homogeneous Coordinates. Marc Olano and Trey Greer. HWWS 1997.