Image Coordinate Space Transformation: Perspective Transformation, Also Known as Homography


拜阳


Article link: https://blog.csdn.net/bby1987/article/details/106317354


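This article explains the principle of perspective transformation, from the prerequisites through the full formula derivation, covering projection, homogeneous coordinate transformation, and image interpolation. It includes a worked example that corrects the viewing angle of an A4 sheet and shows the limitations of perspective transformation.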


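Introduction to Perspective Transformation

The real world is three-dimensional, while images are two-dimensional (at least for now). For a 2D image to describe the 3D world and still look realistic, the conversion from 3D to 2D has to obey a geometric projection relationship, namely perspective. Put simply, nearby objects appear larger in the image and distant objects appear smaller. Take the railway tracks in the figure below: the farther away you look, the narrower the gap between the rails appears in the image, even though the real gap is constant.
[Figure: railway tracks converging into the distance]
Perspective transformation applies a spatial coordinate transformation to the objects in an image such that the result satisfies a perspective relationship. It consists of the following three steps:

Transformation from 2D coordinates to homogeneous coordinates
Projection of the homogeneous coordinates
Image interpolation

This may still sound abstract. Reading on should gradually make the concept clearer.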

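Prerequisites

Before moving on to the derivation, I recommend studying affine transformation and image interpolation in detail. I have already written about both elsewhere, so I will not repeat them here, but please do read them carefully. Otherwise the rest of this article may be hard to follow.
Image Coordinate Space Transformation: Affine Transformation
Image Interpolation: Nearest and Bilinear

Pay particular attention to the following:

Notation: (u,v) denotes coordinates in the original image, and (x,y) denotes coordinates in the transformed image
The basic concept of homogeneous coordinates
The general form of the affine transformation
Forward mapping and backward mapping, and why spatial coordinate transformations are usually implemented with backward mapping to make image interpolation convenient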

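Derivation of the Perspective Transformation Formula

We first cover the concept of projection, and then the transformation from 2D coordinates to homogeneous coordinates. (Note that this order of presentation is the reverse of the order of computation.)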

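Projection

The article on affine transformation briefly mentioned that $(x, y, 1)$ is the homogeneous coordinate corresponding to $(x, y)$. We now extend that concept. When we shoot with a camera, we can think of the objects in 3D space as being projected onto a 2D image. In the accompanying figure, the red plane is the projection plane, which we define as the plane $z = 1$, so $(x, y, 1)$ is the 3D coordinate of the point $(x, y)$ on the 2D image.

Next, multiply $(x, y, 1)$ by a coefficient to get $\alpha(x,y,1)$. Changing $\alpha$ moves the point along the red dashed line, which can be proved with similar triangles. The red dashed line is obtained by connecting the observer (or camera), located at the origin, with the point $(x, y, 1)$.

We denote $\alpha(x,y,1)$ by $(X, Y, Z)$, that is, $X=\alpha x$, $Y=\alpha y$, $Z=\alpha$. Projection is the process of mapping $(X, Y, Z)$ to $(x, y, 1)$. Given $(X, Y, Z)$, we can compute $(x, y)$ by projection. The calculation is simple: from the definition $\alpha(x,y,1)=(X,Y,Z)$, divide both $X$ and $Y$ by $Z$ (with $Z \neq 0$):
$\begin{cases} x = X/Z \\[2ex] y = Y/Z \end{cases}$
The definition and calculation above show that all points of the form $\alpha(x,y,1)$ with $\alpha \neq 0$ share one projected coordinate, namely $(x, y, 1)$.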
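
To make the projection step concrete, here is a minimal NumPy sketch. The helper name `project` and the sample points are illustrative assumptions and are not part of the original program. It divides $X$ and $Y$ by $Z$ and shows that several multiples of $(2, 3, 1)$ land on the same 2D point.

```python
import numpy as np


def project(points_h):
    """Project homogeneous points (X, Y, Z) onto the plane z = 1."""
    points_h = np.asarray(points_h, dtype=np.float64)
    # Divide X and Y by Z; Z must be non-zero for the projection to exist.
    return points_h[..., 0:2] / points_h[..., 2:3]


# Three points on the same ray through the origin: alpha * (2, 3, 1)
# with alpha = 1, 2 and -0.5 (hypothetical sample values).
ray = np.array([[2.0, 3.0, 1.0],
                [4.0, 6.0, 2.0],
                [-1.0, -1.5, -0.5]])
projected = project(ray)
print(projected)  # every row equals (2, 3)
```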

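Transformation from 2D Coordinates to Homogeneous Coordinates

Here we derive the backward mapping directly, since backward mapping is what is actually computed in code.
Let the coordinates in the original image be $(u, v)$, with corresponding homogeneous coordinates $(U, V, W)$, and let the coordinates in the transformed image be $(x, y)$.
Then:
$\begin{bmatrix} U & V & W \end{bmatrix} = \begin{bmatrix} x & y & 1 \end{bmatrix} \begin{bmatrix} t_{11} & t_{12} & t_{13} \\ t_{21} & t_{22} & t_{23} \\ t_{31} & t_{32} & t_{33} \end{bmatrix}$
Note that this equation is hard to solve as it stands: written out as a system of equations, it has no constant terms, so the matrix is determined only up to a scale factor. Next, we use the projection concept above to rewrite it in a form that is easy to solve.

By the concept of projection, once $(U, V, W)$ is known we obtain $(u, v)$ from $u = U / W$, $v = V / W$. Note in particular that in this process we can multiply $(U, V, W)$ by any non-zero constant without affecting the result. In other words, we can multiply both sides of the matrix equation above by a constant.

So we divide both sides by $t_{33}$ (assuming $t_{33} \neq 0$). Apart from $t_{33}$ becoming 1, we keep the same symbols for the other entries, which gives:
$\begin{bmatrix} U & V & W \end{bmatrix} = \begin{bmatrix} x & y & 1 \end{bmatrix} \begin{bmatrix} t_{11} & t_{12} & t_{13} \\ t_{21} & t_{22} & t_{23} \\ t_{31} & t_{32} & 1 \end{bmatrix}$
This equation is easy to solve.
Comparing it with the affine transformation, you will find that the transformation matrix here has two additional parameters: $t_{13}$ and $t_{23}$.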
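
The scale argument above can be checked numerically. The following is a small sketch under stated assumptions: the matrix values are made up for illustration, and `backward_map` is a hypothetical helper, not a function from the original program. It maps one point with a matrix whose $t_{33}$ is 2 and again with the same matrix divided by $t_{33}$, and the projected result should be the same.

```python
import numpy as np

# Hypothetical backward-mapping matrix in the row-vector convention used
# in this article: [U V W] = [x y 1] @ T. Here t33 = 2.
T = np.array([[1.2, 0.1, 0.0004],
              [0.3, 0.9, 0.0002],
              [50.0, 20.0, 2.0]])


def backward_map(T, xy):
    """Map (x, y) in the transformed image to (u, v) in the original image."""
    xy1 = np.append(np.asarray(xy, dtype=np.float64), 1.0)
    U, V, W = xy1 @ T
    return np.array([U / W, V / W])


xy = (100.0, 200.0)
T_normalized = T / T[2, 2]  # valid only when t33 != 0
uv_a = backward_map(T, xy)
uv_b = backward_map(T_normalized, xy)
print(uv_a, uv_b)  # the two results agree up to rounding
```

The common factor cancels in $U/W$ and $V/W$, which is why fixing $t_{33} = 1$ loses nothing as long as $t_{33}$ is non-zero.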

Solving the Formula

Expanding the matrix equation above gives:
$\begin{cases} U = t_{11}*x + t_{21}*y + t_{31} \\ V = t_{12}*x + t_{22}*y + t_{32} \\ W = t_{13}*x + t_{23}*y + 1 \end{cases}$
From the projection step:
$\begin{cases} u = \frac {U}{W} = \frac {t_{11}*x + t_{21}*y + t_{31}} {t_{13}*x + t_{23}*y + 1} \\[2ex] v = \frac {V}{W} = \frac {t_{12}*x + t_{22}*y + t_{32}} {t_{13}*x + t_{23}*y + 1} \end{cases}$
It follows that:
$\begin{cases} u(t_{13}*x + t_{23}*y + 1) = t_{11}*x + t_{21}*y + t_{31} \\[2ex] v(t_{13}*x + t_{23}*y + 1) = t_{12}*x + t_{22}*y + t_{32} \end{cases}$
Rearranging:
$\begin{cases} x*t_{11} + y*t_{21} + 1*t_{31} + 0*t_{12} + 0*t_{22} + 0*t_{32} - ux*t_{13} - uy*t_{23} = u \\[2ex] 0*t_{11} + 0*t_{21} + 0*t_{31} + x*t_{12} + y*t_{22} + 1*t_{32} - vx*t_{13} - vy*t_{23} = v \end{cases}$
Suppose we have n pairs of corresponding points: $(u_1,v_1)$ corresponds to $(x_1,y_1)$, $(u_2,v_2)$ corresponds to $(x_2,y_2)$, ..., $(u_n,v_n)$ corresponds to $(x_n,y_n)$. We then get a linear system in matrix form:
$\begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -u_1x_1 & -u_1y_1 \\ 0 & 0 & 0 & x_1 & y_1 & 1 & -v_1x_1 & -v_1y_1 \\ x_2 & y_2 & 1 & 0 & 0 & 0 & -u_2x_2 & -u_2y_2 \\ 0 & 0 & 0 & x_2 & y_2 & 1 & -v_2x_2 & -v_2y_2 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_n & y_n & 1 & 0 & 0 & 0 & -u_nx_n & -u_ny_n \\ 0 & 0 & 0 & x_n & y_n & 1 & -v_nx_n & -v_ny_n \end{bmatrix} \begin{bmatrix} t_{11}\\ t_{21} \\ t_{31} \\ t_{12} \\ t_{22} \\ t_{32} \\ t_{13} \\ t_{23} \end{bmatrix} = \begin{bmatrix} u_1 \\ v_1 \\ u_2 \\ v_2 \\ \vdots \\ u_n \\ v_n \end{bmatrix}$
Let:
$A = \begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -u_1x_1 & -u_1y_1 \\ 0 & 0 & 0 & x_1 & y_1 & 1 & -v_1x_1 & -v_1y_1 \\ x_2 & y_2 & 1 & 0 & 0 & 0 & -u_2x_2 & -u_2y_2 \\ 0 & 0 & 0 & x_2 & y_2 & 1 & -v_2x_2 & -v_2y_2 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_n & y_n & 1 & 0 & 0 & 0 & -u_nx_n & -u_ny_n \\ 0 & 0 & 0 & x_n & y_n & 1 & -v_nx_n & -v_ny_n \end{bmatrix}$

$T = \begin{bmatrix} t_{11}\\ t_{21} \\ t_{31} \\ t_{12} \\ t_{22} \\ t_{32} \\ t_{13} \\ t_{23} \end{bmatrix}$

$B = \begin{bmatrix} u_1 \\ v_1 \\ u_2 \\ v_2 \\ \vdots \\ u_n \\ v_n \end{bmatrix}$
so that the system reads $AT = B$. The least-squares solution is then:
$T = (A^TA)^{-1}A^TB$
It is derived by left-multiplying both sides of $AT = B$ by $A^T$ and then by $(A^TA)^{-1}$:
$\begin{aligned} AT &= B \\[2ex] A^TAT &= A^TB \\[2ex] (A^TA)^{-1}A^TAT &= (A^TA)^{-1}A^TB \\[2ex] T &= (A^TA)^{-1}A^TB \end{aligned}$
Note: the linear system above has 8 unknowns, so at least 4 pairs of corresponding points are needed to solve it. With exactly 4 pairs (and no three of the points collinear), the solution is exact. With more than 4 pairs, the result is a least-squares solution.
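
As a hedged sketch of this solving step, the code below builds $A$ and $B$ for four made-up point pairs and solves for the 8 unknowns in two ways: with the normal-equation formula from the text, and with `np.linalg.lstsq`, which solves the same least-squares problem without forming $A^TA$ explicitly. The function name `build_system` and the sample coordinates are assumptions for illustration only.

```python
import numpy as np


def build_system(uv, xy):
    """Stack the two equations per point pair into A (2n x 8) and B (2n)."""
    uv = np.asarray(uv, dtype=np.float64)
    xy = np.asarray(xy, dtype=np.float64)
    n = xy.shape[0]
    x, y = xy[:, 0], xy[:, 1]
    u, v = uv[:, 0], uv[:, 1]
    A = np.zeros((2 * n, 8))
    # Even rows: x*t11 + y*t21 + t31 - u*x*t13 - u*y*t23 = u
    A[0::2, 0], A[0::2, 1], A[0::2, 2] = x, y, 1.0
    A[0::2, 6], A[0::2, 7] = -u * x, -u * y
    # Odd rows: x*t12 + y*t22 + t32 - v*x*t13 - v*y*t23 = v
    A[1::2, 3], A[1::2, 4], A[1::2, 5] = x, y, 1.0
    A[1::2, 6], A[1::2, 7] = -v * x, -v * y
    B = np.empty(2 * n)
    B[0::2], B[1::2] = u, v
    return A, B


# Hypothetical correspondences: unit square (x, y) <-> quadrilateral (u, v)
xy = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
uv = np.array([[0.0, 0.0], [2.0, 0.2], [1.8, 1.5], [0.1, 1.0]])

A, B = build_system(uv, xy)
t_normal = np.linalg.inv(A.T @ A) @ A.T @ B     # formula from the text
t_lstsq = np.linalg.lstsq(A, B, rcond=None)[0]  # same problem, solved by NumPy

# Unknown order is (t11, t21, t31, t12, t22, t32, t13, t23); append t33 = 1
T = np.append(t_lstsq, 1.0).reshape(3, 3).T
uvw = np.hstack([xy, np.ones((4, 1))]) @ T
uv_check = uvw[:, 0:2] / uvw[:, 2:3]
print(np.abs(uv_check - uv).max())  # close to zero for exactly 4 pairs
```

With exactly four pairs the two solutions should coincide. With noisy, redundant points, `lstsq` is usually the numerically safer choice because inverting $A^TA$ squares the condition number.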

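Example: Correcting the Viewing Angle of an A4 Sheet

Original Image and Coordinate Transformation Templates

Sometimes the shooting angle is not quite right, and an A4 sheet shows up in the photo as an irregular quadrilateral instead of a proper rectangle. In that case we can correct the viewing angle with a perspective transformation.

For example, the image below contains an A4 sheet that is clearly not rectangular. We correct it with a perspective transformation as follows:

Pick the coordinates of the four corners of the A4 sheet. Starting from the top-left corner and going clockwise, they are (1208, 1192), (2814, 1116), (3557, 1866), (1160, 2089)
Define the corrected template, derived from the standard A4 size of 210mm x 297mm. Based on this size I defined two templates: template 1 contains only the A4 sheet itself, and template 2 also contains some of the surrounding objects.
- Template 1: I multiply 210 and 297 by 5 to get a template in pixel coordinates. Starting from the top-left corner and going clockwise: (0, 0), (1485, 0), (1485, 1050), (0, 1050).
- Template 2: starting from template 1, shift the top-left corner by (1000, 1000) and then extend the bottom-right boundary by (500, 500), which gives the coordinates (1000, 1000), (2485, 1000), (2485, 2050), (1000, 2050) on an output canvas of 2985 x 2550 pixels
Then run the calculation with the program below

The image below is named side.jpg in the program.
[Figure: original image]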

Program

# -*- coding: utf-8 -*-

import cv2
import numpy as np


def nearest_sampler(image, coords):
    """
    Nearest Neighbour sampler

    Parameters
    ----------
    image: ndarray
        source image. shape is [height, width, channels]
    coords: ndarray
        coordinates to be interpolated, the length of last axis should be 2,
        meaning 2D coordinate

    Returns
    -------
    output: ndarray
        the interpolated image, same shape as coords except the last axis
    """
    height, width, channels = image.shape[0:3]
    output_shape = list(coords.shape)
    coords = np.reshape(coords, (-1, output_shape[-1]))
    output_shape[-1] = channels
    coords = np.round(coords).astype(np.int32)
    idx = (coords[:, 0] >= 0) & (coords[:, 0] < width) & \
          (coords[:, 1] >= 0) & (coords[:, 1] < height)
    output = np.zeros((coords.shape[0], channels), dtype=np.uint8)
    output[idx] = image[coords[idx, 1], coords[idx, 0], :]
    output = np.reshape(output, output_shape)
    return output


def perspective_transform_matrix(uv, xy):
    """
    Compute perspective transform matrix

    Parameters
    ----------
    uv: ndarray
        coordinates of feature points in original image. shape is [n, 2], n is
        the number of points
    xy: ndarray
        coordinates of feature points in perspective transformed image. shape
        is same as uv

    Returns
    -------
    transform_matrix: ndarray
        transform matrix. shape is [3, 3]
    """
    A = np.zeros((2 * xy.shape[0], 8))
    B = np.zeros((2 * xy.shape[0], 1))
    for i in range(xy.shape[0]):
        A[2 * i, 0] = xy[i, 0]
        A[2 * i, 1] = xy[i, 1]
        A[2 * i, 2] = 1.0
        A[2 * i, 3] = 0.0
        A[2 * i, 4] = 0.0
        A[2 * i, 5] = 0.0
        A[2 * i, 6] = -uv[i, 0] * xy[i, 0]
        A[2 * i, 7] = -uv[i, 0] * xy[i, 1]

        A[2 * i + 1, 0] = 0.0
        A[2 * i + 1, 1] = 0.0
        A[2 * i + 1, 2] = 0.0
        A[2 * i + 1, 3] = xy[i, 0]
        A[2 * i + 1, 4] = xy[i, 1]
        A[2 * i + 1, 5] = 1.0
        A[2 * i + 1, 6] = -uv[i, 1] * xy[i, 0]
        A[2 * i + 1, 7] = -uv[i, 1] * xy[i, 1]

        B[2 * i] = uv[i, 0]
        B[2 * i + 1] = uv[i, 1]
    T = np.linalg.inv(A.T @ A) @ A.T @ B
    transform_matrix = np.append(T, [1.0]).reshape([3, 3]).T
    return transform_matrix


def perspective_coordinates(transform_matrix, height, width):
    """
    Compute perspective coordinates according to the transform matrix and the
    transformed image shape

    Parameters
    ----------
    transform_matrix: ndarray
        perspective transform matrix. shape is [3, 3]
    height: int
        height of transformed image
    width: int
        width of transformed image

    Returns
    -------
    coords: ndarray
        perspective coordinates. shape is [height, width, 2]
    """
    coords = np.meshgrid(np.arange(0, width), np.arange(0, height))
    # shape = [height, width, 2] after transpose
    coords = np.array(coords).transpose([1, 2, 0])
    ones = np.ones([height, width, 1])
    # homogeneous coordinates
    coords = np.concatenate((coords, ones), axis=2)
    # transformed coordinates
    coords = coords @ transform_matrix
    # projection of coordinates
    coords[:, :, 0:2] /= coords[:, :, 2:]
    return coords[:, :, 0:2]


if __name__ == '__main__':
    # feature points
    uv = np.array([[1208, 1192],
                   [2814, 1116],
                   [3557, 1866],
                   [1160, 2089]])
    # model points
    xy1 = np.array([[0, 0],
                    [1485, 0],
                    [1485, 1050],
                    [0, 1050]])

    xy2 = np.array([[1000, 1000],
                    [2485, 1000],
                    [2485, 2050],
                    [1000, 2050]])

    transform_matrix1 = perspective_transform_matrix(uv, xy1)
    coords1 = perspective_coordinates(transform_matrix1, 1050, 1485)

    transform_matrix2 = perspective_transform_matrix(uv, xy2)
    coords2 = perspective_coordinates(transform_matrix2, 2550, 2985)

    # sampler
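    image = cv2.imread('side.jpg')
    # cv2.imread returns None instead of raising when the file cannot be read
    if image is None:
        raise FileNotFoundError('side.jpg could not be read')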
    img1 = nearest_sampler(image, coords1)
    img2 = nearest_sampler(image, coords2)

    cv2.namedWindow('projection1', 0)
    cv2.namedWindow('projection2', 0)
    cv2.imshow('projection1', img1)
    cv2.imshow('projection2', img2)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
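
The program above samples with nearest-neighbour interpolation. As a hedged sketch of the bilinear alternative mentioned in the prerequisites, the function below takes the same kind of inputs as `nearest_sampler`. The name `bilinear_sampler` and the tiny demo image are assumptions for illustration and are not part of the original program. It assumes the image has shape [height, width, channels].

```python
import numpy as np


def bilinear_sampler(image, coords):
    """
    Bilinear sampler (sketch). image is [height, width, channels];
    coords[..., 0] is the column (u) and coords[..., 1] is the row (v).
    """
    height, width, channels = image.shape[0:3]
    output_shape = list(coords.shape[:-1]) + [channels]
    flat = np.reshape(coords, (-1, 2)).astype(np.float64)
    u, v = flat[:, 0], flat[:, 1]
    # Pixels whose source coordinate falls outside the image stay black
    idx = (u >= 0) & (u <= width - 1) & (v >= 0) & (v <= height - 1)
    u, v = u[idx], v[idx]
    # Top-left neighbour, clipped so the right/bottom neighbour exists
    u0 = np.clip(np.floor(u).astype(np.int64), 0, max(width - 2, 0))
    v0 = np.clip(np.floor(v).astype(np.int64), 0, max(height - 2, 0))
    u1 = np.minimum(u0 + 1, width - 1)
    v1 = np.minimum(v0 + 1, height - 1)
    du = (u - u0)[:, None]
    dv = (v - v0)[:, None]
    img = image.astype(np.float64)
    # Interpolate along u on the two rows, then along v between them
    top = (1.0 - du) * img[v0, u0] + du * img[v0, u1]
    bottom = (1.0 - du) * img[v1, u0] + du * img[v1, u1]
    output = np.zeros((flat.shape[0], channels), dtype=np.float64)
    output[idx] = (1.0 - dv) * top + dv * bottom
    output = np.round(output).astype(image.dtype)
    return np.reshape(output, output_shape)


# Tiny 2x2 single-channel demo image (made-up values)
tiny = np.array([[[0], [100]],
                 [[200], [40]]], dtype=np.uint8)
pts = np.array([[[0.5, 0.5], [1.0, 0.0], [-1.0, 0.0]]])
demo = bilinear_sampler(tiny, pts)
print(demo[0, :, 0])  # values 85, 100 and 0
```

In the program above it could stand in for the nearest-neighbour call, for example `img1 = bilinear_sampler(image, coords1)`, at the cost of more arithmetic per pixel.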

Results

Running the program above gives the following results:

projection1:
[Figure: result image projection1]

projection2:
[Figure: result image projection2]

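Limitations of Perspective Transformation

As projection2 in the example above shows, everything other than the A4 sheet itself is deformed in a rather odd way, and this deformation does not match what a real viewpoint would produce. Below is a photo I took from a fairly frontal angle for comparison. (Compare only the geometric deformation and ignore the lighting differences.)

This is exactly the limitation of perspective transformation. A spatial coordinate transformation made of a single matrix transformation plus a projection is physically accurate only for a single plane, and other planes undergo unrealistic deformation. Keep this property in mind in practice: perspective transformation cannot be applied freely in every situation, or the results may be hard to predict.

From this point of view, the multi-frame alignment we do with perspective transformation when processing video mostly carries deformation error, strictly speaking (unless the shot contains only a single plane, such as a wall). We usually detect feature points widely across the image, and these points often lie on different planes, so the deformation error can be regarded as an averaged effect. When the change between frames is small, it can be called an error. When the change between frames is large, it is no longer an error but a mistake, and perspective transformation will most likely fail to align the frames well.
[Figure: the scene photographed from a fairly frontal angle]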
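
The single-plane limitation can be reproduced with synthetic data. The sketch below is an illustration under stated assumptions, not the author's experiment: it uses two ideal pinhole cameras with identity intrinsics, made-up poses, and hypothetical helper names (`fit_homography`, `apply_homography`, `pinhole`). The matrix is fitted from four points on the plane $Z = 0$, then applied to one extra point on that plane and one point off it.

```python
import numpy as np


def fit_homography(uv, xy):
    """Fit T so that [x y 1] @ T projects to (u, v), as derived above."""
    x, y = xy[:, 0], xy[:, 1]
    u, v = uv[:, 0], uv[:, 1]
    n = len(x)
    A = np.zeros((2 * n, 8))
    A[0::2, 0:3] = np.column_stack([x, y, np.ones(n)])
    A[0::2, 6:8] = np.column_stack([-u * x, -u * y])
    A[1::2, 3:6] = np.column_stack([x, y, np.ones(n)])
    A[1::2, 6:8] = np.column_stack([-v * x, -v * y])
    B = uv.reshape(-1)  # (u1, v1, u2, v2, ...)
    t = np.linalg.lstsq(A, B, rcond=None)[0]
    return np.append(t, 1.0).reshape(3, 3).T


def apply_homography(T, xy):
    uvw = np.column_stack([xy, np.ones(len(xy))]) @ T
    return uvw[:, 0:2] / uvw[:, 2:3]


def pinhole(points, R, t):
    """Ideal pinhole camera with identity intrinsics (an assumption)."""
    cam = points @ R.T + t
    return cam[:, 0:2] / cam[:, 2:3]


# Camera 1 looks straight at the plane Z = 0; camera 2 is rotated by
# 20 degrees about the y axis and shifted sideways (made-up poses).
angle = np.deg2rad(20.0)
R1, t1 = np.eye(3), np.array([0.0, 0.0, 5.0])
R2 = np.array([[np.cos(angle), 0.0, np.sin(angle)],
               [0.0, 1.0, 0.0],
               [-np.sin(angle), 0.0, np.cos(angle)]])
t2 = np.array([0.5, 0.0, 5.0])

corners = np.array([[-1.0, -1.0, 0.0], [1.0, -1.0, 0.0],
                    [1.0, 1.0, 0.0], [-1.0, 1.0, 0.0]])
on_plane = np.array([[0.5, -0.3, 0.0]])
off_plane = np.array([[0.5, -0.3, 1.0]])

# Backward mapping: view 2 plays the role of (x, y), view 1 of (u, v)
T = fit_homography(pinhole(corners, R1, t1), pinhole(corners, R2, t2))

err_on = np.linalg.norm(apply_homography(T, pinhole(on_plane, R2, t2))
                        - pinhole(on_plane, R1, t1))
err_off = np.linalg.norm(apply_homography(T, pinhole(off_plane, R2, t2))
                         - pinhole(off_plane, R1, t1))
print(err_on, err_off)  # near zero on the plane, clearly non-zero off it
```

The on-plane point is mapped correctly even though it was not used for fitting, while the off-plane point lands in the wrong place. This is the same effect as the deformed surroundings in projection2.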