图像插值：最邻近（nearest）与双线性（bilinear）

最新推荐文章于 2024-04-20 22:11:14 发布

拜阳

最新推荐文章于 2024-04-20 22:11:14 发布

阅读量2.7w

点赞数 38

分类专栏：公式推导数字图像处理文章标签：算法

本文链接：https://blog.csdn.net/bby1987/article/details/105851870

版权

数字图像处理同时被 2 个专栏收录

18 篇文章 9 订阅

订阅专栏

公式推导

10 篇文章 3 订阅

订阅专栏

文章目录

为什么需要图像插值？
输入说明以及图像和坐标预处理
最邻近插值（nearest）
双线性插值（bilinear）
- 概念法
- 权重法
测试

为什么需要图像插值？

本质原因就是计算机只能处理和保存离散的数据，而图像的很多处理得到的结果是连续的，为了得到离散点上的值，就必需得插值。这里面的典型代表就是图像坐标的空间变换，如仿射变换
，坐标经仿射变换后通常是小数，而图像只保存整数坐标位置上的像素值，所以就需要插值。

不过话说回来，不只是图像里面需要插值，图形学也要，自然科学里面也到处都需要插值，基本都是因为离散这个原因。

图像里面的插值算法非常多，常用的有最邻近（Nearest Neighbour），双线性（Bilinear），双三次（Bicubic）等，不太常用的有spline，sinc，lanczos以及一些更加复杂的自适应插值方法。

接下来详细介绍最邻近和双线性两种插值方法，将配合下图进行说明。在下图中：

四个角上的点 $P_{x,y}, P_{x+1,y},P_{x,y+1},P_{x+1,y+1}$ 是整数坐标位置上的已知点；
红色的点 $P_{x_p, y_p}$ 是待插值的点；
红色虚线连着的两个点 $P_{x_p, y}, P_{x_p, y+1}$ 是两个辅助点。

输入说明以及图像和坐标预处理

最邻近和双线性两种插值方法都比较简单，仅需要源图像（程序中变量名是image）和待插值的坐标（变量名是coords）就可以。
名词解释：维度与shape。比如一个高为height，宽为width的矩阵，我们会说它是一个2维矩阵（即维度是2），它的shape是[height, width]。同理，对于一个彩图，它的shape是[height, width, channels]，它的维度是3。

输入说明：image的维度可以是2（灰度图）或者3（彩图），不能是其他维度。coords是我们将要在image中进行采样的位置，它的最后一个维度的长度必需是2，表示二维坐标（x, y），其他的维度可以任意。

输入预处理：为了简化后续的程序处理，使得各种输入的情况可以共用一套代码，需要对输入进行一些规则化处理，主要包括：

image的维度统一设置成3，也就是说当image维度是2时，将在其最后拼一个无意义的维度作为通道，维度的长度设为1，这不会对图像数据造成任何影响，但是我们可以使用统一的代码来获取channel的值，方便后续处理。
对coords做reshape，除了其最后一个维度外，其他维度全部合并为一个。这一点非常关键，这样可以方便我们做向量化的编程，并且无论输入的coords有多少维度，我们都可以用一套代码搞定。
将输出的shape缓存下来（变量名是output_shape），这一点的存在是为了配合coords的reshape。对于程序使用者而言，他肯定希望他的输出的shape与输入的shape是一样的，如果我们随意改变了输出的shape，就会给使用者带来额外的负担甚至是困扰。在这里，主要就是保证输出的图像与插值坐标的shape相同，但最后一个维度除外，因为coords的最后一个维度表示（x, y）坐标，而图像的最后一个维度表示（r, g, b）之类的通道值。但是由于我们之前讲了要对coords做reshape，所以就需要将coords原来的shape先缓存下来，否则在对coords做完reshape之后，就无法再获取到输出的shape了。

下面的预处理代码基本就是按照上述说明所写，输入是image和coords，输出是处理过后的image，coords和缓存的output_shape，这三个变量将会供真正的插值函数进行使用。

import numpy as np


def __preprocess_sampler_input(image, coords):
    """
    Nearest Neighbour sampler

    Parameters
    ----------
    image: ndarray
        source image, whose shape is [height, width, channel] or
        [height, width]
    coords: ndarray
        coordinates to be interpolated, the length of last axis should be 2,
        meaning 2D coordinate

    Returns
    -------
    image: ndarray
        source image. if its original shape is [height, width], it will be
        expanded with a new axis to have a shape of [height, width, 1]; if its
        original shape is [height, width, channel], it will not be changed
    coords: ndarray
        reshape from the original [n1, n2, ..., 2] to [n, 2],
        where n = n1 * n2 * ...
    output_shape: list
        the output shape of sampler function, same as coords expcept the last axis.
    """
    assert image.ndim == 2 or image.ndim == 3
    # cache output_shape
    output_shape = list(coords.shape)
    if image.ndim == 2:
        output_shape.pop()
        image = np.expand_dims(image, axis=-1)
    else:
        output_shape[-1] = image.shape[-1]
    coords = np.reshape(coords, (-1, coords.shape[-1]))
    return image, coords, output_shape

最邻近插值（nearest）

最近邻插值是将距离 $P_{x_p, y_p}$ 最近的已知点的像素值直接赋给 $P_{x_p, y_p}$ 。比如在上图中，与 $P_{x_p, y_p}$ 最近的点是 $P_{x,y}$ ，所以直接令 $P_{x_p, y_p} = P_{x,y}$ 即可。那么怎么从 $x_p, y_p)$ 得到 $(x, y)$ 呢，四舍五入就OK了，如果是用numpy的话可以用np.round函数，如果是C/C++的话，可以用int(x ± 0.499999f)，x>0时用加号，x<0时用减号。

这种插值方法计算量非常小，且非常简单，不会产生出任何新的像素值，所有插值出来的像素值必然来自于原图的某个像素。但是视觉效果往往不是太好，特别是图像内的一些线条和边缘（edge）会出现明显的锯齿效应。

下面用两种方式实现了最邻近插值，第一种nearest是向量化的方式，第二种nearest_naive是比较容易理解的简单方式，两种的差别主要在于是使用了向量化（Vectorization）的编程方式还是for循环的编程方式，使用向量化编程方式可以明显提升程序运行速度。我使用一张1920 x 1080的图片进行仿射变换，nearest_naive的运行时间是nearest的12倍（我的CPU是G4560，略烂，换个CPU可能比例关系就不太一样了）。

def nearest(image, coords):
    """
    Nearest Neighbour sampler
    
    Parameters
    ----------
    image: ndarray
        source image, whose shape is [height, width, channel] or 
        [height, width]
    coords: ndarray
        coordinates to be interpolated, the length of last axis should be 2,
        meaning 2D coordinate
    
    Returns
    -------
    output: ndarray
        the interpolated image, same shape as coords except the last axis
    """
    image, coords, output_shape = __preprocess_sampler_input(image, coords)
    height, width, channel = image.shape
    coords = np.round(coords).astype(np.int32)
    idx = (coords[:, 0] >= 0) & (coords[:, 0] < width) & \
          (coords[:, 1] >= 0) & (coords[:, 1] < height)
    output = np.zeros((coords.shape[0], channel), dtype=np.uint8)
    output[idx] = image[coords[idx, 1], coords[idx, 0]]
    # reshape back to the output_shape
    output = np.reshape(output, output_shape)
    return output


def nearest_naive(image, coords):
    image, coords, output_shape = __preprocess_sampler_input(image, coords)
    height, width, channel = image.shape
    coords = np.round(coords).astype(np.int32)
    output = np.zeros((coords.shape[0], channel), dtype=np.uint8)
    for i, coord in enumerate(coords):
        if 0 <= coord[0] < width and 0 <= coord[1] < height:
            output[i] = image[coord[1], coord[0]]
    # reshape back to the output_shape
    output = np.reshape(output, output_shape)
    return output

双线性插值（bilinear）

双线性插值从两个方向进行线性插值，注意不是做两次线性插值，事实上需要做三次，在第一个方向做两次线性插值，然后在第二个方向做一次线性插值得到最终解。

双线性插值需要做一些公式推导，仍然参考上面的图片。这里选取横向为第一个方向，纵向为第二个方向进行推导（当然也可以反过来，以纵向为第一方向横向为第二方向，结果是一样的）。

推导的结果可以有两种表达形式，一种是非常淳朴的概念法，另一种是在概念法的基础上进行公式合并得到的权重法。下面分别进行推导。

概念法

现在首先看图中正方形最上面的一条边，线性插值要求 $P_{x_p, y}$ 处在 $P_{x,y}P_{x+1,y}$ 线段上，因此三条线段 $P_{x,y}P_{x_p, y}$ ， $P_{x_p, y}P_{x+1,y}$ 与 $P_{x,y}P_{x+1,y}$ 的斜率应当相同，任取其中两条可以得到等式关系，下面我们取 $P_{x,y}P_{x_p, y}$ 和 $P_{x,y}P_{x+1,y}$ ，得：
$\frac {P_{x_p,y} - P_{x,y}} {x_p - x} = \frac {P_{x+1,y} - P_{x,y}} {(x+1) - x}$
化简可得：
$P_{x_p,y} = P_{x,y} + (P_{x+1,y} - P_{x,y}) (x_p - x) \tag{1}$
同理，通过正方形下面的边可以得到：
$P_{x_p,y+1} = P_{x,y+1} + (P_{x+1,y+1} - P_{x,y+1}) (x_p - x) \tag{2}$
现在来看纵向，与之前一样道理， $P_{x_p,y}P_{x_p,y_p}$ 与 $P_{x_p,y}P_{x_p,y+1}$ 斜率应当相同，所以继续按照之前的推导可得：
$P_{x_p,y_p} = P_{x_p,y} + (P_{x_p,y+1} - P_{x_p,y}) (y_p - y) \tag{3}$
到这里我们就可以求出 $P_{x_p,y_p}$ 的值，推导也就到此结束，公式（1），（2），（3）合并起来就是双线性插值的整个流程。这就是非常原始，非常淳朴的概念法，完全跟着双线性的概念来推导。

代码如下：
与最邻近插值的代码同理，bilinear是向量化实现，bilinear_naive是for循环实现，在我的电脑上后者耗时是前者的23倍左右。

def bilinear(image, coords):
    """
    Bilinear sampler

    Parameters
    ----------
    image: ndarray
        source image, whose shape is [height, width, channel] or
        [height, width]
    coords: ndarray
        coordinates to be interpolated, the length of last axis should be 2,
        meaning 2D coordinate

    Returns
    -------
    output: ndarray
        the interpolated image, same shape as coords except the last axis
    """
    image, coords, output_shape = __preprocess_sampler_input(image, coords)
    height, width, channel = image.shape
    # convert image dtype to float32, very important
    image = np.float32(image)
    # coordinates of four corners
    tl = np.floor(coords).astype(np.int32)
    tr = tl + np.array([[1, 0]])
    bl = tl + np.array([[0, 1]])
    br = tl + np.array([[1, 1]])
    idx = (tl[:, 0] >= 0) & (tl[:, 1] >= 0) & \
          (br[:, 0] < width) & (br[:, 1] < height)
    x_offset = np.reshape(coords[:, 0] - tl[:, 0], (tl.shape[0], 1))
    y_offset = np.reshape(coords[:, 1] - tl[:, 1], (tl.shape[0], 1))
    # initialize variables for interpolation
    pt_y0 = np.zeros((coords.shape[0], channel), dtype=np.float32)
    pt_y1 = np.zeros((coords.shape[0], channel), dtype=np.float32)
    output = np.zeros((coords.shape[0], channel), dtype=np.float32)
    # linear interpolation of first direction
    pt_y0[idx] = image[tl[idx, 1], tl[idx, 0]] + \
                 (image[tr[idx, 1], tr[idx, 0]] -
                  image[tl[idx, 1], tl[idx, 0]]) * x_offset[idx]
    pt_y1[idx] = image[bl[idx, 1], bl[idx, 0]] + \
                 (image[br[idx, 1], br[idx, 0]] -
                  image[bl[idx, 1], bl[idx, 0]]) * x_offset[idx]
    # linear interpolation of second direction
    output[idx] = pt_y0[idx] + (pt_y1[idx] - pt_y0[idx]) * y_offset[idx]
    # reshape back to the output_shape, clip value, and change dtype to uint8
    output = np.reshape(output, output_shape)
    output = np.round(np.clip(output, 0, 255)).astype(np.uint8)
    return output


def bilinear_naive(image, coords):
    image, coords, output_shape = __preprocess_sampler_input(image, coords)
    height, width, channel = image.shape
    # convert image dtype to float32, very important
    image = np.float32(image)
    output = np.zeros((coords.shape[0], channel), dtype=np.float32)
    for i, coord in enumerate(coords):
        tl = np.floor(coord).astype(np.int32)
        tr = tl + np.array([1, 0])
        bl = tl + np.array([0, 1])
        br = tl + np.array([1, 1])
        offset = coord - tl
        if tl[0] >= 0 and tl[1] >= 0 and br[0] < width and br[1] < height:
            pt_y0 = image[tl[1], tl[0]] + (
                    image[tr[1], tr[0]] - image[tl[1], tl[0]]) * offset[0]
            pt_y1 = image[bl[1], bl[0]] + (
                    image[br[1], br[0]] - image[bl[1], bl[0]]) * offset[0]
            output[i] = pt_y0 + (pt_y1 - pt_y0) * offset[1]
    # reshape back to the output_shape, clip value, and change dtype to uint8
    output = np.reshape(output, output_shape)
    output = np.round(np.clip(output, 0, 255)).astype(np.uint8)
    return output

权重法

权重法就是在上述概念法推导的基础上，将公式（1）和（2）代入公式（3），消去公式（3）中的 $P_{x_p,y}$ 和 $P_{x_p,y+1}$ ，然后进行化简得到的。因为最终的表达形式是四个标量数字与四个顶点的像素值求点积（dot product，相乘再求和），所以四个标量数字其实可以看作是一种权重，因此称之为权重法。

下面按照上面的说明进行推导：
将公式（1）和（2）代入公式（3）可得：
$P_{x_p,y_p} = P_{x,y} + (P_{x+1,y} - P_{x,y}) (x_p - x) + \\[2ex] ([P_{x,y+1} + (P_{x+1,y+1} - P_{x,y+1}) (x_p - x)] - [P_{x,y} + (P_{x+1,y} - P_{x,y}) (x_p - x)]) (y_p - y)$
这个公式看起来有点长，需要仔细点化简，化简方法是分别以四个顶点为基准进行合并同类项，第一步化简后可得：
$P_{x_p,y_p} = P_{x,y} + P_{x+1,y} (x_p - x) - P_{x,y} (x_p - x) + \\[2ex] P_{x,y+1}(y_p - y) + P_{x+1,y+1} (x_p - x) (y_p - y) - P_{x,y+1} (x_p - x) (y_p - y) - \\[2ex] P_{x,y}(y_p - y) - P_{x+1,y} (x_p - x) (y_p - y) + P_{x,y} (x_p - x) (y_p - y)$
以 $P_{x,y}$ 为基准合并同类项可得：
$P_{x,y} [1 - (x_p - x) - (y_p - y) + (x_p - x) (y_p - y)] = \\[2ex] (x_p-x-1) (y_p-y-1) P_{x,y}$
以 $P_{x+1,y}$ 为基准合并同类项可得：
$P_{x+1,y} [(x_p - x) - (x_p - x) (y_p - y)] = \\[2ex] -(x_p - x) (y_p - y - 1) P_{x+1,y}$
以 $P_{x,y+1}$ 为基准合并同类项可得：
$P_{x,y+1} [(y_p - y) - (x_p - x) (y_p - y)] = \\[2ex] -(x_p - x - 1) (y_p - y) P_{x,y+1}$
还剩下一个 $P_{x+1,y+1}$ 项，只有一项，也不用合并了，直接可以写下来：
$x_p - x) (y_p - y) P_{x+1,y+1}$
现在四项合并完毕，写一起得到：
$\begin{aligned} P_{x_p,y_p} = & (x_p-x-1) (y_p-y-1) P_{x,y} \\ & -(x_p - x) (y_p - y - 1) P_{x+1,y} \\ & -(x_p - x - 1) (y_p - y) P_{x,y+1} \\ & +(x_p - x) (y_p - y) P_{x+1,y+1} \end{aligned} \tag{4}$
记：
$\begin{aligned} x_{offset} &= x_p - x \\[2ex] y_{offset} &= y_p - y \end{aligned} \tag{5}$
进一步记：
$\begin{aligned} W_{x,y} &= (x_{offset} - 1) (y_{offset} - 1) \\[2ex] W_{x+1,y} &= -x_{offset} (y_{offset} - 1) \\[2ex] W_{x,y+1} &= -(x_{offset} - 1) y_{offset} \\[2ex] W_{x+1,y+1} &= x_{offset} * y_{offset} \end{aligned} \tag{6}$
那么有：
$\begin{aligned} P_{x_p,y_p} = W_{x,y} P_{x,y} + W_{x+1,y} P_{x+1,y} + W_{x,y+1} P_{x,y+1} + W_{x+1,y+1} P_{x+1,y+1} \end{aligned} \tag{7}$
至此推导完毕。在实际写程序时可以使用公式（4），也可以使用（5），（6），（7），推荐使用（5），（6），（7），因为条理比较清楚一些，不容易出错，也更容易使用向量化编程实现。公式（6）的四个权重之和是1，各位可以算一下看看。

现在来比较一下概念法与权重法的计算量：

概念法需要9次加减法，3次乘法
权重法使用（5），（6），（7）流程，需要6次加减法，2次取负号，8次乘法

表面上看起来概念法的计算量更少一些，流程也更清楚简单，那为何还要搞权重法？因为权重法的应用面更广一些，如果四个顶点中的某一个或几个不能参与运算（不能四个全部失效），权重法仍然可以运用，运用方法是将失效顶点的权重置为0，然后运用公式（8）。此时概念法运用起来就不是那么顺滑了。当插值碰到图像边界（border）时就会碰到这种情况，有一些比较复杂的图像算法也会碰到这种情况，图形学里面的插值也有可能碰到。
$\begin{aligned} P_{x_p,y_p} =\frac {W_{x,y} P_{x,y} + W_{x+1,y} P_{x+1,y} + W_{x,y+1} P_{x,y+1} + W_{x+1,y+1} P_{x+1,y+1}} {W_{x,y} + W_{x+1,y} + W_{x,y+1} + W_{x+1,y+1}} \end{aligned} \tag{8}$

代码如下：
同理，bilinear_weight_mode是向量化实现，bilinear_weight_mode_naive是for循环实现，在我的电脑上后者耗时是前者的55倍左右。
虽然我觉得权重法比概念法的计算量大，但是在我电脑上，bilinear_weight_mode的耗时小于bilinear，前者约为后者的80%，既然我是用numpy的向量化编程方式实现的，感觉可能跟numpy的底层实现有点关系，不过这么靠近底层的东西不是我的关注范围，所以就不再深入去了解。

def bilinear_weight_mode(image, coords):
    """
    Bilinear sampler

    Parameters
    ----------
    image: ndarray
        source image, whose shape is [height, width, channel] or
        [height, width]
    coords: ndarray
        coordinates to be interpolated, the length of last axis should be 2,
        meaning 2D coordinate

    Returns
    -------
    output: ndarray
        the interpolated image, same shape as coords except the last axis
    """
    image, coords, output_shape = __preprocess_sampler_input(image, coords)
    height, width, channel = image.shape
    # convert image dtype to float32, very important
    image = np.float32(image)
    # coordinates of four corners
    tl = np.floor(coords).astype(np.int32)
    tr = tl + np.array([[1, 0]])
    bl = tl + np.array([[0, 1]])
    br = tl + np.array([[1, 1]])
    idx = (tl[:, 0] >= 0) & (tl[:, 1] >= 0) & \
          (br[:, 0] < width) & (br[:, 1] < height)
    x_offset = np.reshape(coords[:, 0] - tl[:, 0], (tl.shape[0], 1))
    y_offset = np.reshape(coords[:, 1] - tl[:, 1], (tl.shape[0], 1))
    # compute the weights of four corners
    w_tl = (x_offset - 1) * (y_offset - 1)
    w_tr = x_offset * (1 - y_offset)
    w_bl = (1 - x_offset) * y_offset
    w_br = x_offset * y_offset
    # interpolate the output by weight method
    output = np.zeros((coords.shape[0], channel), dtype=np.float32)
    output[idx] = w_tl[idx] * image[tl[idx, 1], tl[idx, 0]] + \
                  w_tr[idx] * image[tr[idx, 1], tr[idx, 0]] + \
                  w_bl[idx] * image[bl[idx, 1], bl[idx, 0]] + \
                  w_br[idx] * image[br[idx, 1], br[idx, 0]]
    # reshape back to the output_shape, clip value, and change dtype to uint8
    output = np.reshape(output, output_shape)
    output = np.round(np.clip(output, 0, 255)).astype(np.uint8)
    return output


def bilinear_weight_mode_naive(image, coords):
    image, coords, output_shape = __preprocess_sampler_input(image, coords)
    height, width, channel = image.shape
    # convert image dtype to float32, very important
    image = np.float32(image)
    output = np.zeros((coords.shape[0], channel), dtype=np.float32)
    for i, coord in enumerate(coords):
        tl = np.floor(coord).astype(np.int32)
        corners = np.array([
            tl,
            tl + np.array([1, 0]),
            tl + np.array([0, 1]),
            tl + np.array([1, 1])])
        offset = coord - tl
        weights = np.array([
            (offset[0] - 1) * (offset[1] - 1),
            offset[0] * (1 - offset[1]),
            (1 - offset[0]) * offset[1],
            offset[0] * offset[1]])
        total_weight = 0.0
        pixel = np.zeros([channel])
        for k in range(4):
            # more conditions can be added in the following part if needed
            if 0 <= corners[k, 0] < width and 0 <= corners[k, 1] < height:
                pixel += weights[k] * image[corners[k, 1], corners[k, 0]]
                total_weight += weights[k]
        if total_weight != 0.0:
            output[i] = pixel / total_weight
    # reshape back to the output_shape, clip value, and change dtype to uint8
    output = np.reshape(output, output_shape)
    output = np.round(np.clip(output, 0, 255)).astype(np.uint8)
    return output

测试

可以使用如下代码测试上面的插值函数。下面代码总共分为两部分，第一部分是坐标变换coordinate transform（利用仿射变换实现），第二部分是插值sampler。可以在第二部分尝试使用不同的插值函数，并观察效果和耗时。

import cv2
import time
import numpy as np

# coordinate transform
T = np.array([[1.1, 0.1],
              [-0.1, 1.1],
              [0, 0]])
image = cv2.imread('1.jpg', cv2.IMREAD_UNCHANGED)
height, width = image.shape[0:2]
coords = np.meshgrid(np.arange(0, width), np.arange(0, height))
# shape = [height, width, 2] after transpose
coords = np.array(coords).transpose([1, 2, 0])
ones = np.ones([height, width, 1])
# homogeneous  coordinates
coords = np.concatenate((coords, ones), axis=2)
# transformed coordinates
coords = coords @ T

# sampler
t0 = time.time()
img = bilinear(image, coords)  # try different functions here
t_cost = time.time() - t0
print(t_cost)

测试图像如下，分辨率是1920 x 1080 （宽 x 高），各位可以随便找一个图像来测试。
在这里插入图片描述
程序运行出来的图像是：

将凳子部分放大，可以明显看出最邻近和双线性插值的区别，最邻近插值后线条的锯齿感较强，而双线性插值则平滑地多。

最邻近插值结果：
在这里插入图片描述
双线性插值结果：

拜阳

关注

38
点赞
踩
109

收藏

觉得还不错? 一键收藏
1
评论
图像插值：最邻近（nearest）与双线性（bilinear）

文章目录为什么需要图像插值？最邻近插值（nearest）双线性插值（bilinear）概念法权重法为什么需要图像插值？最邻近插值（nearest）双线性插值（bilinear）概念法权重法...
复制链接

扫一扫

专栏目录