Notes on Image Upsampling

Latest recommended article published on 2024-08-02 19:30:00

星画天

Category column: AI保研之旅 · Tags: deep learning

Article link: https://blog.csdn.net/Sakura_day/article/details/115824498

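Notes on image upsampling:

Image upsampling is an important concept that shows up in image segmentation. The different kinds of segmentation are easy to mix up, so here is a short summary of them first.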

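Semantic segmentation:

Semantic segmentation is essentially classification, except that the unit being classified is a single pixel. Every pixel gets a class label, and the classes are drawn in different colors. This is probably the simplest of the segmentation tasks.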
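
To make "classification with the pixel as the unit" concrete, here is a minimal sketch. The score array and the color palette are made-up values for illustration, not the output of any real model.

```python
import numpy as np

# Hypothetical per-pixel class scores for a 2x3 image and 3 classes,
# laid out as (height, width, num_classes). The numbers are invented.
scores = np.array([
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]],
    [[0.6, 0.3, 0.1],   [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]],
])

# Classify each pixel independently: take the best-scoring class.
label_map = scores.argmax(axis=-1)      # shape (2, 3), values in {0, 1, 2}

# Illustrative palette: one BGR color per class (an arbitrary choice).
palette = np.array([[0, 0, 0], [0, 255, 0], [0, 0, 255]], dtype=np.uint8)
color_mask = palette[label_map]         # shape (2, 3, 3), ready to display

print(label_map)
```

The label map has the same height and width as the input, which is why segmentation networks need some way to get back to full resolution after downsampling.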

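Instance segmentation:

Only object instances such as cars or people are segmented, with each individual object getting its own mask. The background is not segmented.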

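Panoptic segmentation:

This is the hardest of the three. Besides separating every class, each individual instance within a class has to be marked with its own color, and every region of the image has to be covered.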

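Upsampling techniques:

Note that dilated convolution is NOT upsampling! It enlarges the receptive field of the kernel, but it does not make the output larger than the input.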
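
A small 1-D sketch of why this is so. The helper `dilated_conv1d` is written here only for illustration (no padding, stride 1). Dilation spreads the kernel taps apart, so the output gets shorter, whereas real upsampling makes it longer.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """'Valid' 1-D dilated convolution (cross-correlation), stride 1."""
    k = len(kernel)
    span = dilation * (k - 1) + 1        # effective kernel width
    out_len = len(x) - span + 1
    out = np.zeros(out_len)
    for i in range(out_len):
        # Taps are spaced `dilation` samples apart in the input.
        out[i] = np.dot(x[i : i + span : dilation], kernel)
    return out

x = np.arange(10, dtype=float)
kernel = np.array([1.0, 1.0, 1.0])

plain = dilated_conv1d(x, kernel, dilation=1)    # 10 - 3 + 1 = 8 samples
dilated = dilated_conv1d(x, kernel, dilation=2)  # 10 - 5 + 1 = 6 samples
upsampled = np.repeat(x, 2)                      # 20 samples

print(len(plain), len(dilated), len(upsampled))  # 8 6 20
```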

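Unpooling:

I know of two kinds so far. The first is zero unpooling, which simply enlarges the feature map and fills the new positions with zeros.
The second is max unpooling, which feels a bit like filling in a Sudoku grid. Because max pooling is not invertible, the usual approach is to record the position of each maximum during pooling, write the pooled values back at those positions, and fill every other position with zeros.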
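
The two variants can be sketched in NumPy as follows. All function names are made up for this example, the window is fixed at 2x2 with stride 2, and placing the value in the top-left corner for zero unpooling is an assumption, since the text above does not say where it goes.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling, stride 2, on a 2-D array with even sides.
    Returns the pooled values and the flat index in x of each maximum."""
    h, w = x.shape
    # Gather each 2x2 window into the last axis: shape (h/2, w/2, 4).
    windows = (x.reshape(h // 2, 2, w // 2, 2)
                .transpose(0, 2, 1, 3)
                .reshape(h // 2, w // 2, 4))
    arg = windows.argmax(axis=-1)        # position inside the window, 0..3
    pooled = windows.max(axis=-1)
    rows = 2 * np.arange(h // 2)[:, None] + arg // 2
    cols = 2 * np.arange(w // 2)[None, :] + arg % 2
    return pooled, rows * w + cols

def max_unpool_2x2(pooled, indices, out_shape):
    """Put each pooled value back where its maximum came from; zeros elsewhere."""
    out = np.zeros(out_shape, dtype=pooled.dtype)
    out.flat[indices.ravel()] = pooled.ravel()
    return out

def zero_unpool_2x2(pooled):
    """Assumed variant: value in the top-left of each 2x2 block, zeros elsewhere."""
    h, w = pooled.shape
    out = np.zeros((2 * h, 2 * w), dtype=pooled.dtype)
    out[::2, ::2] = pooled
    return out

x = np.array([[1, 2, 5, 3],
              [4, 0, 1, 2],
              [7, 1, 0, 0],
              [2, 3, 0, 9]])
pooled, idx = max_pool_2x2(x)
restored = max_unpool_2x2(pooled, idx, x.shape)
print(pooled)     # [[4 5] [7 9]]
print(restored)   # maxima back in place, zeros everywhere else
```

Only the maxima survive the round trip; the other values in each window are lost, which is the sense in which pooling is not invertible.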

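Nearest-neighbor interpolation:

Each pixel to be filled in takes the value of the source pixel closest to it. The method is exactly what the name says.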

import cv2
import numpy as np

img = cv2.imread('cat.png')
def nearest_interpolation(src, dst_size):
    # src.shape is (height, width, channels), while dst_size is (width, height),
    # the same order cv2.resize uses for dsize. Non-square sizes work too.
    src_h, src_w, channel = src.shape
    dst_w, dst_h = dst_size
    # Scale factors from source to destination; the mapping below divides
    # by them to go from a destination coordinate back to the source.
    x_scale = dst_w / src_w
    y_scale = dst_h / src_h
    dst = np.zeros((dst_h, dst_w, channel), dtype=src.dtype)
    # Visit every destination pixel and look up its nearest source pixel.
    for c in range(channel):
        for dst_x in range(dst_w):
            for dst_y in range(dst_h):
                # Position of the current destination pixel in the source image.
                # Pixel centers are aligned: src_x + 0.5 = (dst_x + 0.5) / x_scale
                src_x = int(round((dst_x + 0.5) / x_scale - 0.5))
                src_y = int(round((dst_y + 0.5) / y_scale - 0.5))
                # Rounding can land outside the image, so clamp to valid indices.
                src_x = min(max(src_x, 0), src_w - 1)
                src_y = min(max(src_y, 0), src_h - 1)
                dst[dst_y, dst_x, c] = src[src_y, src_x, c]
    return dst

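# The version below uses array operations instead of Python loops, which is much faster.
def nearest_interpolation_v2(src, dst_size):
    src_h, src_w, channel = src.shape
    # dst_size is (width, height); the image does not have to be square.
    dst_w, dst_h = dst_size
    x_scale = dst_w / src_w
    y_scale = dst_h / src_h
    # Coordinate grids of shape (dst_h, dst_w): dst_y holds the row index of
    # every destination pixel and dst_x holds the column index.
    dst_y, dst_x = np.mgrid[:dst_h, :dst_w]
    # Pixel-center coordinate mapping, rounded to the nearest source pixel.
    # Bilinear interpolation uses the same mapping without the rounding.
    src_x = np.around((dst_x + 0.5) / x_scale - 0.5).astype(np.int64)
    src_y = np.around((dst_y + 0.5) / y_scale - 0.5).astype(np.int64)
    # Clamp to the valid index range. At this point src_x and src_y hold the
    # source index for every destination pixel, which is handy for debugging.
    src_x = np.clip(src_x, 0, src_w - 1)
    src_y = np.clip(src_y, 0, src_h - 1)
    # Fancy indexing gathers the source pixels into a new (dst_h, dst_w, channel)
    # array, so no explicit .copy() is needed.
    dst = src[src_y, src_x]
    return dst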


img_n = nearest_interpolation(img, (512, 512))
cv2.imshow('show', img_n)
cv2.waitKey(0)
cv2.destroyAllWindows()
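
As a quick sanity check of the pixel-center mapping used above, the sketch below applies it along a single axis. `nearest_indices` is a helper written just for this example. For an integer scale factor the mapping reduces to repeating each source pixel.

```python
import numpy as np

def nearest_indices(dst_len, src_len):
    """Source index for every destination index along one axis, using
    src + 0.5 = (dst + 0.5) / scale followed by rounding and clamping."""
    scale = dst_len / src_len
    idx = np.around((np.arange(dst_len) + 0.5) / scale - 0.5).astype(np.int64)
    return np.clip(idx, 0, src_len - 1)

row = np.array([10, 20, 30, 40])
idx = nearest_indices(8, 4)
print(idx)        # [0 0 1 1 2 2 3 3]
print(row[idx])   # [10 10 20 20 30 30 40 40]
```

With a scale factor of 2, the result matches `np.repeat(row, 2)`.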

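Bilinear interpolation:

The principle is as follows:

Interpolate along the x direction first, then along the y direction. Doing it the other way round gives the same result. With $Q_{11}=\left(x_{1}, y_{1}\right)$, $Q_{21}=\left(x_{2}, y_{1}\right)$, $Q_{12}=\left(x_{1}, y_{2}\right)$ and $Q_{22}=\left(x_{2}, y_{2}\right)$ as the four surrounding pixels, the formulas are:
$f\left(R_{1}\right) \approx \frac{x_{2}-x}{x_{2}-x_{1}} f\left(Q_{11}\right)+\frac{x-x_{1}}{x_{2}-x_{1}} f\left(Q_{21}\right)$ where $R_{1}=\left(x, y_{1}\right)$
$f\left(R_{2}\right) \approx \frac{x_{2}-x}{x_{2}-x_{1}} f\left(Q_{12}\right)+\frac{x-x_{1}}{x_{2}-x_{1}} f\left(Q_{22}\right)$ where $R_{2}=\left(x, y_{2}\right)$
$f(x, y) \approx \frac{y_{2}-y}{y_{2}-y_{1}} f\left(R_{1}\right)+\frac{y-y_{1}}{y_{2}-y_{1}} f\left(R_{2}\right)$
I won't merge these into a single formula because it gets too long. Since the four points are adjacent pixels, $x_{2}-x_{1}$ and $y_{2}-y_{1}$ can both be taken as 1.
When mapping destination coordinates back to the source image, the result usually does not land on an integer pixel position, because the sizes rarely divide evenly. The coordinate mapping therefore has to be adjusted so that pixel centers line up, as the following code does: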

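import cv2
import numpy as np

img = cv2.imread('cat.png')

def bilinear_interpolation_v2(src, dst_size):
    # src.shape is (height, width, channels).
    src_h, src_w, channel = src.shape
    # Note: dst_size is (height, width) here, unlike the nearest-neighbor
    # functions, which take (width, height).
    dst_h, dst_w = dst_size
    x_scale = dst_w / src_w
    y_scale = dst_h / src_h
    # Coordinate grids of shape (dst_h, dst_w) for every destination pixel.
    dst_y, dst_x = np.mgrid[:dst_h, :dst_w]
    # The key coordinate mapping: the same pixel-center formula as in
    # nearest-neighbor interpolation, but kept as a real-valued position.
    # Clipping to the image makes border pixels replicate the edge instead
    # of producing negative or inconsistent weights.
    src_x = np.clip((dst_x + 0.5) / x_scale - 0.5, 0, src_w - 1)
    src_y = np.clip((dst_y + 0.5) / y_scale - 0.5, 0, src_h - 1)

    # Integer neighbors on each side of the real-valued position.
    src_x1 = np.floor(src_x).astype(np.int64)
    src_y1 = np.floor(src_y).astype(np.int64)
    src_x2 = np.minimum(src_x1 + 1, src_w - 1)
    src_y2 = np.minimum(src_y1 + 1, src_h - 1)

    # Fractional offsets. Since x2 - x1 = y2 - y1 = 1, the two weights along
    # each axis are t and 1 - t; the extra axis broadcasts over channels.
    x_to_x1 = np.expand_dims(src_x - src_x1, axis=-1)
    x2_to_x = 1.0 - x_to_x1
    y_to_y1 = np.expand_dims(src_y - src_y1, axis=-1)
    y2_to_y = 1.0 - y_to_y1

    # Interpolate along x on rows y1 and y2, then along y between them.
    y1_value = x_to_x1 * src[src_y1, src_x2] + x2_to_x * src[src_y1, src_x1]
    y2_value = x_to_x1 * src[src_y2, src_x2] + x2_to_x * src[src_y2, src_x1]
    dst = y_to_y1 * y2_value + y2_to_y * y1_value
    # The result is floating point; round before casting back to integer types.
    if np.issubdtype(src.dtype, np.integer):
        dst = np.rint(dst)
    return dst.astype(src.dtype)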
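
To see the three formulas at work on actual numbers, here is a minimal single-point sketch. `bilinear_at` is a helper written for this example, and it assumes the query point lies inside the image.

```python
import numpy as np

def bilinear_at(img, x, y):
    """Bilinear sample of a 2-D array at real-valued (x, y); x runs along
    columns, y along rows. Assumes 0 <= x <= w - 1 and 0 <= y <= h - 1."""
    h, w = img.shape
    x1, y1 = int(np.floor(x)), int(np.floor(y))
    x2, y2 = min(x1 + 1, w - 1), min(y1 + 1, h - 1)
    tx, ty = x - x1, y - y1      # x2 - x1 = y2 - y1 = 1, so no denominators
    # Interpolate along x on rows y1 and y2 (R1 and R2 in the formulas).
    r1 = (1 - tx) * img[y1, x1] + tx * img[y1, x2]
    r2 = (1 - tx) * img[y2, x1] + tx * img[y2, x2]
    # Then interpolate along y between R1 and R2.
    return (1 - ty) * r1 + ty * r2

patch = np.array([[10.0, 20.0],
                  [30.0, 40.0]])
value = bilinear_at(patch, 0.25, 0.5)
print(value)   # 22.5
```

Here R1 = 0.75 * 10 + 0.25 * 20 = 12.5 and R2 = 0.75 * 30 + 0.25 * 40 = 32.5, and the point halfway between them in y gives 22.5.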

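Bicubic interpolation:

This interpolation method is harder. It requires constructing the BiCubic kernel function, which I don't fully understand:
This is the formula: $W(x)=\left\{\begin{array}{ll}(a+2)|x|^{3}-(a+3)|x|^{2}+1 & \text { for }|x| \leq 1 \\ a|x|^{3}-5 a|x|^{2}+8 a|x|-4 a & \text { for } 1<|x|<2 \\ 0 & \text { otherwise }\end{array}\right.$ (a is usually set to -0.5)
The shape of the function looks like a variant of the sinc function.
$f(x, y)=\sum_{i=0}^{3} \sum_{j=0}^{3} f\left(x_{i}, y_{j}\right) W\left(x-x_{i}\right) W\left(y-y_{j}\right)$ This is the overall formula: the value is interpolated from the nearest 4×4 neighborhood (16 points). I didn't write code for it, since the computation is fairly involved and it didn't feel necessary.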
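
For readers who want to try it anyway, the two formulas translate fairly directly into a single-point sketch. `cubic_kernel` and `bicubic_at` are names chosen for this example, and clamping indices at the image border is an assumption, since the formulas say nothing about borders.

```python
import numpy as np

def cubic_kernel(x, a=-0.5):
    """The piecewise-cubic weight W(x) from the formula above."""
    x = np.abs(np.asarray(x, dtype=np.float64))
    near = (a + 2) * x**3 - (a + 3) * x**2 + 1           # |x| <= 1
    far = a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a    # 1 < |x| < 2
    return np.where(x <= 1, near, np.where(x < 2, far, 0.0))

def bicubic_at(img, x, y, a=-0.5):
    """Bicubic sample of a 2-D array at real-valued (x, y); x indexes
    columns and y indexes rows. Border clamping is an assumption here."""
    h, w = img.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    xs = np.arange(x0 - 1, x0 + 3)       # the 4 nearest columns
    ys = np.arange(y0 - 1, y0 + 3)       # the 4 nearest rows
    wx = cubic_kernel(x - xs, a)         # W(x - x_i)
    wy = cubic_kernel(y - ys, a)         # W(y - y_j)
    patch = img[np.clip(ys, 0, h - 1)][:, np.clip(xs, 0, w - 1)]
    # Double sum over the 4x4 neighborhood, written as a matrix product.
    return float(wy @ patch @ wx)

yy, xx = np.mgrid[:6, :6]
ramp = 2.0 * xx + 3.0 * yy               # a simple linear test image
print(bicubic_at(ramp, 3, 2))            # 12.0, the pixel value itself
```

At integer positions the kernel is 1 at the center and 0 at the other taps, so the original pixel value is returned unchanged. With a = -0.5 the four weights along each axis sum to 1, so away from the border a linear ramp is reproduced exactly.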