[CS131 Computer Vision] Understanding Convolution in Image Processing, with a Python Implementation

Chris_Yg

Category: CS131 Computer Vision. Tags: convolution, cs131, image processing

Article link: https://blog.csdn.net/ygjustgo/article/details/79405594

This article introduces the 2D convolution formula, its properties, two ways of computing it, and a Python implementation of each.

1. 2D convolution: formula and properties

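In image processing, an image is made up of discrete pixels, and convolution is typically used to express a weighted sum over the neighborhood of a pixel. The discrete form of 2D convolution is: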

$g(m,n)=f*h=\sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty}f(k,l)\,h(m-k,n-l)$
Convolution satisfies the following properties:

Commutativity: $f*h=h*f$
Associativity: $f*(g*h)=(f*g)*h$
Distributivity: $f*(g+h)=f*g+f*h$
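
The three properties can be checked numerically. The following is a minimal sketch, assuming NumPy is available; `conv2d_full` is a hypothetical helper written for this illustration (it is not part of the original article) that evaluates the full convolution by accumulating shifted, scaled copies of `f`.

```python
import numpy as np


def conv2d_full(f, h):
    """Full 2D convolution of f with h (hypothetical helper for this sketch)."""
    M1, N1 = f.shape
    M2, N2 = h.shape
    g = np.zeros((M1 + M2 - 1, N1 + N2 - 1))
    for k in range(M2):
        for l in range(N2):
            # h[k, l] * f[a, b] contributes to g[a + k, b + l]
            g[k:k + M1, l:l + N1] += h[k, l] * f
    return g


rng = np.random.default_rng(0)
f = rng.random((4, 5))
g = rng.random((3, 3))
h = rng.random((3, 3))

# Each flag compares both sides of one identity up to floating-point error.
commutative = np.allclose(conv2d_full(f, h), conv2d_full(h, f))
associative = np.allclose(conv2d_full(f, conv2d_full(g, h)),
                          conv2d_full(conv2d_full(f, g), h))
distributive = np.allclose(conv2d_full(f, g + h),
                           conv2d_full(f, g) + conv2d_full(f, h))
print(commutative, associative, distributive)
```

Note that the identities are checked on the full (uncropped) result. Once the output is cropped back to the image size, the two sides of an identity may no longer have the same shape.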

2. Computing 2D convolution, with Python implementations

(1) Evaluate the definition directly, which takes four nested loops:

Let $f$ have size $(M_1,N_1)$ and $h$ have size $(M_2,N_2)$. The convolution can then be written as:

$g(m,n)=f*h=h*f=\sum_{k=0}^{M_2-1}\sum_{l=0}^{N_2-1}h(k,l)\,f(m-k,n-l)$

where $0\le m<M_1+M_2-1$ and $0\le n<N_1+N_2-1$, and $f$ is treated as zero wherever $(m-k,n-l)$ falls outside its $M_1\times N_1$ extent.
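Evaluating the formula above produces the "full" region shown in the figure below. In image processing we usually want the "same" region instead, so that the output has the same size as the input image.
Figure: convolution example (full and same output regions)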
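
To make the relation between the two regions concrete, here is a small sketch, assuming NumPy. The helpers `conv2d_full` and `crop_same` and the 3x3 box-blur kernel are illustrative choices, not taken from the original article. The crop offset `(Hk - 1) // 2` centers the image-sized window inside the full result.

```python
import numpy as np


def conv2d_full(f, h):
    """Full 2D convolution of f with h (hypothetical helper for this sketch)."""
    M1, N1 = f.shape
    M2, N2 = h.shape
    g = np.zeros((M1 + M2 - 1, N1 + N2 - 1))
    for k in range(M2):
        for l in range(N2):
            g[k:k + M1, l:l + N1] += h[k, l] * f
    return g


def crop_same(full, image_shape, kernel_shape):
    """Cut the image-sized 'same' region out of a 'full' result."""
    Hi, Wi = image_shape
    Hk, Wk = kernel_shape
    top, left = (Hk - 1) // 2, (Wk - 1) // 2
    return full[top:top + Hi, left:left + Wi]


image = np.arange(1, 13, dtype=float).reshape(3, 4)
kernel = np.ones((3, 3)) / 9.0  # 3x3 box blur, chosen only for illustration

full = conv2d_full(image, kernel)
same = crop_same(full, image.shape, kernel.shape)
print(full.shape)  # (5, 6): (3 + 3 - 1, 4 + 3 - 1)
print(same.shape)  # (3, 4): same as the input image
```

With this kernel, `same[1, 1]` is the mean of the top-left 3x3 block of `image`, which is a quick way to confirm that the crop is centered.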

import numpy as np


def conv_nested(image, kernel):
    """A naive implementation of convolution filter.

    This is a naive implementation of convolution using 4 nested for-loops.
    This function computes convolution of an image with a kernel and outputs
    the result that has the same shape as the input image.

    Args:
        image: numpy array of shape (Hi, Wi)
        kernel: numpy array of shape (Hk, Wk)

    Returns:
        out: numpy array of shape (Hi, Wi)
    """
    Hi, Wi = image.shape
    Hk, Wk = kernel.shape
    out = np.zeros((Hi, Wi))

    temp_m = np.zeros((Hi+Hk-1, Wi+Wk-1))     # holds the 'full' result
    for i in range(Hi+Hk-1):
        for j in range(Wi+Wk-1):
            temp = 0
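            # The kernel is usually much smaller than the image, and convolution is
            # commutative, so to save work we compute h*f instead of f*h
            # (the inner loops run over the kernel, not the image).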
            for m in range(Hk):
                for n in range(Wk):
                    if ((i-m)>=0 and (i-m)<Hi and (j-n)>=0 and (j-n)<Wi):
                        temp += image[i-m][j-n] * kernel[m][n]
            temp_m[i][j] = temp
    # Crop the 'same' region (output size equals input size)
    for i in range(Hi):
        for j in range(Wi):
            out[i][j] = temp_m[i + (Hk-1)//2][j + (Wk-1)//2]

    return out

(2) Rotate the kernel by 180°, zero-pad the original image, then slide the kernel over it and take a weighted sum at each position:

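This approach is more efficient than the first one, because the two inner loops are replaced by an element-wise multiplication and a sum over each window. The 180° rotation of the kernel can be done with two flips (one along each axis). The code is as follows: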

def zero_pad(image, pad_height, pad_width):
    """ Zero-pad an image.

    Ex: a 1x1 image [[1]] with pad_height = 1, pad_width = 2 becomes:

        [[0, 0, 0, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 0]]         of shape (3, 5)

    Args:
        image: numpy array of shape (H, W)
        pad_width: width of the zero padding (left and right padding)
        pad_height: height of the zero padding (bottom and top padding)

    Returns:
        out: numpy array of shape (H+2*pad_height, W+2*pad_width)
    """

    H, W = image.shape
    out = np.zeros((H+2*pad_height, W+2*pad_width))
    out[pad_height:pad_height+H, pad_width:pad_width+W] = image

    return out
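
For comparison, NumPy's built-in `np.pad` produces the same zero padding. This is a brief sketch reproducing the docstring example, offered as an alternative rather than as the article's method:

```python
import numpy as np

image = np.array([[1.0]])
pad_height, pad_width = 1, 2

# Padding amounts are given per axis as ((top, bottom), (left, right));
# mode="constant" fills with zeros by default.
padded = np.pad(image,
                ((pad_height, pad_height), (pad_width, pad_width)),
                mode="constant")
print(padded.shape)  # (3, 5)
print(padded)
```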


def conv_fast(image, kernel):
    """ An efficient implementation of convolution filter.

    This function uses element-wise multiplication and np.sum()
    to efficiently compute weighted sum of neighborhood at each
    pixel.

    Hints:
        - Use the zero_pad function you implemented above
        - There should be two nested for-loops
        - You may find np.flip() and np.sum() useful

    Args:
        image: numpy array of shape (Hi, Wi)
        kernel: numpy array of shape (Hk, Wk)

    Returns:
        out: numpy array of shape (Hi, Wi)
    """
    Hi, Wi = image.shape
    Hk, Wk = kernel.shape
    out = np.zeros((Hi, Wi))

    pad_height = Hk // 2
    pad_width = Wk // 2
    image_padding = zero_pad(image, pad_height, pad_width)
    kernel_flip = np.flip(np.flip(kernel, 0), 1)

    for i in range(Hi):
        for j in range(Wi):            
            out[i][j] = np.sum(np.multiply(kernel_flip, image_padding[i:(i+Hk), j:(j+Wk)]))

    return out
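
As a sanity check on the kernel rotation, the sketch below (assuming NumPy) shows that flipping along both axes equals a 180° rotation. It then convolves a unit impulse to confirm that the flip matters: with the flip, the response reproduces the kernel itself. `conv_same` is a compact, hypothetical restatement of the sliding-window approach written for this example, using `np.pad` in place of `zero_pad`.

```python
import numpy as np

kernel = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

flipped_twice = np.flip(np.flip(kernel, 0), 1)  # flip rows, then columns
rotated = np.rot90(kernel, 2)                   # two 90-degree rotations
sliced = kernel[::-1, ::-1]                     # reverse both axes by slicing


def conv_same(image, kernel):
    """Sliding-window 'same' convolution (hypothetical helper for this sketch)."""
    Hk, Wk = kernel.shape
    padded = np.pad(image, ((Hk // 2, Hk // 2), (Wk // 2, Wk // 2)),
                    mode="constant")
    k = kernel[::-1, ::-1]  # 180-degree rotation
    out = np.zeros(image.shape)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(k * padded[i:i + Hk, j:j + Wk])
    return out


impulse = np.zeros((3, 3))
impulse[1, 1] = 1.0
response = conv_same(impulse, kernel)
print(np.array_equal(response, kernel))
```

Without the rotation, the same loop would compute cross-correlation, and the impulse response would come out as the kernel rotated by 180°.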