From Traditional Convolution to Convolutional Neural Networks (CNN)

智障学AI

Column: Introduction to Understanding Detectors. Tags: computer vision, cnn, deep learning

Article link: https://blog.csdn.net/bobchen1017/article/details/128529537

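Image convolution is used to extract features and enhance images; kernels such as the Laplacian and Gaussian kernels implement filtering and edge detection. Convolutional neural networks (CNNs) extend this idea by learning the feature extraction automatically, which removes the manual feature engineering of traditional machine learning.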

Image Convolution

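Image convolution is an image-processing technique that can extract features from an image, sharpen it, or remove noise. We slide a small matrix called a convolution kernel (also called a filter) over the image; at each position the pixels under the kernel are multiplied by the kernel weights and summed, and the result is stored at the corresponding position of the output image. Image convolution is typically used for filtering, edge detection, and image enhancement.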

Below, an arbitrary picture is used for a demonstration, along with a few classic convolution kernels (filters).

Imports

import numpy as np
import cv2
import matplotlib.pyplot as plt

Check the image dimensions

image = cv2.imread("./x.jpg")
image.shape  # (H, W, C)

(324, 450, 3)

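The image dimensions are Height = 324, Width = 450, channel = 3.
A note on channels:

Color channels are the way colors are represented in an image. In digital images, colors are usually represented with the RGB model, which uses three independent channels (red, green, and blue). Every pixel is described by its values in these three channels.
For example, a pixel might have the following values:
(R=255, G=0, B=0) # red
(R=0, G=255, B=0) # green
(R=0, G=0, B=255) # blue
(R=255, G=255, B=255) # white
(R=0, G=0, B=0) # black

Images can also have other channel layouts; a grayscale image, for example, has a single channel. Other color spaces, such as HSL and CMYK, use different channels to represent color.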
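
As a small illustration of channels, the sketch below builds a 1 x 5 image from the five example pixels listed above and inspects it with NumPy. The array name `pixels` and the equal-weight grayscale average are assumptions made for illustration only; they are not part of the original code.

```python
import numpy as np

# Five RGB pixels from the list above: red, green, blue, white, black.
pixels = np.array(
    [[[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255], [0, 0, 0]]],
    dtype=np.uint8,
)
print(pixels.shape)  # (1, 5, 3): H=1, W=5, C=3

# Each channel is a 2-D array of its own; index 0 is red in RGB order.
red_channel = pixels[..., 0]

# A naive single-channel (grayscale) version: the plain mean of R, G and B.
gray = pixels.mean(axis=-1)
print(gray.shape)  # (1, 5)
```

Real grayscale conversions usually weight the three channels differently; the plain mean is only the simplest way to collapse three channels into one.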

Let's see what the image is stored in:

type(image)

numpy.ndarray

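The image is stored as an ndarray.

plt.imshow(image)          # OpenCV loads images in BGR order, so the colors look swapped

Display the image with plt in the correct colors:

plt.imshow(image[..., ::-1]) # reverse the channel axis, because plt expects RGB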

Laplacian Kernel

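A Laplacian kernel is an image filter used to detect edges. It is a small matrix that discretizes the Laplacian operator, a second-order difference operator that responds strongly wherever the pixel intensity changes abruptly.
Laplacian kernels are commonly used for image sharpening: combining the Laplacian response with the original image boosts the pixel values along edges, which makes the image look crisper. They are also used for edge detection, by applying the kernel to the image and looking for positions where the response is large.
A commonly used Laplacian kernel is:
[ 0 -1  0]
[-1  4 -1]
[ 0 -1  0]
This kernel is 3x3. It uses only the four horizontal and vertical neighbors, so it measures intensity changes along those two directions. The code below uses the 8-neighbor variant with the opposite sign (center -8, all neighbors 1), which also takes the diagonal neighbors into account.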
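
To see why this kernel responds to edges, here is a minimal sketch on a synthetic 5 x 5 image with a vertical step edge. The test image and the helper `apply_kernel` are assumptions introduced for illustration; they are not part of the original post.

```python
import numpy as np

laplacian = np.array([[0, -1, 0],
                      [-1, 4, -1],
                      [0, -1, 0]], dtype=np.float32)

# Synthetic image: the two left columns are dark (0), the three right columns bright (10).
step = np.zeros((5, 5), dtype=np.float32)
step[:, 2:] = 10

def apply_kernel(img, k):
    """Slide k over img (no padding, stride 1) and return the weighted sums."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1), dtype=np.float32)
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(img[y:y + kh, x:x + kw] * k)
    return out

response = apply_kernel(step, laplacian)
print(response)
# Every output row is [-10, 10, 0]: non-zero on both sides of the edge,
# zero where the image is flat.
```

The kernel weights sum to zero, which is why flat regions produce no response at all.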

kernel = np.array(
    [[1, 1, 1],
    [1, -8, 1],
    [1, 1, 1]]
).astype(np.float32) 
kernel.shape

(3, 3)

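Indexing by the window center, with Image_size = 5 * 5 and kernel = 3 * 3:
    - the four corner centers are (1, 1), (3, 1), (1, 3), (3, 3)
    - first center index: 1 = kernel_half_width = kernel_width // 2 = 3 // 2 = 1
    - last center index:  3 = image_width - kernel_half_width - 1 = 5 - 1 - 1 = 3
    - range(begin, end) is half-open, [begin, end), so end itself is never reached;
      that is why the loop bound is image_width - kernel_half_width, without the extra - 1

for x in range(kernel_half_width, image_width - kernel_half_width):
    for y in range(kernel_half_height, image_height - kernel_half_height):
        pass  # per-window operation goes here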
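
The bounds above can be checked by enumerating the centers for the 5 x 5 image and 3 x 3 kernel. This is only a sanity-check sketch; the list name `centers` is introduced here for illustration.

```python
image_height = image_width = 5
kernel_height = kernel_width = 3
kernel_half_height, kernel_half_width = kernel_height // 2, kernel_width // 2

# Every (x, y) the two outer loops visit.
centers = [
    (x, y)
    for x in range(kernel_half_width, image_width - kernel_half_width)
    for y in range(kernel_half_height, image_height - kernel_half_height)
]
print(centers[0], centers[-1])  # (1, 1) (3, 3)
print(len(centers))             # 9 = (5 - 3 + 1) * (5 - 3 + 1)
```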

# kernel and input size
kernel_height, kernel_width = kernel.shape
kernel_half_width, kernel_half_height = kernel_width // 2, kernel_height // 2
image_height, image_width, _ = image.shape

# output size (no padding, stride 1)
output_height = image_height - kernel_height + 1
output_width = image_width - kernel_width + 1
output = np.zeros((output_height, output_width, 3))

# (x, y) is the center of the current convolution window.
# The kernel is not flipped (cross-correlation); for the symmetric kernels
# used here the result is the same as true convolution.
for x in range(kernel_half_width, image_width - kernel_half_width):
    for y in range(kernel_half_height, image_height - kernel_half_height):
        for kx in range(kernel_width):
            for ky in range(kernel_height):
                # arrays are indexed [row, col]: y selects the row, x the column
                kernel_value = kernel[ky, kx]
                # image[y, x] is the window center; start from the window's top-left
                # corner (center minus half size) and offset by the position in the kernel
                pixel_value = image[y - kernel_half_height + ky, x - kernel_half_width + kx]
                # shift the center back by half a kernel to get the output index
                output[y - kernel_half_height, x - kernel_half_width] += kernel_value * pixel_value

output.shape

(322, 448, 3)

output.dtype

dtype('float64')

# without normalization (imshow clips float RGB data to the range [0, 1])
plt.imshow(output)

# min-max normalization to [0, 1]
output = (output - np.min(output)) / (np.max(output) - np.min(output))
plt.imshow(output)
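
The four nested Python loops above are slow on a real image. As a hedged alternative, the sketch below computes the same no-padding result with `np.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later) and checks it against a loop version on a small random array. The function names `correlate_valid` and `correlate_loops` and the random test image are assumptions for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def correlate_valid(image, kernel):
    """Same result as the nested loops: no padding, kernel not flipped."""
    kh, kw = kernel.shape
    # windows has shape (H - kh + 1, W - kw + 1, C, kh, kw)
    windows = sliding_window_view(image, (kh, kw), axis=(0, 1))
    # multiply every window by the kernel and sum over the kernel axes
    return np.einsum("hwcij,ij->hwc", windows, kernel)

def correlate_loops(image, kernel):
    """Reference implementation that mirrors the loops above."""
    kh, kw = kernel.shape
    h, w, c = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1, c))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            for ky in range(kh):
                for kx in range(kw):
                    out[y, x] += kernel[ky, kx] * image[y + ky, x + kx]
    return out

rng = np.random.default_rng(0)
demo_image = rng.integers(0, 256, size=(8, 10, 3)).astype(np.float64)
demo_kernel = np.array([[1, 1, 1], [1, -8, 1], [1, 1, 1]], dtype=np.float64)

fast = correlate_valid(demo_image, demo_kernel)
slow = correlate_loops(demo_image, demo_kernel)
print(fast.shape, np.allclose(fast, slow))  # (6, 8, 3) True
```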

Gaussian Kernel

# Gaussian kernel: weights fall off with distance from the center, normalized to sum to 1
def gaussian_kernel2d(size, sigma):
    
    s = 2 * sigma ** 2
    center = size // 2
    output = np.zeros((size, size))
    for i in range(size):
        for j in range(size):
            y = i - center
            x = j - center
            output[i, j] = np.exp(-(x ** 2 + y ** 2) / s)
    return output / np.sum(output)
kernel = gaussian_kernel2d(7, 1)
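
A few properties of this kernel are easy to verify numerically. The sketch below re-implements the same formula with broadcasting under the hypothetical name `gaussian_kernel2d_vectorized` (so that it does not shadow the function above) and checks that the weights sum to 1, peak at the center, and factor into two 1-D Gaussians.

```python
import numpy as np

def gaussian_kernel2d_vectorized(size, sigma):
    """Same formula as the loop version, written with broadcasting."""
    offsets = np.arange(size) - size // 2
    x, y = np.meshgrid(offsets, offsets)
    kernel = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()

k = gaussian_kernel2d_vectorized(7, 1)

# The weights sum to 1, so blurring keeps the overall brightness.
sums_to_one = np.isclose(k.sum(), 1.0)

# The largest weight sits at the center of the kernel.
peak = tuple(int(i) for i in np.unravel_index(k.argmax(), k.shape))

# Separability: the 2-D kernel is the outer product of a normalized 1-D Gaussian.
g1 = np.exp(-(np.arange(7) - 3) ** 2 / 2.0)
g1 /= g1.sum()
separable = np.allclose(k, np.outer(g1, g1))

print(sums_to_one, peak, separable)
```

Separability is what allows a 2-D Gaussian blur to be applied as two cheaper 1-D passes, one along rows and one along columns.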

# kernel and input size
kernel_height, kernel_width = kernel.shape
kernel_half_width, kernel_half_height = kernel_width // 2, kernel_height // 2
image_height, image_width, _ = image.shape

# output size (no padding, stride 1)
output_height = image_height - kernel_height + 1
output_width = image_width - kernel_width + 1
output = np.zeros((output_height, output_width, 3))

# (x, y) is the center of the current convolution window
for x in range(kernel_half_width, image_width - kernel_half_width):
    for y in range(kernel_half_height, image_height - kernel_half_height):
        for kx in range(kernel_width):
            for ky in range(kernel_height):
                # arrays are indexed [row, col]: y selects the row, x the column
                kernel_value = kernel[ky, kx]
                # image[y, x] is the window center; start from the window's top-left
                # corner (center minus half size) and offset by the position in the kernel
                pixel_value = image[y - kernel_half_height + ky, x - kernel_half_width + kx]
                # shift the center back by half a kernel to get the output index
                output[y - kernel_half_height, x - kernel_half_width, ...] += kernel_value * pixel_value

plt.imshow(output)

# min-max normalization to [0, 1]
output = (output - np.min(output)) / (np.max(output) - np.min(output))
plt.imshow(output)

Convolutional Neural Networks

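In traditional convolution, the R, G, and B channels are all convolved with the same single kernel; you can think of it as the kernel filtering the image. In a CNN, each filter holds a separate kernel for every input channel: the three channels are convolved with three different kernels, the three results are added together, and a bias is added on top.

The number of filters equals the number of output channels, and there is one bias per filter.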
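
The channel bookkeeping can be made concrete with a deliberately slow NumPy sketch. The function name `conv2d_multichannel`, the (C, H, W) and (F, C, kh, kw) layouts, and the all-ones test data are assumptions chosen for illustration; real frameworks implement this far more efficiently.

```python
import numpy as np

def conv2d_multichannel(x, weights, bias):
    """x: (C, H, W); weights: (F, C, kh, kw); bias: (F,). Returns (F, H', W')."""
    num_filters, channels, kh, kw = weights.shape
    _, h, w = x.shape
    out = np.zeros((num_filters, h - kh + 1, w - kw + 1))
    for f in range(num_filters):
        for c in range(channels):
            for row in range(out.shape[1]):
                for col in range(out.shape[2]):
                    # each input channel uses its own kernel; results add up across channels
                    out[f, row, col] += np.sum(x[c, row:row + kh, col:col + kw] * weights[f, c])
        out[f] += bias[f]  # one bias per filter
    return out

x = np.ones((3, 5, 5))           # 3 input channels
weights = np.ones((2, 3, 3, 3))  # 2 filters, each with 3 kernels of size 3 x 3
bias = np.array([0.5, -1.0])     # 2 biases, one per filter

feature_maps = conv2d_multichannel(x, weights, bias)
print(feature_maps.shape)     # (2, 3, 3): 2 filters give 2 output channels
print(feature_maps[0, 0, 0])  # 27.5 = 3 channels * 9 ones + 0.5
print(feature_maps[1, 0, 0])  # 26.0 = 3 channels * 9 ones - 1.0
```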

A neural network built by stacking convolutions is called a CNN.

Output size = floor((input size + 2 * padding - kernel_size) / stride) + 1
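
The formula is easy to wrap in a helper. The sketch below assumes floor division when the stride does not divide evenly, which is the usual convention; the function name `conv_output_size` is hypothetical.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Output length along one axis for a convolution with the given settings."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# The 324 x 450 image and 3 x 3 kernel used earlier, no padding, stride 1:
print(conv_output_size(324, 3))  # 322
print(conv_output_size(450, 3))  # 448

# A 7 x 7 kernel with padding 3 and stride 2 on a 224-pixel axis:
print(conv_output_size(224, 7, padding=3, stride=2))  # 112
```

The first two values match the (322, 448, 3) output shape printed in the Laplacian example.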

Group convolution is a way to reduce the number of parameters (used in ResNeXt, for example).
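
The saving can be counted directly: with g groups, each filter only sees C_in / g input channels, so the weight count drops by a factor of g. The helper `conv_params` and the 128-channel example are assumptions for illustration and are not taken from the ResNeXt configuration.

```python
def conv_params(c_in, c_out, kernel_size, groups=1, bias=True):
    """Number of learnable parameters in a (grouped) 2-D convolution layer."""
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must both be divisible by groups")
    # each of the c_out filters holds (c_in / groups) kernels of size k x k
    weights = (c_in // groups) * c_out * kernel_size * kernel_size
    return weights + (c_out if bias else 0)

standard = conv_params(128, 128, 3, groups=1, bias=False)
grouped = conv_params(128, 128, 3, groups=32, bias=False)
print(standard)             # 147456
print(grouped)              # 4608
print(standard // grouped)  # 32
```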

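A major difference between a CNN and a BP (fully connected) network is weight sharing: the same kernel weights are reused at every position of the input, so different locations share one set of weights.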
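
A rough parameter count shows what weight sharing buys. The sketch below compares one 3 x 3 filter over a 3-channel input with a fully connected layer that maps the same 324 x 450 x 3 image to the same 322 x 448 output; the variable names are hypothetical and the comparison is only illustrative.

```python
# Input and output sizes taken from the example image used earlier.
in_h, in_w, channels = 324, 450, 3
out_h, out_w = 322, 448

# Convolution: one 3 x 3 kernel per input channel plus one bias,
# reused at every one of the out_h * out_w positions.
conv_count = 3 * 3 * channels + 1

# Fully connected: every output value has its own weight for every input value,
# plus its own bias.
fc_count = (in_h * in_w * channels) * (out_h * out_w) + (out_h * out_w)

print(conv_count)  # 28
print(fc_count > 10 ** 10)
```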

An Intuitive Understanding of CNNs

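Each convolutional layer is an abstraction of its input. As the network gets deeper, the level of abstraction rises: from pixel-level abstraction at the start, to part-level abstraction, and finally to object-level abstraction.

The early convolutions are said to extract features. The features at the front are too low-level to classify from directly; as more convolutional layers are stacked, the features become more distinct and the decision becomes clearer.

Features are hierarchical! All we are doing is extracting features and then processing them further.

Imagine buying stocks without really understanding them: someone processes the information for us into a form we can understand, and the decision becomes easy. It is also possible to lose money at first and then get better and better through repeated iteration.

There are many kernels, each extracting features in its own way.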

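In the era of traditional machine learning the overall pipeline was the same: extract features first, then finish with a fully connected layer. The difference is that the features did not come from a convolutional network but were defined by hand, for example the number of corners, the number of diagonal lines, curvature, histogram projections, and so on.

What problem does a CNN solve? It extracts features automatically. This is also where the poor interpretability of deep learning shows: the features are learned, not designed.

Whatever you do, remember that features are hierarchical.

Layer-by-layer aggregation: workers extract features and report to the team lead, the team lead reports to the VP, and the chairman makes the final decision (FC).

Then there are cross-layer (skip) connections, where a worker reports directly to a leader several levels up. That may be frowned upon in real life, but in a network it lets information flow well.

Multiply-accumulate is itself a form of summarizing.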

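Consider this:

Your friend e asks you to invest, and you want to know e's financial standing to decide how much money to put in.

You know e's friends a, b, c, d, f, g, h, and i, so you decide to look at e and the friends around e (a 3 x 3 neighborhood with e at the center) to reach a conclusion.

You hire 9 paparazzi to gather information.
Some of these 9 are reliable and some are not, so you assign each of them a weight in your head (random at first).

When they report back, you take a weighted sum of their results. You find that paparazzo No. 3 is a problem: he lied to you. Annoyed, you lower No. 3's weight. After repeating this for a long time, you learn who is reliable and who is not.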
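
Read as an algorithm, the story is a weighted sum whose weights are adjusted by gradient descent. The following is a minimal sketch under loud assumptions: the synthetic reports, the squared-error loss, the learning rate, and the choice of which reporters carry real signal are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 9 reports per case; only some reporters carry real signal.
true_weights = np.array([0.5, 0.0, -0.3, 0.0, 1.0, 0.0, 0.2, 0.0, 0.0])
reports = rng.normal(size=(200, 9))  # what the 9 reporters say in 200 cases
truth = reports @ true_weights       # what actually turned out to be true

weights = rng.normal(size=9)         # initial trust in each reporter: random
learning_rate = 0.1
initial_loss = np.mean((reports @ weights - truth) ** 2)

for _ in range(200):
    error = reports @ weights - truth             # how wrong the weighted summary was
    gradient = 2 * reports.T @ error / len(truth)
    weights -= learning_rate * gradient           # adjust trust, reporter by reporter

final_loss = np.mean((reports @ weights - truth) ** 2)
print(initial_loss > final_loss)
print(np.round(weights, 2))
```

After enough rounds the learned weights settle close to `true_weights`: the reporters with no real signal end up with weights near zero.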

Im2col

Convolutional neural networks: execution, the GEMM approach, im2col

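1. The earlier convolution implementation scatters its multiplications across nested loops, which is too inefficient.
2. The optimization idea is to make the computation dense and concentrated, relying as much as possible on matrix multiplication (GEMM), which has been heavily optimized.
3. Im2col stands for "images to column": each image patch covered by the kernel is unrolled into one column.
[Figure: im2col, with the image shown in green and the kernel in blue]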

Im2col implementation

import numpy as np
import matplotlib.pyplot as plt
import cv2

image = cv2.imread("x.jpg")
#image = cv2.resize(image, (5, 5))
itensor = image.transpose(2, 0, 1)[None]
itensor.shape

(1, 3, 324, 450)

def gaussian_kernel2d(size, sigma):
    
    s = 2 * sigma ** 2
    center = size // 2
    output = np.zeros((size, size))
    for i in range(size):
        for j in range(size):
            y = i - center
            x = j - center
            output[i, j] = np.exp(-(x ** 2 + y ** 2) / s)
    return output / np.sum(output)

#kernel = np.repeat(gaussian_kernel2d(3, 1)[None], 3, axis=0)
kernel = np.vstack([
    np.array(
    [
        [1, 0, -1], 
        [1, 0, -1], 
        [1, 0, -1]
    ])[None],
    np.array(
    [
        [1, 0, -1], 
        [1, 0, -1], 
        [1, 0, -1]
    ])[None],
    gaussian_kernel2d(3, 10)[None]
    #gaussian_kernel2d(3, 1)[None]
])
kernel.shape

(3, 3, 3)

_, kh, kw = kernel.shape
kh, kw

(3, 3)

n, c, h, w = itensor.shape
n, c, h, w

(1, 3, 324, 450)

The number of columns in the column matrix is s = (image_width - kernel_width + 1) x (image_height - kernel_height + 1), one column per output position.

s = (w - kw + 1) * (h - kh + 1)
s

column = np.zeros((kh * kw * c, s))
column.shape

(27, 144256)

col_kernel = kernel.reshape(1, -1)
col_kernel.shape

(1, 27)

* image2column

half_kx = kw // 2
half_ky = kh // 2
ksize = kw * kh
for ic in range(c):
    col_x = 0
    for iy in range(half_ky, h - half_ky):
        for ix in range(half_kx, w - half_kx):
            for iky in range(kh):
                for ikx in range(kw):
                    pixel_value = itensor[0, ic, iy - half_ky + iky, ix - half_kx + ikx]
                    col_y = ic * ksize + ikx + iky * kw
                    column[col_y, col_x] = pixel_value
            col_x += 1
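
Once the column matrix is filled, the whole convolution reduces to a single matrix multiplication. The sketch below is a self-contained illustration on a small random tensor rather than the image above: the helper `im2col`, the 6 x 7 input, and the random kernel are assumptions, and the result is checked against a direct sliding-window computation.

```python
import numpy as np

def im2col(x, kh, kw):
    """x: (C, H, W) -> column matrix of shape (C * kh * kw, out_h * out_w)."""
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    column = np.zeros((c * kh * kw, out_h * out_w))
    for ic in range(c):
        for iky in range(kh):
            for ikx in range(kw):
                row = ic * kh * kw + iky * kw + ikx
                # all pixels that meet kernel element (iky, ikx), one per window
                column[row] = x[ic, iky:iky + out_h, ikx:ikx + out_w].reshape(-1)
    return column

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 6, 7))       # (C, H, W)
kernel = rng.normal(size=(3, 3, 3))  # (C, kh, kw)

column = im2col(x, 3, 3)                         # (27, 20)
col_kernel = kernel.reshape(1, -1)               # (1, 27)
gemm_out = (col_kernel @ column).reshape(4, 5)   # one matrix multiply, then reshape

# Direct sliding-window reference for comparison.
direct = np.zeros((4, 5))
for row in range(4):
    for col in range(5):
        direct[row, col] = np.sum(x[:, row:row + 3, col:col + 3] * kernel)

print(column.shape, np.allclose(gemm_out, direct))  # (27, 20) True
```

The row order of the column matrix follows the flattening order of `kernel.reshape(1, -1)`, the same layout as `col_y = ic * ksize + ikx + iky * kw` in the loop above. The product sums over all three channels, so it yields a single output channel, like one CNN filter.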