Introduction to Computer Vision 4) The Sliding Window

Avasla

Last modified 2023-11-22 15:52:48

Categories: Deep Learning, TensorFlow. Tags: computer vision, artificial intelligence

First published 2023-08-23 09:30:42

Article link: https://blog.csdn.net/WHYbeHERE/article/details/132281049

Note: these are personal study notes shared as-is. If you spot any errors or omissions, please point them out. Thanks♪(･ω･)ﾉ

1. The Sliding Window

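The previous articles introduced three operations for extracting features from an image:

Use a convolutional layer for the filter operation.
Use the ReLU activation function for the detect operation.
Use a max pooling layer for the condense operation.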

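The convolution and pooling operations have one thing in common: both are performed over a sliding window. For convolution, this "window" is given by the dimensions of the kernel, set with the kernel_size parameter. For pooling, it is the pooling window, set with pool_size.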

A two-dimensional sliding window.
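
To make the shared structure concrete, here is a minimal pure-Python sketch. It is not how Keras implements these layers. One helper visits every window position, and the only difference between filtering and pooling is the function applied to each window. The names slide, weighted_sum and window_max, the toy 4x4 image, and the 2x2 kernel are all made up for illustration.

```python
def slide(image, size, fn):
    """Apply fn to every size x size window that fits inside a 2D list.

    The window moves one pixel at a time, left to right, then top to bottom.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - size + 1):
        out_row = []
        for c in range(cols - size + 1):
            # Cut the current window out of the image.
            patch = [row[c:c + size] for row in image[r:r + size]]
            out_row.append(fn(patch))
        out.append(out_row)
    return out


def weighted_sum(kernel):
    """Build a per-window function: multiply by the kernel element-wise, then add up."""
    def fn(patch):
        return sum(p * k
                   for patch_row, kernel_row in zip(patch, kernel)
                   for p, k in zip(patch_row, kernel_row))
    return fn


def window_max(patch):
    """Per-window function used by max pooling: keep the largest value."""
    return max(max(row) for row in patch)


# Toy data (assumed for illustration only).
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernel = [[1, 0],
          [0, -1]]

filtered = slide(image, 2, weighted_sum(kernel))  # window size plays the role of kernel_size
pooled = slide(image, 2, window_max)              # window size plays the role of pool_size

print(filtered)  # [[-5, -5, -5], [-5, -5, -5], [-5, -5, -5]]
print(pooled)    # [[6, 7, 8], [10, 11, 12], [14, 15, 16]]
```

Both results are 3x3 because a 2x2 window fits into a 4x4 image at three positions along each axis.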

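Two additional parameters affect both convolutional and pooling layers: the strides of the window and whether to use padding at the image edges.

The strides parameter says how far the window moves at each step,
and the padding parameter describes how the pixels at the edges of the input image are handled.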

With these two parameters, the two layers are defined as follows:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(filters=64,
                  kernel_size=3,
                  strides=1,
                  padding='same',
                  activation='relu'),
    layers.MaxPool2D(pool_size=2,
                     strides=1,
                     padding='same')
    # ... more layers follow
])

Stride

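The distance the window moves at each step is called the stride. The stride has to be specified in both dimensions of the image: one value for moving left to right and one for moving top to bottom. The animation below shows strides=(2, 2), a movement of 2 pixels per step.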

A sliding window with a stride of (2, 2).

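What effect does the stride have? Whenever the stride in either direction is greater than 1, the sliding window skips over some pixels of the input at each step.

Because we want high-quality features for classification, convolutional layers usually use strides=(1, 1). Increasing the stride means losing potentially valuable information in the summary. Max pooling layers, however, almost always use a stride greater than 1, such as (2, 2) or (3, 3), but not larger than the window itself.

Finally, note that when the stride is the same in both directions, a single number is enough. For example, strides=2 can be used in place of strides=(2, 2).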
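
The effect of the stride along one axis can be sketched in a few lines of plain Python. The helpers below (window_starts, skipped_pixels, normalize_strides) are hypothetical names written for this note, not Keras functions, and they assume the window always stays inside the input. The sketch also suggests why a pooling stride larger than the window is avoided: some pixels would never be looked at.

```python
def window_starts(input_size, window_size, stride):
    """Start indices along one axis for windows that stay inside the input."""
    return list(range(0, input_size - window_size + 1, stride))


def skipped_pixels(input_size, window_size, stride):
    """Pixel indices along one axis that no window ever covers."""
    covered = set()
    for start in window_starts(input_size, window_size, stride):
        covered.update(range(start, start + window_size))
    return sorted(set(range(input_size)) - covered)


def normalize_strides(strides):
    """Expand a single int into a (vertical, horizontal) pair, like the strides=2 shorthand."""
    return (strides, strides) if isinstance(strides, int) else tuple(strides)


# An axis of 8 pixels and a window of width 2.
print(window_starts(8, 2, 1))   # [0, 1, 2, 3, 4, 5, 6]
print(window_starts(8, 2, 2))   # [0, 2, 4, 6]
print(window_starts(8, 2, 3))   # [0, 3, 6]

# Stride equal to the window size: every pixel is still covered.
print(skipped_pixels(8, 2, 2))  # []
# Stride larger than the window size: pixels 2 and 5 are never seen.
print(skipped_pixels(8, 2, 3))  # [2, 5]

print(normalize_strides(2))     # (2, 2)
```

A larger stride produces fewer window positions, so the output gets smaller: 7, 4 and 3 positions for strides 1, 2 and 3 in this example.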

Padding

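When performing the sliding window computation, a question arises about what to do at the boundaries of the input. If the window stays entirely inside the input image, it will never sit squarely over the boundary pixels the way it does over every other pixel. Because not all pixels are treated in exactly the same way, this can cause problems.

How the convolution handles these boundary values is determined by its padding parameter. In TensorFlow there are two choices: padding='same' or padding='valid'. Each has trade-offs:

With padding='valid', the convolution window stays entirely inside the input. The drawback is that the output shrinks (pixels are lost), and it shrinks more for larger kernels. This limits the number of layers the network can contain, especially when the input is small.
The alternative is padding='same'. The idea is to pad the input with 0s around its borders, using just enough 0s to make the output the same size as the input (at a stride of 1). This can, however, dilute the influence of pixels at the borders. The animation below shows a sliding window with 'same' padding.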

Illustration of zero (same) padding.
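
The size arithmetic behind the two padding modes can be written down directly. The sketch below uses the output-size rules that TensorFlow documents for 'valid' and 'same' padding, applied to one axis. The function names output_size and same_padding are hypothetical helpers for this note, and the example sizes are chosen only for illustration.

```python
import math


def output_size(n, k, s, padding):
    """Output length along one axis for input length n, window k, stride s."""
    if padding == 'valid':
        # Only windows that fit entirely inside the input are used.
        return (n - k) // s + 1
    if padding == 'same':
        # Enough zeros are added that the output length is ceil(n / s).
        return math.ceil(n / s)
    raise ValueError(f"unknown padding: {padding!r}")


def same_padding(n, k, s):
    """Number of zeros added (before, after) along one axis for 'same' padding."""
    out = math.ceil(n / s)
    total = max((out - 1) * s + k - n, 0)
    before = total // 2
    return before, total - before


# A 64-pixel axis with stride 1: 'valid' shrinks more for larger kernels.
print(output_size(64, 3, 1, 'valid'))  # 62
print(output_size(64, 5, 1, 'valid'))  # 60
print(output_size(64, 3, 1, 'same'))   # 64
print(same_padding(64, 3, 1))          # (1, 1)
print(same_padding(64, 5, 1))          # (2, 2)

# Stacking 3-wide 'valid' layers on a small 8-pixel input runs out of pixels quickly.
sizes = [8]
while sizes[-1] >= 3:
    sizes.append(output_size(sizes[-1], 3, 1, 'valid'))
print(sizes)  # [8, 6, 4, 2]
```

In the last example only three 'valid' layers fit before the feature map becomes smaller than the kernel, which is the depth limit described above.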

2. Code Example

Step 1: Import packages and define helper functions

import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
from itertools import product
from skimage import draw, transform

def circle(size, val=None, r_shrink=0):
    circle = np.zeros([size[0]+1, size[1]+1])
    rr, cc = draw.circle_perimeter(
        size[0]//2, size[1]//2,
        radius=size[0]//2 - r_shrink,
        shape=[size[0]+1, size[1]+1],
    )
    if val is None:
        circle[rr, cc] = np.random.uniform(size=circle.shape)[rr, cc]
    else:
        circle[rr, cc] = val
    circle = transform.resize(circle, size, order=0)
    return circle

def show_kernel(kernel, label=True, digits=None, text_size=28):
    # Format kernel
    kernel = np.array(kernel)
    if digits is not None:
        kernel = kernel.round(digits)

    # Plot kernel
    cmap = plt.get_cmap('Blues_r')
    plt.imshow(kernel, cmap=cmap)
    rows, cols = kernel.shape
    thresh = (kernel.max()+kernel.min())/2
    # Optionally, add value labels
    if label:
        for i, j in product(range(rows), range(cols)):
            val = kernel[i, j]
            color = cmap(0) if val > thresh else cmap(255)
            plt.text(j, i, val, 
                     color=color, size=text_size,
                     horizontalalignment='center', verticalalignment='center')
    plt.xticks([])
    plt.yticks([])


def show_extraction(image,
                    kernel,
                    conv_stride=1,
                    conv_padding='valid',
                    activation='relu',
                    pool_size=2,
                    pool_stride=2,
                    pool_padding='same',
                    figsize=(10, 10),
                    subplot_shape=(2, 2),
                    ops=['Input', 'Filter', 'Detect', 'Condense'],
                    gamma=1.0):
    # Create Layers
    model = tf.keras.Sequential([
                    tf.keras.layers.Conv2D(
                        filters=1,
                        kernel_size=kernel.shape,
                        strides=conv_stride,
                        padding=conv_padding,
                        use_bias=False,
                        input_shape=image.shape,
                    ),
                    tf.keras.layers.Activation(activation),
                    tf.keras.layers.MaxPool2D(
                        pool_size=pool_size,
                        strides=pool_stride,
                        padding=pool_padding,
                    ),
                   ])

    layer_filter, layer_detect, layer_condense = model.layers
    kernel = tf.reshape(kernel, [*kernel.shape, 1, 1])
    layer_filter.set_weights([kernel])

    # Format for TF
    image = tf.expand_dims(image, axis=0)
    image = tf.image.convert_image_dtype(image, dtype=tf.float32) 
    
    # Extract Feature
    image_filter = layer_filter(image)
    image_detect = layer_detect(image_filter)
    image_condense = layer_condense(image_detect)
    
    images = {}
    if 'Input' in ops:
        images.update({'Input': (image, 1.0)})
    if 'Filter' in ops:
        images.update({'Filter': (image_filter, 1.0)})
    if 'Detect' in ops:
        images.update({'Detect': (image_detect, gamma)})
    if 'Condense' in ops:
        images.update({'Condense': (image_condense, gamma)})
    
    # Plot
    plt.figure(figsize=figsize)
    for i, title in enumerate(ops):
        image, gamma = images[title]
        plt.subplot(*subplot_shape, i+1)
        plt.imshow(tf.image.adjust_gamma(tf.squeeze(image), gamma))
        plt.axis('off')
        plt.title(title)

Step 2: Define the kernel

import tensorflow as tf
import matplotlib.pyplot as plt

plt.rc('figure', autolayout=True)
plt.rc('axes', labelweight='bold', labelsize='large',
       titleweight='bold', titlesize=18, titlepad=10)
plt.rc('image', cmap='magma')

image = circle([64, 64], val=1.0, r_shrink=3)
image = tf.reshape(image, [*image.shape, 1])
# Bottom sobel
kernel = tf.constant(
    [[-1, -2, -1],
     [0, 0, 0],
     [1, 2, 1]],
)

show_kernel(kernel)

[Figure: the bottom Sobel kernel as plotted by show_kernel]
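
To get a feel for what this kernel responds to, here is a small pure-Python sketch that applies the same bottom Sobel values to two toy 4x4 images, followed by ReLU. The helpers filter_valid and relu and both toy images are assumptions made for illustration. The sketch uses stride 1 with no padding, and multiplies without flipping the kernel, which is also how Conv2D applies its weights.

```python
BOTTOM_SOBEL = [[-1, -2, -1],
                [0, 0, 0],
                [1, 2, 1]]


def filter_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and return the responses."""
    k = len(kernel)
    out = []
    for r in range(len(image) - k + 1):
        row = []
        for c in range(len(image[0]) - k + 1):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(k) for j in range(k)))
        out.append(row)
    return out


def relu(feature_map):
    """Detect step: negative responses are set to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


# Toy images (assumed): a horizontal edge in each direction.
dark_above_bright = [[0] * 4, [0] * 4, [1] * 4, [1] * 4]
bright_above_dark = dark_above_bright[::-1]

print(relu(filter_valid(dark_above_bright, BOTTOM_SOBEL)))  # [[4, 4], [4, 4]]
print(relu(filter_valid(bright_above_dark, BOTTOM_SOBEL)))  # [[0, 0], [0, 0]]
```

The kernel gives a positive response where brighter pixels lie below darker ones. The opposite edge produces negative values, which ReLU then removes.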

Step 3: Compare the results

show_extraction(
    image, kernel,
    
    conv_stride=1,
    pool_size=2,
    pool_stride=2,

    subplot_shape=(1, 4),
    figsize=(14, 6),
)

[Figure: Input, Filter, Detect and Condense panels for conv_stride=1]

show_extraction(
    image, kernel,

    conv_stride=3,  # changed to 3
    pool_size=2,
    pool_stride=2,

    subplot_shape=(1, 4),
    figsize=(14, 6),
)

[Figure: Input, Filter, Detect and Condense panels for conv_stride=3]
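
The difference between the two runs can also be checked with a little arithmetic on the feature-map sizes. The sketch below assumes the settings used above: a 64x64 input, a 3x3 kernel, the show_extraction defaults conv_padding='valid' and pool_padding='same', and a 2x2 pooling window with stride 2. The helpers out_size and pipeline_sizes are hypothetical and only compute sizes along one axis. They do not run the model.

```python
import math


def out_size(n, window, stride, padding):
    """Output length along one axis under TensorFlow's 'valid' / 'same' size rules."""
    if padding == 'same':
        return math.ceil(n / stride)
    return (n - window) // stride + 1  # 'valid'


def pipeline_sizes(n, conv_stride):
    """Sizes after the convolution and after max pooling for an n x n input."""
    after_conv = out_size(n, window=3, stride=conv_stride, padding='valid')
    after_pool = out_size(after_conv, window=2, stride=2, padding='same')
    return after_conv, after_pool


print(pipeline_sizes(64, conv_stride=1))  # (62, 31)
print(pipeline_sizes(64, conv_stride=3))  # (21, 11)
```

With conv_stride=3 the filtered map is about a third of the size along each axis (21 instead of 62), so far fewer window positions contribute to the extracted feature.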