Deep Learning for Computer Vision with Python

Robin_Pi

Article link: https://blog.csdn.net/Robin_Pi/article/details/104837452

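A few points that changed how I think about things:

Parameter learning (rather than "model learning")
Everything serves one goal: finding learnable parameters that represent the model.
In other words, the model parameters are what matter most, so the tunable settings that affect them matter too:
① Learning rate (the most important setting): determines how far the parameters move on each update.
② Regularization (second only to the learning rate): helps select the best parameters.
Among the many parameter sets that reach a low training loss, it picks the one that generalizes best, even accepting a higher training loss in exchange for a lower test loss.
Batch size: determines how often (after how many samples) the parameters are updated.
Analyze problems through the loss curve (rather than accuracy)
Loss -> loss function -> effect of the parameter updates -> how well the model is learning
In the simplest picture (a step activation), a neuron's output is binary: it fires or it does not, and the activation function decides which.
Think of the activation function as an activation layer.
Pretrained models: using a model vs. training a model (a common point of confusion)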

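0. Introduction

0.1 Book contents

Starter Bundle (beginner)
Covers the fundamentals:
- Machine learning
- Neural networks
- Convolutional neural networks
- Working with custom datasets
Practitioner Bundle (intermediate)
Goes deeper into deep learning, covering advanced techniques, best practices, and rules of thumb.
ImageNet Bundle (advanced)
A full treatment of deep learning for computer vision in practice, including how to train large-scale neural networks on the large ImageNet dataset, plus case studies such as age and gender prediction, vehicle make and model classification, and facial expression recognition.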

0.2 Tools

Three main components:

Python
Keras
MXNet (advanced bundle)
Others:
OpenCV
scikit-image
Scikit-learn
…

1. Starter Bundle

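Image basics

The building block of an image: the pixel

A pixel is normally thought of as the "color" or "intensity" of light at a given location in the image.
Pixels come in two forms:

Grayscale / single channel
In a grayscale image, each pixel is a single scalar value between 0 and 255, where 0 is black and 255 is white. Values in between are shades of gray.
Color
Color pixels are usually represented in the RGB color space (other color spaces exist as well).
A pixel in the RGB color space is no longer a scalar as in a grayscale/single-channel image. Instead it is a list of three values: one for the Red component, one for the Green component, and one for the Blue component.
To define a color in the RGB model, all we need to do is specify the amount of red, green, and blue contained in a single pixel.
Each of the red, green, and blue channels can take a value in the range [0, 255], for a total of 256 "shades", where 0 indicates no representation and 255 indicates full representation.
Since pixel values only need to lie in [0, 255], we normally use 8-bit unsigned integers to represent the intensities.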
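
As a small illustration of the 8-bit representation described above, here is a sketch using NumPy's `uint8` type. The variable names are made up for this example and do not come from the book.

```python
import numpy as np

# A grayscale pixel is a single 8-bit intensity: 0 is black, 255 is white.
gray_pixel = np.uint8(128)

# An RGB pixel is three 8-bit values, one per channel (here: pure red).
red_pixel = np.array([255, 0, 0], dtype=np.uint8)

# An unsigned 8-bit integer can hold exactly 256 distinct values per channel.
info = np.iinfo(np.uint8)
levels = int(info.max) - int(info.min) + 1

# uint8 arithmetic wraps around on overflow, so widen the type, clip to
# [0, 255], and only then cast back.
brighter = np.clip(red_pixel.astype(np.int32) + 50, 0, 255).astype(np.uint8)
```

The clipping step matters in practice: adding 50 directly to a `uint8` value of 255 would wrap around instead of saturating at white.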

…

Forming an Image From Channels

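We can think of an RGB image as three separate matrices of width W and height H, one per RGB component (Figure 3.5 in the book). Combining these three matrices gives a multidimensional array of shape W × H × D, where D is the depth, i.e. the number of channels (D = 3 for the RGB color space).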

Representing images in Python: NumPy arrays

NumPy: (height, width, depth)

Why NumPy puts the height first:

When defining the dimensions of matrix, we always write it as rows x columns. The number of rows in an image is its height whereas the number of columns is the image’s width. The depth will still remain the depth.
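
To make the (height, width, depth) ordering concrete, the following sketch builds a tiny image from three per-channel matrices. The sizes and names are arbitrary choices for illustration.

```python
import numpy as np

height, width = 2, 3  # a tiny image with 2 rows and 3 columns

# One matrix per channel, each with `height` rows and `width` columns.
red = np.full((height, width), 255, dtype=np.uint8)
green = np.zeros((height, width), dtype=np.uint8)
blue = np.zeros((height, width), dtype=np.uint8)

# Stack the channels along a new last axis: (height, width, depth).
image = np.dstack([red, green, blue])

print(image.shape)  # (2, 3, 3)
print(image[0, 0])  # the pixel at row 0, column 0
```

Indexing follows the same rows-first convention: `image[y, x]` returns the pixel at row `y` and column `x`.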

RGB vs BGR

Note: OpenCV actually stores pixel values in Blue, Green, Red (BGR) order rather than RGB.
This is for historical reasons.
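
A minimal sketch of what the channel order means in practice, using a plain NumPy array so that it runs without OpenCV installed. The one-pixel "image" is invented for this example.

```python
import numpy as np

# A 1x1 "image" holding a pure blue pixel in OpenCV's B, G, R order.
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels, giving R, G, B order.
rgb = bgr[..., ::-1]

print(bgr[0, 0])  # blue is stored first in BGR
print(rgb[0, 0])  # blue is stored last in RGB
```

For an image loaded with `cv2.imread`, `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)` performs the same swap, which is usually needed before handing the array to a library that expects RGB.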

Scaling and aspect ratio

Note: ignoring the aspect ratio can make the resized image look squashed and distorted.

From a strictly aesthetic point of view, you almost always want to ensure the aspect ratio of the image is maintained when resizing an image – but this guideline isn’t always the case for deep learning. Most neural networks and Convolutional Neural Networks applied to the task of image classification assume a fixed size input, meaning that the dimensions of all images you pass through the network must be the same. Common choices for width and height image sizes inputted to Convolutional Neural Networks include 32 × 32, 64 × 64, 224 × 224, 227 × 227, 256 × 256, and 299 × 299.

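Data input

Fixed input size

Why it is needed
Machine learning models such as k-NN, SVM, and even CNNs require a fixed-size feature vector. Images therefore have to be preprocessed and resized to the same width and height.
Methods
① More advanced methods take the aspect ratio into account when resizing;
② Others simply resize by brute force, ignoring the aspect ratio.

Which one to choose depends on the complexity of the factors of variation in your data: in some cases ignoring the aspect ratio works fine, while in others it has to be preserved.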
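
One possible way to respect the aspect ratio (method ①) is to scale the image until it covers the target box and then crop the center. The sketch below only does the arithmetic; both function names are hypothetical and this is not necessarily the approach the book uses.

```python
def resize_dims_keep_aspect(width, height, target_w, target_h):
    """Return (new_w, new_h) that cover the target box at the original aspect ratio."""
    # Use the larger scale factor so that both sides reach the target size.
    scale = max(target_w / width, target_h / height)
    new_w = max(target_w, round(width * scale))
    new_h = max(target_h, round(height * scale))
    return new_w, new_h


def center_crop_box(new_w, new_h, target_w, target_h):
    """Return (left, top, right, bottom) of a centered target-sized crop."""
    left = (new_w - target_w) // 2
    top = (new_h - target_h) // 2
    return left, top, left + target_w, top + target_h


# A 640x480 image resized for a 32x32 network input.
new_w, new_h = resize_dims_keep_aspect(640, 480, 32, 32)
box = center_crop_box(new_w, new_h, 32, 32)
```

The brute-force alternative (method ②) skips both steps and resizes straight to 32 × 32, which squashes the 4:3 image into a square.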

Code

① Preprocess an image:
(simple_preprocessor)

import cv2


class SimplePreprocessor:
    def __init__(self, width, height, inter=cv2.INTER_AREA):
        # store the target image width, height, and interpolation
        # method used when resizing
        self.width = width
        self.height = height
        self.inter = inter

    def preprocess(self, image):
        # resize the image to a fixed size, ignoring the aspect
        # ratio
        return cv2.resize(image, (self.width, self.height),
                          interpolation=self.inter)

(This handles a single image only; loading a whole dataset is not covered here.)

② Load a collection of images from disk
(simple_dataset_loader)

import os
import cv2
import numpy as np


class SimpleDatasetLoader:
    # Method: Constructor
    def __init__(self, preprocessors=None):
        """
        :param preprocessors: List of image preprocessors
        """
        self.preprocessors = preprocessors

        if self.preprocessors is None:
            self.preprocessors = []

    # Method: Used to load a list of images for pre-processing
    def load(self, image_paths, verbose=-1):
        """
        :param image_paths: List of image paths
        :param verbose: Parameter for printing information to console
        :return: Tuple of data and labels
        """
        data, labels = [], []

        for i, image_path in enumerate(image_paths):
            image = cv2.imread(image_path)
            label = image_path.split(os.path.sep)[-2]

            if self.preprocessors is not None:
                for p in self.preprocessors:
                    image = p.preprocess(image)

            data.append(image)
            labels.append(label)

            if verbose > 0 and i > 0 and (i+1) % verbose == 0:
                print('[INFO]: Processed {}/{}'.format(i+1, len(image_paths)))

        return (np.array(data), np.array(labels))
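
The loader above takes each label from the name of the directory that contains the image, so it assumes a layout like `dataset/<class_name>/<file>`. A short sketch of that one step, using a made-up path:

```python
import os

# Hypothetical path following the layout dataset/<class_name>/<file>.
image_path = os.path.join("dataset", "cats", "cat_001.jpg")

# The second-to-last path component is the class directory, i.e. the label.
label = image_path.split(os.path.sep)[-2]

print(label)
```

If the images are not organized into one directory per class, this indexing returns the wrong label and the loader needs a different labeling rule.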

③ more advanced dataset loaders

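From k-NN to parameterized learning

Parameterized learning
The k-NN algorithm does not actually "learn" anything; it relies entirely on the stored input data to make decisions about new data.
So when it makes a mistake it has no way to "improve", and the algorithm's footprint grows linearly with the size of the input data.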
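
To see why k-NN "learns" nothing, consider the minimal sketch below: the whole "model" is the stored training data, and every prediction scans all of it. The function name and toy data are invented for this example.

```python
import numpy as np


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every stored training sample.
    distances = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of the k closest samples.
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]


# Toy data: two points near (0, 0) with label 0, two near (1, 1) with label 1.
train_x = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
train_y = np.array([0, 0, 1, 1])

pred = knn_predict(train_x, train_y, np.array([0.95, 1.0]), k=3)
```

There is no training step at all, which is exactly why prediction cost and memory use grow with the amount of stored data.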

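A better approach is to define a machine learning model that learns patterns from the input data during training. This costs more time in the training phase, but the resulting model is defined by a small number of parameters and can be represented by those parameters alone, regardless of how much training data there was. This style of machine learning is called parameterized learning.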

“A learning model that summarizes data with a set of parameters of fixed size (independent of the number of training examples) is called a parametric model. No matter how much data you throw at the parametric model, it won’t change its mind about how many parameters it needs.” – Russell and Norvig (2009) [73]

With parameterized learning we can learn from the input data and discover its underlying patterns.

The four components of parameterized learning:

Data
Scoring function
Loss function
Weights and biases
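
The four components can be put together in a few lines. The sketch below assumes a linear scoring function and a softmax cross-entropy loss, which is one common choice rather than the only one; the weights are hand-picked for illustration.

```python
import numpy as np


def scores(W, b, x):
    """Scoring function: map an input vector to one score per class."""
    return W @ x + b


def cross_entropy_loss(s, true_class):
    """Loss function: softmax cross-entropy of the scores for one sample."""
    shifted = s - np.max(s)  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[true_class]


x = np.array([1.0, 2.0])                               # data: one input vector
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])   # weights: 3 classes x 2 features
b = np.zeros(3)                                        # biases: one per class

s = scores(W, b, x)
loss_if_class_1 = cross_entropy_loss(s, 1)  # class 1 has the highest score
loss_if_class_2 = cross_entropy_loss(s, 2)  # class 2 has the lowest score
```

The loss is small when the true class has the highest score and large otherwise, and training means adjusting `W` and `b` to push it down.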

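Optimization methods and regularization

Optimization methods

Vanilla gradient descent updates the weights only once after passing over the entire training set (it performs only one weight update per epoch), which is slow and wastes computation.

Parameter: learning rate
Gradient descent is controlled by the learning rate.
The learning rate is by far the most important parameter when training your own models:

A learning rate that is too large means the model learns nothing, while one that is too small means it takes a very long time to reach a reasonable loss.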
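
The effect of the learning rate is easy to see on a toy problem. The sketch below minimizes f(w) = w², whose gradient is 2w, with three different learning rates; the function and values are chosen purely for illustration.

```python
def gradient_descent(learning_rate, steps=20, w=1.0):
    """Run plain gradient descent on f(w) = w**2 and return the final w."""
    for _ in range(steps):
        grad = 2.0 * w               # derivative of w**2
        w = w - learning_rate * grad
    return w


too_small = gradient_descent(0.001)  # barely moves away from the start
reasonable = gradient_descent(0.1)   # approaches the minimum at w = 0
too_large = gradient_descent(1.1)    # overshoots further on every step

print(too_small, reasonable, too_large)
```

With the largest rate each update jumps past the minimum and lands further away than before, which is the "learns nothing" failure mode.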

(Rather than waiting for a full epoch,) the weights are updated after every batch.

Parameter: batch size

Instead of computing our gradient over the entire data set, we instead sample our data, yielding a batch. We evaluate the gradient on the batch, and update our weight matrix W.

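Pure SGD: mini-batch size = 1, i.e. pick one random sample from the training set, compute the gradient, and update the weights;
Common practice: mini-batch size > 1, with typical values of 32, 64, 128, and 256.
Reasons: ① larger batches reduce the variance of the parameter updates; ② batch sizes that are powers of two let the underlying linear algebra libraries run more efficiently.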
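
The mini-batch idea can be sketched on a small linear regression problem. The data here is synthetic, the helper `next_batch` is a hypothetical name, and for brevity the data is not reshuffled between epochs as it normally would be.

```python
import numpy as np


def next_batch(X, y, batch_size):
    """Yield consecutive mini-batches of the data."""
    for start in range(0, X.shape[0], batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]


# Synthetic, noise-free data generated from known weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 2))
true_w = np.array([2.0, -3.0])
y = X @ true_w

w = np.zeros(2)
learning_rate = 0.1
for epoch in range(20):
    for batch_X, batch_y in next_batch(X, y, batch_size=32):
        # Gradient of the mean squared error on this batch only.
        error = batch_X @ w - batch_y
        grad = 2.0 * batch_X.T @ error / batch_X.shape[0]
        # One weight update per batch: 8 updates per epoch instead of 1.
        w -= learning_rate * grad

print(w)
```

With 256 samples and a batch size of 32, each epoch performs 8 weight updates, compared with a single update for vanilla gradient descent.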
3. Extensions to SGD

Momentum
Nesterov’s Acceleration
Anecdotal Recommendations
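
The notes list the SGD extensions by name only. As a rough sketch of what the first two do, here are the textbook update rules applied to f(w) = w²; the function names and hyperparameter values are assumptions for this example, not taken from the book.

```python
def grad_fn(w):
    """Gradient of the toy objective f(w) = w**2."""
    return 2.0 * w


def momentum_step(w, v, learning_rate=0.1, gamma=0.9):
    """Classical momentum: the velocity accumulates past gradients."""
    v = gamma * v - learning_rate * grad_fn(w)
    return w + v, v


def nesterov_step(w, v, learning_rate=0.1, gamma=0.9):
    """Nesterov acceleration: evaluate the gradient at the look-ahead point."""
    lookahead = w + gamma * v
    v = gamma * v - learning_rate * grad_fn(lookahead)
    return w + v, v


w_m, v_m = 1.0, 0.0
w_n, v_n = 1.0, 0.0
for _ in range(100):
    w_m, v_m = momentum_step(w_m, v_m)
    w_n, v_n = nesterov_step(w_n, v_n)

print(w_m, w_n)
```

The only difference between the two is where the gradient is evaluated: at the current weights for momentum, and at the point the velocity is about to carry the weights to for Nesterov.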

Regularization

Definition

Strategies that reduce the test error, possibly at the cost of a higher training error.

“Many strategies used in machine learning are explicitly designed to reduce the test error, possibly at the expense of increased training error. These strategies are collectively known as regularization.” – Goodfellow et al. [10]
Purpose: to help choose parameters so that the model generalizes well or, at the very least, to lessen the effects of overfitting. In the book's words:

keep in mind that we are working in a real-valued space, thus there are an infinite set of parameters that will obtain reasonable classification accuracy on our dataset (for some definition of “reasonable”).

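How do we go about choosing a set of parameters that help ensure our model generalizes well? Or, at the very least, lessen the effects of overfitting. The answer is regularization. Second only to your learning rate, regularization is the most important parameter of your model that you can tune.
Key insight
The loss function helps us find sets of parameters that perform well on the training set; regularization helps us pick, from among them, the ones that also perform well on the unseen test set.
Regularization techniques

Weight decay (acts on the loss function): for example L1 and L2

Example:
The penalty parameter: None, l1, l2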

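from sklearn.linear_model import SGDClassifier

# trainX and trainY are assumed to hold the training features and labels
for r in (None, "l1", "l2"):
    # "log_loss" is the logistic loss (it was named "log" before scikit-learn 1.1)
    model = SGDClassifier(loss="log_loss", penalty=r, max_iter=10,
                          learning_rate="constant", eta0=0.01, random_state=42)
    model.fit(trainX, trainY)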

Acting on the model: for example Dropout
Implicit in the training process: for example data augmentation and early stopping
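
For the weight-decay case, "acting on the loss function" means adding a penalty on the weights to the data loss. A minimal sketch, where the function names, the value of `lam`, and the weight matrices are all illustrative assumptions:

```python
import numpy as np


def l2_penalty(W):
    """L2 penalty: sum of squared weights."""
    return np.sum(W ** 2)


def l1_penalty(W):
    """L1 penalty: sum of absolute weights."""
    return np.sum(np.abs(W))


def regularized_loss(data_loss, W, lam=0.01, penalty=l2_penalty):
    """Total loss = data loss + lam * penalty on the weights."""
    return data_loss + lam * penalty(W)


# Two weight matrices that are assumed to reach the same data loss of 1.0.
W_small = np.array([[0.5, -0.5]])
W_large = np.array([[5.0, -5.0]])

loss_small = regularized_loss(1.0, W_small)
loss_large = regularized_loss(1.0, W_large)
```

Both weight matrices fit the training data equally well here, yet the regularized loss prefers the smaller weights, which is the sense in which regularization chooses among parameter sets.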

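Neural networks

A simple NN takes the weighted sum of the inputs x and the weights w. This weighted sum is then passed through an activation function to determine whether the neuron fires.
(In this simple picture the neuron has only two output states, and which one it takes is decided entirely by the activation function.)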
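
The fire / do-not-fire behavior corresponds to a step activation. A minimal sketch with made-up inputs and weights (the bias term `b` is an addition for this example):

```python
import numpy as np


def step(z):
    """Step activation: 1 (fire) if the input is positive, else 0."""
    return 1 if z > 0 else 0


def neuron(x, w, b=0.0):
    """Weighted sum of the inputs, passed through the step activation."""
    return step(np.dot(w, x) + b)


x = np.array([1.0, 0.0, 1.0])
w = np.array([0.5, -1.0, 0.25])

fires = neuron(x, w)            # weighted sum is 0.75, so the neuron fires
silent = neuron(x, w, b=-1.0)   # the bias shifts the sum to -0.25, so it does not
```

With other activations such as sigmoid or ReLU the output is a continuous value rather than a strict 0 or 1.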

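Activation functions

Roughly "two families":
Classic ones: step, sigmoid, tanh…
Modern ones: ReLU, Leaky ReLU, ELU…
(More detail is in the appendix.)
How to choose? Recommendation:
Start with a standard ReLU and tune the other parameters of the network, then try swapping in the more exotic ReLU variants.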
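
For reference, several of the activation functions named above can be written directly in NumPy. The `alpha` defaults are common choices, not values prescribed by the book.

```python
import numpy as np


def sigmoid(z):
    """Squash the input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    """Zero for negative inputs, identity for positive inputs."""
    return np.maximum(0.0, z)


def leaky_relu(z, alpha=0.01):
    """Like ReLU, but with a small slope for negative inputs."""
    return np.where(z > 0, z, alpha * z)


def elu(z, alpha=1.0):
    """Like ReLU, but negative inputs decay smoothly towards -alpha."""
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))


z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), np.tanh(z), relu(z), leaky_relu(z), elu(z))
```

The ReLU variants differ only in how they treat negative inputs, which is why swapping one for another is a cheap experiment once the rest of the network is tuned.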

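(Feedforward) neural networks

(In a traditional feedforward neural network, every layer is a fully-connected (FC) layer.)

Layer 0: contains 3 inputs, our x_i values. These can be raw pixel intensities of an image or a feature vector extracted from the image.
Layers 1 and 2: hidden layers containing 2 and 3 nodes, respectively.
Layer 3: the output layer, or visible layer, where we obtain the overall output classification from the network. (The number of nodes in the output layer and the possible output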