Computer Vision (12): CNN Visualization

Robin_shie

Article link: https://blog.csdn.net/u010676526/article/details/80241738

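I. Why neural networks outperform traditional classifiers

1. Traditional classifiers such as LR (logistic regression) or a linear SVM mostly perform linear separation. Treat every sample as a point, some blue and some green: a traditional classifier looks for a straight line that separates the two classes.

For samples that are not linearly separable, a kernel function or a feature mapping can turn the boundary into a curve or a curved surface. The results are still often poor, mainly because real samples are rarely distributed as regularly as in a textbook figure, and we cannot control their distribution. Once a few blue points are mixed in among the green ones, the classes become hard to separate. Even if a curve can separate them, it becomes highly contorted, which makes it hard to learn and prone to overfitting. Extracting the features that distinguish the two classes is a further problem. These are the reasons such classifiers have fallen out of favor for this kind of task.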

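2. How do neural networks manage it?

A neural network in effect uses AND and OR operations to carve the relevant region out of the sample space. Take a green region bounded by straight edges: each edge can be seen as a linear classifier that splits the samples into positive and negative examples. ANDing these classifiers yields one green region, and several such regions are then combined with OR. A single neuron can implement either AND or OR, and given samples the network learns the combination by itself. That is its advantage. In short: by combining linear classifiers with AND and OR, a network can in principle classify points with an arbitrary distribution in the plane.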
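The AND/OR construction can be sketched with hand-set weights. This is only an illustration, not a trained network: the three half-planes, the hard `step` activation, and the helper names `neuron`, `in_triangle`, and `logical_or` are all assumptions made for the example.

```python
import numpy as np

def step(z):
    """Hard threshold: 1 where z > 0, else 0."""
    return (z > 0).astype(float)

def neuron(x, w, b):
    """One neuron: a linear classifier followed by a threshold."""
    return step(x @ w + b)

# Three linear classifiers whose positive sides enclose the triangle
# with vertices (0, 0), (1, 0), (0, 1): x > 0, y > 0, x + y < 1.
W_lines = np.array([[1.0, 0.0, -1.0],
                    [0.0, 1.0, -1.0]])   # shape (2 inputs, 3 classifiers)
b_lines = np.array([0.0, 0.0, 1.0])

def in_triangle(points):
    votes = neuron(points, W_lines, b_lines)   # (n, 3) values in {0, 1}
    # AND neuron: fires only if all three votes are 1 (sum > 2.5).
    return neuron(votes, np.ones(3), -2.5)

def logical_or(a, b):
    # OR neuron: fires if at least one input is 1 (sum > 0.5).
    return neuron(np.stack([a, b], axis=1), np.ones(2), -0.5)

pts = np.array([[0.2, 0.2], [0.8, 0.8], [-0.1, 0.5]])
inside = in_triangle(pts)
print(inside)   # [1. 0. 0.]
```

Only the first point lies inside the triangle. A second region built the same way could be merged with `logical_or`, which is the OR step described above.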

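II. What is a convolutional neural network

A convolutional neural network is still a layered network, but the function and form of the layers have changed.

1. Layer structure

The layers are: input layer, convolutional layer (CONV layer), ReLU activation layer, pooling layer, and fully connected layer (FC layer).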

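(1) Input layer

There are three common ways to preprocess image data:

Mean subtraction: center every dimension of the input at 0, that is, compute the mean over all samples and subtract it from every sample.

Normalization: scale the amplitudes to a common range, for example compress the sample data to 0-1.

PCA/whitening: PCA reduces dimensionality; whitening normalizes the amplitude along each feature axis of the data.

For images, CNNs often use mean subtraction only.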
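The three preprocessing steps can be sketched in NumPy on a small synthetic data matrix. The data, the choice of keeping 2 principal components, and the `1e-5` stabilizer are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(200, 4))   # 200 samples, 4 features

# Mean subtraction: remove the per-feature mean computed over all samples.
X_centered = X - X.mean(axis=0)

# Normalization: rescale each feature to the range [0, 1].
X_min, X_max = X.min(axis=0), X.max(axis=0)
X_scaled = (X - X_min) / (X_max - X_min)

# PCA and whitening on the centered data.
cov = X_centered.T @ X_centered / X_centered.shape[0]
U, S, _ = np.linalg.svd(cov)            # columns of U: principal directions
X_rot = X_centered @ U                  # decorrelated data
X_pca = X_rot[:, :2]                    # keep the 2 strongest components
X_white = X_rot / np.sqrt(S + 1e-5)     # unit variance along every axis

print(X_centered.mean(axis=0).round(6))
print(X_scaled.min(), X_scaled.max())
print(X_white.var(axis=0).round(3))
```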

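(2) Convolutional layer (CONV layer)

It is no longer fully connected but locally connected. Each neuron can be viewed as a filter. The filter slides a window (the receptive field) over the input and computes on the local data.

Three more concepts matter here:

Depth: for the input it is 3, the three RGB colour channels of the image; for the output volume it equals the number of filters.

Stride: how far the window moves at each step.

Zero-padding: to let the sliding window reach the border exactly, zeros are added around the input; a padding of n adds n rings of zeros.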
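Stride and zero-padding together determine the spatial size of the output, by the standard relation (W - F + 2P) / S + 1 for input width W, filter width F, padding P, and stride S. The helper name `conv_output_size` below is hypothetical.

```python
def conv_output_size(w, f, stride=1, padding=0):
    """Spatial output size for input width w and filter width f."""
    span = w - f + 2 * padding
    if span % stride != 0:
        raise ValueError("the filter does not tile the input evenly")
    return span // stride + 1

print(conv_output_size(28, 3))                       # 26
print(conv_output_size(5, 3, stride=2, padding=1))   # 3
print(conv_output_size(227, 11, stride=4))           # 55
```

The first call matches the 28 -> 26 shrinkage noted in the comments of the PyTorch `Net` defined later in this post.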

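Here is a concrete convolution example:

The output depth in this example is 2 because there are only two filters. On each colour channel a 3*3 window slides over the input. The values in the window are multiplied element-wise with the filter weights w, which gives one value per channel, and these 3 values are summed (plus the filter's bias) to give one entry of the output volume. Convolving the input with filter w0 gives the first matrix of the output volume, and filter w1 gives the second.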
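The computation just described can be written as a naive loop. This is a sketch for readability rather than speed; the function name `conv2d`, the (channels, height, width) layout, and the random toy input are assumptions.

```python
import numpy as np

def conv2d(x, w, b, stride=1, pad=0):
    """x: (C, H, W) input, w: (K, C, F, F) filters, b: (K,) biases."""
    C, H, W = x.shape
    K, _, F, _ = w.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))   # zero-padding
    out_h = (H - F + 2 * pad) // stride + 1
    out_w = (W - F + 2 * pad) // stride + 1
    out = np.zeros((K, out_h, out_w))
    for k in range(K):                 # one output matrix per filter
        for i in range(out_h):
            for j in range(out_w):
                r, c = i * stride, j * stride
                window = xp[:, r:r + F, c:c + F]
                # multiply element-wise, sum over all channels, add the bias
                out[k, i, j] = np.sum(window * w[k]) + b[k]
    return out

rng = np.random.default_rng(0)
x = rng.integers(0, 3, size=(3, 5, 5)).astype(float)      # 5x5 RGB input
w = rng.integers(-1, 2, size=(2, 3, 3, 3)).astype(float)  # two 3x3x3 filters
b = np.array([1.0, 0.0])
out = conv2d(x, w, b, stride=2, pad=1)
print(out.shape)   # (2, 3, 3)
```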

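Another very important property of the convolutional layer is parameter sharing: the weights with which a neuron connects to its data window stay fixed as the window slides. One way to think about it:

With fixed connection weights, each neuron acts as a template that attends to a single feature.

The benefit is far fewer weights to estimate. For example, the first convolutional layer of AlexNet goes from about 100 million adjustable parameters to about 35,000.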
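The AlexNet figure can be checked with a few lines of arithmetic. The layer dimensions below (227x227x3 input, 96 filters of size 11x11x3, stride 4, no padding) are the ones usually quoted for AlexNet's first convolutional layer and are an assumption here, since the text does not state them.

```python
# Assumed dimensions of AlexNet's first CONV layer.
out_size = (227 - 11) // 4 + 1          # 55 positions per side
neurons = out_size * out_size * 96      # 290,400 output neurons
weights_per_neuron = 11 * 11 * 3 + 1    # 363 weights + 1 bias

# Without sharing, every output neuron owns its weights.
without_sharing = neurons * weights_per_neuron
# With sharing, all neurons of one filter use the same weights.
with_sharing = 96 * weights_per_neuron

print(without_sharing)   # 105705600
print(with_sharing)      # 34944
```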

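(3) Activation layer (ReLU)

It applies a nonlinear mapping to the output of the convolutional layer.

Common activation functions: Sigmoid, Tanh (hyperbolic tangent), ReLU, Leaky ReLU, ELU, Maxout

Sigmoid: the earliest choice, now rarely used. When |x| is large the output saturates near 0 or 1 and the gradient is close to 0. Since optimization relies on gradients, the weights then barely update.

ReLU: a commonly used activation function. It converges quickly and its gradient is cheap to compute, but it is fragile: for x less than 0 the gradient is exactly 0, so a unit can stop updating.

Leaky ReLU: does not saturate or die, and is also fast to compute.

Exponential linear unit (ELU): has all the advantages of ReLU, does not die, and its mean output is close to 0; the exponential makes it slightly more expensive to compute.

Maxout: piecewise linear (two lines joined together), does not saturate or die, but adds many parameters.

Practical advice:

1) Do not use sigmoid

2) Try ReLU first because it is fast, but be careful

3) If 2) fails, use Leaky ReLU or Maxout

4) tanh gives good results in some cases, but rarely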
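The saturation and "dying" behaviour mentioned above is easy to see numerically. The following sketch defines the activations in NumPy; the slope `alpha=0.01` for Leaky ReLU and `alpha=1.0` for ELU are common defaults chosen here as assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(float)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def maxout(x, w1, b1, w2, b2):
    """Maximum of two linear pieces, hence twice the parameters."""
    return np.maximum(x @ w1 + b1, x @ w2 + b2)

x = np.array([-10.0, -1.0, 0.5, 10.0])
print(sigmoid_grad(x))   # nearly 0 at both ends: saturation
print(relu_grad(x))      # exactly 0 for negative inputs: "dead" region
print(leaky_relu(x))     # small negative slope keeps a gradient alive
```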

(4) Pooling layer

It is usually placed between consecutive convolutional layers. It reduces the amount of data and parameters, which reduces overfitting.
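Max pooling, the variant used in the PyTorch model later in this post, can be sketched as follows. The function name `max_pool2d` and the (channels, height, width) layout are assumptions for the example.

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """x: (C, H, W) feature maps; keeps the maximum of each window."""
    C, H, W = x.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    out = np.zeros((C, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[:, i, j] = x[:, r:r + size, c:c + size].max(axis=(1, 2))
    return out

x = np.arange(16, dtype=float).reshape(1, 4, 4)
pooled = max_pool2d(x)
print(pooled)
# [[[ 5.  7.]
#   [13. 15.]]]
```

A 2x2 window with stride 2 keeps one value out of four, so each feature map shrinks to a quarter of its size.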

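(5) Fully connected layer (FC layer)

Every neuron in one layer has a weighted connection to every neuron in the next. Fully connected layers usually sit at the tail of a convolutional network. A common explanation is that the FC layer makes the most of the small amount of information that remains after window sliding and pooling in order to recover the original input information.

(6) The general structure of a CNN can be summarized as:

1) INPUT 2) [[CONV -> RELU]*N -> POOL?]*M 3) [FC -> RELU]*K or FC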

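2. Typical CNNs

LeNet: the earliest CNN, used for digit recognition.

AlexNet: won ILSVRC 2012 by a wide margin over the runner-up; deeper than LeNet, with several convolutional layers stacked directly on top of each other.

ZF Net: ILSVRC 2013 winner.

GoogLeNet: ILSVRC 2014 winner.

VGGNet: an ILSVRC 2014 entry; slightly behind GoogLeNet on image classification, but remarkably effective on many transfer learning problems (such as object detection).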

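3. Fine-tuning

(1) What fine-tuning is: take all or part of the weights of a model pretrained for another task and use them as initial values for training.

(2) Why fine-tune: first, training a convolutional network from scratch easily runs into problems; second, fine-tuning converges quickly to a reasonably good state.

(3) How: usually reuse the weights of layers that are identical, initialize newly defined layers with random weights, and take care to raise the learning rate of the new layers while lowering the learning rate of the reused layers.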
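In PyTorch, different learning rates for reused and new layers can be expressed with optimizer parameter groups. The sketch below is illustrative only: `backbone` is a hypothetical stand-in for a pretrained feature extractor (no weights are actually loaded), `head` is the newly defined layer, and the two learning rates are arbitrary choices.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Stand-in for reused layers; in practice their weights would be
# loaded from a pretrained checkpoint (not done here).
backbone = nn.Sequential(
    nn.Conv2d(1, 10, 3), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(10, 20, 3), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Flatten(),
)
# Newly defined layer for the new task, randomly initialized.
head = nn.Linear(20 * 5 * 5, 10)
model = nn.Sequential(backbone, head)

# One parameter group per part: small steps for the reused layers,
# larger steps for the new layer.
optimizer = optim.SGD(
    [
        {"params": backbone.parameters(), "lr": 1e-4},
        {"params": head.parameters(), "lr": 1e-2},
    ],
    momentum=0.9,
)

lrs = [group["lr"] for group in optimizer.param_groups]
out_shape = tuple(model(torch.zeros(2, 1, 28, 28)).shape)
print(lrs, out_shape)
```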

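III. Common CNN frameworks

1. Caffe: a mainstream CV toolkit from Berkeley with C++, Python, and MATLAB support; its Model Zoo offers many pretrained models.

2. TensorFlow: Google's deep learning framework; TensorBoard makes visualization convenient, data and model parallelism are well supported, and it is fast.

3. Torch: the convolutional neural network toolkit used at Facebook, with a native interface for temporal convolution; it is intuitive to use, and defining new network layers is simple.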

IV. Typical applications

1. Image recognition and retrieval

2. Face recognition

3. Gender/age/emotion recognition

4. Object detection

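V. Notes on training CNNs

1. Training a neural network with mini-batch SGD proceeds as follows:

Repeat:

① Sample a batch of data (for example 32 images)

② Run the forward pass to obtain the loss

③ Backpropagate to compute the gradient (for the one batch)

④ Update the weight parameters with this gradient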
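The four steps of the loop can be sketched on a toy linear regression problem, where the gradient has a closed form and no framework is needed. The synthetic data, the learning rate of 0.1, and the 500 iterations are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=1000)

w = np.zeros(3)
lr, batch_size = 0.1, 32
for step in range(500):
    # 1. sample a batch
    idx = rng.choice(len(X), size=batch_size, replace=False)
    xb, yb = X[idx], y[idx]
    # 2. forward pass: mean squared error on the batch
    err = xb @ w - yb
    loss = np.mean(err ** 2)
    # 3. backward pass: gradient of the loss with respect to w
    grad = 2.0 * xb.T @ err / batch_size
    # 4. update the weights with this batch gradient
    w -= lr * grad

print(w.round(2))
```

The PyTorch `train` function later in this post follows the same four steps, with `loss.backward()` and `optimizer.step()` taking the place of steps 3 and 4.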

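2. Mean subtraction

There are generally two ways to subtract the mean. The first computes a mean for every pixel position in each of the 3 colour channels (a mean image) and subtracts it from the corresponding pixel, as in AlexNet. The second computes a single set of numbers over the whole training set, one mean per colour channel regardless of pixel position, as in VGGNet.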
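The two variants differ only in which axes the mean is taken over. In the sketch below the random array `images` is a toy stand-in for a training set, in (N, H, W, C) layout; both are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-in for a training set: 100 RGB images of 8x8 pixels.
images = rng.uniform(0, 255, size=(100, 8, 8, 3))

# Variant 1 (AlexNet style): a mean image, one value per pixel and channel.
mean_image = images.mean(axis=0)               # shape (8, 8, 3)
centered_a = images - mean_image

# Variant 2 (VGGNet style): one mean per colour channel, 3 numbers in total.
channel_mean = images.mean(axis=(0, 1, 2))     # shape (3,)
centered_b = images - channel_mean

print(mean_image.shape, channel_mean.shape)    # (8, 8, 3) (3,)
```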

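3. Weight initialization

1) Initialize W with random values drawn from a zero-mean Gaussian. This is fine for shallow networks, but in deep networks it tends to make the distribution of activations uneven from layer to layer.

2) With a sigmoid activation, if a layer has `input` incoming neurons and `output` outgoing neurons, W can be a random matrix of shape (input, output) divided by the square root of input: W = randn(input, output) / sqrt(input).

3) With ReLU, currently the most popular activation function, the scheme above breaks down again; dividing by the square root of (input/2) works instead.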
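The effect of the three schemes can be observed by pushing random data through a stack of layers and measuring the spread of the final activations. This is a rough experiment under stated assumptions: 10 layers of width 500, and tanh used in place of sigmoid as a zero-centered saturating activation. The helper `final_activation_std` is hypothetical.

```python
import numpy as np

def final_activation_std(init, activation, layers=10, width=500, seed=0):
    """Push random data through `layers` dense layers, return the output std."""
    rng = np.random.default_rng(seed)
    h = rng.normal(size=(1000, width))
    for _ in range(layers):
        w = init(rng, width, width)
        h = activation(h @ w)
    return h.std()

def small_gaussian(rng, n_in, n_out):
    return 0.01 * rng.normal(size=(n_in, n_out))

def scaled_by_input(rng, n_in, n_out):
    return rng.normal(size=(n_in, n_out)) / np.sqrt(n_in)

def scaled_for_relu(rng, n_in, n_out):
    return rng.normal(size=(n_in, n_out)) / np.sqrt(n_in / 2)

def relu(x):
    return np.maximum(0.0, x)

std_small_tanh = final_activation_std(small_gaussian, np.tanh)
std_scaled_tanh = final_activation_std(scaled_by_input, np.tanh)
std_scaled_relu = final_activation_std(scaled_by_input, relu)
std_relu_fix = final_activation_std(scaled_for_relu, relu)

print("small gaussian + tanh :", std_small_tanh)    # collapses toward 0
print("/ sqrt(input) + tanh  :", std_scaled_tanh)   # stays in a usable range
print("/ sqrt(input) + relu  :", std_scaled_relu)   # shrinks layer by layer
print("/ sqrt(input/2) + relu:", std_relu_fix)      # roughly preserved
```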

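About Batch Normalization: it is usually applied after a fully connected layer and before the activation layer, normalizing each feature over the mini-batch to zero mean and unit variance.

It automatically keeps the outputs from diverging, which would otherwise kill the training of the whole network. The stated benefits are the following four:

1) Gradients flow through the network more smoothly

2) A higher learning rate is tolerated

3) Dependence on the initial values is reduced

4) It can also be seen as a form of regularization, which reduces the need for dropout.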
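A training-mode forward pass of batch normalization fits in a few lines. This sketch assumes a 2-D mini-batch of shape (N, D) and leaves out the running statistics that are used at test time; `gamma` and `beta` are the learnable scale and shift.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """x: (N, D) mini-batch; normalize each feature, then scale and shift."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=50.0, scale=20.0, size=(64, 5))   # badly scaled activations
out = batchnorm_forward(x, gamma=np.ones(5), beta=np.zeros(5))
print(out.mean(axis=0).round(3))
print(out.std(axis=0).round(3))
```

Whatever the scale of the incoming activations, the output of each feature has mean close to 0 and standard deviation close to 1, which is what keeps the values from diverging.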

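4. Dropout

Dropout is a regularization method that prevents overfitting. The classic approach adds a penalty on all weights W to the loss; a neural network has so many weights that this inflates the loss and spends extra time summing them, so dropout offers an alternative. Put simply, dropout means not switching on all learning units at once.

The simplest implementation draws a random binary mask in which each unit is kept with probability p and multiplies the activations by that mask.

Dropout is generally applied only during training and is switched off for testing or prediction. In practice, either the activations are multiplied by p at prediction time so that they match their expected value during training, or prediction is left unchanged and every dropout layer instead computes X/p during training.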
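Both variants can be sketched in NumPy. To avoid ambiguity the keep probability is called `keep_prob` here (the p of the text); the function names are hypothetical. Note that PyTorch's `nn.Dropout(p=...)`, used in the model later in this post, takes the probability of dropping a unit instead.

```python
import numpy as np

rng = np.random.default_rng(0)
keep_prob = 0.5   # probability of keeping a unit

def dropout_train(x):
    """Plain dropout: zero each unit with probability 1 - keep_prob."""
    mask = rng.random(x.shape) < keep_prob
    return x * mask

def dropout_predict(x):
    """Plain dropout at prediction time: scale to the training expectation."""
    return x * keep_prob

def inverted_dropout_train(x):
    """Inverted dropout: rescale during training, leave prediction untouched."""
    mask = (rng.random(x.shape) < keep_prob) / keep_prob
    return x * mask

x = np.ones((1000, 100))
mean_train = dropout_train(x).mean()
mean_predict = dropout_predict(x).mean()
mean_inverted = inverted_dropout_train(x).mean()
print(mean_train)      # close to keep_prob
print(mean_predict)    # exactly keep_prob
print(mean_inverted)   # close to 1.0, so prediction needs no scaling
```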

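Ways to understand why dropout prevents overfitting:

View 1: do not let the network memorize so much.

View 2: each step switches off part of the units and so yields a new model; the models are effectively ensembled at the end.

The text above is reposted from (https://www.cnblogs.com/softzrp/p/6724884.html)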

VI. Visualizing a CNN with PyTorch

import torch
import torchvision

from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
from torchvision import transforms

data_transform = transforms.ToTensor()
train_data = MNIST(root='./data', train=True,
                                   download=True, transform=data_transform)

test_data = MNIST(root='./data', train=False,
                                  download=True, transform=data_transform)
print('Train data, number of images: ', len(train_data))
print('Test data, number of images: ', len(test_data))

Train data, number of images:  60000
Test data, number of images:  10000

batch_size = 16
train_loader = DataLoader(train_data,batch_size=batch_size,shuffle=True)
test_loader = DataLoader(test_data,batch_size = batch_size,shuffle=True)

classes =['0','1','2','3','4','5','6','7','8','9']

import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline
    
# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = next(dataiter)
images = images.numpy()

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(28, 4))
for idx in np.arange(batch_size):
    ax = fig.add_subplot(2, batch_size//2, idx+1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title(classes[labels[idx]])

import torch.nn as nn
import torch.nn.functional as F
core_size = 3
p=0.4
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input channel (grayscale), 10 output channels (depth), kernel size core_size
        # input size = (1*28*28)
        self.conv1 = nn.Conv2d(1,10,core_size)
        #size = (10*26*26)
        self.pool = nn.MaxPool2d(2,2)
        # size =  10*13*13
        self.conv2 = nn.Conv2d(10,20,core_size)
        # size =  20*11*11
        # after pool size =  20*5*5
        self.fc1 = nn.Linear(20*5*5,100)
        self.fc1_drop = nn.Dropout(p=p)
        self.out = nn.Linear(100,10)
        
    # define the feedforward behavior
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = self.fc1_drop(x)
        x = self.out(x)
        # final output
        return x

# instantiate and print your Net
net = Net()
print(net)

Net (
  (conv1): Conv2d(1, 10, kernel_size=(3, 3), stride=(1, 1))
  (pool): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
  (conv2): Conv2d(10, 20, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear (500 -> 100)
  (fc1_drop): Dropout (p = 0.4)
  (out): Linear (100 -> 10)
)

import torch.optim as optim

criterion = nn.CrossEntropyLoss()

optimizer = optim.SGD(net.parameters(),lr = 0.001,momentum=0.9)

from torch.autograd import Variable

def train(n_epochs):
    
    for epoch in range(n_epochs):  # loop over the dataset multiple times

        running_loss = 0.0
        for batch_i, data in enumerate(train_loader):
            # get the input images and their corresponding labels
            inputs, labels = data

            # tensors are passed to the model directly; the Variable wrapper is no longer needed

            # zero the parameter (weight) gradients
            optimizer.zero_grad()

            # forward pass to get outputs
            outputs = net(inputs)

            # calculate the loss
            loss = criterion(outputs, labels)

            # backward pass to calculate the parameter gradients
            loss.backward()

            # update the parameters
            optimizer.step()

            # print loss statistics
            running_loss += loss.item()
            if batch_i % 1000 == 999:    # print every 1000 mini-batches
                print('Epoch: {}, Batch: {}, Avg. Loss: {}'.format(epoch + 1, batch_i+1, running_loss/1000))
                running_loss = 0.0

    print('Finished Training')

# define the number of epochs to train for
n_epochs = 5 # start small to see if your model works, initially

# call train
train(n_epochs)

Epoch: 1, Batch: 1000, Avg. Loss: 1.5169855401515961
Epoch: 1, Batch: 2000, Avg. Loss: 0.4941652193292975
Epoch: 1, Batch: 3000, Avg. Loss: 0.3443797050341964
Epoch: 2, Batch: 1000, Avg. Loss: 0.24720131688704713
Epoch: 2, Batch: 2000, Avg. Loss: 0.21661910496046766
Epoch: 2, Batch: 3000, Avg. Loss: 0.18730598537158222
Epoch: 3, Batch: 1000, Avg. Loss: 0.15680811562226155
Epoch: 3, Batch: 2000, Avg. Loss: 0.16085775446286424
Epoch: 3, Batch: 3000, Avg. Loss: 0.13771982360375115
Epoch: 4, Batch: 1000, Avg. Loss: 0.1270391019823146
Epoch: 4, Batch: 2000, Avg. Loss: 0.12446171787852654
Epoch: 4, Batch: 3000, Avg. Loss: 0.11772122978675179
Epoch: 5, Batch: 1000, Avg. Loss: 0.10325256000779336
Epoch: 5, Batch: 2000, Avg. Loss: 0.10139964807790239
Epoch: 5, Batch: 3000, Avg. Loss: 0.09845059596118517
Finished Training

# initialize tensor and lists to monitor test loss and accuracy
test_loss = torch.zeros(1)
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

# set the module to evaluation mode
net.eval()

for batch_i, data in enumerate(test_loader):
    
    # get the input images and their corresponding labels
    inputs, labels = data
    
    # disable gradient tracking for evaluation
    # (torch.no_grad() replaces the removed volatile=True flag)
    with torch.no_grad():
        # forward pass to get outputs
        outputs = net(inputs)

        # calculate the loss
        loss = criterion(outputs, labels)
            
    # update average test loss 
    test_loss = test_loss + ((torch.ones(1) / (batch_i + 1)) * (loss.data - test_loss))
    
    # get the predicted class from the maximum value in the output-list of class scores
    _, predicted = torch.max(outputs.data, 1)
    
    # compare predictions to true label
    correct = np.squeeze(predicted.eq(labels.data.view_as(predicted)))
    
    # calculate test accuracy for *each* object class
    # (len(labels) also handles a smaller final batch)
    for i in range(len(labels)):
        label = labels[i].item()
        class_correct[label] += correct[i].item()
        class_total[label] += 1

print('Test Loss: {:.6f}\n'.format(test_loss.numpy()[0]))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            classes[i], 100 * class_correct[i] / class_total[i],
            np.sum(class_correct[i]), np.sum(class_total[i])))
    else:
        print('Test Accuracy of %5s: N/A (no training examples)' % (classes[i]))

        
print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))

Test Loss: 0.057502

Test Accuracy of     0: 99% (971/980)
Test Accuracy of     1: 99% (1125/1135)
Test Accuracy of     2: 97% (1006/1032)
Test Accuracy of     3: 97% (986/1010)
Test Accuracy of     4: 98% (966/982)
Test Accuracy of     5: 98% (876/892)
Test Accuracy of     6: 98% (945/958)
Test Accuracy of     7: 98% (1010/1028)
Test Accuracy of     8: 97% (946/974)
Test Accuracy of     9: 96% (974/1009)

Test Accuracy (Overall): 98% (9805/10000)

# obtain one batch of test images
dataiter = iter(test_loader)
images, labels = next(dataiter)
# get predictions without tracking gradients
with torch.no_grad():
    preds = np.squeeze(net(images).max(1)[1].numpy())
images = images.numpy()

# plot the images in the batch, along with predicted and true labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(batch_size):
    ax = fig.add_subplot(2, batch_size//2, idx+1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title("{} ({})".format(classes[preds[idx]], classes[labels[idx]]),
                 color=("green" if preds[idx]==labels[idx] else "red"))

Save Model

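import os

model_dir = './saved_models/'
model_name = 'model_1.pt'
# create the target directory if it does not exist yet
os.makedirs(model_dir, exist_ok=True)
torch.save(net.state_dict(), model_dir+model_name)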

Load Model

net = Net()
net.load_state_dict(torch.load('saved_models/model_1.pt'))
print(net)

Net (
  (conv1): Conv2d(1, 10, kernel_size=(3, 3), stride=(1, 1))
  (pool): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
  (conv2): Conv2d(10, 20, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear (500 -> 100)
  (fc1_drop): Dropout (p = 0.4)
  (out): Linear (100 -> 10)
)

# Get the weights in the first conv layer
weights = net.conv1.weight.data
w = weights.numpy()

# for 10 filters
fig=plt.figure(figsize=(20, 8))
columns = 5
rows = 2
for i in range(0, columns*rows):
    fig.add_subplot(rows, columns, i+1)
    plt.imshow(w[i][0], cmap='gray')
    
print('First convolutional layer')
plt.show()

First convolutional layer

# obtain one batch of testing images
dataiter = iter(test_loader)
images, labels = next(dataiter)
images = images.numpy()

# select an image by index
idx = 3
img = np.squeeze(images[idx])

# Use OpenCV's filter2D function 
# apply a specific set of filter weights (like the ones displayed above) to the test image

import cv2
plt.imshow(img, cmap='gray')

weights = net.conv1.weight.data
w = weights.numpy()

# 1. first conv layer
# for 10 filters
fig=plt.figure(figsize=(30, 10))
columns = 5*2
rows = 2
for i in range(0, columns*rows):
    fig.add_subplot(rows, columns, i+1)
    if ((i%2)==0):
        plt.imshow(w[int(i/2)][0], cmap='gray')
    else:
        c = cv2.filter2D(img, -1, w[int((i-1)/2)][0])
        plt.imshow(c, cmap='gray')
plt.show()

# Same process but for the second conv layer (20, 3x3 filters):
plt.imshow(img, cmap='gray')

# second conv layer, conv2
weights = net.conv2.weight.data
w = weights.numpy()

# 2. second conv layer
# for 20 filters; only the first input-channel slice [0] of each filter is shown
fig=plt.figure(figsize=(30, 10))
columns = 5*2
rows = 2*2
for i in range(0, columns*rows):
    fig.add_subplot(rows, columns, i+1)
    if ((i%2)==0):
        plt.imshow(w[int(i/2)][0], cmap='gray')
    else:
        c = cv2.filter2D(img, -1, w[int((i-1)/2)][0])
        plt.imshow(c, cmap='gray')
plt.show()