Deep Learning for Computer Vision (2)

MrUncle德鲁

Category: Machine Learning

Article link: https://blog.csdn.net/FANGLICHAOLIUJIE/article/details/96863418

Using a Pretrained Network

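When applying deep learning to a small dataset, an efficient alternative to training from scratch is to reuse a model that someone else has already trained on a large dataset.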

from keras.applications import VGG16
conv_base = VGG16(weights='imagenet',
                 include_top=False,  # whether to include the densely connected classifier on top
                 input_shape=(150,150,3))

Using TensorFlow backend.

conv_base.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 150, 150, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 150, 150, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 150, 150, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 75, 75, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 75, 75, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 75, 75, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 37, 37, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 37, 37, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 37, 37, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 37, 37, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 18, 18, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 18, 18, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 18, 18, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 18, 18, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 9, 9, 512)         0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 9, 9, 512)         2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 9, 9, 512)         2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 9, 9, 512)         2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 4, 4, 512)         0         
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________

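The summary shows that the final feature map has shape (4, 4, 512). A densely connected classifier now has to be added on top of these features, and there are two ways to do it:
- A: Run the convolutional base over your own dataset, save the output to disk as NumPy arrays, and then feed that data into a standalone densely connected classifier.
  - Characteristics: fast and computationally cheap, but data augmentation cannot be used.
- B: Extend the existing model by adding Dense layers directly on top of conv_base, and run the whole model end to end on the input data.
  - Characteristics: data augmentation can be used, but the computational cost is higher.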
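The (4, 4, 512) shape can be checked by hand: the summary lists five max-pooling stages, and each one halves the spatial size, rounding down. A minimal sketch of that arithmetic follows. `conv_base_output_size` is a hypothetical helper name, and the behaviour it assumes (size-preserving convolutions, 2x2 pooling) is read off the summary rather than taken from the Keras source.

```python
def conv_base_output_size(input_size, n_pool_stages=5):
    """Spatial size after the VGG16 convolutional base.

    Assumes, as the summary suggests, that the convolutions preserve the
    spatial size and that each max-pooling stage halves it, rounding down.
    """
    size = input_size
    for _ in range(n_pool_stages):
        size //= 2
    return size


print(conv_base_output_size(150))  # 150 -> 75 -> 37 -> 18 -> 9 -> 4
```

The same helper gives 7 for a 224 x 224 input, which matches the (7, 7, 512) output of block5_pool in the full VGG16 summary.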

1. Approach A

import os
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

base_dir = r"F:\Data_Set"
train_dir = os.path.join(base_dir, 'train_dir')
validation_dir = os.path.join(base_dir, 'validation_dir')
test_dir = os.path.join(base_dir, 'test_dir')

datagen = ImageDataGenerator(rescale=1./255)
batch_size = 20

def extract_features(directory, sample_count):
    # Pre-allocate arrays for the conv base outputs and their labels
    features = np.zeros(shape=(sample_count, 4,4,512))
    labels = np.zeros(shape=(sample_count))
    generator = datagen.flow_from_directory(directory,
                                           target_size=(150,150),
                                           batch_size=batch_size,
                                           class_mode="binary")
    i = 0
    for input_batch, labels_batch in generator:
        # Run the batch through the convolutional base to get its feature maps
        features_batch = conv_base.predict(input_batch)
        features[i * batch_size:(i + 1) * batch_size] = features_batch
        labels[i * batch_size:(i+1) * batch_size] = labels_batch
        i += 1
        # The generator loops forever, so stop once every image has been seen
        if i* batch_size >= sample_count:
            break
    return features, labels

train_features, train_labels = extract_features(train_dir, 2000)
validation_features, validation_labels = extract_features(validation_dir, 1000)
test_features, test_labels = extract_features(test_dir, 1000)

Found 2000 images belonging to 2 classes.

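The code above may fail to run, possibly because the machine runs out of memory.
From here on, assume that it has extracted the features, with shape (samples, 4, 4, 512).
We flatten them and train a binary classifier.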

train_features = np.reshape(train_features, (2000, 4*4*512))
validation_features = np.reshape(validation_features, (1000, 4*4*512))
test_features = np.reshape(test_features, (1000, 4*4*512))
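Approach A is described as saving the extracted features to disk, but the code above keeps them in memory. A minimal sketch of the flatten, save, and reload steps might look like the following. The helper names and the file name are hypothetical, and random arrays stand in for the real output of extract_features.

```python
import os
import tempfile

import numpy as np


def flatten_features(features):
    """Flatten (samples, 4, 4, 512) feature maps into (samples, 8192) vectors."""
    return features.reshape(features.shape[0], -1)


def save_features(path, features, labels):
    """Store features and labels together in one compressed .npz archive."""
    np.savez_compressed(path, features=features, labels=labels)


def load_features(path):
    """Load the arrays written by save_features."""
    with np.load(path) as data:
        return data["features"], data["labels"]


# Random stand-in data; real features would come from extract_features().
demo_features = np.random.rand(8, 4, 4, 512).astype("float32")
demo_labels = np.random.randint(0, 2, size=8)

# Hypothetical file name inside a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "train_features.npz")
save_features(demo_path, flatten_features(demo_features), demo_labels)
loaded_features, loaded_labels = load_features(demo_path)
print(loaded_features.shape)  # (8, 8192)
```

Caching the features this way means the expensive pass through the convolutional base only has to run once per dataset.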

The code above prepares the input feature data.
Build the classification model:

from keras import models
from keras import layers
from keras import optimizers

model = models.Sequential()
model.add(layers.Dense(256, activation='relu', input_dim=4*4*512))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1,activation='sigmoid'))


model.compile(optimizer=optimizers.RMSprop(lr=2e-5),
             loss='binary_crossentropy',
             metrics=['acc'])

history = model.fit(train_features, train_labels,
                   epochs=30,
                   batch_size=20,
                   validation_data=(validation_features, validation_labels))

2. Approach B

from keras import models
from keras import layers
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
vgg16 (Model)                (None, 4, 4, 512)         14714688  
_________________________________________________________________
flatten_1 (Flatten)          (None, 8192)              0         
_________________________________________________________________
dense_5 (Dense)              (None, 256)               2097408   
_________________________________________________________________
dense_6 (Dense)              (None, 1)                 257       
=================================================================
Total params: 16,812,353
Trainable params: 16,812,353
Non-trainable params: 0
_________________________________________________________________

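The VGG16 convolutional base has a very large number of parameters (more than 14.7 million), so it must be frozen before the model is compiled and trained. Otherwise the large updates coming from the randomly initialized Dense layers would propagate through the network and destroy the representations the base has already learned.
Freezing means keeping the weights of one or more layers unchanged during training.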

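print("Number of trainable weight tensors before freezing: ", len(model.trainable_weights))

conv_base.trainable = False
print("Number of trainable weight tensors after freezing: ", len(model.trainable_weights))

Number of trainable weight tensors before freezing:  30
Number of trainable weight tensors after freezing:  4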

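After freezing, only the weights of the two added Dense layers are trained. Each layer has two weight tensors: a kernel matrix and a bias vector.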
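The counts 30 and 4 can be reproduced from the layer lists in the two summaries. The sketch below is plain arithmetic, not a Keras call; the constant and function names are hypothetical.

```python
# Layer counts read off the two summaries above.
N_CONV_LAYERS = 13        # Conv2D layers in the VGG16 convolutional base
N_DENSE_LAYERS = 2        # Dense layers added on top
TENSORS_PER_LAYER = 2     # one kernel and one bias per layer


def trainable_tensor_count(conv_base_frozen):
    """Number of trainable weight tensors in the stacked model."""
    conv_tensors = 0 if conv_base_frozen else N_CONV_LAYERS * TENSORS_PER_LAYER
    return conv_tensors + N_DENSE_LAYERS * TENSORS_PER_LAYER


def dense_params(n_inputs, n_units):
    """Parameters of a Dense layer: weight matrix plus one bias per unit."""
    return n_inputs * n_units + n_units


print(trainable_tensor_count(conv_base_frozen=False))  # 30
print(trainable_tensor_count(conv_base_frozen=True))   # 4
print(dense_params(4 * 4 * 512, 256) + dense_params(256, 1))  # 2097665
```

The last figure is the number of parameters that are still trained once the base is frozen, out of 16,812,353 in total. Note that in Keras a change to `trainable` only takes effect when the model is compiled afterwards.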
Next, train the model end to end with the frozen convolutional base:

from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers

train_datagen = ImageDataGenerator(rescale=1./255,
                                  rotation_range = 40,
                                  width_shift_range=0.2,
                                  height_shift_range=0.2,
                                  shear_range=0.2,
                                  zoom_range=0.2,
                                  horizontal_flip=True,
                                  fill_mode="nearest")
test_datagen = ImageDataGenerator(rescale=1./255)  # validation data must not be augmented

train_generator = train_datagen.flow_from_directory(train_dir,
                                                   target_size=(150,150),
                                                   batch_size=20,
                                                   class_mode='binary')
validation_generator = test_datagen.flow_from_directory(validation_dir,
                                                   target_size=(150,150),
                                                   batch_size=20,
                                                   class_mode='binary')

model.compile(loss='binary_crossentropy',
             optimizer=optimizers.RMSprop(lr=2e-5),
             metrics=['acc'])

history = model.fit_generator(train_generator,
                             steps_per_epoch=100,
                             epochs=30,
                             validation_data=validation_generator,
                             validation_steps=50)

Visualizing Convolutional Neural Networks

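Visualizing intermediate activations: for a given input, display the feature maps output by the convolutional and pooling layers of the network (a layer's output is usually called its activation, that is, the output of its activation function).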

from keras.models import load_model
model = load_model("model_weights/cats_and_dogs_samll_2.h5")
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_5 (Conv2D)            (None, 148, 148, 32)      896       
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 74, 74, 32)        0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 72, 72, 64)        18496     
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 36, 36, 64)        0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 34, 34, 128)       73856     
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 17, 17, 128)       0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 15, 15, 128)       147584    
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 7, 7, 128)         0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 6272)              0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 6272)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 512)               3211776   
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 513       
=================================================================
Total params: 3,453,121
Trainable params: 3,453,121
Non-trainable params: 0
_________________________________________________________________

# Use a single image
img_path = r"F:\Data_Set\test_dir\test_cats_dir\cat.1502.jpg"
from keras.preprocessing import image
import numpy as np
img = image.load_img(img_path, target_size=(150,150))
img_tensor = image.img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0)
img_tensor /= 255.
print(img_tensor.shape)

(1, 150, 150, 3)

import  matplotlib.pyplot  as plt
plt.imshow(img_tensor[0])
plt.show()

from keras import models
layer_outputs = [layer.output for layer in model.layers[:8]]  # outputs of the first 8 layers
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)  # a model that returns each layer's output for a given input
activations = activation_model.predict(img_tensor)  # a list of 8 NumPy arrays, one per layer output

first_layer_activation = activations[0]
print(first_layer_activation.shape)

(1, 148, 148, 32)
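The 148 x 148 size is what an unpadded 3 x 3 convolution produces from a 150 x 150 input. The model definition is not shown here, so the kernel size is an assumption inferred from the parameter count of conv2d_5: (3 * 3 * 3 + 1) * 32 = 896. Under that assumption, a short sketch reproduces the spatial sizes of all eight layers; the function names are hypothetical.

```python
def conv_valid(size, kernel=3):
    """Output size of an unpadded ('valid') convolution with stride 1."""
    return size - kernel + 1


def max_pool(size, pool=2):
    """Output size of non-overlapping max pooling, rounding down."""
    return size // pool


def trace_sizes(input_size, n_blocks=4):
    """Spatial size after each conv and pooling layer of the small convnet."""
    sizes = []
    size = input_size
    for _ in range(n_blocks):
        size = conv_valid(size)
        sizes.append(size)
        size = max_pool(size)
        sizes.append(size)
    return sizes


print(trace_sizes(150))  # [148, 74, 72, 36, 34, 17, 15, 7]
```

These values match the Output Shape column of the summary from conv2d_5 through max_pooling2d_8.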

plt.matshow(first_layer_activation[0,:,:,4], cmap='viridis')

<matplotlib.image.AxesImage at 0x1ef74c397f0>

plt.matshow(first_layer_activation[0,:,:,10], cmap='viridis')

<matplotlib.image.AxesImage at 0x1ef88121588>

Next, display all of the feature maps:

# Collect the layer names
layer_names = []
for layer in model.layers[:8]:
    layer_names.append(layer.name)

images_per_row = 16
for layer_name, layer_activation in zip(layer_names, activations):
    n_features = layer_activation.shape[-1]  # number of channels in the feature map
    size = layer_activation.shape[1]  # the feature map has shape (1, size, size, n_features)
    n_cols = n_features // images_per_row
    display_grid = np.zeros((size * n_cols, images_per_row * size))

    for col in range(n_cols):
        for row in range(images_per_row):
            channel_image = layer_activation[0, :, :, col * images_per_row + row]
            # Normalize the channel so that it is visually readable
            channel_image -= channel_image.mean()
            channel_image /= channel_image.std()
            channel_image *= 64
            channel_image += 128
            channel_image = np.clip(channel_image, 0, 255).astype('uint8')

            display_grid[col * size : (col + 1) * size,
                         row * size : (row + 1) * size] = channel_image

    # Draw one figure per layer, after the whole grid has been filled
    scale = 1./size
    plt.figure(figsize=(scale * display_grid.shape[1],
                        scale * display_grid.shape[0]))
    plt.title(layer_name)
    plt.grid(False)
    plt.imshow(display_grid, aspect='auto', cmap='viridis')
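The per-channel normalization in the loop above divides by the channel's standard deviation, which is zero for a channel that never activates, and it modifies the activation array in place. A minimal sketch of a safer variant is shown below. `to_display_image` and its `eps` parameter are hypothetical names, and the epsilon guard is a swapped-in precaution rather than part of the original code.

```python
import numpy as np


def to_display_image(channel, eps=1e-5):
    """Rescale one activation channel into a uint8 image for plotting.

    Works on a float copy so the original activations are left untouched,
    and adds eps (an assumed small constant) to the standard deviation so
    that an all-zero channel does not trigger a division by zero.
    """
    image = channel.astype("float64")  # astype returns a copy by default
    image -= image.mean()
    image /= image.std() + eps
    image = image * 64 + 128
    return np.clip(image, 0, 255).astype("uint8")


dead_channel = np.zeros((4, 4), dtype="float32")
print(to_display_image(dead_channel)[0, 0])  # 128
```

A dead channel then shows up as a uniform mid-grey tile instead of producing NaN values.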

layer_names

['conv2d_5',
 'max_pooling2d_5',
 'conv2d_6',
 'max_pooling2d_6',
 'conv2d_7',
 'max_pooling2d_7',
 'conv2d_8',
 'max_pooling2d_8']

Visualizing Class Activation Heatmaps

from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input,decode_predictions
import numpy as np


img_path = r"F:\Data_Set\elephant.jpg"
img = image.load_img(img_path, target_size=(224,224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
print(x.shape)

Using TensorFlow backend.


(1, 224, 224, 3)

model = VGG16(weights='imagenet')
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])

[('n02504458', 'African_elephant', 0.8086415), ('n01871265', 'tusker', 0.17776239), ('n02504013', 'Indian_elephant', 0.013450568)]

np.argmax(preds[0])

Next, use Grad-CAM to show which parts of the image look most like an African elephant.

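import keras.backend as K
import matplotlib.pyplot as plt
# Entry 386 of the prediction vector is the "African elephant" class
african_elephant_output = model.output[:, 386]
# Output feature map of the last convolutional layer in VGG16
last_conv_layer = model.get_layer('block5_conv3')
# Gradient of the class score with respect to that feature map
grads = K.gradients(african_elephant_output, last_conv_layer.output)[0]

# One value per channel: the mean gradient intensity over that channel
pooled_grads = K.mean(grads, axis=(0,1,2))
iterate = K.function([model.input],
                    [pooled_grads, last_conv_layer.output[0]])
pooled_grads_value, conv_layer_output_value = iterate([x])

# Weight each channel by how important it is for the class
for i in range(512):
    conv_layer_output_value[:,:,i] *= pooled_grads_value[i]

# The channel-wise mean is the heatmap; keep positive values and scale to [0, 1]
heatmap = np.mean(conv_layer_output_value, axis=-1)
heatmap = np.maximum(heatmap, 0)
heatmap /= np.max(heatmap)
plt.matshow(heatmap)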

<matplotlib.image.AxesImage at 0x11dace6beb8>
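Once the feature maps and gradients have been fetched as arrays, the weighting step of Grad-CAM is only a few NumPy operations. The sketch below isolates that step. `grad_cam_heatmap` is a hypothetical helper, random arrays stand in for the real block5_conv3 activations and gradients, and the guard against an all-zero map is an addition that the code above does not have.

```python
import numpy as np


def grad_cam_heatmap(conv_output, grads):
    """Combine feature maps and gradients of shape (H, W, C) into an (H, W) heatmap."""
    # Average the gradient over the spatial axes: one importance weight per channel.
    channel_weights = grads.mean(axis=(0, 1))
    # Weight every channel (broadcasting over H and W), then average the channels.
    heatmap = (conv_output * channel_weights).mean(axis=-1)
    # Keep only positive evidence for the class and scale into [0, 1].
    heatmap = np.maximum(heatmap, 0)
    peak = heatmap.max()
    if peak > 0:  # guard against an all-zero map
        heatmap = heatmap / peak
    return heatmap


rng = np.random.default_rng(0)
demo_output = rng.random((14, 14, 512))          # stand-in for block5_conv3 activations
demo_grads = rng.standard_normal((14, 14, 512))  # stand-in for the gradients
demo_heatmap = grad_cam_heatmap(demo_output, demo_grads)
print(demo_heatmap.shape)  # (14, 14)
```

Broadcasting replaces the explicit loop over the 512 channels; the result is the same weighted average.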

Finally, use OpenCV to generate an image that superimposes the heatmap on the original picture.

import cv2
img = cv2.imread(img_path)
print(img.shape)
print(heatmap.shape)
heatmap = np.uint8(255 * heatmap)
heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))

heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
superimposed_img = heatmap * 0.4 + img

cv2.imwrite(r"F:\Data_Set\elephant_heatmap.jpg", superimposed_img)

(443, 666, 3)
(443, 666, 3)

True
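The expression `heatmap * 0.4 + img` yields a floating-point array whose values can exceed 255. If the blended image is to be displayed or processed further as an 8-bit image, one option is to clip it explicitly before casting. This is a sketch with a hypothetical `superimpose` helper and tiny stand-in arrays, not the author's code.

```python
import numpy as np


def superimpose(image, colored_heatmap, alpha=0.4):
    """Blend a colour heatmap onto an image of the same shape, returning uint8."""
    blended = image.astype("float32") + alpha * colored_heatmap.astype("float32")
    # Clip before casting so bright regions saturate at 255 instead of wrapping around.
    return np.clip(blended, 0, 255).astype("uint8")


demo_image = np.full((2, 2, 3), 200, dtype="uint8")
demo_heat = np.full((2, 2, 3), 255, dtype="uint8")
print(superimpose(demo_image, demo_heat)[0, 0])  # [255 255 255]
```

Here 0.4 is the same heatmap intensity factor used above.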

Appendix: The Complete VGG16 Architecture

model2=VGG16()

Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5
553467904/553467096 [==============================] - 522s 1us/step

model2.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000   
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________
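The Param # column for the convolutional layers can be reproduced with the usual formula for a convolution: (kernel height x kernel width x input channels + 1) x filters. The sketch below assumes 3 x 3 kernels throughout, which is consistent with every count in the summary; the function and constant names are hypothetical.

```python
def conv2d_params(in_channels, out_channels, kernel=3):
    """Parameters of a Conv2D layer: kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * out_channels


# Filters per convolution in each of the five blocks, read off the summary.
VGG16_BLOCKS = [[64, 64], [128, 128], [256, 256, 256],
                [512, 512, 512], [512, 512, 512]]


def conv_base_params(input_channels=3):
    """Total parameter count of the VGG16 convolutional base."""
    total = 0
    channels = input_channels
    for block in VGG16_BLOCKS:
        for filters in block:
            total += conv2d_params(channels, filters)
            channels = filters
    return total


print(conv2d_params(3, 64))   # 1792
print(conv_base_params())     # 14714688
```

The convolutional base therefore accounts for roughly a tenth of the 138,357,544 parameters; the rest sit in the three Dense layers of the classifier, which `include_top=False` leaves out.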