Deep Learning: Convolutional Neural Networks (CNN)

Latest recommended article published on 2024-07-09 10:05:30

谷雨白

Category: Notes. Tags: deep learning

Article link: https://blog.csdn.net/weixin_45739665/article/details/115053448

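1. Image Convolution

1.1 Definition

Image convolution places a filter (kernel) matrix over a window of the image matrix, multiplies the corresponding elements, and sums the products. Sliding the filter across the whole image makes it possible to quickly locate specific contour features.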
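
As a minimal sketch of the multiply-and-sum operation described above, the following NumPy code slides a filter over a toy image without padding and with stride 1. The function name `conv2d`, the 6×6 image, and the vertical-edge filter are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1),
    multiply element-wise, and sum each window."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 6x6 image: bright left half (10), dark right half (0).
image = np.array([[10, 10, 10, 0, 0, 0]] * 6, dtype=float)
# Vertical-edge filter: responds where brightness changes from left to right.
vertical_edge = np.array([[1, 0, -1]] * 3, dtype=float)

edges = conv2d(image, vertical_edge)
print(edges)  # every row is [0, 30, 30, 0]: the edge sits in the middle columns
```

The non-zero responses appear only where the window straddles the bright-to-dark boundary, which is what "searching for a contour" means here.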

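1.2 Core idea

Finding suitable contour filters: given the training images and their classes, the computer automatically searches for suitable contour filters, then uses those filters to extract contours from a new image and decide which class it belongs to.
Minimizing the loss function on the output is the same process as finding suitable weights w, that is, finding suitable filters.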
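
To illustrate that minimizing a loss amounts to finding the filter weights w, here is a small sketch that recovers a 3×3 filter by plain gradient descent on a mean squared error. The random 16×16 image, the "true" filter, the learning rate, and the step count are all assumptions chosen for the demonstration. A real CNN learns many filters at once from labeled images rather than from a known target output.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))                   # toy training image
true_filter = np.array([[1, 0, -1]] * 3, dtype=float)   # filter we pretend not to know

# Every 3x3 window of the image, flattened into one row: shape (196, 9).
patches = np.array([image[i:i + 3, j:j + 3].ravel()
                    for i in range(14) for j in range(14)])
target = patches @ true_filter.ravel()                  # output the filter should produce

w = np.zeros(9)                                         # filter weights to be learned
lr = 0.1
for step in range(1000):
    error = patches @ w - target                        # current output minus desired output
    grad = 2 * patches.T @ error / len(target)          # gradient of the mean squared error
    w -= lr * grad                                      # gradient descent update

learned_filter = w.reshape(3, 3)
loss = np.mean((patches @ w - target) ** 2)
print(np.round(learned_filter, 3))
print(loss)
```

Because the convolution is written as a matrix product over image patches, the weight update is ordinary linear-regression gradient descent.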

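1.3 Convolution on color images

Convolution on an RGB image: convolve the R, G, and B channels separately (each with its own slice of the filter), then add the three results.
A three-channel (red, green, blue) image therefore becomes a single channel after convolution with one filter.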
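
The per-channel convolution followed by a sum can be sketched as below. The helper names `conv2d` and `conv2d_rgb` and the random 6×6×3 input are hypothetical and used only for illustration.

```python
import numpy as np

def conv2d(channel, kernel):
    """Single-channel convolution, no padding, stride 1."""
    kh, kw = kernel.shape
    out = np.zeros((channel.shape[0] - kh + 1, channel.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(channel[i:i + kh, j:j + kw] * kernel)
    return out

def conv2d_rgb(image, kernel):
    """image: (H, W, 3), kernel: (kh, kw, 3).
    Convolve each channel with its own kernel slice, then add the results."""
    return sum(conv2d(image[:, :, c], kernel[:, :, c])
               for c in range(image.shape[2]))

rng = np.random.default_rng(0)
rgb = rng.random((6, 6, 3))       # toy RGB image
kernel = rng.random((3, 3, 3))    # one filter with three channel slices

feature_map = conv2d_rgb(rgb, kernel)
print(rgb.shape, '->', feature_map.shape)  # (6, 6, 3) -> (4, 4)
```

Using several filters produces several such single-channel maps, which is why the channel count of a convolution layer's output equals its number of filters.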

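2. Pooling

Motivation: much of the information in an image is unimportant or even redundant. Using the data from every pixel increases the amount of computation, may lead to overfitting, and lowers the model's tolerance to small variations.

2.1 Definition

Pooling, also called subsampling or downsampling, processes the image matrix according to a fixed rule in order to compress information and reduce dimensionality. This reduces overfitting and makes the model more tolerant of small variations.

2.2 Pooling methods

Max pooling: take the largest value in each window [pool_size (2,2) is the window size; stride (2,2) is the step by which the window moves].
Average pooling (avg pooling): take the mean of the values in each window.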
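
Both pooling methods can be sketched with one small NumPy function. The name `pool2d` and the 4×4 input are assumptions for illustration, and the window and stride defaults mirror the (2,2) values mentioned above.

```python
import numpy as np

def pool2d(x, pool_size=2, stride=2, mode="max"):
    """Downsample a 2D array by taking the max or the mean of each window."""
    out_h = (x.shape[0] - pool_size) // stride + 1
    out_w = (x.shape[1] - pool_size) // stride + 1
    reduce_fn = np.max if mode == "max" else np.mean
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + pool_size,
                       j * stride:j * stride + pool_size]
            out[i, j] = reduce_fn(window)
    return out

x = np.array([[1, 3, 2, 4],
              [5, 7, 6, 8],
              [9, 2, 1, 0],
              [3, 4, 5, 6]], dtype=float)

max_pooled = pool2d(x, mode="max")
avg_pooled = pool2d(x, mode="avg")
print(max_pooled)  # [[7, 8], [9, 6]]
print(avg_pooled)  # [[4, 5], [4.5, 3]]
```

With a 2×2 window and stride 2, the 4×4 input shrinks to 2×2, so three quarters of the values are discarded.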

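3. Convolutional Neural Network (CNN)

3.1 Core idea

Connect convolution, pooling, and an MLP in sequence to form a neural network that efficiently extracts the important information in an image.
An activation function is added after convolution and before pooling to filter and keep the important information. ReLU is the usual choice: it sets some neurons to 0, which filters out noise and helps prevent overfitting, and it is cheap to compute, so training iterates quickly.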
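
A minimal sketch of the ReLU step, applied to a toy feature map whose values are made up for illustration:

```python
import numpy as np

def relu(x):
    """ReLU keeps positive responses and sets negative ones to 0."""
    return np.maximum(0, x)

# Hypothetical convolution output containing positive and negative responses.
feature_map = np.array([[-3.0, 2.0],
                        [0.5, -1.5]])
activated = relu(feature_map)
print(activated)  # [[0, 2], [0.5, 0]]
```

In the Keras code later in these notes, the same step is requested with `activation='relu'` on each `Conv2D` layer, and the pooling layer follows it.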

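3.2 Two key properties of CNNs

Parameter sharing: the same feature filter is applied across the whole image.
Sparsity of connections: each node in the resulting feature map is connected only to a specific set of nodes in the original image.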
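
A rough parameter count shows what these two properties buy. The sketch below uses the first layer of the cat/dog model from section 6.1 (50×50×3 input, 32 filters of 3×3) and compares it with a hypothetical fully connected layer that would produce an output of the same size. The helper names are assumptions.

```python
def conv_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter; the same filter is reused at every position."""
    return (kernel_size * kernel_size * in_channels + 1) * filters

def dense_params(n_in, n_out):
    """Every output node is connected to every input node, plus one bias each."""
    return (n_in + 1) * n_out

n_in = 50 * 50 * 3                 # input pixels
n_out = 48 * 48 * 32               # conv output: (50 - 3 + 1) = 48 per side, 32 channels

shared = conv_params(3, 3, 32)     # 896
dense = dense_params(n_in, n_out)  # 553033728
inputs_per_node = 3 * 3 * 3        # sparse connections: 27 inputs per output node

print(shared, dense, inputs_per_node)
```

The convolution layer needs 896 parameters where the dense layer would need over 550 million, and each output node looks at only 27 of the 7,500 input values.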

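4. Image Padding

4.1 Two common problems with image convolution

Edge information is used less often and is easily ignored.
The image shrinks after each convolution, so information is lost.

4.2 Core idea of padding

Adding new pixels around the border of the image lets the information at edge positions be used more often.
The number of pixels added by padding is determined by the filter size and the stride.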
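
The relation between input size n, filter size f, padding p, and stride s is the standard formula floor((n + 2p - f) / s) + 1. The sketch below applies it; the function names are assumptions, and `same_padding` covers only the common case of stride 1 with an odd filter size.

```python
def conv_output_size(n, f, padding=0, stride=1):
    """Output height/width of a convolution or pooling layer."""
    return (n + 2 * padding - f) // stride + 1

def same_padding(f):
    """Padding that keeps the size unchanged for stride 1 and odd filter size f."""
    return (f - 1) // 2

no_pad = conv_output_size(50, 3)                             # 48: the image shrinks
with_pad = conv_output_size(50, 3, padding=same_padding(3))  # 50: size preserved
pooled = conv_output_size(48, 2, stride=2)                   # 24: 2x2 pooling, stride 2

print(no_pad, with_pad, pooled)
```

A 3×3 filter needs 1 pixel of padding on each side to preserve the size, and a 5×5 filter needs 2.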

5. Classic CNN Models

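5.1 LeNet-5

Designed for simple grayscale images.
Input image: 32×32 grayscale, 1 channel.
Trainable parameters: about 60,000.
All filters are 5×5 (stride s=1); pooling is average pooling (f=2, s=2); convolution and pooling layers are used in pairs, one after the other.
As the network gets deeper, the height and width of the feature maps shrink while the number of channels grows.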
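
The shrinking size and the parameter count of roughly 60,000 can be checked with a short calculation. The filter size, stride, and pooling settings come from the notes above. The filter counts (6 and 16) and the fully connected sizes (120, 84, 10) are assumptions taken from the commonly cited LeNet-5 layout, and the second convolution is treated as fully connected across channels, which is a simplification.

```python
def out_size(n, f, stride=1, padding=0):
    return (n + 2 * padding - f) // stride + 1

size, channels, total_params = 32, 1, 0
shapes = []

for filters in (6, 16):                                    # assumed filter counts
    total_params += (5 * 5 * channels + 1) * filters       # 5x5 conv, stride 1
    size, channels = out_size(size, 5), filters
    shapes.append((size, size, channels))
    size = out_size(size, 2, stride=2)                     # 2x2 average pooling, stride 2
    shapes.append((size, size, channels))

n_in = size * size * channels                              # flattened feature vector
for n_out in (120, 84, 10):                                # assumed fully connected sizes
    total_params += (n_in + 1) * n_out
    n_in = n_out

print(shapes)        # [(28, 28, 6), (14, 14, 6), (10, 10, 16), (5, 5, 16)]
print(total_params)  # 61706
```

Height and width fall from 32 to 5 while the channel count rises from 1 to 16, and the total lands close to the 60,000 quoted above.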

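5.2 AlexNet

Input image: 227×227×3 RGB, 3 channels.
Trainable parameters: about 60,000,000.
Introduces image padding and uses max pooling.
A more complex structure that can extract richer features.
Suited to recognizing more complex color images; it can distinguish 1,000 classes.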

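5.3 VGG-16

Input image: 224×224×3 RGB, 3 channels.
Trainable parameters: about 138,000,000.
All filters are 3×3 with stride 1, and every convolution uses padding (same convolution).
All pooling layers are max pooling with a 2×2 window and stride 2.
Uses more filters to extract contour information, further improving recognition accuracy.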
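
The figure of about 138 million parameters can be reproduced with a short calculation. The 3×3 filters, same padding, and 2×2 pooling come from the notes above. The per-layer filter counts and the fully connected sizes (4096, 4096, 1000) are assumptions taken from the commonly cited VGG-16 configuration.

```python
# Assumed VGG-16 layout: numbers are filter counts, 'M' is a 2x2 max pool with stride 2.
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

size, channels, total_params = 224, 3, 0
for layer in cfg:
    if layer == 'M':
        size //= 2                      # pooling halves height and width
    else:
        # 3x3 same convolution: size unchanged, weights plus one bias per filter
        total_params += (3 * 3 * channels + 1) * layer
        channels = layer

conv_params = total_params
flat = size * size * channels           # 7 * 7 * 512 features after the last pool

n_in = flat
for n_out in (4096, 4096, 1000):        # assumed fully connected sizes
    total_params += (n_in + 1) * n_out
    n_in = n_out

print(flat)          # 25088
print(conv_params)   # 14714688
print(total_params)  # 138357544
```

The value 25088 is the same 7*7*512 feature length used in section 6.2 when the VGG16 output is flattened. Most of the parameters sit in the fully connected layers, which is why stripping them (include_top=False) leaves a much lighter feature extractor.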

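5.4 How to use them

Study the classic CNN architectures and apply their core ideas when designing a new model.
Use a classic CNN model to extract the important contours of the images, then build an MLP model on top:
Load the classic CNN model and strip off its FC layers, so that it extracts the important contour information from each image.
Use the data produced by that model as input and the class labels as output to build an MLP model.
Train the model to find the key information that corresponds to each image class.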

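6. Tasks

6.1 Build a CNN model from the data to classify cats and dogs

Load local images in batches with the ImageDataGenerator module.
Inspect the basic structure of the data and visualize sample images after loading.
Build and train the model (20 epochs), and compute the accuracy on the training and test sets.
Predict the class of the provided cat/dog images 1-9.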

# -*- coding: utf-8 -*-
# In[]
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# In[]
from keras.preprocessing.image import ImageDataGenerator

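#image augmentation / preprocessing configuration (normalization, scaling, rotation, shifting, etc.)

#normalization: rescale pixel values to [0, 1]
train_datagen=ImageDataGenerator(rescale=1./255)

#load the images from the directory
training_set=train_datagen.flow_from_directory('./CNN/task1_data/training_set',target_size=(50,50),batch_size=32,class_mode='binary')
#batch_size: number of images loaded per batch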


# In[]
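#check the data type
print(type(training_set))
#number of samples per batch
print(training_set.batch_size)
#file names of the loaded samples
# print(training_set.filenames)
#indexing training_set: the first [] selects the batch, the second [] selects the image data x (0) or the labels y (1), the third [] selects a sample within that batch
print(training_set[0][1])

#check the dimensions of the image data in the first batch
print(training_set[0][0].shape)

#visualize the first sample of the first batch
fig1=plt.figure()
plt.imshow(training_set[0][0][0,:,:,:])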

# In[]
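#index of each sample in the order the samples are placed into batches
print(training_set.index_array)

#get a file name
print(training_set.filenames[3172])
# In[]
#check the mapping from class names to labels
print(training_set.class_indices)

#look up the original file that corresponds to a sample index
print(training_set.filenames[4917])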

# In[]
#build the CNN model
from keras.models import Sequential
from keras.layers import Conv2D,MaxPooling2D,Flatten,Dense

cnn_model=Sequential()

#convolution layer
cnn_model.add(Conv2D(32,(3,3),input_shape=(50,50,3),activation='relu'))

#pooling layer
cnn_model.add(MaxPooling2D(pool_size=(2,2)))


# In[]
#second convolution and pooling layers
cnn_model.add(Conv2D(32,(3,3),activation='relu'))
cnn_model.add(MaxPooling2D(pool_size=(2,2)))

#flattening layer
cnn_model.add(Flatten())

#fully connected layers
cnn_model.add(Dense(units=128,activation='relu'))
cnn_model.add(Dense(units=1,activation='sigmoid'))
          



# In[]
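#configure the optimizer, loss and metric
cnn_model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
cnn_model.summary()

# In[]
#train the model
#(fit_generator is deprecated since TensorFlow 2.1, where cnn_model.fit accepts generators directly)
cnn_model.fit_generator(training_set,epochs=20)

# In[]
#loss and accuracy on the whole training set
accuracy_train=cnn_model.evaluate_generator(training_set)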
print(accuracy_train)

# In[]
#save the model
cnn_model.save('CatDog_model_1.h5')
# In[]
#load the model
from keras.models import load_model
model_new=load_model('CatDog_model_1.h5')

# In[]
#loss and accuracy on the test set
test_set=train_datagen.flow_from_directory('./CNN/task1_data/test_set',target_size=(50,50),batch_size=32,class_mode='binary')


accuracy_test=model_new.evaluate_generator(test_set)
print(accuracy_test)

# In[]
#load a single image and predict its class
from keras.preprocessing.image import load_img,img_to_array

pic_1='1.png'
pic_1=load_img(pic_1,target_size=(50,50))
print(pic_1)
pic_1=img_to_array(pic_1)
pic_1=pic_1/255
pic_1=pic_1.reshape(1,50,50,3)
print(pic_1.shape)
#predict_classes was removed in TensorFlow 2.6; thresholding the sigmoid output at 0.5 gives the same label
result=int(model_new.predict(pic_1)[0][0]>0.5)
print('dog' if result==1 else 'cat')

fig10=plt.figure()



# In[]
fig2=plt.figure()
plt.imshow(pic_1[0])


# In[]
#process the 9 local images
a =[i for i in range(1,10)]
fig3=plt.figure(figsize=(10,10))
for i in a:
    pic_name=str(i)+'.png'
    pic_i=load_img(pic_name,target_size=(50,50))
    pic_i=img_to_array(pic_i)
    pic_i=pic_i/255
    pic_i=pic_i.reshape(1,50,50,3)
    #threshold the sigmoid output at 0.5 (predict_classes was removed in TensorFlow 2.6)
    result=int(model_new.predict(pic_i)[0][0]>0.5)
    
    plt.subplot(3,3,i)
    plt.imshow(pic_i[0])
    plt.title('dog' if result==1 else 'cat')
    
    plt.xticks([])
    plt.yticks([])

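6.2 Use the VGG16 architecture to improve cat/dog classification accuracy

Extract image features from a single image with VGG16.
Extract features from all images with VGG16, and split the data into training and test sets.
Build an MLP model on the extracted features, train it, and compute its accuracy on the training and test sets.
Predict the class of the provided cat/dog images 1-9 and compare the results with those from task 1.
Note: the data split parameters are test_size=0.2, random_state=0.
The MLP model has a single hidden layer (10 neurons) with the 'relu' activation function.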

#load a single image
from keras.preprocessing.image import load_img,img_to_array
img_path='1.png'
img=load_img(img_path,target_size=(224,224))
print(type(img))

# In[]
#visualize the image
import matplotlib.pyplot  as plt
fig1=plt.figure()
plt.imshow(img)

# In[]
#convert the image to an array
img=img_to_array(img)
print(img.shape)

# In[]
#add a batch dimension and apply the VGG16 preprocessing
from keras.applications.vgg16 import preprocess_input
import numpy as np

x=np.expand_dims(img,axis=0)
# x=img.reshape(1,224,224,3)
x=preprocess_input(x)
print(x.shape)

# In[]

#feature extraction
from keras.applications.vgg16 import VGG16
#load the VGG16 architecture (without the fully connected and output layers)
model_vgg=VGG16(weights='imagenet',include_top=False)
feature=model_vgg.predict(x)
print(feature.shape)

#flatten the feature maps
feature=feature.reshape(1,7*7*512)
print(feature.shape)

# In[]
#extract features from all images in batch
from keras.preprocessing.image import img_to_array,load_img
from keras.applications.vgg16 import VGG16
from keras.applications.vgg16 import preprocess_input
import numpy as np

model_vgg = VGG16(weights='imagenet', include_top=False)
#define a method to load and preprocess the image
def modelProcess(img_path,model):
    img = load_img(img_path, target_size=(224, 224))
    img = img_to_array(img)
    x = np.expand_dims(img,axis=0)
    x = preprocess_input(x)
    x_vgg = model.predict(x)
    x_vgg = x_vgg.reshape(1,25088)
    return x_vgg
#list file names of the training datasets
import os
folder = "task2_data/cats"
dirs = os.listdir(folder)
#generate path for the images
img_path = [os.path.join(folder, i) for i in dirs
            if os.path.splitext(i)[1] == ".jpg"]

#preprocess multiple images
features1 = np.zeros([len(img_path),25088])
for i in range(len(img_path)):
    feature_i = modelProcess(img_path[i],model_vgg)
    print('preprocessed:',img_path[i])
    features1[i] = feature_i
    
folder = "task2_data/dogs"
dirs = os.listdir(folder)
img_path = [os.path.join(folder, i) for i in dirs
            if os.path.splitext(i)[1] == ".jpg"]
features2 = np.zeros([len(img_path),25088])
for i in range(len(img_path)):
    feature_i = modelProcess(img_path[i],model_vgg)
    print('preprocessed:',img_path[i])
    features2[i] = feature_i
    
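#label the results: 0 = cat, 1 = dog
print(features1.shape,features2.shape)
#size the labels from the data instead of hard-coding 300 images per class
y1 = np.zeros(len(features1))
y2 = np.ones(len(features2))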

#generate the training data
X = np.concatenate((features1,features2),axis=0)
y = np.concatenate((y1,y2),axis=0)
y = y.reshape(-1,1)
print(X.shape,y.shape)
# In[]
#split the data into training and test sets
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=0)
# In[]
#build the MLP model
from keras.models import Sequential
from keras.layers import Dense
model=Sequential()
model.add(Dense(units=10,activation='relu',input_dim=25088)) 
model.add(Dense(units=1,activation='sigmoid'))
model.summary()





# In[]
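#compile and train
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
model.fit(X_train,y_train,epochs=50)


# In[]
from sklearn.metrics import accuracy_score
#threshold the sigmoid output at 0.5 (predict_classes was removed in TensorFlow 2.6)
y_train_predict=(model.predict(X_train)>0.5).astype('int32')
accuracy_train=accuracy_score(y_train,y_train_predict)
print(accuracy_train)
#accuracy on the test set
y_test_predict=(model.predict(X_test)>0.5).astype('int32')
accuracy_test=accuracy_score(y_test,y_test_predict)
print(accuracy_test)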
# In[]
#load image -> convert to array -> preprocess -> extract VGG16 features -> predict
pic1='1.png'
pic1=load_img(pic1,target_size=(224,224))
pic1_array=img_to_array(pic1)
pic1_array=np.expand_dims(pic1_array,axis=0)
#assign the result back to pic1_array so that VGG16 receives the preprocessed data
pic1_array=preprocess_input(pic1_array)
pic1_features=model_vgg.predict(pic1_array)
pic1_feature=pic1_features.reshape(1,7*7*512)

#MLP prediction (threshold the sigmoid output at 0.5)
result=int(model.predict(pic1_feature)[0][0]>0.5)
print('dog' if result==1 else 'cat')

fig3=plt.figure()
plt.imshow(pic1)

# In[]
#process the 9 local images
a =[i for i in range(1,10)]
fig3=plt.figure(figsize=(10,10))
for i in a:
    pic_name=str(i)+'.png'
    
    pic1=load_img(pic_name,target_size=(224,224))
    pic1_array=img_to_array(pic1)
    pic1_array=np.expand_dims(pic1_array,axis=0)
    #assign the result back to pic1_array so that VGG16 receives the preprocessed data
    pic1_array=preprocess_input(pic1_array)
    pic1_features=model_vgg.predict(pic1_array)
    pic1_feature=pic1_features.reshape(1,7*7*512)
    #threshold the sigmoid output at 0.5 (predict_classes was removed in TensorFlow 2.6)
    result=int(model.predict(pic1_feature)[0][0]>0.5)
    
    plt.subplot(3,3,i)
    plt.imshow(pic1)
    plt.title('dog' if result==1 else 'cat')
    
    plt.xticks([])
    plt.yticks([])
    



# In[]
model.save('CatDog_model_VGG.h5')