Visualizing class-activation heatmaps (a Keras implementation)
Preface
Grad-CAM stands for Gradient-weighted Class Activation Mapping. Put simply, when a CNN classifies an image, Grad-CAM makes explicit which regions of the image the network relied on to assign that class.
Original paper: Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Link: https://arxiv.org/abs/1610.02391
Implementation steps
- Take the feature maps produced by the last convolutional layer of the feature extractor (for VGG16, the conv5_3 feature maps, 7x7x512).
- The 512 feature maps are weighted differently by the fully connected classification layer, so use backpropagation to compute a weight for each feature map. Note that the only difference between CAM and Grad-CAM is how these per-map weights are obtained; the rest of the pipeline is identical.
- Multiply each feature map by its weight to get the weighted feature maps (7x7x512), take the mean over the channel dimension to get a 7x7 map (np.mean(axis=-1)), apply ReLU, and normalize (so values fall within the 0-255 range). The most important part of this step is the ReLU (which keeps only values greater than 0): after ReLU, only the features useful for the target class remain. Positive values are treated as evidence for the class; negative values belong to other classes (or are irrelevant). For example, suppose the weighted sum for some class comes out to 0.8965; the larger this value, the higher the probability of that class, so the features belonging to the class are exactly the terms with wx > 0, while terms below 0 are likely features of other classes. Intuitively, if a cat's head appears in the image, that feature is positive for the cat class and negative for the dog class: it should raise the confidence for cat and lower it for dog.
$w_1x_1 + w_2x_2 + \dots + w_nx_n = 0.8965$
Without the ReLU, the heatmap would represent features of many classes. The paper puts it this way: without ReLU, the localization map highlights not just the features of the class of interest, but the features of all classes.
- Resize the processed heatmap to the size of the input image so the two can be blended.
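The weighting, averaging, ReLU, and normalization steps above can be sketched in plain NumPy (`cam_from_weighted_features` is a hypothetical helper name; in the real pipeline the channel weights come from the backpropagation step described next):

```python
import numpy as np

def cam_from_weighted_features(feature_maps, channel_weights):
    """Turn (H, W, C) feature maps plus per-channel weights into a
    class-activation map: weight, average over channels, ReLU, normalize."""
    weighted = feature_maps * channel_weights   # broadcast weights over H, W
    cam = weighted.mean(axis=-1)                # channel-wise mean -> (H, W)
    cam = np.maximum(cam, 0)                    # ReLU: keep only positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                   # normalize to [0, 1]
    return cam

# Toy example with the VGG16 conv5_3 shape (7x7x512)
cam = cam_from_weighted_features(np.random.rand(7, 7, 512),
                                 np.random.randn(512))
```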
Code walkthrough
Import the dependencies
import os, cv2, random
import numpy as np
# plotting utilities
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from keras.models import Sequential
from keras.layers import Input, Dense, Conv2D, MaxPool2D, GlobalAveragePooling2D, GlobalMaxPooling2D
from keras.optimizers import Adam
from keras.callbacks import Callback, EarlyStopping, TensorBoard, ModelCheckpoint, ReduceLROnPlateau
from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.models import load_model
from keras.utils import np_utils
from keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from keras.preprocessing import image
from keras import backend as K
K.set_image_data_format('channels_last') # set data_format to NHWC (channels last)
from PIL import Image
Load the model
saved_model = load_model("./output/vgg16_1.h5")
saved_model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 512, 512, 64) 1792
_________________________________________________________________
conv2d_2 (Conv2D) (None, 512, 512, 64) 36928
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 256, 256, 64) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 256, 256, 128) 73856
_________________________________________________________________
conv2d_4 (Conv2D) (None, 256, 256, 128) 147584
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 128, 128, 128) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 128, 128, 256) 295168
_________________________________________________________________
conv2d_6 (Conv2D) (None, 128, 128, 256) 590080
_________________________________________________________________
conv2d_7 (Conv2D) (None, 128, 128, 256) 590080
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 64, 64, 256) 0
_________________________________________________________________
conv2d_8 (Conv2D) (None, 64, 64, 512) 1180160
_________________________________________________________________
conv2d_9 (Conv2D) (None, 64, 64, 512) 2359808
_________________________________________________________________
conv2d_10 (Conv2D) (None, 64, 64, 512) 2359808
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 32, 32, 512) 0
_________________________________________________________________
conv2d_11 (Conv2D) (None, 32, 32, 512) 2359808
_________________________________________________________________
conv2d_12 (Conv2D) (None, 32, 32, 512) 2359808
_________________________________________________________________
conv2d_13 (Conv2D) (None, 32, 32, 512) 2359808
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 16, 16, 512) 0
_________________________________________________________________
global_average_pooling2d_1 ( (None, 512) 0
_________________________________________________________________
dense_1 (Dense) (None, 7) 3591
=================================================================
Total params: 14,718,279
Trainable params: 14,718,279
Non-trainable params: 0
_________________________________________________________________
Define the source and destination paths
name = '299-37-type7.jpg'
img_path = './Clip_sample/' + name
save_path = './heatmaps/' + name
Load the image
# load the image and expand it to (1, 512, 512, 3)
img = image.load_img(img_path, target_size=(512,512))
img = np.asarray(img)
plt.imshow(img)
img = np.expand_dims(img, axis=0)
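One caveat: the image is fed to predict as raw 0-255 RGB values. If the saved model was trained with Keras's standard VGG16 preprocessing, the same transform should be applied before inference; what `vgg16.preprocess_input` effectively does (Caffe-style, channels-last) amounts to:

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by the
# Caffe-style VGG16 preprocessing
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68])

def vgg16_preprocess(img_batch):
    """RGB -> BGR, then subtract the ImageNet channel means
    (the effect of keras.applications.vgg16.preprocess_input)."""
    x = img_batch.astype('float32')[..., ::-1]  # flip RGB to BGR
    return x - IMAGENET_BGR_MEANS
```

Whether this applies here depends on how vgg16_1.h5 was trained; if it was trained on raw pixels, skip this step.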
Predict the image class and fetch the feature maps
output = saved_model.predict(img)
predict = np.array(output[0])
# pick out the predicted class entry from the output vector, plus the last conv layer's output
heisemei_output = saved_model.output[:, np.argmax(predict)]
last_conv_layer = saved_model.get_layer('conv2d_13') # the last convolutional layer
print(np.argmax(predict))
6
# compute the gradients and average them over the first three dimensions
grads = K.gradients(heisemei_output, last_conv_layer.output)[0] # gradient of the predicted class score w.r.t. the feature maps
pooled_grads = K.mean(grads, axis=(0, 1, 2))
# for a given input image, returns pooled_grads and the final conv feature map
iterate = K.function([saved_model.input], [pooled_grads, last_conv_layer.output[0]])
pooled_grads_value, conv_layer_output_value = iterate([img])
for i in range(512):
    conv_layer_output_value[:, :, i] *= pooled_grads_value[i]
# channel-wise mean of the weighted feature maps: this is the class-activation heatmap
heatmap = np.mean(conv_layer_output_value, axis=-1)
# print(heatmap)
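The `K.gradients` / `K.function` calls above assume TF1-style graph mode and will fail under TensorFlow 2's eager execution. An equivalent sketch using `tf.GradientTape` (assuming a tf.keras functional model; `grad_cam_weights` is a hypothetical helper name):

```python
import numpy as np
import tensorflow as tf

def grad_cam_weights(model, img, conv_layer_name, class_idx):
    """Return the gradient-pooled channel weights and the conv feature map
    for one image: the TF2 eager equivalent of the K.gradients code above."""
    # Model mapping the input to both the conv feature map and the predictions
    grad_model = tf.keras.models.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(img)
        class_score = preds[:, class_idx]
    grads = tape.gradient(class_score, conv_out)          # d(score)/d(feature map)
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))  # one weight per channel
    return pooled_grads.numpy(), conv_out[0].numpy()
```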
Visualize the heatmap
# normalize the heatmap to the 0-1 range and visualize it
heatmap = np.maximum(heatmap, 0)
heatmap /= (np.max(heatmap) + 1e-10)  # small epsilon guards against division by zero
plt.imshow(heatmap)
Overlay the heatmap on the original image
# re-read the original image
img = cv2.imread(img_path)
# resize the heatmap to match the original image
heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))
# scale the heatmap to 8-bit values
heatmap = np.uint8(255 * heatmap)
# colorize the heatmap with the JET colormap
heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
# blend with a heatmap weight of 0.2
superimposed_img = heatmap * 0.2 + img
print(superimposed_img.shape)
(512, 512, 3)
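Note that the straight `heatmap * 0.2 + img` blend can push pixel values above 255 in bright regions. A clipped version in plain NumPy (`blend_heatmap` is a hypothetical helper name):

```python
import numpy as np

def blend_heatmap(img, heatmap, alpha=0.2):
    """Blend a colorized heatmap onto the original image and clip the
    result back into the valid 0-255 range before casting to uint8."""
    out = heatmap.astype(np.float32) * alpha + img.astype(np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)
```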
Save the heatmap
# save the image
cv2.imwrite(save_path, superimposed_img)
print("success!")
success!
Final result
Closing thoughts
If CAM can help us understand what a neural network attends to, perhaps it can help people find the important regions faster and notice things that would otherwise be easy to miss. Maybe.