Learning Deep Features for Discriminative Localization

qq_52771580

Last modified: 2024-08-23 18:04:30

Category: CAM. Tags: CAM, computer vision, image processing

First published: 2024-08-23 18:01:32

Article link: https://blog.csdn.net/qq_52771580/article/details/141437197

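1. Introduction

Paper: https://arxiv.org/abs/1512.04150

Bolei Zhou et al. [1] revisited GAP (Global Average Pooling) and showed how it explicitly gives convolutional neural networks a remarkable localization ability. They also proposed CAM (Class Activation Maps) [1] to visualize this ability. CAM lets us visualize the predicted class score on any given image, highlighting the discriminative object parts detected by the CNN (Convolutional Neural Network). CAM yields generic localizable deep features that help other researchers understand the basis of discrimination that CNNs use for their tasks.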

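2. Method

Figure 1: CNN for classification

As shown in Figure 1, GAP outputs the spatial average of the feature map of each unit in the last convolutional layer, and a weighted sum of these values produces the final output. Similarly, a weighted sum of the feature maps of the last convolutional layer gives the CAM, as detailed in Figure 2. To compute the CAM of a given image for one class, take the fully connected layer's parameters for that class, w1, w2, ..., wn. Each parameter w corresponds to the feature map of one unit of the last convolutional layer, so n is also the number of output channels of that layer. The CAM is the weighted sum of these feature maps, with each feature map weighted by its corresponding w.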
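In symbols, the relationship described above can be written as follows. This is a sketch in notation chosen here for illustration: $f_k(x, y)$ is the activation of channel $k$ at position $(x, y)$, $w_k^c$ is the fully connected weight linking channel $k$ to class $c$, $H \times W$ is the feature map size, and the bias term is ignored.

```latex
\begin{aligned}
F_k &= \frac{1}{HW}\sum_{x,y} f_k(x, y) && \text{(GAP output for channel } k\text{)} \\
S_c &= \sum_{k=1}^{n} w_k^c \, F_k && \text{(score of class } c\text{)} \\
M_c(x, y) &= \sum_{k=1}^{n} w_k^c \, f_k(x, y) && \text{(CAM of class } c\text{)} \\
S_c &= \frac{1}{HW}\sum_{x,y} M_c(x, y)
\end{aligned}
```

The last line follows by swapping the two sums, so the class score is the spatial average of its CAM.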

Figure 2: CAM
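The link between the classification path and the CAM path can be checked numerically. The snippet below is a minimal sketch on random data, not the paper's code: the array sizes and the names `feature_maps` and `fc_weights` are made up for illustration, and the fully connected bias is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n, h, w, num_classes = 8, 7, 7, 5  # toy sizes, chosen only for illustration

# Stand-ins for the last conv layer output of one image and the FC weights.
feature_maps = rng.standard_normal((n, h, w))
fc_weights = rng.standard_normal((num_classes, n))

# Classification path: GAP, then a weighted sum of the pooled values.
pooled = feature_maps.mean(axis=(1, 2))   # shape (n,)
logits = fc_weights @ pooled              # shape (num_classes,)

# CAM path: the same weights applied to the un-pooled feature maps.
cams = np.einsum("cn,nhw->chw", fc_weights, feature_maps)  # (num_classes, h, w)

# The spatial mean of each CAM reproduces the corresponding class score.
print(np.allclose(cams.mean(axis=(1, 2)), logits))
```

Because the same weights are used in both paths, no extra training is needed to obtain the CAM from a GAP-based classifier.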

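To view the CAM more intuitively, the following steps are usually needed to obtain the result shown in Figure 3:

(1) Normalize, then map to 0-255.

(2) Upsample to the original image size.

(3) Obtain the heatmap.

(4) Compute the result to display: result = 0.6 * heatmap + 0.4 * original_image.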
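The four steps above can be sketched with NumPy alone. This is an illustrative stand-in rather than the code used later in this post: the function name `overlay_cam` is hypothetical, upsampling is nearest-neighbor, and a plain blue-to-red ramp replaces a real colormap such as JET.

```python
import numpy as np


def overlay_cam(cam, image, alpha=0.6):
    """Blend a low-resolution CAM onto an RGB uint8 image of shape (H, W, 3)."""
    # (1) Normalize to [0, 1], then map to 0-255. The small epsilon avoids
    # division by zero when the CAM is constant.
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    cam = (cam * 255).astype(np.uint8)

    # (2) Upsample to the image size with nearest-neighbor indexing.
    img_h, img_w = image.shape[:2]
    rows = np.arange(img_h) * cam.shape[0] // img_h
    cols = np.arange(img_w) * cam.shape[1] // img_w
    upsampled = cam[rows[:, None], cols[None, :]]

    # (3) Heatmap: a simple blue-to-red ramp (an assumption, not JET).
    heatmap = np.zeros((img_h, img_w, 3), dtype=np.uint8)
    heatmap[..., 0] = upsampled        # red grows with activation
    heatmap[..., 2] = 255 - upsampled  # blue fades with activation

    # (4) Weighted blend of the heatmap and the original image.
    result = alpha * heatmap + (1 - alpha) * image
    return result.astype(np.uint8)


# Toy inputs: a 4x4 CAM that increases toward the bottom right, and a gray image.
cam = np.arange(16, dtype=np.float32).reshape(4, 4)
image = np.full((8, 12, 3), 100, dtype=np.uint8)
result = overlay_cam(cam, image)
print(result.shape, result.dtype)
```

With `alpha=0.6` the blend matches the 0.6/0.4 weighting in step (4); a smaller `alpha` keeps more of the original image visible.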

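Figure 3: CAM examples for the top-5 predicted classes

3. Summary

[1] proposed CAM for CNNs with GAP. This enables a CNN trained for classification to learn to perform object localization without any bounding box annotations. CAM lets us visualize the predicted class score on any given image, highlighting the discriminative object parts detected by the CNN. This helps in understanding and analyzing how a neural network works and reaches its decisions, and in turn in choosing or designing better networks. The visualization can also guide the network to learn better. For example, CAM information can be used to augment data by "erasing" or "cropping".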
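As a rough illustration of CAM-guided "erasing" and "cropping", the sketch below thresholds a CAM that has already been upsampled to the image size. The helper names, the 0.5 threshold, and the zero fill value are all assumptions made for this example, not values taken from [1].

```python
import numpy as np


def cam_bounding_box(cam, threshold=0.5):
    """Return (top, bottom, left, right) of the region where cam >= threshold * max."""
    mask = cam >= threshold * cam.max()
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def erase(image, cam, threshold=0.5, fill=0):
    """Return a copy of image with the most discriminative region filled in."""
    out = image.copy()
    out[cam >= threshold * cam.max()] = fill
    return out


def crop(image, cam, threshold=0.5):
    """Return the part of image inside the bounding box of the thresholded CAM."""
    top, bottom, left, right = cam_bounding_box(cam, threshold)
    return image[top:bottom, left:right]


# Toy data: a 6x6 image whose CAM is high in a 2x4 block.
image = np.full((6, 6, 3), 200, dtype=np.uint8)
cam = np.zeros((6, 6), dtype=np.float32)
cam[2:4, 1:5] = 1.0

erased = erase(image, cam)
cropped = crop(image, cam)
print(erased[2, 1], cropped.shape)
```

Erasing pushes the network to look for evidence outside the region it already relies on, while cropping focuses training on that region.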

The authors' open-source code is at GitHub - zhoubolei/CAM: Class Activation Mapping. The PyTorch implementation was last updated on June 30, 2021, so it targets a fairly old PyTorch version. I rewrote it as follows:

import cv2
import torch
import numpy as np
from PIL import Image
import torch.nn.functional as F
from matplotlib import pyplot as plt
from torchvision.models.feature_extraction import create_feature_extractor


def showCAM(img_path, model, stage_name, class_dict, transform):
    """
    Show the CAMs of the 5 classes with the highest predicted probability for an image.
    :param img_path: path of the image whose CAMs are shown
    :param model: classification CNN whose last module is the fully connected layer after GAP
    :param stage_name: name of the last stage of model
    :param class_dict: mapping from class_id to class_name
    :param transform: preprocessing applied to the input image of model
    :return: None
    """
    model.eval()

    # Weights of the fully connected layer (assumed to be the last module)
    last_layer = list(model.modules())[-1]
    fc_weights = last_layer.weight

    # Convert to RGB so that grayscale or RGBA images can be blended with the heatmap
    original_img = Image.open(img_path).convert("RGB")
    img = transform(original_img).unsqueeze(0)

    # The feature maps do not depend on the class, so extract them once, outside the loop
    feature_extractor = create_feature_extractor(model, return_nodes={stage_name: "feature_map"})
    with torch.no_grad():
        output = model(img)
        forward = feature_extractor(img)
    b, c, h, w = forward["feature_map"].shape
    feature_map = forward["feature_map"].detach().reshape(c, h * w)

    # Class probabilities from softmax, sorted in descending order
    psort = torch.sort(F.softmax(output, dim=1), descending=True)
    prob, cls_idx = psort

    # Top-5 (class id, probability) pairs
    top5 = [(i.item(), j.item()) for i, j in zip(cls_idx.view(-1), prob.view(-1))][:5]

    fig, axs = plt.subplots(2, 3)
    axs.reshape(-1)[0].imshow(np.asarray(original_img))

    for idx, cls_prob in enumerate(top5):
        # Fully connected weights of this class
        cls_weights = fc_weights[cls_prob[0]].detach().unsqueeze(0)  # shape (1, c)

        # Class activation map: weighted sum of the feature maps
        CAM = torch.mm(cls_weights, feature_map).reshape(h, w)

        # Normalize, then map to 0-255 (epsilon guards against a constant map)
        CAM = (CAM - torch.min(CAM)) / (torch.max(CAM) - torch.min(CAM) + 1e-8)
        CAM = (CAM.cpu().numpy() * 255).astype("uint8")

        # Upsample to the original image size; PIL size and cv2 dsize are both (width, height)
        upsample = cv2.resize(CAM, original_img.size)

        # Heatmap; OpenCV returns BGR, so convert to RGB
        heatmap = cv2.applyColorMap(upsample, cv2.COLORMAP_JET)
        heatmap = cv2.cvtColor(heatmap, cv2.COLOR_BGR2RGB)

        result = heatmap * 0.6 + np.asarray(original_img) * 0.4
        # result = heatmap

        axs.reshape(-1)[idx + 1].imshow(np.uint8(result))
        axs.reshape(-1)[idx + 1].text(-10, -10, f"{class_dict[cls_prob[0]]}: {cls_prob[1]:.3f}", fontsize=12,
                                      color="black")
    plt.show()

References

[1] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning Deep Features for Discriminative Localization. In CVPR, 2016.