Global Average Pooling (GAP)
References:
Deep Learning Basics Series (10) | Can Global Average Pooling Replace Fully Connected Layers? and Deep Learning | Global Average Pooling
The Network In Network paper describes GAP as follows:
In this paper, we propose another strategy called global average pooling to replace the traditional fully connected layers in CNN. The idea is to generate one feature map for each corresponding category of the classification task in the last mlpconv layer. Instead of adding fully connected layers on top of the feature maps, we take the average of each feature map, and the resulting vector is fed directly into the softmax layer. One advantage of global average pooling over the fully connected layers is that it is more native to the convolution structure by enforcing correspondences between feature maps and categories. Thus the feature maps can be easily interpreted as categories confidence maps. Another advantage is that there is no parameter to optimize in the global average pooling thus overfitting is avoided at this layer. Furthermore, global average pooling sums out the spatial information, thus it is more robust to spatial translations of the input.
Illustrated with a figure:
As the figure makes intuitive, GAP takes the mean of each feature map and uses that single value to represent the map; the resulting vector is fed directly into the softmax.
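A minimal PyTorch sketch of this (the 10-class setup and tensor shapes are illustrative): the last layer produces one feature map per class, GAP collapses each map to its spatial mean, and the resulting vector goes straight into softmax with no fully connected layer in between.
import torch
import torch.nn.functional as F

# one feature map per class: batch of 4, 10 classes, 14x14 maps
feature_maps = torch.randn(4, 10, 14, 14)

# GAP: mean over the spatial dimensions -> (4, 10)
gap = feature_maps.mean(dim=(2, 3))
# equivalent: F.adaptive_avg_pool2d(feature_maps, 1).flatten(1)

probs = F.softmax(gap, dim=1)  # class probabilities, no fully connected layer
print(gap.shape, probs.shape)  # torch.Size([4, 10]) torch.Size([4, 10])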
Grad-CAM
Reference:
Deep Learning Paper Notes (Interpretability): CAM and Grad-CAM
Before the walkthrough, one point to keep in mind: the feature maps of a CNN's last convolutional layer carry the richest class-level semantic information.
import torch
import torch.nn.functional as F


def find_vgg_layer(arch, target_layer_name):
    """Find vgg layer to calculate GradCAM and GradCAM++

    Args:
        arch: default torchvision vgg models
        target_layer_name (str): the name of layer with its hierarchical information. please refer to usages below.
            target_layer_name = 'features'
            target_layer_name = 'features_42'
            target_layer_name = 'classifier'
            target_layer_name = 'classifier_0'

    Return:
        target_layer: found layer. this layer will be hooked to get forward/backward pass information.
    """
    hierarchy = target_layer_name.split('_')

    # top-level module: 'classifier' (fc blocks) or 'features' (conv blocks)
    if hierarchy[0] == 'classifier':
        target_layer = arch.classifier
    else:
        target_layer = arch.features

    # optional index into the sequential container, e.g. 'features_42' -> features[42]
    if len(hierarchy) == 2:
        target_layer = target_layer[int(hierarchy[1])]

    return target_layer
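For example (a quick check, assuming torchvision is available), 'features_29' on torchvision's VGG16 resolves to the ReLU just before the final MaxPool2d:
import torchvision

vgg = torchvision.models.vgg16(pretrained=True)
print(find_vgg_layer(vgg, 'features_29'))  # ReLU(inplace=True), just before the last MaxPool2d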
class GradCAM(object):
    """Calculate GradCAM saliency map.

    A simple example:

        # initialize a model, model_dict and gradcam
        resnet = torchvision.models.resnet101(pretrained=True)
        resnet.eval()
        model_dict = dict(model_type='resnet', arch=resnet, layer_name='layer4', input_size=(224, 224))
        gradcam = GradCAM(model_dict)

        # get an image and normalize with mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)
        img = load_img()
        normed_img = normalizer(img)

        # get a GradCAM saliency map on the class index 10.
        mask, logit = gradcam(normed_img, class_idx=10)

        # make heatmap from mask and synthesize saliency map using heatmap and img
        heatmap, cam_result = visualize_cam(mask, img)

    Args:
        model_dict (dict): a dictionary that contains 'model_type', 'arch', 'layer_name', 'input_size'(optional) as keys.
        verbose (bool): whether to print output size of the saliency map given 'layer_name' and 'input_size' in model_dict.
    """

    def __init__(self, model_dict, verbose=False):
        model_type = model_dict['model_type']
        layer_name = model_dict['layer_name']
        self.model_arch = model_dict['arch']

        self.gradients = dict()
        self.activations = dict()

        def backward_hook(module, grad_input, grad_output):
            # grad_output[0]: gradient of the score w.r.t. this layer's output
            self.gradients['value'] = grad_output[0]
            return None

        def forward_hook(module, input, output):
            # output: this layer's activation (the feature maps A^k)
            self.activations['value'] = output
            return None

        if 'vgg' in model_type.lower():
            target_layer = find_vgg_layer(self.model_arch, layer_name)

        target_layer.register_forward_hook(forward_hook)
        # newer PyTorch versions prefer register_full_backward_hook
        target_layer.register_backward_hook(backward_hook)

        if verbose:
            try:
                input_size = model_dict['input_size']
            except KeyError:
                print("please specify size of input image in model_dict. e.g. {'input_size':(224, 224)}")
            else:
                device = 'cuda' if next(self.model_arch.parameters()).is_cuda else 'cpu'
                self.model_arch(torch.zeros(1, 3, *(input_size), device=device))
                print('saliency_map size :', self.activations['value'].shape[2:])

    def forward(self, input, class_idx=None, retain_graph=False):
        """
        Args:
            input: input image with shape of (1, 3, H, W)
            class_idx (int): class index for calculating GradCAM.
                    If not specified, the class index that makes the highest model prediction score will be used.
        Return:
            mask: saliency map of the same spatial dimension with input
            logit: model output
        """
        b, c, h, w = input.size()

        logit = self.model_arch(input)  # e.g. torch.Size([1, 1000]) for ImageNet models
        if class_idx is None:
            score = logit[:, logit.max(1)[-1]].squeeze()  # take the top-1 class score
        else:
            score = logit[:, class_idx].squeeze()

        self.model_arch.zero_grad()
        score.backward(retain_graph=retain_graph)
        gradients = self.gradients['value']
        activations = self.activations['value']
        # gradients/activations shape: torch.Size([1, 512, 14, 14]) for vgg16 'features_29'
        b, k, u, v = gradients.size()

        alpha = gradients.view(b, k, -1).mean(2)  # torch.Size([1, 512])
        # alpha = F.relu(gradients.view(b, k, -1)).mean(2)
        weights = alpha.view(b, k, 1, 1)  # torch.Size([1, 512, 1, 1])

        saliency_map = (weights * activations).sum(1, keepdim=True)
        saliency_map = F.relu(saliency_map)  # torch.Size([1, 1, 14, 14])
        # F.upsample is deprecated; F.interpolate is its drop-in replacement
        saliency_map = F.interpolate(saliency_map, size=(h, w), mode='bilinear', align_corners=False)
        saliency_map_min, saliency_map_max = saliency_map.min(), saliency_map.max()
        saliency_map = (saliency_map - saliency_map_min).div(saliency_map_max - saliency_map_min).data

        return saliency_map, logit

    def __call__(self, input, class_idx=None, retain_graph=False):
        return self.forward(input, class_idx, retain_graph)
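Putting the class to use on the VGG16 setup discussed below; this is a sketch with a random tensor standing in for a real, normalized image (load_img and normalizer from the docstring are placeholders, not defined in this post):
import torch
import torchvision
import torchvision.transforms as T

vgg = torchvision.models.vgg16(pretrained=True)
vgg.eval()

model_dict = dict(model_type='vgg', arch=vgg, layer_name='features_29', input_size=(224, 224))
gradcam = GradCAM(model_dict, verbose=True)  # prints: saliency_map size : torch.Size([14, 14])

normalize = T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
img = torch.rand(1, 3, 224, 224)                       # stand-in for a real image in [0, 1]
mask, logit = gradcam(normalize(img[0]).unsqueeze(0))  # mask: (1, 1, 224, 224) in [0, 1]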
Code walkthrough (the symbols below follow the original paper): given a pretrained model and the target layer (here the ReLU before the last MaxPool2d of vgg16, i.e. features_29), forward/backward hooks capture that layer's activations and gradients, both of shape 1×512×14×14. Applying a GAP to each gradient map then yields the neuron importance weights α_k^c, corresponding to the code and the formula:
\alpha_k^c = \frac{1}{Z}\sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}
alpha = gradients.view(b, k, -1).mean(2) # torch.Size([1, 512])
weights = alpha.view(b, k, 1, 1) # torch.Size([1, 512, 1, 1])
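A quick sanity check that the .mean(2) above is exactly the (1/Z)·Σ_i Σ_j of the formula, with Z = u·v = 14×14 (random gradients standing in for ∂y^c/∂A^k):
import torch

b, k, u, v = 1, 512, 14, 14
gradients = torch.randn(b, k, u, v)           # stand-in for the hooked gradients

alpha = gradients.view(b, k, -1).mean(2)      # GAP over spatial positions
manual = gradients.sum(dim=(2, 3)) / (u * v)  # (1/Z) * sum_i sum_j
print(torch.allclose(alpha, manual))          # True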
We perform a weighted combination of forward activation maps, and follow it by a ReLU to obtain:
L_{Grad-CAM}^c = ReLU\Big(\sum_k \alpha_k^c A^k\Big)
saliency_map = (weights * activations).sum(1, keepdim=True)
saliency_map = F.relu(saliency_map)
On the upsampling step, see: PyTorch torch.nn upsampling with nn.Upsample (note that F.upsample is deprecated; the code above uses F.interpolate).
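For reference, a minimal sketch of that upsampling step with F.interpolate:
import torch
import torch.nn.functional as F

cam = torch.rand(1, 1, 14, 14)  # coarse 14x14 Grad-CAM map
up = F.interpolate(cam, size=(224, 224), mode='bilinear', align_corners=False)
print(up.shape)  # torch.Size([1, 1, 224, 224])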
Results: from left to right, the original image, the Grad-CAM heatmap, and the heatmap overlaid on the original image.
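visualize_cam from the docstring example is not defined in this post; below is a minimal sketch of such an overlay, assuming OpenCV is installed (the details are illustrative, mirroring common Grad-CAM visualizations):
import cv2
import numpy as np
import torch

def visualize_cam(mask, img):
    """mask: (1, 1, H, W) saliency in [0, 1]; img: (1, 3, H, W) RGB in [0, 1]."""
    # colorize the saliency map with a jet colormap (OpenCV returns BGR, uint8)
    heatmap = cv2.applyColorMap(np.uint8(255 * mask.squeeze().cpu().numpy()), cv2.COLORMAP_JET)
    heatmap = torch.from_numpy(heatmap).permute(2, 0, 1).float().div(255)
    b, g, r = heatmap.split(1)
    heatmap = torch.cat([r, g, b])  # BGR -> RGB

    # superimpose the heatmap on the original image and renormalize to [0, 1]
    result = heatmap + img.squeeze(0).cpu()
    result = result.div(result.max())
    return heatmap, result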