XAI Series Fundamentals: Grad-CAM

Global Average Pooling (GAP)

References:
深度学习基础系列(十)| Global Average Pooling是否可以替代全连接层?
深度学习|Global Average Pooling
The description of GAP in Network In Network:
In this paper, we propose another strategy called global average pooling to replace the traditional fully connected layers in CNN. The idea is to generate one feature map for each corresponding category of the classification task in the last mlpconv layer. Instead of adding fully connected layers on top of the feature maps, we take the average of each feature map, and the resulting vector is fed directly into the softmax layer. One advantage of global average pooling over the fully connected layers is that it is more native to the convolution structure by enforcing correspondences between feature maps and categories. Thus the feature maps can be easily interpreted as categories confidence maps. Another advantage is that there is no parameter to optimize in the global average pooling thus overfitting is avoided at this layer. Furthermore, global average pooling sums out the spatial information, thus it is more robust to spatial translations of the input.
Illustrated as a figure:
[Figure: Global Average Pooling reduces each feature map to a single value by averaging over its spatial dimensions.]
As the figure shows, GAP simply takes the mean of each feature map and uses that single value to represent the map when it is fed into the softmax, as in the sketch below.
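A minimal PyTorch sketch of what GAP computes (the tensor shapes here are illustrative, not prescribed by the text):

import torch
import torch.nn.functional as F

# a batch of 8 "last-layer" feature map stacks: (batch, channels, H, W)
feature_maps = torch.randn(8, 512, 14, 14)

# GAP: average each 14x14 feature map down to a single value
gap = feature_maps.mean(dim=(2, 3))  # shape: (8, 512)

# equivalent built-in: adaptive average pooling to 1x1, then flatten
gap2 = F.adaptive_avg_pool2d(feature_maps, 1).flatten(1)
assert torch.allclose(gap, gap2, atol=1e-6)

# the resulting (8, 512) vector replaces the flatten + fully connected
# layers and is fed directly into the softmax classifier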

Grad-CAM

Reference:
深度学习论文笔记(可解释性)——CAM与Grad-CAM
Before we start, one point to make clear: the feature maps of a CNN's last convolutional layer carry the richest class-level semantic information.

import torch
import torch.nn.functional as F


def find_vgg_layer(arch, target_layer_name):
    """Find a VGG layer on which to calculate GradCAM and GradCAM++.

    Args:
        arch: a torchvision VGG model
        target_layer_name (str): the name of the layer with its hierarchical information.
            Please refer to the usages below.
            target_layer_name = 'features'
            target_layer_name = 'features_42'
            target_layer_name = 'classifier'
            target_layer_name = 'classifier_0'

    Return:
        target_layer: the found layer. This layer will be hooked to get forward/backward pass information.
    """
    hierarchy = target_layer_name.split('_')

    # top-level block named before the underscore, e.g. arch.features or arch.classifier
    target_layer = getattr(arch, hierarchy[0])

    # optional index after the underscore, e.g. 'features_42' -> arch.features[42]
    if len(hierarchy) == 2:
        target_layer = target_layer[int(hierarchy[1])]

    return target_layer


class GradCAM(object):
    """Calculate GradCAM salinecy map.

    A simple example:

        # initialize a model, model_dict and gradcam
        resnet = torchvision.models.resnet101(pretrained=True)
        resnet.eval()
        model_dict = dict(model_type='resnet', arch=resnet, layer_name='layer4', input_size=(224, 224))
        gradcam = GradCAM(model_dict)

        # get an image and normalize with mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)
        img = load_img()
        normed_img = normalizer(img)

        # get a GradCAM saliency map on the class index 10.
        mask, logit = gradcam(normed_img, class_idx=10)

        # make heatmap from mask and synthesize saliency map using heatmap and img
        heatmap, cam_result = visualize_cam(mask, img)


    Args:
        model_dict (dict): a dictionary that contains 'model_type', 'arch', 'layer_name', 'input_size' (optional) as keys.
        verbose (bool): whether to print the output size of the saliency map given 'layer_name' and 'input_size' in model_dict.
    """

    def __init__(self, model_dict, verbose=False):
        model_type = model_dict['model_type']  # key name matches the docstring example above
        layer_name = model_dict['layer_name']
        self.model_arch = model_dict['arch']

        self.gradients = dict()
        self.activations = dict()

        def backward_hook(module, grad_input, grad_output):
            self.gradients['value'] = grad_output[0]
            return None

        def forward_hook(module, input, output):
            self.activations['value'] = output
            return None

        if 'vgg' in model_type.lower():
            target_layer = find_vgg_layer(self.model_arch, layer_name)
        else:
            raise ValueError('unsupported model_type: {}'.format(model_type))

        target_layer.register_forward_hook(forward_hook)
        # note: register_backward_hook is deprecated in recent PyTorch versions
        # in favor of register_full_backward_hook
        target_layer.register_backward_hook(backward_hook)

        if verbose:
            try:
                input_size = model_dict['input_size']
            except KeyError:
                print("please specify size of input image in model_dict. e.g. {'input_size':(224, 224)}")
                pass
            else:
                device = 'cuda' if next(self.model_arch.parameters()).is_cuda else 'cpu'
                self.model_arch(torch.zeros(1, 3, *(input_size), device=device))
                print('saliency_map size :', self.activations['value'].shape[2:])

    def forward(self, input, class_idx=None, retain_graph=False):
        """
        Args:
            input: input image with shape of (1, 3, H, W)
            class_idx (int): class index for calculating GradCAM.
                    If not specified, the class index that makes the highest model prediction score will be used.
        Return:
            mask: saliency map with the same spatial dimensions as the input
            logit: model output
        """
        b, c, h, w = input.size()

        logit = self.model_arch(input)
        if class_idx is None:
            # use the score of the top-1 predicted class
            score = logit[:, logit.max(1)[-1]].squeeze()
        else:
            score = logit[:, class_idx].squeeze()

        self.model_arch.zero_grad()
        score.backward(retain_graph=retain_graph)
        gradients = self.gradients['value']
        activations = self.activations['value']
        # print(gradients.shape, activations.shape)  # torch.Size([1, 512, 14, 14]) torch.Size([1, 512, 14, 14])
        b, k, u, v = gradients.size()
        alpha = gradients.view(b, k, -1).mean(2)  # torch.Size([1, 512])
        # alpha = F.relu(gradients.view(b, k, -1)).mean(2)
        weights = alpha.view(b, k, 1, 1)  # torch.Size([1, 512, 1, 1])

        saliency_map = (weights * activations).sum(1, keepdim=True)
        saliency_map = F.relu(saliency_map)
        # F.upsample is deprecated; F.interpolate is the current equivalent
        saliency_map = F.interpolate(saliency_map, size=(h, w), mode='bilinear', align_corners=False)
        # min-max normalize the map to [0, 1]
        saliency_map_min, saliency_map_max = saliency_map.min(), saliency_map.max()
        saliency_map = (saliency_map - saliency_map_min).div(saliency_map_max - saliency_map_min).detach()

        return saliency_map, logit

    def __call__(self, input, class_idx=None, retain_graph=False):
        return self.forward(input, class_idx, retain_graph)
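Note that the docstring example above uses a resnet, but this excerpt only wires up the VGG path via find_vgg_layer. A minimal usage sketch for the VGG path, assuming torchvision is available (features_29 is the layer used in the walkthrough below):

import torch
import torchvision

vgg = torchvision.models.vgg16(pretrained=True)
vgg.eval()

model_dict = dict(model_type='vgg', arch=vgg, layer_name='features_29', input_size=(224, 224))
gradcam = GradCAM(model_dict, verbose=True)

# in practice, normalize a real image with the ImageNet mean/std;
# a random tensor is enough for a shape smoke test
normed_img = torch.randn(1, 3, 224, 224)
mask, logit = gradcam(normed_img)
print(mask.shape)  # torch.Size([1, 1, 224, 224])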

Code walkthrough (note: the notation below follows the original paper):
Given a pretrained model and a target layer (here the ReLU before the last MaxPool2d() of vgg16, i.e. features_29), hooks are used to capture that layer's activations and gradients. The feature maps, activations, and gradients at this layer all have shape $1 \times 512 \times 14 \times 14$. Applying a GAP to each gradient map yields the neuron-importance weights $\alpha_k^c$, corresponding to the formula and code below:

$$\alpha_k^c = \frac{1}{Z}\sum_i \sum_j \frac{\partial y^c}{\partial A^k_{ij}}$$

where $Z$ is the number of spatial positions ($14 \times 14 = 196$ here), $y^c$ is the score for class $c$, and $A^k$ is the $k$-th feature map.

alpha = gradients.view(b, k, -1).mean(2)  # torch.Size([1, 512])
weights = alpha.view(b, k, 1, 1)  # torch.Size([1, 512, 1, 1])

We perform a weighted combination of forward activation maps, and follow it by a ReLU to obtain:
$$L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\left(\sum_k \alpha_k^c A^k\right)$$

saliency_map = (weights * activations).sum(1, keepdim=True)
saliency_map = F.relu(saliency_map)

For more on the upsampling step, see the article pytorch torch.nn 实现上采样——nn.Upsample; a small illustration follows below.
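A self-contained illustration of this step, using the shapes from the walkthrough above (F.interpolate is the current API; F.upsample and nn.Upsample wrap the same operation):

import torch
import torch.nn.functional as F

cam = torch.rand(1, 1, 14, 14)  # a coarse 14x14 Grad-CAM map
# bilinear upsampling back to the input resolution
up = F.interpolate(cam, size=(224, 224), mode='bilinear', align_corners=False)
print(up.shape)  # torch.Size([1, 1, 224, 224])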
[Figure: example results. From left to right: the original image, the Grad-CAM heatmap, and the heatmap overlaid on the original image.]
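The overlay in the figure is produced by the visualize_cam helper referenced in the GradCAM docstring but not shown in this excerpt. A minimal sketch of what such a helper might look like, assuming OpenCV (cv2) is available and mask/img are shaped as in the docstring example (the body here is an assumption, not the author's implementation):

import cv2
import numpy as np
import torch

def visualize_cam(mask, img):
    """Overlay a (1, 1, H, W) saliency mask in [0, 1] onto a (1, 3, H, W) image in [0, 1]."""
    # colorize the mask with a JET colormap
    heatmap = cv2.applyColorMap(np.uint8(255 * mask.squeeze().cpu().numpy()), cv2.COLORMAP_JET)
    heatmap = torch.from_numpy(heatmap).permute(2, 0, 1).float().div(255)
    b, g, r = heatmap.split(1)
    heatmap = torch.cat([r, g, b])  # OpenCV returns BGR; convert to RGB

    # blend heatmap and image, then renormalize to [0, 1]
    result = heatmap + img.squeeze(0).cpu()
    result = result.div(result.max())
    return heatmap, result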
