Network Dissection: Quantifying the Interpretability of Networks

Introduction

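The authors propose a framework for quantifying the interpretability of a network's latent representations by evaluating how well individual hidden units align with semantic concepts. Given a CNN, the framework scores the semantics of each hidden unit in a chosen convolutional layer. Concepts fall into six categories: objects, parts (of objects), scenes, textures, materials, and colors. The Broadly and Densely Labeled Dataset (Broden) gathers images from several datasets and annotates them densely at the pixel level, except for textures and scenes, which are labeled per whole image. Some examples:
[Figure: sample Broden images with their annotations]
The paper draws several conclusions about interpretability: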

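  • Interpretability is axis-aligned: rotating the representation lowers the interpretability of the units while leaving classification performance unchanged.
  • Deeper architectures are more interpretable: ResNet > VGGNet > GoogLeNet > AlexNet.
  • For the training dataset, Places > ImageNet: a scene contains multiple objects, so it pays for multiple object detectors to emerge in order to recognize scenes.
  • For training conditions, more training iterations give better interpretability. The result does not depend on the initialization, and dropout improves interpretability. Batch normalization lowers interpretability, possibly because its whitening smooths out scaling issues and lets the network rotate the axes of intermediate features.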
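The axis-alignment point can be illustrated with a toy linear example. This is only a sketch on random data, not the paper's experiment, and the names `features`, `W`, and `Q` are made up for illustration. An orthogonal change of basis mixes all units together, yet a classifier that absorbs the inverse rotation produces exactly the same logits:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 5 samples, 8 hidden units, 3 classes.
features = rng.standard_normal((5, 8))   # hidden representation f(x)
W = rng.standard_normal((8, 3))          # linear classifier on top of f(x)

# Random orthogonal matrix Q (a rotation/reflection of the unit basis) via QR.
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))

rotated = features @ Q                   # each new "unit" mixes all original units
W_rotated = Q.T @ W                      # classifier compensates for the change of basis

logits = features @ W
logits_rotated = rotated @ W_rotated     # (f Q)(Q^T W) = f W

same_predictions = np.allclose(logits, logits_rotated)
units_changed = not np.allclose(features, rotated)
print(same_predictions, units_changed)
```

The discriminative power of the layer is identical in both bases, so any drop in per-unit interpretability after rotation has to come from the choice of axes.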

Method

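[Figure: measuring the semantic alignment of units in a given CNN]
The figure above shows the procedure for measuring how well the units of a given CNN align with semantic concepts, here probing one unit of the last convolutional layer.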
For an input image $x$, the activation map of the $k$-th unit of a convolutional layer is $A_k(x)$. Let $\alpha_k$ denote the distribution of the unit's activations; the top-quantile threshold $T_k$ is chosen so that $P(\alpha_k > T_k) = 0.005$ over every spatial location of the activation maps. As the figure above shows, the low-resolution activation map $A_k(x)$ is first upsampled to $S_k(x)$ at the input image's resolution and then compared with the input's annotation mask $L_c(x)$, where $c$ is a semantic concept. $S_k(x)$ is thresholded into a binary segmentation $M_k(x) \equiv S_k(x) \geq T_k$. The score of unit $k$ for concept $c$ is the IoU of $M_k(x)$ and $L_c(x)$, with the sums taken over the images of the dataset:
$$IoU_{k,c}=\frac{\sum |M_k(x)\cap L_c(x)|}{\sum |M_k(x)\cup L_c(x)|}$$
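The scoring steps above can be sketched in NumPy. This is a simplified illustration under stated assumptions, not the NetDissect-Lite code: the helper names are hypothetical, nearest-neighbour upsampling stands in for the interpolation used in practice, and `np.quantile` only approximates the exact condition on $T_k$ for small samples.

```python
import numpy as np

def unit_threshold(activation_maps, quantile=0.005):
    """T_k such that roughly P(a_k > T_k) = quantile over all images and positions."""
    return np.quantile(activation_maps, 1.0 - quantile)

def upsample_nearest(act, out_h, out_w):
    """Nearest-neighbour upsampling of a 2-D map (assumed stand-in for S_k)."""
    rows = np.arange(out_h) * act.shape[0] // out_h
    cols = np.arange(out_w) * act.shape[1] // out_w
    return act[np.ix_(rows, cols)]

def iou_score(activation_maps, concept_masks, quantile=0.005):
    """Dataset-level IoU_{k,c}: intersections and unions are summed before dividing."""
    t_k = unit_threshold(activation_maps, quantile)
    inter = union = 0
    for act, mask in zip(activation_maps, concept_masks):
        m_k = upsample_nearest(act, *mask.shape) >= t_k   # binary mask M_k(x)
        inter += np.logical_and(m_k, mask).sum()
        union += np.logical_or(m_k, mask).sum()
    return inter / union if union else 0.0

# Toy data: 2 images, one 2x2 activation map each, and 4x4 concept masks L_c(x).
acts = np.array([[[0.0, 0.0], [0.0, 9.0]],
                 [[0.0, 0.0], [0.0, 0.0]]])
masks = np.zeros((2, 4, 4), dtype=bool)
masks[0, 2:, 2:] = True   # concept fills the bottom-right quadrant of image 0
masks[0, 0, 0] = True     # plus one pixel where the unit does not fire
score = iou_score(acts, masks, quantile=0.1)
print(score)
```

A larger quantile than 0.005 is used here only because the toy data has eight activation values in total.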
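Unit $k$ is considered a detector for concept $c$ if $IoU_{k,c}$ exceeds a threshold; the authors use $0.04$. A single unit may qualify as a detector for several concepts, so the authors keep only the top-ranked label. To quantify the interpretability of a layer, count the distinct concepts that have at least one aligned unit; this metric is called the number of unique detectors. The number of units in the layer that act as detectors is called the number of detectors.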
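Both layer-level metrics follow directly from a table of IoU scores. The sketch below uses invented scores and a hypothetical helper `summarize_layer`; it is not taken from the repository.

```python
def summarize_layer(iou, threshold=0.04):
    """iou maps unit index -> {concept: IoU score}.

    Returns (number of detectors, number of unique detectors, best label per detector).
    """
    best = {}
    for unit, scores in iou.items():
        # Keep only the top-ranked concept for each unit.
        concept, score = max(scores.items(), key=lambda kv: kv[1])
        if score > threshold:
            best[unit] = concept
    return len(best), len(set(best.values())), best

# Hypothetical scores for four units of one layer.
toy_iou = {
    0: {"dog": 0.12, "grass": 0.05},
    1: {"dog": 0.07, "sky": 0.01},
    2: {"striped": 0.03, "sky": 0.02},
    3: {"sky": 0.09},
}
n_detectors, n_unique, labels = summarize_layer(toy_iou)
print(n_detectors, n_unique, labels)
```

Units 0 and 1 both detect "dog", so they count twice toward the number of detectors but only once toward the number of unique detectors.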

PyTorch implementation

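The authors provide a GitHub repository: https://github.com/CSAILVision/NetDissect-Lite
This section focuses on dissecting your own model. Note that you first need a pre-trained weights file.
(1) Put the model's module code under loader/, for example in a new file lenet.py: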

import torch.nn as nn
import torch.nn.functional as F
import torch


class LeNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(LeNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3),
            nn.ReLU(inplace=True))

        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        out = self.features(x)
        out = F.adaptive_avg_pool2d(out, (1, 1))
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out

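(2) Modify model_loader.py under loader/ so that it loads the custom lenet model.
Also add an IS_TORCHVISION variable to settings.py to indicate whether the model is one that already ships with torchvision.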

import settings
import torch
import torchvision
from .lenet import LeNet

def loadmodel(hook_fn):
    if settings.IS_TORCHVISION:
        if settings.MODEL_FILE is None:
            model = torchvision.models.__dict__[settings.MODEL](pretrained=True)
        else:
            checkpoint = torch.load(settings.MODEL_FILE)
            if type(checkpoint).__name__ == 'OrderedDict' or type(checkpoint).__name__ == 'dict':
                model = torchvision.models.__dict__[settings.MODEL](num_classes=settings.NUM_CLASSES)
                if settings.MODEL_PARALLEL:
                    state_dict = {str.replace(k, 'module.', ''): v for k, v in checkpoint[
                        'state_dict'].items()}  # the data parallel layer will add 'module' before each layer name
                else:
                    state_dict = checkpoint
                model.load_state_dict(state_dict)
            else:
                model = checkpoint
    else:
        checkpoint = torch.load(settings.MODEL_FILE)
        model = LeNet(num_classes=settings.NUM_CLASSES)
        param_dict = {}  # rebuilt weight dict: loading the checkpoint directly may fail because parameter names differ
        for k, v in zip(model.state_dict().keys(), checkpoint['net'].keys()):
            param_dict[k] = checkpoint['net'][v]
        model.load_state_dict(param_dict)

    for name in settings.FEATURE_NAMES:
        model._modules.get(name).register_forward_hook(hook_fn)
    if settings.GPU:
        model.cuda()
    model.eval()
    return model
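
The custom-model branch above pairs parameter names purely by position, which silently produces wrong weights if the two sides list their parameters in a different order or number. A minimal sketch of the same remapping with a sanity check follows. It uses plain dicts with strings in place of tensors, and `remap_by_position` is a hypothetical helper, not part of NetDissect-Lite.

```python
def remap_by_position(model_keys, checkpoint_state):
    """Rename checkpoint entries to the model's parameter names, pairing them by order.

    Positional pairing is only safe when both sides list the same parameters
    in the same order, so fail loudly if the counts differ.
    """
    ckpt_keys = list(checkpoint_state.keys())
    model_keys = list(model_keys)
    if len(ckpt_keys) != len(model_keys):
        raise ValueError(
            f"model has {len(model_keys)} entries, checkpoint has {len(ckpt_keys)}")
    return {m: checkpoint_state[c] for m, c in zip(model_keys, ckpt_keys)}

# Toy stand-ins: strings replace tensors, and the checkpoint uses a 'module.' prefix.
model_keys = ["features.0.weight", "features.0.bias", "fc.weight", "fc.bias"]
checkpoint = {
    "module.features.0.weight": "W_conv",
    "module.features.0.bias": "b_conv",
    "module.fc.weight": "W_fc",
    "module.fc.bias": "b_fc",
}
param_dict = remap_by_position(model_keys, checkpoint)
print(param_dict)
```

With a real model, `model_keys` would be `model.state_dict().keys()` and `checkpoint_state` would be `checkpoint['net']`, as in the loader above. A length check does not catch reordered parameters, so comparing tensor shapes as well would be a sensible extra guard.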

(3) Configure settings.py as follows:

######### global settings  #########
GPU = True                                  # running on GPU is highly suggested
TEST_MODE = False                         # turning on the testmode means the code will run on a small dataset.
CLEAN = True                               # set to "True" if you want to clean the temporary large files after generating result
MODEL = 'lenet'                          # model arch: resnet18, alexnet, resnet50, densenet161
DATASET = 'imagenet'                       # model trained on: places365 or imagenet
QUANTILE = 0.005                            # the threshold used for activation
SEG_THRESHOLD = 0.04                        # the threshold used for visualization
SCORE_THRESHOLD = 0.04                      # the threshold used for IoU score (in HTML file)
TOPN = 10                                   # to show top N image with highest activation for each unit
PARALLEL = 1                                # how many process is used for tallying (Experiments show that 1 is the fastest)
CATAGORIES = ["object", "material", "part","scene","texture","color"] # concept categories that are chosen to detect: "object", "part", "scene", "material", "texture", "color"
OUTPUT_FOLDER = "result/pytorch_"+MODEL+"_"+DATASET # result will be stored in this folder

########### sub settings ###########
# In most of the case, you don't have to change them.
# DATA_DIRECTORY: where broaden dataset locates
# IMG_SIZE: image size, alexnet use 227x227
# NUM_CLASSES: how many labels in final prediction
# FEATURE_NAMES: the array of layer where features will be extracted
# MODEL_FILE: the model file to be probed, "None" means the pretrained model in torchvision
# MODEL_PARALLEL: some model is trained in multi-GPU, so there is another way to load them.
# WORKERS: how many workers are fetching images
# BATCH_SIZE: batch size used in feature extraction
# TALLY_BATCH_SIZE: batch size used in tallying
# INDEX_FILE: if you turn on the TEST_MODE, actually you should provide this file on your own

if MODEL != 'alexnet':
    DATA_DIRECTORY = 'dataset/broden1_224'
    IMG_SIZE = 224
else:
    DATA_DIRECTORY = 'dataset/broden1_227'
    IMG_SIZE = 227


NUM_CLASSES = 1000
IS_TORCHVISION = False

if MODEL == 'resnet50':
    FEATURE_NAMES = ['layer4']
    MODEL_FILE = None
    MODEL_PARALLEL = False
elif MODEL == 'densenet169':
    FEATURE_NAMES = ['features']
    MODEL_FILE = None
    MODEL_PARALLEL=False
elif MODEL == 'lenet':
    FEATURE_NAMES = ['features']
    MODEL_FILE = '/home/ws/winycg/mbagnet/checkpoint/LeNet.pth.tar' # pre-trained weights file


if TEST_MODE:
    WORKERS = 1
    BATCH_SIZE = 4
    TALLY_BATCH_SIZE = 2
    TALLY_AHEAD = 1
    INDEX_FILE = 'index_sm.csv'
    OUTPUT_FOLDER += "_test"
else:
    WORKERS = 12
    BATCH_SIZE = 16
    TALLY_BATCH_SIZE = 16
    TALLY_AHEAD = 4
    INDEX_FILE = 'index.csv'
