An Explanation of the Grad-CAM Method
convolutional layers naturally retain spatial information which is lost in fully-connected layers, so we
can expect the last convolutional layers to have the best compromise between high-level semantics and detailed spatial information. In other words, fully-connected (FC) layers discard spatial information, so the last convolutional layer offers the best combination of high-level semantics and detailed spatial information.
The neurons in these layers look for semantic class-specific information in the image (say object parts). That is, these neurons respond to class-specific signals, e.g., object parts.
Grad-CAM uses the gradient information flowing into the last convolutional layer of the CNN to assign importance values to each neuron for a particular decision of interest. Although our technique is fairly general in that it can be used to explain activations in any layer of a deep network, in this work, we focus on explaining output layer decisions only. Here we focus only on the last convolutional layer of the CNN.
Let $A^k_{i,j}$ denote the activation at spatial location $(i, j)$ of feature map $k$ in the last convolutional layer. The neuron-importance weight $\alpha^c_k$ for class $c$ is obtained by global-average-pooling the gradients of the class score $y^c$:

$$\alpha^c_k = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A^k_{i,j}},$$

where $Z$ is the number of spatial locations in the feature map.
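As a sketch of this definition, assuming the activations $A^k$ and the gradients $\partial y^c / \partial A^k$ have already been extracted from a network (e.g. via backward hooks), the weights $\alpha^c_k$ and the resulting Grad-CAM heatmap can be computed with NumPy. The function name and array layout here are illustrative, not from the paper:

```python
import numpy as np

def grad_cam_map(activations, gradients):
    """Compute a Grad-CAM heatmap.

    activations: array of shape (K, H, W), the feature maps A^k of the
        last convolutional layer.
    gradients: array of shape (K, H, W), the gradients dy^c / dA^k of
        the class score w.r.t. those feature maps.
    """
    # alpha_k^c: global-average-pool the gradients over the spatial
    # dimensions (Z = H * W), giving one weight per feature map.
    alphas = gradients.mean(axis=(1, 2))                  # shape (K,)
    # Weighted combination of the feature maps, followed by a ReLU to
    # keep only features with a positive influence on the class score.
    cam = (alphas[:, None, None] * activations).sum(axis=0)
    return np.maximum(cam, 0.0)                           # shape (H, W)
```

The final ReLU (applied after the weighted combination, as in the Grad-CAM paper) suppresses regions whose features argue against the class of interest.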