Reading Note: Interpretable Convolutional Neural Networks

TITLE: Interpretable Convolutional Neural Networks

AUTHOR: Quanshi Zhang, Ying Nian Wu, Song-Chun Zhu

ASSOCIATION: UCLA

FROM: arXiv:1710.00935

CONTRIBUTION

  1. Slightly revised CNNs are proposed to improve interpretability; the modification can be broadly applied to CNNs with different network structures.
  2. No annotations of object parts or textures are needed to ensure that each high-layer filter has a specific semantic meaning. Each filter automatically learns a meaningful object-part representation without any additional human supervision.
  3. When a traditional CNN is modified into an interpretable CNN, the experimental settings for learning need not change: the interpretable CNN keeps the original loss function on the top layer and uses exactly the same training samples.
  4. The design for interpretability may slightly decrease the discriminative power of the network, but the decrease stays within a small range.

METHOD

The loss for a filter is illustrated in the following figure.

(Figure: framework of the filter loss.){: .center-image .image-width-480}

A feature map is expected to be strongly activated on images of a certain category and to remain silent on other images. Therefore, a set of templates is used to evaluate the fit between the current feature map and the ideal distribution of activations w.r.t. its semantics. A template is an ideal distribution of activations over spatial locations. The loss for a filter is formulated using the mutual information between the feature map X and the templates T.
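As a rough sketch of what such templates might look like: each positive template T_mu is an ideal n x n activation map that peaks at a location mu and decays with L1 distance, plus one negative template for images where the filter should stay silent. The L1-decay form follows the paper's description, but the constants `tau` and `beta` here are assumed hyperparameters, not verified values.

```python
import numpy as np

def build_templates(n, beta=4.0):
    """Build n*n positive part templates plus one negative template.

    Each positive template peaks (value tau) at its location mu and
    decays linearly with L1 distance; the negative template is a
    constant -tau map, meaning the filter should not fire at all.
    tau and beta are assumed hyperparameters for this sketch.
    """
    tau = 0.5 / (n * n)
    ii, jj = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    templates = []
    for mu_i in range(n):
        for mu_j in range(n):
            dist = np.abs(ii - mu_i) + np.abs(jj - mu_j)  # L1 distance to mu
            templates.append(tau * np.maximum(1.0 - beta * dist / n, -1.0))
    t_neg = -tau * np.ones((n, n))  # negative template: stay silent everywhere
    return np.stack(templates), t_neg
```

Each candidate location mu gets its own template, so a filter that fires in exactly one region fits one positive template well and all others poorly.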

Loss_f = -MI(X; T)

The loss can be rewritten as

Loss_f = -H(T) + H(T' = {T^-, T^+} | X) + Σ_x p(T^+, x) · H(T^+ | X = x)

The first term is a constant denoting the prior entropy of T. The second term encourages a low conditional entropy of inter-category activations, i.e. a well-learned filter needs to be exclusively activated by a certain category and to keep silent on other categories. The third term encourages a low conditional entropy of the spatial distribution of activations: a well-learned filter should be activated by a single region of the feature map, rather than appearing repetitively at different locations.

SOME THOUGHTS

This loss can reduce the redundancy among filters, which may make it useful for model compression.
