READING NOTE: Understanding intermediate layers using linear classifier probes

TITLE: Understanding intermediate layers using linear classifier probes

AUTHOR: Guillaume Alain, Yoshua Bengio

AFFILIATION: Université de Montréal

FROM: arXiv:1610.01644

CONTRIBUTIONS

The paper introduces the linear classifier probe ("probe") as a tool for understanding the roles of the intermediate layers of a neural network, and for measuring how much information is gained at every layer (the answer: technically, none, since by the data-processing inequality a deterministic layer cannot add information; what changes is how linearly accessible that information becomes). This simple concept is very useful for understanding the dynamics of a deep neural network both during and after training.

Linear Classifier Probes

Probes

The probes are implemented in a very simple manner: a fully-connected layer followed by a softmax serves as a linear classifier. The probe's error takes NO part in the back-propagation of the host network; it is used only to measure how well the features extracted at different depths can solve the classification problem.
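As a concrete illustration, here is a minimal PyTorch sketch of a probe (my own reconstruction, not the authors' code; dimensions are arbitrary). The `detach()` call is what keeps the probe's gradient out of the host network:

```python
import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    """A linear classifier attached to one intermediate layer."""
    def __init__(self, feature_dim: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(feature_dim, num_classes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # detach() blocks the gradient: the probe observes the features
        # but never back-propagates into the layers that produced them.
        return self.fc(features.detach())

# Train the probe alone with the usual softmax cross-entropy loss.
probe = LinearProbe(feature_dim=256, num_classes=10)
optimizer = torch.optim.SGD(probe.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()  # log-softmax + negative log-likelihood

features = torch.randn(32, 256)        # stand-in for a layer's activations
labels = torch.randint(0, 10, (32,))
loss = criterion(probe(features), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```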

Probes on an untrained model

Probes are first attached to an untrained network, to see whether each layer would already give useful features for a classification task. The data are synthetic, drawn from a Gaussian distribution, and the task is very easy.

The probe at layer 0, which sees the raw data, classifies perfectly, and performance degrades as the random transformations applied by successive untrained layers accumulate. This indicates that before training, the usefulness of representations decays with depth, to the point where the deepest layers are utterly useless. The authors put this as a very strong claim: garbage forwardprop, garbage backprop.
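A rough sketch of this experiment, under my own assumptions (synthetic Gaussian inputs whose label is the sign of the first coordinate, and a frozen stack of random linear+ReLU layers standing in for the untrained network):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(2000, 32)            # easy task: label = sign of x[0]
y = (X[:, 0] > 0).long()

# An "untrained" network: random weights, kept frozen.
layers = [nn.Sequential(nn.Linear(32, 32), nn.ReLU()) for _ in range(16)]
for layer in layers:
    for p in layer.parameters():
        p.requires_grad_(False)

def train_probe(feats, labels, steps=500):
    """Fit a linear probe on fixed features; return its accuracy."""
    probe = nn.Linear(feats.shape[1], 2)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(probe(feats), labels).backward()
        opt.step()
    return (probe(feats).argmax(1) == labels).float().mean().item()

h = X
print("layer 0 probe accuracy:", train_probe(h, y))   # near-perfect
for depth, layer in enumerate(layers, start=1):
    h = layer(h)                      # random transformation of features
    if depth % 4 == 0:
        print(f"layer {depth} probe accuracy:", train_probe(h, y))
```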

Auxiliary loss branches and skip connections

From the experiment in the paper, auxiliary loss branches attached to intermediate layers appear to make an otherwise untrainable very deep model trainable.
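For contrast with probes, here is a loose sketch of what an auxiliary loss branch could look like (my own illustration; the class and parameter names are hypothetical). Unlike a probe, the branch does not detach its input, so its gradient flows back into the trunk and gives the shallower layers a direct training signal:

```python
import torch.nn as nn

class AuxBranchNet(nn.Module):
    def __init__(self, dim=32, depth=64, num_classes=2, aux_every=16):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
            for _ in range(depth))
        # Intermediate classifier heads; their losses DO back-propagate
        # into the trunk (no detach(), unlike a probe).
        self.aux_heads = nn.ModuleDict({
            str(i): nn.Linear(dim, num_classes)
            for i in range(aux_every, depth, aux_every)})
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        aux_logits = []
        for i, block in enumerate(self.blocks, start=1):
            x = block(x)
            if str(i) in self.aux_heads:
                aux_logits.append(self.aux_heads[str(i)](x))
        return self.head(x), aux_logits

# Training would minimize: main loss + weighted sum of auxiliary losses.
```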

In another experiment, adding a bridge (a skip connection) between layer 0 and layer 64 makes the model completely ignore layers 1-63: the probes show that those layers contribute nothing useful. A figure in the paper illustrates the phenomenon.
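A hypothetical sketch of such a bridged architecture (dimensions and names are my own, not the paper's exact setup). Optimization can route all useful signal through the shortcut, leaving the deep path effectively untrained:

```python
import torch.nn as nn

class BridgedNet(nn.Module):
    def __init__(self, dim=32, depth=64, num_classes=2):
        super().__init__()
        self.body = nn.Sequential(                 # layers 1..64
            *[nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
              for _ in range(depth)])
        self.bridge = nn.Linear(dim, dim)          # layer 0 -> layer 64
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        # The input to the head is the sum of the deep path and the
        # bridge; if the deep path is untrainable, gradient descent can
        # simply rely on the bridge and ignore layers 1-63.
        return self.head(self.body(x) + self.bridge(x))
```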

Some Ideas

  1. Probes can be used to visualize the role of each layer.
  2. Is ResNet really necessary? Why does it work, if a skip connection lets the model ignore the layers it covers?
  3. Training stage by stage could be very useful when a very deep network is used.