READING NOTE: Understanding intermediate layers using linear classifier probes

TITLE: Understanding intermediate layers using linear classifier probes

AUTHOR: Guillaume Alain, Yoshua Bengio

AFFILIATION: Université de Montréal

FROM: arXiv:1610.01644

CONTRIBUTIONS

The paper introduces the linear classifier probe ("probe") as a tool for understanding the roles of the intermediate layers of a neural network, and for measuring how much information is gained at every layer (the answer: technically, none, since by the data-processing inequality a deterministic layer cannot add information; what changes is how linearly accessible that information becomes). This simple concept is very useful for understanding the dynamics of a deep neural network both during and after training.

Linear Classifier Probes

Probes

The probes are implemented in a very simple manner: a fully-connected layer followed by a softmax serves as a linear classifier. The probe's error takes NO part in the back-propagation of the host network; it is used only to measure how well the features extracted at different depths can solve the classification problem.
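As a concrete illustration, here is a minimal PyTorch sketch of a probe (my own reconstruction, not the authors' code; dimensions are arbitrary). The `detach()` call is what keeps the probe's gradient out of the host network:

```python
import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    """A linear classifier attached to one intermediate layer."""
    def __init__(self, feature_dim: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(feature_dim, num_classes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # detach() blocks the gradient: the probe observes the features
        # but never back-propagates into the layers that produced them.
        return self.fc(features.detach())

# Train the probe alone with the usual softmax cross-entropy loss.
probe = LinearProbe(feature_dim=256, num_classes=10)
optimizer = torch.optim.SGD(probe.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()  # log-softmax + negative log-likelihood

features = torch.randn(32, 256)        # stand-in for a layer's activations
labels = torch.randint(0, 10, (32,))
loss = criterion(probe(features), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```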

Probes on an untrained model

Probes are first attached to an untrained network, to see whether each layer would already give useful features for a classification task. The data are synthetic, drawn from a Gaussian distribution, and the task is very easy.

The probe at layer 0, which sees the raw data, classifies perfectly, and performance degrades as the random transformations applied by successive untrained layers accumulate. This indicates that before training, the usefulness of representations decays with depth, to the point where the deepest layers are utterly useless. The authors put this as a very strong claim: garbage forwardprop, garbage backprop.
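A rough sketch of this experiment, under my own assumptions (synthetic Gaussian inputs whose label is the sign of the first coordinate, and a frozen stack of random linear+ReLU layers standing in for the untrained network):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(2000, 32)            # easy task: label = sign of x[0]
y = (X[:, 0] > 0).long()

# An "untrained" network: random weights, kept frozen.
layers = [nn.Sequential(nn.Linear(32, 32), nn.ReLU()) for _ in range(16)]
for layer in layers:
    for p in layer.parameters():
        p.requires_grad_(False)

def train_probe(feats, labels, steps=500):
    """Fit a linear probe on fixed features; return its accuracy."""
    probe = nn.Linear(feats.shape[1], 2)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(probe(feats), labels).backward()
        opt.step()
    return (probe(feats).argmax(1) == labels).float().mean().item()

h = X
print("layer 0 probe accuracy:", train_probe(h, y))   # near-perfect
for depth, layer in enumerate(layers, start=1):
    h = layer(h)                      # random transformation of features
    if depth % 4 == 0:
        print(f"layer {depth} probe accuracy:", train_probe(h, y))
```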

Auxiliary loss branches and skip connections

From the experiment in the paper, auxiliary loss branches attached to intermediate layers appear to make an otherwise untrainable very deep model trainable.
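For contrast with probes, here is a loose sketch of what an auxiliary loss branch could look like (my own illustration; the class and parameter names are hypothetical). Unlike a probe, the branch does not detach its input, so its gradient flows back into the trunk and gives the shallower layers a direct training signal:

```python
import torch.nn as nn

class AuxBranchNet(nn.Module):
    def __init__(self, dim=32, depth=64, num_classes=2, aux_every=16):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
            for _ in range(depth))
        # Intermediate classifier heads; their losses DO back-propagate
        # into the trunk (no detach(), unlike a probe).
        self.aux_heads = nn.ModuleDict({
            str(i): nn.Linear(dim, num_classes)
            for i in range(aux_every, depth, aux_every)})
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        aux_logits = []
        for i, block in enumerate(self.blocks, start=1):
            x = block(x)
            if str(i) in self.aux_heads:
                aux_logits.append(self.aux_heads[str(i)](x))
        return self.head(x), aux_logits

# Training would minimize: main loss + weighted sum of auxiliary losses.
```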

In another experiment, adding a bridge (a skip connection) between layer 0 and layer 64 makes the model completely ignore layers 1-63: the probes show that those layers contribute nothing useful. A figure in the paper illustrates the phenomenon.
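A hypothetical sketch of such a bridged architecture (dimensions and names are my own, not the paper's exact setup). Optimization can route all useful signal through the shortcut, leaving the deep path effectively untrained:

```python
import torch.nn as nn

class BridgedNet(nn.Module):
    def __init__(self, dim=32, depth=64, num_classes=2):
        super().__init__()
        self.body = nn.Sequential(                 # layers 1..64
            *[nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
              for _ in range(depth)])
        self.bridge = nn.Linear(dim, dim)          # layer 0 -> layer 64
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        # The input to the head is the sum of the deep path and the
        # bridge; if the deep path is untrainable, gradient descent can
        # simply rely on the bridge and ignore layers 1-63.
        return self.head(self.body(x) + self.bridge(x))
```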

Some Ideas

  1. Probes can be used to visualize the role of each layer.
  2. Is ResNet really necessary? Why does it work, if a skip connection lets the model ignore the layers it covers?
  3. Training stage by stage could be very useful when a very deep network is used.