This paper addresses estimating the positions of body joints (human pose estimation).
Its main contribution is using $\psi_{t>1}$ to capture spatial context across all stages.
Notable sentences:
- the magnitude of back-propagated gradients decreases in strength with the number of intermediate layers between the output layer and the input layer.
- Intermediate supervision has the advantage that, even though the full architecture can have many layers, it does not fall prey to the vanishing gradient problem, as the intermediate loss functions replenish the gradients at each stage.
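A toy numeric sketch (my own illustration, not from the paper) of why attaching a loss at every stage counteracts vanishing gradients. In a stage-wise chain with a shared weight `w` where `|w| < 1`, the gradient from only the final loss decays geometrically with depth, while summing a loss per stage gives early stages a direct, non-decaying gradient term:

```python
# Toy chain: y_t = w * y_{t-1}. With |w| < 1, the gradient of a loss at
# the final stage w.r.t. an early stage shrinks as w**depth.
w = 0.5   # shared weight, |w| < 1 so gradients shrink through stages
T = 8     # number of stages

# Supervision only at the last stage:
# dL_T/dy_1 = w**(T-1) * dL_T/dy_T (take dL_T/dy_T = 1 for illustration)
final_only = w ** (T - 1)

# Intermediate supervision: a loss at every stage t contributes w**(t-1)
# to stage 1, so the total gradient stays O(1) instead of decaying.
with_intermediate = sum(w ** (t - 1) for t in range(1, T + 1))

print(final_only)         # ~0.0078: nearly vanished
print(with_intermediate)  # ~1.992: replenished at each stage
```

The exact decay rate depends on the weights, but the qualitative point matches the quoted sentence: each intermediate loss injects gradient that does not have to travel through all later stages.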