Besides dropping out individual neurons, can dropout also be applied to pooling, fully connected, and convolutional layers?
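One common variant for convolutional layers (an illustration, not something the quoted papers prescribe) is to drop entire feature maps rather than individual units. A minimal numpy sketch of inverted dropout with an optional channel-wise mode:

```python
import numpy as np

def dropout(x, p=0.5, rng=None, channelwise=False):
    """Inverted dropout on a feature tensor x of shape (C, H, W).
    If channelwise, one Bernoulli draw per channel zeroes the whole
    feature map; otherwise each unit is dropped independently.
    Surviving values are scaled by 1/(1-p) so E[output] == x."""
    rng = rng or np.random.default_rng()
    if channelwise:
        mask = rng.random((x.shape[0], 1, 1)) >= p
    else:
        mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)
```

With `channelwise=True` every kept channel is scaled by 2.0 (for p=0.5) and every dropped channel is all zeros, which preserves the expected activation at training time.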
5×5 conv → 3×3 conv + 3×3 fc: "we end up with a net (9+9)/25 = 18/25× reduction of computation, resulting in a relative gain of 28% by this factorization"
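The arithmetic behind that 28% figure can be checked directly: per output position (channel terms cancel when in/out channels match), a 5×5 filter costs 25 multiplies while two stacked 3×3 layers cost 9+9 = 18.

```python
# Cost of one 5x5 conv vs two stacked 3x3 layers, per output position,
# assuming equal input/output channel counts so those factors cancel.
cost_5x5 = 5 * 5            # 25 multiplies
cost_two_3x3 = 3 * 3 + 3 * 3  # 18 multiplies
ratio = cost_two_3x3 / cost_5x5  # 18/25 = 0.72
saving = 1 - ratio               # 0.28 -> the paper's 28% relative gain
print(ratio, saving)
```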
Instead, we argue that the auxiliary classifiers act as regularizers.
This is supported by the fact that the main classifier of the network performs better if the side branch is batch-normalized or has a dropout layer. This also gives weak supporting evidence for the conjecture that batch normalization acts as a regularizer.
In addition, gradient clipping was found to be useful to stabilize the training.
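A minimal sketch of one common form of gradient clipping, rescaling by the global L2 norm (the exact threshold and variant used in the paper are not given in these notes):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm
    does not exceed max_norm; leave them unchanged otherwise."""
    total = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads], total
```

For example, gradients [3.0] and [4.0] have global norm 5; clipping at max_norm=1.0 rescales them to [0.6] and [0.8].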
GoogLeNet characteristics: Much of the original gains of the GoogLeNet network arise from a very generous use of dimension reduction.
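The dimension-reduction savings can be sketched with a multiply count: a 1×1 conv first shrinks the channel dimension before the expensive 3×3 conv (the 256→64 channel sizes and 28×28 grid below are illustrative assumptions, not figures from the notes):

```python
# Multiplies for a direct 3x3 conv, 256 -> 256 channels,
# vs 1x1 reduction to 64 channels followed by a 3x3 conv back to 256
# (GoogLeNet-style dimension reduction; sizes chosen for illustration).
H = W = 28
direct = H * W * 3 * 3 * 256 * 256
reduced = H * W * (1 * 1 * 256 * 64 + 3 * 3 * 64 * 256)
print(reduced / direct)  # well under half the cost of the direct path
```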
2. image generation
In this paper, we propose a novel representation for actions by modeling an action as a transformation which changes the state of the environment before the action happens (precondition) to the state after the action (effect).