1 General Design Principles
[Avoid Representational Bottlenecks, Especially Early in the Network] The representation (feature map) size should decrease gently from the inputs to the outputs before reaching the final representation used for the task at hand; bottlenecks with extreme compression should be avoided. Dimensionality provides only a rough estimate of information content, since it discards important factors such as correlation structure.
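As a rough illustration of this principle, the sketch below computes the representation size (height × width × channels) at each stage of a network and flags stages where the size drops sharply. The layer shapes, the helper names `representation_size` and `find_bottlenecks`, and the threshold `max_ratio` are all hypothetical and chosen only for illustration; the text gives no numeric threshold. Raw size is itself only the rough proxy described above, not a measure of information content.

```python
def representation_size(shape):
    """Number of activations in a feature map given as (height, width, channels)."""
    height, width, channels = shape
    return height * width * channels


def find_bottlenecks(shapes, max_ratio=4.0):
    """Return the indices of stages whose size shrinks by more than max_ratio.

    max_ratio is an assumed, illustrative threshold, not a value from the text.
    """
    sizes = [representation_size(s) for s in shapes]
    return [
        i
        for i in range(1, len(sizes))
        if sizes[i - 1] / sizes[i] > max_ratio
    ]


# Hypothetical stacks of feature-map shapes, input first.
gentle = [(224, 224, 3), (112, 112, 32), (56, 56, 64), (28, 28, 128)]
abrupt = [(224, 224, 3), (112, 112, 32), (56, 56, 8), (28, 28, 128)]

print(find_bottlenecks(gentle))  # no stage shrinks by more than 4x
print(find_bottlenecks(abrupt))  # stage 2 shrinks 16x relative to stage 1
```

In the second stack, the third stage holds 56 × 56 × 8 = 25,088 activations against 401,408 in the stage before it, a 16-fold compression of the kind this principle advises against.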
[Higher Dimensional Representations are Easier to Process Locally Within a Network] Increasing the activations per tile in a convolutional network allows for more disentangled features. The resulting networks will train faster.
[Spatial Aggregation Can Be Done Over Lower Dimensional Embeddings Without Much or Any Loss in Representational Power] One can reduce the dimensio