I was never sure whether to choose concatenation or element-wise summation; this answer explains it well.
[from] https://www.quora.com/What-is-the-theory-behind-the-concatenation-vs-summation-of-2-tensors-in-deep-learning-How-does-this-empirically-relate-to-information-passed
What is the theory behind the concatenation vs. summation of 2 tensors in deep learning? How does this empirically relate to information passed?
Farshid Rayhan, studied at University of Manchester
The summation of tensors is the signature of ResNet; concatenation of tensors was first used in Inception nets and later in DenseNet.
The identity-mapping summation speeds up training and improves gradient flow, since the skip connections come from earlier conv operations. Backpropagation can therefore carry error corrections to earlier layers much more easily, which addresses the vanishing-gradient problem.
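Here is a minimal sketch of such a summation-based residual block, assuming PyTorch (simplified: no batch norm, and the input and output channel counts are assumed to match so the element-wise sum is valid):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """ResNet-style block: output = F(x) + x (element-wise summation)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        # The identity skip connection: gradients flow through both the
        # conv path and the shortcut, easing training of earlier layers.
        return self.relu(out + x)
```

Because the addition requires identical tensor shapes, summation keeps the channel count constant from block to block.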
Channel-wise concatenation was used in Inception nets to join the feature maps generated by different filter sizes, so the user doesn't have to commit to a single filter size. DenseNet concatenates the feature maps of previous layers in a similar spirit, so that the next layer can choose to work on the feature maps produced by the immediately preceding layer, or on those from conv operations further back; a sketch of both styles follows.
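A minimal sketch of both concatenation styles, again assuming PyTorch (the branch widths and growth rate are illustrative choices, not values from the papers):

```python
import torch
import torch.nn as nn

class InceptionBranches(nn.Module):
    """Inception-style: concatenate feature maps from different filter sizes."""
    def __init__(self, in_channels: int, out_per_branch: int):
        super().__init__()
        self.b1 = nn.Conv2d(in_channels, out_per_branch, kernel_size=1)
        self.b3 = nn.Conv2d(in_channels, out_per_branch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_channels, out_per_branch, kernel_size=5, padding=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel-wise concatenation (dim=1 is the channel dimension).
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)

class DenseLayer(nn.Module):
    """DenseNet-style: concatenate a layer's new features with its input."""
    def __init__(self, in_channels: int, growth_rate: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The input x (all earlier feature maps) is passed through untouched,
        # so later layers can read any earlier layer's features directly.
        return torch.cat([x, torch.relu(self.conv(x))], dim=1)
```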
The DenseNet procedure makes the model much wider, since channels are joined after each conv operation, while that doesn't happen in ResNet, which just sums the tensors. The DenseNet paper argued that summation can impede information and gradient flow, because it merges the values of the two paths rather than keeping them separate.
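To make the widening concrete, a small illustration (the initial width and growth rate here are hypothetical, chosen only for the arithmetic):

```python
# ResNet's summation keeps the width constant; DenseNet's concatenation
# grows it by the growth rate k at every layer.
c0, k, num_layers = 64, 32, 6

resnet_channels = [c0 for _ in range(num_layers + 1)]
densenet_channels = [c0 + i * k for i in range(num_layers + 1)]

print(resnet_channels)    # [64, 64, 64, 64, 64, 64, 64]
print(densenet_channels)  # [64, 96, 128, 160, 192, 224, 256]
```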
Both approaches are state of the art. I personally prefer channel concatenation, because I believe summing the tensors inevitably pollutes the feature maps of both the immediate conv operation and the source of the skip connection. Deep nets are by nature very strong learners, which is why the ResNet approach still works quite well. I also believe in DenseNet's strategy, where the goal is to build a smaller network that can do the equivalent of a very deep net like ResNet.