I was never sure whether to choose concatenation or element-wise summation; this answer explains it well.
[from] https://www.quora.com/What-is-the-theory-behind-the-concatenation-vs-summation-of-2-tensors-in-deep-learning-How-does-this-empirically-relate-to-information-passed
What is the theory behind the concatenation vs. summation of 2 tensors in deep learning? How does this empirically relate to information passed?
Farshid Rayhan, studied at University of Manchester
The summation of tensors is the signature of ResNet; concatenation of tensors was first used in Inception nets and later in DenseNet.
The identity-mapping summation speeds up training and improves gradient flow, since the skip connections come from earlier conv operations. Backpropagation can therefore carry error corrections to earlier layers much more easily, which addresses the vanishing-gradient problem.
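Here is a minimal sketch of such a summation-based residual block, assuming PyTorch (simplified: no batch norm, and the input and output channel counts are assumed to match so the element-wise sum is valid):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """ResNet-style block: output = F(x) + x (element-wise summation)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        # The identity skip connection: gradients flow through both the
        # conv path and the shortcut, easing training of earlier layers.
        return self.relu(out + x)
```

Because the addition requires identical tensor shapes, summation keeps the channel count constant from block to block.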
Channel-wise concatenation was used in Inception nets to join the feature maps generated by different filter sizes, so the user doesn't have to commit to a single filter size. DenseNet concatenates the feature maps of previous layers in a similar spirit, so that the next layer can choose to work on the feature maps produced by the immediately preceding layer, or on those from conv operations further back; a sketch of both styles follows.
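A minimal sketch of both concatenation styles, again assuming PyTorch (the branch widths and growth rate are illustrative choices, not values from the papers):

```python
import torch
import torch.nn as nn

class InceptionBranches(nn.Module):
    """Inception-style: concatenate feature maps from different filter sizes."""
    def __init__(self, in_channels: int, out_per_branch: int):
        super().__init__()
        self.b1 = nn.Conv2d(in_channels, out_per_branch, kernel_size=1)
        self.b3 = nn.Conv2d(in_channels, out_per_branch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_channels, out_per_branch, kernel_size=5, padding=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel-wise concatenation (dim=1 is the channel dimension).
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)

class DenseLayer(nn.Module):
    """DenseNet-style: concatenate a layer's new features with its input."""
    def __init__(self, in_channels: int, growth_rate: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The input x (all earlier feature maps) is passed through untouched,
        # so later layers can read any earlier layer's features directly.
        return torch.cat([x, torch.relu(self.conv(x))], dim=1)
```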
The DenseNet procedure makes the model much wider, since channels are joined after each conv operation, while that doesn't happen in ResNet, which just sums the tensors. The DenseNet paper argued that summation can impede information and gradient flow, because it merges the values of the two paths rather than keeping them separate.
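To make the widening concrete, a small illustration (the initial width and growth rate here are hypothetical, chosen only for the arithmetic):

```python
# ResNet's summation keeps the width constant; DenseNet's concatenation
# grows it by the growth rate k at every layer.
c0, k, num_layers = 64, 32, 6

resnet_channels = [c0 for _ in range(num_layers + 1)]
densenet_channels = [c0 + i * k for i in range(num_layers + 1)]

print(resnet_channels)    # [64, 64, 64, 64, 64, 64, 64]
print(densenet_channels)  # [64, 96, 128, 160, 192, 224, 256]
```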
Both approaches are state of the art. I personally prefer channel concatenation, because I believe summing the tensors inevitably pollutes the feature maps of both the immediate conv operation and the source of the skip connection. Deep nets are by nature very strong learners, which is why the ResNet approach still works quite well. I also believe in DenseNet's strategy, where the goal is to build a smaller network that can do the equivalent of a very deep net like ResNet.