Densely Connected Convolutional Networks (Xiao Chen Reads Papers series)

Link to this post: https://blog.csdn.net/qq_68308828/article/details/129805535

OK, I've finished reading it. Haha, just kidding.

in a feed-forward fashion

our network has L(L+1)/2 direct connections.
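
To convince myself of the L(L+1)/2 count, here is a minimal sketch that adds up the connections layer by layer and compares the total with the closed form. It assumes each layer receives the block input plus the outputs of all layers before it; the function names are my own, not from the paper.

```python
def dense_connections(num_layers: int) -> int:
    """Count direct connections when layer l receives l inputs:
    the block input plus the outputs of the l-1 layers before it."""
    return sum(l for l in range(1, num_layers + 1))


def closed_form(num_layers: int) -> int:
    """The L(L+1)/2 formula from the quote."""
    return num_layers * (num_layers + 1) // 2


for L in (1, 5, 100):
    print(L, dense_connections(L), closed_form(L))
```

A traditional L-layer network has only L connections, one between each layer and the next, so the gap grows quadratically with depth.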

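This is very nice: 1. it alleviates the vanishing-gradient problem; 2. it strengthens feature propagation;

3. it encourages feature reuse(?); 4. it greatly reduces the number of parameters.

They show the results. It isn't labeled CVPR here, but as soon as I saw the top universities and top institutions behind it, I knew this paper was no ordinary one.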

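Now it starts covering prior and current work.

Network depth shot up all at once.

Which raises a problem:

as the training input passes through so many layers, it can vanish and get "washed out" by the time it reaches the end, hahaha.

Then it goes over how others have tackled this problem,

and sums it up: they all add short paths that run from earlier layers to later ones.

So, unsurprisingly, next the authors pitch their own idea.

It reads very easily, which has a lot to do with that beautiful, elegant figure.

They even explain where the name comes from, and say their approach differs from the traditional ones.

Fewer parameters, and the network can preserve the information that's needed.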

A possibly counter-intuitive effect: that is, an effect that may go against intuition

as there is no need to relearn redundant feature-maps.

This part is saying that ResNet has some problems.

What problems?

1.many layers contribute very little and can in fact be randomly dropped during training.

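They contribute few features and can even be randomly dropped during training.

2. The parameter count is large, since every layer has its own weights.

Ours is very nice: first, it explicitly distinguishes information that is added to the network from information that is preserved.

It works very well: the parameters are not only few but also put to good use (parameter efficiency).

And then: easy to train and

My English is a bit weak, so I didn't quite follow this part, haha.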

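resurgence: a revival

So well written.

only scales: i.e. only works for

I really feel that the related work section shows the depth of a researcher's knowledge. It's so well written!

bypassing paths along with gating units: I don't get this one. Could some kind soul explain it to me?

are presumed to be the key factor: a writing expression worth picking up.

It means "are thought to be the key factor".

That said, this passage also acknowledges the contribution of Kaiming He.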

stochastic depth was proposed as a way to successfully train a 1202-layer ResNet [13].

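I think this could also be a direction worth exploring for large models in the future. The idea is surely feasible, though I certainly can't pull it off right now, hahaha: by dropping layers randomly during training.

This shows that not all layers may be needed and highlights that there is a great amount of redundancy in deep (residual) networks. There is so much information, an embarrassment of riches, that it ends up redundant, haha.

This then leads into part of the motivation for the work.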

"An orthogonal approach" and "generalized residual blocks":

these phrasings are worth learning; many of them are established conventions.

Praise first, then criticize, haha.

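Advantage: the gradient can flow directly through the identity function from later layers to earlier layers.

Drawback: the identity function and the output of H_ℓ are combined by summation, which may impede the information flow in the network (it blocks the flow of information?).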
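
A toy sketch of the difference, using plain Python lists where each element stands in for one feature-map channel (an assumption for illustration only; real feature maps are tensors, and the values here are made up). Summation mixes the input and the new output into the same channels, while concatenation, which is what DenseNet uses, keeps the earlier features untouched next to the new ones.

```python
def combine_by_summation(x, h_out):
    """ResNet-style: identity and H(x) are added channel by channel."""
    assert len(x) == len(h_out), "summation needs matching channel counts"
    return [a + b for a, b in zip(x, h_out)]


def combine_by_concatenation(x, h_out):
    """DenseNet-style: earlier features are kept and new ones appended."""
    return list(x) + list(h_out)


x = [1.0, 2.0, 3.0]        # hypothetical input channels
h_out = [0.5, -2.0, 1.0]   # hypothetical output of H for that input

summed = combine_by_summation(x, h_out)       # 3 channels, x is no longer separable
stacked = combine_by_concatenation(x, h_out)  # 6 channels, x is still readable
print(summed)
print(stacked)
```

After summation there is no way to tell which part of each value came from x; after concatenation the first three channels are exactly x.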

I feel like I'm reading fast, but I increasingly want to see the implementation code.

Three operations are treated as one:

BN, ReLU, conv
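
Since each BN-ReLU-conv unit takes everything before it in the block as input and adds k new feature maps (the paper calls k the growth rate), the input width of each layer is easy to tabulate. A small sketch, with made-up numbers and a function name of my own:

```python
def input_channels_per_layer(k0: int, growth_rate: int, num_layers: int) -> list:
    """Input feature-map count seen by each layer in one dense block.

    Layer l (1-indexed) sees the k0 block-input channels plus the
    growth_rate channels produced by each of the l-1 earlier layers.
    """
    return [k0 + growth_rate * (l - 1) for l in range(1, num_layers + 1)]


# Hypothetical block: 16 input channels, growth rate k = 12, 4 layers.
channels = input_channels_per_layer(k0=16, growth_rate=12, num_layers=4)
print(channels)                    # [16, 28, 40, 52]

# The block output is the last layer's input plus its own k new maps.
block_output = channels[-1] + 12
print(block_output)                # 64
```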

But

this runs into a problem: when the feature-map size changes, the concatenation operation is no longer viable.

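So the authors call the layers between blocks transition layers (which do convolution and pooling).

consist of a batch normalization layer and an 1×1 convolutional layer followed by a 2×2 average pooling layer. That is, they are made up of BN, a 1×1 convolution, and average pooling.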
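
To see what the pooling part of a transition layer does to the feature-map size, here is a minimal pure-Python sketch of 2×2 average pooling on a single channel. It assumes stride 2 and even height and width, and it leaves out the BN and 1×1 convolution entirely; the input values are made up.

```python
def avg_pool_2x2(fmap):
    """2x2 average pooling with stride 2 on one channel (a list of rows).
    Assumes the height and width are both even."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            (fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


fmap = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = avg_pool_2x2(fmap)
print(pooled)  # [[3.5, 5.5], [11.5, 13.5]]
```

The 4×4 map becomes 2×2, which is why concatenation only happens inside a block, where all feature maps share one spatial size.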

Here they introduce a bottleneck layer (a 1×1 convolution), which works really well

and thus to improve computational efficiency

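Separable convolution? I've forgotten it a bit. Anyway, the point is to cut computation by reducing dimensionality.

See: "What is Depthwise Separable conv" (blog post by 东东要拼命 on CSDN)

we let each 1×1 convolution produce 4k feature-maps. A bit startling.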
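
4k sounds like a lot, so here is a rough weight count to check that the bottleneck still pays off. This is a back-of-the-envelope sketch that counts convolution weights only (no biases, no BN parameters); the channel numbers are hypothetical and the function names are mine.

```python
def direct_3x3_weights(m: int, k: int) -> int:
    """3x3 convolution straight from m input maps to k output maps."""
    return 3 * 3 * m * k


def bottleneck_weights(m: int, k: int) -> int:
    """1x1 convolution from m maps to 4k maps, then 3x3 from 4k maps to k."""
    reduce_1x1 = 1 * 1 * m * (4 * k)
    conv_3x3 = 3 * 3 * (4 * k) * k
    return reduce_1x1 + conv_3x3


m, k = 1024, 32   # hypothetical: a late layer with many concatenated inputs
print(direct_3x3_weights(m, k))   # 294912
print(bottleneck_weights(m, k))   # 167936
```

Under this count the bottleneck wins whenever 9mk > 4mk + 36k², i.e. when m > 7.2k, so it helps most in the later layers of a block where many inputs have piled up.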

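Now setting the stage for the experiments.

DenseNet-B: the variant that uses the 1×1 convolution (bottleneck).

DenseNet-C: compression of 0.5, so about half the feature maps.

DenseNet-BC: uses both.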
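
The compression step is simple enough to write down: if a dense block ends with m feature maps, the transition layer outputs floor(θ·m) of them, with θ = 0.5 in the experiments. A minimal sketch, where the function name and the example channel counts are my own:

```python
import math


def transition_output_channels(m: int, theta: float = 0.5) -> int:
    """Number of feature maps a transition layer emits under compression.

    m is the channel count at the end of the dense block; theta is the
    compression factor, with 0 < theta <= 1 (theta = 1 means no compression).
    """
    if not 0.0 < theta <= 1.0:
        raise ValueError("theta must be in (0, 1]")
    return math.floor(theta * m)


print(transition_output_channels(64))        # 32
print(transition_output_channels(75))        # 37
print(transition_output_channels(64, 1.0))   # 64
```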

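I get it now, though I still went and looked things up.

Here is the source: "DenseNet: a CNN model better than ResNet" on Zhihu (zhihu.com)

The low parameter count here comes entirely from these lightweight design choices.

Very efficient. (An unrelated aside: why am I suddenly reading a 2016 CVPR paper? Because DenseNet and ResNet are both worth revisiting.)

Another big highlight: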

% error on C100+) as the 1001-layer pre-activation ResNet using 90% fewer parameters. Figure 4 (right panel) shows the training loss and test errors of these two networks on C10+. The 1001-layer deep ResNet converges to a lower training loss value but a similar test error.

Less prone to overfitting.

A big improvement.

The proposed 1×1 convolution and feature-map compression seem to work really well.

There is still a lot of room left for optimization (NAS).

Small change, big difference.

The layers within the second and third dense block consistently assign the least weight to the outputs of the transition layer (the top row of the triangles), indicating that the transition layer outputs many redundant features (with low weight on average). This is in keeping with the strong results of DenseNet-BC where exactly these outputs are compressed.

It's exactly the redundant information that gets compressed.