Miscellaneous paper reading notes

Not yet read or not yet summarized:

Secure Federated Transfer Learning

Federated Learning of Deep Networks using Model Averaging

CHIP: Channel-wise Disentangled Interpretation of Deep Convolutional Neural Networks

Batch Normalization is a Cause of Adversarial Vulnerability

[201905-arxiv] Object Discovery with a Copy-Pasting GAN

 

[2019-ICLR] RETHINKING THE VALUE OF NETWORK PRUNING

Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned “important” weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited “important” weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm.

When training from scratch, the computing cost spent on training the small model should ideally be close to that of the original model; in that case it is quite likely to achieve even better results.

Read against the next paper, training neural networks really does feel more mysterious than alchemy: different hyperparameters and initializations can lead to completely different conclusions.

 

[2019-ICLR-oral] THE LOTTERY TICKET HYPOTHESIS: FINDING SPARSE, TRAINABLE NEURAL NETWORKS

This paper is very interesting, and even more so when read side by side with RETHINKING THE VALUE OF NETWORK PRUNING. Normally, pruning an already-trained network yields a much smaller network with little loss in accuracy, but if that small architecture is re-initialized and retrained from scratch, the result is usually far worse than before. The authors make one simple change: the small network is initialized by copying the corresponding initial weights from the original large network, and this alone achieves higher accuracy.
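A rough PyTorch sketch of one prune-and-rewind round as I understand it; `prune_ratio`, the magnitude criterion and `train_fn` are illustrative names, not the paper's exact setup:

```python
import copy
import torch

def lottery_ticket_round(model, train_fn, prune_ratio=0.2, masks=None):
    """One prune-and-rewind round: train, prune by weight magnitude,
    then reset surviving weights to their ORIGINAL initial values."""
    init_state = copy.deepcopy(model.state_dict())     # keep the original initialization
    train_fn(model)  # train to convergence (keeping pruned weights frozen at zero is omitted here)

    new_masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:                             # skip biases / norm parameters
            continue
        prev = masks[name] if masks is not None else torch.ones_like(param)
        magnitude = (param.data * prev).abs()
        k = int(prune_ratio * int(prev.sum().item()))   # prune a fraction of the *remaining* weights
        if k > 0:
            thresh = magnitude[prev.bool()].kthvalue(k).values
            new_masks[name] = (magnitude > thresh).float() * prev
        else:
            new_masks[name] = prev

    # rewind: surviving weights go back to the original init, pruned ones stay at zero
    model.load_state_dict(init_state)
    for name, param in model.named_parameters():
        if name in new_masks:
            param.data.mul_(new_masks[name])
    return new_masks
```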

 

[201905-arxiv] Learning Sparse Networks Using Targeted Dropout

The main idea is to account for the eventual pruning already during training. Concretely, for example, rank the weights by magnitude (or some other property) and apply random dropout only to the lower-ranked weights.
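A minimal sketch of that idea, assuming PyTorch; `targ_ratio` and `drop_prob` are illustrative names, and standard dropout rescaling is omitted:

```python
import torch

def targeted_weight_dropout(weight, targ_ratio=0.5, drop_prob=0.5, training=True):
    """Randomly zero weights drawn only from the lowest-magnitude fraction
    (the eventual pruning candidates)."""
    if not training:
        return weight
    magnitude = weight.abs()
    k = int(targ_ratio * magnitude.numel())
    if k == 0:
        return weight
    thresh = magnitude.flatten().kthvalue(k).values
    candidates = (magnitude <= thresh).float()                 # bottom targ_ratio of the weights
    drop = (torch.rand_like(weight) < drop_prob).float()
    return weight * (1.0 - candidates * drop)                  # drop only among the candidates
```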

 

[2019-ICLR] Dynamic Channel Pruning: Feature Boosting and Suppression

Uses the SE (squeeze-and-excitation) idea to predict per-channel saliency, then, in a ReLU-like fashion, keeps only the top-k channels active.

What I don't quite understand is that a loss is also placed on the SE activations.
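A rough sketch of the gating as I read it: SE-style saliency from globally pooled features, then a top-k winner-take-all gate. The exact FBS formulation, including the extra loss mentioned above, may differ from this:

```python
import torch
import torch.nn as nn

class TopKChannelGate(nn.Module):
    """SE-style saliency predictor followed by a top-k winner-take-all gate."""
    def __init__(self, channels, k):
        super().__init__()
        self.k = k
        self.fc = nn.Linear(channels, channels)        # predicts per-channel saliency

    def forward(self, x):                              # x: (N, C, H, W)
        pooled = x.mean(dim=(2, 3))                    # global average pooling -> (N, C)
        saliency = torch.relu(self.fc(pooled))
        kth = saliency.topk(self.k, dim=1).values[:, -1:]   # k-th largest saliency per sample
        gate = saliency * (saliency >= kth).float()         # zero all but the top-k channels
        return x * gate.unsqueeze(-1).unsqueeze(-1)
```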

[201905-arxiv] EnsembleNet: End-to-End Optimization of Multi-headed Models [paper]

I don't fully understand this paper yet, and I'm not sure where its main novelty lies.

 

There is an interesting observation about the loss:

One might expect that an optimal lambda should be somewhere between 0 and 1. However, for most of the strongly performing networks we experimented with on Youtube-8M and ImageNet, the optimal values for lambda are negative! Basically if we decrease lambda from 1 all the way to some negative number (e.g., lambda -1.5), the performance of the resulting model decreases on the train set and increases on the holdout set, shrinking the gap between them.
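The exact loss isn't reproduced in my notes; purely to illustrate the lambda trade-off in the quote, one plausible form is the mix below (this specific form is my assumption, not necessarily the paper's):

```python
import torch
import torch.nn.functional as F

def ensemble_loss(head_logits, target, lam):
    """Assumed form only: `lam` trades off the loss on the averaged (ensembled)
    prediction against the mean of the per-head losses; the quote above notes
    that the best `lam` can even be negative."""
    per_head = torch.stack([F.cross_entropy(l, target) for l in head_logits]).mean()
    ensembled = F.cross_entropy(torch.stack(head_logits).mean(dim=0), target)  # logits averaged for simplicity
    return lam * ensembled + (1.0 - lam) * per_head
```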

[2018-NIPS] Collaborative Learning for Deep Neural Networks

We introduce collaborative learning in which multiple classifier heads of the same network are simultaneously trained on the same training data to improve generalization and robustness to label noise with no extra inference cost.

Multiple heads are added, with a consensus loss added to the objective. Training uses all heads; inference uses a single head.

The part I learned something new from is the backpropagation rescaling.
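A hedged sketch of the setup as summarized above; the consensus term (here a KL toward the averaged soft prediction) and the 1/H gradient rescaling into the shared backbone are my approximations of the paper's choices, with `T` and `beta` as assumed hyperparameters:

```python
import torch
import torch.nn.functional as F

class GradScale(torch.autograd.Function):
    """Identity in the forward pass; scales the gradient flowing back into
    the shared backbone (my reading of 'backpropagation rescaling')."""
    @staticmethod
    def forward(ctx, x, scale):
        ctx.scale = scale
        return x

    @staticmethod
    def backward(ctx, grad):
        return grad * ctx.scale, None

def collaborative_loss(backbone_feat, heads, target, T=2.0, beta=0.5):
    """Per-head hard-label loss plus a consensus term pulling each head toward
    the averaged soft prediction."""
    feat = GradScale.apply(backbone_feat, 1.0 / len(heads))
    logits = [h(feat) for h in heads]
    consensus = torch.stack([F.softmax(l / T, dim=1) for l in logits]).mean(0).detach()
    loss = 0.0
    for l in logits:
        loss = loss + F.cross_entropy(l, target)
        loss = loss + beta * F.kl_div(F.log_softmax(l / T, dim=1), consensus,
                                      reduction="batchmean")
    return loss / len(logits)
```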

 

[201905-arxiv] Multi-Sample Dropout for Accelerated Training and Better Generalization [paper]

On top of standard dropout, the authors propose a very simple regularization method (see the figure in the paper). Of course, given how poor CNN interpretability currently is, the authors offer their own explanation. My one doubt concerns the claimed training acceleration, since this design can also be seen as an indirect way of raising the effective learning rate.
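A minimal sketch of the method: several independent dropout masks over the same pooled feature, a shared classifier, and the losses averaged (module names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiSampleDropoutHead(nn.Module):
    """Apply dropout M times to the same pooled feature, run the shared
    classifier on every sample, and average the M losses."""
    def __init__(self, in_dim, num_classes, num_samples=8, p=0.5):
        super().__init__()
        self.num_samples = num_samples
        self.dropout = nn.Dropout(p)
        self.fc = nn.Linear(in_dim, num_classes)       # weights shared across all samples

    def forward(self, feat, target=None):
        logits = [self.fc(self.dropout(feat)) for _ in range(self.num_samples)]
        if target is None:                             # inference: dropout is a no-op in eval mode
            return torch.stack(logits).mean(0)
        return torch.stack([F.cross_entropy(l, target) for l in logits]).mean()
```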

[2019-ICLR] FIXUP INITIALIZATION: RESIDUAL LEARNING WITHOUT NORMALIZATION

I should read more ICLR/NIPS papers from now on; the way they frame and understand problems really is different.

I've read this several times and still can't see the connection between Section 2 and Section 3; apparently I've handed all my math back to my teachers. The key sentence is the one below: apart from the variance, I can't see which variable in the formula actually depends on the network depth. My level is just too low.

Our analysis in the previous section points out the failure mode of standard initializations for training deep residual network: the gradient norm of certain layers is in expectation lower bounded by a quantity that increases indefinitely with the network depth.
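The recipe itself is easier to state than the analysis; from memory it is roughly: zero-init the classification layer and the last layer of each residual branch, scale the other branch layers by L^{-1/(2m-2)}, and add scalar biases/multipliers. A rough sketch of just the weight part, assuming plain PyTorch modules (the scalar bias/multiplier parameters are not shown):

```python
import torch.nn as nn

def fixup_init(residual_branches, classifier, num_branches):
    """residual_branches: list of lists of nn.Conv2d/nn.Linear layers, one list
    per residual branch; num_branches plays the role of L in the paper."""
    nn.init.zeros_(classifier.weight)                  # zero-init the classification layer
    if classifier.bias is not None:
        nn.init.zeros_(classifier.bias)
    for branch in residual_branches:
        m = len(branch)
        scale = num_branches ** (-1.0 / (2 * m - 2)) if m > 1 else 1.0
        for i, layer in enumerate(branch):
            if i == m - 1:
                nn.init.zeros_(layer.weight)           # zero-init the last layer of each branch
            else:
                nn.init.kaiming_normal_(layer.weight)
                layer.weight.data.mul_(scale)          # rescale the remaining branch layers
```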

 

[2018-NIPS]DropBlock: A regularization method for convolutional networks [paper]

The dropout family also includes DropConnect, maxout, variational dropout, DropPath, zoneout, cutout, spatial dropout, and ShakeDrop.

Stochastic depth: still need to read this one.

Performance: On ImageNet classification, ResNet-50 architecture with DropBlock achieves 78.13% accuracy, which is more than 1.6% improvement on the baseline. On COCO detection, DropBlock improves Average Precision of RetinaNet from 36.8% to 38.4%

Motivation: Although dropout is widely used as a regularization technique for fully connected layers, it is often less effective for convolutional layers. This lack of success of dropout for convolutional layers is perhaps due to the fact that activation units in convolutional layers are spatially correlated so information can still flow through convolutional networks despite dropout

DropBlock has two main hyperparameters: gamma, which is similar to a drop ratio, and block_size, which controls the size of the dropped rectangle (rough sketch below).

I would have liked to see the effect separately on large vs. small objects, but the authors do not report it.
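A simplified sketch of the two-parameter mechanism: seed a Bernoulli mask with rate gamma, grow each seed into a block_size x block_size square via max pooling, then invert and renormalize. The paper's edge handling and its conversion from the desired drop probability to gamma are omitted here:

```python
import torch
import torch.nn.functional as F

def dropblock(x, gamma=0.05, block_size=7, training=True):
    """x: (N, C, H, W). Drops contiguous block_size x block_size squares."""
    if not training or gamma == 0.0:
        return x
    seeds = (torch.rand_like(x) < gamma).float()                  # Bernoulli seed points
    block_mask = F.max_pool2d(seeds, kernel_size=block_size,
                              stride=1, padding=block_size // 2)  # grow seeds into blocks
    keep = 1.0 - block_mask                                       # 1 = keep, 0 = dropped
    keep = keep * (keep.numel() / keep.sum().clamp(min=1.0))      # renormalize like dropout
    return x * keep
```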

 

[2018-NIPS-oral] How Does Batch Normalization Help Optimization? [paper]

Covered in a separate post (link).

[2016-CVPR] Learning Deep Features for Discriminative Localization [paper]

Requires modifying the network: the final fully connected layers are replaced with global average pooling followed by a single FC layer.
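A minimal sketch of how the class activation map is then computed (variable names are illustrative):

```python
import torch

def class_activation_map(feature_maps, fc_weight, class_idx):
    """feature_maps: (C, H, W) output of the last conv layer.
    fc_weight: (num_classes, C) weight of the single FC that follows GAP."""
    w = fc_weight[class_idx]                                # (C,) weights of the target class
    cam = torch.einsum('c,chw->hw', w, feature_maps)        # weighted sum over channels
    cam = cam - cam.min()
    return cam / (cam.max() + 1e-8)                         # normalize to [0, 1] for visualization
```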

[2017-ICCV] Grad-CAM: Visual explanations from deep networks via gradient-based localization [paper]

Two changes relative to CAM: the weights alpha (playing the role of the w above) are now obtained from the gradients, and a ReLU is applied before producing the saliency map.

 

It also introduces Guided Grad-CAM; I don't quite follow the authors' explanation. My understanding is that the gradient of the class activation is backpropagated all the way to the input image, and that gradient map is multiplied element-wise with the saliency map.
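A sketch of both steps as I read them: alpha as the globally averaged gradient of the class score w.r.t. the feature maps, a ReLU on the weighted sum, and, for Guided Grad-CAM, a pointwise product with the guided-backprop gradient image (the guided-backprop pass itself is not shown):

```python
import torch

def grad_cam(feature_maps, class_score):
    """feature_maps: (1, C, H, W) activations of the chosen conv layer, still part
    of the autograd graph; class_score: the scalar logit of the target class."""
    grads, = torch.autograd.grad(class_score, feature_maps, retain_graph=True)
    alpha = grads.mean(dim=(2, 3), keepdim=True)            # GAP of the gradients -> channel weights
    cam = torch.relu((alpha * feature_maps).sum(dim=1))     # ReLU before the saliency map
    return cam / (cam.max() + 1e-8)                         # (1, H, W)

def guided_grad_cam(cam_upsampled, guided_backprop_grad):
    """Pointwise product of the (upsampled) Grad-CAM map with the
    guided-backprop gradient image."""
    return cam_upsampled * guided_backprop_grad
```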

[2018-CVPR, not yet understood] Interpretable Convolutional Neural Networks [paper]

Idea: add a loss to each filter so that every filter exclusively corresponds to one distinct object part or object.

Forward pass:

For a given filter, convolving the input feature map and applying ReLU yields an n*n response map f. The location with the highest activation is taken to be where the part captured by this filter sits, and activations everywhere else are suppressed. (Is this reasonable when there are multiple similar parts in the image?)

The suppression is implemented with a set of templates designed by the authors that are applied to f; the distance they use here is the L1 distance.
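A very rough sketch of that forward-pass suppression as I read it; the template shape, the `tau` parameter and the way the L1 distance enters are my guesses, not the paper's exact definition:

```python
import torch

def mask_filter_response(f, tau=0.5):
    """f: (H, W) post-ReLU response of one filter. Keep the region around the
    strongest activation and damp everything else with an L1-distance template."""
    H, W = f.shape
    idx = int(torch.argmax(f).item())
    mu_y, mu_x = idx // W, idx % W                               # location of the peak activation
    ys = torch.arange(H, dtype=f.dtype).view(H, 1)
    xs = torch.arange(W, dtype=f.dtype).view(1, W)
    l1_dist = (ys - mu_y).abs() + (xs - mu_x).abs()              # L1 distance to the peak
    template = torch.clamp(1.0 - tau * l1_dist / max(H, W), min=0.0)   # assumed template shape
    return f * template                                          # suppress off-peak activations
```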

Backward pass:

our loss pushes filter f to represent a specific object part of the category c and keep silent on images of other categories

First, one has to decide which category f corresponds to.

The loss the authors define looks, at first glance, like an accumulated sum of KL-divergence terms; my math is not good enough to fully understand it.

 
