Miscellaneous paper reading notes

Not yet read or not yet summarized:

Secure Federated Transfer Learning

Federated Learning of Deep Networks using Model Averaging

CHIP: Channel-wise Disentangled Interpretation of Deep Convolutional Neural Networks

Batch Normalization is a Cause of Adversarial Vulnerability

[201905-arxiv] Object Discovery with a Copy-Pasting GAN

 

[2019-ICLR] RETHINKING THE VALUE OF NETWORK PRUNING

Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned “important” weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited “important” weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm.

When training from scratch, the computing cost spent on training the small model should ideally be close to that of the original model; in that case it is quite likely to achieve even better results.

Read against the next paper, training neural networks really does feel more mysterious than alchemy: different hyperparameters and initializations can lead to completely different conclusions.

 

[2019-ICLR-oral] THE LOTTERY TICKET HYPOTHESIS: FINDING SPARSE, TRAINABLE NEURAL NETWORKS

This paper is very interesting, and even more so when read side by side with RETHINKING THE VALUE OF NETWORK PRUNING. Normally, pruning an already-trained network yields a much smaller network with little loss in accuracy, but if that small architecture is re-initialized and retrained from scratch, the result is usually far worse than before. The authors make one simple change: the small network is initialized by copying the corresponding initial weights from the original large network, and this alone achieves higher accuracy.
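A rough PyTorch sketch of one prune-and-rewind round as I understand it; `prune_ratio`, the magnitude criterion and `train_fn` are illustrative names, not the paper's exact setup:

```python
import copy
import torch

def lottery_ticket_round(model, train_fn, prune_ratio=0.2, masks=None):
    """One prune-and-rewind round: train, prune by weight magnitude,
    then reset surviving weights to their ORIGINAL initial values."""
    init_state = copy.deepcopy(model.state_dict())     # keep the original initialization
    train_fn(model)  # train to convergence (keeping pruned weights frozen at zero is omitted here)

    new_masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:                             # skip biases / norm parameters
            continue
        prev = masks[name] if masks is not None else torch.ones_like(param)
        magnitude = (param.data * prev).abs()
        k = int(prune_ratio * int(prev.sum().item()))   # prune a fraction of the *remaining* weights
        if k > 0:
            thresh = magnitude[prev.bool()].kthvalue(k).values
            new_masks[name] = (magnitude > thresh).float() * prev
        else:
            new_masks[name] = prev

    # rewind: surviving weights go back to the original init, pruned ones stay at zero
    model.load_state_dict(init_state)
    for name, param in model.named_parameters():
        if name in new_masks:
            param.data.mul_(new_masks[name])
    return new_masks
```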

 

[201905-arxiv] Learning Sparse Networks Using Targeted Dropout

The main idea is to account for the eventual pruning already during training. Concretely, for example, rank the weights by magnitude (or some other property) and apply random dropout only to the lower-ranked weights.
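A minimal sketch of that idea, assuming PyTorch; `targ_ratio` and `drop_prob` are illustrative names, and standard dropout rescaling is omitted:

```python
import torch

def targeted_weight_dropout(weight, targ_ratio=0.5, drop_prob=0.5, training=True):
    """Randomly zero weights drawn only from the lowest-magnitude fraction
    (the eventual pruning candidates)."""
    if not training:
        return weight
    magnitude = weight.abs()
    k = int(targ_ratio * magnitude.numel())
    if k == 0:
        return weight
    thresh = magnitude.flatten().kthvalue(k).values
    candidates = (magnitude <= thresh).float()                 # bottom targ_ratio of the weights
    drop = (torch.rand_like(weight) < drop_prob).float()
    return weight * (1.0 - candidates * drop)                  # drop only among the candidates
```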

 

[2019-ICLR] Dynamic Channel Pruning: Feature Boosting and Suppression

Uses the SE (squeeze-and-excitation) idea to predict per-channel saliency, then, in a ReLU-like fashion, keeps only the top-k channels active.

What I don't quite understand is that a loss is also placed on the SE activations.
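A rough sketch of the gating as I read it: SE-style saliency from globally pooled features, then a top-k winner-take-all gate. The exact FBS formulation, including the extra loss mentioned above, may differ from this:

```python
import torch
import torch.nn as nn

class TopKChannelGate(nn.Module):
    """SE-style saliency predictor followed by a top-k winner-take-all gate."""
    def __init__(self, channels, k):
        super().__init__()
        self.k = k
        self.fc = nn.Linear(channels, channels)        # predicts per-channel saliency

    def forward(self, x):                              # x: (N, C, H, W)
        pooled = x.mean(dim=(2, 3))                    # global average pooling -> (N, C)
        saliency = torch.relu(self.fc(pooled))
        kth = saliency.topk(self.k, dim=1).values[:, -1:]   # k-th largest saliency per sample
        gate = saliency * (saliency >= kth).float()         # zero all but the top-k channels
        return x * gate.unsqueeze(-1).unsqueeze(-1)
```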

[201905-arxiv] EnsembleNet: End-to-End Optimization of Multi-headed Models [paper]

I don't fully understand this paper yet, and I'm not sure where its main novelty lies.

 

There is an interesting observation about the loss:

One might expect that an optimal lambda should be somewhere between 0 and 1. However, for most of the strongly performing networks we experimented with on Youtube-8M and ImageNet, the optimal values for lambda are negative! Basically if we decrease lambda from 1 all the way to some negative number (e.g., lambda -1.5), the performance of the resulting model decreases on the train set and increases on the holdout set, shrinking the gap between them.
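The exact loss isn't reproduced in my notes; purely to illustrate the lambda trade-off in the quote, one plausible form is the mix below (this specific form is my assumption, not necessarily the paper's):

```python
import torch
import torch.nn.functional as F

def ensemble_loss(head_logits, target, lam):
    """Assumed form only: `lam` trades off the loss on the averaged (ensembled)
    prediction against the mean of the per-head losses; the quote above notes
    that the best `lam` can even be negative."""
    per_head = torch.stack([F.cross_entropy(l, target) for l in head_logits]).mean()
    ensembled = F.cross_entropy(torch.stack(head_logits).mean(dim=0), target)  # logits averaged for simplicity
    return lam * ensembled + (1.0 - lam) * per_head
```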

[2018-NIPS] Collaborative Learning for Deep Neural Networks

We introduce collaborative learning in which multiple classifier heads of the same network are simultaneously trained on the same training data to improve generalization and robustness to label noise with no extra inference cost.

Multiple heads are added, with a consensus loss added to the objective. Training uses all heads; inference uses a single head.

The part I learned something new from is the backpropagation rescaling.
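A hedged sketch of the setup as summarized above; the consensus term (here a KL toward the averaged soft prediction) and the 1/H gradient rescaling into the shared backbone are my approximations of the paper's choices, with `T` and `beta` as assumed hyperparameters:

```python
import torch
import torch.nn.functional as F

class GradScale(torch.autograd.Function):
    """Identity in the forward pass; scales the gradient flowing back into
    the shared backbone (my reading of 'backpropagation rescaling')."""
    @staticmethod
    def forward(ctx, x, scale):
        ctx.scale = scale
        return x

    @staticmethod
    def backward(ctx, grad):
        return grad * ctx.scale, None

def collaborative_loss(backbone_feat, heads, target, T=2.0, beta=0.5):
    """Per-head hard-label loss plus a consensus term pulling each head toward
    the averaged soft prediction."""
    feat = GradScale.apply(backbone_feat, 1.0 / len(heads))
    logits = [h(feat) for h in heads]
    consensus = torch.stack([F.softmax(l / T, dim=1) for l in logits]).mean(0).detach()
    loss = 0.0
    for l in logits:
        loss = loss + F.cross_entropy(l, target)
        loss = loss + beta * F.kl_div(F.log_softmax(l / T, dim=1), consensus,
                                      reduction="batchmean")
    return loss / len(logits)
```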

 

[201905-arxiv] Multi-Sample Dropout for Accelerated Training and Better Generalization [paper]

On top of standard dropout, the authors propose a very simple regularization method (see the figure in the paper). Of course, given how poor CNN interpretability currently is, the authors offer their own explanation. My one doubt concerns the claimed training acceleration, since this design can also be seen as an indirect way of raising the effective learning rate.
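A minimal sketch of the method: several independent dropout masks over the same pooled feature, a shared classifier, and the losses averaged (module names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiSampleDropoutHead(nn.Module):
    """Apply dropout M times to the same pooled feature, run the shared
    classifier on every sample, and average the M losses."""
    def __init__(self, in_dim, num_classes, num_samples=8, p=0.5):
        super().__init__()
        self.num_samples = num_samples
        self.dropout = nn.Dropout(p)
        self.fc = nn.Linear(in_dim, num_classes)       # weights shared across all samples

    def forward(self, feat, target=None):
        logits = [self.fc(self.dropout(feat)) for _ in range(self.num_samples)]
        if target is None:                             # inference: dropout is a no-op in eval mode
            return torch.stack(logits).mean(0)
        return torch.stack([F.cross_entropy(l, target) for l in logits]).mean()
```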

[2019-ICLR] FIXUP INITIALIZATION: RESIDUAL LEARNING WITHOUT NORMALIZATION

I should read more ICLR/NIPS papers from now on; the way they frame and understand problems really is different.

I've read this several times and still can't see the connection between Section 2 and Section 3; apparently I've handed all my math back to my teachers. The key sentence is the one below: apart from the variance, I can't see which variable in the formula actually depends on the network depth. My level is just too low.

Our analysis in the previous section points out the failure mode of standard initializations for training deep residual network: the gradient norm of certain layers is in expectation lower bounded by a quantity that increases indefinitely with the network depth.
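The recipe itself is easier to state than the analysis; from memory it is roughly: zero-init the classification layer and the last layer of each residual branch, scale the other branch layers by L^{-1/(2m-2)}, and add scalar biases/multipliers. A rough sketch of just the weight part, assuming plain PyTorch modules (the scalar bias/multiplier parameters are not shown):

```python
import torch.nn as nn

def fixup_init(residual_branches, classifier, num_branches):
    """residual_branches: list of lists of nn.Conv2d/nn.Linear layers, one list
    per residual branch; num_branches plays the role of L in the paper."""
    nn.init.zeros_(classifier.weight)                  # zero-init the classification layer
    if classifier.bias is not None:
        nn.init.zeros_(classifier.bias)
    for branch in residual_branches:
        m = len(branch)
        scale = num_branches ** (-1.0 / (2 * m - 2)) if m > 1 else 1.0
        for i, layer in enumerate(branch):
            if i == m - 1:
                nn.init.zeros_(layer.weight)           # zero-init the last layer of each branch
            else:
                nn.init.kaiming_normal_(layer.weight)
                layer.weight.data.mul_(scale)          # rescale the remaining branch layers
```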

 

[2018-NIPS]DropBlock: A regularization method for convolutional networks [paper]

The dropout family also includes DropConnect, maxout, variational dropout, DropPath, zoneout, cutout, spatial dropout, and ShakeDrop.

Stochastic depth: still need to read this one.

Performance: On ImageNet classification, ResNet-50 architecture with DropBlock achieves 78.13% accuracy, which is more than 1.6% improvement on the baseline. On COCO detection, DropBlock improves Average Precision of RetinaNet from 36.8% to 38.4%

Motivation: Although dropout is widely used as a regularization technique for fully connected layers, it is often less effective for convolutional layers. This lack of success of dropout for convolutional layers is perhaps due to the fact that activation units in convolutional layers are spatially correlated so information can still flow through convolutional networks despite dropout

DropBlock has two main hyperparameters: gamma, which is similar to a drop ratio, and block_size, which controls the size of the dropped rectangle (rough sketch below).

I would have liked to see the effect separately on large vs. small objects, but the authors do not report it.
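A simplified sketch of the two-parameter mechanism: seed a Bernoulli mask with rate gamma, grow each seed into a block_size x block_size square via max pooling, then invert and renormalize. The paper's edge handling and its conversion from the desired drop probability to gamma are omitted here:

```python
import torch
import torch.nn.functional as F

def dropblock(x, gamma=0.05, block_size=7, training=True):
    """x: (N, C, H, W). Drops contiguous block_size x block_size squares."""
    if not training or gamma == 0.0:
        return x
    seeds = (torch.rand_like(x) < gamma).float()                  # Bernoulli seed points
    block_mask = F.max_pool2d(seeds, kernel_size=block_size,
                              stride=1, padding=block_size // 2)  # grow seeds into blocks
    keep = 1.0 - block_mask                                       # 1 = keep, 0 = dropped
    keep = keep * (keep.numel() / keep.sum().clamp(min=1.0))      # renormalize like dropout
    return x * keep
```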

 

[2018-NIPS-oral] How Does Batch Normalization Help Optimization? [paper]

Covered in a separate post (link).

[2016-CVPR] Learning Deep Features for Discriminative Localization [paper]

Requires modifying the network: the final fully connected layers are replaced with global average pooling followed by a single FC layer.
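A minimal sketch of how the class activation map is then computed (variable names are illustrative):

```python
import torch

def class_activation_map(feature_maps, fc_weight, class_idx):
    """feature_maps: (C, H, W) output of the last conv layer.
    fc_weight: (num_classes, C) weight of the single FC that follows GAP."""
    w = fc_weight[class_idx]                                # (C,) weights of the target class
    cam = torch.einsum('c,chw->hw', w, feature_maps)        # weighted sum over channels
    cam = cam - cam.min()
    return cam / (cam.max() + 1e-8)                         # normalize to [0, 1] for visualization
```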

[2017-ICCV] Grad-CAM: Visual explanations from deep networks via gradient-based localization [paper]

Two changes relative to CAM: the weights alpha (playing the role of the w above) are now obtained from the gradients, and a ReLU is applied before producing the saliency map.

 

It also introduces Guided Grad-CAM; I don't quite follow the authors' explanation. My understanding is that the gradient of the class activation is backpropagated all the way to the input image, and that gradient map is multiplied element-wise with the saliency map.
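A sketch of both steps as I read them: alpha as the globally averaged gradient of the class score w.r.t. the feature maps, a ReLU on the weighted sum, and, for Guided Grad-CAM, a pointwise product with the guided-backprop gradient image (the guided-backprop pass itself is not shown):

```python
import torch

def grad_cam(feature_maps, class_score):
    """feature_maps: (1, C, H, W) activations of the chosen conv layer, still part
    of the autograd graph; class_score: the scalar logit of the target class."""
    grads, = torch.autograd.grad(class_score, feature_maps, retain_graph=True)
    alpha = grads.mean(dim=(2, 3), keepdim=True)            # GAP of the gradients -> channel weights
    cam = torch.relu((alpha * feature_maps).sum(dim=1))     # ReLU before the saliency map
    return cam / (cam.max() + 1e-8)                         # (1, H, W)

def guided_grad_cam(cam_upsampled, guided_backprop_grad):
    """Pointwise product of the (upsampled) Grad-CAM map with the
    guided-backprop gradient image."""
    return cam_upsampled * guided_backprop_grad
```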

[2018-CVPR, not yet understood] Interpretable Convolutional Neural Networks [paper]

Idea: add a loss to each filter so that every filter exclusively corresponds to one distinct object part or object.

Forward pass:

For a given filter, convolving the input feature map and applying ReLU yields an n*n response map f. The location with the highest activation is taken to be where the part captured by this filter sits, and activations everywhere else are suppressed. (Is this reasonable when there are multiple similar parts in the image?)

The suppression is implemented with a set of templates designed by the authors that are applied to f; the distance they use here is the L1 distance.
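A very rough sketch of that forward-pass suppression as I read it; the template shape, the `tau` parameter and the way the L1 distance enters are my guesses, not the paper's exact definition:

```python
import torch

def mask_filter_response(f, tau=0.5):
    """f: (H, W) post-ReLU response of one filter. Keep the region around the
    strongest activation and damp everything else with an L1-distance template."""
    H, W = f.shape
    idx = int(torch.argmax(f).item())
    mu_y, mu_x = idx // W, idx % W                               # location of the peak activation
    ys = torch.arange(H, dtype=f.dtype).view(H, 1)
    xs = torch.arange(W, dtype=f.dtype).view(1, W)
    l1_dist = (ys - mu_y).abs() + (xs - mu_x).abs()              # L1 distance to the peak
    template = torch.clamp(1.0 - tau * l1_dist / max(H, W), min=0.0)   # assumed template shape
    return f * template                                          # suppress off-peak activations
```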

Backward pass:

our loss pushes filter f to represent a specific object part of the category c and keep silent on images of other categories

First, one has to decide which category f corresponds to.

The loss the authors define looks, at first glance, like an accumulated sum of KL-divergence terms; my math is not good enough to fully understand it.

 
