Decoupled Networks (DCNet) Reading Notes

1. Motivation: CNN-learned features are naturally decoupled, with the norm of the features corresponding to the intra-class variation and the angle corresponding to the semantic difference. Computing the similarity of two samples as the dot product of their CNN features can therefore be decomposed: the product of the feature norms measures the intra-class variation, while the angle between the feature vectors measures the semantic (inter-class) difference.
1.1 Understanding how convolution naturally leads to discriminative representations and good generalization.
1.2 The original CNNs make a strong assumption that the intra-class variation can be linearly modeled via the multiplication of norms and the semantic difference is described by the cosine of the angle. However, this modeling approach is not necessarily optimal for all tasks.
1.3 The inner product $\langle w, x \rangle = w^T x$ couples the semantic difference and the intra-class variation into one unified measure. This raises a problem: when the inner product is large, we cannot tell whether it is because the angle is small (the two samples belong to the same class) or because the norms are large (the two samples may not belong to the same class). We therefore need to decouple the norm and the angle, i.e., rewrite the basic operation $\langle w, x \rangle = w^T x = \|w\|_2 \|x\|_2 \cos(\theta_{(w,x)})$.
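To make the coupling issue concrete, here is a minimal numpy sketch (illustrative code of mine, not from the paper) showing two vector pairs with the same inner product but very different norm/angle decompositions:

```python
import numpy as np

def decompose(w, x):
    """Split <w, x> into the norm part ||w||*||x|| and the angular part cos(theta)."""
    inner = float(np.dot(w, x))
    norm_part = np.linalg.norm(w) * np.linalg.norm(x)
    return inner, norm_part, inner / norm_part

# Both pairs have inner product 4.0, but for different reasons:
print(decompose(np.array([2.0, 0.0]), np.array([2.0, 0.0])))  # angle 0, norm product 4.0
print(decompose(np.array([8.0, 0.0]), np.array([0.5, 0.5])))  # angle 45 deg, norm product ~5.66
```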

2. Contribution: We propose a generic decoupled learning framework which models the intra-class variation and the semantic difference independently.
2.1 Decoupling the norm and the angle in the inner product can better model the intra-class variation and the semantic difference in deep networks.
2.2 We propose two different types of decoupled convolution operators: bounded operators and unbounded operators.
The bounded operators may yield faster convergence and better robustness against adversarial attacks.
The unbounded operators may have better representational power.
2.3 We introduce an operator radius for the decoupled operators, which makes the decoupled operators learnable.

3. Advantages:
3.1 Besides hand-designing the decoupling functions, these functions can also be learned from data rather than kept fixed;
3.2 With bounded (saturating) magnitude functions, DCNets converge faster and achieve better results than conventional CNNs;
3.3 The bounded functions can be used to constrain the feature space of each class, yielding stronger robustness;
3.4 The proposed decoupled operators can be conveniently plugged into existing deep learning architectures;

4. Related Works
4.1 Improving the discriminativeness of learned CNN features:
Deep Hyperspherical Learning (NIPS 2017) only cares about the semantic difference and aims to compress the intra-class variation into a space that is as small as possible, while the decoupled framework focuses on both and provides the flexibility to design or learn both the magnitude function and the angular function.

5. The decoupled operator generalizes the original convolution: $f(w, x) = \langle w, x \rangle = w^T x = \|w\|_2 \|x\|_2 \cos(\theta_{(w,x)}) \;\longrightarrow\; f_d(w, x) = h(\|w\|_2, \|x\|_2) \cdot g(\theta_{(w,x)})$

Next, we first consider decoupled operators without the norm of the weights, i.e., $\|w\|$ is not included in $h(\cdot)$.
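A minimal numpy sketch of this template (my own illustrative code, not the authors' implementation): the magnitude function `h` and the angular function `g` are passed in as callables, so the concrete operators below are just different choices of `h` and `g`.

```python
import numpy as np

def decoupled_op(w, x, h, g):
    """Generic decoupled operator: f_d(w, x) = h(||w||, ||x||) * g(theta(w, x))."""
    w_norm = np.linalg.norm(w)
    x_norm = np.linalg.norm(x)
    cos_theta = np.dot(w, x) / (w_norm * x_norm + 1e-12)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    return h(w_norm, x_norm) * g(theta)
```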

6. Bounded Decoupled Operators
6.1 Hyperspherical Convolution (SphereConv). Let $h(\|w\|, \|x\|) = \alpha$; then $f_d(w, x) = \alpha \cdot g(\theta_{(w,x)})$. This can be viewed as projecting $w$ and $x$ onto a hypersphere and then performing the inner product, which improves the conditioning of the optimization problem and helps the network converge (all three bounded magnitude functions are sketched in code after this list).
6.2 Hyperball Convolution (BallConv). Let $h(\|w\|, \|x\|) = \alpha \min(\|x\|, \rho) / \rho$; then $f_d(w, x) = \alpha \cdot \frac{\min(\|x\|, \rho)}{\rho} \cdot g(\theta_{(w,x)})$, which can be viewed as projecting $w$ onto a hypersphere and $x$ into a hyperball, and then performing the inner product. BallConv is more robust and flexible than SphereConv.
6.3 Hyperbolic Tangent Convolution (TanhConv). $f_d(w, x) = \alpha \tanh(\frac{\|x\|}{\rho}) \cdot g(\theta_{(w,x)})$, a smooth decoupled operator with bounded output; it not only shares the advantages of BallConv but also gains extra convergence benefits from its smoothness.
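A minimal sketch of these three bounded magnitude functions, to be plugged into the `decoupled_op` helper above (the default `alpha` and `rho` values are illustrative, not taken from the paper):

```python
import numpy as np

# Bounded magnitude functions h(||w||, ||x||); note that ||w|| is intentionally ignored.
def h_sphere(w_norm, x_norm, alpha=1.0):
    return alpha                                   # SphereConv: constant magnitude

def h_ball(w_norm, x_norm, alpha=1.0, rho=1.0):
    return alpha * min(x_norm, rho) / rho          # BallConv: linear up to radius rho, then flat

def h_tanh(w_norm, x_norm, alpha=1.0, rho=1.0):
    return alpha * np.tanh(x_norm / rho)           # TanhConv: smooth, saturates at alpha
```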

7. Unbounded Decoupled Operators
7.1 Linear Convolution (LinearConv). $f_d(w, x) = \alpha \|x\| \cdot g(\theta_{(w,x)})$, which differs from the original convolution in that it projects the weights onto a hypersphere and has a parameter $\alpha$ to control the slope (the unbounded magnitude functions are sketched in code after this list).
7.2 Segmented Convolution (SegConv). A flexible piecewise (multi-range) linear function of $\|x\|$.
7.3 Logarithm Convolution (LogConv). $f_d(w, x) = \alpha \log(1 + \|x\|) \cdot g(\theta_{(w,x)})$.
7.4 Mixed Convolution (MixedConv). $f_d(w, x) = \alpha (\|x\| + \log(1 + \|x\|)) \cdot g(\theta_{(w,x)})$, which combines LinearConv and LogConv and is more flexible than either operator alone.
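A matching sketch of the unbounded magnitude functions (again illustrative; the two-segment form of `h_segmented` below is only one possible instance of a multi-range linear function):

```python
import numpy as np

# Unbounded magnitude functions h(||w||, ||x||).
def h_linear(w_norm, x_norm, alpha=1.0):
    return alpha * x_norm                                  # LinearConv: slope controlled by alpha

def h_segmented(w_norm, x_norm, alpha=1.0, beta=0.5, rho=1.0):
    # SegConv (illustrative two-segment version): slope alpha inside radius rho, slope beta outside.
    if x_norm <= rho:
        return alpha * x_norm
    return alpha * rho + beta * (x_norm - rho)

def h_log(w_norm, x_norm, alpha=1.0):
    return alpha * np.log(1.0 + x_norm)                    # LogConv

def h_mixed(w_norm, x_norm, alpha=1.0):
    return alpha * (x_norm + np.log(1.0 + x_norm))         # MixedConv = LinearConv + LogConv
```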

8. Angular Activation Functions
8.1 Linear angular activation: $g(\theta_{(w,x)}) = -\frac{2}{\pi}\theta_{(w,x)} + 1$ (all four activations are sketched in code after this list);
8.2 Cosine angular activation: $g(\theta_{(w,x)}) = \cos(\theta_{(w,x)})$;
8.3 Sigmoid angular activation: $g(\theta_{(w,x)}) = \frac{1 + \exp(-\frac{\pi}{2k})}{1 - \exp(-\frac{\pi}{2k})} \cdot \frac{1 - \exp(\frac{\theta_{(w,x)}}{k} - \frac{\pi}{2k})}{1 + \exp(\frac{\theta_{(w,x)}}{k} - \frac{\pi}{2k})}$, where $k$ controls the curvature;
8.4 Square cosine angular activation: $g(\theta_{(w,x)}) = \operatorname{sign}(\cos\theta_{(w,x)}) \cdot \cos^2(\theta_{(w,x)})$.
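A small sketch of these angular activations (the value of `k` is an illustrative default, not the paper's setting):

```python
import numpy as np

# Angular activation functions g(theta), with theta in [0, pi]; all map to [-1, 1].
def g_linear(theta):
    return -2.0 / np.pi * theta + 1.0

def g_cosine(theta):
    return np.cos(theta)

def g_sigmoid(theta, k=0.3):
    scale = (1.0 + np.exp(-np.pi / (2.0 * k))) / (1.0 - np.exp(-np.pi / (2.0 * k)))
    return scale * (1.0 - np.exp(theta / k - np.pi / (2.0 * k))) / (1.0 + np.exp(theta / k - np.pi / (2.0 * k)))

def g_square_cosine(theta):
    return np.sign(np.cos(theta)) * np.cos(theta) ** 2

# Example: SphereConv with the cosine activation, via the generic decoupled_op helper above:
# decoupled_op(w, x, h_sphere, g_cosine)
```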

9. Learnable Decoupled Operators: learn the hyperparameters of the functions defined above (e.g., the operator radius $\rho$) instead of fixing them; this is recommended when a large amount of training data is available.
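As a rough illustration of making the operator radius learnable (a PyTorch-style sketch of mine, not the authors' released code), $\rho$ can simply be registered as a trainable parameter of a BallConv-style magnitude function:

```python
import torch
import torch.nn as nn

class LearnableBallMagnitude(nn.Module):
    """h(||x||) = alpha * min(||x||, rho) / rho, with the operator radius rho learned by backprop."""
    def __init__(self, alpha=1.0, rho_init=1.0):
        super().__init__()
        self.alpha = alpha
        self.rho = nn.Parameter(torch.tensor(float(rho_init)))  # learnable operator radius

    def forward(self, x_norm):
        rho = self.rho.clamp(min=1e-6)            # keep the radius positive
        return self.alpha * torch.minimum(x_norm, rho) / rho
```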
