Decoupled Networks (DCNet) Reading Notes

1. Motivation: CNN-learned features are naturally decoupled, with the norm of the features corresponding to the intra-class variation and the angle corresponding to the semantic difference. Computing the similarity of two samples as the dot product of their CNN features can therefore be decomposed: the product of the feature norms measures the intra-class variation, while the angle between the feature vectors measures the semantic (inter-class) difference.
1.1 Understanding how convolution naturally leads to discriminative representations and good generalization.
1.2 The original CNNs make a strong assumption that the intra-class variation can be linearly modeled via the multiplication of norms and the semantic difference is described by the cosine of the angle. However, this modeling approach is not necessarily optimal for all tasks.
1.3 The inner product $\langle w, x \rangle = w^T x$ couples the semantic difference and the intra-class variation into one unified measure. This raises a problem: when the inner product is large, we cannot tell whether it is because the angle is small (the two samples belong to the same class) or because the norms are large (the two samples may not belong to the same class). We therefore need to decouple the norm and the angle, i.e., rewrite the basic operation $\langle w, x \rangle = w^T x = \|w\|_2 \|x\|_2 \cos(\theta_{(w,x)})$.
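To make the coupling issue concrete, here is a minimal numpy sketch (illustrative code of mine, not from the paper) showing two vector pairs with the same inner product but very different norm/angle decompositions:

```python
import numpy as np

def decompose(w, x):
    """Split <w, x> into the norm part ||w||*||x|| and the angular part cos(theta)."""
    inner = float(np.dot(w, x))
    norm_part = np.linalg.norm(w) * np.linalg.norm(x)
    return inner, norm_part, inner / norm_part

# Both pairs have inner product 4.0, but for different reasons:
print(decompose(np.array([2.0, 0.0]), np.array([2.0, 0.0])))  # angle 0, norm product 4.0
print(decompose(np.array([8.0, 0.0]), np.array([0.5, 0.5])))  # angle 45 deg, norm product ~5.66
```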

2. Contribution: We propose a generic decoupled learning framework which models the intra-class variation and the semantic difference independently.
2.1 Decoupling the norm and the angle in the inner product can better model the intra-class variation and the semantic difference in deep networks.
2.2 We propose two different types of decoupled convolution operators: bounded operators and unbounded operators.
The bounded operators may yield faster convergence and better robustness against adversarial attacks.
The unbounded operators may have better representational power.
2.3 We introduce an operator radius for the decoupled operators, which makes the decoupled operators learnable.

3. Advantages:
3.1 Besides hand-designing the decoupling functions, these functions can also be learned from data rather than kept fixed;
3.2 With bounded (saturating) magnitude functions, DCNets converge faster and achieve better results than conventional CNNs;
3.3 The bounded functions can be used to constrain the feature space of each class, yielding stronger robustness;
3.4 The proposed decoupled operators can be conveniently plugged into existing deep learning architectures;

4. Related Works
4.1 Improving the discriminativeness of learned CNN features:
Deep Hyperspherical Learning (NIPS 2017) only cares about the semantic difference and aims to compress the intra-class variation into a space that is as small as possible, while the decoupled framework focuses on both and provides the flexibility to design or learn both the magnitude function and the angular function.

5. The decoupled operator generalizes the original convolution: $f(w, x) = \langle w, x \rangle = w^T x = \|w\|_2 \|x\|_2 \cos(\theta_{(w,x)}) \;\longrightarrow\; f_d(w, x) = h(\|w\|_2, \|x\|_2) \cdot g(\theta_{(w,x)})$

Next, we first consider decoupled operators without the norm of the weights, i.e., $\|w\|$ is not included in $h(\cdot)$.
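A minimal numpy sketch of this template (my own illustrative code, not the authors' implementation): the magnitude function `h` and the angular function `g` are passed in as callables, so the concrete operators below are just different choices of `h` and `g`.

```python
import numpy as np

def decoupled_op(w, x, h, g):
    """Generic decoupled operator: f_d(w, x) = h(||w||, ||x||) * g(theta(w, x))."""
    w_norm = np.linalg.norm(w)
    x_norm = np.linalg.norm(x)
    cos_theta = np.dot(w, x) / (w_norm * x_norm + 1e-12)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    return h(w_norm, x_norm) * g(theta)
```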

6. Bounded Decoupled Operators
6.1 Hyperspherical Convolution (SphereConv). Let $h(\|w\|, \|x\|) = \alpha$; then $f_d(w, x) = \alpha \cdot g(\theta_{(w,x)})$. This can be viewed as projecting $w$ and $x$ onto a hypersphere and then performing the inner product, which improves the conditioning of the optimization problem and helps the network converge (all three bounded magnitude functions are sketched in code after this list).
6.2 Hyperball Convolution (BallConv). Let $h(\|w\|, \|x\|) = \alpha \min(\|x\|, \rho) / \rho$; then $f_d(w, x) = \alpha \cdot \frac{\min(\|x\|, \rho)}{\rho} \cdot g(\theta_{(w,x)})$, which can be viewed as projecting $w$ onto a hypersphere and $x$ into a hyperball, and then performing the inner product. BallConv is more robust and flexible than SphereConv.
6.3 Hyperbolic Tangent Convolution (TanhConv). $f_d(w, x) = \alpha \tanh(\frac{\|x\|}{\rho}) \cdot g(\theta_{(w,x)})$, a smooth decoupled operator with bounded output; it not only shares the advantages of BallConv but also gains extra convergence benefits from its smoothness.
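A minimal sketch of these three bounded magnitude functions, to be plugged into the `decoupled_op` helper above (the default `alpha` and `rho` values are illustrative, not taken from the paper):

```python
import numpy as np

# Bounded magnitude functions h(||w||, ||x||); note that ||w|| is intentionally ignored.
def h_sphere(w_norm, x_norm, alpha=1.0):
    return alpha                                   # SphereConv: constant magnitude

def h_ball(w_norm, x_norm, alpha=1.0, rho=1.0):
    return alpha * min(x_norm, rho) / rho          # BallConv: linear up to radius rho, then flat

def h_tanh(w_norm, x_norm, alpha=1.0, rho=1.0):
    return alpha * np.tanh(x_norm / rho)           # TanhConv: smooth, saturates at alpha
```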

7. Unbounded Decoupled Operators
7.1 Linear Convolution (LinearConv). $f_d(w, x) = \alpha \|x\| \cdot g(\theta_{(w,x)})$, which differs from the original convolution in that it projects the weights onto a hypersphere and has a parameter $\alpha$ to control the slope (the unbounded magnitude functions are sketched in code after this list).
7.2 Segmented Convolution (SegConv). A flexible piecewise (multi-range) linear function of $\|x\|$.
7.3 Logarithm Convolution (LogConv). $f_d(w, x) = \alpha \log(1 + \|x\|) \cdot g(\theta_{(w,x)})$.
7.4 Mixed Convolution (MixedConv). $f_d(w, x) = \alpha (\|x\| + \log(1 + \|x\|)) \cdot g(\theta_{(w,x)})$, which combines LinearConv and LogConv and is more flexible than either operator alone.
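A matching sketch of the unbounded magnitude functions (again illustrative; the two-segment form of `h_segmented` below is only one possible instance of a multi-range linear function):

```python
import numpy as np

# Unbounded magnitude functions h(||w||, ||x||).
def h_linear(w_norm, x_norm, alpha=1.0):
    return alpha * x_norm                                  # LinearConv: slope controlled by alpha

def h_segmented(w_norm, x_norm, alpha=1.0, beta=0.5, rho=1.0):
    # SegConv (illustrative two-segment version): slope alpha inside radius rho, slope beta outside.
    if x_norm <= rho:
        return alpha * x_norm
    return alpha * rho + beta * (x_norm - rho)

def h_log(w_norm, x_norm, alpha=1.0):
    return alpha * np.log(1.0 + x_norm)                    # LogConv

def h_mixed(w_norm, x_norm, alpha=1.0):
    return alpha * (x_norm + np.log(1.0 + x_norm))         # MixedConv = LinearConv + LogConv
```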

8. Angular Activation Functions
8.1 Linear angular activation: $g(\theta_{(w,x)}) = -\frac{2}{\pi}\theta_{(w,x)} + 1$ (all four activations are sketched in code after this list);
8.2 Cosine angular activation: $g(\theta_{(w,x)}) = \cos(\theta_{(w,x)})$;
8.3 Sigmoid angular activation: $g(\theta_{(w,x)}) = \frac{1 + \exp(-\frac{\pi}{2k})}{1 - \exp(-\frac{\pi}{2k})} \cdot \frac{1 - \exp(\frac{\theta_{(w,x)}}{k} - \frac{\pi}{2k})}{1 + \exp(\frac{\theta_{(w,x)}}{k} - \frac{\pi}{2k})}$, where $k$ controls the curvature;
8.4 Square cosine angular activation: $g(\theta_{(w,x)}) = \operatorname{sign}(\cos\theta_{(w,x)}) \cdot \cos^2(\theta_{(w,x)})$.
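A small sketch of these angular activations (the value of `k` is an illustrative default, not the paper's setting):

```python
import numpy as np

# Angular activation functions g(theta), with theta in [0, pi]; all map to [-1, 1].
def g_linear(theta):
    return -2.0 / np.pi * theta + 1.0

def g_cosine(theta):
    return np.cos(theta)

def g_sigmoid(theta, k=0.3):
    scale = (1.0 + np.exp(-np.pi / (2.0 * k))) / (1.0 - np.exp(-np.pi / (2.0 * k)))
    return scale * (1.0 - np.exp(theta / k - np.pi / (2.0 * k))) / (1.0 + np.exp(theta / k - np.pi / (2.0 * k)))

def g_square_cosine(theta):
    return np.sign(np.cos(theta)) * np.cos(theta) ** 2

# Example: SphereConv with the cosine activation, via the generic decoupled_op helper above:
# decoupled_op(w, x, h_sphere, g_cosine)
```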

9. Learnable Decoupled Operators: learn the hyperparameters of the functions defined above (e.g., the operator radius $\rho$) instead of fixing them; this is recommended when a large amount of training data is available.
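As a rough illustration of making the operator radius learnable (a PyTorch-style sketch of mine, not the authors' released code), $\rho$ can simply be registered as a trainable parameter of a BallConv-style magnitude function:

```python
import torch
import torch.nn as nn

class LearnableBallMagnitude(nn.Module):
    """h(||x||) = alpha * min(||x||, rho) / rho, with the operator radius rho learned by backprop."""
    def __init__(self, alpha=1.0, rho_init=1.0):
        super().__init__()
        self.alpha = alpha
        self.rho = nn.Parameter(torch.tensor(float(rho_init)))  # learnable operator radius

    def forward(self, x_norm):
        rho = self.rho.clamp(min=1e-6)            # keep the radius positive
        return self.alpha * torch.minimum(x_norm, rho) / rho
```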
