[RelativeNAS] Relative Neural Architecture Search via Slow-Fast Learning


First author: Hao Tan

NAS: Neural Architecture Search

automating the design of artificial neural networks

Motivation

To benefit from the merits of differentiable NAS and population-based NAS while overcoming their deficiencies.

Deficiencies

Differentiable NAS

Gradient-based search can be ineffective due to a lack of proper diversity.

Population-Based NAS

Search efficiency is poor due to stochastic crossover/mutation and the large number of performance evaluations required.

Method

continuous encoding scheme

Inspired by H. Liu, K. Simonyan, and Y. Yang, "DARTS: Differentiable Architecture Search," in Proceedings of the International Conference on Learning Representations, 2019.

Cell-based architecture

Two types of cells: the normal cell and the reduction cell (which down-samples to reduce the feature map size)

Encodes nodes and operations separately, each represented by a real-valued interval (see the sketch at the end of this subsection)

The network is a DAG (directed acyclic graph)

Differences from DARTS: 1) no requirement of differentiability; 2) operations between pairwise nodes are directly encoded as real values

Pros: provides more flexibility and versatility

Networks can achieve promising performance and high transferability across different tasks by adjusting the total number of cells in the final architecture
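To make the encoding concrete, here is a minimal sketch (assumed operation names and value ranges, not the paper's exact scheme) of how a cell can be encoded as a real-valued vector and decoded back into a discrete DAG of nodes and operations:

```python
import numpy as np

# Hypothetical operation list for illustration; the actual candidate set
# follows the paper's DARTS-like search space.
OPS = ["sep_conv_3x3", "sep_conv_5x5", "dil_conv_3x3", "dil_conv_5x5",
       "max_pool_3x3", "avg_pool_3x3", "skip_connect"]

def random_cell_encoding(num_nodes=4, rng=np.random.default_rng(0)):
    """Encode a cell as a flat real-valued vector: for each intermediate
    node, two input-node choices and two operation choices."""
    vec = []
    for i in range(num_nodes):
        vec += list(rng.uniform(0, i + 2, size=2))      # indices of earlier nodes to connect from
        vec += list(rng.uniform(0, len(OPS), size=2))   # operation applied on each connection
    return np.array(vec)

def decode(vec, num_nodes=4):
    """Map the continuous vector back to a discrete cell (a small DAG)."""
    cell, k = [], 0
    for _ in range(num_nodes):
        in1, in2, op1, op2 = vec[k:k + 4]
        cell.append(((int(in1), OPS[int(op1) % len(OPS)]),
                     (int(in2), OPS[int(op2) % len(OPS)])))
        k += 4
    return cell

print(decode(random_cell_encoding()))
```

Because the vector is continuous, it can be moved around by the slow-fast update below and only discretized when an actual network has to be built.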

slow-fast learning paradigm

Inspired by C. Feichtenhofer, H. Fan, J. Malik, and K. He, "SlowFast Networks for Video Recognition," in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 6202-6211, which uses a Fast pathway (high frame rate) and a Slow pathway (low frame rate) for video recognition.

In each pair of architecture vectors, the one with worse performance is considered the slow learner and the one with better performance the fast learner.

The architecture vectors are updated by a pseudo-gradient mechanism determined jointly by the slow and fast learners.

At each generation, the population is randomly divided into $N/2$ pairs; $\boldsymbol{\alpha}_{p,s}^{g}$ is updated by learning from $\boldsymbol{\alpha}_{p,f}^{g}$ via:

$$\Delta \boldsymbol{\alpha}_{p,s}^{g} = \lambda_{1}\left(\boldsymbol{\alpha}_{p,f}^{g} - \boldsymbol{\alpha}_{p,s}^{g}\right) + \lambda_{2}\,\Delta \boldsymbol{\alpha}_{p,s}^{g-1}$$

Thanks to the pseudo-gradient-based mechanism, RelativeNAS is applicable to any other generic, continuously encoded search space.
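A minimal NumPy sketch of the slow-fast update above; the pairing logic follows the description, while `eval_fn` and the λ values are placeholders rather than the paper's settings:

```python
import numpy as np

def slow_fast_generation(population, momenta, eval_fn, lam1=0.5, lam2=0.5,
                         rng=np.random.default_rng()):
    """One generation of slow-fast learning.

    population: (N, D) array of continuously encoded architecture vectors (N even).
    momenta:    (N, D) array holding each vector's previous update (Delta alpha^{g-1}).
    eval_fn:    maps an architecture vector to an estimated validation loss.
    """
    for a, b in rng.permutation(len(population)).reshape(-1, 2):
        # the better (lower-loss) vector is the fast learner, the other the slow learner
        fast, slow = (a, b) if eval_fn(population[a]) < eval_fn(population[b]) else (b, a)
        # pseudo-gradient: pull the slow learner toward the fast learner, with momentum
        delta = lam1 * (population[fast] - population[slow]) + lam2 * momenta[slow]
        population[slow] = population[slow] + delta
        momenta[slow] = delta
    return population, momenta

# Toy usage with a quadratic surrogate in place of real architecture evaluation:
pop, mom = np.random.rand(8, 16), np.zeros((8, 16))
pop, mom = slow_fast_generation(pop, mom, eval_fn=lambda v: float(np.sum(v ** 2)))
```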

novel performance estimation strategy

Adopts a collection of operations as a shared weight set to estimate performance

The weight set is not directly trained; instead, it is updated in an online manner

This makes it feasible for RelativeNAS to use performance estimation to obtain approximate validation losses for the candidate architectures (a rough sketch follows at the end of this subsection).


Pros: saves substantial computation cost
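As a rough illustration of this weight-sharing idea (the pool structure, keys, and update schedule below are assumptions, not the paper's exact implementation): candidates inherit weights from a shared, operation-indexed pool to get cheap loss estimates, and the pool is refreshed online whenever a candidate is briefly trained.

```python
import numpy as np

# Shared pool of weights, keyed by (edge, operation); illustrative structure only.
weight_set = {}

def inherit(arch_keys, shape=(3, 3), rng=np.random.default_rng(0)):
    """Build a candidate's weights by inheriting from the pool,
    freshly initializing any (edge, op) entry not trained yet."""
    return {k: weight_set.get(k, rng.standard_normal(shape)) for k in arch_keys}

def write_back(trained_weights):
    """Online update: overwrite pool entries with newly trained weights,
    so later estimates are made with increasingly mature weights."""
    weight_set.update(trained_weights)

# Usage pattern per pair: estimate both losses with inherited weights only,
# briefly train one candidate, then write its weights back into the pool.
weights = inherit([((0, 2), "sep_conv_3x3"), ((1, 2), "skip_connect")])
write_back(weights)
```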

Result

CIFAR-10 is used as the search dataset

  • The whole search procedure takes only about nine hours on a single 1080Ti or seven hours on a Tesla V100.

  • RelativeNAS + Cutout achieves a low test error (2.34%) with a moderate parameter count (3.93M), i.e., it is efficient.

Transferability Analyses

  • Intra-task Transferability: CIFAR-100, ImageNet
  • Inter-task Transferability: Object Detection, Semantic Segmentation, Keypoint Detection

Conclusion

This work combines the merits of differentiable NAS and population-based NAS to be more effective and more efficient. Moreover, the proposed slow-fast learning paradigm is also potentially applicable to other generic learning/optimization tasks.
