Relative Neural Architecture Search via Slow-Fast Learning
First author: Hao Tan
NAS (Neural Architecture Search): automating the design of artificial neural networks
Motivation
To benefit from the merits of differentiable NAS and population-based NAS while overcoming their deficiencies.
Deficiencies
Differentiable NAS
Search by gradient can be ineffective due to the lack of proper diversity
Population-Based NAS
Search efficiency is poor due to the stochastic crossover/mutation and a large number of performance evaluations.
Method
continuous encoding scheme
Cell-based architecture
Two types of cells: the normal cell and the reduction cell (down-sampling to reduce the feature map size)
Encodes the node and the operation separately (each represented by a real-valued interval; see the decoding sketch after this list)
The network is a DAG (directed acyclic graph)
differences from DARTS: 1) no requirement of differentiability; 2) the operations between pairwise nodes are directly encoded as real values
Pros: provides more flexibility and versatility
networks can achieve promising performance and high transferability for different tasks by adjusting the total number of cells in the final architecture
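As a rough illustration of how such a continuous (interval-based) encoding could be decoded into discrete nodes and operations, here is a minimal sketch; the operation names, the equal-interval partitioning, and the helper functions are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

# Hypothetical candidate operations; the actual search space may differ.
OPERATIONS = ["sep_conv_3x3", "sep_conv_5x5", "max_pool_3x3", "skip_connect"]

def decode_operation(value: float) -> str:
    """Map a real value in [0, 1) to a discrete operation: each operation
    owns an equal sub-interval of [0, 1)."""
    index = min(int(value * len(OPERATIONS)), len(OPERATIONS) - 1)
    return OPERATIONS[index]

def decode_node(value: float, num_prev_nodes: int) -> int:
    """Map a real value in [0, 1) to the index of a predecessor node."""
    return min(int(value * num_prev_nodes), num_prev_nodes - 1)

# Example: decode a small continuous architecture vector.
alpha = np.random.rand(4)
print([decode_operation(v) for v in alpha])
print(decode_node(alpha[0], num_prev_nodes=3))
```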
slow-fast learning paradigm
inspired by *SlowFast Networks for Video Recognition* (Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 6202-6211), which uses a Fast pathway (high frame rate) and a Slow pathway (low frame rate) for video recognition
in each pair of architecture vectors, the one with worse performance is regarded as the slow learner and the one with better performance as the fast learner
the architecture vectors are updated by a pseudo-gradient mechanism determined by the slow learner and the fast learner (see the sketch after this subsection)
At each generation, the population is randomly divided into $N/2$ pairs; the slow learner $\boldsymbol{\alpha}_{p, s}^{g}$ is updated by learning from the fast learner $\boldsymbol{\alpha}_{p, f}^{g}$ with:
$$\Delta \boldsymbol{\alpha}_{p, s}^{g}=\lambda_{1}\left(\boldsymbol{\alpha}_{p, f}^{g}-\boldsymbol{\alpha}_{p, s}^{g}\right)+\lambda_{2}\, \Delta \boldsymbol{\alpha}_{p, s}^{g-1}$$
Thanks to the pseudo-gradient-based mechanism, RelativeNAS is applicable to any other generic continuously encoded search space
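Below is a minimal sketch of one generation of the slow-fast update described above; the population representation, the fitness callable, the fixed values of λ1/λ2, and the clipping to [0, 1] are illustrative assumptions rather than the paper's exact implementation.

```python
import numpy as np

def slow_fast_generation(population, momentum, fitness, lam1=0.7, lam2=0.2, rng=None):
    """One generation of slow-fast learning on continuous architecture vectors.

    population : (N, D) array of architecture vectors (assumed to lie in [0, 1]).
    momentum   : (N, D) array holding each vector's previous update (Delta alpha^{g-1}).
    fitness    : callable mapping a vector to an estimated validation loss (lower is better).
    """
    rng = rng or np.random.default_rng()
    indices = rng.permutation(len(population))

    # Randomly divide the population into N/2 pairs.
    for i, j in zip(indices[0::2], indices[1::2]):
        # The worse vector is the slow learner, the better one the fast learner.
        s, f = (i, j) if fitness(population[i]) > fitness(population[j]) else (j, i)

        # Pseudo-gradient update: pull the slow learner toward the fast learner,
        # plus a momentum term from its previous update.
        delta = lam1 * (population[f] - population[s]) + lam2 * momentum[s]
        population[s] = np.clip(population[s] + delta, 0.0, 1.0)
        momentum[s] = delta

    return population, momentum

# Toy usage with a dummy fitness function.
pop = np.random.rand(8, 5)
mom = np.zeros_like(pop)
pop, mom = slow_fast_generation(pop, mom, fitness=lambda a: float(np.sum(a ** 2)))
```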
novel performance estimation strategy
adopts an operation collection as a weight set to estimate the performance of candidate architectures
the weight set is not directly trained but updated in an online manner
it is thus feasible for RelativeNAS to use these performance estimations to obtain approximate validation losses for the candidate architectures (a rough sketch of such a weight set follows below)
Pros: saves substantial computation cost
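A rough sketch, under assumptions, of how a shared weight set could support cheap performance estimation with online updates; the class name, the pool structure, and the loss/gradient callables are hypothetical and only illustrate the idea of inheriting weights instead of training each candidate from scratch.

```python
import numpy as np

class WeightSet:
    """Shared pool of weights, one entry per candidate operation.

    A candidate architecture inherits the weights of the operations it uses,
    yielding an approximate validation loss without training from scratch; the
    inherited weights are then refined by a short online step and written back,
    so the pool keeps improving as the search proceeds.
    """

    def __init__(self, operations, dim, rng=None):
        rng = rng or np.random.default_rng(0)
        self.pool = {op: rng.normal(size=dim) for op in operations}

    def estimate(self, chosen_ops, loss_fn):
        """Return an approximate validation loss for a candidate using `chosen_ops`."""
        weights = [self.pool[op] for op in chosen_ops]
        return loss_fn(weights)

    def online_update(self, chosen_ops, grad_fn, lr=0.01):
        """Refine only the weights the candidate actually used, in an online manner."""
        for op in chosen_ops:
            self.pool[op] = self.pool[op] - lr * grad_fn(self.pool[op])
```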
Result
CIFAR-10 is used as the search dataset
- It takes only about nine hours with a single 1080Ti, or seven hours with a Tesla V100, to complete the above search procedure.
- RelativeNAS + Cutout achieves a low test error (2.34%) with a moderate parameter count (3.93M), i.e., it is efficient.
Transferability Analyses
- Intra-task Transferability: CIFAR-100, ImageNet
- Inter-task Transferability: Object Detection, Semantic Segmentation, Keypoint Detection
Conclusion
This work combines the merits of differentiable NAS and population-based NAS to be both more effective and more efficient. Moreover, the proposed slow-fast learning paradigm is also potentially applicable to other generic learning/optimization tasks.