Neural Architecture Search: A Survey


Reference: Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. Neural Architecture Search: A Survey. Journal of Machine Learning Research, 2019, 20(55): 1–21. https://arxiv.org/abs/1808.05377v2 | http://jmlr.org/papers/volume20/18-598/18-598.pdf

1. Introduction

Figure 1: Abstract illustration of NAS methods. The search strategy selects an architecture $A$ from a predefined search space $\mathcal{A}$; the architecture is passed to a performance estimation strategy, and the estimated performance is fed back to the search strategy.

2. Search Space

2.1 Definition of the Search Space

The search space defines which neural architectures a NAS approach might discover in principle.

2.2 Examples of Common Search Spaces

2.2.1 Chain-Structured Search Space

Abstract definition (left of Figure 2): A chain-structured neural network architecture $A$ can be written as a sequence of $n$ layers, where the $i$'th layer $L_i$ receives its input from layer $i-1$ and its output serves as the input for layer $i+1$, i.e., $A = L_n \circ \ldots \circ L_1 \circ L_0$.
Concrete definition: The chain-structured search space is parametrized by (1) the (maximum) number of layers $n$; (2) the type of operation each layer executes, e.g., pooling, convolution, or more advanced operations such as depthwise separable convolutions (Chollet, 2016) or dilated convolutions (Yu and Koltun, 2016); and (3) the hyperparameters associated with each operation, e.g., number of filters, kernel size, or number of units for fully-connected layers. Since the hyperparameters in (3) are conditioned on the operation chosen in (2) (which is in turn conditioned on the depth chosen in (1)), the chain-structured search space has variable length and is a conditional space.
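To make this conditional structure concrete, here is a minimal sketch in plain Python (the operation names and hyperparameter ranges are illustrative, not taken from any particular paper) of how a chain-structured architecture could be sampled: the depth is drawn first, then an operation per layer, and only then the hyperparameters that the chosen operation admits.

```python
import random

# Hypothetical operation set and the hyperparameters each operation admits.
OP_HYPERPARAMS = {
    "conv":     {"filters": [16, 32, 64], "kernel": [1, 3, 5]},
    "sep_conv": {"filters": [16, 32, 64], "kernel": [3, 5]},
    "max_pool": {"kernel": [2, 3]},
    "dense":    {"units": [64, 128, 256]},
}

def sample_chain_architecture(max_layers=10):
    """Sample one architecture from a chain-structured, conditional space."""
    n_layers = random.randint(1, max_layers)          # choice (1): depth
    architecture = []
    for _ in range(n_layers):
        op = random.choice(list(OP_HYPERPARAMS))      # choice (2): operation type
        hps = {name: random.choice(values)            # choice (3): op-specific hyperparameters
               for name, values in OP_HYPERPARAMS[op].items()}
        architecture.append((op, hps))
    return architecture

if __name__ == "__main__":
    print(sample_chain_architecture())
```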
Figure 2: Left, a chain-structured network; right, a complex multi-branch network with skip connections.

2.2.2 Multi-Branch Search Space

Abstract definition (right of Figure 2): Recent work on NAS incorporates modern design elements known from hand-crafted architectures, such as skip connections, which allow building complex, multi-branch networks.
Works that incorporate these modern design elements:
Andrew Brock, Theodore Lim, James M. Ritchie, and Nick Weston. SMASH: one-shot model architecture search through hypernetworks. In NIPS Workshop on Meta-Learning, 2017.
Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. Simple And Efficient Architecture Search for Convolutional Neural Networks. In NIPS Workshop on Meta-Learning, 2017.
Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. Learning transferable architectures for scalable image recognition. In Conference on Computer Vision and Pattern Recognition, 2018.
Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. Efficient multi-objective neural architecture search via lamarckian evolution. In International Conference on Learning Representations, 2019.
Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V. Le. Aging Evolution for Image Classifier Architecture Search. In AAAI, 2019.
Han Cai, Jiacheng Yang, Weinan Zhang, Song Han, and Yong Yu. Path-Level Network Transformation for Efficient Architecture Search. In International Conference on Machine Learning, June 2018b.
Concrete definition: In this case the input of layer $i$ can be formally described as a function $g_i(L_{i-1}^{out}, \ldots, L_0^{out})$ combining previous layer outputs.
Special cases:
(i) Chain-structured networks: $g_i(L_{i-1}^{out}, \ldots, L_0^{out}) = L_{i-1}^{out}$
(ii) Residual networks: $g_i(L_{i-1}^{out}, \ldots, L_0^{out}) = L_{i-1}^{out} + L_j^{out}$, with $j < i-1$
(iii) DenseNets: $g_i(L_{i-1}^{out}, \ldots, L_0^{out}) = \mathrm{concat}(L_{i-1}^{out}, \ldots, L_0^{out})$
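The three special cases of $g_i$ can be written down directly; the NumPy sketch below uses toy feature maps only to show how the aggregation functions differ.

```python
import numpy as np

def g_chain(outputs):
    """Chain structure: the input of layer i is just the previous layer's output."""
    return outputs[-1]

def g_residual(outputs, j=0):
    """Residual connection: previous output plus the output of some earlier layer j < i-1."""
    return outputs[-1] + outputs[j]

def g_dense(outputs):
    """DenseNet-style: concatenate all previous outputs along the channel axis."""
    return np.concatenate(outputs, axis=-1)

# Toy "layer outputs" L_0^out, ..., L_{i-1}^out with 4 channels each.
outs = [np.random.randn(8, 8, 4) for _ in range(3)]
print(g_chain(outs).shape)     # (8, 8, 4)
print(g_residual(outs).shape)  # (8, 8, 4)
print(g_dense(outs).shape)     # (8, 8, 12)
```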

2.2.3 Cell-Based Search Space

This search space is motivated by the observation that repeated motifs can form high-performing networks, e.g.:
InceptionNet, ResNet, DenseNet
BlockQNN: Zhao Zhong, Junjie Yan, Wei Wu, Jing Shao, and Cheng-Lin Liu. Practical block-wise neural network architecture generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2423–2432, 2018a.

Three major advantages of the cell-based search space:

  1. Reduced search space: the size of the search space is drastically reduced.
  2. Transferability: architectures built from cells can more easily be transferred or adapted to other data sets by simply varying the number of cells and filters used within a model.
  3. Applicability to both CNNs and RNNs: creating architectures by repeating building blocks has proven a useful design principle in general.

Owing to these advantages, representative works that adopt the cell-based search space include:
Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V. Le. Aging Evolution for Image Classifier Architecture Search. In AAAI, 2019.
PNAS:Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, and Kevin Murphy. Progressive Neural Architecture Search. In European Conference on Computer Vision, 2018a.
ENAS: Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, and Jeff Dean. Efficient neural architecture search via parameter sharing. In International Conference on Machine Learning, 2018.
Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. Efficient multi-objective neural architecture search via lamarckian evolution. In International Conference on Learning Representations, 2019.
Han Cai, Jiacheng Yang, Weinan Zhang, Song Han, and Yong Yu. Path-Level Network Transformation for Efficient Architecture Search. In International Conference on Machine Learning, June 2018b.
DARTS: Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. In International Conference on Learning Representations, 2019b.
BlockQNN: Zhao Zhong, Junjie Yan, Wei Wu, Jing Shao, and Cheng-Lin Liu. Practical block-wise neural network architecture generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2423–2432, 2018a.

However, the cell-based search space introduces a new design choice, namely how to choose the macro-architecture: how many cells shall be used and how should they be connected to build the actual model?
Ideally, the macro-architecture and the micro-architecture within the cells would be optimized jointly; in practice, a tractable approach is to structure the search space hierarchically. The first level consists of the set of primitive operations, the second level of different motifs that connect primitive operations via a directed acyclic graph, the third level of motifs that encode how to connect second-level motifs, and so on.
Hanxiao Liu, Karen Simonyan, Oriol Vinyals, Chrisantha Fernando, and Koray Kavukcuoglu. Hierarchical Representations for Efficient Architecture Search. In International Conference on Learning Representations, 2018b.
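To illustrate the macro-architecture question, here is a minimal sketch in plain Python of one common stacking pattern, in which a fixed number of normal cells is followed by a reduction cell; `normal_cell` and `reduction_cell` are hypothetical factories standing in for the two searched cells.

```python
def build_macro_architecture(normal_cell, reduction_cell, num_stacks=3, cells_per_stack=2):
    """Return a list of cells arranged in a simple macro-architecture:
    `cells_per_stack` normal cells followed by one reduction cell, repeated `num_stacks` times."""
    layers = []
    for stack in range(num_stacks):
        layers.extend(normal_cell() for _ in range(cells_per_stack))
        if stack < num_stacks - 1:
            layers.append(reduction_cell())
    return layers

# Example with stand-in cells represented as strings.
model = build_macro_architecture(lambda: "normal", lambda: "reduction")
print(model)  # ['normal', 'normal', 'reduction', 'normal', 'normal', 'reduction', 'normal', 'normal']
```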

3. Search Strategy

Many different search strategies can be used to explore the space of neural architectures, including random search (RS), Bayesian optimization (BO), evolutionary algorithms (EA), reinforcement learning (RL), and gradient-based methods (GM).
In the earlier machine-learning era, EA were the predominant approach for optimizing neural architectures.
BO has achieved strong results in hyperparameter optimization (HPO) of deep neural networks.

3.1 Reinforcement Learning

RL sparked the recent wave of NAS research. Different RL approaches differ in how they represent the agent's policy and how they optimize it.
(1) Use a recurrent neural network (RNN) policy to sequentially sample a string that in turn encodes the neural architecture; works in this line include:
NAS-RL (Barret Zoph and Quoc V. Le)
MetaQNN (Bowen Baker et al.)
NASNet (controller trained with proximal policy optimization, PPO)
BlockQNN and Faster BlockQNN
(2) Simplify the above RL formulation further to a multi-armed bandit (MAB) setting: a bi-directional LSTM encodes variable-length network architectures and is trained end-to-end with the REINFORCE policy gradient; works in this line include:
Han Cai, Tianyao Chen, Weinan Zhang, Yong Yu, and Jun Wang. Efficient architecture search by network transformation. In Association for the Advancement of Artificial Intelligence, 2018a.
Han Cai, Jiacheng Yang, Weinan Zhang, Song Han, and Yong Yu. Path-Level Network Transformation for Efficient Architecture Search. In International Conference on Machine Learning, June 2018b.
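As a rough, self-contained illustration of approach (1), the sketch below replaces the RNN controller with independent per-decision logits and the validation accuracy with a dummy reward, but keeps the essential loop: sample an architecture-encoding string from the policy, observe a reward, and update the policy with the REINFORCE gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
OPS = ["conv3x3", "conv5x5", "max_pool", "identity"]
NUM_DECISIONS = 4            # length of the architecture-encoding string

# Stand-in for the RNN controller: one independent categorical distribution per decision.
logits = np.zeros((NUM_DECISIONS, len(OPS)))

def sample_architecture(logits):
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    choices = [rng.choice(len(OPS), p=p) for p in probs]
    return choices, probs

def dummy_reward(choices):
    # Hypothetical reward standing in for validation accuracy of the trained architecture.
    return sum(c == 0 for c in choices) / len(choices)

baseline, lr = 0.0, 0.5
for step in range(200):
    choices, probs = sample_architecture(logits)
    reward = dummy_reward(choices)
    baseline = 0.9 * baseline + 0.1 * reward             # moving-average baseline
    advantage = reward - baseline
    for i, c in enumerate(choices):                       # REINFORCE: grad log pi(c) * advantage
        grad = -probs[i]
        grad[c] += 1.0
        logits[i] += lr * advantage * grad

print("most likely architecture:", [OPS[int(np.argmax(l))] for l in logits])
```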

3.2 Evolutionary Algorithms

Neuro-evolutionary methods differ in how they sample parents, update populations, and generate offspring.
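A minimal sketch of such a neuro-evolutionary loop, in the spirit of tournament parent selection with an aging population; the fitness function is a placeholder for actually training and validating the sampled architecture.

```python
import random

OPS = ["conv3x3", "conv5x5", "max_pool", "identity"]

def random_architecture(n_layers=6):
    return [random.choice(OPS) for _ in range(n_layers)]

def mutate(arch):
    child = list(arch)
    child[random.randrange(len(child))] = random.choice(OPS)  # flip one operation
    return child

def fitness(arch):
    # Placeholder for "train the network and return its validation accuracy".
    return sum(op == "conv3x3" for op in arch) / len(arch)

population = [random_architecture() for _ in range(20)]
for _ in range(100):
    tournament = random.sample(population, k=5)   # sample parents
    parent = max(tournament, key=fitness)
    child = mutate(parent)                        # generate offspring
    population.append(child)
    population.pop(0)                             # aging-style update: discard the oldest member

print(max(population, key=fitness))
```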

3.3 Bayesian Optimization

3.4 Gradient-Based Methods

3.5 Other Methods

Monte Carlo Tree Search

hill climbing

4. Performance Estimation Strategy

4.1 Naive Performance Estimation

Architecture search aims to find a neural architecture that maximizes some performance measure. To guide the search process, the search strategy needs an accurate estimate of the performance of a given architecture A. The simplest way is to train the architecture on training data and evaluate its generalization performance on held-out data. However, this performance estimation strategy typically costs thousands of GPU days (NASNet, AmoebaNet), so performing NAS under limited compute budgets requires methods that speed up performance estimation.

4.2 Lower-Fidelity Estimates

Methods that estimate performance at lower fidelities fall into four main directions:

  • shorter training times:
    NASNet: Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. Learning transferable architectures for scalable image recognition. In Conference on Computer Vision and Pattern Recognition, 2018.
    Arber Zela, Aaron Klein, Stefan Falkner, and Frank Hutter. Towards automated deep learning: Efficient joint neural architecture and hyperparameter search. In ICML 2018 Workshop on AutoML (AutoML 2018), 2018.
  • training on a subset of the data
    Aaron Klein, Stefan Falkner, Simon Bartels, Philipp Hennig, and Frank Hutter. Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets. In Aarti Singh and Jerry Zhu, editors, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, volume 54 of Proceedings of Machine Learning Research, pages 528–536, Fort Lauderdale, FL, USA, 20–22 Apr 2017b. PMLR.
  • training on lower-resolution images
    Patryk Chrabaszcz, Ilya Loshchilov, and Frank Hutter. A downsampled variant of imagenet as an alternative to the CIFAR datasets. CoRR, abs/1707.08819, 2017.
  • training with fewer filters per layer and fewer cells
    NASNet: Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. Learning transferable architectures for scalable image recognition. In Conference on Computer Vision and Pattern Recognition, 2018.
    AmoebaNet: Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V. Le. Aging Evolution for Image Classifier Architecture Search. In AAAI, 2019.

However, Zela et al. showed that when the performance approximation is too crude, the relative ranking of architectures can change dramatically:
Arber Zela, Aaron Klein, Stefan Falkner, and Frank Hutter. Towards automated deep learning: Efficient joint neural architecture and hyperparameter search. In ICML 2018 Workshop on AutoML (AutoML 2018), 2018.
Hence, the fidelity should be increased gradually to keep this error small:
Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. Hyperband: bandit-based configuration evaluation for hyperparameter optimization. In International Conference on Learning Representations, 2017.
Stefan Falkner, Aaron Klein, and Frank Hutter. BOHB: Robust and efficient hyperparameter optimization at scale. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings ofMachine Learning Research, pages 1436–1445, Stockholmsmssan, Stockholm Sweden, 10–15 Jul 2018. PMLR.
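A common way to increase fidelities gradually is successive halving, the budget-allocation scheme in the spirit of Hyperband and BOHB: evaluate many candidates cheaply, keep the better half, and double the budget. The sketch below uses a synthetic noisy score in place of real low-fidelity training.

```python
import random

def noisy_performance(arch_quality, budget, max_budget=64):
    """Placeholder low-fidelity estimate: the smaller the budget, the noisier the score."""
    noise = random.gauss(0.0, 0.3 * (1 - budget / max_budget))
    return arch_quality + noise

def successive_halving(candidates, min_budget=4, max_budget=64):
    """Evaluate all candidates at low fidelity, keep the better half, and double the fidelity."""
    budget = min_budget
    while len(candidates) > 1 and budget <= max_budget:
        scored = [(noisy_performance(c, budget, max_budget), c) for c in candidates]
        scored.sort(reverse=True)
        candidates = [c for _, c in scored[: max(1, len(scored) // 2)]]
        budget *= 2
    return candidates[0]

# Candidates are represented only by their true (but unknown to the search) quality in [0, 1].
pool = [random.random() for _ in range(32)]
print("selected quality:", successive_halving(pool))
```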

4.3 Learning Curve Extrapolation

  • Extrapolate the learning curve to terminate training runs of poorly-performing architectures early and thereby speed up the search (a toy extrapolation sketch is given after this list); related work:
    T. Domhan, J. T. Springenberg, and F. Hutter. Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), 2015.
  • Combine hyperparameters and partial learning curves to predict the final performance of a given architecture; related work:
    Kevin Swersky, Jasper Snoek, and Ryan Prescott Adams. Freeze-thaw bayesian optimization. 2014.
    A. Klein, S. Falkner, J. T. Springenberg, and F. Hutter. Learning curve prediction with Bayesian neural networks. In International Conference on Learning Representations, 2017a.
    Bowen Baker, Otkrist Gupta, Ramesh Raskar, and Nikhil Naik. Accelerating Neural Architecture Search using Performance Prediction. In NIPS Workshop on Meta-Learning, 2017b.
    Aditya Rawal and Risto Miikkulainen. From Nodes to Networks: Evolving Recurrent Neural Networks. In arXiv:1803.04439, March 2018.
  • Beyond extrapolating partial curves: train a surrogate model on architectures and cells that have already been evaluated, and use it to extrapolate the performance of larger architectures and of more cell types; related work:
    PNAS:Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, and Kevin Murphy. Progressive Neural Architecture Search. In European Conference on Computer Vision, 2018a.
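As a toy version of learning curve extrapolation, the sketch below fits the simple parametric model acc(t) = a - b/t to a partial learning curve and terminates training early if the predicted final accuracy falls below the best architecture found so far; the cited work uses considerably richer curve models.

```python
import numpy as np

def extrapolate_final_accuracy(partial_curve, final_epoch=100):
    """Fit acc(t) = a - b / t to a partial learning curve (a crude stand-in for the
    richer probabilistic curve models in the cited work) and predict acc(final_epoch)."""
    t = np.arange(1, len(partial_curve) + 1)
    X = np.column_stack([np.ones_like(t, dtype=float), -1.0 / t])
    (a, b), *_ = np.linalg.lstsq(X, np.asarray(partial_curve, dtype=float), rcond=None)
    return a - b / final_epoch

# Synthetic partial curve of a mediocre architecture, observed for 10 epochs.
observed = [0.40, 0.52, 0.58, 0.62, 0.645, 0.66, 0.672, 0.68, 0.686, 0.69]
predicted_final = extrapolate_final_accuracy(observed)
best_so_far = 0.93
if predicted_final < best_so_far:
    print(f"predicted final accuracy {predicted_final:.3f} < {best_so_far}: terminate training early")
```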

4.4 Weight Inheritance and Network Morphisms

Morphisms are an important mathematical concept: in mathematics, particularly in category theory, a morphism is a structure-preserving map from one mathematical structure to another one of the same type. The notion of morphism recurs in much of contemporary mathematics. In set theory, morphisms are functions; in linear algebra, linear transformations; in group theory, group homomorphisms; in topology, continuous functions; and so on.

The basic idea of network morphisms was introduced by Wei et al.:
Tao Wei, Changhu Wang, Yong Rui, and Chang Wen Chen. Network morphism. In International Conference on Machine Learning, 2016.

Based on network morphisms, a child network need not be trained from scratch; instead, it is initialized by inheriting the weights of its parent (a toy function-preserving morphism is sketched after the references below). NAS with weight inheritance reduces the computational cost to just a few GPU days:
Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. Simple And Efficient Architecture Search for Convolutional Neural Networks. In NIPS Workshop on Meta-Learning, 2017.
Han Cai, Tianyao Chen, Weinan Zhang, Yong Yu, and Jun Wang. Efficient architecture search by network transformation. In Association for the Advancement of Artificial Intelligence, 2018a.
Han Cai, Jiacheng Yang, Weinan Zhang, Song Han, and Yong Yu. Path-Level Network Transformation for Efficient Architecture Search. In International Conference on Machine Learning, June 2018b.
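A minimal NumPy sketch of a function-preserving network morphism, assuming a toy one-layer parent network: a new layer initialized to the identity is inserted, so the child computes exactly the same function as the parent and training can continue from the inherited weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny parent "network": one linear layer followed by ReLU.
W1 = rng.standard_normal((8, 4))

def parent(x):
    return np.maximum(W1 @ x, 0.0)

# Network morphism: deepen the network by inserting a new layer initialized to the
# identity, so the child represents exactly the same function as the parent.
W_new = np.eye(8)

def child(x):
    h = np.maximum(W1 @ x, 0.0)
    return np.maximum(W_new @ h, 0.0)   # identity init + ReLU on non-negative input = no-op

x = rng.standard_normal(4)
print(np.allclose(parent(x), child(x)))  # True: function preserved, training resumes from here
```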

However, strict network morphisms can only make networks larger, which eventually leads to overly complex architectures; the following work proposes approximate morphisms that also allow shrinking networks to mitigate this effect:
Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. Efficient multi-objective neural architecture search via lamarckian evolution. In International Conference on Learning Representations, 2019.

4.5 One-Shot Models and Weight Sharing

  • Abstract definition: one-shot architecture search treats all architectures as different subgraphs of a supergraph (the one-shot model) and shares weights between architectures that have edges of this supergraph in common.
    Figure 4: Left, the one-shot model, also called the supergraph; right, a sampled sub-network, also called a subgraph.

  • The working assumption of one-shot models: the one-shot model typically incurs a large bias as it underestimates the actual performance of the best architectures severely; nevertheless, it allows ranking architectures, which would be sufficient if the estimated performance correlates strongly with the actual performance.
    Shreyas Saxena and Jakob Verbeek. Convolutional neural fabrics. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 4053–4061. Curran Associates, Inc., 2016.
    Andrew Brock, Theodore Lim, James M. Ritchie, and Nick Weston. SMASH: one-shot model architecture search through hypernetworks. In NIPS Workshop on Meta-Learning, 2017.
    ENAS : Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, and Jeff Dean. Efficient neural architecture search via parameter sharing. In International Conference on Machine Learning, 2018.
    Gabriel Bender, Pieter-Jan Kindermans, Barret Zoph, Vijay Vasudevan, and Quoc Le. Understanding and simplifying one-shot architecture search. In International Conference on Machine Learning, 2018.
    Han Cai, Ligeng Zhu, and Song Han. ProxylessNAS: Direct neural architecture search on target task and hardware. In International Conference on Learning Representations, 2019.
    DARTS : Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. In International Conference on Learning Representations, 2019b.
    SNAS : Sirui Xie, Hehui Zheng, Chunxiao Liu, and Liang Lin. SNAS: stochastic neural architecture search. In International Conference on Learning Representations, 2019.

  • A shortcoming of the one-shot approach: whether the relative performance ranking obtained with shared weights is accurate remains an open question.
    Gabriel Bender, Pieter-Jan Kindermans, Barret Zoph, Vijay Vasudevan, and Quoc Le. Understanding and simplifying one-shot architecture search. In International Conference on Machine Learning, 2018.
    Christian Sciuto, Kaicheng Yu, Martin Jaggi, Claudiu Musat, and Mathieu Salzmann. Evaluating the search phase of neural architecture search. arXiv preprint, 2019.

  • Variants: different one-shot NAS methods differ mainly in how the one-shot model is trained:
    ENAS: The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile the model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss.
    DARTS: optimizes all weights of the one-shot model jointly with a continuous relaxation of the search space, obtained by placing a mixture of candidate operations on each edge of the one-shot model (a minimal sketch of such a mixed operation is given at the end of this section).
    SNAS: instead of real-valued architecture weights, optimizes a distribution over the candidate operations on each edge using a differentiable (Gumbel-softmax) relaxation.
    ProxylessNAS: "binarizes" the architecture weights so that only one path of the one-shot model is kept in memory at a time, which allows searching directly on the target task and hardware.

Gabriel Bender, Pieter-Jan Kindermans, Barret Zoph, Vijay Vasudevan, and Quoc Le. Understanding and simplifying one-shot architecture search. In International Conference on Machine Learning, 2018.
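For illustration, here is a minimal NumPy sketch of a DARTS-style mixed operation on a single edge; the candidate operations are toy stand-ins, and only the forward computation of the continuous relaxation is shown (the bilevel optimization of architecture and network weights is omitted).

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

# Toy candidate operations on one edge of the one-shot model (stand-ins for conv/pool/identity).
ops = [
    lambda x: x,                 # identity
    lambda x: np.roll(x, 1),     # stand-in for a convolution
    lambda x: np.maximum(x, 0),  # stand-in for a nonlinear op
]

alpha = np.zeros(len(ops))       # architecture parameters of this edge, learned by gradient descent

def mixed_op(x, alpha):
    """Continuous relaxation: a softmax-weighted sum over all candidate operations,
    which makes the architecture choice differentiable with respect to alpha."""
    weights = softmax(alpha)
    return sum(w * op(x) for w, op in zip(weights, ops))

x = np.array([1.0, -2.0, 3.0])
print(mixed_op(x, alpha))
# After optimizing alpha and the network weights, the discrete architecture is obtained
# by keeping the operation with the largest weight: ops[int(np.argmax(alpha))].
```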

5. Future Directions
