AutoML: Model Evaluation and Search Acceleration (Network Morphism, One-Shot, Parameter Sharing)

Latest recommended article published on 2024-04-22 14:42:08

xys430381_1

Categories: Deep Learning, Machine Learning. Tags: deep learning, automl, neural architecture search, model evaluation acceleration

Article link: https://blog.csdn.net/xys430381_1/article/details/104175524

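In NAS, the most time-consuming part is training the candidate models. The original NAS trained every candidate model from scratch, which made it extremely expensive. A natural idea is to reuse already-trained networks as much as possible. There are currently two main approaches: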

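The first approach uses network morphisms to grow the architecture, starting from a small network. A network morphism transforms a network while keeping the function it computes unchanged. The benefit is that the transformed network can reuse the previously trained weights instead of being trained from scratch. For example, the paper "Reinforcement Learning for Architecture Search by Network Transformation", from Shanghai Jiao Tong University and University College London, combines network morphisms with neural architecture search. The paper "Simple And Efficient Architecture Search for Convolutional Neural Networks" also uses network morphisms to share weights, but with hill climbing as the search strategy;
The second approach starts from a large network (a super-net) and prunes it down. One-Shot Architecture Search, for example, works by removing parts of one large, all-encompassing network.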

Network morphism: weight inheritance

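"Learning Four Different Techniques for AutoML System Design (3): Weight Inheritance and Morphism". Morphism is a method for generating network architectures; the evaluation-acceleration technique that goes with it is weight inheritance.

A morphism is a modification of a neural network. Take two networks: a is the original and b is the new one. If a and b are functionally equivalent even though their structures differ, then a and b are morphisms of each other. In short, networks with the same function but different structures are morphisms of one another.
Auto-Keras performs neural architecture search with morphisms and weight inheritance: Auto-Keras: An Efficient Neural Architecture Search System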
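
To make the "same function, different structure" idea concrete, here is a minimal sketch of one well-known function-preserving transformation: deepening a ReLU network by inserting an identity layer, in the spirit of Net2Net-style deepening. It is only an illustration under stated assumptions, not the procedure used by Auto-Keras or by the papers above. The helper names (`forward`, `deepen`, `random_layer`) are hypothetical, and the randomly initialized `parent` merely stands in for a trained network.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def linear(weights, bias, v):
    # weights is a list of rows, one row per output unit
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def forward(layers, x):
    """Run an MLP in which every layer except the last is followed by ReLU."""
    h = x
    for i, (weights, bias) in enumerate(layers):
        h = linear(weights, bias, h)
        if i < len(layers) - 1:
            h = relu(h)
    return h

def deepen(layers, index):
    """Insert an identity layer after hidden layer `index`.

    The output of a hidden layer is non-negative after ReLU, so
    ReLU(I @ h + 0) == h and the deeper network computes the same function.
    All existing weights are inherited unchanged.
    """
    if not 0 <= index < len(layers) - 1:
        raise ValueError("index must refer to a hidden layer")
    width = len(layers[index][0])  # number of output units of that layer
    identity = [[1.0 if r == c else 0.0 for c in range(width)] for r in range(width)]
    zero_bias = [0.0] * width
    return layers[:index + 1] + [(identity, zero_bias)] + layers[index + 1:]

def random_layer(rng, n_in, n_out):
    # Hypothetical stand-in for trained weights
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, bias

rng = random.Random(0)
parent = [random_layer(rng, 3, 4), random_layer(rng, 4, 2)]
child = deepen(parent, 0)  # one layer deeper, same function

x = [0.5, -1.0, 2.0]
parent_out = forward(parent, x)
child_out = forward(child, x)
```

Because `child` starts out computing exactly what `parent` computes, training can continue from the inherited weights rather than from a fresh initialization, which is where the evaluation speed-up comes from.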

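Parameter sharing based on a super-net

Super-net based parameter sharing can be divided into several routes:

ENAS-style parameter sharing: first train a super-net, then sample sub-networks from it. The sub-networks share the super-net's parameters, and every sub-network is then fine-tuned (this places low demands on the super-net);
One-shot route: first train a super-net, then sample sub-networks from it. The sub-networks share the super-net's parameters, and each one is evaluated by direct inference without any fine-tuning, which yields a performance ranking of the sub-networks. Once the best sub-network is selected, it is trained from scratch (this places high demands on the super-net, which must accurately predict the performance ranking of the sub-networks);
DARTS route: relax the search space so that the architecture distribution is continuously parameterized; during super-net training these architecture parameters are optimized jointly with the ordinary layer weights. (It is fast, but has two problems: (1) the parameters in the super-net are deeply coupled, which I take to mean that the paths within the super-net are deeply coupled; (2) joint optimization introduces further coupling between the architecture parameters and the super-net weights.)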
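
The continuous relaxation in the DARTS route can be sketched as follows: instead of picking one operation on an edge, the edge outputs a softmax-weighted sum of all candidate operations, and the softmax logits are the architecture parameters. This is a toy sketch only. The scalar candidate operations and the names `CANDIDATE_OPS`, `mixed_op` and `discretize` are assumptions made for illustration, and no gradient-based optimization is shown.

```python
import math

def softmax(alphas):
    m = max(alphas)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate operations acting on a scalar feature
CANDIDATE_OPS = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}

def mixed_op(x, alphas):
    """Relaxed edge: softmax(alphas)-weighted sum of all candidate ops."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))

def discretize(alphas):
    """After search, keep only the op with the largest architecture parameter."""
    names = list(CANDIDATE_OPS)
    return names[max(range(len(alphas)), key=lambda i: alphas[i])]

uniform_output = mixed_op(1.0, [0.0, 0.0, 0.0])  # all ops weighted equally
chosen = discretize([0.1, 2.0, -1.0])
```

Since every candidate operation contributes to every forward pass, the weights of all paths are trained together, which is one way to read the coupling problem described above.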

ENAS route

Efficient Neural Architecture Search via Parameter Sharing

In ENAS, a super-net is constructed that contains every possible architecture in the search space as a child model, so all architectures sampled from this super-net share the weights of their common graph nodes. This significantly reduces the computational cost of NAS by training and evaluating sampled child models directly on the shared weights. After the super-net has been trained, a subset of child models is chosen and either fine-tuned or trained from scratch to obtain the final model.
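
The weight-sharing mechanism itself can be illustrated with a small sketch: the super-net owns one parameter per (layer, operation) pair, and every child model only indexes into that shared store. This is not the ENAS implementation. The operation names, the single float per parameter, and the `train_step` stand-in for a gradient update are all assumptions made for illustration.

```python
import random

# Hypothetical search space: 3 layers, 3 candidate operations per layer
LAYERS = 3
OPS = ["conv3x3", "conv5x5", "sep_conv3x3"]

# Shared parameter store of the super-net: one (toy) weight per (layer, op)
shared_weights = {(layer, op): 0.0 for layer in range(LAYERS) for op in OPS}

def sample_child(rng):
    """A child model is one operation choice per layer."""
    return tuple(rng.choice(OPS) for _ in range(LAYERS))

def child_parameters(arch):
    """Keys of the shared weights that this child model uses."""
    return [(layer, op) for layer, op in enumerate(arch)]

def train_step(arch, delta=1.0):
    """Stand-in for a gradient update: only weights on the child's path change."""
    for key in child_parameters(arch):
        shared_weights[key] += delta

child_a = ("conv3x3", "conv5x5", "sep_conv3x3")
child_b = ("conv3x3", "sep_conv3x3", "sep_conv3x3")

train_step(child_a)  # training child_a ...
seen_by_b = [shared_weights[key] for key in child_parameters(child_b)]
# ... also changes the weights child_b sees in layers 0 and 2, but not layer 1

random_child = sample_child(random.Random(0))
```

Here `child_b` has never been trained, yet two of its three layers already carry updated weights, which is why sampled child models can be evaluated cheaply on the shared weights.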

One-shot route

Key papers:

Understanding and Simplifying One-Shot Architecture Search: blog walkthrough 1
Single path one-shot neural architecture search with uniform sampling: blog walkthrough 1
Official source code, source code (block search only)
FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search: blog walkthrough 1, blog walkthrough 2, partially open-sourced

Blog posts:
AutoDL Paper Walkthrough (7): One-Shot Based NAS
One-Shot Based NAS (Part 1)

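The basic idea of one-shot:

Prune down from one large, all-encompassing network.

The basic recipe for a one-shot model:

Step 0: design a reasonable search space.
Step 1: train a one-shot model (the super-graph) to obtain the super-graph's weights.
Step 2: generate many sub-graphs from the super-graph; each sub-graph is a network architecture. The sub-graphs inherit the super-graph's weights (with no further training) and are evaluated directly with the inherited weights, which gives a score for each sub-graph and a ranking. (A simple way to generate sub-graphs: the super-graph contains every edge, so deleting some of its edges yields many different sub-graphs.)
Step 3: select the best sub-graph, train it from scratch on the training set to obtain its weights, and evaluate it with those weights on the test set.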
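
The steps above can be sketched end to end on a toy problem. The sketch below assumes a super-net with a single choice block and three hypothetical candidate operations, trains it by sampling one path uniformly per step (the idea behind single-path one-shot training), ranks the sub-networks with inherited weights, and retrains the winner from scratch. The regression target, the operation set and all function names are made up for illustration. With only one choice block the candidates do not overlap, so this shows weight inheritance and ranking rather than sharing between overlapping sub-networks.

```python
import random

# Step 0: hypothetical search space, one choice block with three candidate ops
FEATURES = {
    "linear": lambda x: x,
    "square": lambda x: x * x,
    "abs": lambda x: abs(x),
}

def target(x):
    return 3.0 * x * x  # toy ground truth the networks should fit

def train(candidates, steps, lr=0.1, seed=0):
    """SGD on squared error; each step trains one uniformly sampled candidate."""
    rng = random.Random(seed)
    weights = {name: 0.0 for name in candidates}
    for _ in range(steps):
        name = rng.choice(candidates)      # uniform single-path sampling
        x = rng.uniform(-1.0, 1.0)
        phi = FEATURES[name](x)
        error = weights[name] * phi - target(x)
        weights[name] -= lr * 2.0 * error * phi
    return weights

def validation_loss(name, weights, xs):
    return sum((weights[name] * FEATURES[name](x) - target(x)) ** 2 for x in xs) / len(xs)

# Step 1: train the one-shot model (all candidates live in one weight store)
supernet_weights = train(list(FEATURES), steps=3000)

# Step 2: evaluate every sub-network with inherited weights, no further training
val_xs = [i / 50.0 - 1.0 for i in range(101)]
scores = {name: validation_loss(name, supernet_weights, val_xs) for name in FEATURES}
ranking = sorted(scores, key=scores.get)   # lowest validation loss first

# Step 3: retrain only the best sub-network from scratch
best = ranking[0]
final_weights = train([best], steps=1000, seed=1)
```

Only the ranking from step 2 is carried over to step 3; the inherited weights are discarded, which is why the quality of that ranking matters so much on the one-shot route.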

"Understanding and Simplifying One-Shot Architecture Search"
The proposed approach for one-shot architecture search consists of four steps:
(1) Design a search space that allows us to represent a wide variety of architectures using a single one-shot model.
(2) Train the one-shot model to make it predictive of the validation accuracies of the architectures.
(3) Evaluate candidate architectures on the validation set using the pre-trained one shot model.
(4) Re-train the most promising architectures from scratch and evaluate their performance on the test set.