Meta Learning Study Notes

						<p class="MsoNormal"><a name="_Toc10802404">Thanks to Prof. Hung-yi Lee for his video lectures: </a><a href="https://www.bilibili.com/video/av46561029/?p=32"><span lang="EN-US">https://www.bilibili.com/video/av46561029/?p=32</span></a></p>


						<h2 style="margin-top:7.5pt;margin-right:0cm;margin-bottom:7.5pt;margin-left:0cm;line-height:normal;background:white">Preface</h2>

						<p class="MsoNormal" align="left" style="text-align:left"><b>· Why study Meta Learning?</b></p>

						<p class="MsoNormal">Although my own research is on generative models, I have recently taken a strong interest in meta learning. The reason is that GANs are exceptionally data-hungry: in a sense, the quality of the dataset affects the final generation results no less, and sometimes more, than the design of the generative model itself. This happens because what GANs fundamentally learn is to fit the latent distribution of the data, and that distribution is largely determined by the breadth and quality of the training samples, so the training of GANs is highly sensitive to training-data quality.</p>

						<p class="MsoNormal">How can GANs be freed from this over-reliance on the dataset? A good way to probe it is to evaluate a model under few-shot learning. But few-shot learning is hard to achieve with a plain GAN architecture: when data is plentiful, abundant training samples let the discriminator locate the boundary between real and fake samples precisely, which in turn lets the generator fit an accurate generative distribution; when data is scarce, accurately fitting the latent distribution of the data becomes difficult.</p>

						<p class="MsoNormal">The recently emerging field of meta learning offers a promising way to address this. First, meta learning has already been shown to work well for few-shot learning; second, its methodology fits what GANs need. Why? The goal of meta learning is to learn how to learn: it can devise a learning strategy by exploring the regularities of a dataset, turning the traditional GAN workflow of design model -&gt; find data -&gt; validate model into find data -&gt; design model -&gt; validate model, which helps greatly with the data-mismatch and data-scarcity problems in GAN training.</p>

						<p class="MsoNormal">In summary, meta learning shifts the traditional mindset of GAN training from fitting data to a model toward fitting a model to the data, opening a path to few-shot learning for generative models. The main body of these meta learning notes follows.</p>


						<h2 style="margin-top:7.5pt;margin-right:0cm;margin-bottom:7.5pt;margin-left:0cm;line-height:normal;background:white">Chapter 1. Overview of Meta Learning</h2>

						<p class="MsoNormal"><span lang="EN-US">Meta Learning</span> is also known as "learning to learn". Whereas the goal of <span lang="EN-US">Machine Learning</span> is to make machines able to learn, the goal of <span lang="EN-US">Meta Learning</span> is to <b>make machines learn how to learn</b>.</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US"><img border="0" width="444" height="240" id="图片 1" src="files/Meta Learning学习笔记.files/image001.jpg"></span></p>

						<p class="MsoNormal">For example, suppose the machine has already been trained on 100 past tasks. We now hope that, drawing on the experience of those 100 tasks, it becomes a stronger learner, so that when task 101 arrives it can learn faster. Note that the machine learns faster not because of the "knowledge" it acquired in the old tasks, but because it has learned a better way of acquiring knowledge, and it applies that method to the new task to improve its learning efficiency.</p>

						<p class="MsoNormal">Taking the figure above as an example, suppose the first 99 tasks are all recognition tasks, such as speech recognition and image recognition. Once they are finished, we give the machine a new task that has nothing to do with the previous 99, say a text-classification task. The aim of <span lang="EN-US">Meta Learning</span> is that, through the 99 recognition tasks, the machine ends up learning the new text-classification task better; in other words, it has learned not merely how to solve certain specific tasks, but something about learning itself, improving its ability to learn when facing new tasks. <span lang="EN-US">Meta Learning</span> is thus an emerging research direction that <b>studies how to make machines learn to learn better</b>.</p>


						<h2 style="margin-top:7.5pt;margin-right:0cm;margin-bottom:7.5pt;margin-left:0cm;line-height:normal;background:white">Chapter 2. The Modeling Approach of Meta Learning</h2>

						<p class="MsoNormal">The conceptual description above may still feel abstract, so let us use a concrete model setup to explain what <span lang="EN-US">Meta Learning</span> actually does.</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US"><img border="0" width="296" height="220" id="图片 2" src="files/Meta Learning学习笔记.files/image002.jpg"></span></p>

						<p class="MsoNormal">The figure above shows what traditional machine learning does: a human designs a learning algorithm; the algorithm takes in a pile of training data, and lengthy training yields the algorithm's parameters, which fit a function <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="17" height="21" src="files/Meta Learning学习笔记.files/image003.png"></span>. The test data are then used to evaluate this <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="14" height="21" src="files/Meta Learning学习笔记.files/image004.png"></span>; if performance is adequate, the machine has learned the function <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="14" height="21" src="files/Meta Learning学习笔记.files/image004.png"></span> that implements that specific task. What <span lang="EN-US">Meta Learning</span> does differs in exactly one place: the step in which a human designs the learning method is replaced by having a machine design the learning method.</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US"><img border="0" width="341" height="247" id="图片 5" src="files/Meta Learning学习笔记.files/image005.jpg"></span></p>

						<p class="MsoNormal">As shown in the figure above, if the training data of ordinary machine learning is denoted <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:18.0pt"><img width="36" height="39" src="files/Meta Learning学习笔记.files/image006.png"></span>, then the training data of <span lang="EN-US">Meta Learning</span> becomes a collection of <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:18.0pt"><img width="36" height="39" src="files/Meta Learning学习笔记.files/image006.png"></span> and <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:18.0pt"><img width="14" height="39" src="files/Meta Learning学习笔记.files/image007.png"></span> pairs; what the machine must now solve for is no longer <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:18.0pt"><img width="14" height="39" src="files/Meta Learning学习笔记.files/image007.png"></span> but a new function <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:18.0pt"><img width="9" height="39" src="files/Meta Learning学习笔记.files/image008.png"></span>: <b>this</b> <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:18.0pt"><img width="9" height="39" src="files/Meta Learning学习笔记.files/image009.png"></span> <b>determines, given</b> <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:18.0pt"><img width="37" height="39" src="files/Meta Learning学习笔记.files/image010.png"></span><b>, what</b> <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:18.0pt"><img width="13" height="39" src="files/Meta Learning学习笔记.files/image011.png"></span> <b>should be</b>.
						</p>

						<p class="MsoNormal">In short, if machine learning is defined as <b>the ability to find a function <span lang="EN-US">f</span> from data</b>, as shown below:</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US"><img border="0" width="141" height="41" id="图片 6" src="files/Meta Learning学习笔记.files/image012.jpg"></span></p>

						<p class="MsoNormal">then <span lang="EN-US">Meta Learning</span> can be defined as <b>the ability to find, from data, a function <span lang="EN-US">F</span> that itself finds a function <span lang="EN-US">f</span></b>, as shown below:</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US"><img border="0" width="237" height="81" id="图片 7" src="files/Meta Learning学习笔记.files/image013.jpg"></span></p>

						<p class="MsoNormal">Now that the architectural idea behind <span lang="EN-US">Meta Learning</span> is clear, we can follow it step by step toward a solution.</p>
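						<p class="MsoNormal">The distinction between finding f and finding F can be sketched in toy Python. Everything below is illustrative, not any real algorithm: the "learner" merely predicts a constant, and a cross-task "prior" stands in for whatever experience F actually extracts from past tasks.</p>

```python
# Toy contrast (illustrative only; all names are made up):
# machine learning finds f: x -> y from one dataset,
# meta learning finds F: dataset -> f from many datasets.

def machine_learning(train_data):
    """A trivial 'learning algorithm' for ONE task: memorize the
    mean label and return a constant predictor f."""
    mean_y = sum(y for _, y in train_data) / len(train_data)
    def f(x):
        return mean_y
    return f

def meta_learning(many_tasks):
    """A trivial 'learning to learn' step: look at MANY tasks and
    return a learning algorithm F whose predictions are shrunk
    toward the mean label seen across all past tasks (a toy prior)."""
    all_labels = [y for task in many_tasks for _, y in task]
    prior = sum(all_labels) / len(all_labels)
    def F(train_data):
        mean_y = sum(y for _, y in train_data) / len(train_data)
        blended = 0.5 * prior + 0.5 * mean_y  # reuse cross-task experience
        def f(x):
            return blended
        return f
    return F
```

						<p class="MsoNormal">Here the <span lang="EN-US">F</span> produced by <span lang="EN-US">meta_learning</span> plays the role of the function-finding function: handed a new task's training data, it outputs an <span lang="EN-US">f</span>.</p>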

						<p class="MsoNormal">The first step is to prepare the training data for <span lang="EN-US">Meta Learning</span>. As noted above, this training data consists of pairs of training datasets and learning functions <span lang="EN-US">f</span>. The datasets are clearly easy to prepare, so the real question is how to prepare the pile of functions <span lang="EN-US">f</span>. Since <span lang="EN-US">f</span> is itself an abstract notion, we need to know what its concrete instances look like; let us take the conventional neural network as an example.</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US"><img border="0" width="387" height="182" id="图片 4" src="files/Meta Learning学习笔记.files/image015.jpg"></span></p>

						<p class="MsoNormal">The figure above is the familiar gradient-descent algorithm. Its flow can be summarized as: design a network architecture -&gt; initialize the parameters -&gt; read in a batch of training data -&gt; compute gradients -&gt; update the parameters based on the gradients -&gt; proceed to the next iteration -&gt; ... For each specific task, the complete algorithmic pipeline constitutes one <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="14" height="21" src="files/Meta Learning学习笔记.files/image004.png"></span>; that is (see the red boxes in the figure), every time we adopt a different network architecture, use a different parameter initialization, or decide on a different way of updating the parameters, we are defining a new <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="14" height="21" src="files/Meta Learning学习笔记.files/image004.png"></span>. So, for the gradient-descent algorithm, the final outcome of <span lang="EN-US">Meta Learning</span> is that, given some training data, the machine can find the best <span lang="EN-US">SGD</span> training pipeline (<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="27" height="21" src="files/Meta Learning学习笔记.files/image016.png"></span>) for that data. Accordingly, the <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="14" height="21" src="files/Meta Learning学习笔记.files/image004.png"></span> we discussed preparing for <span lang="EN-US">Meta Learning</span> is in fact composed of different training pipelines covering as many and as varied combinations of these choices as possible.</p>
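						<p class="MsoNormal">As a sketch of the idea that each combination of choices defines a different f, the toy factory below (hypothetical names; the "architecture" is fixed to 1-D linear regression for concreteness) returns a different training pipeline for each choice of initialization scale and learning rate:</p>

```python
import numpy as np

def make_learner(init_scale, lr):
    """Each (architecture, initialization, update rule) choice defines a
    different training pipeline, i.e. a different f in the text's sense.
    Here only the initialization scale and learning rate vary."""
    def f(xs, ys, steps=100):
        rng = np.random.default_rng(0)
        w = rng.normal(scale=init_scale)              # initialization choice
        for _ in range(steps):
            grad = 2.0 * np.mean((w * xs - ys) * xs)  # gradient of MSE in w
            w -= lr * grad                            # update-rule choice
        return w
    return f
```

						<p class="MsoNormal">In this framing, Meta Learning searches over such factories for the pipeline whose trained result does best on held-out data.</p>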

						<p class="MsoNormal">The second step is to design a metric for how good <span lang="EN-US">F</span> is. Concretely, <span lang="EN-US">F</span> can choose among various training pipelines <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="14" height="21" src="files/Meta Learning学习笔记.files/image004.png"></span>; how to evaluate the pipeline <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="14" height="21" src="files/Meta Learning学习笔记.files/image004.png"></span> that <span lang="EN-US">F</span> has currently found, and how to improve <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="14" height="21" src="files/Meta Learning学习笔记.files/image004.png"></span>, is one of the more important parts of <span lang="EN-US">Meta Learning</span>.</p>

						<p class="MsoNormal">Let us first define the loss function of the function <span lang="EN-US">F</span> in <span lang="EN-US">Meta Learning</span>.</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US"><img border="0" width="398" height="202" id="图片 3" src="files/Meta Learning学习笔记.files/image017.jpg"></span></p>

						<p class="MsoNormal">As shown above, in <span lang="EN-US">Task1</span> the training algorithm learned by <span lang="EN-US">F</span> is <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="15" height="21" src="files/Meta Learning学习笔记.files/image018.png"></span>, and the result of <span lang="EN-US">Task1</span>'s test set on <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="15" height="21" src="files/Meta Learning学习笔记.files/image018.png"></span> is recorded as the loss <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="11" height="21" src="files/Meta Learning学习笔记.files/image019.png"></span> on <span lang="EN-US">Task1</span> (note that this test result need not be the classification loss of a classification task; it could also be defined as, say, the rate at which the loss decreases, depending on what algorithmic behavior we want <span lang="EN-US">F</span> to learn). Likewise, the result of <span lang="EN-US">Task2</span>'s test set on <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="15" height="21" src="files/Meta Learning学习笔记.files/image020.png"></span> is recorded as the loss <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="11" height="21" src="files/Meta Learning学习笔记.files/image021.png"></span> on <span lang="EN-US">Task2</span>. The loss function of <span lang="EN-US">F</span> is then defined as the sum of the losses over all <span lang="EN-US">Task</span>s:</p>

						<p class="MsoNormal" style="text-align:center"><span lang="EN-US" style="font-size:10.5pt;font-family:等线;"><img width="84" height="62" src="files/Meta Learning学习笔记.files/image022.png"></span></p>
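						<p class="MsoNormal">The summed loss above can be written directly in code. This is a toy rendering with made-up data structures: each task is a (train split, test split) pair, and the per-task test loss is taken to be mean squared error.</p>

```python
def meta_loss(F, tasks):
    """L(F) = sum over tasks of the test loss of the f that F produced
    from that task's training split (toy version, mean squared error)."""
    total = 0.0
    for train_data, test_data in tasks:
        f = F(train_data)  # F turns training data into a trained function f
        total += sum((f(x) - y) ** 2 for x, y in test_data) / len(test_data)
    return total
```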

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>损失函数定义完毕后,我们该如何降低<span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">F</span>的损失呢?由于<span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">Meta Learning</span>的求解是非常复杂的过程,我们先以<span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">MAML</span>算法为例讲解一个<span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">Meta Learning</span>的简单情况的求解。</p>

						<span lang="EN-US" style="font-size:10.5pt;font-family:&quot;Cambria Math&quot;,serif"><br clear="all" style="page-break-before:always">
						<p class="MsoNormal" align="left" style="text-align:left"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;</span></p>

						<h2 style="margin-top:7.5pt;margin-right:0cm;margin-bottom:7.5pt;margin-left:0cm;line-height:normal;background:white">Chapter 3. A Simple Example of Meta Learning: MAML</h2>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; MAML</span>算法想要解决的问题是,对于<span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">F</span>在每一个任务中学习到的<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="8" height="21" src="files/Meta Learning学习笔记.files/image023.png"></span>,<b>规定</b><span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="8" height="21" src="files/Meta Learning学习笔记.files/image024.png"></span><b>只负责决定参数的赋值方式</b>,而不设计模型的架构,也不改变参数更新的方式。也就是说,<span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">MAML</span>中的<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="11" height="21" src="files/Meta Learning学习笔记.files/image025.png"></span>的网络结构和更新方式都是提前固定的,<span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">MAML</span>要解决的是如何针对不同任务为网络赋不同的初始值。</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US"><img border="0" width="380" height="183" id="图片 8" src="files/Meta Learning学习笔记.files/image026.jpg"></span></p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>如上图所示,<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="8" height="21" src="files/Meta Learning学习笔记.files/image023.png"></span>只需考虑参数的初始化方式。假设当前参数的初始化为<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="10" height="21" src="files/Meta Learning学习笔记.files/image027.png"></span>,将<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="10" height="21" src="files/Meta Learning学习笔记.files/image027.png"></span>应用于所有的训练任务中,并将所有任务最终训练结束后的参数记作<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="16" height="21" src="files/Meta Learning学习笔记.files/image028.png"></span>(<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="8" height="21" src="files/Meta Learning学习笔记.files/image029.png"></span>表示第<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="8" height="21" src="files/Meta Learning学习笔记.files/image029.png"></span>个任务),然后该参数下的测试损失记作<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="40" height="21" src="files/Meta Learning学习笔记.files/image030.png"></span>。于是,当前的初始化参数<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="10" height="21" src="files/Meta Learning学习笔记.files/image027.png"></span>的损失函数就表示为:</p>

						<p style="margin:0cm;text-align:center;"><span lang="EN-US" style="font-size:10.5pt;font-family:等线"><img width="113" height="81" src="files/Meta Learning学习笔记.files/image031.png"></span></p>

						<p style="margin:0cm;margin-bottom:.0001pt"><span lang="EN-US" style="font-size:10.5pt;font-family:“Cambria Math”,serif">Next, we need to solve for the optimal initialization, such that:</span></p>

						<p style="margin:0cm;margin-bottom:.0001pt;text-align:center"><span lang="EN-US"><img width="115" height="60" src="files/Meta Learning学习笔记.files/image033.png"></span></p>

						<p style="margin:0cm;margin-bottom:.0001pt"><span lang="EN-US" style="font-size:10.5pt;font-family:“Cambria Math”,serif">Feeding this loss into the gradient-descent algorithm, we get:</span></p>

						<p style="margin:0cm;margin-bottom:.0001pt;text-align:center;"><span lang="EN-US" style="font-size:10.5pt;font-family:等线"><img width="109" height="39" src="files/Meta Learning学习笔记.files/image034.png"></span></p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; MAML</span>为了更快地计算出上式的结果,做了两处计算上的调整。</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US"><img border="0" width="256" height="189" id="图片 9" src="files/Meta Learning学习笔记.files/image035.jpg"></span></p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>首先,如上图所示,由于在每一个训练任务上更新多次参数会占用太多时间,因此<span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">MAML</span>选择只更新一次的参数结果作为该任务下的最终参数<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="16" height="21" src="files/Meta Learning学习笔记.files/image028.png"></span>,即只走一次梯度下降:</p>

						<p class="MsoNormal" style="text-align:center;"><span lang="EN-US" style="font-size:10.5pt;font-family:等线"><img width="110" height="42" src="files/Meta Learning学习笔记.files/image036.png"></span></p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>能这样做的原因是,如果模型只需训练一次就能达到好的效果,那么这样的训练初始参数基本上能符合好的参数,不过,在测试资料上依然需要训练多次以检验该初始参数的真正效果。</p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; MAML</span>的第二处调整是,对于<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.5pt"><img width="45" height="21" src="files/Meta Learning学习笔记.files/image037.png"></span>的实际计算做了简化。由于</p>

						<p style="margin:0cm;margin-bottom:.0001pt;text-align:center;"><span lang="EN-US" style="font-size:10.5pt;font-family:等线"><img width="246" height="81" src="files/Meta Learning学习笔记.files/image038.png"></span></p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>不妨观察一下<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="56" height="21" src="files/Meta Learning学习笔记.files/image039.png"></span>的展开式:</p>

						<p class="MsoNormal" align="left" style="text-align:center"><span lang="EN-US" style="font-size:10.5pt;font-family:等线"><img width="135" height="104" src="files/Meta Learning学习笔记.files/image040.png"></span></p>

						<p class="MsoNormal" align="left" style="text-align:left">The specific relation between <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="26" height="21" src="files/Meta Learning学习笔记.files/image041.png"></span> and <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="14" height="21" src="files/Meta Learning学习笔记.files/image042.png"></span> is:</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US"><img border="0" width="166" height="98" id="图片 25" src="files/Meta Learning学习笔记.files/image043.jpg"></span></p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>由此可以得到<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="56" height="21" src="files/Meta Learning学习笔记.files/image039.png"></span>的具体计算式为:</p>

						<p class="MsoNormal" align="left" style="text-align:center"><span lang="EN-US" style="font-size:10.5pt;font-family:等线"><img width="131" height="62" src="files/Meta Learning学习笔记.files/image044.png"></span></p>

						<p class="MsoNormal" align="left" style="text-align:left">Clearly, the first factor <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:12.0pt"><img width="26" height="42" src="files/Meta Learning学习笔记.files/image045.png"></span> is easy to compute, because it depends directly on how the loss function <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="5" height="21" src="files/Meta Learning学习笔记.files/image046.png"></span> is defined; we now solve for the second factor <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:11.5pt"><img width="17" height="42" src="files/Meta Learning学习笔记.files/image047.png"></span>.</p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>将<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.5pt"><img width="12" height="21" src="files/Meta Learning学习笔记.files/image048.png"></span>带回到<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:11.5pt"><img width="110" height="42" src="files/Meta Learning学习笔记.files/image036.png"></span>中,得到:</p>

						<p class="MsoNormal" style="text-align:center"><span lang="EN-US" style="font-size:10.5pt;font-family:等线"><img width="98" height="42" src="files/Meta Learning学习笔记.files/image049.png"></span></p>

						<p class="MsoNormal" align="left" style="text-align:left">Note that when <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="28" height="21" src="files/Meta Learning学习笔记.files/image050.png"></span>, we have:</p>

						<p class="MsoNormal" align="left" style="text-align:center"><span lang="EN-US" style="font-size:10.5pt;font-family:等线"><img width="107" height="42" src="files/Meta Learning学习笔记.files/image051.png"></span></p>

						<p class="MsoNormal" align="left" style="text-align:left">and when <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="28" height="21" src="files/Meta Learning学习笔记.files/image052.png"></span>, we have:</p>

						<p class="MsoNormal" align="left" style="text-align:center"><span lang="EN-US" style="font-size:10.5pt;font-family:等线"><img width="114" height="42" src="files/Meta Learning学习笔记.files/image053.png"></span></p>

						<p class="MsoNormal" align="left" style="text-align:left">The problem now is that the expression above contains a second derivative and is therefore expensive to compute. <span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">MAML</span> proposes simply discarding the second-derivative term (<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:13.0pt"><img width="36" height="42" src="files/Meta Learning学习笔记.files/image054.png"></span>); keeping only the first-derivative term is known as the <span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">first-order approximation</span>.</p>

						<p class="MsoNormal" align="left" style="text-align:left">After discarding the second-derivative term, the result becomes:</p>

						<p class="MsoNormal" align="left" style="text-align:center"><span lang="EN-US" style="font-size:10.5pt;font-family:等线"><img width="125" height="42" src="files/Meta Learning学习笔记.files/image055.png"></span></p>

						<p class="MsoNormal" align="left" style="text-align:left">Substituting this back into the expression for <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="56" height="21" src="files/Meta Learning学习笔记.files/image039.png"></span> gives:</p>

						<p class="MsoNormal" align="left" style="text-align:center"><span lang="EN-US" style="font-size:10.5pt;font-family:等线"><img width="182" height="62" src="files/Meta Learning学习笔记.files/image056.png"></span></p>

						<p class="MsoNormal" align="left" style="text-align:left">so the original expansion of <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="56" height="21" src="files/Meta Learning学习笔记.files/image039.png"></span> becomes:</p>

						<p style="margin:0cm;margin-bottom:.0001pt;text-align:center;"><span lang="EN-US" style="font-size:10.5pt;font-family:等线"><img width="285" height="123" src="files/Meta Learning学习笔记.files/image057.png"></span></p>
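						<p class="MsoNormal">The effect of dropping the second-derivative term can be checked numerically on a toy quadratic loss (all numbers below are made up for illustration):</p>

```python
# Toy check of the first-order approximation on l(theta) = (theta - 3)**2.
# With one inner step theta_hat = phi - eps * l'(phi), the exact chain rule
# gives dl(theta_hat)/dphi = (1 - eps * l''(phi)) * l'(theta_hat);
# the first-order approximation keeps only l'(theta_hat).

eps = 0.01
phi = 0.0
lp = lambda t: 2.0 * (t - 3.0)  # l'(theta)
lpp = 2.0                       # l''(theta), constant for a quadratic

theta_hat = phi - eps * lp(phi)            # one inner gradient step
exact = (1.0 - eps * lpp) * lp(theta_hat)  # full (second-order) gradient
approx = lp(theta_hat)                     # first-order MAML gradient
```

						<p class="MsoNormal">For small learning rates the two differ only by the factor containing the second derivative, which is why the cheaper approximation tends to work well in practice.</p>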

						<p class="MsoNormal" align="left" style="text-align:left">In summary, the simplified <span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">MAML</span> computation is much easier to carry out, which concludes the theoretical analysis of <span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">MAML</span>. Let us now use a concrete example to understand how the computation above is executed.</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US"><img border="0" width="24" height="40" id="图片 26" src="files/Meta Learning学习笔记.files/image058.jpg"></span></p>

						<p class="MsoNormal" align="left" style="text-align:left">To begin, there is an initialized parameter <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="17" height="21" src="files/Meta Learning学习笔记.files/image059.png"></span>.</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US"><img border="0" width="140" height="95" id="图片 27" src="files/Meta Learning学习笔记.files/image060.jpg"></span></p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>然后,在<span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">Task m</span>上训练一次<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="17" height="21" src="files/Meta Learning学习笔记.files/image059.png"></span>得到最终参数<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="19" height="21" src="files/Meta Learning学习笔记.files/image061.png"></span>,接着计算<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="19" height="21" src="files/Meta Learning学习笔记.files/image061.png"></span>的梯度(即上图中第二根绿色箭头),将这一梯度乘以学习率赋给<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="17" height="21" src="files/Meta Learning学习笔记.files/image059.png"></span>,得到<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="17" height="21" src="files/Meta Learning学习笔记.files/image059.png"></span>的第一次更新结果<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="16" height="21" src="files/Meta Learning学习笔记.files/image062.png"></span>。<span style="font-family:&quot;Cambria Math&quot;,serif"> </span></p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US"><img border="0" width="154" height="168" id="图片 28" src="files/Meta Learning学习笔记.files/image063.jpg"></span></p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>接下来,同样地,在<span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">Task n</span>上训练一次<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="16" height="21" src="files/Meta Learning学习笔记.files/image062.png"></span>得到最终参数<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="16" height="21" src="files/Meta Learning学习笔记.files/image028.png"></span>,接着计算<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="16" height="21" src="files/Meta Learning学习笔记.files/image028.png"></span>的梯度(即上图中第二根黄色箭头),将这一梯度乘以学习率赋给<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="16" height="21" src="files/Meta Learning学习笔记.files/image062.png"></span>,得到第二次训练的更新结果<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="17" height="21" src="files/Meta Learning学习笔记.files/image064.png"></span>。</p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>这样不断循环往复,直至在所有的训练<span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">Task</span>上完成训练,就找到了最终的初始化参数<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="17" height="21" src="files/Meta Learning学习笔记.files/image065.png"></span>。</p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>上述就是<span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">MAML</span>算法的完整介绍,其实参数初始化问题还有很多其他的解法,其中一个有趣的想法是用<span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">LSTM</span>来训练<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="10" height="21" src="files/Meta Learning学习笔记.files/image027.png"></span>,因为梯度下降算法本质上可以看作序列模型(如下图所示),于是通过改<span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">LSTM</span>的架构也能实现<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="10" height="21" src="files/Meta Learning学习笔记.files/image027.png"></span>的训练:</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US"><img border="0" width="307" height="145" id="图片 34" src="files/Meta Learning学习笔记.files/image066.jpg"></span></p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>具体的实现过程就不在此细述了,感兴趣的读者可以去观看李宏毅老师的视频教程:</p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span><span lang="EN-US"><a href="https://www.bilibili.com/video/av46561029/?p=41"><span style="font-family:&quot;Cambria Math&quot;,serif">https://www.bilibili.com/video/av46561029/?p=41</span></a></span></p>

						<span lang="EN-US" style="font-size:10.5pt;font-family:&quot;Cambria Math&quot;,serif"><br clear="all" style="page-break-before:always">
						<p class="MsoNormal" align="left" style="text-align:left"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;</span></p>

						<h2 style="margin-top:7.5pt;margin-right:0cm;margin-bottom:7.5pt;margin-left:

0cm;line-height:normal;background:white">Chapter 4: Solving the Harder Problems in Meta Learning

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>The MAML algorithm introduced above solves only one of the simpler problems in Meta Learning: how to assign a model its initialization parameters. Meta Learning contains many harder problems, such as how to design a model's architecture and how to construct its parameter-update rule. Because solving for the <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="8" height="21" src="files/Meta Learning学习笔记.files/image023.png"></span> involved in these problems is no longer differentiable, gradient descent cannot be applied, and we have to turn to Reinforcement Learning instead.</p>

						<h3><span lang="EN-US" style="line-height:173%;font-family:&quot;Georgia&quot;,serif;

color:#333333"> 4.1. How Meta Learning Designs a Model Architecture

						<p class="MsoNormal"><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>Consider the first problem: how do we design a model that can design models? Suppose our task is to learn to design <span lang="EN-US">CNN</span> models; we can introduce an <span lang="EN-US">RNN</span> to generate the <span lang="EN-US">CNN</span> design.</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif"><img border="0" width="456" height="164" id="图片 10" src="files/Meta Learning学习笔记.files/image067.jpg"></span></p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>As shown above, the horizontal width of the RNN is determined by the number of layers in the network being designed, and the output of each cell corresponds to one concrete configuration choice. For example, when designing a CNN, the RNN outputs a sequence of values specifying the parameters of each convolution layer: the number of filters, kernel height, kernel width, stride height, stride width, and so on. Note that once a parameter is decided, it becomes the input for deciding the next parameter, so the design of the first layer is critical: it influences every decision that follows.</p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>After a network has been designed, we need to evaluate how good it is. We first build the network from the generated parameters, then train it on many training Tasks and compute its accuracy after training. Note that since gradient descent is unavailable here, this accuracy can only be fed back to the designer network as a reward, and reinforcement learning over many such rewards teaches the designer to produce better networks.</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif"><img border="0" width="474" height="308" id="图片 12" src="files/Meta Learning学习笔记.files/image068.jpg"></span></p>
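						<p class="MsoNormal">The sample → train → reward loop above can be sketched in a few lines. Below is a minimal REINFORCE-style sketch in Python; the search space, the one-softmax-per-decision controller (a stand-in for the RNN cells), and the dummy reward that replaces actually training a child CNN are all illustrative assumptions, not the original paper's setup:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical search space: the controller picks one option per decision.
SEARCH_SPACE = {
    "filter_count":  [16, 32, 64],
    "kernel_height": [3, 5, 7],
    "kernel_width":  [3, 5, 7],
}

# Controller: one softmax per decision (a stand-in for the RNN cells).
logits = {k: np.zeros(len(v)) for k, v in SEARCH_SPACE.items()}

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sample_architecture():
    """Sample one architecture, remembering the chosen index per decision."""
    return {name: rng.choice(len(opts), p=softmax(logits[name]))
            for name, opts in SEARCH_SPACE.items()}

def dummy_reward(choices):
    """Placeholder for 'train the child network and return its accuracy'."""
    # Pretend 32 filters with 5x5 kernels is the best design.
    target = {"filter_count": 1, "kernel_height": 1, "kernel_width": 1}
    return sum(choices[k] == v for k, v in target.items()) / 3.0

# REINFORCE: push up the log-probability of choices that earned high reward.
lr, baseline = 0.5, 0.0
for step in range(300):
    choices = sample_architecture()
    r = dummy_reward(choices)
    baseline = 0.9 * baseline + 0.1 * r          # moving-average baseline
    for name, idx in choices.items():
        p = softmax(logits[name])
        grad = -p
        grad[idx] += 1.0                          # d log p(idx) / d logits
        logits[name] += lr * (r - baseline) * grad

best = {k: SEARCH_SPACE[k][int(np.argmax(logits[k]))] for k in SEARCH_SPACE}
print(best)
```

<p class="MsoNormal">In a real run, `dummy_reward` is replaced by training the sampled CNN to convergence, which is exactly why the scheme is so expensive, as discussed next.</p>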

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>A clear drawback of this training scheme is that collecting enough rewards takes a very long time: the method in one Google paper on designing LSTM architectures required 450 GPUs running for 3-4 days. The time to obtain each reward can be reduced with weight sharing: whenever a previously sampled module design is used again, the parameters trained the last time that module was used are copied in as its initialization, which greatly speeds up the convergence of the new network. Experiments showed that this reduced the training time of Google's LSTM-designing network to only 16 hours on a single 1080Ti.</p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>Below is an LSTM architecture designed by meta learning (right figure below)<span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">:</span></p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif"><img border="0" width="485" height="224" id="图片 20" src="files/Meta Learning学习笔记.files/image069.jpg"></span></p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>As you can see, the machine-designed LSTM on the right is far more complicated than the human-designed LSTM on the left. One interesting detail is that the machine-designed LSTM never uses the sin function at all, which matches the way humans design as well. The final experiments showed that the LSTM on the right trains slightly better than the LSTM on the left.</p>

						<h3><span lang="EN-US" style="font-size:14.0pt;line-height:173%;font-family:&quot;Georgia&quot;,serif;

color:#333333"> 4.2. How Meta Learning Designs a Parameter-Update Rule

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>This section describes a network that designs new parameter-update rules. In fact, every parameter-update rule can be viewed as a five-tuple.</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US"><img border="0" width="396" height="174" id="图片 13" src="files/Meta Learning学习笔记.files/image070.png"></span></p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>The figure above lists the three best-known update algorithms: SGD, RMSProp, and Adam. In each 5-tuple, the lower-left blue region is the first operand (denoted <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="21" height="21" src="files/Meta Learning学习笔记.files/image071.png"></span>, usually the gradient), the lower-right blue region is the second operand (denoted <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="22" height="21" src="files/Meta Learning学习笔记.files/image072.png"></span>, usually the learning rate), the left yellow region is the unary operation on <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="21" height="21" src="files/Meta Learning学习笔记.files/image071.png"></span> (<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="66" height="21" src="files/Meta Learning学习笔记.files/image073.png"></span>), the right yellow region is the unary operation on <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="22" height="21" src="files/Meta Learning学习笔记.files/image072.png"></span> (<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="66" height="21" src="files/Meta Learning学习笔记.files/image074.png"></span>), and the purple region at the top is the binary operation applied to the two unary results (<span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="63" height="21" src="files/Meta Learning学习笔记.files/image075.png"></span>).</p>
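						<p class="MsoNormal">The five-tuple view can be made concrete in code. Below is a minimal sketch, assuming a `state` dict that carries the gradient `g`, the learning rate `lr`, and a running average `v` of squared gradients (all names illustrative); each rule is just `binary(u1(op1), u2(op2))`:</p>

```python
import math

# An update rule as a 5-tuple: (operand1, operand2, unary1, unary2, binary).
# Each element is a function; the update value is binary(u1(op1), u2(op2)).
def make_update_rule(op1, op2, u1, u2, binary):
    def update(state):
        return binary(u1(op1(state)), u2(op2(state)))
    return update

identity = lambda x: x

# SGD: update value = g * lr
sgd = make_update_rule(
    op1=lambda s: s["g"], op2=lambda s: s["lr"],
    u1=identity, u2=identity,
    binary=lambda a, b: a * b,
)

# RMSProp-style: update value = g / (sqrt(v) + eps), where v is the running
# average of g^2 (assumed to be tracked elsewhere and stored in `state`).
rmsprop = make_update_rule(
    op1=lambda s: s["g"], op2=lambda s: s["v"],
    u1=identity, u2=lambda v: math.sqrt(v) + 1e-8,
    binary=lambda a, b: a / b,
)

state = {"g": 0.5, "lr": 0.1, "v": 0.25}
print(sgd(state))      # 0.5 * 0.1 = 0.05
print(rmsprop(state))  # about 1.0, i.e. 0.5 / (0.5 + eps)
```

<p class="MsoNormal">The learning-rate scaling that finally multiplies the RMSProp update value is applied outside the tuple in this sketch.</p>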

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>Substituting the expressions from the figure into the three algorithms gives the SGD update value <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="58" height="21" src="files/Meta Learning学习笔记.files/image076.png"></span>, the RMSProp update value <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="83" height="21" src="files/Meta Learning学习笔记.files/image077.png"></span>, and the Adam update value <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="95" height="21" src="files/Meta Learning学习笔记.files/image078.png"></span>. So if we want to design a new parameter-update algorithm, we can use an RNN like the following:</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif"><img border="0" width="315" height="156" id="图片 15" src="files/Meta Learning学习笔记.files/image079.jpg"></span></p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>This network is easy to understand: each cell outputs the choice for one component, and the components assembled together form the newly designed update algorithm. The available options for each component are:</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif"><img border="0" width="280" height="152" id="图片 16" src="files/Meta Learning学习笔记.files/image080.jpg"></span></p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>Below is an update rule designed by meta learning: PowerSign.<span style="font-family:&quot;Cambria Math&quot;,serif"> </span></p>

						<p class="MsoNormal" style="text-align:center"><span lang="EN-US" style="font-size:10.5pt;font-family:等线"><img width="144" height="21" src="files/Meta Learning学习笔记.files/image081.png"></span></p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>In the formula above, <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="9" height="21" src="files/Meta Learning学习笔记.files/image082.png"></span> denotes <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="56" height="21" src="files/Meta Learning学习笔记.files/image083.png"></span>, <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="12" height="21" src="files/Meta Learning学习笔记.files/image084.png"></span> denotes <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="69" height="21" src="files/Meta Learning学习笔记.files/image085.png"></span>, and <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="28" height="21" src="files/Meta Learning学习笔记.files/image086.png"></span> depends on whether <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="9" height="21" src="files/Meta Learning学习笔记.files/image082.png"></span> and <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="12" height="21" src="files/Meta Learning学习笔记.files/image084.png"></span> point in the same direction: it is larger when they agree and smaller when they disagree. Now let's look at PowerSign's experimental results:</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif"><img border="0" width="326" height="386" id="图片 21" src="files/Meta Learning学习笔记.files/image087.jpg"></span></p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>As the plot shows, on this crescent-shaped loss landscape SGD, Adam, and RMSProp all fail to reach the goal, Momentum reaches it but very slowly, and only PowerSign reaches it quickly. This shows that update rules designed by meta learning can hold an advantage in certain situations.</p>
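						<p class="MsoNormal">The PowerSign rule discussed above can be written out directly. Below is a minimal sketch, where `m` is assumed to be a moving average of past gradients maintained by the surrounding training loop, and the learning rate and the base `alpha` (commonly taken to be e) are illustrative defaults:</p>

```python
import math

def powersign_update(g, m, lr=0.01, alpha=math.e):
    """One PowerSign step: the gradient is scaled up when the gradient g
    and its moving average m agree in sign, and scaled down otherwise."""
    return lr * (alpha ** (math.copysign(1, g) * math.copysign(1, m))) * g

# Same direction: the step is amplified by a factor of alpha.
print(powersign_update(0.5, 0.3))   # 0.01 * e * 0.5
# Opposite direction: the step is shrunk by a factor of alpha.
print(powersign_update(0.5, -0.3))  # 0.01 * 0.5 / e
```

<p class="MsoNormal">This matches the intuition above: consistent gradient directions earn bigger steps, conflicting ones earn smaller steps.</p>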

						<h3><span lang="EN-US" style="font-size:14.0pt;line-height:173%;font-family:&quot;Georgia&quot;,serif;

color:#333333"> 4.3. How Meta Learning Designs an Activation Function

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>Activation functions follow a similar design template (an activation function operates only on x, generally with no more than two operations):</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif"><img border="0" width="272" height="143" id="图片 17" src="files/Meta Learning学习笔记.files/image088.jpg"></span></p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>From this template we can build the corresponding designer network:</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif"><img border="0" width="350" height="202" id="图片 18" src="files/Meta Learning学习笔记.files/image089.jpg"></span></p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>The available options for each component are:</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif"><img border="0" width="471" height="95" id="图片 19" src="files/Meta Learning学习笔记.files/image090.jpg"></span></p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>Below is an activation function designed by meta learning: Swish.</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif"><img border="0" width="359" height="242" id="图片 22" src="files/Meta Learning学习笔记.files/image091.jpg"></span></p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>As the figure shows, Swish performs best when β<span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">=1.0</span>. Interestingly, Swish (β<span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">=1.0</span>) looks like a blend of ReLU and Leaky ReLU: moving from 0 in the negative direction, Swish first dips below zero and then returns toward 0. Now let's look at Swish's benchmark results:</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif"><img border="0" width="519" height="83" id="图片 23" src="files/Meta Learning学习笔记.files/image092.jpg"></span></p>

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>As you can see, Swish compares favorably against every baseline; in particular, against ReLU it wins on all 9 task sets.</p>
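						<p class="MsoNormal">Swish itself is simple to state in code: swish(x) = x · sigmoid(βx). Below is a minimal sketch, with β defaulting to the best-performing value of 1.0 mentioned above:</p>

```python
import math

def swish(x, beta=1.0):
    """Swish activation: x * sigmoid(beta * x)."""
    return x * (1.0 / (1.0 + math.exp(-beta * x)))

# Large positive inputs pass through almost unchanged, like ReLU.
print(swish(5.0))    # close to 5.0
# Small negative inputs dip below zero ...
print(swish(-1.0))   # about -0.27
# ... and large negative inputs return toward 0.
print(swish(-10.0))  # close to 0
```

<p class="MsoNormal">These three evaluations reproduce the shape described above: a ReLU-like positive branch and a negative branch that dips and then flattens back toward 0.</p>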

						<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>This concludes the discussion of meta learning methods for designing complex networks. These design modules can also be combined, yielding more complete designer networks that cover model architecture, activation functions, and update rules all at once.</p>

						<span lang="EN-US" style="font-size:10.5pt;font-family:&quot;Cambria Math&quot;,serif"><br clear="all" style="page-break-before:always">
						<p class="MsoNormal" align="left" style="text-align:left"><span lang="EN-US" style="font-family:&quot;Cambria Math&quot;,serif">&nbsp;</span></p>

						<h2 style="margin-top:7.5pt;margin-right:0cm;margin-bottom:7.5pt;margin-left:

0cm;line-height:normal;background:white">Chapter 5: Future Directions for Meta Learning

						<p class="MsoNormal"><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>Overall, meta learning is a rather remarkable research direction, and the problems it tackles have real practical significance. Beyond the many meta learning methods introduced above, there are still some fascinating areas waiting to be studied and opened up. Below we introduce the Learn to Learn to Learn problem and the idea of black-boxing the F function.</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US"><img border="0" width="415" height="301" id="图片 32" src="files/Meta Learning学习笔记.files/image093.jpg"></span></p>

						<p class="MsoNormal"><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>First, the Learn to Learn to Learn problem asks how to learn a model that can itself learn a model that learns. As a concrete example, the MAML introduced in Section 3 learns initialization parameters for a model, but the initialization of that learning process (the initialization parameters of the initialization parameters) can in turn be learned (as shown above); likewise, the RNN of Section 4.1 can design CNN models, while the design of that RNN itself could be learned by yet another model-designing model (as shown above)... Repeating this recursion yields a whole series of Learn to Learn to Learn, or even Learn to N Learn, problems. Solving them would be a milestone for Meta Learning, because it would mark the complete automation of machine learning.</p>

						<p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US"><img border="0" width="278" height="244" id="图片 31" src="files/Meta Learning学习笔记.files/image094.jpg"></span></p>

						<p class="MsoNormal"><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>Another area worth studying is the idea of black-boxing the <span lang="EN-US">F</span> function. At present, the strategy <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="14" height="21" src="files/Meta Learning学习笔记.files/image004.png"></span> learned by <span lang="EN-US">F</span> is transparent, and the content of <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="14" height="21" src="files/Meta Learning学习笔记.files/image004.png"></span> is constrained by human choices, so its result is not necessarily the best that machine learning could produce. If we reduce the human influence by wrapping <span lang="EN-US">F</span> and <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="14" height="21" src="files/Meta Learning学习笔记.files/image004.png"></span> into a single larger objective <span lang="EN-US">H</span>, and let the machine learn the relationship between <span lang="EN-US">F</span> and <span lang="EN-US" style="font-size:10.5pt;font-family:等线;position:relative;top:4.0pt"><img width="14" height="21" src="files/Meta Learning学习笔记.files/image004.png"></span> on its own, where <span lang="EN-US">H</span> takes training data <span lang="EN-US">+</span> test data as input and outputs predictions for the test data, the machine might learn a better result.</p>

						<p class="MsoNormal"><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>Some applications are already trying this, for example in face recognition: input a gallery of faces plus one test face, and output a prediction of whether the test face belongs to the gallery. The concrete methods and related work are not covered here; interested readers can watch Hung-yi Lee's video tutorial:</p>

						<p class="MsoNormal"><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <a href="https://www.bilibili.com/video/av46561029/?p=44">https://www.bilibili.com/video/av46561029/?p=44</a></span></p>

						<p class="MsoNormal"><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>With that, this introduction to <span lang="EN-US">Meta Learning</span> comes to an end. The next post returns to <span lang="EN-US">GANs</span>, covering the combination of <span lang="EN-US">Meta Learning</span> with <span lang="EN-US">GANs</span> and the results and applications they have achieved in few-shot learning.</p>

					</div>
				</div>