2020-08-20 Understanding the Two-Head Network in One-Shot Visual Imitation Learning via Meta-Learning


hanx0204


Original link: https://blog.csdn.net/weixin_40523230/article/details/85055637


The following content is based on the blog post at the original link above.

4.1 Two-Head Architecture: Meta-Learning a Loss for Fast Adaptation

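In the standard MAML framework, the "pre-gradient update" and "post-gradient update" stages use the same network: both output actions, and both use a standard loss function. The paper tries something different: the two stages still share all of the earlier architecture, but the final layer that outputs the action (on top of the last hidden layer) is no longer shared. Each stage gets its own, and these are called two different "heads". [What exactly the "pre and post gradient update stages" refer to, I have not figured out.] The parameters of the pre-update head are not used for the final, post-update policy, and the parameters of the post-update head are not updated using the demonstration. But both sets of parameters are meta-learned for effective performance after adaptation. In basic MAML, the inner-loop loss is a standard loss function such as MSE, which means the function in the inner loop outputs an action that is compared against the demonstrated action.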
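The formulas in the original post were images and did not survive. Assuming the notation of the MAML imitation setting (τ^(j) a demonstration of task T_i, o_t^(j) and a_t^(j) its observations and actions, θ the network parameters, α the inner step size), the standard behavior-cloning inner loss and the inner update presumably looked like this:

$$
\mathcal{L}_{\mathcal{T}_i}(f_\theta) = \sum_{\tau^{(j)} \sim \mathcal{T}_i} \sum_t \big\lVert f_\theta(o_t^{(j)}) - a_t^{(j)} \big\rVert_2^2,
\qquad
\theta' = \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta)
$$

The action label a_t^(j) appears inside the inner loss, so the inner update needs the ground truth.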
[Everything below is my own understanding, not the paper's wording.]
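The paper's improvement comes from the following line of thought: the inner loop really only exists to produce a gradient. If the inner loop only needs to learn a "function that outputs actions", I can move that function onto its own output layer, the pre-update head (the form barely differs from the standard one, but it is implemented and learned in a new, separate head). Note, however, that the gradient used for the inner-loop update still comes from "comparing against the ground truth in the inner loop"; in other words, producing the inner-loop gradient requires the inner loop to have access to the ground truth.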
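A reconstruction of this intermediate step, assuming the paper's notation: y_t^(j) are the last hidden-layer activations for observation o_t^(j), and W, b are the weights and bias of the pre-update head. The head produces the action, and the loss still compares it with the label a_t^(j):

$$
f_\theta(o_t^{(j)}) = W y_t^{(j)} + b,
\qquad
\mathcal{L}_{\mathcal{T}_i}(f_\theta) = \sum_{\tau^{(j)} \sim \mathcal{T}_i} \sum_t \big\lVert W y_t^{(j)} + b - a_t^{(j)} \big\rVert_2^2
$$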
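So the question becomes: if the goal is only to produce a gradient, could the gradient produced when the outer loop compares its output with the ground truth be backpropagated directly into the inner-loop network? Then the inner loop would no longer need the ground truth.
Previously we assumed: "If the parameters learned in the inner loop make the output actions approach the ground truth, then the loss for those parameters is a suitable one to pass to the outer loop for updating the network."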
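Now we assume directly: "Remove the ground truth and let the network learn a loss function by itself; if this loss, passed to the outer loop, updates the network effectively, then it is a good loss function." [My own understanding.] The inner-loop loss therefore no longer contains the action label at all.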
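With the same assumed notation (y_t^(j) the last hidden-layer activations, W and b the pre-update head), the learned inner loss, as we understand the paper's formulation, simply drops a_t^(j):

$$
\mathcal{L}_{\mathcal{T}_i}(f_\theta) = \sum_{\tau^{(j)} \sim \mathcal{T}_i} \sum_t \big\lVert W y_t^{(j)} + b \big\rVert_2^2
$$

Because W and b are meta-learned through the outer loss, the network itself decides what this squared norm measures, and evaluating it needs only observations.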
So what does this buy us?
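- During meta-training, we still need complete observation-action data to compute gradients of L_v (the outer-loop loss) and backpropagate them into the inner loop. After many iterations, the inner loop has learned a function that "takes observations and produces a suitable gradient for the outer loop". At the same time, the outer loop has converged to model parameters that suit the whole task distribution.
- Next, given a new task, we can fine-tune using a demo that contains only observations and no actions. The process: feed in the observations; the learned inner-loop loss now produces a gradient from the observations alone, and that gradient drives the fine-tuning of the model parameters. [Notice that updating the parameters no longer needs the ground truth! As long as there is a gradient, the parameters can be updated; since the trained inner-loop loss already provides one, we no longer need to "compare the outer loop's output with the ground truth" to get it. Exciting, the magic of deep learning!]
- At this point, we have achieved fine-tuning from a demo that contains only observations.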
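The two phases above (meta-training with actions, adaptation without them) can be sketched in plain Python. This is a toy under loudly stated assumptions, not the paper's implementation: observations are scalars, the shared trunk is linear (y = theta * o), each head is a scalar weight and bias, the task family (action = k * observation) is made up, and the meta-gradient is estimated with central finite differences instead of backpropagating through the inner step. All function names and constants are hypothetical.

```python
"""Toy two-head MAML sketch (hypothetical setup, not the paper's code).

Assumptions: 1-D observations, linear trunk y = theta * o, scalar heads,
made-up tasks with a_t = k * o_t, finite-difference meta-gradients.
"""

ALPHA = 0.05  # inner-loop step size (assumed value)


def hidden(theta, o):
    # Shared trunk: the "last hidden activation" y_t, linear for simplicity.
    return theta * o


def learned_inner_loss_grad(params, theta, obs):
    # Analytic d/dtheta of sum_t (w_pre * y_t + b_pre)^2; no actions needed.
    w, b = params["w_pre"], params["b_pre"]
    return sum(2.0 * (w * hidden(theta, o) + b) * w * o for o in obs)


def adapt(params, obs):
    # Pre-update stage: one gradient step on the learned loss, observations only.
    theta = params["theta"]
    return theta - ALPHA * learned_inner_loss_grad(params, theta, obs)


def act(params, theta_prime, o):
    # Post-update head: outputs the action from the adapted trunk.
    return params["w_post"] * hidden(theta_prime, o) + params["b_post"]


def make_tasks():
    # Hypothetical tasks: demo observations scale with k, target action is k * o.
    tasks = []
    for k in (0.5, 1.0, 1.5):
        demo_obs = [k * 1.0, k * 2.0]        # observation-only demo
        val_obs = [k * 1.5, k * 2.5]         # second demo, with actions
        val_act = [k * o for o in val_obs]
        tasks.append((demo_obs, val_obs, val_act))
    return tasks


def meta_loss(params, tasks):
    # Outer loss L_v: behavior-cloning MSE of the post-update policy.
    total = 0.0
    for demo_obs, val_obs, val_act in tasks:
        theta_prime = adapt(params, demo_obs)
        total += sum((act(params, theta_prime, o) - a) ** 2
                     for o, a in zip(val_obs, val_act))
    return total / len(tasks)


def meta_train(params, tasks, steps=20, lr=5e-3, h=1e-5):
    # Meta-learn the trunk and BOTH heads; gradients via central differences.
    params = dict(params)  # copy so the caller's dict is not mutated
    for _ in range(steps):
        grads = {}
        for name in params:
            up, down = dict(params), dict(params)
            up[name] += h
            down[name] -= h
            grads[name] = (meta_loss(up, tasks) - meta_loss(down, tasks)) / (2 * h)
        for name, g in grads.items():
            params[name] -= lr * g
    return params


if __name__ == "__main__":
    tasks = make_tasks()
    init = {"theta": 1.0, "w_pre": 0.1, "b_pre": 0.1, "w_post": 0.5, "b_post": 0.0}
    trained = meta_train(init, tasks)
    print("meta-loss before/after:", meta_loss(init, tasks), meta_loss(trained, tasks))
    # New task (k = 1.2): adapt from an observation-only demo, then act.
    theta_new = adapt(trained, [1.2, 2.4])
    print("predicted action:", act(trained, theta_new, 1.8))
```

Actions appear only in `meta_loss` (the outer loss L_v); `adapt` sees observations alone, which is the property the post highlights. Finite differences keep the sketch dependency-free; a real implementation would differentiate through the inner step with an autodiff framework.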

About the pre- and post-gradient update

The gradient update here refers to the update on the inner loss, i.e. the adaptation objective. The author uses this terminology to describe the two-head structure, which separates the inner loss from the outer loss. More specifically, the pre-gradient-update stage optimizes the inner loss and the post-update stage the outer loss: \theta, as shown in the first formula of this blog, is what the pre-update stage optimizes, while \theta' belongs to the post-update stage.
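To make the two stages concrete, here is a sketch in MAML-style notation. The symbol names are our own assumptions, not the paper's: θ are the shared parameters, (W_pre, b_pre) the pre-update head, (W_post, b_post) the post-update head, y_t(θ) the last hidden activation for observation o_t, α the inner step size, D_i^demo the observation-only demo of task T_i and D_i^val a second, action-labelled demo.

$$
\theta_i' = \theta - \alpha \nabla_\theta \sum_{o_t \in \mathcal{D}_i^{\text{demo}}} \big\lVert W_{\text{pre}}\, y_t(\theta) + b_{\text{pre}} \big\rVert_2^2
$$

$$
\min_{\theta,\, W_{\text{pre}},\, b_{\text{pre}},\, W_{\text{post}},\, b_{\text{post}}} \; \sum_{\mathcal{T}_i} \sum_{(o_t, a_t) \in \mathcal{D}_i^{\text{val}}} \big\lVert W_{\text{post}}\, y_t(\theta_i') + b_{\text{post}} - a_t \big\rVert_2^2
$$

The pre-update head appears only in the first line and the post-update head only in the second, yet both are optimized by the outer minimization because θ_i' depends on W_pre and b_pre.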

"So getting rid of the ground truth in the inner loss function is just like handing over control of the adaptability evaluation metric to the network itself." This explanation is excellent and really helped me understand the mechanism.

Another important reading note
