Paper Reading: "Curriculum Learning"

Latest recommended article published on 2023-05-16 17:56:55

Annie-qu

Original paper: https://dl.acm.org/doi/10.1145/1553374.1553380

1、Abstract

Humans and animals learn much better when the examples are not randomly presented but organized in a meaningful order which illustrates gradually more concepts, and gradually more complex ones. Here, we formalize such training strategies in the context of machine learning, and call them “curriculum learning”.
Curriculum learning can speed up convergence and, in non-convex optimization, find better local minima.

2、Introduction

Taking inspiration from how humans and animals learn, the paper asks whether machine learning can use a similar easy-to-hard, step-by-step approach. The basic idea is to start small, learn easier aspects of the task or easier subtasks, and then gradually increase the difficulty level.
The authors use animal training and a recurrent network learning language as examples to motivate learning in gradual steps.
Simple multi-stage curriculum strategies give rise to improved generalization and faster convergence.
Viewed as a continuation method, a curriculum can help to find better local minima of a non-convex training criterion.
On the surface, curriculum strategies appear to operate like a regularizer.
On convex criteria, a curriculum strategy can speed the convergence of training towards the global minimum.

3、On the difficult optimization problem of training deep neural networks

Automatically learning multiple levels of abstraction may allow a system to induce complex functions mapping the input to the output directly from data, without depending heavily on human-crafted features.
Deep architectures are hard to train, yet they are widely used. Unsupervised pre-training is often used to help supervised optimization reach better results; it can lower the test error, although it does little to improve the training error.
This suggested a dual effect of unsupervised pre-training, both in terms of helping optimization and as a kind of regularizer. Pre-training with a curriculum strategy might act similarly to unsupervised pre-training, acting both as a way to find better local minima and as a regularizer.

4、A curriculum as a continuation method

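The basic idea is to first optimize a smoothed objective and gradually consider less smoothing, with the intuition that a smooth version of the problem reveals the global picture.
Consider a family of optimization problems C_λ, where the parameter λ reflects how difficult the problem is. We first optimize a smoother (easier) objective C_0, which reflects the overall picture of the problem, then gradually increase λ while keeping θ at a local minimum of C_λ. The final objective C_1 is the problem we actually want to optimize.
Curriculum learning follows the same idea. Training examples receive different weights according to how difficult they are to learn. At the start, easy examples get the highest weights and are therefore sampled with higher probability; the weights of harder examples are then raised step by step. Once all examples are weighted uniformly, training proceeds directly on the target training set.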
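The reweighting described above can be written compactly. The block below follows my reading of the paper's formal definition; the symbols z (an example), P (the target training distribution), Q_λ (the training distribution at stage λ), W_λ (the example weights) and H (entropy) do not appear in these notes and are introduced here for illustration.

```latex
% Training distribution at stage \lambda: the target distribution reweighted by W_\lambda
Q_\lambda(z) \propto W_\lambda(z)\, P(z), \qquad 0 \le W_\lambda(z) \le 1, \qquad W_1(z) = 1 \;\Rightarrow\; Q_1(z) = P(z)

% The sequence is a curriculum if diversity (entropy) and weights both grow with \lambda
H(Q_\lambda) < H(Q_{\lambda+\epsilon}), \qquad W_{\lambda+\epsilon}(z) \ge W_\lambda(z) \qquad \forall z,\ \forall \epsilon > 0
```

The first condition says that later stages expose the learner to a more diverse set of examples; the second says that an example's weight never decreases once it has been introduced.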

In other words, data are continually added to the training set: the sequence of training distributions corresponds to a sequence of embedded training sets, starting with a small set of easy examples and ending with the target training set.
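The embedded training sets can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: `difficulty` is a hypothetical user-supplied scoring function (lower means easier), and `lambdas` is a hypothetical schedule of fractions ending at 1.0.

```python
import math


def curriculum_subset(examples, difficulty, lam):
    """Return the easiest fraction `lam` of the examples (0 < lam <= 1).

    `difficulty` is an assumed scoring function: lower score = easier example.
    With lam == 1.0 the full (target) training set is returned.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must be in (0, 1]")
    ranked = sorted(examples, key=difficulty)
    keep = max(1, math.ceil(lam * len(ranked)))
    return ranked[:keep]


def curriculum_stages(examples, difficulty, lambdas):
    """Yield one training set per stage; each set contains the previous one."""
    for lam in lambdas:
        yield curriculum_subset(examples, difficulty, lam)
```

Because every stage takes a prefix of the same sorted list, each training set is contained in the next one, and the last stage (λ = 1) is the target training set.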

5、Toy experiments with a Convex Criterion

Cleaner examples may yield better generalization faster
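Noise slows convergence. In a simple experiment, an SVM is trained on a binary classification task with 50 examples; training on easy examples (those with y w'x > 0, i.e., correctly classified) gives lower generalization error than training on randomly selected examples. Difficult examples may be informative, but they are often noisy and therefore of little use.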
Introducing gradually more difficult examples speeds-up online training
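Two ways of ordering examples from easy to hard are used to show that the curriculum strategy is effective.

1、The number of irrelevant inputs that is set to 0 varies randomly (uniformly) from example to example, so examples can be sorted from the easiest (with all irrelevant inputs zeroed out) to the most difficult. In other words, examples are sorted by how many irrelevant inputs they still contain.
2、Another way to sort examples is by the margin y w'x, with the easiest examples corresponding to larger values. The larger the margin, the more distinct the features and the easier the example is to classify.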
The experimental results are clear: curriculum learning performs better.
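The two orderings can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's code: `w_ref` is a hypothetical reference weight vector used to compute the margin, `irrelevant_idx` is a hypothetical list of the positions of the irrelevant inputs, and each example is an `(x, y)` pair with `y` in {-1, +1}.

```python
def margin(w, x, y):
    """Signed margin y * w'x of example (x, y) under weights w."""
    return y * sum(wi * xi for wi, xi in zip(w, x))


def order_by_margin(data, w_ref):
    """Easiest first: examples with a larger margin y * w_ref'x come earlier."""
    return sorted(data, key=lambda ex: -margin(w_ref, ex[0], ex[1]))


def order_by_zeroed_inputs(data, irrelevant_idx):
    """Easiest first: examples with more irrelevant inputs equal to 0 come earlier."""
    def n_zeroed(ex):
        return sum(1 for i in irrelevant_idx if ex[0][i] == 0)
    return sorted(data, key=lambda ex: -n_zeroed(ex))


def perceptron_pass(data, w, lr=1.0):
    """One online perceptron pass over `data` in the given order."""
    w = list(w)
    for x, y in data:
        if margin(w, x, y) <= 0:  # misclassified or on the boundary: update
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w
```

A curriculum run would call `perceptron_pass` on the sorted data, while the baseline would call it on the same data in random order.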

6、Experiments on shape recognition

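To further demonstrate the effect of curriculum learning, the authors run a second experiment on recognizing three kinds of shapes: triangles, rectangles and ellipses. Two datasets are used to separate easy examples from hard ones. One contains equilateral triangles, squares and circles (BasicShapes); in the other, the shapes are less regular (GeomShapes).
Training procedure:
The baseline is a network trained on GeomShapes only. For the curriculum, the network is first trained on BasicShapes for 0, 2, 4, ..., 128 epochs (0 epochs is the baseline) and then trained on GeomShapes until epoch 256, stopping early if the validation error reaches its minimum.
Results:

The paper also runs two control experiments. One trains on the data of both BasicShapes and GeomShapes without a curriculum strategy; the other trains without a curriculum on BasicShapes only. Both give poor results, which supports the benefit of curriculum learning.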
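The two-stage schedule above can be sketched like this. It is a sketch under assumptions rather than the authors' training code: `train_one_epoch` and `validation_error` are hypothetical callbacks supplied by the caller, and keeping the epoch with the lowest validation error is used here as a simple stand-in for early stopping.

```python
def two_stage_schedule(switch_epoch, total_epochs=256):
    """Dataset name to use at each epoch: BasicShapes first, then GeomShapes.

    switch_epoch == 0 reproduces the baseline (GeomShapes only).
    """
    if not 0 <= switch_epoch <= total_epochs:
        raise ValueError("switch_epoch must be between 0 and total_epochs")
    return ["BasicShapes"] * switch_epoch + ["GeomShapes"] * (total_epochs - switch_epoch)


def train(schedule, train_one_epoch, validation_error):
    """Run the schedule and return (best_epoch, best_error) on validation data.

    Both callbacks are hypothetical: train_one_epoch(dataset_name) trains for
    one epoch, validation_error() returns the current validation error.
    """
    best_epoch, best_err = None, float("inf")
    for epoch, dataset in enumerate(schedule, start=1):
        train_one_epoch(dataset)
        err = validation_error()
        if err < best_err:
            best_epoch, best_err = epoch, err
    return best_epoch, best_err
```

Calling `two_stage_schedule(0)` gives the baseline, and `two_stage_schedule(128)` gives the longest BasicShapes stage mentioned above.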

7、Experiments on language modeling

Curriculum learning is then applied to the task of predicting the next word, with a setup that largely follows the approach of Collobert and Weston.
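Cost (as given in the paper, with D the vocabulary and s^w the text s with its last word replaced by w):
C_s = (1/|D|) Σ_{w∈D} max(0, 1 − f(s) + f(s^w))

For a plausible text s we want the score f(s) to be high and the score f(s^w) of every other text to be low, with a gap of at least 1, so that C_s approaches zero. The curriculum is based on word frequency: the vocabulary grows by 5,000 of the most frequent words at a time, and any word sequence containing a word outside the current vocabulary is discarded. Without a curriculum, the model learns directly from the full 20,000-word vocabulary.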
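The cost and the vocabulary curriculum can be sketched as follows. This assumes the pairwise ranking cost C_s = (1/|D|) Σ_{w∈D} max(0, 1 − f(s) + f(s^w)), where s^w is s with its last word replaced by w. The scoring function `f`, the tuple representation of a text window, and the helper names are hypothetical choices made for illustration.

```python
def ranking_cost(f, s, vocabulary):
    """Average hinge ranking cost of window `s` (a tuple of words).

    Assumes C_s = (1/|D|) * sum over w in D of max(0, 1 - f(s) + f(s^w)),
    where s^w is `s` with its last word replaced by w.
    """
    total = 0.0
    for w in vocabulary:
        s_w = s[:-1] + (w,)
        total += max(0.0, 1.0 - f(s) + f(s_w))
    return total / len(vocabulary)


def windows_in_vocabulary(windows, vocabulary):
    """Keep only the windows whose words are all in the current vocabulary."""
    vocab = set(vocabulary)
    return [s for s in windows if all(word in vocab for word in s)]


def vocabulary_stages(words_by_frequency, step=5000, final_size=20000):
    """Yield growing vocabularies of the most frequent words, `step` at a time."""
    for size in range(step, final_size + 1, step):
        yield words_by_frequency[:size]
```

At each stage, training would use only `windows_in_vocabulary(windows, stage_vocabulary)`; the no-curriculum run corresponds to using the final vocabulary from the start.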

8、Discussion and Future Work

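The effectiveness of curriculum learning can be explained from two angles. First, early in training, less time is spent on noisy examples and on examples that are very hard to learn. Second, training can be guided toward better local minima and better generalization, since curriculum learning can be seen as a particular continuation method.
In addition, how to find better curricula is a direction for future research.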
