- These are notes for Hung-yi Lee's 2021 ML course
Life Long Learning, Continuous Learning, Never Ending Learning, Incremental Learning
Life Long Learning
What people think about AI …
- The idea is to keep strengthening the model's capability by having it continually learn new tasks
→ Even though I have learned task 2, I do not forget task 1.
Life Long Learning in real-world applications
Catastrophic Forgetting
- In reality, however, the model gradually forgets previously learned tasks during training…
- Example:
The network has enough capacity to learn both tasks.
Multi-task training
- Multi-task training can be considered as the upper bound of LLL. But can multi-task training solve the problem?
- No: multi-task training requires keeping the data of all previous tasks and retraining on all of it whenever a new task arrives, which does not scale in either storage or computation.
Evaluation
- First of all, we need a sequence of tasks. (The tasks used in current LLL research are still fairly simple.)
- e.g. a "permutation" here means scrambling the digit images according to some fixed rule; each permutation defines one task (see the sketch below)
- $R_{i,j}$: performance on task $j$ after training on task $i$. If $i>j$: after training on task $i$, has task $j$ been forgotten? If $i<j$: can the skill learned on task $i$ transfer to task $j$?
- Two common evaluation metrics (computed in the sketch below):
- (1) Accuracy: $\frac{1}{T}\sum_{i=1}^{T} R_{T,i}$
- (2) Backward Transfer (usually negative): $\frac{1}{T-1}\sum_{i=1}^{T-1}\left(R_{T,i}-R_{i,i}\right)$
Research Directions
Selective Synaptic Plasticity (Regularization-based Approach)
Selective Synaptic Plasticity: only some of the connections (synapses) between neurons in the network are allowed to stay plastic; the rest must be consolidated.
Why Catastrophic Forgetting?
- Basic Idea: some parameters in the model are important to the previous tasks, so only change the unimportant ones ($\theta$ should be close to $\theta^b$ in certain directions). The regularized loss is $L'(\theta)=L(\theta)+\lambda\sum_i b_i\left(\theta_i-\theta_i^b\right)^2$, where $\theta^b$ denotes the parameters learned on previous tasks and each guard $b_i$ measures how important $\theta_i$ is to them.
- If $b_i=0$, there is no constraint on $\theta_i$ $\Rightarrow$ Catastrophic Forgetting
- If $b_i=\infty$, $\theta_i$ would always be equal to $\theta_i^b$ $\Rightarrow$ Intransigence
SGD denotes ordinary training, which leads to catastrophic forgetting; L2 sets every $b_i$ to 1, which leads to intransigence
- How do we know whether a parameter matters to a given task? The rough idea: after training a model on that task, check whether perturbing a parameter changes the loss a lot. If it does, that parameter $\theta_i$ is considered important, and its corresponding $b_i$ can be set to a large value. After finishing each task, the newly estimated $b_i$ is accumulated onto the running total to give the final $b_i$:
Different methods for computing $b_i$ (a sketch of the EWC case follows this list):
- Elastic Weight Consolidation (EWC)
- Synaptic Intelligence (SI)
- Memory Aware Synapses (MAS)
- RWalk
- Sliced Cramer Preservation (SCP)
Gradient Episodic Memory (GEM)
- paper: Gradient Episodic Memory for Continual Learning
- GEM is also a selective-synaptic-plasticity-style method, but its idea is to use gradients computed on previous tasks' data to adjust the new gradient, so it has to store a small amount of training data from earlier tasks:
Additional Neural Resource Allocation
Progressive Neural Networks
- paper: Progressive Neural Networks
- Each new task adds a new column of parameters; the previous columns are frozen, and their hidden activations are fed into the new column as extra inputs, so old tasks are never overwritten.
PackNet
Each task uses only a subset of the parameters
Compacting, Picking, and Growing (CPG)
- paper: Compacting, Picking and Growing for Unforgetting Continual Learning
- Combines the ideas of Progressive Neural Networks and PackNet
Memory Replay
- paper:
Idea: Generating Data
- Generate pseudo-data for previous tasks using a generative model (see the sketch below)
To learn more…
Adding new classes
Different tasks may involve different numbers of classes
Curriculum Learning
- What is the proper learning order?