[Paper Reading] An Adaptive Random Path Selection Approach for Incremental Learning (Adaptive RPS-Net)
Abstract
In this paper, we propose a random path selection algorithm, called Adaptive RPS-Net, that progressively chooses optimal paths for the new tasks while encouraging parameter sharing between tasks.
- a new network capacity measure
- the proposed model integrates knowledge distillation and retrospection along with the path selection strategy to overcome catastrophic forgetting
- we propose a simple controller to dynamically balance the model's stability and plasticity
Introduction
- As a model learns new tasks, its performance on old tasks should not catastrophically deteriorate (old knowledge is not forgotten).
- Since learning tasks are inherently related, the knowledge acquired on old tasks should help accelerate learning on new ones (old knowledge helps new knowledge).
- Ensure that complementary representations are learned from the current task so that the newly learned information can help improve old-task performance, i.e., backward transfer (new knowledge helps old knowledge).
- As class-incremental learning progresses, the network must share and reuse previously tuned parameters to ensure a bounded computational complexity and memory footprint for the final model (share and reuse parameters).
During the learning process, maintain a balance between old and new classes.
RPS-Net: the right trade-off between 'stability' (leading to intransigence) and 'plasticity' (resulting in forgetting).
a. Once a new task is learned, its parameters are fixed and can be shared by future tasks.
b. Enables parameter reuse and minimizes model complexity.
c. A stacked residual design that learns representations complementary to those from earlier tasks.
d. Introduces an explicit controller.
Method
- Adaptive RPS-Net
Parallel residual network blocks: for each new task, the optimal path is selected from a set of randomly sampled candidate paths (so different tasks induce different ResNet-18-style paths). An attention mechanism is applied to the outputs of the final-layer modules to appropriately re-weight the contributions from previously learned tasks. Training uses a hybrid objective function that maintains the balance between network stability and plasticity, thereby avoiding catastrophic forgetting.
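As one concrete way to picture a "path", the sketch below encodes it as a binary layer-by-module matrix and samples a random candidate. This is a minimal interpretation of my own; the function name `sample_random_path` and the one-module-per-layer encoding are assumptions, not the paper's code.

```python
import numpy as np

def sample_random_path(num_layers: int, num_modules: int, rng=None) -> np.ndarray:
    """Encode a path as an L x M binary matrix: row l marks which of the
    M parallel residual modules in layer l the path activates.
    Here each layer activates exactly one module (a simplification)."""
    rng = rng or np.random.default_rng()
    path = np.zeros((num_layers, num_modules), dtype=np.int8)
    path[np.arange(num_layers), rng.integers(0, num_modules, size=num_layers)] = 1
    return path
```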
1.1 Training and inference paths
1.2 Attention over path responses
1.3 Measuring network capacity
1.4 Path selection
Path selection method: given task k, N random paths are initialized. For each path, only the modules that differ from the previously inferred path P_ts^{k-1} are used to form the training path P_tr. Among these N paths, the best one, P^k, is selected and combined with P_ts^{k-1} to obtain P_ts^k. Notably, path selection is performed only when µ_sat,k ≥ th. Suppose µ_sat,k, µ_sat,k+1, ..., µ_sat,k+l-1 remain below the threshold th; then the same path is used for training tasks k, k+1, ..., k+l, and k+l is added to the list S so that S = {..., k+l}, after which a new path is selected for the next task k+l+1. During training, the complexity thus remains bounded by that of a standard single-path network, and resources are shared across tasks.
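To make the procedure above concrete, here is a hedged sketch of the selection loop under the saturation-controlled switching just described. The names `select_path_for_task` and `evaluate_path` are hypothetical placeholders; `evaluate_path` stands in for whatever scoring the paper uses (e.g., validation accuracy after briefly training the candidate).

```python
import numpy as np

def select_path_for_task(prev_infer_path, mu_sat, th, num_candidates,
                         evaluate_path, rng=None):
    """Sketch of adaptive path selection for one task.

    prev_infer_path: L x M binary matrix P_ts^{k-1} of modules already in use.
    mu_sat / th:     saturation measure and threshold of the controller.
    evaluate_path:   callable scoring a candidate training path (placeholder).
    Returns (training path P_tr, inference path P_ts^k).
    """
    rng = rng or np.random.default_rng()
    if mu_sat < th:
        # Network not yet saturated: reuse the previous path, train no new modules.
        return prev_infer_path, prev_infer_path
    num_layers, num_modules = prev_infer_path.shape
    best_score, best_train = -np.inf, None
    for _ in range(num_candidates):  # N random candidate paths
        cand = np.zeros((num_layers, num_modules), dtype=np.int8)
        cand[np.arange(num_layers), rng.integers(0, num_modules, num_layers)] = 1
        train_path = cand & (1 - prev_infer_path)  # only modules not used before
        score = evaluate_path(train_path)
        if score > best_score:
            best_score, best_train = score, train_path
    infer_path = best_train | prev_infer_path  # P_ts^k = P^k combined with P_ts^{k-1}
    return best_train, infer_path
```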
1.6 Loss function
KD + softmax: a knowledge distillation (KD) loss combined with a softmax cross-entropy loss.
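A minimal PyTorch sketch of such a hybrid objective, assuming the standard temperature-scaled distillation on the old-class logits plus cross-entropy on the current labels. The name `hybrid_loss` and the hyperparameters `T` and `gamma` are my own; the paper's exact weighting scheme may differ.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(logits, labels, old_logits, num_old_classes,
                T: float = 2.0, gamma: float = 0.5):
    """Softmax cross-entropy on the current task (plasticity) plus a
    temperature-scaled KD term that keeps old-class outputs close to
    the previous model's (stability). gamma trades the two off."""
    ce = F.cross_entropy(logits, labels)
    log_p_old = F.log_softmax(logits[:, :num_old_classes] / T, dim=1)
    q_old = F.softmax(old_logits[:, :num_old_classes] / T, dim=1)
    kd = F.kl_div(log_p_old, q_old, reduction="batchmean") * (T * T)
    return ce + gamma * kd
```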