本文将多任务学习作为多目标优化,总体目标是找到帕累托最优解。
现有的MGDA(multiple-gradient descent algorithm)缺陷:
- (i) The underlying optimization problem does not scale gracefully to high-dimensional gradients, which arise naturally in deep networks.
- (ii) The algorithm requires explicit computation of gradients per task, which results in linear scaling of the number of backward passes and roughly multiplies the training time by the number of tasks.
发