Classic paper: DAN (Deep Adaptation Networks)
Problems:
1. As deep features eventually transition from general to specific along the network, feature transferability drops significantly in the higher layers as the domain discrepancy increases.
2. Moreover, the summation over pairwise similarities between data points makes mini-batch stochastic gradient descent (SGD) more difficult, whereas mini-batch SGD is crucial to the training effectiveness of deep networks (see the sketch below).
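To make the second problem concrete: a plain MMD estimate sums a kernel over all pairs of points, which costs O(n^2) per batch, while DAN adopts the linear-time unbiased estimator of Gretton et al. that averages the kernel over disjoint quadruples. Below is a minimal PyTorch sketch of both (my own illustration, not the authors' code; the Gaussian bandwidths `sigmas` and equal kernel weights are assumptions):

```python
import torch

def gaussian_kernel(x, y, sigma):
    # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)), elementwise or broadcast
    return torch.exp(-((x - y) ** 2).sum(-1) / (2 * sigma ** 2))

def mmd2_quadratic(xs, xt, sigma=1.0):
    # O(n^2) estimate: the kernel is summed over every source/target pair,
    # which is the pairwise summation that hampers mini-batch SGD.
    kxx = gaussian_kernel(xs[:, None], xs[None, :], sigma).mean()
    ktt = gaussian_kernel(xt[:, None], xt[None, :], sigma).mean()
    kxt = gaussian_kernel(xs[:, None], xt[None, :], sigma).mean()
    return kxx + ktt - 2 * kxt

def mmd2_linear(xs, xt, sigma=1.0):
    # Unbiased O(n) estimate: average h over disjoint quadruples
    # (x_s, x_s', x_t, x_t'), which keeps mini-batch SGD practical.
    n = min(xs.size(0), xt.size(0)) // 2 * 2
    x1, x2, t1, t2 = xs[0:n:2], xs[1:n:2], xt[0:n:2], xt[1:n:2]
    h = (gaussian_kernel(x1, x2, sigma) + gaussian_kernel(t1, t2, sigma)
         - gaussian_kernel(x1, t2, sigma) - gaussian_kernel(x2, t1, sigma))
    return h.mean()

def mk_mmd2_linear(xs, xt, sigmas=(0.5, 1.0, 2.0, 4.0)):
    # MK-MMD combines several kernels; equal weights assumed for brevity.
    return sum(mmd2_linear(xs, xt, s) for s in sigmas) / len(sigmas)
```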
Overall objective:

$$\min_{\Theta}\ \frac{1}{n_a}\sum_{i=1}^{n_a} J\big(\theta(\mathbf{x}_i^a), y_i^a\big) \;+\; \lambda \sum_{l=l_1}^{l_2} d_k^2\big(\mathcal{D}_s^l, \mathcal{D}_t^l\big)$$

In the formula above, $J$ is the loss on the labeled examples $(\mathbf{x}_i^a, y_i^a)$, and $d_k^2$ is the MK-MMD distance between the source and target distributions at layer $l$, summed over the adapted layers $l_1$ through $l_2$; $\lambda > 0$ trades off the two terms.
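As a rough sketch of how this objective comes together in training code (my own illustration; `mk_mmd2_linear` is the estimator sketched above, and the assumption that the model returns the adapted-layer features alongside the logits is mine):

```python
import torch.nn.functional as F

def dan_loss(model, xs, ys, xt, lam=1.0):
    # model is assumed to return (logits, [features of each adapted layer]).
    logits_s, feats_s = model(xs)
    _, feats_t = model(xt)
    cls = F.cross_entropy(logits_s, ys)      # J on the labeled source samples
    mmd = sum(mk_mmd2_linear(fs, ft)         # d_k^2 at each adapted layer l
              for fs, ft in zip(feats_s, feats_t))
    return cls + lam * mmd
```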
The network has 8 layers (AlexNet-style): layers 1-3 are frozen (copied from the pre-trained model), layers 4-5 are fine-tuned, and layers 6-8 are learned from scratch, with the MK-MMD terms applied to their outputs.
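A minimal PyTorch sketch of that layer treatment (assumptions mine: an AlexNet-like model exposing `conv1`-`conv5` and `fc6`-`fc8` attributes, and a 10x smaller learning rate for the fine-tuned block):

```python
import torch

def make_optimizer(model, base_lr=1e-3):
    # Layers 1-3: frozen, exactly as copied from the pre-trained model.
    for layer in (model.conv1, model.conv2, model.conv3):
        for p in layer.parameters():
            p.requires_grad = False
    # Layers 4-5: fine-tuned with a reduced learning rate.
    finetune = [p for l in (model.conv4, model.conv5) for p in l.parameters()]
    # Layers 6-8: trained from scratch at the full learning rate.
    scratch = [p for l in (model.fc6, model.fc7, model.fc8) for p in l.parameters()]
    return torch.optim.SGD(
        [{"params": finetune, "lr": base_lr * 0.1},
         {"params": scratch, "lr": base_lr}],
        momentum=0.9)
```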