【機器學習2021】Transformer (下) (Machine Learning 2021: Transformer, Part 2)
8.1/8.2 describe how an already-trained model operates; next we look at how training and testing are done.
Teacher Forcing: during training, use the ground truth (GT) as the decoder input
Training objective: minimize cross entropy (see 4.)
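A minimal sketch of teacher forcing with a cross-entropy objective, in plain Python. The names `teacher_forcing_pairs`, `cross_entropy`, `BOS`, and `EOS` are hypothetical and chosen for illustration; the per-step probabilities stand in for the output of a real decoder, which is not modeled here.

```python
import math

# Hypothetical special tokens marking the start and end of a sequence.
BOS, EOS = "<BOS>", "<EOS>"

def softmax(logits):
    """Turn raw scores into a probability distribution over the vocabulary."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def teacher_forcing_pairs(target_tokens):
    """Build decoder inputs and targets from the ground truth.

    The decoder input at step t is the ground-truth token from step t-1
    (BOS at the first step), not the model's own previous prediction.
    """
    decoder_inputs = [BOS] + list(target_tokens)
    decoder_targets = list(target_tokens) + [EOS]
    return decoder_inputs, decoder_targets

def cross_entropy(step_probs, target_ids):
    """Sum of -log p(correct token) over all decoding steps."""
    return sum(-math.log(p[t]) for p, t in zip(step_probs, target_ids))

# Toy usage: a 4-token vocabulary and a decoder stand-in that is uniform.
vocab = [BOS, EOS, "a", "b"]
inputs, targets = teacher_forcing_pairs(["a", "b"])
target_ids = [vocab.index(tok) for tok in targets]
step_probs = [softmax([0.0] * len(vocab)) for _ in targets]
loss = cross_entropy(step_probs, target_ids)
```

Training lowers this loss by raising the probability assigned to the correct token at every step, with END treated as one more token the decoder has to predict.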
Training Tips (train seq2seq model)
1. Copy Mechanism
2. Guided Attention
Customized attention: use your understanding of the task to force the attention to follow a required order/position, etc.
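One way to force an attention order is to add a penalty to the training loss whenever attention weight lands far from where the task says it should be. The sketch below assumes a task where output step t should attend near the same relative position in the input (roughly left to right, as in speech synthesis or recognition). It uses a diagonal Gaussian-shaped penalty of a kind seen in some TTS systems; the notes do not say which formulation the lecture has in mind, so the function name `guided_attention_penalty` and the width parameter `g` are assumptions.

```python
import math

def guided_attention_penalty(attn, g=0.2):
    """Average penalty for attention mass that strays from the diagonal.

    attn[t][n] is the attention weight from output step t to input position n.
    g (an assumed hyperparameter) controls how wide the tolerated band is.
    """
    T = len(attn)
    N = len(attn[0])
    total = 0.0
    for t, row in enumerate(attn):
        for n, a in enumerate(row):
            # The weight is 0 on the diagonal and grows toward 1 away from it.
            w = 1.0 - math.exp(-((n / N - t / T) ** 2) / (2 * g * g))
            total += a * w
    return total / (T * N)

# Toy usage: monotonic (diagonal) attention versus reversed attention.
diagonal = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
reversed_attn = [[0.0, 0.0, 1.0],
                 [0.0, 1.0, 0.0],
                 [1.0, 0.0, 0.0]]
penalty_diag = guided_attention_penalty(diagonal)
penalty_rev = guided_attention_penalty(reversed_attn)
```

Adding this term to the cross-entropy loss pushes the model toward left-to-right attention; for tasks without such an ordering the penalty would have to be designed differently or left out.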