Table of Contents

transformer (seq2seq + Multi-Head Self-Attention)
- transformer
- Multi-Head Self-Attention
- Feed-Forward Network
- Positional Encoding
- Mask
- Layer Normalization