Day 22 Transformer
Sequence to sequence
What is it used for?
Encoder
![image-20240419201856942](https://img-blog.csdnimg.cn/img_convert/5ceeafebbd478f57780032655a5dc40d.png)
How a block works
![image-20240419202000619](https://img-blog.csdnimg.cn/img_convert/8fec46b98bc3bed1393e014170f59858.png)
What exactly happens in the residual connection?
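A minimal sketch of the residual step, assuming the post-norm arrangement (add the sublayer output to its input, then apply layer normalization) and a layer norm without learned gain and bias. `toy_sublayer` is a hypothetical stand-in for self-attention or the feed-forward network, not something from these notes.

```python
import math

def layer_norm(vec, eps=1e-5):
    """Normalize one vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def residual_block(x, sublayer):
    """Compute LayerNorm(x + sublayer(x)) for a single position."""
    out = sublayer(x)
    summed = [a + b for a, b in zip(x, out)]  # the residual (skip) connection
    return layer_norm(summed)

def toy_sublayer(vec):
    """Hypothetical sublayer used only for illustration."""
    return [2.0 * v for v in vec]

y = residual_block([1.0, 2.0, 3.0], toy_sublayer)
```

The skip connection means the sublayer only has to learn a correction to `x`, and the normalization keeps the scale of the sum stable from block to block.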
Reconstruction
Decoder - Autoregressive
Mask
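Because decoding works like a word chain (each token is generated from the tokens before it), a position cannot use information to its right.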
![image-20240419203641245](https://img-blog.csdnimg.cn/img_convert/8a63974e0164c9fd7539a6c4174db3b7.png)
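A small sketch of how the mask can be applied, assuming the usual approach of setting the scores of future positions to negative infinity before the softmax. The all-zero `scores` matrix is made-up illustration data.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention_weights(scores):
    """Row i may only attend to columns j <= i; later columns are masked out."""
    n = len(scores)
    masked = [
        [scores[i][j] if j <= i else float("-inf") for j in range(n)]
        for i in range(n)
    ]
    return [softmax(row) for row in masked]

scores = [[0.0] * 3 for _ in range(3)]  # made-up attention scores
weights = causal_attention_weights(scores)
```

With equal scores, the first position puts all of its weight on itself, the second splits it over two positions, and only the last position sees the whole sequence.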
Another kind of decoder
Encoder to Decoder - Cross Attention
I suspect the missing Norm at the begin position is a bug.
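A rough sketch of cross attention, under the simplifying assumption that there are no learned projection matrices: the decoder states act directly as queries, and the encoder outputs act as both keys and values. The vectors in `enc` and `dec` are made-up illustration data.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(decoder_states, encoder_states):
    """Each decoder state queries all encoder states (scaled dot-product attention)."""
    d = len(encoder_states[0])
    outputs = []
    for q in decoder_states:
        # One score per encoder position: the query comes from the decoder.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in encoder_states
        ]
        attn = softmax(scores)
        # Weighted sum of encoder states: keys and values come from the encoder.
        outputs.append(
            [sum(w * v[c] for w, v in zip(attn, encoder_states)) for c in range(d)]
        )
    return outputs

enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up encoder outputs
dec = [[1.0, 0.0], [0.0, 1.0]]              # made-up decoder states
out = cross_attention(dec, enc)
```

The output has one vector per decoder position, each a mixture of encoder outputs weighted by how well they match that decoder state.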
Training
This closely resembles a classification problem.
Teacher forcing: using the ground truth as the decoder input
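A minimal sketch of both points, assuming hypothetical `<BEGIN>` and `<END>` special tokens and a made-up probability distribution for the first output position. The decoder input is the ground truth shifted right by one, and each position is scored like a classification over the vocabulary using cross-entropy.

```python
import math

BEGIN, END = "<BEGIN>", "<END>"  # hypothetical special tokens

def teacher_forcing_pairs(target_tokens):
    """Decoder input is the ground truth shifted right; the label is the next token."""
    decoder_input = [BEGIN] + target_tokens
    labels = target_tokens + [END]
    return decoder_input, labels

def cross_entropy(predicted_probs, label):
    """Classification loss at one position: negative log-probability of the true token."""
    return -math.log(predicted_probs[label])

inputs, labels = teacher_forcing_pairs(["machine", "learning"])

# Made-up model output for the first position (after seeing only <BEGIN>).
first_step_probs = {"machine": 0.5, "learning": 0.25, END: 0.25}
loss = cross_entropy(first_step_probs, labels[0])
```

During training the decoder always sees the correct previous token, even when its own prediction at that position would have been wrong.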
Tips
How to resolve that?