P33 Transformer (Part 2)
Video link
1. Decoder: Autoregressive(AT)
How the decoder works:
![](https://img-blog.csdnimg.cn/baa916ee68994af4858ccaeef8a0e8e5.png)
![](https://img-blog.csdnimg.cn/08a2ca5121674be4897712e410292452.png)
![](https://img-blog.csdnimg.cn/8cb20777adfa4d5ebb0b9a1744a66c4e.png)
![](https://img-blog.csdnimg.cn/d5adaab962ad4074a50f030c4b9e31d7.png)
![](https://img-blog.csdnimg.cn/97b9814a71e74e15bb521fba41f92e2d.png)
![](https://img-blog.csdnimg.cn/813fed73c649474abc93cf8473732dac.png)
Encoder vs Decoder:
![](https://img-blog.csdnimg.cn/375ca0c7f75544689c6a978f5a16cb29.png)
![](https://img-blog.csdnimg.cn/598621ba66a44c4aa97d65922f0ec655.png)
Masked:
![](https://img-blog.csdnimg.cn/c360fabe9275467496c1636c2b388d16.png)
![](https://img-blog.csdnimg.cn/599d880cfe1344b8a68864575f32600e.png)
![](https://img-blog.csdnimg.cn/8d68f298433d47a5877f5e8a775450c8.png)
![](https://img-blog.csdnimg.cn/2172cad024c54bef9ea9d752b9b9245d.png)
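The masking idea in the figures above can be sketched in a few lines: when the decoder computes the output at position i, it may only attend to positions 1..i, never to later ones. This is a minimal pure-Python illustration, not the lecture's code; the helper names `causal_mask` and `masked_softmax` and the toy all-zero scores are assumptions made for the example.

```python
import math

def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, mask):
    """Row-wise softmax over raw attention scores, skipping masked-out positions."""
    out = []
    for row, allowed in zip(scores, mask):
        # Subtract the max over allowed positions for numerical stability.
        m = max(s for s, ok in zip(row, allowed) if ok)
        exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

# Toy scores (assumed): every query-key pair is equally compatible.
scores = [[0.0] * 4 for _ in range(4)]
weights = masked_softmax(scores, causal_mask(4))

for row in weights:
    print([round(w, 2) for w in row])
```

With equal scores, each row spreads its attention evenly over the positions it is allowed to see, and every future position gets weight 0. In practice the same effect is usually obtained by setting the masked scores to a very large negative number before the softmax.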
How to stop:
![](https://img-blog.csdnimg.cn/7da0b3949dd648d5893e590bba2e8991.png)
![](https://img-blog.csdnimg.cn/3bfe92d6224e483b966a2eed6d4e1f28.png)
![](https://img-blog.csdnimg.cn/dbcd94d8635543e995b291d7c348de56.png)
2. Decoder: Non-autoregressive(NAT)
![](https://img-blog.csdnimg.cn/b5136f8031a44db7979e75359337f23c.png)
3. Encoder-Decoder
![](https://img-blog.csdnimg.cn/dbb003e058ae4c959e5f126f2f2846e5.png)
![](https://img-blog.csdnimg.cn/e383acedf7064f70a3051d0aa66b7eaa.png)
![](https://img-blog.csdnimg.cn/026ee92f30714538890c03c758269a72.png)
![](https://img-blog.csdnimg.cn/31acb0cc6a2c45dfa55aed789220962f.png)
4. Training
![](https://img-blog.csdnimg.cn/6e637f4db1b14255af03245d47d65a31.png)
![](https://img-blog.csdnimg.cn/112bcab0eb754b33a6360656a8012824.png)
Tips:
a. Copy Mechanism
![](https://img-blog.csdnimg.cn/8e3345a49db24ff682287f6d8c8657ee.png)
![](https://img-blog.csdnimg.cn/770a8d36368947a7ad72ed35022576c7.png)
![](https://img-blog.csdnimg.cn/6e1fe024fdbd4198ac0534386f793e5e.png)
b. Guided Attention
![](https://img-blog.csdnimg.cn/abf673d7906a4798b8493008cce5eb6a.png)
![](https://img-blog.csdnimg.cn/03797321f5644ce2954c71b9eb02552d.png)
c. Beam Search
![](https://img-blog.csdnimg.cn/0f98b616e398408b9ca067f43452f1ef.png)
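- When to use it: tasks with one clearly correct answer (e.g., speech recognition). For tasks that call for creativity and have more than one valid answer, randomness needs to be added to the decoder instead.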
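To make the contrast in the note above concrete, here is a minimal sketch of greedy decoding, beam search, and random sampling on a toy problem. Everything in it is an assumption made for illustration: the probability table `TABLE`, the `<END>` token name, and the helpers `next_probs`, `beam_search`, and `sample` stand in for a real decoder and are not taken from the lecture.

```python
import math
import random

END = "<END>"

# Hypothetical toy "decoder": prefix of generated tokens -> next-token distribution.
TABLE = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"A": 0.55, "B": 0.45},
    ("B",): {"A": 0.9, "B": 0.1},
}

def next_probs(prefix):
    """After two tokens the toy model always emits END."""
    if len(prefix) >= 2:
        return {END: 1.0}
    return TABLE[tuple(prefix)]

def beam_search(beam_width, max_len=10):
    """Keep the beam_width best partial sequences at every step (width 1 = greedy)."""
    beams = [([], 0.0)]  # (tokens, log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, p in next_probs(tokens).items():
                candidates.append((tokens + [tok], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            (finished if tokens[-1] == END else beams).append((tokens, score))
        if not beams:
            break
    return max(finished or beams, key=lambda c: c[1])

def sample(rng, max_len=10):
    """Draw each token at random from the model's distribution."""
    tokens = []
    for _ in range(max_len):
        probs = next_probs(tokens)
        tok = rng.choices(list(probs), weights=list(probs.values()))[0]
        tokens.append(tok)
        if tok == END:
            break
    return tokens

greedy_tokens, greedy_score = beam_search(beam_width=1)
beam_tokens, beam_score = beam_search(beam_width=2)
print("greedy:", greedy_tokens, round(math.exp(greedy_score), 2))
print("beam 2:", beam_tokens, round(math.exp(beam_score), 2))
print("sample:", sample(random.Random(0)))
```

In this toy table, greedy decoding commits to "A" first because it is locally the most likely token and ends with a sequence of probability 0.33, while a beam of width 2 keeps "B" alive and finds the sequence with probability 0.36. Sampling trades that optimality for variety, which is the kind of randomness the note recommends for open-ended tasks.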
d. BLEU score
![](https://img-blog.csdnimg.cn/77d7cccd1dd4456598cc902eb4777171.png)
e. Exposure Bias
![](https://img-blog.csdnimg.cn/aa18af490e074eb7b69972100fe352a2.png)
![](https://img-blog.csdnimg.cn/f9097598efc0430bbe03b4bde95ecf3f.png)