NLP Basics

This blog post covers NLP fundamentals, including how the LSTM model works, ways to make RNNs more effective such as stacked RNNs and bidirectional RNNs, and the training process for automatic text generation. It also discusses the Seq2Seq model for machine translation and the importance of the attention mechanism, and finally touches on the roles of the Transformer and BERT in pre-training.

Recently I found a tutorial on Bilibili for learning NLP, so I will use this blog post to record the knowledge points covered in the course.

Long Short-Term Memory (LSTM) Model

  • LSTM uses a “conveyor belt” to get longer memory than SimpleRNN.

  • Each of the following blocks has a parameter matrix:

    • Forget gate
    • Input gate
    • New values
    • Output gate
  • Number of parameters: $4 \times \text{shape}(h) \times [\text{shape}(h) + \text{shape}(x)]$
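To make the parameter count concrete, here is a minimal Python sketch that evaluates the formula above; the hidden size 128 and input size 300 are illustrative assumptions, not numbers from the course.

```python
# Minimal sketch: evaluate the LSTM parameter-count formula above.
# h_dim = shape(h), x_dim = shape(x); the sizes used here are assumptions.
def lstm_param_count(h_dim: int, x_dim: int) -> int:
    # Four blocks (forget gate, input gate, new values, output gate),
    # each with a weight matrix of shape [h_dim, h_dim + x_dim].
    return 4 * h_dim * (h_dim + x_dim)

print(lstm_param_count(128, 300))  # 4 * 128 * (128 + 300) = 219136
# Note: frameworks such as Keras also add one bias vector per block,
# which contributes an extra 4 * h_dim parameters beyond this formula.
```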

Making RNNs More Effective

  • SimpleRNN and LSTM are two kinds of RNNs; always use LSTM instead of SimpleRNN.
  • Use Bi-RNN instead of RNN whenever possible.
  • A stacked RNN may be better than a single RNN layer (if the amount of training data, n, is big).
  • Pretrain the embedding layer (if the amount of training data, n, is small).
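As a rough illustration of the last point, here is a minimal Keras sketch that loads a pretrained embedding matrix into a frozen Embedding layer; the variable `embedding_matrix` and all sizes are hypothetical stand-ins (e.g. for word2vec or GloVe vectors).

```python
# Sketch: reuse a pretrained embedding layer and freeze it (useful when the
# training set is small). `embedding_matrix` is a hypothetical stand-in here.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, embed_dim = 10000, 100                        # illustrative sizes
embedding_matrix = np.random.rand(vocab_size, embed_dim)  # replace with real pretrained vectors

model = keras.Sequential([
    keras.Input(shape=(None,)),
    layers.Embedding(
        vocab_size, embed_dim,
        embeddings_initializer=keras.initializers.Constant(embedding_matrix),
        trainable=False,                                  # freeze the pretrained embeddings
    ),
    layers.LSTM(32),
    layers.Dense(1, activation="sigmoid"),
])
model.summary()
```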

Stacked RNN

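A minimal Keras sketch of a stacked RNN, with illustrative vocabulary and layer sizes: every LSTM layer except the top one returns its full output sequence so the next layer can consume it.

```python
# Minimal sketch of a stacked RNN in Keras (sizes are illustrative assumptions).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(None,)),
    layers.Embedding(10000, 32),
    layers.LSTM(32, return_sequences=True),   # pass the whole sequence to the next layer
    layers.LSTM(32, return_sequences=True),
    layers.LSTM(32),                          # top layer keeps only the final state
    layers.Dense(1, activation="sigmoid"),
])
model.summary()
```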

Bidirectional RNN

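A minimal Keras sketch of a bidirectional RNN, again with illustrative sizes: one LSTM reads the sequence left-to-right, another right-to-left, and their final states are concatenated.

```python
# Minimal sketch of a bidirectional RNN in Keras (sizes are illustrative assumptions).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(None,)),
    layers.Embedding(10000, 32),
    layers.Bidirectional(layers.LSTM(32)),    # forward + backward states, output size 64
    layers.Dense(1, activation="sigmoid"),
])
model.summary()
```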

Automatic Text Generation

Train a Neural Network

  1. Partition the text into (segment, next_char) pairs.
  2. One-hot encode the characters.
    • Character $\rightarrow$ $v \times 1$ vector.
    • Segment $\rightarrow$ $l \times v$ matrix.
  3. Build and train a neural network
    • $l \times v$ matrix $\Rightarrow$ LSTM $\Rightarrow$ Dense $\Rightarrow$ $v \times 1$ vector
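A minimal Keras sketch of steps 1-3, assuming a hypothetical corpus file `corpus.txt`, a segment length l = 40, and a stride of 3 (all illustrative choices).

```python
# Sketch of steps 1-3: (segment, next_char) pairs, one-hot encoding, and a
# char-level LSTM. The corpus file, l = 40 and stride = 3 are assumptions.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

text = open("corpus.txt").read().lower()          # hypothetical training text
chars = sorted(set(text))                         # v distinct characters
char_to_idx = {c: i for i, c in enumerate(chars)}
l, v, stride = 40, len(chars), 3

segments, next_chars = [], []
for i in range(0, len(text) - l, stride):
    segments.append(text[i:i + l])                # step 1: segment of length l
    next_chars.append(text[i + l])                #         and the character after it

x = np.zeros((len(segments), l, v), dtype=bool)   # step 2: each segment -> l x v matrix
y = np.zeros((len(segments), v), dtype=bool)      #         each next_char -> v x 1 vector
for i, seg in enumerate(segments):
    for t, ch in enumerate(seg):
        x[i, t, char_to_idx[ch]] = True
    y[i, char_to_idx[next_chars[i]]] = True

model = keras.Sequential([                        # step 3: l x v -> LSTM -> Dense -> v x 1
    keras.Input(shape=(l, v)),
    layers.LSTM(128),
    layers.Dense(v, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(x, y, batch_size=128, epochs=20)
```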

Text Generation

  1. Propose a seed segment
  2. Repeat the following steps:
    a) Feed the segment (with one-hot) to the neural network.
    b) The neural network outputs probabilities.
    c) next_char $\leftarrow$ sample from the probabilities.
    d) Append next_char to the segment.
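A minimal sketch of this generation loop, reusing `model`, `chars`, `char_to_idx`, `l` and `v` from the training sketch above; the seed text and temperature are arbitrary choices, and the seed is assumed to contain only characters seen in training.

```python
# Sketch of the generation loop: encode the segment, get probabilities,
# sample next_char, append it, and slide the window.
import numpy as np

def sample(probs, temperature=1.0):
    # Rescale the probabilities by a temperature and draw one character index.
    probs = np.log(np.asarray(probs, dtype=np.float64) + 1e-9) / temperature
    probs = np.exp(probs) / np.sum(np.exp(probs))
    return np.random.choice(len(probs), p=probs)

segment = "the meaning of life is ".ljust(l)[-l:]  # arbitrary seed, padded/cut to length l
generated = segment
for _ in range(400):
    x_pred = np.zeros((1, l, v), dtype=bool)
    for t, ch in enumerate(segment):
        x_pred[0, t, char_to_idx[ch]] = True       # a) one-hot encode the segment
    probs = model.predict(x_pred, verbose=0)[0]    # b) network outputs probabilities
    next_char = chars[sample(probs)]               # c) sample next_char
    generated += next_char
    segment = segment[1:] + next_char              # d) append and keep the last l chars
print(generated)
```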

Machine Translation and the Seq2Seq Model

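A minimal Keras sketch of a character-level Seq2Seq model, with illustrative vocabulary sizes and dimensions: the encoder LSTM's final states (h and c) are used to initialize the decoder LSTM.

```python
# Sketch of a Seq2Seq model (Keras functional API); sizes are illustrative.
from tensorflow import keras
from tensorflow.keras import layers

src_vocab, tgt_vocab, latent_dim = 70, 80, 256

# Encoder: read the source sequence, keep only the final states h and c.
enc_inputs = keras.Input(shape=(None, src_vocab))
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(enc_inputs)

# Decoder: start from the encoder states and predict the target sequence.
dec_inputs = keras.Input(shape=(None, tgt_vocab))
dec_outputs, _, _ = layers.LSTM(latent_dim, return_sequences=True,
                                return_state=True)(dec_inputs,
                                                   initial_state=[state_h, state_c])
dec_outputs = layers.Dense(tgt_vocab, activation="softmax")(dec_outputs)

model = keras.Model([enc_inputs, dec_inputs], dec_outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```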

How to Improve Seq2Seq

  1. Bi-LSTM instead of LSTM (Encoder only!!!)
    • Encoder’s final states ($h_t$ and $c_t$) have all the information of the English sentence.
    • If the sentence is long, the final states have forgotten early inputs.
    • Bi-LSTM (left-to-right and right-to-left) has longer memory.
    • Use Bi-LSTM in the encoder; use a unidirectional LSTM in the decoder (see the sketch after this list).
  2. Word-Level Tokenization
    • Word-Level tokenization instead of char-level.
      • The average length of English words is 4.5 letters.
      • The sequences will be 4.5x shorter.
      • Shorter sequence -> less likely to forget.
    • But you will need a large dataset:
      • The number of (frequently used) chars is ~$10^2$ $\rightarrow$ one-hot suffices.
      • The number of (frequently used) words is ~$10^4$ $\rightarrow$ embedding is a must.
      • The embedding layer has many parameters $\rightarrow$ overfitting!
  3. Multi-Task Learning (this way there is still only one encoder, but the amount of training data is doubled, so the encoder can be trained better)
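As referenced in item 1, here is a minimal Keras sketch combining the first two improvements: a word-level Bi-LSTM encoder whose forward and backward states are concatenated, feeding a unidirectional LSTM decoder. Vocabulary sizes and dimensions are illustrative assumptions.

```python
# Sketch: Bi-LSTM encoder (improvement 1) with word-level embeddings
# (improvement 2) and a unidirectional LSTM decoder. Sizes are assumptions.
from tensorflow import keras
from tensorflow.keras import layers

src_vocab, tgt_vocab, latent_dim = 10000, 10000, 256

# Encoder: Bi-LSTM reads the source left-to-right and right-to-left.
enc_inputs = keras.Input(shape=(None,))
enc_emb = layers.Embedding(src_vocab, 128)(enc_inputs)
_, fh, fc, bh, bc = layers.Bidirectional(
    layers.LSTM(latent_dim, return_state=True))(enc_emb)
state_h = layers.Concatenate()([fh, bh])      # concatenated states have size 2 * latent_dim
state_c = layers.Concatenate()([fc, bc])

# Decoder: unidirectional LSTM initialized with the concatenated encoder states.
dec_inputs = keras.Input(shape=(None,))
dec_emb = layers.Embedding(tgt_vocab, 128)(dec_inputs)
dec_out, _, _ = layers.LSTM(2 * latent_dim, return_sequences=True,
                            return_state=True)(dec_emb,
                                               initial_state=[state_h, state_c])
outputs = layers.Dense(tgt_vocab, activation="softmax")(dec_out)

model = keras.Model([enc_inputs, dec_inputs], outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```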

Attention Mechanism

One drawback of the Seq2Seq model is that it cannot retain the complete information of a very long sentence: individual words may be forgotten, and the decoder, having no access to the full sentence, cannot produce a correct translation.
For this reason, researchers introduced the attention mechanism. Its key characteristics are as follows:

  • Attention tremendously improves Seq2Seq model.
  • With attention, Seq2Seq model does not forget source input.
  • With attention, the decoder knows where to focus.
  • Downside: much more computation
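To make "knows where to focus" concrete, here is a minimal NumPy sketch of attention weights at one decoder step, using dot-product alignment scores (one common choice; the lecture's exact score function may differ). Random vectors stand in for real encoder and decoder states.

```python
# Sketch of attention weights at one decoder step (dot-product scores).
# Random vectors stand in for real encoder/decoder states; sizes are assumptions.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

m, d = 6, 256                             # m source positions, state size d
encoder_states = np.random.rand(m, d)     # h_1, ..., h_m from the encoder
decoder_state = np.random.rand(d)         # current decoder state s_t

scores = encoder_states @ decoder_state   # one alignment score per source position
weights = softmax(scores)                 # attention weights, non-negative and sum to 1
context = weights @ encoder_states        # weighted sum of encoder states (context vector)

print(weights.round(3), context.shape)    # the decoder consumes `context` at this step
```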

Transformer and BERT

  1. Transformer is a Seq2Seq model; it has an encoder and a decoder.
  2. Transformer model is not RNN.
  3. Transformer is based on attention and self-attention.
  4. BERT is for pre-training Transformer’s encoder.
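A minimal NumPy sketch of single-head, scaled dot-product self-attention, the building block point 3 refers to; the random projection matrices stand in for learned parameters, and all sizes are illustrative assumptions.

```python
# Sketch of scaled dot-product self-attention (single head, no masking).
# Random matrices stand in for learned parameters; sizes are assumptions.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                   # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # scaled dot products
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                                 # every position attends to all positions

n, d_model, d_k = 5, 64, 32                            # sequence length and dimensions
X = np.random.rand(n, d_model)                         # input token representations
Wq, Wk, Wv = (np.random.rand(d_model, d_k) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)             # (5, 32)
```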