NLP Basics

This blog post covers NLP fundamentals, including how the LSTM model works, ways to make RNNs more effective such as stacked RNNs and bidirectional RNNs, and the training process for automatic text generation. It also discusses the Seq2Seq model for machine translation and the importance of the attention mechanism, and finally touches on the roles of the Transformer and BERT in pre-training.

Recently I found a tutorial on Bilibili for learning NLP, so I will use this blog post to record the knowledge points covered in the course.

Long Short-Term Memory (LSTM) Model

  • LSTM uses a “conveyor belt” to get longer memory than SimpleRNN.

  • Each of the following blocks has a parameter matrix:

    • Forget gate
    • Input gate
    • New values
    • Output gate
  • Number of parameters: $4 \times \text{shape}(h) \times [\text{shape}(h) + \text{shape}(x)]$
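To make the parameter count concrete, here is a minimal Python sketch that evaluates the formula above; the hidden size 128 and input size 300 are illustrative assumptions, not numbers from the course.

```python
# Minimal sketch: evaluate the LSTM parameter-count formula above.
# h_dim = shape(h), x_dim = shape(x); the sizes used here are assumptions.
def lstm_param_count(h_dim: int, x_dim: int) -> int:
    # Four blocks (forget gate, input gate, new values, output gate),
    # each with a weight matrix of shape [h_dim, h_dim + x_dim].
    return 4 * h_dim * (h_dim + x_dim)

print(lstm_param_count(128, 300))  # 4 * 128 * (128 + 300) = 219136
# Note: frameworks such as Keras also add one bias vector per block,
# which contributes an extra 4 * h_dim parameters beyond this formula.
```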

Making RNNs More Effective

  • SimpleRNN and LSTM are two kinds of RNNs; always use LSTM instead of SimpleRNN.
  • Use Bi-RNN instead of RNN whenever possible.
  • A stacked RNN may be better than a single RNN layer (if the amount of training data, n, is big).
  • Pretrain the embedding layer (if the amount of training data, n, is small).
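As a rough illustration of the last point, here is a minimal Keras sketch that loads a pretrained embedding matrix into a frozen Embedding layer; the variable `embedding_matrix` and all sizes are hypothetical stand-ins (e.g. for word2vec or GloVe vectors).

```python
# Sketch: reuse a pretrained embedding layer and freeze it (useful when the
# training set is small). `embedding_matrix` is a hypothetical stand-in here.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, embed_dim = 10000, 100                        # illustrative sizes
embedding_matrix = np.random.rand(vocab_size, embed_dim)  # replace with real pretrained vectors

model = keras.Sequential([
    keras.Input(shape=(None,)),
    layers.Embedding(
        vocab_size, embed_dim,
        embeddings_initializer=keras.initializers.Constant(embedding_matrix),
        trainable=False,                                  # freeze the pretrained embeddings
    ),
    layers.LSTM(32),
    layers.Dense(1, activation="sigmoid"),
])
model.summary()
```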

Stacked RNN

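A minimal Keras sketch of a stacked RNN, with illustrative vocabulary and layer sizes: every LSTM layer except the top one returns its full output sequence so the next layer can consume it.

```python
# Minimal sketch of a stacked RNN in Keras (sizes are illustrative assumptions).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(None,)),
    layers.Embedding(10000, 32),
    layers.LSTM(32, return_sequences=True),   # pass the whole sequence to the next layer
    layers.LSTM(32, return_sequences=True),
    layers.LSTM(32),                          # top layer keeps only the final state
    layers.Dense(1, activation="sigmoid"),
])
model.summary()
```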

Bidirectional RNN

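A minimal Keras sketch of a bidirectional RNN, again with illustrative sizes: one LSTM reads the sequence left-to-right, another right-to-left, and their final states are concatenated.

```python
# Minimal sketch of a bidirectional RNN in Keras (sizes are illustrative assumptions).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(None,)),
    layers.Embedding(10000, 32),
    layers.Bidirectional(layers.LSTM(32)),    # forward + backward states, output size 64
    layers.Dense(1, activation="sigmoid"),
])
model.summary()
```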

Automatic Text Generation

Train a Neural Network

  1. Partition the text into (segment, next_char) pairs.
  2. One-hot encode the characters.
    • Character $\rightarrow$ $v \times 1$ vector.
    • Segment $\rightarrow$ $l \times v$ matrix.
  3. Build and train a neural network
    • $l \times v$ matrix $\Rightarrow$ LSTM $\Rightarrow$ Dense $\Rightarrow$ $v \times 1$ vector
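A minimal Keras sketch of steps 1-3, assuming a hypothetical corpus file `corpus.txt`, a segment length l = 40, and a stride of 3 (all illustrative choices).

```python
# Sketch of steps 1-3: (segment, next_char) pairs, one-hot encoding, and a
# char-level LSTM. The corpus file, l = 40 and stride = 3 are assumptions.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

text = open("corpus.txt").read().lower()          # hypothetical training text
chars = sorted(set(text))                         # v distinct characters
char_to_idx = {c: i for i, c in enumerate(chars)}
l, v, stride = 40, len(chars), 3

segments, next_chars = [], []
for i in range(0, len(text) - l, stride):
    segments.append(text[i:i + l])                # step 1: segment of length l
    next_chars.append(text[i + l])                #         and the character after it

x = np.zeros((len(segments), l, v), dtype=bool)   # step 2: each segment -> l x v matrix
y = np.zeros((len(segments), v), dtype=bool)      #         each next_char -> v x 1 vector
for i, seg in enumerate(segments):
    for t, ch in enumerate(seg):
        x[i, t, char_to_idx[ch]] = True
    y[i, char_to_idx[next_chars[i]]] = True

model = keras.Sequential([                        # step 3: l x v -> LSTM -> Dense -> v x 1
    keras.Input(shape=(l, v)),
    layers.LSTM(128),
    layers.Dense(v, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(x, y, batch_size=128, epochs=20)
```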

Text Generation

  1. Propose a seed segment
  2. Repeat the following steps:
    a) Feed the segment (with one-hot) to the neural network.
    b) The neural network outputs probabilities.
    c) next_char $\leftarrow$ sample from the probabilities.
    d) Append next_char to the segment.
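A minimal sketch of this generation loop, reusing `model`, `chars`, `char_to_idx`, `l` and `v` from the training sketch above; the seed text and temperature are arbitrary choices, and the seed is assumed to contain only characters seen in training.

```python
# Sketch of the generation loop: encode the segment, get probabilities,
# sample next_char, append it, and slide the window.
import numpy as np

def sample(probs, temperature=1.0):
    # Rescale the probabilities by a temperature and draw one character index.
    probs = np.log(np.asarray(probs, dtype=np.float64) + 1e-9) / temperature
    probs = np.exp(probs) / np.sum(np.exp(probs))
    return np.random.choice(len(probs), p=probs)

segment = "the meaning of life is ".ljust(l)[-l:]  # arbitrary seed, padded/cut to length l
generated = segment
for _ in range(400):
    x_pred = np.zeros((1, l, v), dtype=bool)
    for t, ch in enumerate(segment):
        x_pred[0, t, char_to_idx[ch]] = True       # a) one-hot encode the segment
    probs = model.predict(x_pred, verbose=0)[0]    # b) network outputs probabilities
    next_char = chars[sample(probs)]               # c) sample next_char
    generated += next_char
    segment = segment[1:] + next_char              # d) append and keep the last l chars
print(generated)
```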

Machine Translation and the Seq2Seq Model

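A minimal Keras sketch of a character-level Seq2Seq model, with illustrative vocabulary sizes and dimensions: the encoder LSTM's final states (h and c) are used to initialize the decoder LSTM.

```python
# Sketch of a Seq2Seq model (Keras functional API); sizes are illustrative.
from tensorflow import keras
from tensorflow.keras import layers

src_vocab, tgt_vocab, latent_dim = 70, 80, 256

# Encoder: read the source sequence, keep only the final states h and c.
enc_inputs = keras.Input(shape=(None, src_vocab))
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(enc_inputs)

# Decoder: start from the encoder states and predict the target sequence.
dec_inputs = keras.Input(shape=(None, tgt_vocab))
dec_outputs, _, _ = layers.LSTM(latent_dim, return_sequences=True,
                                return_state=True)(dec_inputs,
                                                   initial_state=[state_h, state_c])
dec_outputs = layers.Dense(tgt_vocab, activation="softmax")(dec_outputs)

model = keras.Model([enc_inputs, dec_inputs], dec_outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```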

How to Improve Seq2Seq

  1. Bi-LSTM instead of LSTM (Encoder only!!!)
    • Encoder’s final states ($h_t$ and $c_t$) have all the information of the English sentence.
    • If the sentence is long, the final states have forgotten early inputs.
    • Bi-LSTM (left-to-right and right-to-left) has longer memory.
    • Use Bi-LSTM in the encoder; use a unidirectional LSTM in the decoder (see the sketch after this list).
  2. Word-Level Tokenization
    • Word-Level tokenization instead of char-level.
      • The average length of English words is 4.5 letters.
      • The sequences will be 4.5x shorter.
      • Shorter sequence -> less likely to forget.
    • But you will need a large dataset:
      • The number of (frequently used) chars is ~$10^2$ $\rightarrow$ one-hot suffices.
      • The number of (frequently used) words is ~$10^4$ $\rightarrow$ embedding is a must.
      • The embedding layer has many parameters $\rightarrow$ overfitting!
  3. Multi-Task Learning (this way there is still only one encoder, but the amount of training data is doubled, so the encoder can be trained better)
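As referenced in item 1, here is a minimal Keras sketch combining the first two improvements: a word-level Bi-LSTM encoder whose forward and backward states are concatenated, feeding a unidirectional LSTM decoder. Vocabulary sizes and dimensions are illustrative assumptions.

```python
# Sketch: Bi-LSTM encoder (improvement 1) with word-level embeddings
# (improvement 2) and a unidirectional LSTM decoder. Sizes are assumptions.
from tensorflow import keras
from tensorflow.keras import layers

src_vocab, tgt_vocab, latent_dim = 10000, 10000, 256

# Encoder: Bi-LSTM reads the source left-to-right and right-to-left.
enc_inputs = keras.Input(shape=(None,))
enc_emb = layers.Embedding(src_vocab, 128)(enc_inputs)
_, fh, fc, bh, bc = layers.Bidirectional(
    layers.LSTM(latent_dim, return_state=True))(enc_emb)
state_h = layers.Concatenate()([fh, bh])      # concatenated states have size 2 * latent_dim
state_c = layers.Concatenate()([fc, bc])

# Decoder: unidirectional LSTM initialized with the concatenated encoder states.
dec_inputs = keras.Input(shape=(None,))
dec_emb = layers.Embedding(tgt_vocab, 128)(dec_inputs)
dec_out, _, _ = layers.LSTM(2 * latent_dim, return_sequences=True,
                            return_state=True)(dec_emb,
                                               initial_state=[state_h, state_c])
outputs = layers.Dense(tgt_vocab, activation="softmax")(dec_out)

model = keras.Model([enc_inputs, dec_inputs], outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```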

Attention Mechanism

One drawback of the Seq2Seq model is that it cannot retain the complete information of a very long sentence: individual words may be forgotten, and the decoder, having no access to the full sentence, cannot produce a correct translation.
For this reason, researchers introduced the attention mechanism. Its key characteristics are as follows:

  • Attention tremendously improves Seq2Seq model.
  • With attention, Seq2Seq model does not forget source input.
  • With attention, the decoder knows where to focus.
  • Downside: much more computation
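To make "knows where to focus" concrete, here is a minimal NumPy sketch of attention weights at one decoder step, using dot-product alignment scores (one common choice; the lecture's exact score function may differ). Random vectors stand in for real encoder and decoder states.

```python
# Sketch of attention weights at one decoder step (dot-product scores).
# Random vectors stand in for real encoder/decoder states; sizes are assumptions.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

m, d = 6, 256                             # m source positions, state size d
encoder_states = np.random.rand(m, d)     # h_1, ..., h_m from the encoder
decoder_state = np.random.rand(d)         # current decoder state s_t

scores = encoder_states @ decoder_state   # one alignment score per source position
weights = softmax(scores)                 # attention weights, non-negative and sum to 1
context = weights @ encoder_states        # weighted sum of encoder states (context vector)

print(weights.round(3), context.shape)    # the decoder consumes `context` at this step
```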

Transformer and BERT

  1. Transformer is a Seq2Seq model; it has an encoder and a decoder.
  2. Transformer model is not RNN.
  3. Transformer is based on attention and self-attention.
  4. BERT is for pre-training Transformer’s encoder.
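A minimal NumPy sketch of single-head, scaled dot-product self-attention, the building block point 3 refers to; the random projection matrices stand in for learned parameters, and all sizes are illustrative assumptions.

```python
# Sketch of scaled dot-product self-attention (single head, no masking).
# Random matrices stand in for learned parameters; sizes are assumptions.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                   # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # scaled dot products
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                                 # every position attends to all positions

n, d_model, d_k = 5, 64, 32                            # sequence length and dimensions
X = np.random.rand(n, d_model)                         # input token representations
Wq, Wk, Wv = (np.random.rand(d_model, d_k) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)             # (5, 32)
```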