Evaluation of LM
- Extrinsic
- Intrinsic
- Correlate the two for validation purposes
Intrinsic: Perplexity
- Does the model fit the data?
- A good model will give high probability to a real sentence.
- Perplexity
Per = \sqrt[N]{\frac{1}{P(w_1, w_2, \ldots, w_N)}}
- Average branching factor in predicting the next word
- Lower perplexity -> higher probability
- Logarithmic version
Per = 2^{-\frac{1}{N}\sum_i \log_2 P(w_i)}
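The logarithmic form above can be sketched directly in code; this assumes the model's per-word probabilities P(w_i) are already available as a list:

```python
import math

def perplexity(probs):
    """Perplexity via the logarithmic form: Per = 2^(-(1/N) * sum(log2 P(w_i)))."""
    n = len(probs)
    return 2 ** (-sum(math.log2(p) for p in probs) / n)

# A model assigning 1/4 to each of 4 words has average branching factor 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # -> 4.0
```

The log form is numerically safer than multiplying N small probabilities, which would underflow for long texts.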
Cross entropy
H(p, q) = -\sum_x p(x) \log q(x)
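A minimal sketch of the cross-entropy formula, assuming p and q are given as dicts over the same events (the dict representation is an illustrative choice, not from the notes):

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log2 q(x), in bits."""
    return -sum(px * math.log2(q[x]) for x, px in p.items() if px > 0)

p = {"a": 0.5, "b": 0.5}
print(cross_entropy(p, p))  # when q = p this reduces to the entropy of p: 1.0 bit
```

Note the connection to the previous formula: perplexity is 2 raised to the cross entropy of the model distribution against the empirical distribution of the data.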
Word error rate
- number of insertions, deletions and substitutions
- normalized by sentence length
- same as Levenshtein edit distance, but at the word level
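The steps above can be sketched as a standard dynamic-programming edit distance over word sequences, normalized by the reference length:

```python
def wer(reference, hypothesis):
    """Word error rate: (insertions + deletions + substitutions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # match or substitution
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)  # deletion, insertion
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sat down"))  # one insertion over 3 words
```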
Issues
Out-of-vocabulary words (OOV)
- split the training set into 2 parts
- label all words in part 2 that were not in part 1 as UNK
- The estimates for UNK will be used in the estimation for the unknown words in test data
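A minimal sketch of the split-and-relabel scheme above, assuming both parts are already tokenized into word lists (the token name "<UNK>" is an illustrative choice):

```python
def mark_unknowns(part1, part2):
    """Replace every word in part2 that never occurs in part1 with the UNK token."""
    vocab = set(part1)
    return [w if w in vocab else "<UNK>" for w in part2]

part1 = "the cat sat on the mat".split()
part2 = "the dog sat on the rug".split()
print(mark_unknowns(part1, part2))  # "dog" and "rug" become <UNK>
```

Counts for "<UNK>" in the relabeled part 2 then stand in for any test word outside the vocabulary.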
Clustering
- group similar tokens into classes, e.g., dates, monetary amounts, organizations, years
Long distance dependencies
- By definition, an n-gram model cannot capture dependencies spanning more than n words
- missing syntactic information
- The students who participated in the game are tired.
- missing semantic information
- The pizza that I had yesterday was tasty.
- The class that I had yesterday was interesting.
Other ideas in LM
- Syntactic model
- condition words on other words that appear in a specific syntactic relation with them
- Caching model
- take advantage of the fact that words appear in bursts
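One common way to realize a caching model is to interpolate a base probability with a count estimate from the recent history; this is a hedged sketch, and the weight `lam` and the function names are illustrative choices, not from the notes:

```python
from collections import Counter

def cache_prob(word, history, base_prob, lam=0.9):
    """Interpolate a base LM probability with a cache estimate from recent words.

    lam weights the base model; (1 - lam) weights the cache. Both are
    illustrative values, not prescribed by the notes.
    """
    counts = Counter(history)
    cache = counts[word] / len(history) if history else 0.0
    return lam * base_prob + (1 - lam) * cache

# A recently bursty word gets a boost over its base probability.
print(cache_prob("the", ["the", "the", "cat", "the"], base_prob=0.05))
```

The cache term rewards words that have just occurred, which is exactly the burstiness the bullet above refers to.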