Stanford NLP Course(3) - Language Modeling

同一轮月亮


Goal: assign a probability to a sentence

Machine Translation - which candidate word fits the meaning and structure better
Spell Correction
Speech Recognition - two words can have the same pronunciation
etc.

The chain rule:
P(ABCD)=P(A)P(B|A)P(C|AB)P(D|ABC)
We cannot estimate the terms of the chain rule simply by counting, because there are too many possible sentences and most of them never appear in any corpus.
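
For reference, the four-event identity above generalizes to a sentence of n words. The notation w_1 … w_n for the words is an assumption chosen to match the formulas later in these notes, and the three-word sentence is only an illustrative example:

```latex
P(w_1 w_2 \dots w_n) = \prod_{i=1}^{n} P(w_i \mid w_1 w_2 \dots w_{i-1})

% Illustrative example for the sentence "I am Sam":
P(\text{I am Sam}) = P(\text{I}) \, P(\text{am} \mid \text{I}) \, P(\text{Sam} \mid \text{I am})
```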

Markov Assumption:
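Counts for whole sentences are too sparse, so the probabilities computed from them are tiny and not representative. We therefore model each word using only the few words adjacent to it (the preceding k words).
k=0 Unigram model: P(w1 w2 … wn) ≈ Π P(wi)
k=1 Bigram model: P(wi | w1 w2 … wi-1) ≈ P(wi | wi-1)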
……
In general, this is an insufficient model of language, because language has long-distance dependencies. In most practical cases, however, it works well enough.
Estimating bigram probabilities with MLE: P(wi | wi-1) = count(wi-1, wi) / count(wi-1)
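
The MLE estimate above can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation: the three-sentence toy corpus, the <s> and </s> boundary markers, and the function name bigram_prob are all assumptions introduced for the example.

```python
from collections import Counter

# Toy corpus (hypothetical); <s> and </s> mark sentence boundaries.
corpus = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I do not like green eggs and ham </s>",
]

unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigram_counts.update(tokens)
    # Pair each token with its successor to count adjacent word pairs.
    bigram_counts.update(zip(tokens, tokens[1:]))


def bigram_prob(prev, word):
    """MLE estimate: count(prev, word) / count(prev)."""
    if unigram_counts[prev] == 0:
        return 0.0  # the context word was never seen
    return bigram_counts[(prev, word)] / unigram_counts[prev]


print(bigram_prob("<s>", "I"))    # 2 of the 3 sentences start with "I"
print(bigram_prob("am", "Sam"))   # "am" occurs twice, once followed by "Sam"
print(bigram_prob("Sam", "ham"))  # unseen bigram, so the estimate is 0.0
```

The last call shows the weakness of plain counting: a pair that never occurred in the training data gets probability zero, even if it is a legal combination.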

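Knowledge captured by these statistics:
Knowledge about the world
Knowledge about grammar
A word combination may be perfectly legal and still never appear in the training data, in which case the MLE estimate assigns it zero probability.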

We do everything in log space:

Avoid underflow - the product of many probabilities becomes extremely small and hard to compute
Adding is faster than multiplying
log(p1*p2*p3*p4) = log p1 + log p2 + log p3 + log p4
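
A small sketch of why log space matters, assuming a hypothetical sentence of 100 words that each have probability 1e-5 (these numbers are made up for illustration). The direct product falls below what a 64-bit float can represent, while the sum of logs stays an ordinary number:

```python
import math

# Hypothetical per-word probabilities for a 100-word sentence.
probs = [1e-5] * 100

# Direct multiplication: the true value is 1e-500, which is smaller
# than the smallest positive float64, so the result collapses to 0.0.
product = 1.0
for p in probs:
    product *= p

# Log space: add the logs instead of multiplying the probabilities.
log_prob = sum(math.log(p) for p in probs)

print(product)   # 0.0 (underflow)
print(log_prob)  # a finite negative number, roughly -1151.3
```

Two models can still be compared through their log probabilities, because the logarithm preserves ordering.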

Evaluation: How good is our model?
A model is good if it assigns a higher probability to “real” or “frequently observed” sentences than to ungrammatical or rarely observed ones.
Extrinsic evaluation: put each model into a task (e.g. a spelling corrector, speech recognizer, or MT system) and compare the accuracy. This is time-consuming.
So we sometimes use an intrinsic evaluation instead: perplexity.
Perplexity is a good approximation only when the test data looks just like the training data, but it is still helpful to think about.

Intuition of perplexity:
The Shannon Game: How well can we predict the next word?
lower perplexity = better model
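Perplexity formula: PP(W) = P(w1 w2 … wN)^(-1/N), i.e. the inverse probability of the test set, normalized by the number of words N.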
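
A minimal sketch of computing perplexity, assuming the standard definition (the inverse probability of the test sequence, normalized by the number of words N) and computed in log space to avoid underflow. The function name perplexity and the input lists are hypothetical; each list holds the probability a model assigned to each word of a test sequence.

```python
import math


def perplexity(word_probs):
    """Perplexity of a sequence given the probability assigned to each word.

    Computed as exp(-(1/N) * sum(log p_i)), which equals
    (p_1 * p_2 * ... * p_N) ** (-1/N) without underflowing.
    """
    n = len(word_probs)
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / n)


# A model that gives every word probability 0.1 (for example, guessing
# uniformly among ten digits) has a perplexity of about 10.
print(perplexity([0.1] * 20))

# A model that assigns higher probabilities gets a lower perplexity.
print(perplexity([0.5] * 20))
```

This matches the intuition of the Shannon Game: perplexity behaves like the average number of equally likely choices the model faces at each word.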
