Stanford NLP Course (3) - Language Modeling

Goal: assign a probability to a sentence

  • Machine Translation - which word fits the meaning and structure better
  • Spell correction
  • Speech Recognition - deciding between words that have the same pronunciation
  • etc.

The chain rule:
P(ABCD)=P(A)P(B|A)P(C|AB)P(D|ABC)
We cannot estimate these probabilities simply by counting whole histories, because there are far too many possible word sequences.
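For example, applied to a concrete sentence (the sentence here is just an illustration):
P("its water is so transparent") = P(its) × P(water | its) × P(is | its water) × P(so | its water is) × P(transparent | its water is so)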

Markov Assumption:
Counting whole sentences gives probabilities that are tiny and counts that are too sparse to be representative, so instead we condition each word only on a few neighboring words.
k=0 Unigram model: P(w1w2…wn) ≈ ∏ P(wi)
k=1 Bigram model: P(wi|w1w2…wi-1) ≈ P(wi|wi-1)
……
In general, this is insufficient, because language has long-distance dependencies. But in most cases, this is enough.
Estimating bigram probabilities with MLE: P(wi|wi-1) = count(wi-1, wi) / count(wi-1)
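As a concrete illustration, here is a minimal Python sketch of this count-and-divide MLE estimate. The toy corpus, the `<s>`/`</s>` boundary tokens, and the `bigram_prob` helper are assumptions made for the example, not part of the lecture.

```python
# Minimal bigram MLE estimation on a toy corpus (corpus and helper names are illustrative).
from collections import Counter

corpus = [
    "<s> i am sam </s>",
    "<s> sam i am </s>",
    "<s> i do not like green eggs and ham </s>",
]

unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))

def bigram_prob(prev_word, word):
    """MLE estimate: P(word | prev_word) = count(prev_word, word) / count(prev_word)."""
    if unigram_counts[prev_word] == 0:
        return 0.0
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]

print(bigram_prob("<s>", "i"))  # 2/3: "<s> i" occurs twice, "<s>" occurs three times
print(bigram_prob("i", "am"))   # 2/3
```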

Knowledge captured by these statistics:
World knowledge
Grammar
It is possible that a word combination is perfectly legal but simply never appeared in the training data.

We do everything in log space:

  • Avoid underflow - the product of many small probabilities becomes extremely small and hard to represent (see the sketch below)
  • Adding is faster than multiplying
    log(p1*p2*p3*p4) = log p1 + log p2 + log p3 + log p4
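A quick sketch of the underflow issue; the probability values are made up for illustration.

```python
# Why log space: multiplying many small probabilities underflows to 0.0,
# while summing their logs stays well within floating-point range.
import math

probs = [1e-5] * 100  # 100 word probabilities of 0.00001 each (made-up values)

product = 1.0
for p in probs:
    product *= p
print(product)        # 0.0 -- the true value 1e-500 underflows double precision

log_prob = sum(math.log(p) for p in probs)
print(log_prob)       # about -1151.3, easily representable
```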

Evaluation: How good is our model?
A good model assigns higher probability to a "real" or "frequently observed" sentence.
Extrinsic evaluation: put each model into a task, e.g. a spelling corrector, speech recognizer, or MT system, and compare the accuracies. This is time-consuming.
So we sometimes use an intrinsic evaluation instead: perplexity.
Perplexity is good only when the test data looks just like the training data, but is helpful to think about.

Intuition of perplexity:
The Shannon Game: How well can we predict the next word?
lower perplexity = better model
Perplexity formula: PP(W) = P(w1w2…wN)^(-1/N), i.e. the N-th root of 1/P(w1w2…wN); for a bigram model, PP(W) = (∏ 1/P(wi|wi-1))^(1/N)
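Below is a hedged Python sketch of computing bigram-model perplexity via the log-space form PP = exp(-(1/N) · Σ log P(wi|wi-1)); the probability table and test sentence are made-up illustrative values, not estimates from real data.

```python
# Perplexity of a bigram model over tokenized test sentences (illustrative values only).
import math

def perplexity(sentences, bigram_prob):
    """PP = exp(-(1/N) * sum of log P(wi | wi-1)) over all predicted words."""
    total_log_prob = 0.0
    n_predictions = 0
    for tokens in sentences:
        for prev_word, word in zip(tokens, tokens[1:]):
            p = bigram_prob(prev_word, word)
            if p == 0.0:
                return float("inf")  # an unseen bigram; real systems use smoothing here
            total_log_prob += math.log(p)
            n_predictions += 1
    return math.exp(-total_log_prob / n_predictions)

# Toy probability table (made-up values).
probs = {("<s>", "i"): 0.6, ("i", "am"): 0.5, ("am", "sam"): 0.4, ("sam", "</s>"): 0.5}
test = [["<s>", "i", "am", "sam", "</s>"]]
print(perplexity(test, lambda prev, w: probs.get((prev, w), 0.0)))  # about 2.02
```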
