Probabilistic language model
- Assign a probability to a sentence
- P(S)=P(w1,w2,...,wn)
- Different from deterministic methods based on a CFG
- The probabilities of all possible sentences must sum to 1
Predicting the next word
P(wn∣w1,w2,...,wn−1)
Uses of LM
- Speech recognition
- P(recognize speech) > P(wreck a nice beach)
- Text generation
- P(three houses) > P(three house)
- Spelling correction
- P(my cat eats fish) > P(my xat eats fish)
- Machine translation
- P(the blue house) > P(the house blue)
- OCR
- …
Probability of a sentence
P(S)=P(w1,w2,...,wn)=P(w1)P(w2∣w1)...P(wn∣w1,w2,...,wn−1)
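A minimal sketch of the chain rule applied to a three-word sentence; the conditional probabilities below are made-up values chosen only to illustrate the decomposition:

```python
# Chain rule: P(S) = P(w1) * P(w2 | w1) * ... * P(wn | w1, ..., wn-1).
# The conditional probabilities here are illustrative, not from a real corpus.
sentence = ["the", "blue", "house"]
conditional_probs = {
    ("the",): 0.05,                  # P(the)
    ("the", "blue"): 0.01,           # P(blue | the)
    ("the", "blue", "house"): 0.20,  # P(house | the, blue)
}

p_sentence = 1.0
for i in range(len(sentence)):
    p_sentence *= conditional_probs[tuple(sentence[: i + 1])]

print(p_sentence)  # 0.05 * 0.01 * 0.20 = 1e-4
```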
N-gram model
- Markov assumption: only look at limited history
- Unigram
- Bigram
- Trigram
- It is possible to go to higher orders (4-grams, 5-grams); see the sketch below
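A minimal sketch of what the Markov assumption does to the history; the function name and the example words are made up for illustration:

```python
# Markov assumption: instead of the full history w1..w_{n-1}, an n-gram model
# conditions only on the last (n-1) words.
def markov_history(history, order):
    """Return the truncated history an n-gram model of the given order uses."""
    if order == 1:                        # unigram: ignore the history entirely
        return ()
    return tuple(history[-(order - 1):])  # bigram: last word, trigram: last two

history = ["i", "want", "to", "eat", "chinese"]
print(markov_history(history, 1))  # ()
print(markov_history(history, 2))  # ('chinese',)
print(markov_history(history, 3))  # ('eat', 'chinese')
```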
N-grams
- Shakespeare unigrams
- 29524 types, approx 900k tokens
- Bigrams
- 346097 types
- Sparse data!!
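A quick back-of-the-envelope check of the sparsity, using the counts above and treating every vocabulary pair as a possible bigram:

```python
# Rough sparsity check with the Shakespeare counts above: the number of possible
# bigrams over the vocabulary vastly exceeds the bigram types actually observed.
vocab_size = 29_524
observed_bigram_types = 346_097
possible_bigrams = vocab_size ** 2
print(possible_bigrams)                          # 871,666,576
print(observed_bigram_types / possible_bigrams)  # ~0.0004: most bigrams never occur
```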
Estimation
We cannot estimate the conditional probabilities directly because of data sparsity, so we rely on the Markov assumption.
MLE
Estimate probabilities from counts in the training data
Unigram Example
- The word pizza appears 700 times in a corpus of 1 × 10^7 words
P_ML(pizza) = 700 / (1 × 10^7) = 7 × 10^−5
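The same arithmetic as a quick check in Python, with the counts taken from the example above:

```python
# Unigram MLE: P_ML(pizza) = count(pizza) / N
count_pizza = 700
corpus_size = 10**7
print(count_pizza / corpus_size)  # 7e-05
```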
Bigram Example
- The word with appears 1000 times in the corpus
- The phrase with spinach appears 6 times
P_ML(spinach | with) = count(with spinach) / count(with) = 6 / 1000 = 0.006
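A minimal sketch of how such estimates could be computed from raw counts; the helper `bigram_mle` and the tiny corpus are assumptions for illustration:

```python
from collections import Counter

# Bigram MLE from raw counts: P_ML(word | prev) = count(prev word) / count(prev).
def bigram_mle(tokens):
    """Return a function (word, prev) -> P_ML(word | prev) built from counts."""
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return lambda word, prev: bigram_counts[(prev, word)] / unigram_counts[prev]

p_ml = bigram_mle("rice with spinach and pasta with cheese".split())
print(p_ml("spinach", "with"))  # count(with spinach)=1 / count(with)=2 = 0.5

# With the counts from the example above: count(with)=1000, count(with spinach)=6
print(6 / 1000)  # 0.006
```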
The estimates are domain-dependent and may not carry over well to other genres
N-grams and regular languages
- N-grams are just one way to represent weighted regular languages
Generative models
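One concrete reading of this heading: an n-gram model can generate text by sampling one word at a time from its conditional distributions. A minimal bigram sampler sketch; the toy corpus and the <s>/</s> sentence markers are assumptions for illustration:

```python
import random
from collections import Counter, defaultdict

# Sample from a bigram model: repeatedly draw the next word from
# P(w | previous word) until the end-of-sentence marker is drawn.
tokens = "<s> the blue house </s> <s> the blue sea </s>".split()
bigram_counts = defaultdict(Counter)
for prev, word in zip(tokens, tokens[1:]):
    bigram_counts[prev][word] += 1

def sample_sentence(max_len=10):
    word, out = "<s>", []
    for _ in range(max_len):
        nxt = bigram_counts[word]
        word = random.choices(list(nxt), weights=list(nxt.values()))[0]
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)

print(sample_sentence())  # e.g. "the blue house" or "the blue sea"
```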
Engineering trick
- The MLE values are often on the order of 10^−6 or less
- Multiplying 20 such values gives a number on the order of 10^−120
- This leads to underflow
- Use (base 10) logarithms instead
- 10^−6 becomes −6
- Use sums instead of products
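A small sketch of the trick, using 20 illustrative probabilities of 10^−6 each:

```python
import math

# Log-space trick: products of many small probabilities head toward underflow,
# while sums of base-10 logs stay well behaved.
probs = [1e-6] * 20              # illustrative MLE values, as in the bullets above

product = 1.0
for p in probs:
    product *= p
print(product)                   # 1e-120; longer sentences would underflow to 0.0

log_prob = sum(math.log10(p) for p in probs)
print(log_prob)                  # -120.0, since each 1e-6 contributes -6
```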