Language Model
Ⅰ Language Model: A Survey of the State-of-the-Art Technology
The goal of language modelling is to estimate the probability distribution of various linguistic units, e.g., words, sentences, etc.
Reference: Language Model: A Survey of the State-of-the-Art Technology.
This paper (really more of a survey/blog) covers language models from two angles: count-based models and continuous-space models.
1.1 Count-based language models
Count-based models such as n-grams rely on an n-th-order Markov assumption: $w_t$ depends only on the preceding n words. The conditional probability then becomes:

$P(w_t \mid w_1, \ldots, w_{t-1}) \approx P(w_t \mid w_{t-n}, \ldots, w_{t-1})$
However, four problems arise:
1. Data sparsity: an unseen sentence gets probability 0 (this can be alleviated by smoothing and similar techniques).
2. The curse of dimensionality: the number of parameters, $|V|^{n}$, is far too large.
3. Exact-pattern matching: "the cat is walking in the bedroom" and "a dog was running in the room" are syntactically and semantically similar, yet the model treats them as completely different.
4. Dependencies beyond the window are ignored.
Moreover, count-based modeling does not capture the true conditional probability.
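The data-sparsity problem and its smoothing fix can be sketched with a toy bigram model. The two-sentence corpus reuses the survey's example sentences; the add-one (Laplace) scheme here is just one illustrative smoothing choice, not the survey's specific recommendation:

```python
from collections import Counter

# Toy corpus built from the survey's two example sentences.
corpus = [
    "the cat is walking in the bedroom".split(),
    "a dog was running in the room".split(),
]

# Count unigrams and bigrams over the corpus.
unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(
    (sent[i], sent[i + 1]) for sent in corpus for i in range(len(sent) - 1)
)
V = len(unigrams)  # vocabulary size

def p_mle(w, prev):
    """Maximum-likelihood bigram probability: zero for unseen pairs."""
    return bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0

def p_laplace(w, prev):
    """Add-one (Laplace) smoothed probability: never zero."""
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)

# "the dog" never occurs, so MLE assigns it probability 0;
# smoothing redistributes mass so every bigram stays > 0.
print(p_mle("dog", "the"))           # → 0.0
print(p_laplace("dog", "the") > 0)   # → True
```

Any sentence containing even one unseen bigram would get probability 0 under the raw counts, which is exactly problem 1 above.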
1.2 Continuous-space language models
NNLM: the ancestor of word embeddings. It addresses data sparsity because words are mapped into a continuous embedding space, where similar words get similar vectors, so unseen word sequences can still receive sensible probability from similar seen ones. RNNs, in turn, break the limited-context constraint.
1.2.1 Feed-forward neural network based LM
Y. Bengio, A Neural Probabilistic Language Model
Pros: it alleviates both data sparsity and the curse of dimensionality. Sparsity is alleviated because similar words receive similar embeddings, so an unseen n-gram borrows probability from seen n-grams with similar words; the dimensionality problem shrinks because the model learns dense low-dimensional representations instead of counting over $|V|^{n}$ contexts.
Cons: training and testing take too long, which motivated many speed-up techniques (e.g., hierarchical softmax over the output vocabulary).
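The forward pass of a Bengio-style FFNN LM can be sketched in a few lines. This is a minimal numpy sketch with untrained random weights and made-up sizes (V, d, n, h are illustrative assumptions); it only shows the shape of the computation: look up embeddings in a shared matrix C, concatenate, pass through a hidden layer, and softmax over the vocabulary:

```python
import numpy as np

rng = np.random.default_rng(0)

V, d, n, h = 11, 8, 2, 16  # vocab size, embedding dim, context length, hidden size
C = rng.normal(size=(V, d))      # shared embedding matrix (Bengio's C)
H = rng.normal(size=(n * d, h))  # hidden-layer weights
U = rng.normal(size=(h, V))      # output-layer weights

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def ffnn_lm(context_ids):
    """P(w_t | previous n words): look up, concatenate, project, softmax."""
    x = np.concatenate([C[i] for i in context_ids])  # (n*d,)
    hidden = np.tanh(x @ H)                          # (h,)
    return softmax(hidden @ U)                       # distribution over V words

probs = ffnn_lm([3, 7])  # ids of the two preceding words
print(probs.shape, abs(probs.sum() - 1.0) < 1e-9)
```

Because C is shared across all positions, two contexts containing similar words end up with nearby inputs x and hence similar output distributions; that is the mechanism behind the generalization claimed above. It also makes clear the cost: every prediction touches the full |V|-sized output layer, which is why testing is slow.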
1.2.2 Recurrent neural networks (RNN)
- With an FNN, one must first decide the context size; although a fixed context size is quite effective, this parameter is hard to determine.
- Because an RNN is a dynamic system, a signal passed repeatedly through the network can blow up exponentially at the output (the exploding-gradient problem).
- In statistical LM applications, comparisons between RNNs and FNNs usually favor RNNs. The reasons become clear in the Advanced Models section below.
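The "dynamic system" point can be made concrete with a minimal Elman-style RNN LM. As with the FFNN sketch, the weights are untrained and the sizes are illustrative assumptions; the point is that the hidden state folds in the entire history, not a fixed window of n words, and that the same matrix W is applied at every step (which is exactly why signals can grow or shrink exponentially over long sequences):

```python
import numpy as np

rng = np.random.default_rng(1)
V, h = 11, 16                      # vocab size, hidden size (illustrative)
E = rng.normal(size=(V, h)) * 0.1  # input embeddings
W = rng.normal(size=(h, h)) * 0.1  # recurrent weights, reused at every step
U = rng.normal(size=(h, V)) * 0.1  # output weights

def rnn_lm(word_ids):
    """Run the RNN over a sequence, emitting one next-word
    distribution per position; the state summarizes ALL past words."""
    state = np.zeros(h)
    dists = []
    for i in word_ids:
        state = np.tanh(E[i] + state @ W)  # state absorbs every past word
        z = state @ U
        e = np.exp(z - z.max())
        dists.append(e / e.sum())          # softmax over the vocabulary
    return dists

dists = rnn_lm([0, 4, 2, 9, 1])  # arbitrary word ids
print(len(dists), dists[-1].shape)
```

Unlike the FFNN, no context size needs to be chosen up front, which answers the first bullet; the repeated multiplication by W is the source of the exponential blow-up described in the second.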
1.2.3 Advanced Models
Covers other models based on character-level, sentence-level, and similar units; I didn't fully grasp this part.