Language Model
Ⅰ Language Model: A Survey of the State-of-the-Art Technology
The goal of language modelling is to estimate the probability distribution of various linguistic units, e.g., words, sentences, etc.
Reference: Language Model: A Survey of the State-of-the-Art Technology.
This paper (really more of a survey/blog) covers language models from two angles: count-based models and continuous-space models.
1.1 Count-based language models
Count-based models such as n-grams rely on an n-th-order Markov assumption: $w_t$ depends only on the preceding n words. The conditional probability then becomes:

$P(w_t \mid w_1, \ldots, w_{t-1}) \approx P(w_t \mid w_{t-n}, \ldots, w_{t-1})$
However, four problems arise:
1. Data sparsity: an unseen sentence gets probability 0 (this can be alleviated by smoothing and similar techniques).
2. The curse of dimensionality: the number of parameters, $|V|^{n}$, is far too large.
3. Exact-pattern matching: "the cat is walking in the bedroom" and "a dog was running in the room" are syntactically and semantically similar, yet the model treats them as completely different.
4. Dependencies beyond the window are ignored.
Moreover, count-based modeling does not capture the true conditional probability.
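The data-sparsity problem and its smoothing fix can be sketched with a toy bigram model. The two-sentence corpus reuses the survey's example sentences; the add-one (Laplace) scheme here is just one illustrative smoothing choice, not the survey's specific recommendation:

```python
from collections import Counter

# Toy corpus built from the survey's two example sentences.
corpus = [
    "the cat is walking in the bedroom".split(),
    "a dog was running in the room".split(),
]

# Count unigrams and bigrams over the corpus.
unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(
    (sent[i], sent[i + 1]) for sent in corpus for i in range(len(sent) - 1)
)
V = len(unigrams)  # vocabulary size

def p_mle(w, prev):
    """Maximum-likelihood bigram probability: zero for unseen pairs."""
    return bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0

def p_laplace(w, prev):
    """Add-one (Laplace) smoothed probability: never zero."""
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)

# "the dog" never occurs, so MLE assigns it probability 0;
# smoothing redistributes mass so every bigram stays > 0.
print(p_mle("dog", "the"))           # → 0.0
print(p_laplace("dog", "the") > 0)   # → True
```

Any sentence containing even one unseen bigram would get probability 0 under the raw counts, which is exactly problem 1 above.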
1.2 Continuous-space language models
NNLM: the ancestor of word embeddings. It addresses data sparsity because words are mapped into a continuous embedding space, where similar words get similar vectors, so unseen word sequences can still receive sensible probability from similar seen ones. RNNs, in turn, break the limited-context constraint.
1.2.1 Feed-forward neural network based LM
Y. Bengio, A Neural Probabilistic Language Model
Pros: it alleviates both data sparsity and the curse of dimensionality. Sparsity is alleviated because similar words receive similar embeddings, so an unseen n-gram borrows probability from seen n-grams with similar words; the dimensionality problem shrinks because the model learns dense low-dimensional representations instead of counting over $|V|^{n}$ contexts.
Cons: training and testing take too long, which motivated many speed-up techniques (e.g., hierarchical softmax over the output vocabulary).
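The forward pass of a Bengio-style FFNN LM can be sketched in a few lines. This is a minimal numpy sketch with untrained random weights and made-up sizes (V, d, n, h are illustrative assumptions); it only shows the shape of the computation: look up embeddings in a shared matrix C, concatenate, pass through a hidden layer, and softmax over the vocabulary:

```python
import numpy as np

rng = np.random.default_rng(0)

V, d, n, h = 11, 8, 2, 16  # vocab size, embedding dim, context length, hidden size
C = rng.normal(size=(V, d))      # shared embedding matrix (Bengio's C)
H = rng.normal(size=(n * d, h))  # hidden-layer weights
U = rng.normal(size=(h, V))      # output-layer weights

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def ffnn_lm(context_ids):
    """P(w_t | previous n words): look up, concatenate, project, softmax."""
    x = np.concatenate([C[i] for i in context_ids])  # (n*d,)
    hidden = np.tanh(x @ H)                          # (h,)
    return softmax(hidden @ U)                       # distribution over V words

probs = ffnn_lm([3, 7])  # ids of the two preceding words
print(probs.shape, abs(probs.sum() - 1.0) < 1e-9)
```

Because C is shared across all positions, two contexts containing similar words end up with nearby inputs x and hence similar output distributions; that is the mechanism behind the generalization claimed above. It also makes clear the cost: every prediction touches the full |V|-sized output layer, which is why testing is slow.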
1.2.2 Recurrent neural networks (RNN)
- With an FNN, one must first decide the context size; although a fixed context size is quite effective, this parameter is hard to determine.
- Because an RNN is a dynamic system, a signal passed repeatedly through the network can blow up exponentially at the output (the exploding-gradient problem).
- In statistical LM applications, comparisons between RNNs and FNNs usually favor RNNs. The reasons become clear in the Advanced Models section below.
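The "dynamic system" point can be made concrete with a minimal Elman-style RNN LM. As with the FFNN sketch, the weights are untrained and the sizes are illustrative assumptions; the point is that the hidden state folds in the entire history, not a fixed window of n words, and that the same matrix W is applied at every step (which is exactly why signals can grow or shrink exponentially over long sequences):

```python
import numpy as np

rng = np.random.default_rng(1)
V, h = 11, 16                      # vocab size, hidden size (illustrative)
E = rng.normal(size=(V, h)) * 0.1  # input embeddings
W = rng.normal(size=(h, h)) * 0.1  # recurrent weights, reused at every step
U = rng.normal(size=(h, V)) * 0.1  # output weights

def rnn_lm(word_ids):
    """Run the RNN over a sequence, emitting one next-word
    distribution per position; the state summarizes ALL past words."""
    state = np.zeros(h)
    dists = []
    for i in word_ids:
        state = np.tanh(E[i] + state @ W)  # state absorbs every past word
        z = state @ U
        e = np.exp(z - z.max())
        dists.append(e / e.sum())          # softmax over the vocabulary
    return dists

dists = rnn_lm([0, 4, 2, 9, 1])  # arbitrary word ids
print(len(dists), dists[-1].shape)
```

Unlike the FFNN, no context size needs to be chosen up front, which answers the first bullet; the repeated multiplication by W is the source of the exponential blow-up described in the second.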
1.2.3 Advanced Models
Covers other models based on character-level, sentence-level, and similar units; I didn't fully grasp this part.