Evaluation of LM
- Extrinsic
- Intrinsic
- Correlate the two for validation purposes
Intrinsic: Perplexity
- Does the model fit the data?
- A good model will give high probability to a real sentence.
- Perplexity
Per = \sqrt[N]{\frac{1}{P(w_1, w_2, \ldots, w_N)}}
- Average branching factor in predicting the next word
- Lower perplexity -> higher probability
- Logarithmic version
Per = 2^{-\frac{1}{N}\sum_i \log_2 P(w_i)}
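The logarithmic form above can be sketched directly in code; this assumes the model's per-word probabilities P(w_i) are already available as a list:

```python
import math

def perplexity(probs):
    """Perplexity via the logarithmic form: Per = 2^(-(1/N) * sum(log2 P(w_i)))."""
    n = len(probs)
    return 2 ** (-sum(math.log2(p) for p in probs) / n)

# A model assigning 1/4 to each of 4 words has average branching factor 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # -> 4.0
```

The log form is numerically safer than multiplying N small probabilities, which would underflow for long texts.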
Cross entropy
H(p, q) = -\sum_x p(x) \log q(x)
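A minimal sketch of the cross-entropy formula, assuming p and q are given as dicts over the same events (the dict representation is an illustrative choice, not from the notes):

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log2 q(x), in bits."""
    return -sum(px * math.log2(q[x]) for x, px in p.items() if px > 0)

p = {"a": 0.5, "b": 0.5}
print(cross_entropy(p, p))  # when q = p this reduces to the entropy of p: 1.0 bit
```

Note the connection to the previous formula: perplexity is 2 raised to the cross entropy of the model distribution against the empirical distribution of the data.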
Word error rate
- number of insertions, deletions and substitutions
- normalized by sentence length
- same as Levenshtein edit distance, but at the word level
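The steps above can be sketched as a standard dynamic-programming edit distance over word sequences, normalized by the reference length:

```python
def wer(reference, hypothesis):
    """Word error rate: (insertions + deletions + substitutions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # match or substitution
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)  # deletion, insertion
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sat down"))  # one insertion over 3 words
```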
Issues
Out-of-vocabulary words (OOV)
- split the training set into 2 parts
- label all words in part 2 that were not in part 1 as UNK
- The estimates for UNK will be used in the estimation for the unknown words in test data
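A minimal sketch of the split-and-relabel scheme above, assuming both parts are already tokenized into word lists (the token name "<UNK>" is an illustrative choice):

```python
def mark_unknowns(part1, part2):
    """Replace every word in part2 that never occurs in part1 with the UNK token."""
    vocab = set(part1)
    return [w if w in vocab else "<UNK>" for w in part2]

part1 = "the cat sat on the mat".split()
part2 = "the dog sat on the rug".split()
print(mark_unknowns(part1, part2))  # "dog" and "rug" become <UNK>
```

Counts for "<UNK>" in the relabeled part 2 then stand in for any test word outside the vocabulary.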
Clustering
- group similar tokens into classes, e.g., dates, monetary amounts, organizations, years
Long distance dependencies
- By definition, an n-gram model cannot capture dependencies spanning more than n words
- missing syntactic information
- The students who participated in the game are tired.
- missing semantic information
- The pizza that I had yesterday was tasty.
- The class that I had yesterday was interesting.
Other ideas in LM
- Syntactic model
- condition words on other words that appear in a specific syntactic relation with them
- Caching model
- take advantage of the fact that words appear in bursts
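One common way to realize a caching model is to interpolate a base probability with a count estimate from the recent history; this is a hedged sketch, and the weight `lam` and the function names are illustrative choices, not from the notes:

```python
from collections import Counter

def cache_prob(word, history, base_prob, lam=0.9):
    """Interpolate a base LM probability with a cache estimate from recent words.

    lam weights the base model; (1 - lam) weights the cache. Both are
    illustrative values, not prescribed by the notes.
    """
    counts = Counter(history)
    cache = counts[word] / len(history) if history else 0.0
    return lam * base_prob + (1 - lam) * cache

# A recently bursty word gets a boost over its base probability.
print(cache_prob("the", ["the", "the", "cat", "the"], base_prob=0.05))
```

The cache term rewards words that have just occurred, which is exactly the burstiness the bullet above refers to.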