One step up from context-free parsing.
Limitations of PCFGs
- Rule probabilities do not depend on the specific words in the sentence
- Sentences cannot be disambiguated using lexical or semantic information
- Idea: lexicalized grammar
  - Use the head word of each phrase as an additional source of information
  - Example: VP[ate] -> V[ate], where the VP carries the head word of its verb
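
As a rough illustration of head annotation, the sketch below walks a small parse tree and attaches a head word to every node, producing labels such as VP[ate]. The `HEAD_RULES` table, the tuple tree encoding, and the example sentence are hypothetical simplifications made up for this sketch; real head-finding tables are much larger and direction-sensitive.

```python
# Hypothetical, highly simplified head rules: for each parent label,
# the child labels to look for, in priority order.
HEAD_RULES = {
    "S": ["VP", "NP"],
    "VP": ["V", "VP"],
    "NP": ["N", "NP"],
    "PP": ["P"],
}


def is_preterminal(tree):
    """A preterminal is (tag, word); an internal node is (label, child, ...)."""
    return len(tree) == 2 and isinstance(tree[1], str)


def lexicalize(tree):
    """Return (annotated_tree, head_word) for a tree given as nested tuples."""
    label = tree[0]
    if is_preterminal(tree):
        word = tree[1]
        return (f"{label}[{word}]", word), word

    children = tree[1:]
    annotated = [lexicalize(child) for child in children]

    # Pick the head child using the priority list for this label.
    head = None
    for wanted in HEAD_RULES.get(label, []):
        for child, (_, child_head) in zip(children, annotated):
            if child[0] == wanted:
                head = child_head
                break
        if head is not None:
            break
    if head is None:
        # Assumed fallback for this sketch: take the rightmost child's head.
        head = annotated[-1][1]

    new_children = tuple(a[0] for a in annotated)
    return (f"{label}[{head}]",) + new_children, head


example = ("S",
           ("NP", ("N", "John")),
           ("VP", ("V", "ate"), ("NP", ("N", "pizza"))))
lexicalized, sentence_head = lexicalize(example)
```

Here the head word "ate" propagates from V up through VP to S, so every rule on that path can condition on it.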
Collins Parser
- Generative, lexicalized model
- Parameters estimated by maximum likelihood (MLE), with smoothing
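
One common way to combine MLE with smoothing is to interpolate a sparse lexicalized estimate with a less specific backoff estimate. The sketch below is a minimal version of that idea, not the Collins parser's actual parameterization: the class name, the two-level backoff, and the fixed weight `lam` are all assumptions chosen for illustration.

```python
from collections import Counter


def mle(count, total):
    """Relative-frequency (maximum likelihood) estimate; 0.0 if unseen context."""
    return count / total if total else 0.0


class SmoothedRuleModel:
    """Hypothetical sketch: P(rhs | parent, head) smoothed with P(rhs | parent)."""

    def __init__(self, lam=0.5):
        self.lam = lam               # assumed fixed interpolation weight
        self.lex_rule = Counter()    # counts of (parent, head, rhs)
        self.lex_ctx = Counter()     # counts of (parent, head)
        self.rule = Counter()        # counts of (parent, rhs)
        self.ctx = Counter()         # counts of parent

    def observe(self, parent, head, rhs):
        self.lex_rule[(parent, head, rhs)] += 1
        self.lex_ctx[(parent, head)] += 1
        self.rule[(parent, rhs)] += 1
        self.ctx[parent] += 1

    def prob(self, parent, head, rhs):
        p_back = mle(self.rule[(parent, rhs)], self.ctx[parent])
        if self.lex_ctx[(parent, head)] == 0:
            # Head never seen with this parent: rely on the backoff alone.
            return p_back
        p_lex = mle(self.lex_rule[(parent, head, rhs)],
                    self.lex_ctx[(parent, head)])
        return self.lam * p_lex + (1 - self.lam) * p_back


model = SmoothedRuleModel(lam=0.5)
model.observe("VP", "ate", ("V", "NP"))
model.observe("VP", "ate", ("V", "NP"))
model.observe("VP", "ate", ("V",))
model.observe("VP", "slept", ("V",))

p_seen = model.prob("VP", "ate", ("V", "NP"))         # mixes 2/3 and 2/4
p_unseen = model.prob("VP", "devoured", ("V", "NP"))  # backoff only: 2/4
```

An unseen head such as "devoured" still receives a nonzero probability because the estimate falls back to the unlexicalized counts.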
Issues with lexicalized grammar
- Sparseness of training data
  - Many lexicalized probabilities are difficult to estimate reliably from the Penn Treebank
- Combinatorial explosion
  - Need for parameterization (breaking each rule probability into smaller factors)
Discriminative reranking
- A parser may return many parses of a sentence, with small differences in probabilities
- The top returned parse may not necessarily be the best because the PCFG may be deficient
- Other considerations may need to be taken into account
  - Parse tree depth (not too deep or too shallow)
  - Left attachment vs. right attachment (right attachment more likely in English)
  - Discourse structure
  - Consistency across sentences
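
The reranking idea above can be sketched as a linear scorer over a few features of each candidate parse. Everything specific here is an assumption for illustration: the feature definitions, `TARGET_DEPTH`, the hand-set `WEIGHTS` (a real reranker would learn them discriminatively from data), and the candidate log-probabilities. Discourse structure and cross-sentence consistency are left out because they need more than a single tree.

```python
TARGET_DEPTH = 4  # hypothetical "neither too deep nor too shallow" depth

# Hypothetical hand-set weights; a trained reranker would learn these.
WEIGHTS = {"log_prob": 1.0, "depth_penalty": -0.1, "right_branching": 0.5}


def is_preterminal(tree):
    """A preterminal is (tag, word); an internal node is (label, child, ...)."""
    return len(tree) == 2 and isinstance(tree[1], str)


def depth(tree):
    if is_preterminal(tree):
        return 1
    return 1 + max(depth(child) for child in tree[1:])


def right_branching(tree):
    """Crude proxy: count internal nodes whose last child is itself a phrase."""
    if is_preterminal(tree):
        return 0
    own = 0 if is_preterminal(tree[-1]) else 1
    return own + sum(right_branching(child) for child in tree[1:])


def features(tree, log_prob):
    return {
        "log_prob": log_prob,
        "depth_penalty": abs(depth(tree) - TARGET_DEPTH),
        "right_branching": right_branching(tree),
    }


def score(tree, log_prob):
    feats = features(tree, log_prob)
    return sum(WEIGHTS[name] * value for name, value in feats.items())


def rerank(candidates):
    """candidates: list of (tree, parser_log_prob). Returns the best tree."""
    return max(candidates, key=lambda c: score(*c))[0]


pp = ("PP", ("P", "with"), ("NP", ("N", "fork")))
high = ("VP", ("V", "ate"), ("NP", ("N", "pizza")), pp)          # PP under VP
low = ("VP", ("V", "ate"), ("NP", ("NP", ("N", "pizza")), pp))   # PP under NP

# Made-up log-probabilities: the parser slightly prefers `high`.
candidates = [(high, -10.0), (low, -10.2)]
best = rerank(candidates)
```

With these made-up weights the reranker overrides the parser's small preference and selects `low`, which shows how extra features can reorder parses whose probabilities are close.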