Accuracy on test data
- Alg2: 63%
- Alg2a outperforms Alg2 (Rule 3): 70%
Summary
Memorizing everything is not a good idea!
What additional sources can we use to improve the algorithm?
- use a few more good features (e.g. more prepositions, more verbs and nouns?)
- use clever ways to deal with missing information (how to handle tuples that do not appear in the training set but are similar to some tuples that do)
- use semantic information (e.g. synonyms)
- use additional context
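The second and third ideas can be combined in a small sketch: when a (v, n1, p, n2) tuple was never seen in training, map its words to synonym classes and look up counts at the class level instead of giving up. The synonym table, the function names and the "N"/"V" labels (noun/verb attachment) are hypothetical illustrations, not part of the original algorithm; a real system would derive word classes from a thesaurus or word clusters.

```python
from collections import Counter

# Hypothetical synonym classes (illustration only); a real system might
# derive these from a thesaurus or from word clustering.
SYNONYM_CLASS = {
    "ate": "EAT", "devoured": "EAT",
    "pizza": "FOOD", "pasta": "FOOD",
    "fork": "UTENSIL", "spoon": "UTENSIL",
}

def normalize(word):
    """Replace a word by its synonym class, or keep it if it has none."""
    return SYNONYM_CLASS.get(word, word)

def train(examples):
    """examples: list of ((v, n1, p, n2), label), label 'N' (noun) or 'V' (verb)."""
    exact, by_class = Counter(), Counter()
    for tup, label in examples:
        exact[(tup, label)] += 1
        by_class[(tuple(normalize(w) for w in tup), label)] += 1
    return exact, by_class

def classify(tup, exact, by_class, default="N"):
    # 1. Exact match: this is pure memorization of the training tuples.
    n, v = exact[(tup, "N")], exact[(tup, "V")]
    if n + v == 0:
        # 2. Unseen tuple: fall back to counts over synonym classes.
        cls = tuple(normalize(w) for w in tup)
        n, v = by_class[(cls, "N")], by_class[(cls, "V")]
    if n + v == 0:
        return default  # still no evidence: use a fixed default
    return "N" if n >= v else "V"
```

For instance, after training on "ate pizza with fork" (V), the unseen tuple "devoured pasta with spoon" maps to the same class tuple (EAT, FOOD, with, UTENSIL) and is classified as V, whereas plain memorization would have no answer for it.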
Statistics of PP attachment
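Collins and Brooks (backed-off model)
- Combine all the possible algorithms: when the full (v, n1, p, n2) tuple has no counts, back off to triples, then pairs, then the preposition alone.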
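A minimal sketch of the backed-off estimate, assuming the usual formulation: every sub-tuple that is used contains the preposition, counts at one level are pooled before dividing, and with no evidence at all the model defaults to noun attachment. The class and method names are hypothetical, chosen for illustration; this is not a reference implementation.

```python
from collections import Counter
from itertools import combinations

P = 2  # position of the preposition in a (v, n1, p, n2) quadruple

def keys(quad, size):
    """All sub-tuples of `quad` with `size` slots that include the preposition.
    Each key stores its slot positions so that (v, p) and (n1, p) never collide."""
    others = [i for i in range(4) if i != P]
    result = []
    for combo in combinations(others, size - 1):
        idx = tuple(sorted(combo + (P,)))
        result.append((idx, tuple(quad[i] for i in idx)))
    return result

class BackedOffPP:  # hypothetical name
    def __init__(self):
        self.total = Counter()  # f(sub-tuple)
        self.noun = Counter()   # f(noun attachment, sub-tuple)

    def train(self, examples):
        """examples: iterable of ((v, n1, p, n2), is_noun_attachment)."""
        for quad, is_noun in examples:
            for size in (4, 3, 2, 1):
                for key in keys(quad, size):
                    self.total[key] += 1
                    if is_noun:
                        self.noun[key] += 1
        return self

    def p_noun(self, quad):
        # Back off: full quadruple, then the 3 triples, the 3 pairs, then p alone.
        for size in (4, 3, 2, 1):
            ks = keys(quad, size)
            denom = sum(self.total[k] for k in ks)
            if denom > 0:
                return sum(self.noun[k] for k in ks) / denom
        return 1.0  # no evidence at any level: default to noun attachment

    def predict(self, quad):
        return "N" if self.p_noun(quad) >= 0.5 else "V"
```

Usage: train with `m = BackedOffPP().train(examples)` and call `m.predict(("ate", "pasta", "with", "anchovies"))`. Pooling the counts of all sub-tuples at a level, rather than averaging their ratios, lets better-attested sub-tuples carry more weight.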