Attachment ambiguities
- A key parsing decision is how we “attach various constituents”.
Example
- Each numbered constituent can attach freely at several different places, and this is the source of the exponential number of parses, even for a short sentence.
The number of possible parses then grows as the Catalan numbers: C_n = (2n)! / ((n+1)! n!)
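To get a feel for how fast this grows, here is a minimal sketch in Python. It assumes, as a simplification, that every attachment choice corresponds to a binary bracketing of the constituents. The function names `catalan` and `count_binary_trees` are made up for this illustration.

```python
from math import comb

def catalan(n: int) -> int:
    """n-th Catalan number, C_n = (2n)! / ((n+1)! n!)."""
    return comb(2 * n, n) // (n + 1)

def count_binary_trees(leaves: int) -> int:
    """Count binary bracketings over `leaves` items by brute force.

    Assumption for this sketch: each bracketing is one possible parse.
    """
    if leaves <= 1:
        return 1
    # Split the items into a left part of size k and a right part of the rest.
    return sum(
        count_binary_trees(k) * count_binary_trees(leaves - k)
        for k in range(1, leaves)
    )

catalan_values = [catalan(n) for n in range(1, 9)]
print(catalan_values)  # [1, 2, 5, 14, 42, 132, 429, 1430]

# The brute-force count over n + 1 items matches C_n.
print(all(count_binary_trees(n + 1) == catalan(n) for n in range(10)))  # True
```

So with only a handful of freely attaching constituents there are already hundreds of candidate parses.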
2 problems to solve
- Repeated work
If we parse naively with a simple top-down method, the same constituents get rebuilt many times, so the parser must be made efficient.
- Choose the right parse
- How do we work out the correct attachment?
- The problem is AI-complete, but words are good predictors of attachment, and we can usually be fairly sure of the meaning from the words.
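The repeated-work problem from the list above can be made concrete with a small sketch. It is not a real parser: it only counts binary parses over a span, once naively top-down and once with the results for each span cached (the idea behind chart parsing). The names `naive_parses`, `memo_parses` and `calls` are hypothetical and exist only for this illustration.

```python
from functools import lru_cache

# How many times each version is entered.
calls = {"naive": 0, "memo": 0}

def naive_parses(i: int, j: int) -> int:
    """Count binary parses of span [i, j) top-down, sharing nothing."""
    calls["naive"] += 1
    if j - i == 1:
        return 1
    return sum(naive_parses(i, k) * naive_parses(k, j) for k in range(i + 1, j))

@lru_cache(maxsize=None)
def memo_parses(i: int, j: int) -> int:
    """Same recursion, but each span is analysed only once (cached)."""
    calls["memo"] += 1
    if j - i == 1:
        return 1
    return sum(memo_parses(i, k) * memo_parses(k, j) for k in range(i + 1, j))

n = 10  # length of the toy "sentence"
naive_total = naive_parses(0, n)
memo_total = memo_parses(0, n)

print(naive_total == memo_total)     # True: same answer either way
print(calls["naive"], calls["memo"])  # 19683 55
```

Both versions return the same count, but the naive one re-analyses the same spans over and over, while the cached one touches each of the n(n+1)/2 spans exactly once.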
All of this leads to the statistical parser!