Parsing as search
- There are 2 types of constraints on the parses
- from the input sentence
- from the grammar
- Therefore there are two approaches to parsing
- Top-down (driven by the grammar, starting from S)
- Bottom-up (driven by the words of the input sentence)
Shift-reduce parsing
- a bottom-up parser
- tries to match the RHS of a production until it can build an S
- shift operation
- each word in the input sentence is pushed onto a stack
- reduce operation
- if the top n symbols on the stack match the RHS of a production, they are popped and replaced by the LHS of the production
- stopping condition
- The process stops when the input sentence has been fully processed and the stack contains only S.
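The shift and reduce operations above can be sketched as a small recognizer. This is a minimal illustration, not a production parser: the toy grammar, the names `RULES` and `shift_reduce`, and the use of backtracking (trying a reduce first, then falling back to a shift) are assumptions added here. It also assumes the grammar has no empty productions and no unary cycles, otherwise the search would not terminate.

```python
# Toy grammar (hypothetical): each rule is (LHS, RHS tuple).
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
    ("Det", ("the",)),
    ("N", ("dog",)),
    ("N", ("cat",)),
    ("V", ("chased",)),
]


def shift_reduce(words, rules=RULES, start="S"):
    """Return True if `words` can be reduced to `start`."""
    words = tuple(words)

    def search(stack, pos):
        # Stopping condition: input consumed and only the start symbol is left.
        if pos == len(words) and stack == (start,):
            return True
        # Reduce: if the top n stack symbols match an RHS, replace them by the LHS.
        for lhs, rhs in rules:
            n = len(rhs)
            if n <= len(stack) and stack[-n:] == rhs:
                if search(stack[:-n] + (lhs,), pos):
                    return True
        # Shift: push the next input word onto the stack.
        if pos < len(words):
            return search(stack + (words[pos],), pos + 1)
        return False

    return search((), 0)


print(shift_reduce("the dog chased the cat".split()))
print(shift_reduce("the dog chased".split()))
```

A purely greedy version (always reduce when possible) is simpler but can commit to a wrong reduction, which is why this sketch backtracks.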
Dynamic programming (mainly CKY)
- Motivation
- A lot of work is repeated
- Caching intermediate results improves the complexity
- Dynamic programming
- build the parses for a substring [i,j] from all parses of its sub-spans [i,k] and [k,j], for every split point k with i < k < j
- Complexity
- O(n^3) for recognizing an input string of length n
- CKY
- bottom-up
- requires a normalized (binarized) grammar
- Earley parser
- top-down
- more complicated
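The bottom-up table filling that CKY performs can be sketched as follows. This is a minimal recognizer under stated assumptions: the grammar is already binarized, the toy rules and the names `BINARY`, `LEXICAL` and `cky_recognize` are hypothetical, and the chart stores only non-terminal labels (no back-pointers), so it answers yes or no rather than returning a tree.

```python
from collections import defaultdict

# Toy binarized grammar (hypothetical).
BINARY = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
]
LEXICAL = [
    ("Det", "the"),
    ("N", "dog"),
    ("N", "cat"),
    ("V", "chased"),
]


def cky_recognize(words, binary=BINARY, lexical=LEXICAL, start="S"):
    """Return True if the grammar derives `words` from `start`."""
    n = len(words)
    # chart[(i, j)] is the set of non-terminals that derive words[i:j].
    chart = defaultdict(set)

    # Spans of length 1: apply the lexical rules X -> w.
    for i, word in enumerate(words):
        chart[(i, i + 1)] = {lhs for lhs, w in lexical if w == word}

    # Longer spans: combine two adjacent sub-spans [i,k] and [k,j].
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, (left, right) in binary:
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        chart[(i, j)].add(lhs)

    return start in chart[(0, n)]


print(cky_recognize("the dog chased the cat".split()))
print(cky_recognize("the dog chased".split()))
```

Each cell is computed once and reused by every larger span that contains it, which is the caching of intermediate results described above.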
Complexity of CKY
- There are O(n^2) cells in the table (n(n+1)/2 for a sentence of n words)
- Single parse
- Each cell requires a linear lookup over its possible split points
- Total time complexity is O(n^3)
- All parses
- Total time complexity is exponential in the worst case, because a sentence can have exponentially many parses
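The counts above can be checked numerically. The sketch below is an illustration only: `cky_work` and `binary_bracketings` are hypothetical helper names, grammar size is ignored, and n >= 1 is assumed. The first function counts chart cells and split-point lookups. The second counts the binary tree shapes over n words (a Catalan number), which shows why enumerating all parses can blow up even though filling the table stays cubic.

```python
from math import comb


def cky_work(n):
    """Return (number of cells, number of split-point lookups) for n words."""
    cells = 0
    splits = 0
    for span in range(1, n + 1):
        for _start in range(n - span + 1):
            cells += 1          # one cell per span [i, j]
            splits += span - 1  # one lookup per split point k, i < k < j
    return cells, splits


def binary_bracketings(n):
    """Number of distinct binary tree shapes over n words: Catalan(n - 1)."""
    return comb(2 * (n - 1), n - 1) // n


for n in (5, 10, 20):
    print(n, cky_work(n), binary_bracketings(n))
```

The cell count equals n(n+1)/2 and the lookup count equals (n^3 - n)/6, so the table work grows cubically while the number of tree shapes grows exponentially.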
Chomsky normal form (CNF)
- All rules must have one of two forms:
- X -> Y Z (two non-terminals) or X -> w (a single terminal)
- This introduces new non-terminals for
- hybrid rules (mixing terminals and non-terminals on the RHS)
- n-ary rules (more than two non-terminals on the RHS)
- unary rules
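Two of the three cases above, hybrid rules and n-ary rules, can be sketched in a few lines. This is a partial illustration, not a full CNF conversion: unary rules are left untouched, and the function name `binarize` and the naming schemes `T_<word>` and `X<number>` for introduced non-terminals are assumptions made here (they are assumed not to clash with symbols already in the grammar).

```python
def binarize(rules, terminals):
    """Rewrite hybrid and n-ary rules; rules are (LHS, RHS tuple) pairs."""
    out = []
    fresh = 0
    for lhs, rhs in rules:
        rhs = list(rhs)
        # Hybrid rule: replace each terminal in a longer RHS by a new
        # pre-terminal T_w together with the lexical rule T_w -> w.
        if len(rhs) > 1:
            for idx, sym in enumerate(rhs):
                if sym in terminals:
                    pre = f"T_{sym}"
                    if (pre, (sym,)) not in out:
                        out.append((pre, (sym,)))
                    rhs[idx] = pre
        # n-ary rule: fold the last two symbols into a new non-terminal
        # until only two symbols remain.
        while len(rhs) > 2:
            fresh += 1
            new = f"X{fresh}"
            out.append((new, (rhs[-2], rhs[-1])))
            rhs = rhs[:-2] + [new]
        out.append((lhs, tuple(rhs)))
    return out


rules = [
    ("VP", ("V", "NP", "PP")),  # n-ary
    ("NP", ("the", "N")),       # hybrid
    ("N", ("dog",)),            # already lexical
]
for rule in binarize(rules, terminals={"the", "dog"}):
    print(rule)
```

Keeping track of which non-terminals were introduced (here, those named `X...` and `T_...`) is what makes it possible to map a CNF parse back to the original grammar later.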
Issues with CKY
- Weak equivalence only
- Same language, different structure
- If the grammar has to be converted to CNF, then the final parse tree doesn’t match the original grammar
- However, it can be converted back using a specific procedure
- Syntactic ambiguity
- (Deterministic) CKY has no way to perform syntactic disambiguation
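The conversion back to a tree that matches the original grammar, mentioned under weak equivalence, can be sketched for the n-ary case. This is a minimal illustration under assumptions: trees are nested tuples `(label, child, ...)` with plain strings as words, the set of introduced labels (here the hypothetical `X1`) is known from the binarization step, and only binarization is undone, not unary-rule removal.

```python
def unbinarize(tree, introduced):
    """Splice out nodes whose label was introduced during binarization."""
    if isinstance(tree, str):  # a word
        return tree
    label, *children = tree
    new_children = []
    for child in children:
        child = unbinarize(child, introduced)
        if not isinstance(child, str) and child[0] in introduced:
            # Drop the artificial node and attach its children directly.
            new_children.extend(child[1:])
        else:
            new_children.append(child)
    return (label, *new_children)


# Parse produced with VP -> V X1, X1 -> NP NP (from the original VP -> V NP NP).
cnf_tree = ("VP", ("V", "gave"), ("X1", ("NP", "her"), ("NP", "books")))
print(unbinarize(cnf_tree, introduced={"X1"}))
```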