PP attachment
- High (verbal): the PP is attached to the VP
- Low (nominal): the PP is attached to the NP
In the example, the PP "with the net" is attached to the verb "caught" (high attachment); it has no association with the noun "butterfly".
We can formulate PP attachment as a binary classification problem.
- Input: a PP and, possibly, its surrounding context
- Output: a binary label, 0 or 1 (low or high)
- In practice, the context consists of only 4 words:
  - the preposition
  - the verb before the preposition
  - the noun before the preposition
  - the noun after the preposition
- Example: join board as director (verb: join, noun before: board, preposition: as, noun after: director)
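A minimal sketch of this 4-word representation in Python. The class name `PPTuple`, its field names, and the feature keys are illustrative assumptions, not a fixed dataset format; a classifier would map each tuple to the 0/1 label described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PPTuple:
    """Hypothetical container for the 4 context words of one PP-attachment decision."""
    verb: str   # the verb before the preposition
    noun1: str  # the noun before the preposition
    prep: str   # the preposition itself
    noun2: str  # the noun after the preposition

    def features(self) -> dict:
        # Feature dictionary that a classifier could consume (keys are assumptions).
        return {"v": self.verb, "n1": self.noun1, "p": self.prep, "n2": self.noun2}


# "join board as director" encoded as a 4-tuple (no label assigned here).
example = PPTuple(verb="join", noun1="board", prep="as", noun2="director")
print(example.features())
# {'v': 'join', 'n1': 'board', 'p': 'as', 'n2': 'director'}
```

Making the tuple immutable (`frozen=True`) lets identical instances be hashed and counted, which is convenient when learning label statistics from training data.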
Why only 4 words?
- Almost all the information needed to classify a prepositional phrase's attachment is contained in these 4 features
- Representing every instance as the same 4-tuple of features allows for a consistent and scalable approach
Sample tuples
Supervised learning: evaluation
- Manually label a set of sentences
- Split the labeled data into training and testing sets
- Use the training data to learn patterns
- Apply the learned patterns to the testing data
- For evaluation, use accuracy: the percentage of testing instances to which a given algorithm assigns the correct label
- Compare the result with a simple baseline method
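The evaluation steps above can be sketched as follows. This is an illustration under stated assumptions: each instance is a `((verb, noun1, prep, noun2), label)` pair with 0 = low and 1 = high, and the "pattern" learner is a deliberately simple, hypothetical one that remembers the most common label for each preposition. It is not a method proposed in these notes.

```python
import random
from collections import Counter, defaultdict

# Assumed instance format: ((verb, noun1, prep, noun2), label), label 0 = low, 1 = high.


def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle a copy of the labeled data and split it into training and testing sets."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


def train_prep_model(train):
    """Hypothetical learner: most common label per preposition, plus an overall fallback."""
    by_prep = defaultdict(Counter)
    for (_verb, _n1, prep, _n2), label in train:
        by_prep[prep][label] += 1
    overall = Counter(label for _, label in train).most_common(1)[0][0]
    table = {p: counts.most_common(1)[0][0] for p, counts in by_prep.items()}
    return table, overall


def predict(model, tup):
    table, overall = model
    # Prepositions never seen in training get the overall most common label.
    return table.get(tup[2], overall)


def accuracy(model, test):
    """Percentage of testing instances whose predicted label matches the gold label."""
    correct = sum(predict(model, tup) == label for tup, label in test)
    return 100.0 * correct / len(test)
```

With a real labeled corpus, one would call `train_test_split`, pass the training part to `train_prep_model`, and report `accuracy` on the testing part only.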
The simplest baseline method is to find the most common class (label) in the training data and assign it to all instances of the test data.
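A minimal sketch of this majority-class baseline, assuming labels are plain integers (0 = low, 1 = high). The function names are illustrative, and the toy labels in the usage line are invented.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return the most common label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


def baseline_accuracy(train_labels, test_labels):
    """Accuracy (%) of assigning the training majority label to every test instance."""
    majority = majority_baseline(train_labels)
    correct = sum(1 for gold in test_labels if gold == majority)
    return 100.0 * correct / len(test_labels)


# Toy labels, invented for illustration: the training majority is 1 (high).
print(baseline_accuracy([1, 1, 0, 1], [1, 0, 1, 1]))  # 75.0
```

Any learned classifier should beat this number on the same test set; otherwise its patterns add nothing beyond the class distribution.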