POS tagging methods
- rule-based
- stochastic
  - HMM (generative)
  - Maximum Entropy (discriminative)
- transformation-based
HMM tagging
$$\hat{T} = \operatorname*{argmax}_{T} P(T \mid W)$$
$$P(T \mid W) = \frac{P(W \mid T)\,P(T)}{P(W)}$$
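$P(W)$ is ignored because it is the same for every candidate tag sequence and so does not affect the argmax. $P(T)$ is called the prior, and $P(W \mid T)$ is the likelihood.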
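By the chain rule:
$$P(T)\,P(W \mid T) = P(t_1, \dots, t_n)\,P(w_1, \dots, w_n \mid t_1, \dots, t_n) = \prod_{i=1}^{n} P(t_i \mid t_1, \dots, t_{i-1}) \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1}, t_1, \dots, t_n)$$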
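- Simplification 1: each word depends only on its own tag
  - $P(W \mid T) \approx \prod_{i=1}^{n} P(w_i \mid t_i)$
- Simplification 2: each tag depends only on the previous tag
  - $P(T) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1})$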
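- Bigram approximation
  - $\hat{T} = \operatorname*{argmax}_{T} P(T \mid W) \approx \operatorname*{argmax}_{T} \prod_{i=1}^{n} P(w_i \mid t_i)\,P(t_i \mid t_{i-1})$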
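The bigram approximation above can be sketched in Python as a minimal illustration, not a reference implementation. The probabilities are estimated from tag and word counts, and the argmax is found with Viterbi decoding. Several things here are assumptions not stated in these notes: the helper names (`train_bigram_hmm`, `log_prob`, `viterbi`), the `<s>` start symbol, add-one smoothing, and the toy training sentences.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # assumed start-of-sentence symbol, so that t_0 is defined


def train_bigram_hmm(tagged_sentences):
    """Collect counts for P(t_i | t_{i-1}) and P(w_i | t_i)."""
    transition = defaultdict(Counter)  # transition[prev_tag][tag]
    emission = defaultdict(Counter)    # emission[tag][word]
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            transition[prev][tag] += 1
            emission[tag][word] += 1
            prev = tag
    return transition, emission


def log_prob(counter, key, num_outcomes, alpha=1.0):
    """Add-alpha smoothed log probability (smoothing is an assumption here)."""
    total = sum(counter.values())
    return math.log((counter[key] + alpha) / (total + alpha * num_outcomes))


def viterbi(words, transition, emission, tags, vocab_size):
    """Return the tag sequence maximizing prod P(w_i | t_i) P(t_i | t_{i-1})."""
    if not words:
        return []
    num_tags = len(tags)
    # best[i][t]: best log score of any tag sequence for words[:i+1] ending in t
    best = [{t: log_prob(transition[START], t, num_tags)
                + log_prob(emission[t], words[0], vocab_size) for t in tags}]
    back = [{t: None for t in tags}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Score of reaching tag t from each possible previous tag p
            scores = {p: best[i - 1][p] + log_prob(transition[p], t, num_tags)
                      for p in tags}
            prev_best = max(tags, key=lambda p: scores[p])
            best[i][t] = scores[prev_best] + log_prob(emission[t], words[i], vocab_size)
            back[i][t] = prev_best
    # Follow back-pointers from the best final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy training data (made up for illustration only)
train = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
transition, emission = train_bigram_hmm(train)
tags = sorted({tag for sent in train for _, tag in sent})
vocab_size = len({word for sent in train for word, _ in sent})

predicted = viterbi(["the", "cat", "barks"], transition, emission, tags, vocab_size)
print(predicted)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied over a long sentence.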
Evaluating taggers
- Data set
  - Training set
  - Development set
  - Test set
- Tagging accuracy
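The evaluation setup above can be sketched as follows. Tagging accuracy is taken here to be the fraction of tokens whose predicted tag matches the gold tag, which is the usual definition. The function names and the 80/10/10 split proportions are illustrative assumptions, not something these notes specify.

```python
def split_data(sentences, train_frac=0.8, dev_frac=0.1):
    """Split a list of sentences into training, development, and test sets.

    The default proportions are an assumption; shuffle beforehand if the
    corpus is ordered by topic or source.
    """
    n = len(sentences)
    train_end = int(n * train_frac)
    dev_end = train_end + int(n * dev_frac)
    return sentences[:train_end], sentences[train_end:dev_end], sentences[dev_end:]


def tagging_accuracy(gold, predicted):
    """Token-level accuracy: correctly tagged tokens / total tokens."""
    correct = 0
    total = 0
    for gold_tags, pred_tags in zip(gold, predicted):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("gold and predicted sentences differ in length")
        for g, p in zip(gold_tags, pred_tags):
            correct += int(g == p)
            total += 1
    return correct / total if total else 0.0


# Made-up example: one of five tokens is tagged incorrectly
gold = [["DET", "NOUN", "VERB"], ["DET", "NOUN"]]
pred = [["DET", "NOUN", "NOUN"], ["DET", "NOUN"]]
accuracy = tagging_accuracy(gold, pred)
print(accuracy)
```

The development set is used for tuning choices such as smoothing, so that the test set is touched only once for the final accuracy figure.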
Transformation-based learning
Thoughts about POS taggers
- New domains
  - Lower performance
- Distributional Clustering