Week 7-2: POS tagging

POS

  • Open class
    • nouns, non-modal verbs, adjectives, adverbs
  • Closed class
    • prepositions, modal verbs, conjunctions, particles, determiners, pronouns

Penn Treebank tag set: the tag IN covers all prepositions except "to", which always gets its own tag TO, even when it acts as a particle (the infinitive marker) rather than a preposition.

Some observations

  • ambiguity: the same word can take different tags depending on context, and even its pronunciation can differ (e.g., "record" as a noun vs. as a verb)

Useful for parsing, machine translation, word sense disambiguation, etc.

Main techniques

  • rule-based
  • machine learning (CRF, maximum entropy, Markov models)
  • transformation-based


Source of information

  • Knowledge about individual words (unigram)
    • lexical information
    • spelling (suffixes such as -or, -er)
    • capitalization (e.g., IBM)
  • Knowledge about neighboring words
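
The information sources above can be turned into features for a tagger. The following is a minimal sketch, not a prescribed feature set: the function name `word_features`, the feature names, and the example sentence are all assumptions made for illustration.

```python
def word_features(tokens, i):
    """Return simple features for tokens[i] (illustrative feature set only)."""
    word = tokens[i]
    return {
        # Knowledge about the individual word
        "word.lower": word.lower(),              # lexical information
        "suffix2": word[-2:].lower(),            # spelling cue such as -or, -er
        "is_capitalized": word[:1].isupper(),    # capitalization cue
        "is_all_caps": word.isupper(),           # e.g., IBM
        # Knowledge about neighboring words
        "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }


# Hypothetical example sentence
tokens = ["IBM", "hired", "a", "translator"]
features = [word_features(tokens, i) for i in range(len(tokens))]
```

A machine-learning tagger (for example a maximum entropy model or a CRF) would consume one such feature dictionary per token.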

Evaluation

  • Baseline (relatively high)
    • tag each word with its most likely tag
    • tag each OOV word as a noun
    • accuracy around 90%
  • Current accuracy
    • around 97% for English
    • 98% for human performance
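
The baseline described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function names, the use of `NN` as the noun tag for OOV words, and the three-sentence training set are invented for the example, so the accuracy it produces says nothing about real corpora.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag_baseline(model, tokens, oov_tag="NN"):
    """Tag known words with their most likely tag and OOV words as nouns."""
    return [(tok, model.get(tok.lower(), oov_tag)) for tok in tokens]


def accuracy(model, tagged_sentences):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sentence in tagged_sentences:
        predicted = tag_baseline(model, [word for word, _ in sentence])
        for (_, gold), (_, guess) in zip(sentence, predicted):
            correct += gold == guess
            total += 1
    return correct / total if total else 0.0


# Toy training data, invented for illustration
train = [
    [("the", "DT"), ("can", "NN"), ("is", "VBZ"), ("empty", "JJ")],
    [("the", "DT"), ("can", "NN"), ("fell", "VBD")],
    [("they", "PRP"), ("can", "MD"), ("run", "VB")],
]
model = train_baseline(train)
tagged = tag_baseline(model, ["The", "can", "rusted"])
```

Here "can" is always tagged `NN` because that is its most frequent tag in the toy data, so the modal use in the third sentence is mislabeled. This is the kind of error that context-aware taggers are meant to fix.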

Rule-based tagging

  • use a dictionary or finite-state transducers to find all possible POS tags for each word
  • use disambiguation rules to eliminate impossible tags
    • e.g., ART + V (an article followed by a verb is never allowed)
  • hundreds of rules can be designed
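
The two steps above (dictionary lookup, then rule-based pruning) can be sketched as follows. This is a simplified illustration, not a full rule-based tagger: the lexicon, the tag names other than ART and V, the choice to treat unknown words as nouns, and the function name `rule_based_tag` are all assumptions.

```python
# Hypothetical toy lexicon: every tag each word may take.
LEXICON = {
    "the": {"ART"},
    "can": {"N", "V"},
    "rusted": {"V"},
}

# Forbidden (previous tag, current tag) pairs. ART + V is the rule from the notes.
FORBIDDEN_BIGRAMS = {("ART", "V")}


def rule_based_tag(tokens):
    """Return the set of tags that survive the rules for each token."""
    # Step 1: dictionary lookup (unknown words default to noun, an assumption).
    candidates = [set(LEXICON.get(tok.lower(), {"N"})) for tok in tokens]

    # Step 2: prune tags that violate a rule, left to right.
    for i in range(1, len(tokens)):
        prev = candidates[i - 1]
        if len(prev) == 1:  # apply rules only when the left context is unambiguous
            prev_tag = next(iter(prev))
            pruned = {
                tag for tag in candidates[i]
                if (prev_tag, tag) not in FORBIDDEN_BIGRAMS
            }
            if pruned:  # never remove every candidate
                candidates[i] = pruned
    return candidates


result = rule_based_tag(["The", "can", "rusted"])
```

After the article "The", the verb reading of "can" is removed and only the noun reading remains. A real system would encode hundreds of such rules, often as finite-state transducers rather than a Python set.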

Rule examples

[Figure: rule examples]

Rule-based tagging is useful for unseen languages, i.e., languages with no annotated training data.
