Natural Language Processing (4): Part of Speech Tagging

Contents

1. What is Part of Speech (POS)?

2. Information Extraction

2.1 POS Open Classes 

2.2 POS Closed Classes (English)

2.3 Ambiguity

2.4 POS Ambiguity in News Headlines 

3. Tagsets

3.1 Major Penn Treebank Tags

3.2 Derived Tags (Open Class)

3.3 Derived Tags (Closed Class)

3.4 Tagged Text Example

4. Automatic Tagging

4.1 Why Automatically POS tag?

4.2 Automatic Taggers

4.3 Rule-based tagging

4.4 Unigram tagger

4.5 Classifier-Based Tagging

4.6 Hidden Markov Models 

4.7 Unknown Words

4.8 A Final Word


1. What is Part of Speech (POS)?

Also known as word classes, morphological classes, or syntactic categories.

Nouns, verbs, adjectives, etc.

POS tells us quite a bit about a word and its neighbours:

  • nouns are often preceded by determiners
  • verbs are often preceded by nouns
  • "content" as a noun is pronounced CONtent
  • "content" as an adjective is pronounced conTENT
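
The "content" example above can be sketched as a lookup keyed by both the word and its POS. This is only an illustration: the table `STRESS`, the function `pronounce`, and the tag names `NOUN` and `ADJ` are hypothetical names chosen here, not part of any library.

```python
# Stress patterns keyed by (word, POS). The two entries are the
# "content" examples from the list above; the names are illustrative.
STRESS = {
    ("content", "NOUN"): "CONtent",
    ("content", "ADJ"): "conTENT",
}

def pronounce(word, pos):
    """Return the stress pattern for (word, pos), or the word unchanged if unknown."""
    return STRESS.get((word.lower(), pos), word)

print(pronounce("content", "NOUN"))  # CONtent
print(pronounce("content", "ADJ"))   # conTENT
```

Without the POS, the lookup key would be ambiguous, which is the sense in which the tag carries information about the word.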

2. Information Extraction

Given this:

  • “Brasilia, the Brazilian capital, was founded in 1960.”

Obtain this:

  • capital(Brazil, Brasilia)
  • founded(Brasilia, 1960)

Many steps are involved, but the first is to identify the nouns (Brasilia, capital), adjectives (Brazilian), verbs (founded) and numbers (1960).
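
As a rough sketch of that first step, the snippet below groups the words of the example sentence by POS. The tags are assigned by hand for illustration (they are not the output of a real tagger), and the coarse tag names and the helper `group_by_pos` are assumptions made here.

```python
from collections import defaultdict

# Hand-assigned, coarse tags for the example sentence (an assumption,
# not tagger output). "was" is marked AUX so that VERB holds only "founded".
tagged = [
    ("Brasilia", "NOUN"), (",", "PUNCT"), ("the", "DET"),
    ("Brazilian", "ADJ"), ("capital", "NOUN"), (",", "PUNCT"),
    ("was", "AUX"), ("founded", "VERB"), ("in", "PREP"),
    ("1960", "NUM"), (".", "PUNCT"),
]

def group_by_pos(tagged_tokens):
    """Map each tag to the list of words carrying it, in sentence order."""
    groups = defaultdict(list)
    for word, tag in tagged_tokens:
        groups[tag].append(word)
    return dict(groups)

by_pos = group_by_pos(tagged)
print(by_pos["NOUN"])  # ['Brasilia', 'capital']
print(by_pos["VERB"])  # ['founded']
print(by_pos["NUM"])   # ['1960']
```

Turning these groups into relations such as founded(Brasilia, 1960) requires further steps that are not shown here.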

2.1 POS Open Classes 

Open vs closed classes: how readily does a POS category take on new words? Only a few classes are open:

Nouns

  • Proper (Australia) versus common (wombat) 
  • Mass (rice) versus count (bowls)

Verbs

  • Rich inflection (go/goes/going/gone/went)
  • Auxiliary verbs (be, have, and do in English)
  • Transitivity (wait versus hit versus give): the number of arguments the verb takes

Adjectives

  • Gradable (happy) versus non-gradable (computational)

Adverbs

  • Manner (slowly)
  • Locative (here)
  • Degree (really)
  • Temporal (today)

2.2 POS Closed Classes (English)

Prepositions (in, on, with, for, of, over,…)

  • on the table

Particles

  • brushed himself off

Determiners

  • Articles (a, an, the)
  • Demonstratives (this, that, these, those)
  • Quantifiers (each, every, some, two,…)

Pronouns

  • Personal (I, me, she,…)
  • Possessive (my, our,…)
  • Interrogative or Wh (who, what, …)

Conjunctions

  • Coordinating (and, or, but)
  • Subordinating (if, although, that, …)

Modal verbs
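
Because closed classes have a small, fixed membership, they can be sketched as a plain lookup table. The snippet below uses only the example words listed above; the table `CLOSED_CLASS` and the function `closed_classes` are hypothetical names, and modal verbs are left out because no examples are given for them here.

```python
# Closed-class members taken from the examples above (far from exhaustive).
CLOSED_CLASS = {
    "preposition": {"in", "on", "with", "for", "of", "over"},
    "particle": {"off"},
    "determiner": {"a", "an", "the", "this", "that", "these", "those",
                   "each", "every", "some", "two"},
    "pronoun": {"i", "me", "she", "my", "our", "who", "what"},
    "conjunction": {"and", "or", "but", "if", "although", "that"},
}

def closed_classes(word):
    """Return the sorted names of every closed class that lists the word."""
    w = word.lower()
    return sorted(name for name, members in CLOSED_CLASS.items() if w in members)

print(closed_classes("the"))     # ['determiner']
print(closed_classes("that"))    # ['conjunction', 'determiner']
print(closed_classes("wombat"))  # []
```

Note that "that" appears in two classes (demonstrative determiner and subordinating conjunction), so even a closed-class lookup does not always yield a single tag.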
