Usage:
pos_tags = nltk.pos_tag(words)
`words` is a list of individual word tokens.
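`pos_tag` relies on a pretrained tagger model, and tokenizing or stop-word filtering needs further NLTK data. A minimal setup sketch follows. The resource names are assumptions that depend on the installed NLTK release (newer versions use `punkt_tab` and `averaged_perceptron_tagger_eng`), and `REQUIRED_RESOURCES` and `download_resources` are hypothetical helper names, not part of NLTK:

```python
# Resource names are assumptions; which ones exist depends on the NLTK version.
REQUIRED_RESOURCES = [
    "punkt", "punkt_tab",                                           # tokenizer data
    "stopwords",                                                    # stop-word lists
    "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng",  # tagger model
]

def download_resources(names=REQUIRED_RESOURCES):
    """Try to download each resource and return {name: success flag}."""
    import nltk  # imported lazily so this module loads without NLTK installed
    return {name: nltk.download(name, quiet=True) for name in names}
```

Calling `download_resources()` once before the first `pos_tag` call should be enough; names that the installed version does not know are simply reported as `False`.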
An example
Process a piece of English text (`text`): tokenize it, remove stop words, and tag each word's part of speech.
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from string import punctuation
text = 'Compatibility of systems of linear constraints over the set of natural numbers.'
# Tokenize the lowercased text
words_before = word_tokenize(text.lower())
# Remove stop words and punctuation
# (named stop_words so the imported stopwords module is not shadowed)
stop_words = set(stopwords.words('english') + list(punctuation))
words = []
for w in words_before:
    if w not in stop_words:
        words.append(w)
# Part-of-speech tagging
pos_tags = nltk.pos_tag(words)
print(pos_tags)
Output
[('compatibility', 'NN'), ('systems', 'NNS'), ('linear', 'JJ'), ('constraints', 'NNS'), ('set', 'VBD'), ('natural', 'JJ'), ('numbers', 'NNS')]
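In the output above, `set` is tagged `VBD` although it is a noun in the original sentence. A likely cause is that stop words such as `the` and `of` were removed before tagging, so the tagger no longer sees the surrounding context. One alternative is to tag the full token list first and filter afterwards. A sketch under that assumption follows; `drop_stopwords` and `tag_then_filter` are hypothetical helper names, and `tag_then_filter` assumes the NLTK data is already installed:

```python
from string import punctuation

def drop_stopwords(tagged, stop_words):
    """Keep the (word, tag) pairs whose word is not in stop_words."""
    return [(w, t) for w, t in tagged if w not in stop_words]

def tag_then_filter(text):
    """Tag the complete sentence first, then drop stop words and punctuation."""
    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize
    stop_words = set(stopwords.words('english')) | set(punctuation)
    tagged = nltk.pos_tag(word_tokenize(text.lower()))
    return drop_stopwords(tagged, stop_words)
```

Whether this changes the tag for a particular word depends on the tagger model, so it is worth comparing both orders on your own text.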
Tag descriptions:
Tag | Description | Examples |
---|---|---|
CC | coordinating conjunction | and, or, but, nor, yet |
CD | cardinal number | twenty-four, 1991, 14:24 |
DT | determiner | the, a, some, every, no |
EX | existential there | there |
FW | foreign word | dolce, ersatz, esprit, quo, maitre |
IN | preposition or subordinating conjunction | on, of, at, with, by, into, under, if, while, although |
JJ | adjective | new, good, high, special, big, local |
JJR | adjective, comparative | bleaker, braver, breezier, briefer, brighter, brisker |
JJS | adjective, superlative | calmest, cheapest, choicest, classiest, cleanest, clearest |
LS | list item marker | A., B., first, one |
MD | modal verb | can, cannot, could, couldn't |
NN | noun, singular or mass | year, home, cost, time, education |
NNS | noun, plural | undergraduates, scotches |
NNP | proper noun, singular | Alison, Africa, April, Washington |
NNPS | proper noun, plural | Americans, Americas, Amharas, Amityvilles |
PDT | predeterminer | all, both, half, many |
POS | possessive ending | ', 's |
PRP | personal pronoun | I, he, she, it, we, they, them |
PRP$ | possessive pronoun | her, his, mine, my, our, ours |
RB | adverb | occasionally, unabatingly, maddeningly |
RBR | adverb, comparative | further, gloomier, grander |
RBS | adverb, superlative | best, biggest, bluntest, earliest |
RP | particle | aboard, about, across, along, apart |
SYM | symbol | % & * + < = > @ |
TO | the word "to" (preposition or infinitive marker) | to |
UH | interjection | Goodbye, Goody, Gosh, Wow |
VB | verb, base form | ask, assemble, assess |
VBD | verb, past tense | dipped, pleaded, swiped |
VBG | verb, gerund or present participle | telegraphing, stirring, focusing |
VBN | verb, past participle | multihulled, dilapidated, aerosolized |
VBP | verb, present tense, not 3rd person singular | predominate, wrap, resort, sue |
VBZ | verb, present tense, 3rd person singular | bases, reconstructs, marks |
WDT | wh-determiner | which, that, what, whatever |
WP | wh-pronoun | who, whom, what, whatever |
WP$ | possessive wh-pronoun | whose |
WRB | wh-adverb | how, when, where, why |
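Related tags share a prefix (`NN` for nouns, `VB` for verbs, `JJ` for adjectives), so tagged output can be filtered by prefix. A small sketch that reuses the output shown earlier; `words_with_prefix` is a hypothetical helper name:

```python
# Tagged output copied from the example earlier in this post
pos_tags = [('compatibility', 'NN'), ('systems', 'NNS'), ('linear', 'JJ'),
            ('constraints', 'NNS'), ('set', 'VBD'), ('natural', 'JJ'),
            ('numbers', 'NNS')]

def words_with_prefix(tagged, prefix):
    """Return the words whose tag starts with the given prefix."""
    return [w for w, t in tagged if t.startswith(prefix)]

nouns = words_with_prefix(pos_tags, 'NN')
adjectives = words_with_prefix(pos_tags, 'JJ')
print(nouns)       # ['compatibility', 'systems', 'constraints', 'numbers']
print(adjectives)  # ['linear', 'natural']
```

To see NLTK's own description of a tag, `nltk.help.upenn_tagset('NN')` prints its definition and examples, provided the tag set documentation resource (named `tagsets` in older releases) has been downloaded.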