Stemming and Lemmatization for Natural Language Processing in Python

Stemming
import nltk.stem.porter as pt
import nltk.stem.lancaster as lc
import nltk.stem.snowball as sb

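# Pick one of the three stemmers below; each assignment replaces the previous one.
# Porter stemmer (relatively lenient)
stemmer = pt.PorterStemmer()
# Lancaster stemmer (relatively strict, i.e. aggressive)
stemmer = lc.LancasterStemmer()
# Snowball stemmer (in between)
stemmer = sb.SnowballStemmer('english')
r = stemmer.stem('playing')  # extract the stem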
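
Because all three classes expose the same `stem()` method, the choice of stemmer can be made in one place. The following is a minimal sketch, assuming a hypothetical helper `make_stemmer` and hypothetical name keys (`'porter'`, `'lancaster'`, `'snowball'`); neither is part of NLTK.

```python
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem.porter import PorterStemmer
from nltk.stem.snowball import SnowballStemmer


def make_stemmer(name):
    """Return a stemmer for a short name (hypothetical helper, not an NLTK API)."""
    factories = {
        'porter': PorterStemmer,
        'lancaster': LancasterStemmer,
        'snowball': lambda: SnowballStemmer('english'),
    }
    if name not in factories:
        raise ValueError('unknown stemmer: %r' % name)
    return factories[name]()


stemmer = make_stemmer('porter')
stem = stemmer.stem('playing')
print(stem)
```

Switching to another stemmer then only means changing the string passed to `make_stemmer`.
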
Lemmatization

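Lemmatization serves a similar purpose to stemming. However, the stems that stemming produces are often not real words, so they are hard for a person to read and awkward for manual follow-up processing. Lemmatization instead restores inflected forms, such as plural nouns, to their dictionary form (for example, the singular), which is much easier to work with by hand.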

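import nltk.stem as ns
# Lemmatizer (requires the WordNet corpus: nltk.download('wordnet'))
lemmatizer = ns.WordNetLemmatizer()
word = 'wolves'  # sample input
n_lemm = lemmatizer.lemmatize(word, pos='n')  # lemmatize as a noun
v_lemm = lemmatizer.lemmatize(word, pos='v')  # lemmatize as a verb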
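
The `pos` argument takes WordNet's single-letter part-of-speech codes: `'n'` (noun), `'v'` (verb), `'a'` (adjective) and `'r'` (adverb). If the words already carry Penn Treebank tags, for example from a part-of-speech tagger, one possible way to derive `pos` is to look at the tag prefix. The sketch below assumes that tag set; `penn_to_wordnet` is a hypothetical helper, and falling back to noun is an assumption that mirrors the lemmatizer's default `pos='n'`.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet pos code (hypothetical helper)."""
    if tag.startswith('JJ'):   # JJ, JJR, JJS: adjectives
        return 'a'
    if tag.startswith('VB'):   # VB, VBD, VBG, VBN, VBP, VBZ: verbs
        return 'v'
    if tag.startswith('RB'):   # RB, RBR, RBS: adverbs
        return 'r'
    return 'n'                 # assumption: treat everything else as a noun


for tag in ['NNS', 'VBG', 'JJ', 'RB', 'DT']:
    print(tag, penn_to_wordnet(tag))
```

The result can be passed straight to the lemmatizer, as in `lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag))`.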

Example: stemming

"""
Stemmers
"""
import nltk.stem.porter as pt
import nltk.stem.lancaster as lc
import nltk.stem.snowball as sb

words = ['table', 'probably', 'wolves', 
	'playing', 'is', 'the', 'beaches', 
	'grouded', 'dreamt', 'envision']

pt_stemmer = pt.PorterStemmer()
lc_stemmer = lc.LancasterStemmer()
sb_stemmer = sb.SnowballStemmer('english')

for word in words:
	pt_stem = pt_stemmer.stem(word)
	lc_stem = lc_stemmer.stem(word)
	sb_stem = sb_stemmer.stem(word)
	print('%8s %8s %8s %8s' % \
		  (word, pt_stem, lc_stem, sb_stem))

Stemming results (columns: word, Porter, Lancaster, Snowball):

   table     tabl     tabl     tabl
probably  probabl     prob  probabl
  wolves     wolv     wolv     wolv
 playing     play     play     play
      is       is       is       is
     the      the      the      the
 beaches    beach    beach    beach
 grouded    groud    groud    groud
  dreamt   dreamt   dreamt   dreamt
envision    envis    envid    envis
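
The table shows that the three stemmers usually agree and only occasionally differ. To pick out just the disagreements, one could collect the stems for each word and check whether more than one distinct value appears. This is a sketch: `find_disagreements` is a hypothetical helper, and the word list is a subset of the one used above.

```python
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem.porter import PorterStemmer
from nltk.stem.snowball import SnowballStemmer


def find_disagreements(words, stemmers):
    """Return {word: tuple_of_stems} for words the stemmers do not agree on."""
    result = {}
    for word in words:
        stems = tuple(s.stem(word) for s in stemmers)
        if len(set(stems)) > 1:  # more than one distinct stem
            result[word] = stems
    return result


stemmers = [PorterStemmer(), LancasterStemmer(), SnowballStemmer('english')]
sample = ['table', 'probably', 'playing', 'envision']
disagreements = find_disagreements(sample, stemmers)
for word, stems in disagreements.items():
    print(word, stems)
```

Each tuple lists the stems in the same order as the `stemmers` list, so the columns match the table above.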

Example: lemmatization

"""
Lemmatization
"""
import nltk.stem as ns
import nltk
nltk.download('wordnet')  # fetch the WordNet corpus used by the lemmatizer

words = ['table', 'probably', 'wolves', 
	'playing', 'is', 'the', 'beaches', 
	'grouded', 'dreamt', 'envision']

lemmatizer = ns.WordNetLemmatizer()
for word in words:
	n_lemm = lemmatizer.lemmatize(word,pos='n')
	v_lemm = lemmatizer.lemmatize(word,pos='v')
	print('%8s %8s %8s' % \
		  (word, n_lemm, v_lemm))

Lemmatization results (columns: word, lemma as a noun, lemma as a verb):

   table    table    table
probably probably probably
  wolves     wolf   wolves
 playing  playing     play
      is       is       be
     the      the      the
 beaches    beach    beach
 grouded  grouded  grouded
  dreamt   dreamt    dream
envision envision envision
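
The results depend on `pos`: `wolves` only changes when treated as a noun, while `playing` and `dreamt` only change when treated as a verb. When the part of speech is unknown, one crude heuristic is to try several `pos` values in order and keep the first lemma that differs from the input. The sketch below takes the lemmatize function as a parameter so that it runs without the WordNet corpus; `first_changed_lemma` and `table_lemmatize` are hypothetical names, the stand-in data is copied from three rows of the table above, and the heuristic can pick the wrong reading for ambiguous words.

```python
def first_changed_lemma(word, lemmatize, pos_order=('n', 'v')):
    """Try each pos in order; return the first lemma that differs from word."""
    for pos in pos_order:
        lemma = lemmatize(word, pos=pos)
        if lemma != word:
            return lemma
    return word  # no pos changed the word


# Stand-in for WordNetLemmatizer().lemmatize, built from three rows of the
# results table so the sketch does not need the corpus.
_TABLE = {('wolves', 'n'): 'wolf', ('playing', 'v'): 'play', ('is', 'v'): 'be'}


def table_lemmatize(word, pos='n'):
    """Look up (word, pos) in the small table; unknown pairs are unchanged."""
    return _TABLE.get((word, pos), word)


for w in ['wolves', 'playing', 'is', 'the']:
    print(w, first_changed_lemma(w, table_lemmatize))
```

With NLTK installed and the corpus downloaded, passing `ns.WordNetLemmatizer().lemmatize` in place of `table_lemmatize` should work the same way, since it accepts the same `word` and `pos` arguments.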