Lemmatization

微电子学与固体电子学-俞驰

Category: Python Natural Language Processing

Python 2.7.5 (default, Aug  4 2017, 00:39:18) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> raw="""DENNIS:Listern,strange women lying in ponds distributing swords... is no basis for a system of government. Supreme executive power derives from... a mandate from the masses,not from some farcical aquatic ceremony."""
>>> tokens=nltk.word_tokenize(raw)
>>> wnl=nltk.WordNetLemmatizer()
>>> [wnl.lemmatize(t) for t in tokens]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/nltk/stem/wordnet.py", line 40, in lemmatize
    lemmas = wordnet._morphy(word, pos)
  File "/usr/lib/python2.7/site-packages/nltk/corpus/util.py", line 116, in __getattr__
    self.__load()
  File "/usr/lib/python2.7/site-packages/nltk/corpus/util.py", line 81, in __load
    except LookupError: raise e
LookupError: 
**********************************************************************
  Resource wordnet not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('wordnet')
  
  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/nltk_data'
    - '/usr/lib/nltk_data'
**********************************************************************

>>> nltk.download('wordnet')
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.
True
>>> [wnl.lemmatize(t) for t in tokens]
['DENNIS', ':', 'Listern', ',', 'strange', u'woman', 'lying', 'in', u'pond', 'distributing', u'sword', '...', 'is', 'no', 'basis', 'for', 'a', 'system', 'of', 'government', '.', 'Supreme', 'executive', 'power', 'derives', 'from', '...', 'a', 'mandate', 'from', 'the', u'mass', ',', 'not', 'from', 'some', 'farcical', 'aquatic', 'ceremony', '.']

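Lemmatization reduces inflected word forms to their dictionary form (the lemma). The WordNet lemmatizer strips an affix only when the resulting word is in its dictionary, which is why women, ponds, swords and masses above become woman, pond, sword and mass, while every other token is returned unchanged.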
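To make the "strip an affix only if the result is in the dictionary" idea concrete, here is a minimal toy sketch. The dictionary, the exception table, the suffix rules and the name toy_lemmatize are all hypothetical and chosen for illustration; this is not NLTK's implementation or WordNet's data.

```python
# Toy illustration only: a tiny hand-written dictionary and rule set,
# not the WordNet data or the algorithm NLTK actually ships.
TOY_DICTIONARY = {"woman", "pond", "sword", "mass", "basis", "lying"}
TOY_EXCEPTIONS = {"women": "woman"}          # irregular forms, looked up directly
TOY_SUFFIX_RULES = [("es", ""), ("s", "")]   # (suffix to strip, replacement)


def toy_lemmatize(word):
    """Return a dictionary form of `word`, or `word` itself if none is found."""
    if word in TOY_EXCEPTIONS:
        return TOY_EXCEPTIONS[word]
    if word in TOY_DICTIONARY:
        # Already a dictionary entry: leave it alone (so 'basis' stays 'basis').
        return word
    for suffix, replacement in TOY_SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[:-len(suffix)] + replacement
            # Accept the stripped form only if the dictionary knows it.
            if candidate in TOY_DICTIONARY:
                return candidate
    return word


print([toy_lemmatize(w) for w in ["women", "ponds", "masses", "basis", "derives"]])
```

In this sketch derives comes back unchanged because neither deriv nor derive is in the toy dictionary, mirroring how unknown candidates are rejected rather than guessed.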

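Note that lying is not handled here. lemmatize treats every token as a noun unless told otherwise (its pos argument defaults to 'n'), so verb forms such as lying, distributing and derives are left as they are. Calling wnl.lemmatize('lying', pos='v') returns 'lie'.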
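One common way to get verb forms lemmatized is to tag the tokens first and pass a part of speech to lemmatize. The sketch below is one possible approach, not the author's method: the helper names penn_to_wordnet_pos, lemmatize_tagged and demo are made up for this example, and demo() assumes that the WordNet corpus plus whatever tokenizer and tagger resources your NLTK version asks for have already been fetched with nltk.download.

```python
def penn_to_wordnet_pos(tag):
    """Map a Penn Treebank tag (e.g. 'VBG') to a WordNet POS letter."""
    if tag.startswith("J"):
        return "a"   # adjective
    if tag.startswith("V"):
        return "v"   # verb
    if tag.startswith("R"):
        return "r"   # adverb
    return "n"       # fall back to noun, the lemmatizer's own default


def lemmatize_tagged(tagged_tokens, lemmatize):
    """Apply `lemmatize(word, pos)` to each (word, Penn tag) pair."""
    return [lemmatize(word, penn_to_wordnet_pos(tag)) for word, tag in tagged_tokens]


def demo():
    # Imported here so the helpers above stay usable without NLTK installed.
    import nltk

    wnl = nltk.WordNetLemmatizer()
    tokens = nltk.word_tokenize("strange women lying in ponds distributing swords")
    tagged = nltk.pos_tag(tokens)   # list of (word, Penn Treebank tag) pairs
    return lemmatize_tagged(tagged, wnl.lemmatize)
```

Calling demo() returns the lemmatized token list; the result depends on the tags the tagger assigns, so a token tagged as a noun is still looked up as a noun.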
