Tokenization and part-of-speech tagging in Python with NLTK

1. Installation

1.1. Install Python 3.4

Note: use the 32-bit build.

http://www.python.org/downloads/
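
Since the note above asks for the 32-bit build, it can help to confirm which build is actually on the PATH. The following is a minimal sketch using only the standard library; the helper name `interpreter_bits` is made up for illustration.

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running interpreter in bits (32 or 64)."""
    # struct.calcsize("P") is the size in bytes of a C pointer for this build.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "-", interpreter_bits(), "bit")
```

The interpreter banner reports the same information (for example "32 bit (Intel)" on Windows), so this check is mostly useful inside scripts.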

1.2. Install NumPy

Two things to note: not every NumPy release ships a Windows installer, and the installer you pick must support Python 3.4.

http://sourceforge.net/projects/numpy/files/NumPy/

2. Download NLTK Data

Method 1:

Run in Python:

import nltk

nltk.download()
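
Calling `nltk.download()` with no arguments opens the interactive downloader. If only the steps in this post are needed, the packages can probably be fetched by name instead. The sketch below assumes the package identifiers `punkt`, `averaged_perceptron_tagger`, `maxent_ne_chunker` and `words`; these identifiers have changed between NLTK releases, so if a lookup fails, use the name printed in the error message. The helper `download_required` is hypothetical.

```python
# Package identifiers are an assumption and depend on the NLTK version.
REQUIRED_PACKAGES = [
    "punkt",                       # sentence/word tokenizer models
    "averaged_perceptron_tagger",  # POS tagger model
    "maxent_ne_chunker",           # named-entity chunker model
    "words",                       # word list used by the chunker
]


def download_required(packages=REQUIRED_PACKAGES, download_dir=None):
    """Download each package by name and return {name: success flag}."""
    # Imported here so this module can be loaded even before NLTK is installed.
    import nltk

    return {
        name: nltk.download(name, download_dir=download_dir, quiet=True)
        for name in packages
    }


if __name__ == "__main__":
    print(download_required())
```

Passing `download_dir` (for example the directory later assigned to `NLTK_DATA`) keeps the data in one known place.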

3. Tokenization

3.1. Set environment variables

set PYTHON_HOME=C:\NeoLanguages\Python34_x86

set PATH=%PYTHON_HOME%;%PATH%

set NLTK_DATA=D:\NLP\NLTK\nltk_data

@python
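
NLTK reads `NLTK_DATA` when it builds its data search path, and the variable may hold several directories joined by the platform's path separator (`;` on Windows). The sketch below mirrors that splitting with the standard library only; it is an illustration, not NLTK's own code, and the helper name `nltk_data_dirs` is made up.

```python
import os


def nltk_data_dirs(environ=None):
    """Split NLTK_DATA into a list of directories, skipping empty entries."""
    environ = os.environ if environ is None else environ
    raw = environ.get("NLTK_DATA", "")
    return [part for part in raw.split(os.pathsep) if part]


if __name__ == "__main__":
    # Shows what the current shell session would hand to NLTK.
    print(nltk_data_dirs())
```

Once NLTK is imported, printing `nltk.data.path` shows the effective search list, which is a quick way to check that the `set NLTK_DATA=...` line took effect.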

3.2. The .py file

#!/usr/bin/python
import nltk

# Test sentence
sentence = "Don’t ever let somebody tell you you can’t do something, not even me. \
You got a dream, you gotta protect it. People can’t do something themselves, \
they wanna tell you you can’t do it. If you want something, go get it. Period."

# Tokenization
tokens = nltk.word_tokenize(sentence)

# Part-of-speech tagging
tagged = nltk.pos_tag(tokens)

# Named-entity chunking
entities = nltk.chunk.ne_chunk(tagged)

3.3. Run it statement by statement

D:\MyProjects\NLP\NLTK>python

Python 3.4.4 (v3.4.4:737efcadf5a6, Dec 20 2015, 19:28:18) [MSC v.1600 32 bit (Intel)] on win32

Type "help", "copyright", "credits" or "license" for more information.

>>> import nltk

>>> sentence = "Don’t ever let somebody tell you you can’t do something, not even me. \
... You got a dream, you gotta protect it. People can’t do something themselves, \
... they wanna tell you you can’t do it. If you want something, go get it. Period."

>>> tokens = nltk.word_tokenize(sentence)

>>> tagged = nltk.pos_tag(tokens)

>>> entities = nltk.chunk.ne_chunk(tagged)

>>> tokens

['Don’t', 'ever', 'let', 'somebody', 'tell', 'you', 'you', 'can’t', 'do',
 'something', ',', 'not', 'even', 'me', '.', 'You', 'got', 'a', 'dream', ',',
 'you', 'got', 'ta', 'protect', 'it', '.', 'People', 'can’t', 'do',
 'something', 'themselves', ',', 'they', 'wan', 'na', 'tell', 'you', 'you',
 'can’t', 'do', 'it', '.', 'If', 'you', 'want', 'something', ',', 'go', 'get',
 'it', '.', 'Period', '.']
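
In the token list above, "Don’t" and "can’t" stay whole because the test sentence uses the typographic apostrophe (U+2019). With a plain ASCII apostrophe, the Treebank-style rules behind `word_tokenize` usually split contractions into two tokens (for example "Do" and "n't"), which in turn affects the tags. A minimal sketch of normalizing the text first, using only the standard library; `normalize_apostrophes` is a hypothetical helper, not part of NLTK:

```python
def normalize_apostrophes(text):
    """Replace typographic single quotes with the ASCII apostrophe."""
    return text.replace("\u2018", "'").replace("\u2019", "'")


sample = "Don\u2019t ever let somebody tell you you can\u2019t do something."
print(normalize_apostrophes(sample))
```

The normalized string can then be passed to `nltk.word_tokenize` in place of the original sentence.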

>>> tagged

[('Don’t', 'NNP'), ('ever', 'RB'), ('let', 'VB'), ('somebody', 'NN'),
 ('tell', 'VB'), ('you', 'PRP'), ('you', 'PRP'), ('can’t', 'VBP'),
 ('do', 'VB'), ('something', 'NN'), (',', ','), ('not', 'RB'), ('even', 'RB'),
 ('me', 'PRP'), ('.', '.'), ('You', 'PRP'), ('got', 'VBD'), ('a', 'DT'),
 ('dream', 'NN'), (',', ','), ('you', 'PRP'), ('got', 'VBD'), ('ta', 'JJ'),
 ('protect', 'NN'), ('it', 'PRP'), ('.', '.'), ('People', 'NNS'),
 ('can’t', 'VBP'), ('do', 'VBP'), ('something', 'NN'), ('themselves', 'PRP'),
 (',', ','), ('they', 'PRP'), ('wan', 'VBP'), ('na', 'TO'), ('tell', 'VB'),
 ('you', 'PRP'), ('you', 'PRP'), ('can’t', 'VBP'), ('do', 'VB'),
 ('it', 'PRP'), ('.', '.'), ('If', 'IN'), ('you', 'PRP'), ('want', 'VBP'),
 ('something', 'NN'), (',', ','), ('go', 'VBP'), ('get', 'VB'), ('it', 'PRP'),
 ('.', '.'), ('Period', 'NNP'), ('.', '.')]
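
The tagger returns a plain list of (token, tag) tuples with Penn Treebank tags, so ordinary Python is enough to summarize it. The sketch below copies a handful of pairs from the output above and adds short glosses for their tags; the names `tagged_sample` and `TAG_GLOSS` are made up for illustration.

```python
from collections import Counter

# A few (token, tag) pairs copied from the tagger output above.
tagged_sample = [
    ("ever", "RB"), ("let", "VB"), ("somebody", "NN"), ("tell", "VB"),
    ("you", "PRP"), ("a", "DT"), ("dream", "NN"), ("People", "NNS"),
]

# Short glosses for the Penn Treebank tags used in the sample.
TAG_GLOSS = {
    "RB": "adverb",
    "VB": "verb, base form",
    "NN": "noun, singular or mass",
    "NNS": "noun, plural",
    "PRP": "personal pronoun",
    "DT": "determiner",
}

# Count how often each tag occurs and pull out the noun tokens.
tag_counts = Counter(tag for _word, tag in tagged_sample)
nouns = [word for word, tag in tagged_sample if tag.startswith("NN")]

for tag, count in tag_counts.most_common():
    print(tag, TAG_GLOSS.get(tag, "?"), count)
print(nouns)
```

The same two comprehensions work unchanged on the full `tagged` list from the session.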

>>> entities

Tree('S', [('Don’t', 'NNP'), ('ever', 'RB'), ('let', 'VB'), ('somebody', 'NN'),
 ('tell', 'VB'), ('you', 'PRP'), ('you', 'PRP'), ('can’t', 'VBP'),
 ('do', 'VB'), ('something', 'NN'), (',', ','), ('not', 'RB'), ('even', 'RB'),
 ('me', 'PRP'), ('.', '.'), ('You', 'PRP'), ('got', 'VBD'), ('a', 'DT'),
 ('dream', 'NN'), (',', ','), ('you', 'PRP'), ('got', 'VBD'), ('ta', 'JJ'),
 ('protect', 'NN'), ('it', 'PRP'), ('.', '.'), ('People', 'NNS'),
 ('can’t', 'VBP'), ('do', 'VBP'), ('something', 'NN'), ('themselves', 'PRP'),
 (',', ','), ('they', 'PRP'), ('wan', 'VBP'), ('na', 'TO'), ('tell', 'VB'),
 ('you', 'PRP'), ('you', 'PRP'), ('can’t', 'VBP'), ('do', 'VB'),
 ('it', 'PRP'), ('.', '.'), ('If', 'IN'), ('you', 'PRP'), ('want', 'VBP'),
 ('something', 'NN'), (',', ','), ('go', 'VBP'), ('get', 'VB'), ('it', 'PRP'),
 ('.', '.'), Tree('PERSON', [('Period', 'NNP')]), ('.', '.')])
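
In the tree above, ordinary tokens are (word, tag) tuples directly under the root, while recognized entities are nested subtrees carrying a label such as PERSON. The sketch below collects those labelled subtrees. It assumes the NLTK 3 `Tree` interface, where a subtree is a list-like object with a `label()` method; the helper `named_entities` is hypothetical.

```python
def named_entities(tree):
    """Collect (label, text) pairs for labelled subtrees directly under the root."""
    found = []
    for node in tree:
        # Leaves are plain (word, tag) tuples; chunks expose a label() method.
        if hasattr(node, "label"):
            text = " ".join(word for word, _tag in node)
            found.append((node.label(), text))
    return found
```

Applied to `entities` from the session, this would be expected to report the single PERSON chunk for "Period". That chunk is a false positive: "Period" is not a person here, which shows that the chunker's output needs checking.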

>>>
