# encoding=utf-8
import jieba

seg_list = jieba.cut("明天不上班啊", cut_all=True)   # full mode: every dictionary word found
print("Full Mode:", "/".join(seg_list))
seg_list = jieba.cut("明天不上班啊", cut_all=False)  # accurate mode (the default)
print("Default Mode:", "/".join(seg_list))
seg_list = jieba.cut("明天不上班啊")                  # cut_all defaults to False
print(", ".join(seg_list))
Printed result:
F:\python-study\fenci>python test.py
Building prefix dict from C:\Python33\lib\site-packages\jieba\dict.txt ...
Loading model from cache c:\users\zhaoji~1\appdata\local\temp\jieba.cache
Loading model cost 0.840 seconds.
Prefix dict has been built succesfully.
Full Mode: 明天/ 不/ 上班/ 啊
Default Mode: 明天/ 不/ 上班/ 啊
明天, 不, 上班, 啊
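For this short sentence the two modes happen to give the same result. With a phrase containing overlapping dictionary words, full mode lists every word it can find while the default accurate mode picks a single segmentation; a quick sketch using the example sentence from jieba's own README:

import jieba

seg_list = jieba.cut("我来到北京清华大学", cut_all=True)
print("Full Mode:", "/".join(seg_list))     # enumerates every dictionary word found
seg_list = jieba.cut("我来到北京清华大学", cut_all=False)
print("Default Mode:", "/".join(seg_list))  # chooses the most likely segmentation

Here full mode should produce overlapping words such as 清华/清华大学/大学 while default mode keeps only 清华大学 (the exact output depends on the jieba version and dictionary).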
Python word segmentation tool: jieba
1. Error when first running the script:
F:\python-study\fenci>python test.py
File "test.py", line 3
SyntaxError: Non-UTF-8 code starting with '\xce' in file test.py on line 3, but
no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Some digging showed this was an editor encoding problem: opening the file in Notepad showed ANSI in the status bar; converting the file to UTF-8 (Save As with UTF-8 encoding) fixes it.
2. In Python 3, print is a function, so the parentheses are required:
print()
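For comparison, the old Python 2 statement form next to the Python 3 function form (assuming a seg_list like the one above):

# Python 2 statement form, a SyntaxError in Python 3:
# print "Default Mode:", "/".join(seg_list)
# Python 3 function form:
print("Default Mode:", "/".join(seg_list))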
Test:
# coding=utf-8
import jieba
import jieba.posseg as pseg

f = open("in.txt", "r", encoding="utf-8")   # read the input text
string = f.read()
f.close()
words = pseg.cut(string)                    # segment with part-of-speech tagging
result = ""
for w in words:
    result += str(w.word) + "/" + str(w.flag) + " "  # append word/POS pairs, space-separated
f = open("out.txt", "w", encoding="utf-8")  # write the tagged result
f.write(result)
f.close()
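Each item yielded by pseg.cut() is a pair with .word and .flag attributes, so the tags can also be inspected directly instead of being concatenated into one string; a minimal sketch (the exact flag values depend on jieba's dictionary):

import jieba.posseg as pseg

for pair in pseg.cut("明天不上班啊"):
    print(pair.word, pair.flag)  # prints each word with its part-of-speech flag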