Python Data Analysis: Text Mining

General_单刀

Article link: https://blog.csdn.net/qq_28284093/article/details/103222600

Word segmentation with jieba

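# Word segmentation
import jieba
doc = '我喜欢上海东方明珠'
# jieba has three modes: full mode, accurate mode, and search engine mode
# Argument 1: the text to segment. Argument 2: cut_all=False selects accurate mode
# (cut_all=True selects full mode; search engine mode uses jieba.cut_for_search)
w1 = jieba.cut(doc, cut_all=False)
for item in w1:
    print(item)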

Output:

Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\pc\AppData\Local\Temp\jieba.cache
我
喜欢
上海
东方明珠
Loading model cost 0.752 seconds.
Prefix dict has been built succesfully.

Getting each word's part of speech

import jieba.posseg
doc = '我喜欢上海东方明珠'
w2 = jieba.posseg.cut(doc)
# item.flag is the part-of-speech tag
# item.word is the word itself
for item in w2:
    print(item.flag)

Output:

Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\pc\AppData\Local\Temp\jieba.cache
Loading model cost 0.745 seconds.
Prefix dict has been built succesfully.
r
v
ns
nr

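Meaning of the part-of-speech tags:

a: adjective

c: conjunction

d: adverb

e: interjection

f: locative (direction) word

i: idiom

m: numeral

n: noun

nr: person name

ns: place name

nt: organization or group

nz: other proper noun

p: preposition

r: pronoun

t: time word

u: particle

v: verb

vn: verbal noun

w: punctuation

un: unknown word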

Loading a custom dictionary

jieba.load_userdict('filename')  # path to the custom dictionary file
