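Fuzzy Chinese keyword extraction in Python: text feature extraction with term-frequency matrices, Chinese word segmentation, the jieba segmentation library...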

weixin_39633781

Published 2021-01-04 05:44:44

Article tags: Python fuzzy Chinese keyword extraction

Article link: https://blog.csdn.net/weixin_39633781/article/details/112394825

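Words and phrases: used as feature values

Method 1: sklearn.feature_extraction.text.CountVectorizer(stop_words=[])

Counts how many times each word occurs and returns the result as a term-frequency matrix; stop_words=[] is the list of stop words to exclude from the counts.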

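·CountVectorizer.fit_transform(X): X is a text or an iterable of text strings. Returns: a sparse matrix.

·CountVectorizer.inverse_transform(X): X is an array or a sparse matrix. Returns: the data in its pre-transformation form, that is, for each document the terms with a nonzero count.

·CountVectorizer.get_feature_names(): Returns: the list of words. In scikit-learn 1.0 and later the method is get_feature_names_out(); get_feature_names() was removed in 1.2.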
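The three methods listed above can be tried together on a toy corpus. This is only a minimal sketch: the two sentences and the stop-word list are made up for illustration, and the hasattr check is an assumption added so the snippet runs on both older and newer scikit-learn releases.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Illustrative documents and stop words (not taken from the article)
docs = ["life is short", "enjoy life, this is the life"]
vectorizer = CountVectorizer(stop_words=["is", "the"])

# fit_transform learns the vocabulary and returns a SciPy sparse count matrix
counts = vectorizer.fit_transform(docs)

# Column labels: get_feature_names_out() on scikit-learn >= 1.0,
# get_feature_names() on older releases
if hasattr(vectorizer, "get_feature_names_out"):
    vocab = [str(t) for t in vectorizer.get_feature_names_out()]
else:
    vocab = [str(t) for t in vectorizer.get_feature_names()]

# Dense view: one row per document, one column per vocabulary term
dense = counts.toarray()

# inverse_transform returns, per document, the terms with a nonzero count
terms_per_doc = vectorizer.inverse_transform(counts)

print(vocab)
print(dense)
print(terms_per_doc)
```

The stop words "is" and "the" never become columns, and "life" is counted twice in the second row.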

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

data = ["Maybe it was better to just really enjoy life. this is the life", "享受生活，顺其自然。这就是生活"]
transfer = CountVectorizer()  # instantiate a transformer
data_new = transfer.fit_transform(data)  # call fit_transform()
# print(data_new)
print(transfer.get_feature_names())  # get_feature_names_out() in scikit-learn 1.0 and later
print(data_new.toarray())
# Build a two-dimensional table:
data=pd.DataFrame(data_new
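The Chinese sentence in the example above contains no spaces, so CountVectorizer's default tokenizer can only split it at punctuation, and each clause ends up as a single "word". The sketch below compares the raw sentence with a space-separated version. The segmentation is written by hand as an assumption, standing in for what a segmenter would produce (for instance " ".join(jieba.cut(text)) with the jieba library), and vocabulary_of is a hypothetical helper name.

```python
from sklearn.feature_extraction.text import CountVectorizer


def vocabulary_of(docs):
    """Return the sorted vocabulary that a default CountVectorizer learns."""
    vectorizer = CountVectorizer()
    vectorizer.fit(docs)
    return sorted(vectorizer.vocabulary_)


# Raw sentence: only the punctuation marks separate tokens
raw = ["享受生活，顺其自然。这就是生活"]

# Hand-segmented stand-in for segmenter output (assumption, not real jieba output)
segmented = ["享受 生活 ， 顺其自然 。 这 就是 生活"]

print(vocabulary_of(raw))        # whole clauses become features
print(vocabulary_of(segmented))  # individual words become features
```

The default token pattern keeps only tokens of two or more characters, so the single-character word 这 is dropped from the segmented vocabulary.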
