Counting Term Frequencies with TfidfVectorizer

YPL_ZML

Categories: Machine Learning, Data Analysis

Article link: https://blog.csdn.net/YPL_ZML/article/details/93906460

from sklearn.feature_extraction.text import TfidfVectorizer
import jieba

# text = ['This is the first document.', 'This is the second second document.', 'And the third one.',
#         'Is this the first document?', ]
# 
# tf = TfidfVectorizer(min_df=1)
#
# X = tf.fit_transform(text)
# names = tf.get_feature_names_out()
# print(names)
# print(X.toarray())


text = '今天天气真好,我要去北京天安门玩，要去景山攻牙之后，玩完大明劫'
# Segment with jieba in precise mode (cut_all=False), then join the tokens
# with spaces so that the vectorizer can split them apart again
context = [" ".join(jieba.cut(text, cut_all=False))]
print(context)

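# min_df=1 keeps every term. The default token_pattern only keeps tokens of
# two or more characters, so single-character words such as 我 are dropped.
tf = TfidfVectorizer(min_df=1)

# With a single document every term gets the same IDF, so the row below is
# just the L2-normalized term counts.
X = tf.fit_transform(context)
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
names = tf.get_feature_names_out()

print(names)
print(X.toarray())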
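To make the printed numbers easier to interpret, here is a minimal pure-Python sketch of the computation that TfidfVectorizer performs with its default settings: raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, then L2 normalization of each row. The function name `tfidf_rows` and the sample strings are assumptions made for illustration. This is not scikit-learn's actual implementation, which works on sparse matrices.

```python
import math
import re
from collections import Counter


def tfidf_rows(docs):
    """Return (sorted vocabulary, list of L2-normalized TF-IDF rows)."""
    # Lowercase and keep tokens of two or more word characters,
    # mirroring the vectorizer's default token_pattern.
    tokenized = [re.findall(r"\b\w\w+\b", d.lower()) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        raw = [counts[t] * idf[t] for t in vocab]
        # Guard against an all-zero row (a document with no kept tokens).
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        rows.append([v / norm for v in raw])
    return vocab, rows


if __name__ == "__main__":
    # Hypothetical single-document input: every IDF equals 1, so the row
    # is the term counts (1, 2) divided by their L2 norm.
    vocab, rows = tfidf_rows(["aa bb bb"])
    print(vocab)
    print(rows)
```

With only one document, as in the jieba example above, the output therefore reflects relative term counts rather than any corpus-level weighting. Raw integer counts would require `CountVectorizer` instead.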
