Python keyword ratio: two small Python text-processing examples (text sniffing and keyword-ratio statistics)

Latest recommended article published 2022-11-07 17:05:46

蹲不到的海东青



Tags: python keyword ratio

Article link: https://blog.csdn.net/weixin_36308543/article/details/113505787


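This article shows how to use Python for text sniffing, that is, finding the sentences that contain at least one keyword, and how to compute the share of each sentence taken up by keywords. The example code demonstrates list comprehensions, string operations, and generator expressions.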

(Summary generated automatically by CSDN.)

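Problem: given a list of sentences and a list of keywords, find the sentences that contain at least one keyword (text sniffing). The code before print('='*30) does this. To go one step further and compute each sentence's keyword ratio (the total length of all keyword occurrences in the sentence divided by the sentence length), see the code after it. Keyword ratio is a commonly used criterion for text classification; if you want to classify sentences by keyword ratio, you can add that code yourself.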
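As a worked check of the ratio formula, the sketch below computes it for a single sentence. The helper name `keyword_ratio` is hypothetical and not part of the original article. Because `str.count` matches substrings case-sensitively, "is" is counted twice in "This is a test." (once inside "This"), while the keyword "simple" does not match "Simple".

```python
def keyword_ratio(sentence, words):
    """Hypothetical helper: total length of keyword occurrences / sentence length."""
    # str.count counts non-overlapping, case-sensitive substring matches
    matched = sum(sentence.count(word) * len(word) for word in words)
    return matched / len(sentence)


words = ['test', 'count', 'dense', 'is', 'simple']
sentence = 'This is a test.'
# 'test' occurs once (4 chars); 'is' occurs twice, in "This" and "is" (2 * 2 chars)
print(round(keyword_ratio(sentence, words), 3))  # 8 / 15 -> 0.533
# Case-sensitive matching: the keyword 'simple' is not found in 'Simple ...'
print('Simple is better than complex.'.count('simple'))  # 0
```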

The article mainly demonstrates list comprehensions, string methods, generator expressions, and built-in functions.
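To illustrate how generator expressions pair with built-in functions, here is a sketch (not the author's code) of an equivalent filter that uses `any()` instead of `sum(...) > 0`. `any()` stops at the first keyword it finds, whereas `sum()` counts every occurrence; for the yes/no question "does this sentence contain a keyword?" both give the same answer. The function name `check_any` and the sample data are hypothetical.

```python
def check_any(sentences, words):
    """Hypothetical variant: keep sentences containing at least one keyword."""
    # The generator expression is consumed lazily; any() returns True
    # as soon as one keyword is found as a substring of the sentence.
    return [s for s in sentences if any(word in s for word in words)]


sentences = ['This is a test.', 'Readability counts.', 'Flat is nice.']
words = ['test', 'count']
print(check_any(sentences, words))  # ['This is a test.', 'Readability counts.']

# Same result as a sum()-based filter in the style of the article's check()
print(check_any(sentences, words) ==
      [s for s in sentences if sum(s.count(w) for w in words) > 0])  # True
```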

def check(sentences, words):
    '''Return the list of sentences that contain at least one keyword.'''
    return [sentence
            for sentence in sentences
            if sum(sentence.count(word)
                   for word in words) > 0]

sentences = ['This is a test.',
             'Beautiful is better than ugly.',
             'Explicit is better than implicit.',
             'Simple is better than complex.',
             'Sparse is better than dense.',
             'Readability counts.',
             'Now is better than never.']
words = ['test', 'count', 'dense', 'is', 'simple']

result = check(sentences, words)
for item in result:
    print(item)

print('=' * 30)

# For each sentence, compute the total length of all keyword occurrences
# as a fraction of the sentence length, rounded to 3 decimal places
d = {sentence: round(sum(sentence.count(word) * len(word)
                         for word in words) / len(sentence), 3)
     for sentence in result}
for item in d.items():
    print(item)

Output:

This is a test.
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Sparse is better than dense.
Readability counts.
Now is better than never.
==============================
('This is a test.', 0.533)
('Beautiful is better than ugly.', 0.067)
('Explicit is better than implicit.', 0.061)
('Simple is better than complex.', 0.067)
('Sparse is better than dense.', 0.25)
('Readability counts.', 0.263)
('Now is better than never.', 0.08)
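The problem description leaves classification by keyword ratio as an exercise. A minimal sketch, assuming a hypothetical threshold of 0.2 and hypothetical labels 'high' and 'low' (neither comes from the original article), could group the ratios printed above:

```python
def classify(ratios, threshold=0.2):
    """Hypothetical: split sentences into 'high'/'low' groups by keyword ratio."""
    groups = {'high': [], 'low': []}
    for sentence, ratio in ratios.items():
        # Ratios at or above the (assumed) threshold count as keyword-heavy
        label = 'high' if ratio >= threshold else 'low'
        groups[label].append(sentence)
    return groups


# Ratios copied from the output above
d = {'This is a test.': 0.533,
     'Beautiful is better than ugly.': 0.067,
     'Explicit is better than implicit.': 0.061,
     'Simple is better than complex.': 0.067,
     'Sparse is better than dense.': 0.25,
     'Readability counts.': 0.263,
     'Now is better than never.': 0.08}
groups = classify(d)
print(groups['high'])  # ['This is a test.', 'Sparse is better than dense.', 'Readability counts.']
```

The threshold is a tuning parameter; in practice it would be chosen from labeled examples rather than fixed in advance.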
