Python Level 2 Exam: Government Report Word Segmentation (1)

China@V

Category: Python Level 2. Tags: python

Article link: https://blog.csdn.net/qq_39451322/article/details/115311254

Government Report Word Segmentation

Problem 1:
Overview:


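'''
Problem 1: Data statistics. Requirement: modify the code in PY301-1.py to find, for each of the
two files, the 10 most frequent words and use them as topic words. Each word must be at least
2 characters long. Print the result to the screen as in the following sample (the sample words are not the answer):
2019:改革:10,企业:9, .. (omitted),深化:2
2018:改革:11,效益:7, .. (omitted),深化:1
Note: the output uses ASCII colons and ASCII commas with no spaces before or after the punctuation.
Words are separated by commas, and there is no comma after the last word.
'''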

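Approach:

Both text files are processed in exactly the same way, so define one function that handles a single file and call it twice (a little laziness pays off).

def fun(txt):
    pass

Read the whole file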

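    # Uses the platform's default encoding; pass encoding=... to open() if the file differs
    fp = open(txt)
    res = fp.read()
    # jieba.lcut returns the segmented words as a list (requires `import jieba`)
    words = jieba.lcut(res)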

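Count word frequencies and sort

    d = {}
    for word in words:
        # Keep only words of at least 2 characters
        if len(word) >= 2:
            d[word] = d.get(word, 0) + 1
    # Sort the (word, count) pairs by count, highest first
    lt = list(d.items())
    lt.sort(key=lambda x: x[1], reverse=True)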
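
The counting and sorting step can also be written with `collections.Counter` from the standard library. The sketch below is an alternative to the dictionary loop, not the code the exam template expects. The word list is made-up sample data standing in for the output of `jieba.lcut`, and `top_words` is a hypothetical helper name.

```python
from collections import Counter


def top_words(words, n=10, min_len=2):
    """Return the n most frequent words of at least min_len characters."""
    counts = Counter(w for w in words if len(w) >= min_len)
    # most_common sorts by count, highest first; ties keep first-seen order
    return counts.most_common(n)


# Made-up sample standing in for jieba.lcut(text)
sample = ['改革', '的', '企业', '改革', '，', '深化', '企业', '改革']
print(top_words(sample, n=2))  # [('改革', 3), ('企业', 2)]
```

Single-character tokens such as `的` and the punctuation mark are dropped by the length filter, just as in the loop version.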

Extract the year from the file name

    # Slice from the first '2' up to the first '.', e.g. 'data2019.txt' -> '2019'
    tmp = txt[txt.find('2'):txt.find('.')] + ':'
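
The slice above works for names like `data2019.txt`, but it breaks when the path contains an earlier `.` or `2`. With `./data2019.txt`, for example, `find('.')` returns 0 and the slice is empty. A slightly more defensive sketch, assuming the year is the first run of four digits in the file name (`extract_year` is a hypothetical helper, not part of the exam template):

```python
import os
import re


def extract_year(path):
    """Return the first 4-digit run in the file name, or '' if there is none."""
    name = os.path.basename(path)  # ignore any directory part of the path
    match = re.search(r'\d{4}', name)
    return match.group() if match else ''


print(extract_year('data2019.txt'))    # 2019
print(extract_year('./data2018.txt'))  # 2018
```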

Print the result

    # Take at most the 10 most frequent words
    for word, count in lt[:10]:
        tmp += '{}:{},'.format(word, count)
    # Drop the trailing comma before printing
    print(tmp[:-1])
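
Building the line with `str.join` avoids having to strip a trailing comma afterwards. A minimal sketch, assuming the year string and the list of `(word, count)` pairs are already available; `format_line` and the sample values are hypothetical:

```python
def format_line(year, pairs):
    """Format as 'year:word:count,word:count' with no trailing comma."""
    body = ','.join('{}:{}'.format(word, count) for word, count in pairs)
    return '{}:{}'.format(year, body)


# Made-up sample values, not the real answer
print(format_line('2019', [('改革', 10), ('企业', 9)]))  # 2019:改革:10,企业:9
```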

Release resources

    # Close the file to release the resource
    fp.close()
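
An explicit `close()` is skipped if an exception is raised before it runs, whereas a `with` block closes the file either way. The sketch below assumes the file is UTF-8 encoded, which may not match the actual exam data; `read_text` is a hypothetical helper:

```python
def read_text(path, encoding='utf-8'):
    """Read a whole text file; the file is closed when the with block exits."""
    # The encoding is an assumption; change it to match the actual data file
    with open(path, encoding=encoding) as fp:
        return fp.read()
```

Inside `fun`, a call such as `res = read_text(txt)` would then replace the `open`, `read`, and `close` lines.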

**Summary**:
Nothing special here; this is a standard word-frequency pattern.
Full code:

import jieba


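def fun(txt):
    # Read the whole file (uses the platform's default encoding)
    fp = open(txt)
    res = fp.read()
    words = jieba.lcut(res)
    # Count words of at least 2 characters
    d = {}
    for word in words:
        if len(word) >= 2:
            d[word] = d.get(word, 0) + 1
    # Sort by frequency, highest first
    lt = list(d.items())
    lt.sort(key=lambda x: x[1], reverse=True)
    # Extract the year from the file name, e.g. 'data2019.txt' -> '2019'
    tmp = txt[txt.find('2'):txt.find('.')] + ':'

    # Append at most the 10 most frequent words, then drop the trailing comma
    for word, count in lt[:10]:
        tmp += '{}:{},'.format(word, count)
    print(tmp[:-1])

    # Close the file to release the resource
    fp.close()


if __name__ == '__main__':
    fun('data2019.txt')
    fun('data2018.txt')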

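The code and related resources are packaged at the link below (a set of practice notes is included as well):
Practice notes
Code (can be imported directly into Python), software, and question bank:
Link: https://pan.baidu.com/s/1WClgPe1D79_GKclR26LJdA
Extraction code: pjmm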
