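Scraping the Government Work Report: word-cloud display and word-frequency statistics
Code for scraping the text of the Government Work Report: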
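import requests
from bs4 import BeautifulSoup

# Fetch the page that hosts the report text
r = requests.get("http://www.hgnu.edu.cn/2020/0531/c1112a61314/page.htm", timeout=10)
r.raise_for_status()  # stop early on HTTP errors instead of parsing an error page
r.encoding = "utf-8"
s = BeautifulSoup(r.text, "html.parser")

# Write the text of every <p> paragraph to 報告 file 报告.txt, one paragraph per line
with open("报告.txt", "w", encoding="utf-8") as f:
    for c in s.find_all("p"):
        f.write("{}\n".format(c.text))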
Word-cloud display:
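A minimal sketch of the word-cloud step, assuming the report text has already been split into words (for Chinese this is usually done with the third-party jieba package, e.g. `jieba.lcut(text)`, which is an assumption here, not part of the original code). `to_cloud_text` and `render_cloud` are hypothetical helpers; rendering assumes the third-party `wordcloud` package is installed and that `font_path` points to a font containing Chinese glyphs, otherwise the characters render as boxes.

```python
def to_cloud_text(tokens, stopwords=frozenset()):
    """Join segmented words with spaces so WordCloud can see the word boundaries."""
    return " ".join(t for t in tokens if t.strip() and t not in stopwords)


def render_cloud(text, font_path, out_path="wordcloud.png"):
    """Draw a word cloud from space-separated text and save it as an image."""
    # Imported lazily: `wordcloud` is third-party and only needed for rendering.
    from wordcloud import WordCloud

    # font_path is an assumption: any font file with CJK glyphs on your system.
    wc = WordCloud(font_path=font_path, width=800, height=600,
                   background_color="white")
    wc.generate(text)    # tokenizes the text and sizes words by frequency
    wc.to_file(out_path)
```

Joining with spaces matters because Chinese text has no spaces between words, so WordCloud would otherwise treat long runs of characters as single "words".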
Word-frequency statistics:
import re
import collections  # word-frequency counting library
import numpy
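A minimal sketch of how `re` and `collections` can combine for the counting step, assuming the report text has already been read from 报告.txt into a string. `word_frequencies` is a hypothetical helper; its `segment` argument is whatever word segmenter you choose (typically jieba's `jieba.lcut` for Chinese, an assumption here; `str.split` works for text that is already space-separated).

```python
import re
import collections  # word-frequency counting library


def word_frequencies(text, segment, stopwords=frozenset(), top_n=10):
    """Return the top_n (word, count) pairs after cleaning and segmenting text."""
    # Replace every run of characters that is not a CJK character, letter or
    # digit (punctuation, whitespace, symbols) with a single space.
    cleaned = re.sub(r"[^\u4e00-\u9fa5A-Za-z0-9]+", " ", text)
    # Drop blanks and single characters (mostly particles such as 的) and stopwords.
    words = [w for w in segment(cleaned)
             if len(w.strip()) > 1 and w not in stopwords]
    return collections.Counter(words).most_common(top_n)
```

For example, with jieba installed, `word_frequencies(text, jieba.lcut, top_n=20)` would list the 20 most frequent multi-character words in the report; passing a stopword set removes common function words that would otherwise dominate the list.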