Automation: Tracking Hot Topics


python__reported


License: CC 4.0 BY-SA

Categories: Voice Interaction, Crawlers, Projects. Tags: python, hotspot

Article link: https://blog.csdn.net/python__reported/article/details/108229838



I. Results
II. Main idea: calling the APIs

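I. Results

The program runs on a daily schedule, mainly at 8 a.m. and 10 p.m. It checks the to-do file on the desktop and then displays the requested content as an image, for example:

1. Desktop to-do items
[Image]

2. Real-time Weibo hot topics
[Image]

3. Academic hotspot tracking
[Image]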
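
The post does not show how the 8 a.m. and 10 p.m. runs are triggered. As a rough sketch only, the next run time could be computed as below; `RUN_TIMES` and `next_run` are hypothetical names that do not appear in the original script.

```python
from datetime import datetime, time, timedelta

# Assumed run times, taken from the description above (8 a.m. and 10 p.m.)
RUN_TIMES = (time(8, 0), time(22, 0))


def next_run(now, run_times=RUN_TIMES):
    """Return the first scheduled datetime strictly after `now`."""
    # Look at today first, then tomorrow; one of the two always has a future slot
    for day_offset in (0, 1):
        day = now.date() + timedelta(days=day_offset)
        for t in sorted(run_times):
            candidate = datetime.combine(day, t)
            if candidate > now:
                return candidate
    raise ValueError("run_times must not be empty")
```

A small loop could sleep until `next_run(datetime.now())` and then call the main function. An operating-system scheduler such as Windows Task Scheduler would work equally well and may be what the author actually used.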

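II. Main idea: calling the APIs

Method:
The Weibo hot-topic crawler and the CNKI crawler are both programs I had already written; here they are only lightly adapted so that this script can call them.
The Weibo hot-topic crawler comes from this link: Writeup001.
The CNKI (China National Knowledge Infrastructure) crawler mainly uses selenium to scrape the mobile version of the CNKI site.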
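
One way to call an existing crawler script without rewriting it is to launch it as a child process and check its exit code. The sketch below is an assumption about how that could look, not the author's code: `run_crawler` and its `timeout` parameter are hypothetical.

```python
import subprocess
import sys


def run_crawler(script_path, *args, timeout=600):
    """Run a crawler script with the current interpreter; return True on exit code 0."""
    result = subprocess.run(
        [sys.executable, script_path, *args],
        capture_output=True,   # collect stdout/stderr instead of printing them
        text=True,             # decode output to str
        errors="replace",      # never fail on undecodable bytes
        timeout=timeout,       # raises subprocess.TimeoutExpired if exceeded
    )
    if result.returncode != 0:
        print("crawler failed:", result.stderr.strip())
    return result.returncode == 0
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces or non-ASCII characters, and the return value tells the caller whether the output file is worth reading.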

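Purpose:
The point is to force attention by turning the results into images. I had been crawling Weibo hot topics for a long time without ever looking at them, and the academic hotspots were likewise gathering dust in the database. So I wrote this integrated program, which forces me to look at the hot topics and lets me keep adding related tasks through the desktop to-do file.

The same idea is partly implemented in my voice-interaction program. However, the Google speech-recognition API, which is the accurate one, can only be reached through a proxy, and setting up the proxy at every boot is such a hassle that I rarely use it.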

Source code

import glob
import json
import os
import pprint
import random
import re
import time
from datetime import datetime

from PIL import Image, ImageDraw, ImageFont  # Pillow, used to render text as an image
from 项目.各种爬虫.爬虫实例.中国知网文献检索分析.文章深入检索 import main  # the author's CNKI crawler entry point


def api_weibo():
    """Read the Weibo hot-search Markdown file written for the current hour."""
    text = []
    # Non-ASCII date units are filled in via str.format rather than placed in the strftime pattern
    time_path = time.strftime('%Y{y}/%m{m}/%d{d}/', time.localtime()).format(y='年', m='月', d='日')
    time_name = time.strftime('%Y{y}%m{m}%d{d}%H{h}', time.localtime()).format(y='年', m='月', d='日', h='点')
    path = r"J:/微博热度" + "/" + time_path
    for file in glob.glob(os.path.join(path, "{}.md".format(time_name))):
        with open(file, "r", encoding="utf8", errors="ignore") as f:
            text.append(f.read())
    return text


def api_cnki(text):
    """Run the CNKI crawler for the keyword in a to-do line, then load its JSON output."""
    print("Starting academic hotspot search...")
    # A to-do line looks like "学术热点：<keyword>"; strip the trailing newline left by readline()
    keyword = text.split("：")[1].strip()
    main(keyword)
    today = datetime.today()
    today_ = today.strftime('%m-%d')
    hour = today.strftime("%H")
    path = r"J:\PyCharm项目\项目\各种爬虫\爬虫实例\中国知网文献检索分析" + "\\" + today_ + "\\{}时cnki.json".format(hour)
    with open(path, "r", encoding="utf-8", errors="ignore") as f:
        records = json.loads(f.read())
    # Keep only the values of each record, one list per article
    return [list(i.values()) for i in records]


def Visualization(text):
    """Render the text onto a blank canvas and display it."""
    # Create a blank beige canvas and save it to disk
    image = Image.new('RGB', (1000, 9000), (245, 245, 220))
    image.save("./smile.jpg")

    im = Image.open("./smile.jpg")  # reopen the saved file
    print(im.format, im.size, im.mode)
    # Pretty-print the data, then strip list brackets, quotes and escaped newlines
    text = pprint.pformat(text).replace("[", "").replace("]", "").replace("'", "").replace("\\\\n", "")
    draw = ImageDraw.Draw(im)  # drawing handle for the image
    # Replace this with the path to a font on your own machine
    ttf = r"E:\360Downloads\字体\QingNiaoHuaGuangJianShuTong\QingNiaoHuaGuangJianShuTong-2.ttf"
    font = ImageFont.truetype(ttf, size=30)
    draw.text((20, 100), text, fill=(34, 139, 34), font=font)
    im.show()
    # im.save('mysmile.jpg')


def schedule_task():
    """Read the desktop to-do file and render whichever hotspot each line asks for."""
    with open(r"C:\Users\lenovo\Desktop\待做事项.txt", "r") as f:
        while True:
            text = f.readline()
            if "学术热点" in text:
                today = datetime.today()
                today_ = today.strftime('%m-%d')
                hour = today.strftime("%H")
                path = r"J:\PyCharm项目\项目\各种爬虫\爬虫实例\中国知网文献检索分析" + "\\" + today_ + "\\{}时cnki.json".format(hour)
                print(path)
                if os.path.exists(path):
                    # Reuse this hour's results instead of crawling again
                    print("File already exists...")
                    # Use a separate name so the to-do file handle `f` is not shadowed
                    with open(path, "r", encoding="utf-8", errors="ignore") as cached:
                        text = json.loads(cached.read())
                        text = [list(i.values()) for i in text]
                        Visualization(text)
                else:
                    text = api_cnki(text)
                    Visualization(text)
                break
            elif "实时新闻" in text:
                # Raw string so the backslashes in the Windows path are not treated as escapes
                os.system(r"python J:\PyCharm项目\常用设置\小工具\爬虫\weibo_Hot_Search-master\weibo_Hot_Search.py")
                text = api_weibo()
                Visualization(re.findall("###(.*?)微博", str(text)))
                os.remove("./smile.jpg")
            else:
                # Stop at the first line that names neither task (including end of file)
                break


schedule_task()
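
The loop above stops at the first line that names neither task, so a to-do file that begins with an unrelated item triggers nothing. As a possible alternative, here is a minimal sketch that scans every line and returns the tasks it finds. `TASK_KEYWORDS` and `parse_todo` are hypothetical names; only the two keywords come from the script.

```python
# Hypothetical keyword-to-task table; the two keywords come from the script above
TASK_KEYWORDS = {"学术热点": "cnki", "实时新闻": "weibo"}


def parse_todo(lines):
    """Return (task, argument) pairs for every line that names a known task."""
    tasks = []
    for raw in lines:
        line = raw.strip()
        for keyword, task in TASK_KEYWORDS.items():
            if keyword in line:
                # Text after the full-width colon, if any, is the task argument
                _, _, argument = line.partition("：")
                tasks.append((task, argument.strip()))
                break
    return tasks
```

The caller could then loop over the returned pairs and invoke `api_cnki` or `api_weibo` for each one, which keeps the file parsing separate from the crawling and rendering steps.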