Slacking-Off Tool: A Terminal Hot-Search Billboard, a Must-Have for Slacking Off at Work


饕餮海


Article link: https://blog.csdn.net/Magic_Chen2012/article/details/137122906



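This article introduces a hot-search billboard I built in Python. It aggregates the hot-search lists of Baidu, Toutiao, Weibo, Zhihu, and CSDN, and it runs in a terminal such as cmder, PowerShell, or Git Bash, which makes it a handy tool for slacking off at work.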

1. The Tool in Action

1.1 Project Code

The project code is hosted on Gitee at https://gitee.com/shawn_chen_rtz/hot_billboard.git. Stars are welcome.

Code structure:

app.py is the entry point. Run python app.py and follow the prompts.

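1.2 Execution Result

After launch, the tool prints a menu.

Enter a number to open the corresponding site's hot-search list, or enter q or Q to quit.

For example, entering 3 opens the Weibo hot-search list.

Once a list has been printed, enter an item's number to get its link.

The CSDN hot list is browsed the same way.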
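The input rules described above (a number selects an item, q or Q quits) can be sketched as one small function. This is only an illustration: `parse_choice` is a hypothetical helper that is not part of the project, and the r/R refresh command is taken from the per-site functions shown later in the article. It classifies the input with plain string checks and `int()`, so no user text is ever evaluated as code.

```python
def parse_choice(user_input, item_count):
    """Classify one line of menu input as quit, refresh, open, or invalid."""
    text = user_input.strip()
    if text in ("q", "Q"):
        return ("quit", None)
    if text in ("r", "R"):
        return ("refresh", None)
    # Accept only ASCII digits that name an existing item (numbers are 1-based).
    if text.isascii() and text.isdigit() and 1 <= int(text) <= item_count:
        return ("open", int(text) - 1)  # convert to a 0-based list index
    return ("invalid", None)


print(parse_choice("3", 50))
print(parse_choice("q", 50))
print(parse_choice("abc", 50))
```

A caller would branch on the first element of the returned tuple and use the second as the index into its list of entries.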

1.3 The app.py Entry Script

The app.py program:


# -*- coding:utf-8 -*-
import time

from baidu_hot import get_baidu_hot
from toutiao_hot import get_toutiao_hot
from weibo_hot import get_weibo_hot
from zhihu_hot import get_zhihu_hot
from csdn_hot import get_csdn_hot

print("Welcome back! Enter a number to browse a hot-search list.")
on = True
while on:
    user_input = input("1-baidu; 2-toutiao; 3-weibo; 4-zhihu; 5-CSDN; q/Q-quit; your choice: ")
    if user_input == '1':
        get_baidu_hot()
    elif user_input == '2':
        get_toutiao_hot()
    elif user_input == '3':
        get_weibo_hot()
    elif user_input == '4':
        get_zhihu_hot()
    elif user_input == '5':
        get_csdn_hot()
    elif user_input in ('q', 'Q'):
        # Leave the loop and end the program
        on = False
    else:
        print("Invalid input. The menu refreshes in 3s; please choose again.")
        time.sleep(3)
print("Exited successfully. See you next time!")

The script is a single while loop: its body checks what the user typed and calls the matching function.
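The same dispatch can also be written as a lookup table instead of an if/elif chain. The sketch below is an alternative, not the project's code: `dispatch` and the stand-in handlers are hypothetical names used only so the example runs without the real scraping modules.

```python
def dispatch(user_input, handlers):
    """Call the handler registered for user_input; return False if none matches."""
    handler = handlers.get(user_input)
    if handler is None:
        return False
    handler()
    return True


# Stand-in handlers. In app.py the values would be get_baidu_hot,
# get_toutiao_hot, get_weibo_hot, get_zhihu_hot and get_csdn_hot.
calls = []
demo_handlers = {
    "1": lambda: calls.append("baidu"),
    "3": lambda: calls.append("weibo"),
}

dispatch("3", demo_handlers)  # runs the "weibo" stand-in
dispatch("9", demo_handlers)  # unknown key, nothing runs
print(calls)
```

With a table like this, supporting another site means adding one dictionary entry rather than another elif branch.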

2. Baidu Hot Search

2.1 Modules

The Baidu hot-search function imports the modules requests, BeautifulSoup, re, and time.

2.2 Endpoint

Baidu hot-search endpoint:

https://top.baidu.com/board?tab=realtime

2.3 Implementation

The code:

import requests
from bs4 import BeautifulSoup
import re  # imported in the original, although nothing below uses it
import time
def get_baidu_hot():
    while True:
        baidu_top = "https://top.baidu.com/board?tab=realtime"
        resp = requests.get(baidu_top, timeout=10)
        resp.encoding = 'utf-8'
        html = resp.text
        soup = BeautifulSoup(html, 'html.parser')
        # Each hot-search entry is an element with this class
        news = soup.find_all(class_="content_1YWBm")
        # Reversed, so the first entry on the page is printed last
        news.reverse()
        i = 0
        news_ls = []
        for new in news:
            i = i + 1
            url = new.find('a').attrs['href']
            text = new.find(class_="c-single-text-ellipsis").text
            news_ls.append({"text": text.strip(), "url": url})
            # ANSI escape codes: bold white number, bold cyan title
            print(('\033[1;37m' + str(i) + '\033[0m').center(50, "*"))
            print("\033[1;36m" + text.strip() + "\033[0m")
        user_input = input("Enter an item number to get its link, q/Q to quit, r/R to refresh: ")
        if user_input in ('q', 'Q'):
            break
        elif user_input in ('r', 'R'):
            continue
        elif user_input in [str(n) for n in range(1, len(news_ls) + 1)]:
            # int() instead of eval(): user input is never executed as code
            news_index = int(user_input) - 1
            print(news_ls[news_index].get('url'))
            print("\033[1;33m" + "Hold Ctrl and click the link to open it" + "\033[0m")
            print('\033[5;31m' + 'The list refreshes automatically in 10s' + '\033[0m')
            time.sleep(10)
            continue
        else:
            print("Invalid User Input.")
            print('\033[5;31m' + "The list refreshes automatically in 3s" + '\033[0m')
            time.sleep(3)
            continue
    print("Over, exited Baidu hot search!")

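Note that how BeautifulSoup is used depends on what the endpoint actually returns: the class names in the code (content_1YWBm and c-single-text-ellipsis) come from the page's HTML, so check them against the real response before relying on them.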
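One way to check which class names a returned page really uses is to count them. The sketch below does this with the standard library's html.parser rather than BeautifulSoup, purely so it runs without extra packages; `ClassCounter`, `count_classes`, and the sample HTML are made up for illustration and only imitate the shape of a hot-list page.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count how often each CSS class name appears in a page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        for name, value in attrs:
            if name == "class" and value:
                self.counts.update(value.split())


def count_classes(html):
    parser = ClassCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Made-up snippet, not real Baidu markup.
sample_html = """
<div class="content_1YWBm"><a href="/a"><div class="c-single-text-ellipsis">Topic A</div></a></div>
<div class="content_1YWBm"><a href="/b"><div class="c-single-text-ellipsis">Topic B</div></a></div>
<div class="footer">About</div>
"""

for class_name, count in count_classes(sample_html).most_common():
    print(class_name, count)
```

Class names that repeat once per list entry are good candidates for the class_ argument passed to BeautifulSoup.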

3. Toutiao Hot Search

3.1 Modules

The Toutiao hot-search function imports the modules requests and time.

3.2 Endpoint

Toutiao hot-search endpoint:

https://www.toutiao.com/hot-event/hot-board/?origin=toutiao_pc

3.3 Implementation

The code:


import requests
import time
def get_toutiao_hot():
    while True:
        url = "https://www.toutiao.com/hot-event/hot-board/?origin=toutiao_pc"
        resp = requests.get(url, timeout=10)
        resp.encoding = 'utf-8'
        # The endpoint returns JSON, so no HTML parsing is needed
        data = resp.json()
        news_ls = []
        i = 0
        news = data.get('data')
        # Reversed, so the first entry in the response is printed last
        news.reverse()
        for new in news:
            i += 1
            print(('\033[1;37m' + str(i) + '\033[0m').center(50, '*'))
            news_ls.append({'title': new.get('Title'), 'url': new.get('Url')})
            print('\033[1;36m' + new.get('Title') + '\033[0m')
        # The pinned entry arrives in a separate field; it is appended last
        fixed_top_data = data.get('fixed_top_data')
        fixed_top_data = fixed_top_data[0]
        news_ls.append({'title': fixed_top_data.get('Title'), 'url': fixed_top_data.get('Url')})
        print(('\033[1;37m' + str(i + 1) + '\033[0m').center(50, '*'))
        print('\033[1;36m' + news_ls[-1].get('title') + '\033[0m')
        user_input = input("Enter an item number to get its link, q/Q to quit, r/R to refresh: ")
        if user_input in ('q', 'Q'):
            break
        elif user_input in ('r', 'R'):
            continue
        elif user_input in [str(n) for n in range(1, len(news_ls) + 1)]:
            # int() instead of eval(): user input is never executed as code
            news_index = int(user_input) - 1
            print(news_ls[news_index].get('url'))
            print("\033[1;33m" + "Hold Ctrl and click the link to open it" + "\033[0m")
            print('\033[5;31m' + 'The list refreshes automatically in 10s' + '\033[0m')
            time.sleep(10)
            continue
        else:
            print("Invalid User Input.")
            print('\033[5;31m' + "The list refreshes automatically in 3s" + '\033[0m')
            time.sleep(3)
            continue
    print("Over, exited Toutiao hot search!")

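Unlike the Baidu endpoint, this one returns JSON rather than HTML source, so BeautifulSoup and re are not needed to parse and match page elements. Handling the response is comparatively simple.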
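To make the JSON handling concrete, here is a minimal sketch that runs on an inline sample instead of a live request. The key names data, fixed_top_data, Title, and Url are the ones used in the code above; `extract_toutiao_items` and the sample payload are hypothetical, and the real response may contain more fields or a different shape.

```python
import json


def extract_toutiao_items(payload):
    """Collect title/url pairs: regular entries reversed, pinned entry last."""
    entries = payload.get("data") or []
    items = [{"title": e.get("Title"), "url": e.get("Url")} for e in reversed(entries)]
    pinned = payload.get("fixed_top_data") or []
    if pinned:  # tolerate a response without a pinned entry
        items.append({"title": pinned[0].get("Title"), "url": pinned[0].get("Url")})
    return items


# Made-up payload that only mirrors the keys read by the code above.
sample_payload = json.loads("""
{
  "data": [
    {"Title": "Topic A", "Url": "https://example.com/a"},
    {"Title": "Topic B", "Url": "https://example.com/b"}
  ],
  "fixed_top_data": [
    {"Title": "Pinned topic", "Url": "https://example.com/pinned"}
  ]
}
""")

for number, item in enumerate(extract_toutiao_items(sample_payload), start=1):
    print(number, item["title"], item["url"])
```

Keeping the extraction in a pure function like this also makes it possible to test the parsing without network access.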

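4. Weibo Hot Search

4.1 Modules

The Weibo hot-search function imports the modules requests, time, and BeautifulSoup.

4.2 Endpoint

Weibo hot-search endpoint:

https://s.weibo.com/top/summary?cate=realtimehot

Note that this endpoint must be requested with a header that carries the corresponding cookie; otherwise the request fails.

The cookie in this section's code is a made-up placeholder. To find a real one, open https://s.weibo.com/top/summary?cate=realtimehot in a browser, press F12, and locate that request in the developer tools.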
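Because the cookie is personal and expires, one option is to keep it out of the source file. The sketch below is an assumption rather than the project's approach: `build_weibo_headers` and the WEIBO_COOKIE environment variable are hypothetical names, and the function only builds the header dictionary that would be passed to requests.get.

```python
import os


def build_weibo_headers(cookie=None, env_var="WEIBO_COOKIE"):
    """Return request headers carrying a cookie copied from the browser."""
    # Prefer an explicit argument; otherwise fall back to the environment.
    value = cookie if cookie is not None else os.environ.get(env_var, "")
    value = value.strip()
    if not value:
        raise ValueError(f"no cookie given and {env_var} is not set")
    return {"Cookie": value}


# The value here is a dummy; a real one comes from the browser's request headers.
print(build_weibo_headers("SUB=dummy-value;"))
```

The result would be used as requests.get(url, headers=build_weibo_headers()), with the cookie exported in the shell beforehand.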

4.3 Implementation

The code:


import requests
import time
from bs4 import BeautifulSoup
def get_weibo_hot():
    while True:
        url = "https://s.weibo.com/top/summary?cate=realtimehot"
        # Placeholder cookie; replace it with one copied from your own browser
        headers = {"Cookie":"SUB=_2AxxxxxxxxxNxqwJxxx3dtWXlM5SjftExkMQK6NASTHqZWXWFEB;"}
        resp = requests.get(url=url, headers=headers, timeout=10)
        resp.encoding = 'utf-8'
        html = resp.text
        soup = BeautifulSoup(html, 'html.parser')
        # Each hot-search entry is a table cell with this class
        news = soup.find_all(class_='td-02')
        # Reversed, so the first entry on the page is printed last
        news.reverse()
        base_url = "https://s.weibo.com"
        news_ls = []
        i = 0
        for new in news:
            i = i + 1
            # The page gives relative links, so prepend the site root
            item_url = base_url + new.find('a').attrs['href']
            title = new.find('a').text
            print(('\033[1;37m' + str(i) + '\033[0m').center(50, '*'))
            print('\033[1;36m' + title + '\033[0m')
            news_ls.append({"title": title, "url": item_url})

        news_length = len(news_ls)
        user_input = input("Enter an item number to get its link, q/Q to quit, r/R to refresh: ")
        if user_input in ('q', 'Q'):
            break
        elif user_input in ('r', 'R'):
            continue
        elif user_input in [str(n) for n in range(1, news_length + 1)]:
            # int() instead of eval(): user input is never executed as code
            news_index = int(user_input) - 1
            print(news_ls[news_index].get('url'))
            print("\033[1;33m" + "Hold Ctrl and click the link to open it" + "\033[0m")
            print('\033[5;31m' + 'The list refreshes automatically in 10s' + '\033[0m')
            time.sleep(10)
            continue
        else:
            print("Invalid User Input.")
            print('\033[5;31m' + "The list refreshes automatically in 3s" + '\033[0m')
            time.sleep(3)
            continue
    print("Over, exited Weibo hot search!")

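As with Baidu hot search, the response is HTML and has to be parsed with BeautifulSoup to locate the hot-search entries. BeautifulSoup does much of the work when processing scraped pages and is worth learning well.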

5. Zhihu Hot Search

5.1 Modules

The Zhihu hot-search function imports the modules requests, time, BeautifulSoup, and json.

5.2 Endpoint

Zhihu hot-search endpoint:

https://www.zhihu.com/billboard

5.3 Implementation

The code:


import requests
import time
from bs4 import BeautifulSoup
import json
def get_zhihu_hot():
    while True:
        url = "https://www.zhihu.com/billboard"
        resp = requests.get(url, timeout=10)
        resp.encoding = 'utf-8'
        html = resp.text
        soup = BeautifulSoup(html, 'html.parser')
        # Titles come from the rendered list items
        news = soup.find_all(class_='HotList-itemTitle')
        title_ls = [new.text for new in news]
        # Links come from the JSON embedded in the js-initialData script tag
        js_text_dict = json.loads(soup.find('script', {'id': 'js-initialData'}).get_text())
        hot_list = js_text_dict['initialState']['topstory']['hotList']
        url_ls = [new['target']['link']['url'] for new in hot_list]

        # Pair titles with links by position; zip stops at the shorter list
        news_ls = [{'title': title, 'url': link} for title, link in zip(title_ls, url_ls)]
        # Reversed, so the first entry on the page is printed last
        news_ls.reverse()
        i = 0
        for new in news_ls:
            i += 1
            print(('\033[1;37m' + str(i) + '\033[0m').center(50, "*"))
            print('\033[1;36m' + new.get('title') + '\033[0m')

        news_length = len(news_ls)
        user_input = input("Enter an item number to get its link, q/Q to quit, r/R to refresh: ")
        if user_input in ('q', 'Q'):
            break
        elif user_input in ('r', 'R'):
            continue
        elif user_input in [str(n) for n in range(1, news_length + 1)]:
            # int() instead of eval(): user input is never executed as code
            news_index = int(user_input) - 1
            print(news_ls[news_index].get('url'))
            print("\033[1;33m" + "Hold Ctrl and click the link to open it" + "\033[0m")
            print('\033[5;31m' + 'The list refreshes automatically in 10s' + '\033[0m')
            time.sleep(10)
            continue
        else:
            print("Invalid User Input.")
            print('\033[5;31m' + "The list refreshes automatically in 3s" + '\033[0m')
            time.sleep(3)
            continue
    print("Over, exited Zhihu hot search!")

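6. CSDN Hot Search

6.1 Modules

The CSDN hot-search function imports the modules requests and time.

6.2 Endpoint

CSDN hot-search endpoint:

https://blog.csdn.net/phoenix/web/blog/hot-rank?page=0&pageSize=50

https://blog.csdn.net/phoenix/web/blog/hot-rank?page=1&pageSize=50

Note: this endpoint returns a lot of data and is paginated through the page and pageSize parameters; set page to the page index you want, for example 0 or 1. The endpoint also requires a request header (the code sets User-Agent); without it, the correct data is not returned.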
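The two URLs above differ only in the page parameter, so they can be generated rather than typed out. This is a small sketch using urllib.parse.urlencode from the standard library; `hot_rank_urls` and `HOT_RANK_URL` are hypothetical names, and the default of two pages of 50 entries simply mirrors the URLs listed above.

```python
from urllib.parse import urlencode

HOT_RANK_URL = "https://blog.csdn.net/phoenix/web/blog/hot-rank"


def hot_rank_urls(pages=2, page_size=50):
    """Build one URL per page, with page indexes starting at 0."""
    urls = []
    for page in range(pages):
        query = urlencode({"page": page, "pageSize": page_size})
        urls.append(f"{HOT_RANK_URL}?{query}")
    return urls


for url in hot_rank_urls():
    print(url)
```

Each generated URL would then be requested with the User-Agent header set, and the entries from every page appended to a single list.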

6.3 Implementation

The code:


import requests
import time
def get_csdn_hot():
    while True:
        news_ls = []
        # Fetch two pages of 50 entries each
        for page in range(2):
            url = "https://blog.csdn.net/phoenix/web/blog/hot-rank?page=" + str(page) + "&pageSize=50"
            # CSDN validates requests: User-Agent must be set to get content back
            headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"}
            resp = requests.get(url, headers=headers, timeout=10)
            data = resp.json()
            news = data['data']
            for new in news:
                news_ls.append({"title": new.get('articleTitle'), "url": new.get('articleDetailUrl')})

        i = 0
        # Reversed, so the first entry in the response is printed last
        news_ls.reverse()
        for new in news_ls:
            i += 1
            print(("\033[1;37m" + str(i) + "\033[0m").center(50, "*"))
            print("\033[1;36m" + new.get('title') + "\033[0m")

        news_length = len(news_ls)
        user_input = input("Enter an item number to get its link, q/Q to quit, r/R to refresh: ")
        if user_input in ('q', 'Q'):
            break
        elif user_input in ('r', 'R'):
            continue
        elif user_input in [str(n) for n in range(1, news_length + 1)]:
            # int() instead of eval(): user input is never executed as code
            news_index = int(user_input) - 1
            print(news_ls[news_index].get('url'))
            print("\033[1;33m" + "Hold Ctrl and click the link to open it" + "\033[0m")
            print('\033[5;31m' + 'The list refreshes automatically in 10s' + '\033[0m')
            time.sleep(10)
            continue
        else:
            print("Invalid User Input.")
            print('\033[5;31m' + "The list refreshes automatically in 3s" + '\033[0m')
            time.sleep(3)
            continue
    print("Over, exited CSDN hot search!")
