Scraping the Weibo Hot Search List


曾德天's blog


Article link: https://blog.csdn.net/tiantianhuanle/article/details/87166430


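The Sina Weibo hot search list is at http://s.weibo.com/top/summary and contains 50 entries in total, as shown in the figure.
[Image: Weibo hot search list]
Using the BeautifulSoup package, here is the code: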

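import requests
from bs4 import BeautifulSoup  # the 'lxml' parser used below requires lxml to be installed

# Browser-like User-Agent sent with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
# Query parameter selecting the real-time hot search list
params = {
    'cate': 'realtimehot'
}

# Keep the page source in its own variable; it stays empty if the request fails
page = ""
try:
    r = requests.get('http://s.weibo.com/top/summary', params=params, headers=headers, timeout=10)
    print(r.url)
    if r.status_code == 200:
        page = r.text
except requests.RequestException:
    page = ""

soup = BeautifulSoup(page, 'lxml')
table = soup.find(id='pl_top_realtimehot')
# class_="" selects the <tr> rows that have no class attribute
rows = table.find_all('tr', class_="") if table is not None else []

# The with block closes the output file even if parsing raises an error
with open("weibohotnews.txt", "w", encoding='utf-8') as f:
    # Skip the first matched row
    for item in rows[1:]:
        title_tag = item.find('a')
        num_tag = item.find('span')
        rank_tag = item.find('td', class_="td-01 ranktop")
        if title_tag is None or num_tag is None or rank_tag is None:
            continue  # skip rows that lack any of the three fields
        title = title_tag.get_text()
        num = num_tag.get_text()
        rank = rank_tag.get_text()
        print(title)
        print(num)
        print(rank)
        # One entry per line: rank, title, count, separated by tabs
        f.write(rank + '\t' + title + '\t' + num + '\n')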
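The script writes one entry per line to weibohotnews.txt, with rank, title, and count separated by tabs. As a follow-up, here is a minimal sketch of reading that file back into records. The names `HotEntry` and `load_hot_entries` are hypothetical and not part of the original post, and the sketch assumes no field contains a tab or a line break.

```python
import csv
from typing import List, NamedTuple


class HotEntry(NamedTuple):
    """Hypothetical record type for one line of weibohotnews.txt."""
    rank: str   # rank column, kept as text exactly as written
    title: str  # hot search title
    num: str    # count shown next to the title, kept as text


def load_hot_entries(path: str = "weibohotnews.txt") -> List[HotEntry]:
    """Read the tab-separated file produced by the scraper (hypothetical helper)."""
    entries = []
    with open(path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE: the scraper writes raw text, so quote characters are literal
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) != 3:
                continue  # skip blank or malformed lines
            entries.append(HotEntry(*(field.strip() for field in row)))
    return entries
```

Calling `load_hot_entries()` after a scraping run returns a list whose items can be accessed by name, for example `entry.rank`, `entry.title`, and `entry.num`. All three fields stay as strings because the page text is stored as-is.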
