1. Locate the hot-board data to scrape
Find the URL that serves the hot-board data; it returns JSON:
https://www.toutiao.com/hot-event/hot-board/?origin=toutiao_pc
2. Extract the data with Python
import requests
import pandas as pd
head = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36',
}
url = 'https://www.toutiao.com/hot-event/hot-board/?origin=toutiao_pc'
response = requests.get(url, headers=head)
print(response.status_code)
json_data = response.json()
# print(json_data)
# Empty lists used to store the extracted data
title_list = []
url_list = []
hot_list = []
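# The full URLs in the response are longer than necessary;
# the domain plus /trending/ and the ClusterId is enough to reach each topic
for data in json_data['data']:
    title = data['Title']
    cluster_id = data['ClusterId']  # not named `id`, which would shadow the built-in
    hot = data['HotValue']
    title_list.append(title)
    url_list.append(f"https://www.toutiao.com/trending/{cluster_id}")
    hot_list.append(hot)
# print(f"Titles: {title_list}\nURLs: {url_list}\nHot values: {hot_list}")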
# Assemble the lists into a DataFrame
ID = range(1, len(title_list) + 1)
df = pd.DataFrame(
    {
        'ID': ID,
        '热榜标题': title_list,  # hot-board title
        '热度值': hot_list,      # hot value
        '热榜链接': url_list,    # hot-board link
    }
)
# Path where the CSV file is saved
output_path = r'C:\Users\MAG\Desktop\python之路\python基础使用\toutiao.csv'
try:
    df.to_csv(output_path, index=False)
    print("CSV file saved successfully.")
except Exception as e:
    print("An error occurred while saving the CSV file:")
    print(e)
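The loop above assumes every entry in the response carries Title, ClusterId and HotValue, so a single incomplete entry would raise a KeyError. A minimal sketch of a more tolerant extraction step follows. The function name `extract_rows` and the sample payload are hypothetical and only mirror the three fields the script reads; they are not taken from a real response.

```python
def extract_rows(payload):
    """Turn the hot-board JSON payload into a list of row dicts.

    Entries missing any of the three fields are skipped instead of
    raising KeyError.
    """
    rows = []
    for entry in payload.get('data', []):
        title = entry.get('Title')
        cluster_id = entry.get('ClusterId')
        hot = entry.get('HotValue')
        if title is None or cluster_id is None or hot is None:
            continue  # incomplete entry: skip it
        rows.append({
            'ID': len(rows) + 1,  # running number over the kept rows
            '热榜标题': title,
            '热度值': hot,
            '热榜链接': f"https://www.toutiao.com/trending/{cluster_id}",
        })
    return rows


# Hypothetical sample payload (made-up values, not real hot-board data).
sample = {
    'data': [
        {'Title': 'Example A', 'ClusterId': 111, 'HotValue': '5000'},
        {'Title': 'Example B', 'HotValue': '4000'},  # no ClusterId: skipped
        {'Title': 'Example C', 'ClusterId': 333, 'HotValue': '3000'},
    ]
}
rows = extract_rows(sample)
print(rows)
```

Because each row is a dict with the same keys, `pd.DataFrame(rows)` should produce the same four columns as the three parallel lists, without having to keep the lists in step by hand.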
3. Check the contents written to the CSV file
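One way to check the result without opening a spreadsheet program is to read the file back and look at the first few rows. The sketch below uses the standard-library csv module rather than pandas; `preview_csv` is a hypothetical helper name, and the demo rows are made up for illustration.

```python
import csv
import os
import tempfile
from itertools import islice


def preview_csv(path, limit=5):
    """Return the header row and up to `limit` data rows from a CSV file."""
    # utf-8-sig reads plain UTF-8 as well; it only strips a BOM if one is present.
    with open(path, newline='', encoding='utf-8-sig') as f:
        reader = csv.reader(f)
        header = next(reader, [])
        rows = list(islice(reader, limit))
    return header, rows


# Demo on a throwaway file with made-up rows (not real hot-board data).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo.csv')
    with open(demo_path, 'w', newline='', encoding='utf-8') as f:
        csv.writer(f).writerows([
            ['ID', '热榜标题', '热度值', '热榜链接'],
            [1, 'Example A', 5000, 'https://www.toutiao.com/trending/111'],
        ])
    header, rows = preview_csv(demo_path)

print(header)
print(rows)
```

Applied to the script's own output, the call would be `preview_csv(output_path)`. Note that the csv module returns every cell as a string, so numeric columns come back as text.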
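Summary:
Use requests to fetch the data;
create lists and store the extracted fields in them;
use pandas to assemble the lists into a DataFrame;
specify the output file and save the data.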
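The first step, fetching the data, prints the status code but then parses the body regardless of the result. A slightly more defensive version is sketched below, assuming it is acceptable to fail fast on an HTTP error. The function name `fetch_hot_board`, the 10-second timeout and the injectable `get` parameter are all assumptions added for illustration; the URL and User-Agent are the ones used in the script.

```python
import requests

HOT_BOARD_URL = 'https://www.toutiao.com/hot-event/hot-board/?origin=toutiao_pc'
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36',
}


def fetch_hot_board(url=HOT_BOARD_URL, get=requests.get, timeout=10):
    """Fetch the hot-board JSON, raising on HTTP errors.

    `get` defaults to requests.get; it is a parameter only so the function
    can be exercised with a stand-in that makes no network call.
    """
    # Without a timeout, requests may wait indefinitely for a response.
    response = get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.json()
```

In the script, `json_data = fetch_hot_board()` would replace the separate `requests.get`, `print(response.status_code)` and `response.json()` lines.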