requests Case Study: Scraping Tencent News Data

Original

Last modified 2024-10-05 18:08:27


License: CC 4.0 BY-SA

Tags:

#crawler #python #data-visualization #freelance

First published 2024-10-05 14:33:42

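Requirements:

1. Use requests to scrape data (titles and their corresponding links) from the i.news.qq.com site

2. Implement paginated scraping

3. Save the scraped data to an Excel file

4. Use jsonpath to parse the fetched data

5. Use the openpyxl library to handle the Excel file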
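
Requirements 3 and 5 can be sketched roughly as follows. This is only a minimal illustration, not necessarily how the rest of this article does it: the function name `save_rows`, the sheet title, the header row, and the output path are all assumptions chosen for the example.

```python
from openpyxl import Workbook, load_workbook


def save_rows(rows, path):
    """Write (title, url) pairs to a new .xlsx file (hypothetical helper)."""
    wb = Workbook()
    ws = wb.active
    ws.title = "news"              # assumed sheet name
    ws.append(["title", "url"])    # assumed header row
    for title, url in rows:
        ws.append([title, url])    # one scraped item per row
    wb.save(path)
```

Calling `save_rows([("Some headline", "https://example.com/a")], "news.xlsx")` would produce a workbook with one header row and one data row.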

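Notes:

1. If you see the following error:

AttributeError: module 'numpy' has no attribute 'short'

and your code does not use numpy directly, upgrade openpyxl:

pip install --upgrade openpyxl

2. The feed is updated in real time, so although it has 161 pages, the last page is usually empty. An empty page breaks the scraping code, so add exception handling.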

In the page-fetching step:

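import requests

# The original snippet is a function body (it uses `return`);
# the name get_page and its parameters are assumed here.
def get_page(url, data):
    try:
        # Browser-like User-Agent header sent with every request
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36'
        }
        r = requests.get(url, headers=headers, params=data)
        if r.status_code == 200:
            return r.json()
        else:
            print(f"Request failed, status code: {r.status_code}")
            return None
    except Exception as e:
        # Covers network errors as well as a body that is not valid JSON
        print(f"Request error: {e}")
        return None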
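
Because the fetch code above returns None on failure, the page loop has to tolerate pages with no usable data, such as the usually empty last page. A minimal sketch of that idea follows. The names `crawl_all` and `fetch_page` are hypothetical, and the `"data"` key is an assumption for illustration, since the real response layout is not shown here.

```python
def crawl_all(fetch_page, total_pages):
    """Collect items from pages 1..total_pages, skipping empty or failed pages.

    fetch_page is any callable that takes a page number and returns the
    decoded JSON (a dict) or None on failure.
    """
    items = []
    for page in range(1, total_pages + 1):
        try:
            payload = fetch_page(page)
            # "data" is an assumed key; None or a missing key raises here
            items.extend(payload["data"])
        except (TypeError, KeyError) as e:
            print(f"Page {page}: no usable data ({e!r}), skipping")
            continue
    return items


def fake_fetch(page):
    """Stand-in fetcher for illustration: page 3 behaves like an empty last page."""
    return {"data": [f"item-{page}"]} if page < 3 else None


print(crawl_all(fake_fetch, 3))
```

Passing the fetcher in as a parameter keeps the loop testable without network access. In real use it would wrap the request code shown above.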

In the parsing step: