Web Scraping | Fetching and Saving NetEase News Hot-Topic Data (Scraping News from a News Site and Saving It Locally) - CSDN Blog

url = 'https://c.m.163.com/news/hot/newsList'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 '
                  'Safari/537.36 '
}

3. Send an HTTP request to fetch the page content

response = requests.get(url, headers=headers)
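The call above has no timeout and does not check the status code, so a stalled connection or an error page would go unnoticed. Below is a minimal sketch of a more defensive fetch. The helper name `fetch_html` and the 10-second default are assumptions made for illustration, not part of the original script. The `get` argument is expected to be `requests.get` (or any callable with the same signature), which keeps the helper easy to exercise without a network connection.

```python
def fetch_html(get, url, headers, timeout=10.0):
    """Fetch `url` with `get` (e.g. requests.get) and return the decoded body.

    `fetch_html` is a hypothetical helper, not part of the original script.
    """
    # A timeout keeps the script from hanging forever on a stalled connection.
    response = get(url, headers=headers, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    # requests falls back to ISO-8859-1 when the server declares no charset;
    # in that case use the encoding detected from the body instead.
    if response.encoding is None or response.encoding.lower() == 'iso-8859-1':
        response.encoding = response.apparent_encoding
    return response.text
```

With `requests` imported, it would be called as `html = fetch_html(requests.get, url, headers)`, and `html` could then be passed to `etree.HTML` in the next step.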

4. Parse the page content

data = etree.HTML(response.text)

5. Extract titles and links

title_list = data.xpath('//div[@class="title"]/a/text()')
href_list = data.xpath('//div[@class="title"]/a/@href')
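The two XPath queries return plain lists that are later paired by position. The text may carry stray whitespace and the `href` values may be relative, so a small clean-up pass can help before saving. The sketch below is one way to do that. The helper name `clean_pairs` is hypothetical, and whether this page actually serves relative links is an assumption.

```python
from urllib.parse import urljoin


def clean_pairs(titles, hrefs, base_url):
    """Pair titles with links, trimming whitespace and resolving relative URLs.

    `clean_pairs` is a hypothetical helper, not part of the original script.
    """
    pairs = []
    # zip() pairs items by position and stops at the shorter list.
    for title, href in zip(titles, hrefs):
        title = title.strip()
        href = href.strip()
        if not title or not href:
            continue  # skip entries that are missing either field
        # urljoin leaves absolute URLs unchanged and resolves relative ones.
        pairs.append((title, urljoin(base_url, href)))
    return pairs
```

For example, `clean_pairs(title_list, href_list, url)` would return a list of `(title, absolute_link)` tuples ready to be written out.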

6. Write the extracted data to a CSV file

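import csv

# newline='' lets the csv module control line endings; 'a' appends to any existing file
with open('网易.csv', 'a', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    for title, href in zip(title_list, href_list):
        print("Title:", title)  # title
        print("Href:", href)  # hyperlink
        writer.writerow([title, href])  # quotes titles that contain commas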
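Because the file is opened in append mode, running the script twice stores the same headlines twice. A minimal sketch of one way to avoid that is to read the links already in the file and skip them when appending. The helper name `append_new_rows` is hypothetical, and treating the last column as the link is an assumption about the file layout.

```python
import csv
import os


def append_new_rows(path, rows):
    """Append (title, href) rows to `path`, skipping links already stored.

    `append_new_rows` is a hypothetical helper, not part of the original script.
    Returns the number of rows actually written.
    """
    seen = set()
    if os.path.exists(path):
        with open(path, newline='', encoding='utf-8') as f:
            for row in csv.reader(f):
                if row:
                    seen.add(row[-1])  # assumption: the link is the last column
    added = 0
    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        for title, href in rows:
            if href in seen:
                continue  # already saved on an earlier run
            writer.writerow([title, href])
            seen.add(href)
            added += 1
    return added
```

It could be called as `append_new_rows('网易.csv', zip(title_list, href_list))`.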

V. Results

VI. Complete code

#!/usr/bin/env python
# -*- encoding: utf-8 -*-
import csv

import requests
from lxml import etree

url = 'https://c.m.163.com/news/hot/newsList'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 '
                  'Safari/537.36 '
}
# Fetch the hot-news page, sending a browser-like user-agent
response = requests.get(url, headers=headers)

# Parse the HTML, then pull out each headline's text and link
data = etree.HTML(response.text)
title_list = data.xpath('//div[@class="title"]/a/text()')
href_list = data.xpath('//div[@class="title"]/a/@href')

# Save the data as UTF-8; csv.writer quotes titles that contain commas
with open('网易.csv', 'a', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    for title, href in zip(title_list, href_list):
        print("Title:", title)  # title
        print("Href:", href)  # hyperlink
        writer.writerow([title, href])