Python Web Crawling: Crawling Dynamic Web Pages and Using the selenium Module

Latest recommended article published on 2023-06-12 14:52:08


License: CC 4.0 BY-SA

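This article shows how to use Python's requests module to crawl data from dynamic web pages, and walks through using selenium to crawl the comments on a Toutiao (今日头条) news article, including how to set up the Google Chrome driver. It also includes a combined example that uses selenium to scrape listing information from Airbnb.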


Crawling dynamic page data with the requests module
Crawling Toutiao news comments with selenium
Combined example

Crawling dynamic page data with the requests module
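The request URL used in this section's script carries everything in its query string: `group_id`, `item_id`, `offset`, and `count`. As a rough sketch of how such a URL could be assembled for more than one page of comments, the helper below builds it from those four parameters. The function names `build_comment_url` and `page_urls` are hypothetical, and the idea that `offset` and `count` act as paging controls is an assumption read off the parameter names, not something the API is confirmed to guarantee.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL used in this section's script.
COMMENT_API = "https://www.toutiao.com/api/comment/list/"


def build_comment_url(article_id, offset=0, count=15):
    """Build the comment-list URL for one page of results.

    Hypothetical helper: the parameter names mirror the query string in the
    article's URL; treating offset/count as paging controls is an assumption.
    """
    params = {
        "group_id": article_id,
        "item_id": article_id,
        "offset": offset,
        "count": count,
    }
    # urlencode keeps the dict's insertion order, so the query string
    # comes out in the same order as in the original URL.
    return f"{COMMENT_API}?{urlencode(params)}"


def page_urls(article_id, pages, count=15):
    """Return URLs for the first `pages` pages, advancing offset by `count`."""
    return [build_comment_url(article_id, page * count, count) for page in range(pages)]


if __name__ == "__main__":
    for url in page_urls("6749065854995939854", pages=3):
        print(url)
```

With the default arguments, `build_comment_url("6749065854995939854")` reproduces the URL hard-coded in the script, so each generated URL can be passed to `requests.get` in the same way.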

"""
Crawl dynamic page data with the requests module.
Toutiao (今日头条): the comments on a single news article.
"""
import requests

# Comment-list API URL for the article
url = "https://www.toutiao.com/api/comment/list/?group_id=6749065854995939854&item_id=6749065854995939854&offset=0&count=15"

# Request headers: send a browser User-Agent with the request
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3314.0 Safari/537.36 SE 2.X MetaSr 1.0",
}

# Suppress the many warnings that verify=False would otherwise write to the log
requests.packages.urllib3.disable_warnings()

# Send the request and get the response
response = requests.get(url, headers=headers, verify=False)
if response.status_code == 200:
    # print(response.text)
    # print(response.json())
    with open("今日评论.txt", "w", encoding="UTF8")