Capturing a web page's AJAX network requests with Python 3.10 and Selenium

toooooop8

Last modified: 2024-08-29 16:18:01

Tags: selenium, ajax, testing tools

First published: 2024-08-29 16:17:04

Original link: https://blog.csdn.net/qq_38170796/article/details/135581910

The first step is to turn on the browser's logging. This is done by setting a capability, which is how certain browser features are defined.

import json
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_option = Options()

chrome_option.set_capability("goog:loggingPrefs", {"performance": "ALL"})


service = Service("./chromedriver-win64/chromedriver.exe")

driver = webdriver.Chrome(service=service, options=chrome_option)

The code above uses the Selenium 4.2 style, in which browser properties are configured through Options(). Setting 'goog:loggingPrefs': {'performance': 'ALL'} turns on the browser's performance logging.

Getting the logs
Next, let's try fetching the logs for a visit to Baidu.

driver.get("https://www.baidu.com")

performance_log = driver.get_log("performance")

After visiting Baidu, get_log("performance") returns the performance log, which is a list of dictionaries. Here I print one entry to show the format:

{
    'level': 'INFO', 
    'message': '{"message":{"method":"Network.requestWillBeSent","params":{"documentURL":"https://www.baidu.com/","frameId":"62715239374117F099DBA348C45736CD","hasUserGesture":false,"initiator":{"type":"other"},"loaderId":"C5F286A8A5744DEEB277D0718C4E34E8","redirectHasExtraInfo":false,"request":{"headers":{"Upgrade-Insecure-Requests":"1","User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36","sec-ch-ua":"\"Not/A)Brand\";v=\"99\", \"Google Chrome\";v=\"115\", \"Chromium\";v=\"115\"","sec-ch-ua-mobile":"?0","sec-ch-ua-platform":"\"Windows\""},"initialPriority":"VeryHigh","isSameSite":true,"method":"GET","mixedContentType":"none","referrerPolicy":"strict-origin-when-cross-origin","url":"https://www.baidu.com/"},"requestId":"C5F286A8A5744DEEB277D0718C4E34E8","timestamp":1508.943033,"type":"Document","wallTime":1692620664.219256}},"webview":"62715239374117F099DBA348C45736CD"}', 
    'timestamp': 1692620664216
}

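As you can see, the key information is all in message. Note that the value of message is itself a JSON string. Formatted, it looks like this: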

{
  "message":{
    "method":"Network.requestWillBeSent",
    "params":{
      "documentURL":"https://www.baidu.com/",
      "frameId":"62715239374117F099DBA348C45736CD",
      "hasUserGesture":false,
      "initiator":{"type":"other"},
      "loaderId":"C5F286A8A5744DEEB277D0718C4E34E8",
      "redirectHasExtraInfo":false,
      "request":{
        "headers":{
          "Upgrade-Insecure-Requests":"1",
          "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36",
          "sec-ch-ua":"\"Not/A)Brand\";v=\"99\", \"Google Chrome\";v=\"115\", \"Chromium\";v=\"115\"",
          "sec-ch-ua-mobile":"?0",
          "sec-ch-ua-platform":"\"Windows\""
        },
        "initialPriority":"VeryHigh",
        "isSameSite":true,
        "method":"GET",
        "mixedContentType":"none",
        "referrerPolicy":"strict-origin-when-cross-origin",
        "url":"https://www.baidu.com/"
      },
      "requestId":"C5F286A8A5744DEEB277D0718C4E34E8",
      "timestamp":1508.943033,
      "type":"Document",
      "wallTime":1692620664.219256
    }
  },
  "webview":"62715239374117F099DBA348C45736CD"
}

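This is an example of a request entry. Every entry whose method starts with Network describes network activity (requests being sent, responses being received, and so on).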
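
The unpacking described above can be sketched as a small helper. This is only a minimal sketch and not part of the original code: the name `network_events` is hypothetical, and it assumes every entry has the shape printed above (a dict whose "message" value is a JSON string). The sample entries are hand-written so the snippet runs without a browser.

```python
import json


def network_events(entries):
    """Yield (method, params) for every Network.* entry in a performance log."""
    for entry in entries:
        # Each entry's "message" is a JSON string wrapping the actual event.
        event = json.loads(entry["message"])["message"]
        method = event.get("method", "")
        if method.startswith("Network."):
            yield method, event.get("params", {})


# Hand-written sample entries in the same shape as the log entry shown above.
sample_log = [
    {"level": "INFO", "timestamp": 1, "message": json.dumps({
        "message": {"method": "Network.requestWillBeSent",
                    "params": {"requestId": "A1",
                               "request": {"url": "https://www.baidu.com/"}}},
        "webview": "X"})},
    {"level": "INFO", "timestamp": 2, "message": json.dumps({
        "message": {"method": "Page.frameNavigated", "params": {}},
        "webview": "X"})},
]

events = list(network_events(sample_log))
```

With a real driver, the same helper would be called as `network_events(driver.get_log("performance"))`.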

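Getting the response body
What I needed was the response data for a request. The performance log does contain Network.response* entries, but they do not include the complete response body.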

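So this is where the requestId field comes in: with it, the response body can be fetched through CDP (the Chrome DevTools Protocol).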

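from selenium.common.exceptions import WebDriverException

for packet in performance_log:
    message = json.loads(packet.get("message")).get("message")
    # Network.loadingFinished means the response has been fully received
    if message.get("method") == "Network.loadingFinished":
        request_id = message.get("params").get("requestId")
        try:
            resp = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': request_id})
        except WebDriverException:
            continue  # Chrome has no body for this request; skip it
        body = resp.get("body")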

At this point the complete response body has been retrieved.
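
One detail worth knowing: besides `body`, the result of Network.getResponseBody also carries a `base64Encoded` flag that says whether `body` is base64 text, which is typically the case for binary content. Below is a minimal sketch of handling both cases. The helper name `decode_body` is hypothetical, `resp` stands for the dict returned by execute_cdp_cmd, and encoding text bodies as UTF-8 is an assumption.

```python
import base64


def decode_body(resp):
    """Return the response body as bytes from a Network.getResponseBody result."""
    body = resp.get("body", "")
    if resp.get("base64Encoded"):
        # Binary payloads (images, fonts, ...) arrive base64-encoded.
        return base64.b64decode(body)
    # Text payloads arrive as a plain string; UTF-8 is an assumption here.
    return body.encode("utf-8")


# Hand-written results standing in for what execute_cdp_cmd would return.
text_resp = {"body": '{"ok": true}', "base64Encoded": False}
binary_resp = {"body": base64.b64encode(b"\x89PNG").decode("ascii"),
               "base64Encoded": True}

text_bytes = decode_body(text_resp)
binary_bytes = decode_body(binary_resp)
```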

For custom needs, such as getting the response body of one particular URL, analyze the log data and add the appropriate conditional checks.
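
As one possible way to do this, the sketch below collects the response bodies of requests whose URL contains a keyword. It is not the original author's code: the function name `bodies_for_url` and its parameters are hypothetical, and it assumes that `driver` exposes `execute_cdp_cmd` as in the snippets above and that `entries` is the list returned by `get_log("performance")`. It matches URLs from Network.requestWillBeSent entries and only asks for bodies of requests that also have a Network.loadingFinished entry.

```python
import json


def bodies_for_url(driver, entries, keyword):
    """Return {url: body} for finished requests whose URL contains keyword."""
    wanted = {}    # requestId -> url, for requests matching the keyword
    finished = []  # requestIds whose response has been fully loaded
    for entry in entries:
        event = json.loads(entry["message"])["message"]
        method = event.get("method")
        params = event.get("params", {})
        if method == "Network.requestWillBeSent":
            url = params.get("request", {}).get("url", "")
            if keyword in url:
                wanted[params["requestId"]] = url
        elif method == "Network.loadingFinished":
            finished.append(params.get("requestId"))

    bodies = {}
    for request_id in finished:
        if request_id not in wanted:
            continue
        try:
            resp = driver.execute_cdp_cmd(
                "Network.getResponseBody", {"requestId": request_id})
        except Exception:
            # Selenium raises WebDriverException when Chrome no longer
            # holds the body; this sketch simply skips such requests.
            continue
        bodies[wanted[request_id]] = resp.get("body")
    return bodies
```

A call might look like `bodies_for_url(driver, driver.get_log("performance"), "page")`. If several requests share the same URL, this sketch keeps only the last body.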

Reference code:

import json
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# Configure ChromeDriver
chrome_options = Options()
# chrome_options.add_argument("--headless")  # do not show the browser window
chrome_options.set_capability("goog:loggingPrefs", {"performance": "ALL"})  # needed for the log below
service = Service('E:/python/chromedriver/128.0.6613.84/chromedriver.exe')  # replace with your ChromeDriver path

driver = webdriver.Chrome(service=service, options=chrome_options)
driver.get('https://live.photoplus.cn/live/pc/37044594/#/live')

performance_log = driver.get_log("performance")
for log in performance_log:
    message = json.loads(log.get("message")).get("message")
    packet_method = message.get("method")
    if "Network" in packet_method:
        request = message.get("params").get('request')
        if request:
            url = request.get("url")
            # Print the first request URL that contains "page", then stop
            if url and "page" in url:
                print(url)
                break
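
One caveat with reading the log right after driver.get(): AJAX requests can fire after the page load has returned, so a single get_log call may not contain them yet. A possible workaround is to poll the log for a while. The sketch below is only an illustration under assumptions: the function name `wait_for_request_url` and its `timeout` and `interval` parameters are hypothetical, and it relies on get_log returning only the entries recorded since the previous call.

```python
import json
import time


def wait_for_request_url(driver, keyword, timeout=10.0, interval=0.5):
    """Poll the performance log until a request URL containing keyword appears.

    Returns the first matching URL, or None if the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Each get_log call returns only entries recorded since the last call.
        for entry in driver.get_log("performance"):
            event = json.loads(entry["message"])["message"]
            if event.get("method") != "Network.requestWillBeSent":
                continue
            url = event.get("params", {}).get("request", {}).get("url", "")
            if keyword in url:
                return url
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

In the reference code this could replace the single pass over the log, for example `print(wait_for_request_url(driver, "page"))`.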