selenium获取网络响应的正文

最新推荐文章于 2024-08-11 17:56:49 发布

Jason_WangYing

最新推荐文章于 2024-08-11 17:56:49 发布

阅读量3.3k

点赞数 2

分类专栏： python3 文章标签： selenium 测试工具

本文链接：https://blog.csdn.net/Jason_WangYing/article/details/127168102

版权

python3 专栏收录该内容

245 篇文章 14 订阅

订阅专栏

有时我们爬虫时，需要获取到页面的api接口响应正文

下面我们上具体代码：

from selenium import webdriver 
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
import time
import json

caps = {
    'browserName': 'chrome',
    'loggingPrefs': {
        'browser': 'ALL',
        'driver': 'ALL',
        'performance': 'ALL',
    },
    'goog:chromeOptions': {
        'perfLoggingPrefs': {
            'enableNetwork': True,
        },
        'w3c': False,
    },
}
driver = webdriver.Chrome(desired_capabilities=caps)

driver.get('http://127.0.0.1:8000/apiGet?id=1&pid=334')
# 必须等待一定的时间，不然会报错提示获取不到日志信息，因为絮叨等所有请求结束才能获取日志信息
time.sleep(3)

request_log = driver.get_log('performance')

到这里，我们成功获取到了网页加载的所有资源，资源都在request_log里面，现在我们就需要筛选出来我们需要的资源requestId

for i in range(len(request_log)):
    message = json.loads(request_log[i]['message'])
    message = message['message']['params']
    # .get() 方式获取是了避免字段不存在时报错
    request = message.get('request')
    if(request is None):
        continue

    url = request.get('url')
    if(url == "http://127.0.0.1:8000/apiGet?id=1&pid=334"):
        # 得到requestId
        print(message['requestId'])
        # 通过requestId获取接口内容
        content = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': message['requestId']})
        print(content)
        break

这里最主要是driver里面的 execute_cdp_cmd方法，我们可以根据此方法来获取网页的请求包，具体文档参考此。

查看的源码是

def execute_cdp_cmd(self, cmd, cmd_args):
    """
    Execute Chrome Devtools Protocol command and get returned result
    The command and command args should follow chrome devtools protocol domains/commands, refer to link
    https://chromedevtools.github.io/devtools-protocol/

    :Args:
     - cmd: A str, command name
     - cmd_args: A dict, command args. empty dict {} if there is no command args

    :Usage:
        driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': requestId})

    :Returns:
        A dict, empty dict {} if there is no result to return.
        For example to getResponseBody:

        {'base64Encoded': False, 'body': 'response body string'}

    """
    return self.execute("executeCdpCommand", {'cmd': cmd, 'params': cmd_args})['value']

结果就是这样

Jason_WangYing

关注

2
点赞
踩
16

收藏

觉得还不错? 一键收藏
0
评论
selenium获取网络响应的正文

到这里，我们成功获取到了网页加载的所有资源，资源都在request_log里面，现在我们就需要筛选出来我们需要的资源requestId。这里最主要是driver里面的 execute_cdp_cmd方法，我们可以根据此方法来获取网页的请求包，具体。有时我们爬虫时，需要获取到页面的api接口响应正文。
复制链接

扫一扫

专栏目录