Two Ways to Capture XHR Response Bodies from Asynchronous Requests in Selenium

Contents

Reading XHR from the logs

Simple usage example


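An asynchronous request does not trigger a full page load in the browser, so Selenium has nothing to wait for and moves straight on to the next statement. As a result, the script keeps running before the asynchronous response has come back.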


Method 1: Reading XHR from the logs

Build the Chrome driver:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
# -------------------------------------------------------------------- #
chrome_options.add_argument("--allow-running-insecure-content")
chrome_options.add_argument("--ignore-certificate-errors")
chrome_options.add_argument("--disable-single-click-autofill")
chrome_options.add_argument("--disable-autofill-keyboard-accessory-view")
chrome_options.add_argument("--disable-full-form-autofill-ios")
# Ask ChromeDriver to record Network.* events in the performance log.
chrome_options.add_experimental_option('perfLoggingPrefs', {
    'enableNetwork': True,
    'enablePage': False,
})
# Turn on the browser and performance logs. Setting the capability on the
# options object replaces the DesiredCapabilities argument, which Selenium 4
# deprecated and later removed.
chrome_options.set_capability('goog:loggingPrefs', {
    'browser': 'ALL',
    'performance': 'ALL',
})
# -------------------------------------------------------------------- #

driver = webdriver.Chrome(options=chrome_options)

Collect the XHR requests from the logs:

import json


def get_xhr_logs(driver):
    """Return the requestId of every XHR response found in the driver logs."""
    log_xhr_array = []
    for typelog in driver.log_types:
        perfs = driver.get_log(typelog)
        for row in perfs:
            message_ = row['message']
            try:
                log_json = json.loads(message_)
                log = log_json['message']
                if log['method'] == 'Network.responseReceived':
                    # Drop static JS, CSS, etc.; keep only XHR requests.
                    type_ = log['params']['type']
                    request_id = log['params']['requestId']
                    if type_.upper() == "XHR":
                        log_xhr_array.append(request_id)
            except (ValueError, KeyError, TypeError):
                # Entries from other logs (e.g. the browser console) are not
                # DevTools events, so skip them.
                pass
    return log_xhr_array

The "message" value parsed in the code above looks like this:

{
	'method': 'Network.responseReceived',
	'params': {
		'frameId': '77E0FFEEDA6B3CE3ADACCD6133701429',
		'loaderId': 'DA184885509BC77DB2426FCDB768E5FA',
		'requestId': '5620.89',
		'response': {
			'connectionId': 512,
			'connectionReused': False,
			'encodedDataLength': 295,
			'fromDiskCache': False,
			'fromPrefetchCache': False,
			'fromServiceWorker': False,
			'headers': {
				'access-control-allow-origin': '*',
				'cache-control': 'no-cache',
				'content-length': '271',
				'content-type': 'application/json',
				'date': 'Thu, 14 Apr 2022 08:15:24 GMT',
				'via': '1.1 4fe583422d0b309b9b1d4505e54b137c.cloudfront.net (CloudFront)',
				'x-amz-cf-id': 'bhkU5eqTsWXmJRXa1AUu2mto5kMsWoWR-ePxEFpXHeS3uUIRd-7seA==',
				'x-amz-cf-pop': 'JFK51-C1',
				'x-branch-request-id': '95066afcbce046c482bdea654034402a-2022041408',
				'x-cache': 'Miss from cloudfront'
			},
			'mimeType': 'application/json',
			'protocol': 'h2',
			'remoteIPAddress': '192.154.249.210',
			'remotePort': 9000,
			'responseTime': 1649924123904.849,
			'securityDetails': {
				'certificateId': 0,
				'certificateTransparencyCompliance': 'unknown',
				'cipher': 'AES_128_GCM',
				'issuer': 'DigiCert TLS RSA SHA256 2020 CA1',
				'keyExchange': '',
				'keyExchangeGroup': 'X25519',
				'protocol': 'TLS 1.3',
				'sanList': ['*.branch.io', 'branch.io'],
				'signedCertificateTimestampList': [],
				'subjectName': '*.branch.io',
				'validFrom': 1635292800,
				'validTo': 1669593599
			},
			'securityState': 'secure',
			'status': 200,
			'statusText': '',
			'timing': {
				'connectEnd': 1470.717,
				'connectStart': 0.071,
				'dnsEnd': -1,
				'dnsStart': -1,
				'proxyEnd': -1,
				'proxyStart': -1,
				'pushEnd': 0,
				'pushStart': 0,
				'receiveHeadersEnd': 2177.895,
				'requestTime': 233026.325475,
				'sendEnd': 1471.578,
				'sendStart': 1471.22,
				'sslEnd': 1470.707,
				'sslStart': 961.743,
				'workerFetchStart': -1,
				'workerReady': -1,
				'workerRespondWithSettled': -1,
				'workerStart': -1
			},
			'url': 'https://api2.branch.io/v1/open'
		},
		'timestamp': 233028.504486,
		'type': 'XHR'
	}
}
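
To make the structure above concrete, here is a minimal sketch that pulls the request ID, URL, and status code out of raw log entries without needing a browser. The helper name summarize_xhr_responses and the sample entries are assumptions made for illustration; they are not part of the original post. The entry shape (a dict whose 'message' value is a JSON string wrapping one DevTools event) is the same one get_xhr_logs parses.

```python
import json


def summarize_xhr_responses(entries):
    """Return (requestId, url, status) for every XHR response in raw log entries.

    `entries` is assumed to look like the result of driver.get_log('performance'):
    a list of dicts whose 'message' value is a JSON string wrapping one event.
    """
    summaries = []
    for entry in entries:
        try:
            event = json.loads(entry['message'])['message']
        except (KeyError, TypeError, ValueError):
            continue  # not a DevTools event (e.g. a plain console line)
        if not isinstance(event, dict):
            continue
        if event.get('method') != 'Network.responseReceived':
            continue
        params = event.get('params', {})
        if str(params.get('type', '')).upper() != 'XHR':
            continue  # skip scripts, stylesheets, images, ...
        response = params.get('response', {})
        summaries.append(
            (params.get('requestId'), response.get('url'), response.get('status'))
        )
    return summaries


# Hypothetical sample entries, trimmed from the message shown above.
sample_entries = [
    {'message': json.dumps({'message': {
        'method': 'Network.responseReceived',
        'params': {
            'requestId': '5620.89',
            'type': 'XHR',
            'response': {'url': 'https://api2.branch.io/v1/open', 'status': 200},
        },
    }})},
    {'message': 'plain console output, not JSON'},
]

print(summarize_xhr_responses(sample_entries))
```

Keeping the URL and status next to the requestId makes it easier to pick out the one request you care about before fetching its body.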

With the requestId, you can fetch the full response body:

def get_xhr_body(driver, requestId):
    response_body = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': requestId})
    return response_body
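
The Network.getResponseBody command returns a dict with a 'body' string and a 'base64Encoded' flag, so the caller still has to decode it. The following is a minimal sketch of that step. The helper name decode_cdp_body and the sample inputs are assumptions for illustration, and the fallback to plain text for non-JSON bodies is one possible design, not the author's.

```python
import base64
import json


def decode_cdp_body(result):
    """Turn a Network.getResponseBody result into parsed JSON when possible.

    `result` is the dict returned by the CDP command: 'body' holds the payload
    as a string and 'base64Encoded' says whether it is base64-encoded.
    """
    raw = result.get('body', '')
    if result.get('base64Encoded'):
        raw = base64.b64decode(raw).decode('utf-8', errors='replace')
    try:
        return json.loads(raw)
    except ValueError:
        return raw  # not JSON: hand the text back unchanged


# Hypothetical results shaped like the CDP response.
plain = {'body': '{"token": "abc"}', 'base64Encoded': False}
encoded = {'body': base64.b64encode(b'{"ok": true}').decode('ascii'),
           'base64Encoded': True}

print(decode_cdp_body(plain))
print(decode_cdp_body(encoded))
```

In practice this would be called as decode_cdp_body(get_xhr_body(driver, request_id)).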

Simple usage example

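import json
import time

from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys


# The original snippet is the body of a function; the name used here is
# illustrative. LoginType is the author's own enum (definition not shown).
def submit_and_wait_for_login(driver):
    driver.find_element(by=By.XPATH, value='//*[@id="main"]/div[1]/form/button').send_keys(Keys.ENTER)
    response = None
    login_type = LoginType.Fail
    while True:
        ids = get_xhr_logs(driver)
        print('>> Waiting for the asynchronous response...')
        if ids:
            for request_id in ids:
                try:
                    body = get_xhr_body(driver, request_id)
                    # Parse the body as JSON; eval() on network data is unsafe.
                    response = json.loads(body['body'])
                    print(response)
                    if response.get('token'):
                        login_type = LoginType.Success
                    break
                except Exception:
                    # Body unavailable or not JSON: try the next request.
                    pass
            break
        time.sleep(0.5)
    return login_type, response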
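
The loop above keeps polling until at least one XHR shows up, so it never returns if the request is never sent. A minimal sketch of a time-bounded variant follows. The helper name poll_until, its parameters, and the fake data are assumptions for illustration and are not part of the original post.

```python
import time


def poll_until(fetch, timeout=10.0, interval=0.5):
    """Call fetch() repeatedly until it returns a truthy value or time runs out.

    Returns the truthy value, or None if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Stand-in for get_xhr_logs(driver): empty twice, then one request ID.
fake_results = iter([[], [], ['5620.89']])
print(poll_until(lambda: next(fake_results), timeout=2.0, interval=0.01))
```

With a real driver the call would look like ids = poll_until(lambda: get_xhr_logs(driver), timeout=15), followed by a check for None before reading any bodies.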

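Method 2: Using the open-source tool selenium-wire

GitHub: https://github.com/wkeeling/selenium-wire

It plugs into Selenium seamlessly and is very convenient to use.

Example code will be added later; in the meantime, see the project page.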
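
Until the author's own example is available, here is a hedged sketch based on the usage shown in the selenium-wire README: the package provides its own webdriver module, and captured traffic is exposed as driver.requests. The helper names is_json_response and print_json_requests, the fake request object, and the idea of filtering on Content-Type are assumptions for illustration. selenium-wire must be installed separately (pip install selenium-wire).

```python
from types import SimpleNamespace


def is_json_response(request):
    """True if a captured request has a response whose Content-Type mentions JSON."""
    response = getattr(request, 'response', None)
    if response is None:
        return False  # request still in flight, or never answered
    content_type = response.headers.get('Content-Type') or ''
    return 'json' in content_type.lower()


def print_json_requests(url):
    """Open `url` with selenium-wire and list the JSON responses it captured."""
    # Imported lazily so the helper above works without selenium-wire installed.
    from seleniumwire import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        for request in driver.requests:
            if is_json_response(request):
                print(request.url, request.response.status_code)
    finally:
        driver.quit()


# Fake request object (an assumption) to show the filter without a browser.
fake_request = SimpleNamespace(
    url='https://api2.branch.io/v1/open',
    response=SimpleNamespace(
        status_code=200,
        headers={'Content-Type': 'application/json'},
    ),
)
print(is_json_response(fake_request))
```

Calling print_json_requests with a page URL starts a real Chrome session. The response body is available as request.response.body (bytes), which may still be compressed depending on the response's Content-Encoding.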
