Python visiting web pages to generate traffic / Selenium-Python: how to capture the response of network traffic

I am using Python Django to create a web app.

I am using Selenium to launch a headless browser (PhantomJS) and make some clicks until I reach a particular page.

I wish to capture the network traffic and get the response of a particular network call. This network call actually returns an HTML document as its response.

Is there any way to achieve this?

Solution

You can access the browser or ChromeDriver logs; they differ slightly in how they report network responses. The browser log is called performance and the driver log is called driver. Both return a list of JSON-like dicts, which you can parse to extract the events whose method belongs to the Network domain:

[{'level': 'INFO',
  'message': '{"message":{"method":"Page.frameStoppedLoading","params":{"frameId":"FB10764A3ABF7FFC83110C39C5F7BF77"}},"webview":"C2D13BD13CF743B6D0695B35E9CC935C"}',
  'timestamp': 1538607113832},
 {'level': 'INFO',
  'message': '{"message":{"method":"Page.frameDetached","params":{"frameId":"FB10764A3ABF7FFC83110C39C5F7BF77"}},"webview":"C2D13BD13CF743B6D0695B35E9CC935C"}',
  'timestamp': 1538607113838},
 {'level': 'INFO',
  'message': '{"message":{"method":"Network.requestWillBeSent","params":{"documentURL":"https://stackoverflow.com/questions/52633697/selenium-python-how-to-capture-network-traffics-response","frameId":"C2D13BD13CF743B6D0695B35E9CC935C","hasUserGesture":false,"initiator":{"type":"other"},"loaderId":"5331BFDC4F466FCED920CFC9F033D2EC","request":{"headers":{"Upgrade-Insecure-Requests":"1","User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"},"initialPriority":"VeryHigh","method":"GET","mixedContentType":"none","referrerPolicy":"no-referrer-when-downgrade","url":"https://stackoverflow.com/questions/52633697/selenium-python-how-to-capture-network-traffics-response"},"requestId":"5331BFDC4F466FCED920CFC9F033D2EC","timestamp":104499.729,"type":"Document","wallTime":1538607113.838206}},"webview":"C2D13BD13CF743B6D0695B35E9CC935C"}',
  'timestamp': 1538607113839},
 ...]
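Each entry is a dict whose 'message' field is itself a JSON string, and the Chrome DevTools Protocol event sits under that string's own "message" key. A minimal, browser-free sketch of the parsing step, using two entries copied (shortened) from the sample above; parse_log_entries and network_events are hypothetical helper names, not part of Selenium:

```python
import json


def parse_log_entries(entries):
    """Decode the JSON string in each performance-log entry and return the inner CDP events."""
    return [json.loads(entry['message'])['message'] for entry in entries]


def network_events(events):
    """Keep only events from the CDP Network domain, e.g. 'Network.requestWillBeSent'."""
    return [event for event in events if event.get('method', '').startswith('Network.')]


# Two entries taken from the sample log above (the second one trimmed down)
sample_log = [
    {'level': 'INFO',
     'message': '{"message":{"method":"Page.frameStoppedLoading","params":{"frameId":"FB10764A3ABF7FFC83110C39C5F7BF77"}},"webview":"C2D13BD13CF743B6D0695B35E9CC935C"}',
     'timestamp': 1538607113832},
    {'level': 'INFO',
     'message': '{"message":{"method":"Network.requestWillBeSent","params":{"requestId":"5331BFDC4F466FCED920CFC9F033D2EC","type":"Document","request":{"method":"GET","url":"https://stackoverflow.com/questions/52633697/selenium-python-how-to-capture-network-traffics-response"}}},"webview":"C2D13BD13CF743B6D0695B35E9CC935C"}',
     'timestamp': 1538607113839},
]

events = parse_log_entries(sample_log)
for event in network_events(events):
    print(event['method'], event['params']['request']['url'])
```

The Page.* event is dropped and only the Network.requestWillBeSent event (with its requestId and URL) survives the filter.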

You need to enable performance logging in DesiredCapabilities and then parse the entries with the json module:

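import json

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# Copy the class-level dict instead of mutating DesiredCapabilities.CHROME in place
caps = DesiredCapabilities.CHROME.copy()
# ChromeDriver 75+ (W3C mode) expects the "goog:" prefix; older releases used 'loggingPrefs'
caps['goog:loggingPrefs'] = {'performance': 'ALL'}

# Selenium 3 / early Selenium 4 API; newer Selenium 4 releases take options= instead
driver = webdriver.Chrome(desired_capabilities=caps)
driver.get('https://stackoverflow.com/questions/52633697/selenium-python-how-to-capture-network-traffics-response')


def process_browser_log_entry(entry):
    # 'message' is a JSON string; the DevTools event is nested under its own 'message' key
    response = json.loads(entry['message'])['message']
    return response


browser_log = driver.get_log('performance')
events = [process_browser_log_entry(entry) for entry in browser_log]
# Keep Network.responseReceived and similar Network.response* events
events = [event for event in events if 'Network.response' in event['method']]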

I don't know whether you can access the response data itself this way, but you can get the URL of the response.
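For the original goal (finding the call whose response is an HTML document), the Network.responseReceived events are the useful ones: per the Chrome DevTools Protocol, their params carry the requestId plus a response object with url, status and mimeType. Note that the substring test 'Network.response' above also matches Network.responseReceivedExtraInfo, so this sketch compares the method exactly. The helper name html_responses and filtering on mimeType are assumptions, and the events in the usage part are made up for illustration:

```python
def html_responses(events, mime_type='text/html'):
    """Return (requestId, url, status) for each Network.responseReceived event of the given MIME type."""
    found = []
    for event in events:
        if event.get('method') != 'Network.responseReceived':
            continue  # skips e.g. Network.responseReceivedExtraInfo
        params = event['params']
        response = params['response']
        if response.get('mimeType') == mime_type:
            found.append((params['requestId'], response['url'], response['status']))
    return found


# Hypothetical events, shaped like the CDP events left in `events` by the filter above
events = [
    {'method': 'Network.responseReceived',
     'params': {'requestId': '1000.1', 'type': 'Document',
                'response': {'url': 'https://example.com/page', 'status': 200, 'mimeType': 'text/html'}}},
    {'method': 'Network.responseReceived',
     'params': {'requestId': '1000.2', 'type': 'Script',
                'response': {'url': 'https://example.com/app.js', 'status': 200,
                             'mimeType': 'application/javascript'}}},
    {'method': 'Network.responseReceivedExtraInfo',
     'params': {'requestId': '1000.1', 'statusCode': 200}},
]

for request_id, url, status in html_responses(events):
    print(request_id, status, url)
```

The requestId collected here is the value the response-body command needs.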

UPDATE 2020-10-07 ⬇

As @Roey B and @Inactivist explain in the comments, you can access the response body using the Network.getResponseBody command:

driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': events[0]["params"]["requestId"]})
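Putting the pieces together: per the DevTools Protocol, Network.getResponseBody returns a dict with body and base64Encoded, and binary bodies arrive base64-encoded. A sketch that collects the HTML bodies from a Chrome driver; response_body and html_bodies are hypothetical helpers, and the driver is only assumed to provide get_log and execute_cdp_cmd. On recent Selenium 4 releases, which no longer accept desired_capabilities, the driver can be created with options = webdriver.ChromeOptions(), options.set_capability('goog:loggingPrefs', {'performance': 'ALL'}) and webdriver.Chrome(options=options). The CDP call can fail (raising a WebDriverException) if the body is no longer available, for example after navigating away, so call it soon after the page loads.

```python
import base64
import json


def response_body(driver, request_id):
    """Fetch one response body over CDP and return it as text."""
    result = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': request_id})
    if result.get('base64Encoded'):
        # Binary or non-text bodies come back base64-encoded
        return base64.b64decode(result['body']).decode('utf-8', errors='replace')
    return result['body']


def html_bodies(driver):
    """Read the performance log and return {url: body} for every text/html response."""
    bodies = {}
    for entry in driver.get_log('performance'):
        event = json.loads(entry['message'])['message']
        if event.get('method') != 'Network.responseReceived':
            continue
        response = event['params']['response']
        if response.get('mimeType') == 'text/html':
            bodies[response['url']] = response_body(driver, event['params']['requestId'])
    return bodies
```

Usage, assuming driver was created with performance logging enabled: call driver.get(url), perform the clicks, then html_bodies(driver) maps each HTML response URL to its body.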
