pyppeteer error: Protocol error (Runtime.callFunctionOn): Cannot find context with specified id

Original question: Exception: Execution context was destroyed, most likely because of a navigation. issue - PythonTechWorld

I got something working for a specific case of webpage redirect.

At the time of writing, my software and package versions are:

Python==3.7.3
requests-html==0.10.0
pyppeteer==0.0.25

# for ipython notebook asyncio issues
tornado==4.5.3

Here's an excerpt of the sample target page content, which redirects using both JavaScript and a meta tag:

<script>url="http://example.com/somewhereelse";window.location.assign(url)</script>
<noscript><meta http-equiv="refresh" content="0; url=http://example.com/somewhereelse"></noscript>

The code I ran which errored was:

from requests_html import HTMLSession

session = HTMLSession()
r = session.get("http://mysite.com")
# Render in headless Chromium; this is the call that raises on the redirecting page
r.html.render()

The above code results in:

NetworkError: Execution context was destroyed, most likely because of a navigation.

If we look carefully at the documentation:

>>> help(r.html.render)

Help on method render in module requests_html:

render(retries: int = 8, script: str = None, wait: float = 0.2, scrolldown=False, sleep: int = 0, reload: bool = True, timeout: Union[float, int] = 8.0, keep_page: bool = False) method of requests_html.HTML instance
    Reloads the response in Chromium, and replaces HTML content
    with an updated version, with JavaScript executed.
    
    :param retries: The number of times to retry loading the page in Chromium.
    :param script: JavaScript to execute upon page load (optional).
    :param wait: The number of seconds to wait before loading the page, preventing timeouts (optional).
    :param scrolldown: Integer, if provided, of how many times to page down.
    :param sleep: Integer, if provided, of how many long to sleep after initial render.
    :param reload: If ``False``, content will not be loaded from the browser, but will be provided from memory.
    :param keep_page: If ``True`` will allow you to interact with the browser page through ``r.html.page``.
    
    If ``scrolldown`` is specified, the page will scrolldown the specified
    number of times, after sleeping the specified amount of time
    (e.g. ``scrolldown=10, sleep=1``).
    
    If just ``sleep`` is provided, the rendering will wait *n* seconds, before
    returning.

The key parameter is "sleep".

A few points to note:

  1. The target page sample above shows the meta refresh with content="0;..., which means the page redirects after a 0-second wait.
  2. Looking at the JavaScript code, there's no wait/sleep/delay either.
  3. With current hardware and internet access speeds, I don't expect a headless Chromium browser to take longer than 1 second to redirect and load the target page (unless it is a big page or there are several more redirects).

Therefore, a 1-second wait is a reasonable time to set before render() returns.

In addition, we have to pass keep_page=True so that crucial information can be extracted afterwards, as shown later.

Changing the call to render() to:

r.html.render(sleep=1, keep_page=True)

allowed the code to run without issues. If it still errors (due to slow network speed, a busy CPU, etc.), try again with a higher sleep value.
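
If retrying by hand gets tedious, the retry can be wrapped in a small helper. This is only a rough sketch: render_with_backoff and its sleeps schedule are hypothetical names of mine, not part of requests-html. It catches any Exception for simplicity; in the case above the error was pyppeteer's NetworkError, so the except clause could be narrowed to that.

```python
def render_with_backoff(render, sleeps=(1, 2, 4, 8)):
    """Call render(sleep=..., keep_page=True) with increasing sleep values.

    `render` is any callable accepting `sleep` and `keep_page` keyword
    arguments, e.g. `r.html.render`. Returns the sleep value that worked,
    or re-raises the last error if every attempt failed.
    """
    if not sleeps:
        raise ValueError("sleeps must contain at least one value")
    last_error = None
    for sleep in sleeps:
        try:
            render(sleep=sleep, keep_page=True)
            return sleep  # rendering succeeded with this wait
        except Exception as error:  # assumption: broad catch, narrow as needed
            last_error = error
    raise last_error
```

It would be called as render_with_backoff(r.html.render), after which r.html.page should be available as before. The schedule (1, 2, 4, 8) is an arbitrary choice; every failed attempt costs a full page load plus the sleep, so keep it short.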

To find out the redirected page's URL:

>>> r.html.page.url

http://example.com/somewhereelse

This issue is about page redirects causing errors. With that in mind: although the above solution works, it is clunky to implement a try-except loop that retries with increasing sleep times to make it work.

I'm still trying to find an equivalent of "window.onload" that would make the wait automatic or dynamic. Ideally the headless browser would "ping back" once the redirect has completed and the target URL has been reached, rather than Python incrementally "polling" to check, as it does now.

I'm all ears to better methods if anyone comes up with any.
