Scraping a page's complete rendered content with Chrome WebDriver in headless mode

天才梦浪

Published 2021-07-08 14:12:14

Categories: Backend, Crawler. Tags: java

Article link: https://blog.csdn.net/hlw521hxq/article/details/118572381

Fetching the complete page content with ChromeDriver

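import java.util.concurrent.TimeUnit;
import org.openqa.selenium.By;
import org.openqa.selenium.PageLoadStrategy;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import us.codecraft.webmagic.selector.Html; // assumption: Html is WebMagic's selector class; the original does not name its package

        /* The statements below belong inside a method body. */
        /* Configure the basic browser options. */
        ChromeOptions options = new ChromeOptions();
        /* Page load strategy: EAGER returns from get() once DOMContentLoaded fires,
           without waiting for images and stylesheets. */
        options.setPageLoadStrategy(PageLoadStrategy.EAGER);
        /* Run Chrome in headless mode (conventional double-dash form of the switch). */
        options.addArguments("--headless");
        /* Create the driver. */
        ChromeDriver webDriver = new ChromeDriver(options);
        /* Set the page load timeout. This (long, TimeUnit) overload is the Selenium 3 form;
           Selenium 4 deprecates it in favor of pageLoadTimeout(Duration). */
        WebDriver.Timeouts timeouts = webDriver.manage().timeouts();
        timeouts.pageLoadTimeout(20, TimeUnit.SECONDS);
        /* Load the page. */
        webDriver.get("https://www.cnblogs.com/dk1024/p/11590510.html");
        /* Locate the root <html> element. */
        WebElement webElement = webDriver.findElement(By.xpath("/html"));
        /* Read the rendered markup, including nodes added by scripts. */
        String content = webElement.getAttribute("outerHTML");
        /* Wrap the complete page HTML for further extraction. */
        Html html = new Html(content);
        /* Close the browser and stop the chromedriver process. */
        webDriver.quit();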
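
Because the EAGER strategy returns as soon as DOMContentLoaded fires, scripts may still be changing the DOM when outerHTML is read. One possible way to reduce that risk is to poll the markup until two consecutive reads match. The sketch below is not part of the original post: `StableContentWaiter` and `waitForStableContent` are hypothetical names, and it uses only the Java standard library so that it does not depend on a particular Selenium version. Selenium also ships its own `WebDriverWait` for waiting on specific conditions, which may be a better fit when a known element signals that the page is ready.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Hypothetical helper (not from the original post): polls a content source
// until it stops changing or a timeout elapses.
final class StableContentWaiter {

    private StableContentWaiter() {
    }

    /**
     * Reads from source repeatedly, sleeping pollInterval between reads.
     * Returns as soon as two consecutive reads are equal and non-null,
     * or returns the last value read once timeout has elapsed.
     */
    static String waitForStableContent(Supplier<String> source,
                                       Duration timeout,
                                       Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        String previous = source.get();
        while (System.nanoTime() < deadline) {
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return previous;
            }
            String current = source.get();
            if (current != null && current.equals(previous)) {
                return current; // two identical reads in a row: treat as stable
            }
            previous = current;
        }
        return previous; // timed out: return the most recent read
    }
}
```

With the driver from the fragment above, a call might look like `String content = StableContentWaiter.waitForStableContent(() -> webDriver.getPageSource(), Duration.ofSeconds(10), Duration.ofMillis(500));`. Two equal reads do not prove that the page has finished loading; they only show that nothing changed during one polling interval, so the interval should be chosen with the target page's behavior in mind.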