Full-Page Screenshots with puppeteer: First Version

Latest recommended article published on 2024-05-31 14:21:08

向往的生活Life

Categories: javascript, puppeteer. Tags: puppeteer

Article link: https://blog.csdn.net/ASAS1314/article/details/81633423

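Disclaimer: all articles on this blog are for learning purposes only and involve no commercial interest. If anything is cited inappropriately, I will remove it.
A few preliminary remarks:
Because this is only a test, the project setup steps are not covered.
Puppeteer is a Node library from Google that controls headless Chrome over the DevTools protocol. Through its API you can drive Chrome directly and simulate most user actions, either for UI testing or as a crawler that visits pages and collects data. The development language is therefore JavaScript.
GitHub project: puppeteer
API: puppeteer API (version 1.7.0 at the time of writing).
For basic usage, see the demos on the official site or the articles a Baidu search turns up; they all include sample code. puppeteer seems to have relatively few users, though, so genuinely useful material is scarce and only a handful of articles can be found.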

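Installing puppeteer is not covered here; a quick search will turn up instructions. This post focuses on taking a screenshot of the entire page after visiting it with puppeteer. Special cases that can appear on a page, such as CAPTCHAs and login dialogs, are not handled. https://www.jd.com is used as the example.

Note: the page has to be scrolled. Otherwise it does not load completely and the screenshot comes out incomplete. An optimized version will follow later.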

The code
First, the version without scrolling. It is just a simple demo-style example:

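// Import the package
const puppeteer = require('puppeteer');

(async () => {
    // Launch Chromium
    const browser = await puppeteer.launch({ignoreHTTPSErrors: true, headless: false, args: ['--no-sandbox']});
    // Open a new page
    const page = await browser.newPage();
    // Set the viewport resolution
    await page.setViewport({width: 1920, height: 1080});

    let request_url = 'https://www.jd.com';

    // Visit the page; a navigation error is logged and the script carries on
    await page.goto(request_url, {waitUntil: 'domcontentloaded'}).catch(err => console.log(err));
    // Fixed delay using a plain timer, which does not depend on the puppeteer version
    await new Promise(resolve => setTimeout(resolve, 1000));
    let title = await page.title();
    console.log(title);

    try {
        // Take the full-page screenshot
        await page.screenshot({path: "jd.jpg", fullPage: true}).catch(err => {
            console.log('Screenshot failed');
            console.log(err);
        });
        // Keep the visible browser window open for a few seconds before closing
        await new Promise(resolve => setTimeout(resolve, 5000));

    } catch (e) {
        console.log('Execution error');
        console.log(e);
    } finally {
        await browser.close();
    }
})();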

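Screenshot of the result:
[Figure: screenshot produced by the version without scrolling]
As you can see, a large share of the images is missing. No matter how long the code waits, it makes no difference: the page has to be scrolled so that its scroll events fire.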
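To check that the gaps really come from images that never loaded, a small helper can count them just before the screenshot is taken. The following is only a minimal sketch: `countUnloadedImages` is a hypothetical name, not part of puppeteer, and it relies only on `page.evaluate` and the standard `complete` and `naturalWidth` properties of `<img>` elements. It will not detect sites that display an already-loaded placeholder and swap in the real URL on scroll.

```javascript
// Hypothetical helper (not from the original post): counts <img> elements
// that have not finished loading or that loaded without any pixel data.
async function countUnloadedImages(page) {
    return page.evaluate(() => {
        // This callback runs inside the browser page, not in Node.
        const images = Array.from(document.images);
        return images.filter(img => !img.complete || img.naturalWidth === 0).length;
    });
}
```

Calling `console.log(await countUnloadedImages(page))` right before `page.screenshot` gives a rough number to compare between the two versions in this post.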
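Next, the scrolling version. It does manage to scroll the page, but I am not happy with how the code is written and it is incompatible with a few sites, which is why it is only the first version: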

const puppeteer = require('puppeteer');

(async () => {
    // Launch Chromium
    const browser = await puppeteer.launch({ignoreHTTPSErrors: true, headless: false, args: ['--no-sandbox']});
    // Open a new page
    const page = await browser.newPage();
    // Set the viewport resolution
    await page.setViewport({width: 1920, height: 1080});

    let request_url = 'https://www.jd.com';
    // Visit the page; a navigation error is logged and the script carries on
    await page.goto(request_url, {waitUntil: 'domcontentloaded'}).catch(err => console.log(err));
    await sleep(1000);
    let title = await page.title();
    console.log(title);

    // Maximum page height to load, in pixels
    const max_height_px = 20000;
    // Distance to scroll per step, in pixels (one viewport height)
    let scrollStep = 1080;
    let mValues = {'scrollEnable': true, 'height_limit': false};

    while (mValues.scrollEnable) {
        mValues = await page.evaluate((scrollStep, max_height_px, height_limit) => {

            // Pages without a scrolling element cannot be scrolled: stop the loop
            // explicitly instead of returning undefined, which would crash the while condition
            if (!document.scrollingElement) {
                return {'scrollEnable': false, 'height_limit': height_limit};
            }

            // Scroll down by one step
            let scrollTop = document.scrollingElement.scrollTop;
            document.scrollingElement.scrollTop = scrollTop + scrollStep;

            // Stop once the page, or the next scroll position, exceeds the height limit
            if (null != document.body && document.body.clientHeight > max_height_px) {
                height_limit = true;
            } else if (document.scrollingElement.scrollTop + scrollStep > max_height_px) {
                height_limit = true;
            }

            let scrollEnableFlag = false;
            if (null != document.body) {
                // Keep going while there is still content below the area just covered
                scrollEnableFlag = document.body.clientHeight > scrollTop + scrollStep + 1 && !height_limit;
            } else {
                // Without a body, keep going only if the last step actually moved the page
                scrollEnableFlag = document.scrollingElement.scrollTop > scrollTop + 1 && !height_limit;
            }

            return {
                'scrollEnable': scrollEnableFlag,
                'height_limit': height_limit,
                'document_scrolling_Element_scrollTop': document.scrollingElement.scrollTop
            };

        }, scrollStep, max_height_px, mValues.height_limit);

        // Give the newly revealed content time to load
        await sleep(800);
    }

    try {
        // Take the full-page screenshot
        await page.screenshot({path: "jd.jpg", fullPage: true}).catch(err => {
            console.log('Screenshot failed');
            console.log(err);
        });
        await sleep(5000);

    } catch (e) {
        console.log('Execution error');
        console.log(e);
    } finally {
        await browser.close();
    }

})();

// Delay helper: resolves after `delay` milliseconds
function sleep(delay) {
    return new Promise(resolve => setTimeout(resolve, delay));
}
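The scrolling loop can also be pulled out into a reusable function, which makes the stop condition easier to read and to test. The following is a minimal sketch under stated assumptions, not the optimized version promised earlier: `nextScrollTop` and `autoScroll` are hypothetical names, the defaults mirror the values used above (1080, 20000, 800), and the page height is read from `document.scrollingElement.scrollHeight` rather than `document.body.clientHeight`, which is a deliberate substitution.

```javascript
// Hypothetical helpers, not part of the original post or of puppeteer.

// Pure decision function: returns the next scroll position in pixels,
// or null when scrolling should stop (page bottom or height limit reached).
function nextScrollTop(current, pageHeight, step, maxHeight) {
    const limit = Math.min(pageHeight, maxHeight);
    const next = current + step;
    return next < limit ? next : null;
}

// Scrolls `page` step by step and returns the last scroll position it set.
async function autoScroll(page, { step = 1080, maxHeight = 20000, delayMs = 800 } = {}) {
    let current = 0;
    for (;;) {
        // Re-read the height every round because newly loaded content can make the page grow.
        const pageHeight = await page.evaluate(() =>
            document.scrollingElement ? document.scrollingElement.scrollHeight : 0);
        const next = nextScrollTop(current, pageHeight, step, maxHeight);
        if (next === null) break;
        await page.evaluate(top => { document.scrollingElement.scrollTop = top; }, next);
        current = next;
        // Give the newly revealed content time to load.
        await new Promise(resolve => setTimeout(resolve, delayMs));
    }
    return current;
}
```

With this in place, the `while` loop in the script could be replaced by a single `await autoScroll(page);` between `page.goto` and `page.screenshot`. A page without a scrolling element reports a height of 0, so the loop ends immediately without scrolling.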

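Screenshot from the scrolling version:
[Figure: screenshot produced by the scrolling version]
As you can see, the result is quite good. The wait times should be adjusted to your network conditions. This version is, of course, only meant for learning.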
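Instead of tuning a fixed delay by hand for each network, one option is to poll a value such as the page height until it stops changing. The following is a minimal sketch of that idea: `waitUntilStable` and its options (`intervalMs`, `stableRounds`, `timeoutMs`) are hypothetical names chosen for illustration and are not part of puppeteer.

```javascript
// Hypothetical helper, not part of the original post or of puppeteer.
// Polls `readValue` until it returns the same value `stableRounds` times in a row,
// or until `timeoutMs` has elapsed, whichever comes first.
async function waitUntilStable(readValue, { intervalMs = 500, stableRounds = 3, timeoutMs = 15000 } = {}) {
    const deadline = Date.now() + timeoutMs;
    let last = await readValue();
    let unchanged = 0;
    while (unchanged < stableRounds && Date.now() < deadline) {
        await new Promise(resolve => setTimeout(resolve, intervalMs));
        const current = await readValue();
        if (current === last) {
            unchanged += 1;
        } else {
            // The value moved, so start counting again from the new value.
            unchanged = 0;
            last = current;
        }
    }
    return { value: last, stable: unchanged >= stableRounds };
}
```

In the script above it could be used as `await waitUntilStable(() => page.evaluate(() => document.scrollingElement.scrollHeight));` after each scroll step. On a fast connection it returns sooner than a fixed delay, and on a slow one it keeps waiting up to the timeout.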
