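A while ago I was given a task: generate a PDF from a web page. My first solution was to use html2canvas to render the page into an image and then use jspdf to turn that image into a PDF.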
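This approach has three drawbacks:
- The image produced by html2canvas is fairly blurry.
- html2canvas and jspdf are very CPU-intensive and can freeze lower-end machines.
- Because the PDF content is built from an image, its text cannot be selected or copied.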
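Later I came across Puppeteer. Puppeteer is a Node library that provides a set of APIs for controlling Chrome; put simply, it gives you a headless Chrome browser. Since it is a browser, anything we can do by hand in a browser Puppeteer can do as well:
- Generate screenshots or PDFs of web pages (this also works for SPAs such as Vue apps)
- Advanced crawling of pages with large amounts of asynchronously rendered content
- Simulate keyboard input, automatic form submission, logging in and so on, for automated UI testing
- Capture a timeline trace of a site to help diagnose performance problems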
We use midwayjs as the HTTP service. It calls Puppeteer to generate the image or PDF and returns the binary data to the browser.
Setting up the environment
- Create the project. Node 18.0.0 or later is required.

    npm init midway@latest -y
    cd G:\workspaces\export-service
- Install puppeteer-core

    npm i puppeteer-core -S
    npm i carlo -S

  Installing puppeteer-core and carlo may require a proxy, depending on your network. npm's proxy settings are managed like this:

    # Set the HTTP proxy
    npm config set proxy http://127.0.0.1:7890
    # Set the HTTPS proxy
    npm config set https-proxy http://127.0.0.1:7890
    # Remove the proxy settings
    npm config delete proxy
    npm config delete https-proxy
    # Show the current proxy settings
    npm config get proxy
    npm config get https-proxy
Exporting PDFs and images
- Create puppeteer.service.ts

    import { Provide } from '@midwayjs/core';
    import { IImageOptions } from '../interface';
    import puppeteer from 'puppeteer-core';
    import * as findChrome from 'carlo/lib/find_chrome';

    @Provide()
    export class PuppeteerService {
      // Launch a Chrome instance using the locally installed Chrome found by carlo
      private async launchBrowser() {
        const findChromePath = await findChrome({});
        const executablePath = findChromePath.executablePath;
        return puppeteer.launch({
          args: [
            // Required for Docker version of Puppeteer
            '--no-sandbox',
            '--disable-setuid-sandbox',
            // This will write shared memory files into /tmp instead of /dev/shm,
            // because Docker's default for /dev/shm is 64MB
            '--disable-dev-shm-usage',
          ],
          headless: true,
          executablePath,
        });
      }

      async getImage(data: IImageOptions) {
        const browser = await this.launchBrowser();
        try {
          const page = await browser.newPage();
          if (data.cookies) {
            await page.setCookie(...data.cookies);
          }
          await page.goto(data.url);
          return await page.screenshot({ fullPage: true, type: 'jpeg' });
        } finally {
          // Close Chrome even if navigation or rendering fails
          await browser.close();
        }
      }

      async getPdf(data: IImageOptions) {
        const browser = await this.launchBrowser();
        try {
          const page = await browser.newPage();
          if (data.cookies) {
            await page.setCookie(...data.cookies);
          }
          await page.goto(data.url);
          return await page.pdf({
            printBackground: true,
            margin: {
              top: 20,
              bottom: 20,
            },
          });
        } finally {
          // Close Chrome even if navigation or rendering fails
          await browser.close();
        }
      }
    }
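The comments in the launch options say that the sandbox and shared-memory flags are there for Docker. As a minimal sketch, the flags could be chosen in one place depending on where the service runs. The `buildLaunchArgs` helper and its `inDocker` parameter are hypothetical and not part of Puppeteer or the service above; only the flag names come from the original code.

```typescript
// Hypothetical helper: returns the Chrome flags to pass as puppeteer.launch({ args }).
// The inDocker switch is an assumption made for illustration.
function buildLaunchArgs(inDocker: boolean): string[] {
  if (!inDocker) {
    // Outside Docker the default sandbox can usually stay enabled.
    return [];
  }
  return [
    // Flags the service above marks as required for Docker
    '--no-sandbox',
    '--disable-setuid-sandbox',
    // Write shared memory files into /tmp instead of the small /dev/shm
    '--disable-dev-shm-usage',
  ];
}

console.log(buildLaunchArgs(true).join(' '));
```

The result would then be passed as `args: buildLaunchArgs(...)` in the existing `puppeteer.launch` call.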
- Modify interface.ts

    export interface IImageOptions {
      url: string;
      cookies?: {
        name: string;
        value: string;
        path?: string;
        domain?: string;
      }[];
    }
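`IImageOptions` accepts any string as `url`. If the URL ever comes from a caller instead of being hard-coded, it may be worth checking it before handing it to the browser. The sketch below is one possible guard; `validateImageOptions` and `ICookie` are hypothetical names, and the rule "only http and https" is an assumption, not something the original service enforces.

```typescript
interface ICookie {
  name: string;
  value: string;
  path?: string;
  domain?: string;
}

interface IImageOptions {
  url: string;
  cookies?: ICookie[];
}

// Hypothetical guard: returns a list of problems, empty when the options look usable.
function validateImageOptions(data: IImageOptions): string[] {
  const problems: string[] = [];
  let parsed: URL | undefined;
  try {
    parsed = new URL(data.url);
  } catch {
    problems.push('url is not a valid absolute URL');
  }
  // Assumed policy: keep the headless browser away from file: and other schemes.
  if (parsed && parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    problems.push('url must use http or https');
  }
  for (const cookie of data.cookies ?? []) {
    if (!cookie.name) {
      problems.push('cookie name must not be empty');
    }
  }
  return problems;
}
```

A controller could call this first and reject the request when the returned list is not empty.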
- Create api.controller.ts

    import { Inject, Controller, Get } from '@midwayjs/core';
    import { Context } from '@midwayjs/koa';
    import { PuppeteerService } from '../service/puppeteer.service';

    @Controller('/')
    export class APIController {
      @Inject()
      ctx: Context;

      @Inject()
      puppeteerService: PuppeteerService;

      @Get('/img')
      async getImg() {
        const buffer = await this.puppeteerService.getImage({
          url: 'https://www.baidu.com',
        });
        this.ctx.type = 'image/jpeg';
        // Uncomment to download the image instead of displaying it
        // this.ctx.set('content-disposition', 'attachment; filename="baidu.jpg"');
        this.ctx.body = buffer;
      }

      @Get('/pdf')
      async getPdf() {
        const buffer = await this.puppeteerService.getPdf({
          url: 'https://www.baidu.com',
        });
        // Koa resolves the extension to the application/pdf content type
        this.ctx.type = '.pdf';
        // this.ctx.set('Content-Type', 'application/octet-stream');
        // Uncomment to download the PDF instead of displaying it
        // this.ctx.set('content-disposition', 'attachment; filename="baidu.pdf"');
        this.ctx.body = buffer;
      }
    }
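The commented-out `content-disposition` lines in the controller switch the response from inline display to a download. As a simplified sketch, the header value could be built by a small helper so that non-ASCII file names also work. `attachmentHeader` is a hypothetical name, and the encoding is a simplification: `encodeURIComponent` leaves a few characters such as `'`, `(`, `)` and `*` unescaped.

```typescript
// Hypothetical helper: builds a Content-Disposition value that triggers a download.
function attachmentHeader(filename: string): string {
  // ASCII fallback for older clients: replace anything outside a safe set with "_".
  const fallback = filename.replace(/[^A-Za-z0-9._-]/g, '_');
  // The filename* parameter carries the full name as percent-encoded UTF-8.
  const encoded = encodeURIComponent(filename);
  return `attachment; filename="${fallback}"; filename*=UTF-8''${encoded}`;
}

console.log(attachmentHeader('baidu.pdf'));
```

In the controller this would be used as `this.ctx.set('Content-Disposition', attachmentHeader('baidu.pdf'))`.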
- File structure (screenshot not included)
- Start

    npm run dev

- Test
  - Export an image: GET /img
  - Export a PDF: GET /pdf
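When testing the two endpoints from a script rather than a browser, one quick sanity check is to look at the first bytes of the response body. The sketch below is an assumption about how such a check might look; `looksLikePdf` and `looksLikeJpeg` are hypothetical helpers, and they only inspect the file signature, not whether the rest of the file is valid.

```typescript
// True when the bytes start with the PDF header "%PDF-".
function looksLikePdf(bytes: Uint8Array): boolean {
  const header = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  return header.every((b, i) => bytes[i] === b);
}

// True when the bytes start with the JPEG start-of-image marker FF D8 FF.
function looksLikeJpeg(bytes: Uint8Array): boolean {
  return bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff;
}
```

A test script could fetch `/pdf` and `/img`, wrap each response body in a `Uint8Array`, and pass it to the matching function.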
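Problem
Load testing with jmeter showed memory usage blowing up, which caused the service to restart.
Cause: every request creates a new puppeteer instance. Creating a puppeteer instance amounts to launching a Chrome process, which is a very expensive operation.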
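The core of the fix is to launch Chrome once and share it across requests. The sketch below shows that idea in isolation. `once` is a hypothetical helper, and `fakeLaunch` is a stand-in for `puppeteer.launch` so the example runs without Chrome.

```typescript
// Hypothetical helper: wraps a factory so it runs at most once; later calls reuse the result.
function once<T>(factory: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (!cached) {
      cached = { value: factory() };
    }
    return cached.value;
  };
}

// Stand-in for puppeteer.launch(): counts how many "browsers" were started.
let launches = 0;
const fakeLaunch = (): Promise<{ id: number }> => {
  launches += 1;
  return Promise.resolve({ id: launches });
};

// Every request handler would call getBrowser(); only the first call launches.
const getBrowser = once(fakeLaunch);
getBrowser();
getBrowser();
getBrowser();
console.log(launches); // 1
```

Because the cached value is the promise itself, requests that arrive while the first launch is still in progress also share it.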
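Optimization
Use the generic-pool resource pool so that browser resources are reused instead of being created per request.
- Install generic-pool

    npm i generic-pool -S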
- Create puppeteer-pool.ts

    import puppeteer, { Browser, BrowserContext } from 'puppeteer-core';
    import { createPool, Pool } from 'generic-pool';
    import * as findChrome from 'carlo/lib/find_chrome';

    interface IPuppeteerPool {
      max?: number;
      min?: number;
      maxUses?: number;
      testOnBorrow?: boolean;
      autostart?: boolean;
      idleTimeoutMillis?: number;
      evictionRunIntervalMillis?: number;
      puppeteerArgs?: Record<string, unknown>;
      validator?: (instance: BrowserContext) => Promise<boolean>;
    }

    export class PuppeteerPool {
      private static _instance: Promise<PuppeteerPool>;
      private _options: IPuppeteerPool;
      // Use count per pooled browser context
      private _useCounts = new WeakMap<BrowserContext, number>();
      private _browser: Browser;
      private _pool: Pool<BrowserContext>;

      public static getInstance(options: IPuppeteerPool = {}) {
        if (!this._instance) {
          // Cache the promise so concurrent first calls share one initialization
          const instance = new PuppeteerPool(options);
          this._instance = instance.init().then(() => instance);
        }
        return this._instance;
      }

      /**
       * Initialize a Puppeteer pool
       * @param {Object} [options={}] pool configuration
       * @param {Number} [options.max=10] maximum number of instances. If you set it, make sure to clean up the pool on shutdown: pool.drain().then(()=>pool.clear())
       * @param {Number} [options.min=2] minimum number of instances kept alive in the pool
       * @param {Number} [options.maxUses=2048] maximum number of times an instance can be reused before it is replaced. 0 disables the check
       * @param {Boolean} [options.testOnBorrow=true] validate an instance before handing it out
       * @param {Boolean} [options.autostart=false] whether to create instances as soon as the pool is initialized
       * @param {Number} [options.idleTimeoutMillis=3600000] close an instance that has not been used for 60 minutes
       * @param {Number} [options.evictionRunIntervalMillis=180000] check instances for idleness every 3 minutes
       * @param {Object} [options.puppeteerArgs={}] arguments for puppeteer.launch
       * @param {Function} [options.validator=(instance)=>Promise.resolve(true))] custom validation; receives the instance being borrowed
       * @param {Object} [options.otherConfig={}] any remaining options // For all opts, see opts at https://github.com/coopernurse/node-pool#createpool
       */
      constructor(options: IPuppeteerPool = {}) {
        this._options = options;
      }

      public async init() {
        await this._initBrowser();
        this._initPool();
      }

      private async _initBrowser() {
        // Launch a single Chrome that all pooled contexts share
        const findChromePath = await findChrome({});
        const executablePath = findChromePath.executablePath;
        this._browser = await puppeteer.launch({
          args: [
            // Required for Docker version of Puppeteer
            '--no-sandbox',
            '--disable-setuid-sandbox',
            // This will write shared memory files into /tmp instead of /dev/shm,
            // because Docker's default for /dev/shm is 64MB
            '--disable-dev-shm-usage',
          ],
          headless: true,
          executablePath,
        });
      }

      private _initPool() {
        const {
          max = 10,
          min = 2,
          maxUses = 2048,
          testOnBorrow = true,
          autostart = false,
          idleTimeoutMillis = 3600000,
          evictionRunIntervalMillis = 180000,
          validator = (_instance: BrowserContext) => Promise.resolve(true),
          ...otherConfig
        } = this._options;

        const factory = {
          create: async () => {
            // Create an incognito browser context and start its use count at 0.
            // Recent Puppeteer releases renamed this method to createBrowserContext;
            // adjust to the version you have installed.
            const context = await this._browser.createIncognitoBrowserContext();
            this._useCounts.set(context, 0);
            return context;
          },
          destroy: async (instance: BrowserContext) => {
            await instance.close();
          },
          validate: async (instance: BrowserContext) => {
            // Run the custom validator and check the use count.
            // Returning false marks the instance as unusable.
            const valid = await validator(instance);
            const uses = this._useCounts.get(instance) ?? 0;
            return valid && (maxUses <= 0 || uses < maxUses);
          },
        };
        const config = {
          max,
          min,
          testOnBorrow,
          autostart,
          idleTimeoutMillis,
          evictionRunIntervalMillis,
          ...otherConfig,
        };
        this._pool = createPool(factory, config);
        const genericAcquire = this._pool.acquire.bind(this._pool);
        // Wrap the pool's acquire so each hand-out increments that instance's use count
        this._pool.acquire = () =>
          genericAcquire().then((instance: BrowserContext) => {
            this._useCounts.set(instance, (this._useCounts.get(instance) ?? 0) + 1);
            return instance;
          });
      }

      public async use<T>(fn: (instance: BrowserContext) => Promise<T>): Promise<T> {
        const resource = await this._pool.acquire();
        try {
          return await fn(resource);
        } finally {
          // Release the instance whether the caller succeeded or failed
          await this._pool.release(resource);
        }
      }

      get pool(): Pool<BrowserContext> {
        return this._pool;
      }
    }
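The `maxUses` option is meant to retire an instance once it has been handed out a certain number of times. For that to work as described, the count has to be kept per pooled instance. The sketch below shows that bookkeeping on its own, without generic-pool or Puppeteer. `UseTracker` is a hypothetical class, and plain objects stand in for browser contexts.

```typescript
// Hypothetical tracker: counts uses per pooled object and reports when one should be retired.
class UseTracker<T extends object> {
  private readonly counts = new WeakMap<T, number>();
  private readonly maxUses: number;

  constructor(maxUses: number) {
    this.maxUses = maxUses;
  }

  // Call whenever an instance is handed out.
  markUsed(instance: T): void {
    this.counts.set(instance, this.usesOf(instance) + 1);
  }

  usesOf(instance: T): number {
    return this.counts.get(instance) ?? 0;
  }

  // Mirrors the pool's validate step: maxUses <= 0 disables the check.
  isUsable(instance: T): boolean {
    return this.maxUses <= 0 || this.usesOf(instance) < this.maxUses;
  }
}

const tracker = new UseTracker<{ name: string }>(2);
const contextA = { name: 'a' };
const contextB = { name: 'b' };
tracker.markUsed(contextA);
tracker.markUsed(contextA);
tracker.markUsed(contextB);
console.log(tracker.isUsable(contextA), tracker.isUsable(contextB)); // false true
```

A `WeakMap` is used so that a count disappears together with its instance once the pool destroys it.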
- Modify puppeteer.service.ts

    import { Provide } from '@midwayjs/core';
    import { IImageOptions } from '../interface';
    import { PuppeteerPool } from '../util/puppeteer-pool';

    @Provide()
    export class PuppeteerService {
      async getImage(data: IImageOptions) {
        const pool = await PuppeteerPool.getInstance();
        return pool.use(async instance => {
          const page = await instance.newPage();
          try {
            if (data.cookies) {
              await page.setCookie(...data.cookies);
            }
            await page.goto(data.url);
            return await page.screenshot({ fullPage: true, type: 'jpeg' });
          } finally {
            // Close the page even on failure so the pooled context stays clean
            await page.close();
          }
        });
      }

      async getPdf(data: IImageOptions) {
        const pool = await PuppeteerPool.getInstance();
        return pool.use(async instance => {
          const page = await instance.newPage();
          try {
            if (data.cookies) {
              await page.setCookie(...data.cookies);
            }
            await page.goto(data.url);
            return await page.pdf({
              printBackground: true,
              margin: {
                top: 20,
                bottom: 20,
              },
            });
          } finally {
            // Close the page even on failure so the pooled context stays clean
            await page.close();
          }
        });
      }
    }
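One detail worth keeping in mind: the service calls `page.setCookie` before `page.goto`, so the page is still blank at that point. As far as I can tell, a cookie that has no `domain` then has nothing to be associated with and may be rejected. The sketch below fills in a missing domain from the target URL. `withDefaultDomain` and `ICookie` are hypothetical names, and whether you need this depends on your Puppeteer version and on the cookies you pass.

```typescript
interface ICookie {
  name: string;
  value: string;
  path?: string;
  domain?: string;
}

// Hypothetical helper: gives every cookie without a domain the host name of the target URL.
function withDefaultDomain(cookies: ICookie[], targetUrl: string): ICookie[] {
  const host = new URL(targetUrl).hostname;
  return cookies.map(cookie => ({ ...cookie, domain: cookie.domain ?? host }));
}

const prepared = withDefaultDomain(
  [{ name: 'token', value: 'abc' }],
  'https://www.baidu.com/s?wd=puppeteer'
);
console.log(prepared[0].domain); // www.baidu.com
```

In the service this would be applied just before the `setCookie` call, for example `page.setCookie(...withDefaultDomain(data.cookies, data.url))`.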
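Deployment
Docker is used for deployment here. Before deploying, download the Chrome browser installer.
After downloading, place the file in the project at build/google-chrome-stable_current_amd64.deb, which is the path the Dockerfile copies it from.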
- In the project root, create the Docker configuration file Dockerfile

    FROM node:18 AS build
    RUN npm config set https-proxy http://192.168.56.1:7890
    WORKDIR /app
    COPY ./package.json /app
    COPY ./package-lock.json /app
    COPY ./tsconfig.json /app
    COPY ./.editorconfig /app
    COPY ./.eslintrc.json /app
    COPY ./.prettierrc.js /app
    COPY ./bootstrap.js /app
    COPY ./src /app/src
    RUN npm install
    RUN npm run build

    FROM node:18 AS chrome-stable
    WORKDIR /app
    COPY ./build/google-chrome-stable_current_amd64.deb /app/google-chrome-stable_current_amd64.deb
    RUN apt-get update && apt-get install -y fonts-liberation libasound2 fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-khmeros fonts-kacst fonts-freefont-ttf \
        libatk-bridge2.0-0 libgtk-3-0 libnspr4 libnss3 libx11-xcb1 libxss1 libxtst6 lsb-release xdg-utils libu2f-udev libvulkan1
    RUN dpkg -i /app/google-chrome-stable_current_amd64.deb
    RUN rm -rf /var/lib/apt/lists/*

    FROM chrome-stable
    RUN npm config set https-proxy http://192.168.56.1:7890
    WORKDIR /app
    COPY --from=build /app/dist ./dist
    # Copy the source as well, so that error stack traces point at the right lines
    COPY --from=build /app/src ./src
    COPY --from=build /app/bootstrap.js ./
    COPY --from=build /app/package.json ./
    COPY --from=build /app/package-lock.json ./
    ENV TZ="Asia/Shanghai"
    RUN npm install --production
    # Update this if the port changes
    EXPOSE 7001
    CMD ["npm", "run", "start"]
- Create the docker-compose configuration file docker-compose/docker-compose.yml

    version: '3'
    services:
      export-service:
        build:
          context: ..
          dockerfile: Dockerfile
        image: export-service:latest
        container_name: export-service
        restart: always
        ports:
          - 7001:7001
- Start command

    cd docker-compose
    sudo docker-compose up -d

Other commands

    # Build the image without using the cache
    sudo docker-compose build --no-cache
GitHub:
If you found this helpful, please follow the link below and give the project a star on GitHub. Thank you!