How to generate an HTML table and a PDF with Node and Google Puppeteer

Understanding how NodeJS works internally can be a little daunting (I know it was for me once). Node is a very powerful runtime and it can do a lot of things.

Today I want to uncover the power of Node's built-in utility module called fs (file system).

As per the fs docs:

The fs module provides an API for interacting with the file system in a manner closely modeled around standard POSIX functions.

Which is just a fancy way of saying that the file system module is Node's way of interacting with files for both read and write operations.

Now, the file system module is a humongous utility in NodeJS with a lot of fancy features. In this article, however, I will only discuss three:

  • Getting file information: fs.statSync
  • Deleting a file: fs.unlinkSync
  • Writing data to a file: fs.writeFileSync

Another thing we will cover in this article is Google Puppeteer, a really cool, slick tool created by some awesome folks at Google.

So what is Puppeteer? As per the docs:

Puppeteer is a Node library which provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol. It can also be configured to use full (non-headless) Chrome or Chromium.

So it's basically a tool that lets you do all the cool browser-related things on the server, like taking screenshots of websites, crawling websites, and generating pre-rendered content for single-page applications. You can even do form submissions via your NodeJS server.

Again, Puppeteer is a huge tool, so we will cover just one small but very cool feature of it. We'll look at how to generate a nice PDF file based on our generated HTML table file. In the process we'll learn about puppeteer.launch() and understand a bit about browser.newPage() and page.pdf().

So to again give a brief overview, things we will cover:

  • Generating stub data (for invoices) using an online tool.
  • Creating an HTML table with a little bit of styling and the generated data in it, using an automated Node script.
  • Learning how to check whether a file exists using fs.statSync
  • Learning how to delete a file using fs.unlinkSync
  • Learning how to write a file using fs.writeFileSync
  • Creating a PDF file of the generated HTML file using Google Puppeteer
  • Making them into npm scripts, to be used later

Also, before we begin, here is the entire source code of the tutorial for everyone to follow along. You don't have to write anything, but writing the code along with this tutorial will prove more useful and you'll understand more. SOURCE CODE OF TUTORIAL

Before we begin, please ensure that you have at least the following installed on your machine:

  • Node version 8.11.2
  • Node Package Manager (NPM) version 6.9.0

You don't need to, but you can also watch an introductory video (my first ever) that talks about the basics of reading, writing, and deleting a file in NodeJS. This will help you understand this tutorial. (Please do give me feedback.)

Let's get started

Step 1:

In your terminal type in the following:

npm init -y

This will initialize an empty project for you.

Step 2:

Second, in the same folder, create a new file called data.json and put some mocked data in it. You can use the following JSON sample.

You can get the mocked JSON stub data from here. To generate this data I used an awesome online data generator called Mockaroo (https://mockaroo.com/).

The JSON data I am going with has a structure like this:

[
  {},
  {},
  {
   "invoiceId": 1,
   "createdDate": "3/27/2018",
   "dueDate": "5/24/2019",
   "address": "28058 Hazelcrest Center",
   "companyName": "Eayo",
   "invoiceName": "Carbonated Water - Peach",
   "price": 376
  },
  {
   "invoiceId": 2,
   "createdDate": "6/14/2018",
   "dueDate": "11/14/2018",
   "address": "6205 Shopko Court",
   "companyName": "Ozu",
   "invoiceName": "Pasta - Fusili Tri - Coloured",
   "price": 285
  },
  {},
  {}
]

You can download the complete JSON array for this tutorial from here.

Step 3:

Next create a new file called buildPaths.js

const path = require('path');
const buildPaths = {
   buildPathHtml: path.resolve('./build.html'),
   buildPathPdf: path.resolve('./build.pdf')
};
module.exports = buildPaths;

path.resolve takes a relative path and returns the absolute path of that file, resolved against the current working directory.

So path.resolve('./build.html') will, for example, return something like this:

$ C:\\Users\\Adeel\\Desktop\\articles\\tutorial\\build.html
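
If you want to see this for yourself, here is a minimal sketch. It only uses Node's built-in path module; the variable names are mine and are not part of the tutorial's code.

```javascript
const path = require('path');

// path.resolve() builds an absolute path, using the current working
// directory as the base when the input is relative.
const relativePath = './build.html';
const absolutePath = path.resolve(relativePath);

console.log(path.isAbsolute(relativePath)); // false
console.log(path.isAbsolute(absolutePath)); // true
console.log(absolutePath); // depends on the folder you run the script from
```

Because the result depends on the current working directory, run your scripts from the project folder so that build.html ends up next to them.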

Step 4:

In the same folder create a file called createTable.js and add the following code:

I know that is a lot of code, but let’s divide it into chunks and start understanding it piece by piece.

Go to line 106 (github gist)

In our try/catch block we first check whether the HTML build file already exists. This is the path of the file where our NodeJS script will generate our HTML.

if (doesFileExist(buildPathHtml)) {} calls the doesFileExist() method, which simply returns true or false. For this we use

fs.statSync(filePath);

This method returns information about the file, such as its size, when it was created, and so on. However, if we give it a path to a file that does not exist, it throws an error. We use this to our benefit and wrap the fs.statSync() call in a try/catch. If Node can successfully read the file's stats in our try block, we return true; otherwise it throws an error, which we catch in our catch block, and we return false.
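
The idea can be sketched roughly like this. Treat it as my approximation of doesFileExist() rather than the exact code from the gist; the two paths passed in at the bottom are only there for demonstration.

```javascript
const fs = require('fs');

// Returns true when fs.statSync() can read the file's metadata,
// and false when it throws (for example because the file is missing).
const doesFileExist = (filePath) => {
  try {
    fs.statSync(filePath);
    return true;
  } catch (error) {
    return false;
  }
};

// process.execPath is the Node binary itself, so it is guaranteed to exist.
const foundExisting = doesFileExist(process.execPath);
// A made-up file name that should not exist in the current folder.
const foundMissing = doesFileExist('./no-such-file-for-demo.html');

console.log(foundExisting, foundMissing); // true false
```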

If the file exists in the system we end up deleting the file using

fs.unlinkSync(filePath); // takes in a file path & deletes it
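
Here is a small sketch of that in action. The scratch file in the temp folder is something I made up so that the example does not delete anything from your project.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file, created only so there is something to delete.
const scratchFile = path.join(os.tmpdir(), `unlink-demo-${process.pid}.html`);
fs.writeFileSync(scratchFile, '<p>temporary</p>');

const existedBefore = fs.existsSync(scratchFile);
fs.unlinkSync(scratchFile); // removes the file; throws if the path does not exist
const existsAfter = fs.existsSync(scratchFile);

console.log(existedBefore, existsAfter); // true false
```

Note that unlinkSync throws when the file is missing, which is one reason to check for existence first.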

After deleting the file, we need to generate rows to put in the table.

Step 5:

So first we import data.json, which we do on line 3, and then on line 115 we iterate over each item using map(). You can read more about Array.prototype.map() here.

The map method takes a function, createRow, which receives one object per iteration and returns a string with content like this:

"<tr>
  <td>invoice id</td>
  <td>invoice name</td>
  <td>invoice price</td>
  <td>invoice created date</td>
  <td>invoice due date</td>
  <td>invoice address</td>
  <td>invoice sender company name</td>
</tr>"

const row = data.map(createRow).join('');

The join('') part is important here, because map() returns an array of row strings and I want to concatenate them all into a single string.
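
To make this concrete, here is a sketch of what createRow could look like. This is my guess based on the row shape above, not a copy of the gist, and the data array is inlined (using the two sample invoices from Step 2) instead of being loaded from data.json.

```javascript
// Two sample invoices in the shape shown in Step 2.
const data = [
  {
    invoiceId: 1, createdDate: '3/27/2018', dueDate: '5/24/2019',
    address: '28058 Hazelcrest Center', companyName: 'Eayo',
    invoiceName: 'Carbonated Water - Peach', price: 376
  },
  {
    invoiceId: 2, createdDate: '6/14/2018', dueDate: '11/14/2018',
    address: '6205 Shopko Court', companyName: 'Ozu',
    invoiceName: 'Pasta - Fusili Tri - Coloured', price: 285
  }
];

// Hypothetical createRow: turns one invoice object into one <tr> string.
const createRow = (item) => `
  <tr>
    <td>${item.invoiceId}</td>
    <td>${item.invoiceName}</td>
    <td>${item.price}</td>
    <td>${item.createdDate}</td>
    <td>${item.dueDate}</td>
    <td>${item.address}</td>
    <td>${item.companyName}</td>
  </tr>
`;

// map() yields an array of strings; join('') glues them into one string
// without the commas that the default array-to-string conversion would add.
const rows = data.map(createRow).join('');
console.log(rows);
```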

A very similar principle is used to generate the table on line 117 and then the HTML document on line 119.
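
A rough sketch of those two steps is below. The helper names createTable and createHtml, the column headers, and the CSS are all assumptions of mine for illustration; the real gist may differ.

```javascript
// Hypothetical helper: wraps the row strings in a <table> with a header row.
const createTable = (rows) => `
  <table>
    <tr>
      <th>Invoice Id</th>
      <th>Invoice Name</th>
      <th>Price</th>
    </tr>
    ${rows}
  </table>
`;

// Hypothetical helper: wraps the table in a full HTML document with some styling.
const createHtml = (table) => `
  <html>
    <head>
      <style>
        table { width: 100%; border-collapse: collapse; }
        th, td { padding: 8px; border: 1px solid #ccc; text-align: left; }
      </style>
    </head>
    <body>
      ${table}
    </body>
  </html>
`;

// One hand-written row stands in for the output of data.map(...).join('').
const rows = '<tr><td>1</td><td>Carbonated Water - Peach</td><td>376</td></tr>';
const html = createHtml(createTable(rows));
console.log(html);
```

Each helper just returns a template string, so the final html value is an ordinary string that can be written to a file.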

Step 6:

The important part is where we write to our file on line 121:

fs.writeFileSync(buildPathHtml, html);

It takes two parameters, the build path (string) and the HTML content (string), and writes the file. If the file does not exist it is created; if it already exists it is overwritten.

One thing to note here: we might not need the part of Step 4 where we check whether the file exists and delete it if it does. This is because writeFileSync already overwrites an existing file for us. I just added that to the code for learning purposes.
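
You can convince yourself of the overwrite behaviour with a quick sketch like this one. The scratch path in the temp folder is made up so the example stays away from your real build.html.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch path, used only for this demonstration.
const scratchFile = path.join(os.tmpdir(), `write-demo-${process.pid}.html`);

fs.writeFileSync(scratchFile, '<p>first version</p>');  // creates the file
fs.writeFileSync(scratchFile, '<p>second version</p>'); // replaces its contents

const contents = fs.readFileSync(scratchFile, 'utf8');
console.log(contents); // <p>second version</p>

fs.unlinkSync(scratchFile); // clean up the scratch file
```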

Step 7:

In your terminal, go to the folder where you have createTable.js and type

$ node ./createTable.js

As soon as you run this script, it will create a new file called build.html in the same folder. You can open that file in your browser and it will look something like this.

Cool, right? So far so good.

Also you can add an npm script in your package.json like this:

"scripts": {
  "build:table": "node ./createTable.js"
},

This way, instead of writing node ./createTable.js, you can just type npm run build:table.

Next up: generating a PDF from the generated HTML file.

Step 8:

First things first: we need to install a fancy tool, so go to your application folder in your terminal and type

npm install puppeteer

Step 9:

In the same folder where you have the files createTable.js, buildPaths.js and data.json, create a new file called createPdf.js and add content to it like below:

As we did with the createTable.js script, let's break this down into chunks and understand the script step by step.

Let's start with line 40: here we call init(), the method defined on line 30. One thing to focus on is that our init() method is an async method. Read more on this async function.

First, in the init() method we call the printPdf() method, which is again an async method, so we have to await its response. printPdf() returns the PDF data, which we then write to a file on line 33.
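
The flow of init() can be sketched as follows. This is an approximation, not the gist itself: I pass the PDF producer and the output path in as parameters (my own choice) so that the sketch can run with a fake producer and without launching a browser.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical init(): waits for a PDF producer, then writes its result to disk.
// `producePdf` stands in for printPdf() so this sketch needs no browser.
const init = async (producePdf, outputPath) => {
  try {
    const pdf = await producePdf();    // wait for the async producer to finish
    fs.writeFileSync(outputPath, pdf); // then write the returned bytes to a file
    return true;
  } catch (error) {
    console.error('Error generating PDF', error);
    return false;
  }
};

// Usage with a fake producer that resolves to a few bytes instead of a real PDF.
const fakePrintPdf = async () => Buffer.from('%PDF-demo');
const outputPath = path.join(os.tmpdir(), `init-demo-${process.pid}.pdf`);
const done = init(fakePrintPdf, outputPath);
```

In the real script you would hand it printPdf and the buildPathPdf value from buildPaths.js instead of the fake producer and the temp path.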

So what does the printPdf() method do? Let's dig deep into it.

const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(buildPathHtml, { waitUntil: 'networkidle0' });
const pdf = await page.pdf({
  format: 'A4',
  margin: { top: '20px', right: '20px', bottom: '20px', left: '20px' }
});
await browser.close();
return pdf;

We first launch a headless browser instance using puppeteer by doing the following:

await puppeteer.launch(); // this returns a headless browser instance

which we then use to open a web page:

await browser.newPage(); // open a blank page in headless browser

Once we have a blank page open we can navigate to a page. Since our web page lives locally on our system, we simply point the page at its path:

page.goto(buildPathHtml, { waitUntil: 'networkidle0' });

Here waitUntil: 'networkidle0' is important, because it tells Puppeteer to consider navigation finished only once there have been no network connections for at least 500 ms.

Note: This is why we used path.resolve() to get absolute paths, because in order to open the web page with puppeteer, we need an absolute path.
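
One caveat, which is my own observation and not something the tutorial's code does: depending on your platform and Puppeteer version, page.goto() may reject a bare file path and expect a URL instead. If you run into that, a possible workaround is to turn the absolute path into a file:// URL with Node's url.pathToFileURL(), which needs a newer Node release than the 8.11.2 mentioned earlier. The name buildUrlHtml is hypothetical.

```javascript
const path = require('path');
const { pathToFileURL } = require('url');

// Same absolute path that buildPaths.js produces.
const buildPathHtml = path.resolve('./build.html');

// Converts the absolute file path into a file:// URL string.
const buildUrlHtml = pathToFileURL(buildPathHtml).href;

console.log(buildUrlHtml); // starts with file:// and ends with /build.html

// A possible call (not the tutorial's own):
// await page.goto(buildUrlHtml, { waitUntil: 'networkidle0' });
```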

After we have a web page opened in the headless browser on the server, we save that page as a pdf:

await page.pdf({ });

As soon as we have a PDF version of the web page, we need to close the browser instance opened by Puppeteer to free its resources:

await browser.close();

And then we return the PDF data, which we then write to the file.
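
Putting the pieces together, printPdf() could be written along these lines. This is a sketch under a few assumptions of mine: the page address is passed in as a parameter, puppeteer is required inside the function, and browser.close() sits in a finally block so the browser is closed even if navigation or PDF generation fails. The original code closes the browser only on the success path.

```javascript
// Sketch of printPdf() with cleanup in a finally block.
// Assumes `npm install puppeteer` has been run; the require sits inside the
// function so that merely loading this file does not need the package.
const printPdf = async (pageUrl) => {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(pageUrl, { waitUntil: 'networkidle0' });
    return await page.pdf({
      format: 'A4',
      margin: { top: '20px', right: '20px', bottom: '20px', left: '20px' }
    });
  } finally {
    // Runs whether or not the steps above threw, so the browser never lingers.
    await browser.close();
  }
};
```

The trade-off is a slightly longer function in exchange for not leaving a stray Chromium process behind when something goes wrong.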

Step 10:

In your terminal type

$ node ./createPdf.js

Note: Before running the above script, ensure that you have the build.html file generated by the createTable.js script. To make sure we always have build.html before running the createPdf.js script, do the following in your package.json.

"scripts": {
  "build:table": "node ./createTable.js",
  "prebuild:pdf": "npm run build:table",
  "build:pdf": "node ./createPdf.js"
},

Now if you run $ npm run build:pdf, npm will run the prebuild:pdf script first, which executes createTable.js, and then run createPdf.js. You can read more about npm scripts in the official docs.

When you run

$ npm run build:pdf

It will run and create a build.pdf which will look like this:

And that is it, we are done.

You have learned the following:

  • How to check if a file exists / get file information (in Node)
  • How to delete a file in Node
  • How to write to a file
  • How to use Google Puppeteer to generate a PDF file

Happy learning! I would love to hear your thoughts on this article. You can reach me on Twitter as well.

Translated from: https://www.freecodecamp.org/news/how-to-generate-an-html-table-and-a-pdf-with-node-google-puppeteer-32f94d9e39f6/
