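Introduction
- Puppeteer is the official headless Chrome tool from the Google Chrome team
- It is a Node.js library
- It provides a high-level API for controlling headless Chrome over the Chrome DevTools Protocol (CDP)
- It can also be configured to control a regular, non-headless Chrome
- This article mainly collects usage examples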
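Installation
Installing Chrome
Installation is not covered in detail here: just download and install Chrome. To start it in headless mode, see the reference link (`chrome` stands for your Chrome executable, e.g. `google-chrome` on Linux):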
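# --headless: runs Chrome in headless mode
# --disable-gpu: temporarily needed if running on Windows
# --remote-debugging-port=9222: exposes the DevTools protocol on port 9222
# The last argument is the URL to open (defaults to about:blank)
# Comments cannot follow a trailing "\", so they are kept above the command
chrome \
  --headless \
  --disable-gpu \
  --remote-debugging-port=9222 \
  https://www.chromestatus.com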
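The same flags can also be assembled from Node.js. This is a minimal sketch, assuming you want to start Chrome yourself with `child_process.spawn` and attach Puppeteer afterwards; `buildHeadlessArgs` and `startHeadlessChrome` are hypothetical helpers, not part of Puppeteer or Chrome, and `'chrome'` must be replaced by your actual Chrome executable.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: builds the same argument list as the shell command above.
function buildHeadlessArgs({ port = 9222, url = 'about:blank', disableGpu = false } = {}) {
  const args = ['--headless'];
  if (disableGpu) {
    // Temporarily needed if running on Windows
    args.push('--disable-gpu');
  }
  args.push(`--remote-debugging-port=${port}`);
  // The URL to open goes last, as a positional argument
  args.push(url);
  return args;
}

// Hypothetical launcher: 'chrome' is a placeholder for your Chrome executable.
function startHeadlessChrome(options) {
  return spawn('chrome', buildHeadlessArgs(options), { stdio: 'ignore' });
}
```

Once Chrome is listening on port 9222, Puppeteer can attach to it with `puppeteer.connect({ browserURL: 'http://127.0.0.1:9222' })` instead of starting its own browser with `puppeteer.launch()`.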
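Installing Puppeteer
Install it with npm (see the reference link for details). By default, installing puppeteer also downloads a browser build that Puppeteer is tested against.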
$ npm i --save puppeteer
Example code
Page source
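const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // networkidle2: navigation counts as finished once there have been
    // no more than 2 network connections for at least 500 ms
    await page.goto('https://www.toutiao.com', { waitUntil: 'networkidle2' });
    // page.content() returns the full serialized HTML of the page
    const pageContent = await page.content();
    console.log(pageContent);
  } finally {
    // Closing the browser also closes all of its pages
    await browser.close();
  }
})();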
Page screenshot
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // setViewport returns a Promise, so await it before navigating
    await page.setViewport({ width: 1920, height: 2160 });
    await page.goto('https://www.toutiao.com', { waitUntil: 'networkidle2' });
    // Captures the current viewport; add fullPage: true to capture the whole page
    await page.screenshot({ path: 'toutiao.png' });
  } finally {
    await browser.close();
  }
})();
Proxy and authentication
- Reference article
const puppeteer = require('puppeteer');

(async () => {
  // Placeholder proxy address; the credentials are passed separately below
  const proxyServer = 'http://proxy_ip:proxy_port';
  const browser = await puppeteer.launch({
    headless: false,
    args: [`--proxy-server=${proxyServer}`]
  });
  try {
    const page = await browser.newPage();
    // Supplies credentials when the proxy answers with an authentication challenge
    await page.authenticate({ username: 'username', password: 'password' });
    await page.goto('https://www.toutiao.com/', { waitUntil: 'networkidle2' });
    const pageContent = await page.content();
    console.log(pageContent);
  } finally {
    await browser.close();
  }
})();
Setting a proxy
Plain proxy
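const puppeteer = require('puppeteer');

(async () => {
  const proxyServer = 'http://proxy_ip:proxy_port';
  const browser = await puppeteer.launch({
    // headless: false opens a visible window, handy for checking the proxy
    headless: false,
    args: [`--proxy-server=${proxyServer}`]
  });
  const page = await browser.newPage();
  await page.goto('https://www.toutiao.com/', { waitUntil: 'networkidle2' });
  await browser.close();
})();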
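If some hosts should skip the proxy, Chrome also accepts a `--proxy-bypass-list` switch with semicolon-separated patterns. The sketch below builds both switches for `puppeteer.launch({ args })`; `buildProxyArgs` is a hypothetical helper, and `proxy_ip:proxy_port` is the same placeholder as above.

```javascript
// Hypothetical helper: turns a proxy address (and optional bypass patterns)
// into Chrome command-line switches for puppeteer.launch({ args }).
function buildProxyArgs(proxyServer, bypass = []) {
  const args = [`--proxy-server=${proxyServer}`];
  if (bypass.length > 0) {
    // Chrome expects the bypass patterns separated by semicolons
    args.push(`--proxy-bypass-list=${bypass.join(';')}`);
  }
  return args;
}
```

For example, `puppeteer.launch({ headless: false, args: buildProxyArgs('http://proxy_ip:proxy_port', ['localhost', '*.example.com']) })` would route everything except those hosts through the proxy.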
Proxy with username and password
See the "Proxy and authentication" example above.
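In that example the proxy address and the credentials are passed separately: the address goes into `--proxy-server`, the username and password into `page.authenticate()`. If your proxy is given as a single URL such as `http://user:pass@host:port`, it has to be split first. A minimal sketch using Node's built-in `URL` class; `splitProxyUrl` is a hypothetical helper.

```javascript
// Hypothetical helper: splits 'http://user:pass@host:port' into the value for
// --proxy-server and the credentials object for page.authenticate().
function splitProxyUrl(proxyUrl) {
  const url = new URL(proxyUrl);
  const credentials = url.username
    ? {
        // URL keeps user info percent-encoded, so decode it
        username: decodeURIComponent(url.username),
        password: decodeURIComponent(url.password),
      }
    : null;
  // Rebuild the address without the user-info part
  const server = `${url.protocol}//${url.host}`;
  return { server, credentials };
}
```

The result plugs into the example above as `args: [`--proxy-server=${server}`]` and, when `credentials` is not null, `await page.authenticate(credentials)`.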
PAC proxy
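const puppeteer = require('puppeteer');

(async () => {
  // Placeholder: URL of the PAC script
  const pacProxyFile = 'file location';
  const browser = await puppeteer.launch({
    headless: false,
    // Pass only the switch; the executable name does not belong in args
    args: [`--proxy-pac-url=${pacProxyFile}`]
  });
  const page = await browser.newPage();
  // Only needed if the proxies chosen by the PAC script require credentials
  await page.authenticate({ username: 'username', password: 'password' });
  await page.goto('https://www.toutiao.com/', { waitUntil: 'networkidle2' });
  await browser.close();
})();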
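A PAC file is itself a small JavaScript file that defines `FindProxyForURL(url, host)` and returns a proxy string such as `"PROXY host:port"` or `"DIRECT"`. A minimal sketch, assuming requests to toutiao.com should go through the placeholder proxy and everything else should connect directly. Real PAC environments also provide helpers such as `shExpMatch` and `dnsDomainIs`; this sketch avoids them so it can also run under plain Node.js.

```javascript
// Minimal PAC script: the browser calls FindProxyForURL to decide
// how to reach each URL.
function FindProxyForURL(url, host) {
  // Send toutiao.com and its subdomains through the proxy (placeholder address)
  if (host === 'toutiao.com' || host.endsWith('.toutiao.com')) {
    return 'PROXY proxy_ip:proxy_port';
  }
  // Everything else connects without a proxy
  return 'DIRECT';
}
```

The file can then be served, for example over HTTP, and its URL used as `pacProxyFile` in the example above.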
Related links
Chrome DevTools Protocol
Getting started with Headless Chrome
Puppeteer
Puppeteer (Chinese documentation)