puppeteer-core demo on macOS

Shellphon

Last modified 2023-06-07 19:32:47


First published 2023-06-03 23:27:28

Article link: https://blog.csdn.net/dont27/article/details/130963223


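I had not used puppeteer for a long time. I recently read an article describing what looks like a fix for an old annoyance: having to log in by hand on every run when puppeteer visits a system that requires login.

So I revisited it.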

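1. Specify the browser:

Chrome was already installed on my machine and I did not want puppeteer to download a second copy, so I installed only puppeteer-core. In that case, launch must be given an executablePath that points to the Chrome binary. The usual install location is: /Applications/Google Chrome.app/Contents/MacOS/Google Chrome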
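Since a wrong executablePath only shows up as a launch failure, it can help to check the path before launching. The following is a minimal sketch, not part of the original script: pickExecutablePath and CHROME_CANDIDATES are hypothetical names, the existence check is passed in as a function, and the second candidate (a per-user install under the home directory) is an assumption.

```javascript
// Hypothetical helper: return the first candidate path for which `exists`
// reports true, or null when none of them is present.
function pickExecutablePath(candidates, exists) {
  for (const candidate of candidates) {
    if (exists(candidate)) return candidate;
  }
  return null;
}

// Candidate locations. The first is the path given in this article; the
// second is an assumed per-user install location and may not apply to you.
const CHROME_CANDIDATES = [
  '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
  `${process.env.HOME}/Applications/Google Chrome.app/Contents/MacOS/Google Chrome`
];
```

With fs imported from 'node:fs', pickExecutablePath(CHROME_CANDIDATES, fs.existsSync) gives a value to pass as executablePath; a null result is a cue to stop early with a clear message instead of letting launch fail.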

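2. Specify userDataDir

userDataDir names the directory where user data is stored, and this step is the key one. Once userDataDir is set, puppeteer reads the data in that directory. So you run the script once, log in by hand, and later runs no longer need a manual login. If the target system expires sessions, though, the stored login may have lapsed by the next run and you will have to log in by hand again.

Basic usage: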

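// Run as an ES module (for example a .mjs file): top-level await is used below.
import puppeteer from 'puppeteer-core';
import os from 'node:os';
import path from 'node:path';

// Small helper for pausing between steps.
const sleep = (milliseconds) => new Promise(r => setTimeout(r, milliseconds));

const browser = await puppeteer.launch({
  headless: false,
  // Use the Chrome that is already installed; puppeteer-core downloads none.
  executablePath: '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
  // Zero width/height is intended to avoid forcing a fixed viewport size;
  // the documented way to disable the default viewport is `defaultViewport: null`.
  defaultViewport: {
    width: 0,
    height: 0
  },
  // Profile directory that keeps the login state between runs.
  userDataDir: path.join(os.homedir(), '.puppeteer-data')
});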
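For the case where the stored login has expired, the script needs some way to notice that it was sent back to a login page. A minimal sketch, assuming the target system redirects to a path starting with /login (needsLogin and the default loginPath are hypothetical and need adjusting for the real site):

```javascript
// Hypothetical helper: decide whether the current URL is a login page.
// `loginPath` is an assumed path prefix, not something taken from a real site.
function needsLogin(currentUrl, loginPath = '/login') {
  const { pathname } = new URL(currentUrl);
  // Match "/login" itself and anything below it, but not e.g. "/loginhelp".
  return pathname === loginPath || pathname.startsWith(loginPath + '/');
}
```

After page.goto(...) the script could call needsLogin(page.url()) and, when it returns true, pause for a manual login (for example with page.waitForNavigation({ timeout: 0 })) before continuing.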

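3. Page cannot be loaded with headless: true

I was writing an automatic check-in script. With headless: false it ran fully automatically without problems, but as soon as I switched to headless: true it stopped working. Debugging showed that the HTML returned contained only an empty head and body.

I wrote a simple Node server that prints the request headers to test this: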

import http from 'node:http';

const server = http.createServer((req, res) => {
  console.log(`Received ${req.method} request for ${req.url}`);
  console.log(`Headers: ${JSON.stringify(req.headers)}`);

  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.write('Hello, World!');
  res.end();
});

server.listen(4000, () => {
  console.log('Server listening on port 4000');
});

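Visiting it with headless: false and with headless: true showed that the request headers were the cause. With headless: true the server logged: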

Received GET request for /
Headers: {"host":"127.0.0.1:4000","connection":"keep-alive","sec-ch-ua":"\"HeadlessChrome\";v=\"113\", \"Chromium\";v=\"113\", \"Not-A.Brand\";v=\"24\"","sec-ch-ua-mobile":"?0","sec-ch-ua-platform":"\"macOS\"","upgrade-insecure-requests":"1","user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/113.0.5672.126 Safari/537.36","accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7","sec-fetch-site":"none","sec-fetch-mode":"navigate","sec-fetch-user":"?1","sec-fetch-dest":"document","accept-encoding":"gzip, deflate, br"}
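The give-away in the logged headers is the HeadlessChrome token, which appears in both user-agent and sec-ch-ua. How the target site actually detects it is unknown; the following is only a sketch of what such a server-side check might look like, with isHeadlessRequest as a hypothetical name:

```javascript
// Hypothetical check: report whether a request's headers carry the
// "HeadlessChrome" token. `headers` is shaped like Node's req.headers,
// i.e. an object with lower-cased header names.
function isHeadlessRequest(headers) {
  const userAgent = headers['user-agent'] ?? '';
  const clientHints = headers['sec-ch-ua'] ?? '';
  return userAgent.includes('HeadlessChrome') || clientHints.includes('HeadlessChrome');
}
```

Dropped into the test server above, isHeadlessRequest(req.headers) would separate the two runs without reading the full header dump by eye.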

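Visiting the home page of the same site, by contrast, did return data, so my guess is that the target page handles headless browsers specially. Setting the header in the script is therefore enough: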

await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36');
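Instead of hard-coding a user agent for a different OS and an old Chrome version, one option is to derive it from the browser's own string. A minimal sketch, assuming the only change wanted is swapping the HeadlessChrome token for Chrome (toRegularChromeUserAgent is a hypothetical name):

```javascript
// Hypothetical helper: turn a headless user agent string into the one a
// regular Chrome of the same version would send. Strings without the
// "HeadlessChrome/" token are returned unchanged.
function toRegularChromeUserAgent(userAgent) {
  return userAgent.replace('HeadlessChrome/', 'Chrome/');
}
```

In the script this could be used as await page.setUserAgent(toRegularChromeUserAgent(await browser.userAgent())), which keeps the platform and version consistent with the Chrome that is actually running.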
