phantomjs

Latest recommended article published 2021-11-04 16:18:32

starwmx520


Category: NodeJs

Article link: https://blog.csdn.net/starwmx520/article/details/52292017


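A quick search turns up plenty of material on it, and I have not studied it in much depth myself.

All I really know is how to create a page: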

var page = require('webpage').create();

Set its properties:

page.paperSize = {
  width: '1000px',
  height: '700px',
  margin: {
    top: '0px',
    left: '0px'
  }
};

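// url must be defined before this call (section 1 below reads it from system.args)
page.open(url, function(status) {
  //console.log("Status: " + status);
  if (status === "success") {
    //console.log(page.content); // print the page content
    // Run a function inside the page and return its result.
    var txt = page.evaluate(function() {
      return document.querySelector('body').innerHTML;
    });
    console.log(txt);
  } else {
    // Output on failure
    console.log('fail: ' + url);
  }
  phantom.exit();
});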

Those are the simple parts.

But it does have some useful features.

1. system

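It is similar to the arguments you type when running a command with node.

In node, process.argv receives all of the arguments.

node XX.js XX: that is the order of argv (process.argv[0] is the node executable itself, so XX.js is at index 1).

system usage: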

var system = require('system');
// args holds the script being run followed by its arguments;
// it does not include PhantomJS's own options
var args = system.args;
//console.log(args);
var url = args[1];

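When phantomjs is run with options such as --xx, args does not contain them.

xx.js is at index 0. What is this useful for?

You can create a child process with child_process, run the PhantomJS script in it, and use these arguments to pass parameters.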
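
To make the index difference concrete, here is a small sketch that can be run under node. The helper name splitArgs and its runtime parameter are made up for illustration; neither runtime provides them. It only shows where the script name and the user arguments sit in each layout.

```javascript
// Hypothetical helper (not part of node or PhantomJS): normalizes both
// argument layouts into the same { script, params } shape.
function splitArgs(argv, runtime) {
  // node's process.argv:     [node executable, script, ...params]
  // PhantomJS's system.args: [script, ...params]
  var offset = runtime === 'node' ? 1 : 0;
  return {
    script: argv[offset],
    params: argv.slice(offset + 1)
  };
}

// Usage under node (not run here):
//   var info = splitArgs(process.argv, 'node');
//   var url = info.params[0];
```

So the value read as args[1] in PhantomJS corresponds to process.argv[2] in node.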

2. Loading SSL (https) pages

https pages often fail to load unless you add these options when running:

phantomjs --ignore-ssl-errors=true --ssl-protocol=any phantom_giglio.js

Passing the url at run time:

phantomjs --ignore-ssl-errors=true --ssl-protocol=any phantom_child.js 'url'

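It is better to wrap the url in double quotes rather than single quotes (cmd.exe on Windows, for example, does not treat single quotes as quoting characters).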
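
When the same command is launched from node instead of typed into a shell, the options become elements of an array. A minimal sketch follows, assuming a helper named buildPhantomArgs (a made-up name for illustration). By default child_process.spawn hands each element to the program as-is without going through a shell, so the url needs no quotes at all.

```javascript
// Hypothetical helper: builds the argument array for child_process.spawn.
function buildPhantomArgs(script, url) {
  // PhantomJS options go before the script name;
  // the script's own arguments (here the url) come after it.
  return ['--ignore-ssl-errors=true', '--ssl-protocol=any', script, url];
}

// Usage (not run here, requires phantomjs on the PATH):
//   var spawn = require('child_process').spawn;
//   var child = spawn('phantomjs', buildPhantomArgs('phantom_child.js', 'https://example.com/?a=1&b=2'));
```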

3. Calling it through child_process and passing the url to load

var spawn = require('child_process').spawn;
// url is assumed to be set by the calling code (the detail page to load)
var result = url + '\n';
var ls = spawn('phantomjs', ['--ignore-ssl-errors=true', '--ssl-protocol=any', 'phantom_child.js', url]);
ls.stdout.setEncoding('utf8');
// Large outputs arrive in several chunks
var body = '';
ls.stdout.on('data', function(data) {
  body += data;
});
ls.stderr.on('data', function(data) {
  //console.error("error ", url, data);
});
// Register the child's close event; 'close' fires after the stdio streams have ended, so body is complete
ls.on('close', function(code, signal) {
  console.log('phantom_child exit code=' + code);
  // process the data here
});

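The basic idea is that a process calls phantomjs, specifying the -- options, the js file to run, and the url argument that the js needs.

Inside the js, console.log is all you need to send data back.

In the close event, check the content received in body and load it with cheerio.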
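
The collect-then-check step can be sketched as a small function. The name collectOutput and its callback shape are my own assumptions, not something node or PhantomJS provides. It listens for 'close' rather than 'exit' because node emits 'close' only after the child's stdio streams have ended, so the accumulated body is complete at that point. The 'fail:' check matches the failure line that the PhantomJS script above prints.

```javascript
// Hypothetical helper: gathers a child process's stdout and reports once,
// after the child's streams are closed.
function collectOutput(child, done) {
  var body = '';
  child.stdout.setEncoding('utf8');
  // Large outputs arrive in several 'data' chunks, so append each one.
  child.stdout.on('data', function (chunk) {
    body += chunk;
  });
  child.on('close', function (code) {
    if (code !== 0) {
      return done(new Error('phantomjs exited with code ' + code), null);
    }
    if (body.indexOf('fail:') === 0) {
      return done(new Error('page load failed'), null);
    }
    done(null, body);
  });
}

// Usage (not run here, requires phantomjs and the cheerio package):
//   collectOutput(spawn('phantomjs', args), function (err, html) {
//     if (!err) { var $ = require('cheerio').load(html); }
//   });
```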

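Use it for the ajax-driven detail pages; for scraping the list urls, request or jsdom is enough.

The complete version puts the PhantomJS code in a module; the list code passes in a url, and the callback returns the filtered data.

Then save it to a file.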
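
One possible shape for that overall flow is sketched below. This is not the original module, only an illustration: processList and fetchDetail are made-up names, and fetchDetail stands in for whatever function spawns phantomjs for one url and calls back with the filtered data. The urls are handled one at a time so that only a single phantomjs process is running at any moment.

```javascript
// Hypothetical flow: walk a list of urls, fetch each detail page through
// an injected fetchDetail(url, callback) function, and gather the results.
function processList(urls, fetchDetail, done) {
  var results = [];
  function next(i) {
    if (i >= urls.length) {
      return done(results);
    }
    fetchDetail(urls[i], function (err, data) {
      // Keep failures in the result list instead of stopping the whole run.
      results.push({ url: urls[i], error: err || null, data: err ? null : data });
      next(i + 1);
    });
  }
  next(0);
}

// Usage (not run here; 'result.json' is an arbitrary file name):
//   processList(urls, fetchDetail, function (results) {
//     require('fs').writeFileSync('result.json', JSON.stringify(results, null, 2));
//   });
```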