HTML to BVL converter: pull content from an external website generated by JavaScript

Question

I know how to pull HTML content from an external website with PHP and parse it, but the problem is that the content I want to extract is generated by a JavaScript function.

The code looks like this:

getCotizaciones("cotizaciones_busca.dat");

I would like to extract all the content generated by that function.

This is the webpage from where I'm trying to pull the content: http://www.bvl.com.pe/neg_rv_alfa.html#

I tried this, but it's not working:

$html = new DOMDocument();
$html->loadHTMLFile('http://www.bvl.com.pe/neg_rv_alfa.html#');
$xpath = new DOMXPath($html);
$nodelist = $xpath->query('//*[@id="div"]/div[4]');
echo $output = $nodelist->item(0)->nodeValue;
// and this is the output I get: getCotizaciones("cotizaciones_busca.dat");

Answer 1:

Unfortunately, you cannot execute JavaScript code using DOMDocument or any other PHP function that loads external sources, e.g. file_get_contents, cURL, etc. These only fetch the raw markup. To run the script you need a JavaScript engine, or a programming language with a binding to one (e.g. WebKit from C++). PHP doesn't have that support built in.

However, what you can do is look at how the data is generated in a browser and how the browser displays it. I did that for you and found that the grid is generated by making a request to a different URL. The page 'http://www.bvl.com.pe/neg_rv_alfa.html#' calls the JavaScript function getCotizaciones("cotizaciones_busca.dat");, which in turn requests the URL below using Ajax. So instead of loading the page, request this URL directly:

http://www.bvl.com.pe/includes/cotizaciones_busca.dat

This URL returns the data you need, and you can load it via DOM or whatever method you prefer.
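
A minimal sketch of that approach in PHP. It assumes the .dat file returns an HTML fragment containing table rows; the answer does not show the actual format, so treat that as a guess and adjust the XPath to what the server really sends. The helper name extractRows is hypothetical; only the URL comes from the answer.

```php
<?php
// Hypothetical helper: parse an HTML fragment and return each table row
// as an array of trimmed cell texts. Assumes the data is an HTML table.
function extractRows(string $html): array
{
    if (trim($html) === '') {
        return []; // nothing to parse
    }

    $doc = new DOMDocument();
    // Collect libxml warnings silently; fragments are rarely well-formed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('./td|./th', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Usage (needs network access and allow_url_fopen enabled):
// $data = file_get_contents('http://www.bvl.com.pe/includes/cotizaciones_busca.dat');
// if ($data !== false) { print_r(extractRows($data)); }
```

Keeping the parsing in a function that takes a string means it can be checked against a saved copy of the response without hitting the site each time.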

Protip: Use Firebug or the developer console of whatever browser you prefer. Whenever you see an Ajax request, check what it does, where it sends the request, and what the parameters are. Check the source of the JS file where the function is defined and see what it does. In your case that is http://www.bvl.com.pe/js/cabecera_pie.js, and you'll see it makes an Ajax request depending on what the user has clicked. Replicate that request in PHP before loading the result into DOM, etc.
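
One way to replicate this in PHP is to read the function argument out of the raw page source and build the data URL from it. This is only a sketch: the helper findDataUrl and the constant BVL_DATA_BASE are hypothetical names, and the /includes/ prefix is inferred from the single URL above, so it may not hold for other calls in cabecera_pie.js.

```php
<?php
// Assumption: data files live under /includes/, as in the one URL
// given in the answer. Other calls may use a different path.
const BVL_DATA_BASE = 'http://www.bvl.com.pe/includes/';

// Hypothetical helper: find the argument of getCotizaciones("...") in
// raw page source and return the URL the browser would request,
// or null when the call is not present.
function findDataUrl(string $pageSource): ?string
{
    $pattern = '/getCotizaciones\(\s*["\']([^"\']+)["\']\s*\)/';
    if (preg_match($pattern, $pageSource, $m) !== 1) {
        return null;
    }
    return BVL_DATA_BASE . rawurlencode($m[1]);
}

// Usage (needs network access and allow_url_fopen enabled):
// $page = file_get_contents('http://www.bvl.com.pe/neg_rv_alfa.html');
// $url  = $page === false ? null : findDataUrl($page);
```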

Answer 2:

I don't think it's possible to do using only PHP.

But you can run a browser in a separate process, have it load the page and execute the JavaScript, and then grab the results.

That's pretty easy to do using PhantomJS - http://phantomjs.org/.

You will have to prepare a JavaScript file that loads the page, simulates user input if necessary, peeks into the DOM and saves the results somewhere using the PhantomJS file API. Then load the results in PHP. You can start from the examples; take a look at https://github.com/ariya/phantomjs/blob/master/examples/pizza.js
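
The PHP side of this could look roughly like the sketch below. It assumes a PhantomJS script of your own (called scrape.js here, a hypothetical name) that takes the page URL and an output path as command-line arguments and writes the rendered content to that path. The helper names buildPhantomCommand and runPhantom are also made up for illustration, and the phantomjs binary is assumed to be on the PATH.

```php
<?php
// Hypothetical helper: build the command line for a PhantomJS script.
// PhantomJS is invoked as: phantomjs <script.js> [arguments...]
function buildPhantomCommand(string $script, array $args = []): string
{
    $parts = ['phantomjs', escapeshellarg($script)];
    foreach ($args as $arg) {
        // Escape every argument so URLs with special characters are safe.
        $parts[] = escapeshellarg((string) $arg);
    }
    return implode(' ', $parts);
}

// Run the script and return the contents of the file it wrote,
// or null if PhantomJS failed or the file is missing.
function runPhantom(string $script, string $url, string $outFile): ?string
{
    exec(buildPhantomCommand($script, [$url, $outFile]), $output, $exitCode);
    if ($exitCode !== 0 || !is_file($outFile)) {
        return null;
    }
    $contents = file_get_contents($outFile);
    return $contents === false ? null : $contents;
}

// Usage (assumes scrape.js exists and phantomjs is installed):
// $html = runPhantom('scrape.js', 'http://www.bvl.com.pe/neg_rv_alfa.html', '/tmp/bvl.html');
```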

Source: https://stackoverflow.com/questions/13656181/pull-content-from-an-external-website-generated-by-javascript
