Question
I know how to pull HTML content from an external website with PHP and parse it, but the problem is that the content I want to extract is generated by a JavaScript function.
The code looks like this:
getCotizaciones("cotizaciones_busca.dat");
I would like to extract all the content generated by that function.
This is the webpage from where I'm trying to pull the content: http://www.bvl.com.pe/neg_rv_alfa.html#
I tried this, but it's not working:
$html = new DOMDocument();
$html->loadHTMLFile('http://www.bvl.com.pe/neg_rv_alfa.html#');
$xpath = new DOMXPath($html);
$nodelist = $xpath->query('//*[@id="div"]/div[4]');
echo $output = $nodelist->item(0)->nodeValue;
// and this is the output I get: getCotizaciones("cotizaciones_busca.dat");
Answer 1:
Unfortunately, you cannot execute JavaScript code using DOM or any other PHP function that loads external sources (e.g. file_get_contents, cURL, etc.). You need a JavaScript engine, or a programming language with a plugin that can run JavaScript (e.g. WebKit in C++). PHP has no built-in support for that.
However, what you can do is look at how the data is generated in a browser and how it is displayed. I did that for you and found that the grid is populated by a request to a different URL. So instead of loading 'http://www.bvl.com.pe/neg_rv_alfa.html#', which calls the JavaScript function getCotizaciones("cotizaciones_busca.dat");, which in turn requests the URL below via AJAX, you can request that URL directly:
http://www.bvl.com.pe/includes/cotizaciones_busca.dat
This URL returns the data you need, and you can load it via DOM or whatever you prefer.
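A minimal sketch of that idea in PHP. It assumes the endpoint answers with an HTML fragment that contains table rows, which I have not verified, so check the real response and adjust the XPath query. The function names extractTableRows and fetchUrl are made up for this example.

```php
<?php
// Sketch only: assumes the .dat endpoint returns an HTML fragment with <tr> rows.
// Inspect the real response first and adapt the XPath queries if it differs.

/**
 * Download a URL and return the raw response body.
 */
function fetchUrl(string $url): string
{
    $context = stream_context_create(['http' => ['timeout' => 10]]);
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $body;
}

/**
 * Collect the trimmed cell text of every table row in an HTML fragment.
 *
 * @return string[][] one array of cell values per <tr>
 */
function extractTableRows(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input in PHP 8
    }

    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so keep parser warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('./th|./td', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Example usage (performs a real HTTP request):
// $rows = extractTableRows(fetchUrl('http://www.bvl.com.pe/includes/cotizaciones_busca.dat'));
// print_r($rows);
```

If the response turns out to be JSON or plain text instead of HTML, keep fetchUrl and swap the parsing step for json_decode or a string split.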
Protip: Use Firebug or the developer console of whatever browser you prefer. Whenever you see an AJAX request, look at what it does, where it sends the request, and what the parameters are. Check the source of the JS file where the function is defined and see what it does. In your case that is http://www.bvl.com.pe/js/cabecera_pie.js, and you'll see it makes an AJAX request depending on what the user has clicked. Replicate that request in PHP before loading the response into DOM, etc.
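Replicating an AJAX request from PHP could look roughly like the sketch below, using the cURL extension. The query parameters, the Referer value and the X-Requested-With header are placeholders, not something this site is known to require; copy the real values from the request you see in the browser's network panel. buildAjaxUrl and fetchAsAjax are hypothetical helper names.

```php
<?php
// Sketch only: parameters and header values are placeholders. Copy the real
// ones from the request shown in your browser's network panel.

/**
 * Append query parameters to a URL, respecting an existing query string.
 */
function buildAjaxUrl(string $baseUrl, array $params = []): string
{
    if ($params === []) {
        return $baseUrl;
    }
    $separator = strpos($baseUrl, '?') === false ? '?' : '&';
    return $baseUrl . $separator . http_build_query($params, '', '&');
}

/**
 * Request a URL the way a browser-side AJAX call typically would.
 * Requires the cURL extension.
 */
function fetchAsAjax(string $url, string $referer): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 10,
        CURLOPT_REFERER        => $referer,
        // Many JavaScript libraries send this header; some servers check for it.
        CURLOPT_HTTPHEADER     => ['X-Requested-With: XMLHttpRequest'],
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($status >= 400) {
        throw new RuntimeException("Server answered with HTTP $status");
    }
    return $body;
}

// Example usage (performs a real HTTP request):
// $url  = buildAjaxUrl('http://www.bvl.com.pe/includes/cotizaciones_busca.dat');
// $data = fetchAsAjax($url, 'http://www.bvl.com.pe/neg_rv_alfa.html');
```

If the page sends the request as POST, set CURLOPT_POSTFIELDS with the same parameters instead of putting them in the query string.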
Answer 2:
I don't think it's possible to do using only PHP.
But you can run a browser in a separate process, have it load the page and execute the JavaScript, and then grab the results.
That's pretty easy to do using PhantomJS - http://phantomjs.org/.
You will have to prepare a JavaScript file that loads the page, simulates user input if necessary, peeks into the DOM, and saves the results somewhere using the PhantomJS file API; then load the results in PHP. You can start from the examples; take a look at https://github.com/ariya/phantomjs/blob/master/examples/pizza.js
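Here is a rough sketch of driving PhantomJS from PHP. It assumes the phantomjs binary is installed and on the PATH. To keep it short it differs from the description above in one way: the script prints the rendered HTML to stdout and PHP captures it, rather than saving it through the PhantomJS file API. The fixed wait time is a crude stand-in for properly detecting when the AJAX call has finished, and the names RENDER_SCRIPT, buildPhantomCommand and renderWithPhantom are made up for this example.

```php
<?php
// Sketch only: assumes `phantomjs` is installed and on the PATH.

// PhantomJS script: open the page, wait a moment, print the rendered HTML.
const RENDER_SCRIPT = <<<'JS'
var system = require('system');
var page = require('webpage').create();
var url = system.args[1];
var waitMs = parseInt(system.args[2], 10) || 2000;

page.open(url, function (status) {
    if (status !== 'success') {
        phantom.exit(1);
        return;
    }
    // Give the page's own JavaScript time to finish its AJAX calls.
    setTimeout(function () {
        console.log(page.content);
        phantom.exit(0);
    }, waitMs);
});
JS;

/**
 * Build the shell command, escaping every user-supplied argument.
 */
function buildPhantomCommand(string $scriptPath, string $url, int $waitMs = 2000): string
{
    return sprintf(
        'phantomjs %s %s %d',
        escapeshellarg($scriptPath),
        escapeshellarg($url),
        $waitMs
    );
}

/**
 * Render a page in PhantomJS and return the HTML after JavaScript has run.
 */
function renderWithPhantom(string $url, int $waitMs = 2000): string
{
    $scriptPath = tempnam(sys_get_temp_dir(), 'phjs');
    file_put_contents($scriptPath, RENDER_SCRIPT);
    try {
        exec(buildPhantomCommand($scriptPath, $url, $waitMs), $lines, $exitCode);
    } finally {
        unlink($scriptPath); // always remove the temporary script
    }
    if ($exitCode !== 0) {
        throw new RuntimeException("PhantomJS exited with code $exitCode");
    }
    return implode("\n", $lines);
}

// Example usage (needs PhantomJS and network access):
// $html = renderWithPhantom('http://www.bvl.com.pe/neg_rv_alfa.html');
```

The returned HTML can then be passed to DOMDocument and DOMXPath just like a static page.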
Source: https://stackoverflow.com/questions/13656181/pull-content-from-an-external-website-generated-by-javascript