webpage ajax,web scraping - How do you scrape AJAX pages? - Stack Overflow

Overview:

All screen scraping first requires manual review of the page you want to extract resources from. When dealing with AJAX you usually just need to analyze a bit more than just simply the HTML.

When dealing with AJAX this just means that the value you want is not in the initial HTML document that you requested, but that javascript will be exectued which asks the server for the extra information you want.

You can therefore usually simply analyze the javascript and see which request the javascript makes and just call this URL instead from the start.

Example:

Take this as an example, assume the page you want to scrape from has the following script:

function ajaxFunction()

{

var xmlHttp;

try

{

// Firefox, Opera 8.0+, Safari

xmlHttp=new XMLHttpRequest();

}

catch (e)

{

// Internet Explorer

try

{

xmlHttp=new ActiveXObject("Msxml2.XMLHTTP");

}

catch (e)

{

try

{

xmlHttp=new ActiveXObject("Microsoft.XMLHTTP");

}

catch (e)

{

alert("Your browser does not support AJAX!");

return false;

}

}

}

xmlHttp.onreadystatechange=function()

{

if(xmlHttp.readyState==4)

{

document.myForm.time.value=xmlHttp.responseText;

}

}

xmlHttp.open("GET","time.asp",true);

xmlHttp.send(null);

}

Then all you need to do is instead do an HTTP request to time.asp of the same server instead. Example from w3schools.

Advanced scraping with C++:

For complex usage, and if you're using C++ you could also consider using the firefox javascript engine SpiderMonkey to execute the javascript on a page.

Advanced scraping with Java:

For complex usage, and if you're using Java you could also consider using the firefox javascript engine for Java Rhino

Advanced scraping with .NET:

For complex usage, and if you're using .Net you could also consider using the Microsoft.vsa assembly. Recently replaced with ICodeCompiler/CodeDOM.

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值