<HTML>
<HEAD>
<TITLE> *** Place Title Here *** </TITLE>
<script language=vbscript>
' Convert the raw response bytes to a string. Any byte >= &H80 is treated
' as the first byte of a two-byte character.
Function bytes2BSTR(arrBytes)
    strReturn = ""
    arrBytes = CStr(arrBytes)
    For i = 1 To LenB(arrBytes)
        ThisCharCode = AscB(MidB(arrBytes, i, 1))
        If ThisCharCode < &H80 Then
            strReturn = strReturn & Chr(ThisCharCode)
        Else
            NextCharCode = AscB(MidB(arrBytes, i + 1, 1))
            strReturn = strReturn & Chr(CLng(ThisCharCode) * &H100 + CInt(NextCharCode))
            i = i + 1
        End If
    Next
    bytes2BSTR = strReturn
End Function
'Dim objXMLHTTP, xml
'Set xml = CreateObject("Microsoft.XMLHTTP")
'xml.Open "GET", "http://www.chinadaily.com.cn/language_tips/index.html", False
'xml.Send
'document.Write bytes2BSTR(xml.responseBody)
'Set xml = Nothing
</script>
<SCRIPT LANGUAGE = JavaScript>
function getHtml()
{
    var objXMLHTTP = new ActiveXObject("Microsoft.XMLHTTP"); // create the XMLHTTP object
    objXMLHTTP.open("GET", "http://www.chinadaily.com.cn/language_tips/index.html", false);
    objXMLHTTP.send(null);
    if (objXMLHTTP.readyState == 4 && objXMLHTTP.status == 200)
    {
        var HTML = objXMLHTTP.responseBody; // raw bytes of the page
        var txt = bytes2BSTR(HTML); // decode the bytes so Chinese text is not garbled
        var regx = /<span class="ywzi">([^<a].+?)<\/span>/g;
        var t = regx.exec(txt); // extract the content we want
        if (t) // exec returns null when the page has no matching span
        {
            var t1 = t[0].split('.');
            document.write(t1[0] + ".<br>" + t1[1]);
        }
    }
}
</SCRIPT>
</HEAD>
<BODY BGCOLOR="white">
<SCRIPT LANGUAGE = JavaScript>
getHtml();
</SCRIPT>
</BODY>
</HTML>
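A so-called "web page thief" program is really just a scraper for part of a page's content: it uses the XMLHTTP component to request a page from another site, then filters the returned content to get the information it needs, such as news content or site user information. The JavaScript above retrieves the "sentence of the day" from the Chinadaily Language Tips (英语点津) page. Save the code above as an .html file and open it in a browser. Because the script relies on ActiveXObject and VBScript, that browser needs to be Internet Explorer.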
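The filtering step can be tried on its own, without the network request. The following is a minimal sketch, not the page's actual structure: the function name `extractSentences` and the sample markup are made up for illustration, and the only thing taken from the script is the assumption that the wanted text sits inside `<span class="ywzi">` elements.

```javascript
// Hypothetical helper: collects the text of every <span class="ywzi">
// element found in an HTML string.
function extractSentences(html) {
    // [\s\S]+? matches lazily across line breaks up to the first closing tag.
    var pattern = /<span class="ywzi">([\s\S]+?)<\/span>/g;
    var results = [];
    var match;
    // With the g flag, each exec call resumes after the previous match.
    while ((match = pattern.exec(html)) !== null) {
        // Strip leading and trailing whitespace from the captured text.
        results.push(match[1].replace(/^\s+|\s+$/g, ""));
    }
    return results;
}

// Sample markup invented for this example.
var sample = '<div><span class="ywzi"> Practice makes perfect. 熟能生巧。 </span></div>';
var sentences = extractSentences(sample);
```

Looping over `exec` returns every match rather than only the first, and an empty array for a page with no matching span, so the caller never has to index into `null`. A regular expression like this is fragile against nested tags or changed attributes, so it should be treated as a quick filter rather than a real HTML parser.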