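README: a plugin that pushes news you are interested in to a specified email address. Test environment: LAMP + Chrome/Firefox. It is implemented in the following steps: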
1. Fetch the source code of the target website:
Implementation: PHP's cURL extension.
Installation on Ubuntu:
#sudo apt-get install curl libcurl3 libcurl3-dev php5-curl
Then restart the Apache service:
#sudo /etc/init.d/apache2 restart
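To confirm that the installation worked, a small check like the one below can be used. It is a minimal sketch: the function name CurlStatus() is hypothetical, and it relies only on PHP's built-in extension_loaded() and curl_version(). The PHP CLI and Apache's mod_php may read different php.ini files, so running it as a page served by Apache shows what the web server actually sees (on the command line, `php -m` lists the modules loaded by the CLI).

```php
<?php
// Hypothetical helper: report whether PHP's cURL extension is loaded.
// Run it through Apache to check the web server's PHP configuration,
// since the CLI may use a different php.ini.
function CurlStatus()
{
    if (!extension_loaded('curl')) {
        return 'cURL extension not loaded';
    }
    // curl_version() returns an array with keys such as 'version',
    // 'ssl_version' and 'protocols'.
    $info = curl_version();
    return 'cURL enabled, libcurl ' . $info['version'];
}

echo CurlStatus(), PHP_EOL;
```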
function GetHtmlCode($url){
    $ch = curl_init();                               // initialize a cURL handle
    curl_setopt($ch, CURLOPT_URL, $url);             // the page to fetch
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the response as a string instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);    // connection timeout, in seconds
    $HtmlCode = curl_exec($ch);                      // perform the request; false on failure
    curl_close($ch);                                 // release the handle
    return $HtmlCode;
}
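GetHtmlCode() returns false on a transport error and happily returns the body of a 404 or 500 error page, so failures are hard to tell apart from real content. A minimal sketch of a stricter variant is shown below; the name GetHtmlCodeChecked(), the default 10-second timeout and the choice to follow redirects are assumptions for illustration, not part of the original plugin. It uses only standard cURL calls (curl_setopt_array(), curl_error(), curl_getinfo() with CURLINFO_HTTP_CODE).

```php
<?php
// Hypothetical stricter variant of GetHtmlCode(): returns the page body,
// or false if the request fails or the server answers with an HTTP error.
function GetHtmlCodeChecked($url, $timeout = 10)
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,          // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,          // follow redirects (assumption: wanted for news sites)
        CURLOPT_CONNECTTIMEOUT => $timeout,      // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeout * 3,  // seconds allowed for the whole transfer
    ));

    $body = curl_exec($ch);
    if ($body === false) {
        // Transport-level failure: DNS lookup, refused connection, timeout, ...
        error_log("cURL error for $url: " . curl_error($ch));
        curl_close($ch);
        return false;
    }

    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($status >= 400) {
        // The server answered, but with an error status (404, 500, ...)
        error_log("HTTP $status for $url");
        return false;
    }
    return $body;
}
```

Usage: `$html = GetHtmlCodeChecked($url);` followed by a check for `false` before the page is passed on to the link-extraction step.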
2. Extract all links from the page source with regular expressions:
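For comparison with the original GetAllLink(), here is a minimal sketch of the same step using PCRE's preg_match_all(). It assumes the links of interest appear as href attributes of `<a>` tags and keeps only absolute http/https URLs (relative links would first need to be resolved against the page URL). The name ExtractHrefLinks() is hypothetical and not part of this plugin.

```php
<?php
// Hypothetical helper: collect unique absolute http(s) links from the
// href attributes of <a> tags, using preg_match_all().
function ExtractHrefLinks($html)
{
    // Group 1 captures the quote character, group 2 the href value;
    // \1 requires the value to end with the same quote it started with.
    $pattern = '/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is';
    preg_match_all($pattern, $html, $matches);

    $links = array();
    foreach ($matches[2] as $href) {
        // Decode entities such as &amp; that appear inside attribute values
        $href = trim(html_entity_decode($href));
        // Keep only absolute http/https URLs
        if (preg_match('#^https?://#i', $href)) {
            $links[] = $href;
        }
    }
    // Drop duplicates and reindex the array from 0
    return array_values(array_unique($links));
}
```

A regex over raw HTML is fragile (comments, scripts, unquoted attributes); PHP's DOMDocument is a more robust alternative if the extension is available.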
// Parameter: $string is the page source returned by GetHtmlCode($url)
function GetAllLink($string) {
    // Remove line breaks so that a link split across lines can still be matched
    $string = str_replace(array("\r", "\n"), "", $string);
    $regex['url'] = "((http|https|ftp|telnet|news):\/\/)?([a-z0-9_\-\/\.]+\.[][a-z0-9:;&#@=_~%\?\/\.\,\+\-]+)";
    $regex['email'] = "([a-z0-9_\-]+)@([a-z0-9_\-]+\.[a-z0-9\-\._\-]+