PHP: Imitating a Search-Engine Spider to Fetch and Analyze Page Content

路口华丽的转身

Category: PHP Technology

Article link: https://blog.csdn.net/yangbbenyang/article/details/38417089

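This script imitates what Baidu or Google sees when it crawls your page: it fetches a URL, then prints the page's title, keywords, description, and body text. The code follows.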

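<?php
header("Content-Type: text/html; charset=gbk");

// URL submitted through the form field "message"; accept only http(s) URLs
// so that local files cannot be read through file_get_contents().
$message = isset($_POST['message']) ? trim($_POST['message']) : '';
if (!preg_match('#^https?://#i', $message)) {
    exit('Please submit an http:// or https:// URL.');
}

// First attempt: a plain request (warnings suppressed).
$contents = @file_get_contents($message);

// Retry with cURL and a browser User-Agent when the request failed
// or the server answered with a bare "Forbidden" body.
if ($contents === false || trim($contents) == "Forbidden") {
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $message);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)");
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $contents = curl_exec($ch);
    curl_close($ch);
}
$contents = (string) $contents; // curl_exec() returns false on failure

// Extract the <title>, every <meta> tag, and the <body>.
preg_match_all('/<title(.*?)<\/title>/is', $contents, $title);
preg_match_all('/<meta(.*?)>/is', $contents, $meta);
preg_match_all('/<body(.*?)<\/body>/is', $contents, $body);

// Pick the keywords and description out of the <meta> tags.
$keywords = $description = '';
foreach ($meta[0] as $tag) {
    if (preg_match('/keywords/i', $tag) && preg_match('/content="(.*?)"/is', $tag, $m)) {
        $keywords = $m[1];
    }
    if (preg_match('/description/i', $tag) && preg_match('/content="(.*?)"/is', $tag, $m)) {
        $description = $m[1];
    }
}

// Print the plain-text result; missing parts are printed as empty strings.
echo 'title: ' . strip_tags(isset($title[0][0]) ? $title[0][0] : '') . ' ';
echo 'keywords: ' . strip_tags($keywords) . ' ';
echo 'description: ' . strip_tags($description) . ' ';
echo 'body: ' . strip_tags(isset($body[0][0]) ? $body[0][0] : '');
?>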
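
The listing above identifies itself as MSIE 6.0, so a site that serves different content to crawlers would still return its browser version. Below is a minimal sketch of sending a crawler-style User-Agent instead. The helper names spider_curl_options and fetch_as_spider are hypothetical and not part of the original script. The User-Agent strings are the ones Google and Baidu have published for their crawlers, and they may change, so treat them as assumptions to verify.

```php
<?php
// Hypothetical helper: builds cURL options that present a crawler User-Agent.
function spider_curl_options($url, $spider = 'google', $timeout = 5)
{
    // Published crawler User-Agent strings (assumed current; verify before relying on them).
    $agents = array(
        'google' => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
        'baidu'  => 'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)',
    );
    if (!isset($agents[$spider])) {
        throw new InvalidArgumentException("Unknown spider: $spider");
    }
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects, as a crawler would
        CURLOPT_USERAGENT      => $agents[$spider],
        CURLOPT_CONNECTTIMEOUT => $timeout,
    );
}

// Hypothetical helper: fetches a page as the given spider.
// Returns the response body, or false on failure.
function fetch_as_spider($url, $spider = 'google')
{
    $ch = curl_init();
    curl_setopt_array($ch, spider_curl_options($url, $spider));
    return curl_exec($ch);
}
```

Calling fetch_as_spider($message, 'baidu') in place of the cURL block would request the page with the Baiduspider User-Agent. Some sites verify crawlers by IP address, so the response may still differ from what the real spider receives.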

For more details, see: http://www.111cn.net/phper/18/67a3af30619696432294fd5c2731f13f.htm
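
The regular expressions in the script only recognize a content attribute written with double quotes, and they treat any meta tag that mentions "keywords" anywhere as the keywords tag. As an alternative, here is a rough sketch that reads the same fields with PHP's DOMDocument. The function name extract_page_info and the sample HTML are made up for illustration. Note that loadHTML() assumes ISO-8859-1 when the page declares no charset, so a GBK page may need converting first.

```php
<?php
// Hypothetical helper: extracts title, meta keywords and meta description with DOMDocument.
function extract_page_info($html)
{
    $info = array('title' => '', 'keywords' => '', 'description' => '');
    if ($html === '') {
        return $info; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Text of the first <title> element, if any.
    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $info['title'] = trim($titles->item(0)->textContent);
    }

    // Match <meta> tags by their name attribute, case-insensitively.
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        if ($name === 'keywords' || $name === 'description') {
            $info[$name] = trim($meta->getAttribute('content'));
        }
    }
    return $info;
}

// Sample input with single-quoted and reordered attributes.
$sample = "<html><head><title>Demo page</title>"
        . "<meta content='php, spider' name='Keywords'>"
        . "<meta name=\"description\" content=\"A small test page\">"
        . "</head><body><p>Hello</p></body></html>";
print_r(extract_page_info($sample));
```

Because the parser works on attributes rather than raw text, quote style and attribute order no longer matter, at the cost of requiring the DOM extension.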
