Scraping website links with PHP, scraping a website, extracting links, scraping links with PHP and XPath

Find a website's links by recursively crawling it to a given depth.
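As a rough sketch of the depth-limited idea, independent of the listing that follows, the crawl can be written as a breadth-first loop that stops after a fixed number of levels. The names `crawlToDepth`, `$fetchLinks`, and `$fakeSite` are hypothetical, and the link graph is made up so that the example runs without any network access. It is an iterative variant, not necessarily the exact traversal the article's code performs.

```php
<?php
// Hypothetical helper: crawl breadth-first from $start and follow links
// for at most $maxDepth levels. $fetchLinks is any callable that returns
// the list of links found on a given page.
function crawlToDepth(string $start, int $maxDepth, callable $fetchLinks): array
{
    $visited  = [$start => 0]; // URL => depth at which it was first seen
    $frontier = [$start];      // pages to expand at the current level

    for ($depth = 1; $depth <= $maxDepth; $depth++) {
        $next = [];
        foreach ($frontier as $page) {
            foreach ($fetchLinks($page) as $link) {
                if (!isset($visited[$link])) {
                    $visited[$link] = $depth;
                    $next[] = $link;
                }
            }
        }
        $frontier = $next;
    }

    return $visited;
}

// Stand-in for real HTTP fetching: a fixed, made-up link graph.
$fakeSite = [
    'https://example.com/'  => ['https://example.com/a', 'https://example.com/b'],
    'https://example.com/a' => ['https://example.com/c', 'https://example.com/'],
    'https://example.com/b' => [],
    'https://example.com/c' => ['https://example.com/d'],
];
$fetch = function (string $url) use ($fakeSite): array {
    return $fakeSite[$url] ?? [];
};

print_r(crawlToDepth('https://example.com/', 2, $fetch));
```

Tracking visited URLs keeps the crawl from requesting the same page twice when pages link back to each other, and the depth limit bounds how many levels are expanded.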

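<?php

$depth = 1; // how many levels below the start page to follow

print_r(getList($depth));

// Thin wrapper: returns every link found up to $depth levels below the start page.
function getList($depth)
{
    return getDepth($depth);
}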

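// Fetches $request_url, extracts the links on that page, and sorts them
// into "clean" (HTTP 200) and "broken" (anything else).
function getUrl($request_url)
{
    $UrlLists = array("clean" => array(), "broken" => array());

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $request_url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response body instead of printing it
    $html = curl_exec($ch);
    curl_close($ch);

    if ($html === false) {
        return $UrlLists; // the page itself could not be fetched
    }

    // Assumed pattern (the original is truncated): captures the href value of each <a> tag.
    $regex = '|<a.*?href="(.*?)"|';
    preg_match_all($regex, $html, $parts);

    foreach ($parts[1] as $link) {
        // href values are HTML-escaped in the page source, so decode them before requesting.
        $url = html_entity_decode($link);
        if (getFlag($url)) {
            $UrlLists["clean"][] = $url;
        } else {
            $UrlLists["broken"][] = "broken->" . $url;
        }
    }

    return $UrlLists;
}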

// Depth 0: collect the links found on the start page itself, without duplicates.
function ZeroDepth($list)
{
    $found = getUrl($list);

    $lists = array();
    $lists[0][0]["clean"] = array_unique($found["clean"]);
    $lists[0][0]["broken"] = array_unique($found["broken"]);

    return $lists;
}

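// Crawls $depth levels below the start page.
// $lists[$level][$n] holds the clean and broken links found on the $n-th page fetched at that level.
function getDepth($depth)
{
    // $list = OW_URL_HOME;
    $list = "https://example.com"; // enter the URL of the website to crawl

    $lists = ZeroDepth($list);

    for ($i = 1; $i <= $depth; $i++) {
        if (empty($lists[$i - 1])) {
            break; // nothing was found at the previous level
        }
        $depthArray = 1;

        // Follow every clean link found on any page of the previous level.
        foreach ($lists[$i - 1] as $page) {
            foreach ($page["clean"] ?? array() as $depthUrl) {
                $lists[$i][$depthArray] = getUrl($depthUrl);
                $lists[$i][$depthArray]["request_url"] = $depthUrl;
                $depthArray++;
            }
        }
    }

    return $lists;
}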

// Returns true when a HEAD request to $url is answered with HTTP 200.
function getFlag($url)
{
    $curl = curl_init();
    curl_setopt_array($curl, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_URL            => $url,
        CURLOPT_NOBODY         => true, // HEAD request: only the status code is needed
        CURLOPT_TIMEOUT        => 60,
    ));

    curl_exec($curl);
    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl); // close the handle before returning

    return $status == 200;
}

?>
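The title mentions XPath, while the listing above extracts links with a regular expression. As a minimal sketch of the XPath route, assuming PHP's bundled DOM extension is available, the links can be pulled out with `DOMDocument` and `DOMXPath`. The function name `extractLinksWithXPath` and the sample markup are hypothetical and only serve as illustration.

```php
<?php
// Hypothetical helper: return the href of every <a> element in an HTML string.
function extractLinksWithXPath(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    // Select only anchors that actually carry an href attribute.
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }

    // Drop duplicates and reindex from 0.
    return array_values(array_unique($links));
}

$sample = '<html><body>
  <a href="https://example.com/a?x=1&amp;y=2">A</a>
  <a href="/b">B</a>
  <a name="no-href">skip</a>
  <a href="/b">B again</a>
</body></html>';

print_r(extractLinksWithXPath($sample));
```

Because the DOM parser decodes HTML entities in attribute values, the returned URLs can be passed to cURL as they are. Relative links such as `/b` would still need to be resolved against the page URL before being requested.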
