Scraping All Province and City Information from Baidu with PHP: Batch-Fetching Baidu Search Results | 甄选网

富川福利

Published 2021-03-09 19:45:43

Article tags: scraping all province and city information from Baidu with PHP

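A PHP command-line script that batch-fetches result URLs from Baidu search.

Usage: php.exe 1.php "keyword" "number of pages to fetch"

Results are appended to baidu.txt in the same directory. If the file does not exist, create it manually (file_put_contents normally creates it on its own when the directory is writable, so this is only a fallback).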
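As a rough sketch of how the "number of pages" argument maps onto requests: the script's URL carries a `pn` parameter that it forms by appending `0` to the page index, which suggests an offset of 10 results per page. The helper below is hypothetical (`build_search_url` is not part of the original script), and it assumes that Baidu accepts a URL with only the `wd`, `pn`, and `ie` parameters, which the original does not confirm.

```php
<?php
// Hypothetical helper, not part of the original script.
// Builds a Baidu search URL for a zero-based page index, keeping only the
// wd (keyword), pn (result offset) and ie (encoding) parameters.
function build_search_url(string $keyword, int $page): string
{
    $query = http_build_query([
        'wd' => $keyword,
        'pn' => $page * 10, // assumed 10 results per page, so page index 2 starts at offset 20
        'ie' => 'utf-8',
    ], '', '&');
    return 'http://www.baidu.com/s?' . $query;
}

// Print the URLs for the first three pages of a sample keyword.
for ($p = 0; $p < 3; $p++) {
    echo build_search_url('php curl', $p), "\n";
}
```

Using `http_build_query` URL-encodes the keyword in the same way as the script's `urlencode` call, so the two approaches should produce equivalent `wd` values.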

PHP

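<?php
// Usage: php.exe 1.php "keyword" "number of pages to fetch"
error_reporting(0);

$keyword = isset($argv[1]) ? $argv[1] : '';
$zpage = isset($argv[2]) ? (int) $argv[2] : 0;

if ((!$keyword) or (!$zpage)) {
    die('Require keyword and page');
}

$keyword = urlencode($keyword);

for ($p = 0; $p < $zpage; $p++) {
    // Appending '0' to $p yields pn = 0, 10, 20, ... (the result offset for each page).
    $url = 'http://www.baidu.com/s?wd='.$keyword.'&pn='.$p.'0&oq=1&tn=baiduhome_pg&ie=utf-8&rsv_idx=2&rsv_pq=8292c42600001067&rsv_t=5e14MUzgVAGXxjHEqvWPfyBfPeJioaXg83h6Bm5Nlfi4ScTL4Qg1IKNLNtIEbbmFKHyl&f=8&rsv_bp=1&rsv_spt=1';

    $txt = file_get_contents($url);
    if ($txt === false) {
        continue; // request failed, move on to the next page
    }

    // Each result carries data-tools='{"title":"...","url":"..."}'; capture the JSON part.
    // The pattern depends on Baidu's result-page markup and may need updating.
    preg_match_all('/data-tools=\'(\{"title":".+?","url":".+?"\})\'/i', $txt, $matches);

    for ($i = 0; $i < count($matches[1]); $i++) {
        $data = json_decode($matches[1][$i], true);
        if (!is_array($data) || empty($data['url'])) {
            continue;
        }

        // Request only the headers of the Baidu redirect link.
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $data['url']);
        curl_setopt($ch, CURLOPT_HEADER, true);
        curl_setopt($ch, CURLOPT_NOBODY, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_AUTOREFERER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);
        curl_setopt($ch, CURLOPT_HTTPHEADER, array(
            'Accept: */*',
            'User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)',
            'Connection: Keep-Alive'));
        $header = curl_exec($ch);
        curl_close($ch);

        // The Location header holds the real target URL behind the redirect.
        $header = explode("\n", (string) $header);
        $num = find_header_line($header, 'Location:');
        $link = '';
        if ($num !== null) {
            $parts = explode(':', $header[$num], 2);
            $link = trim($parts[1]);
        }

        // Uncomment to skip results that still point to baidu.com.
        //if(stristr($link,'baidu.com')){
        //continue;
        //}

        file_put_contents('baidu.txt', $data['title']."\r\n".$link."\r\n", FILE_APPEND);
        $a = $i + 1;
        echo $a.' Complete!'."\n";
    }

    $b = $p + 1;
    echo 'Page '.$b.' Complete!'."\n";
}

echo 'Complete!'."\n";

// Return the index of the first element containing $word (case-insensitive), or null.
// Renamed from array_find, which collides with the built-in array_find() in PHP 8.4+.
function find_header_line($array, $word) {
    foreach ($array as $num => $key) {
        if (stripos($key, $word) !== false) {
            return $num;
        }
    }
    return null;
}

?>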
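The two parsing steps in the script (pulling the `data-tools` JSON out of the page and reading the `Location` header from the redirect response) can be tried without any network access. The following is a minimal sketch under stated assumptions: `extract_results` and `extract_location` are hypothetical helper names, and the sample HTML and header strings are made up to resemble the input the script expects, not captured from Baidu.

```php
<?php
// Hypothetical helpers; the names are illustrative and not part of the original script.

// Pull every data-tools='{...}' attribute out of an HTML string and decode it
// into a list of ['title' => ..., 'url' => ...] entries.
function extract_results(string $html): array
{
    $results = [];
    if (preg_match_all('/data-tools=\'(\{.+?\})\'/i', $html, $matches)) {
        foreach ($matches[1] as $json) {
            $data = json_decode($json, true);
            // Keep only entries that decoded cleanly and carry both fields.
            if (is_array($data) && isset($data['title'], $data['url'])) {
                $results[] = $data;
            }
        }
    }
    return $results;
}

// Return the value of the first Location header in a raw header block, or null.
function extract_location(string $rawHeaders): ?string
{
    foreach (preg_split('/\r?\n/', $rawHeaders) as $line) {
        // Header names are case-insensitive, so match without regard to case.
        if (stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }
    return null;
}

// Made-up sample input, shaped like what the script expects.
$html = "<div data-tools='{\"title\":\"Example\",\"url\":\"http://www.baidu.com/link?url=abc\"}'></div>";
$headers = "HTTP/1.1 302 Found\r\nLocation: http://example.com/page\r\nContent-Length: 0\r\n\r\n";

print_r(extract_results($html));
var_dump(extract_location($headers));
```

Keeping the parsing in small pure functions like these makes it easier to adjust the pattern if Baidu's markup changes, since the logic can be checked against saved HTML instead of live requests.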
