Crawling All Province and City Information from Baidu with PHP, Batch-Scraping Baidu Search Results with PHP | 甄选网

Use a PHP command-line script to batch-scrape the URLs in Baidu search results.

Usage: php.exe 1.php "keyword" "number of pages to scrape"

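Results are appended to baidu.txt in the same directory. If the file does not exist, create it manually (file_put_contents with FILE_APPEND normally creates it, so this is only a precaution).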
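The script appends each result to baidu.txt as two lines, the title and then the resolved link, each terminated by \r\n. A minimal sketch of reading those pairs back, assuming that two-line layout holds for the whole file; read_results is a hypothetical helper name, not part of the original script:

```php
<?php
// Hypothetical helper (not in the original script): reads the
// title/link pairs back out of a results file such as baidu.txt.
function read_results(string $path): array
{
    if (!is_file($path)) {
        return [];
    }
    // Drop trailing whitespace so the final "\r\n" does not yield an empty record.
    $content = rtrim((string) file_get_contents($path));
    if ($content === '') {
        return [];
    }
    // \R matches any line ending (\r\n, \n or \r).
    $lines = preg_split('/\R/', $content);
    $results = [];
    // Assumption: every record is exactly two lines, title first, link second.
    foreach (array_chunk($lines, 2) as $pair) {
        $results[] = ['title' => $pair[0], 'link' => $pair[1] ?? ''];
    }
    return $results;
}

// Prints nothing if baidu.txt is not present in the current directory.
foreach (read_results('baidu.txt') as $row) {
    echo $row['title'], ' => ', $row['link'], "\n";
}
```

A title that itself contains a line break would throw the pairing off, so treat this as a convenience for quick inspection rather than a robust parser.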

PHP

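<?php

error_reporting(0);

// Command-line arguments: the search keyword and the number of result pages to fetch.
$keyword = $argv[1] ?? '';
$zpage = (int) ($argv[2] ?? 0);

if ($keyword === '' || $zpage < 1) {
    die('Require keyword and page');
}

$keyword = urlencode($keyword);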

for ($p = 0; $p < $zpage; $p++) {
    // Baidu pages by result offset: appending "0" to the page index gives pn=00, 10, 20, ...
    $url = 'http://www.baidu.com/s?wd='.$keyword.'&pn='.$p.'0&oq=1&tn=baiduhome_pg&ie=utf-8&rsv_idx=2&rsv_pq=8292c42600001067&rsv_t=5e14MUzgVAGXxjHEqvWPfyBfPeJioaXg83h6Bm5Nlfi4ScTL4Qg1IKNLNtIEbbmFKHyl&f=8&rsv_bp=1&rsv_spt=1';
    $txt = file_get_contents($url);

    // Each result carries a data-tools='{"title":"...","url":"..."}' attribute;
    // capture the JSON object between the single quotes.
    preg_match_all('/data-tools=\'(\{"title":".+?","url":".+?"\})\'/i', (string) $txt, $matches);

    for ($i = 0; $i < count($matches[1]); $i++) {
        $data = json_decode($matches[1][$i], true);
        if (!is_array($data) || !isset($data['title'], $data['url'])) {
            continue;
        }

        // Request only the headers of the Baidu redirect link to learn the real target URL.
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $data['url']);
        curl_setopt($ch, CURLOPT_HEADER, true);
        curl_setopt($ch, CURLOPT_NOBODY, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_AUTOREFERER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);
        curl_setopt($ch, CURLOPT_HTTPHEADER, array(
            'Accept: */*',
            'User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)',
            'Connection: Keep-Alive'));
        $header = curl_exec($ch);
        curl_close($ch);

        // Take the value of the Location header; the link stays empty if there is none.
        $header = explode("\n", (string) $header);
        $num = find_line($header, 'Location:');
        $link = '';
        if ($num !== null) {
            $parts = explode(':', $header[$num], 2);
            $link = trim($parts[1]);
        }

        //if(stristr($link,'baidu.com')){
        //continue;
        //}

        file_put_contents('baidu.txt', $data['title']."\r\n".$link."\r\n", FILE_APPEND);
        $a = $i + 1;
        echo $a.' Complete!'."\n";
    }

    $b = $p + 1;
    echo 'Page '.$b.' Complete!'."\n";
}

echo 'Complete!'."\n";

// Returns the index of the first element of $array that contains $word
// (case-insensitive), or null if there is none. Not named array_find
// because PHP 8.4+ has a built-in function of that name.
function find_line($array, $word) {
    foreach ($array as $num => $key) {
        if (stripos($key, $word) !== false) {
            return $num;
        }
    }
    return null;
}

?>
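The extraction step relies on each search result carrying a data-tools attribute whose value is a small JSON object with "title" and "url" keys. The sketch below isolates that step so it can be tried without a network request. The helper name extract_results and the sample markup are assumptions made up for illustration; real Baidu pages may differ or may no longer include this attribute.

```php
<?php
// Hypothetical helper: pulls the title/url pairs out of data-tools attributes.
function extract_results(string $html): array
{
    // Capture the JSON object between the single quotes; the non-greedy
    // quantifiers keep each match inside a single attribute.
    $pattern = '/data-tools=\'(\{"title":".+?","url":".+?"\})\'/i';
    preg_match_all($pattern, $html, $matches);

    $results = [];
    foreach ($matches[1] as $json) {
        $data = json_decode($json, true);
        // Skip anything that is not valid JSON with both expected keys.
        if (is_array($data) && isset($data['title'], $data['url'])) {
            $results[] = $data;
        }
    }
    return $results;
}

// Made-up sample markup in the shape the pattern expects (an assumption).
$sample = '<div data-tools=\'{"title":"Example A","url":"http://www.baidu.com/link?url=aaa"}\'></div>'
        . '<div data-tools=\'{"title":"Example B","url":"http://www.baidu.com/link?url=bbb"}\'></div>';

foreach (extract_results($sample) as $row) {
    echo $row['title'], ' ', $row['url'], "\n";
}
```

Decoding the captured group directly avoids stripping the attribute name and quotes with str_replace before calling json_decode.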
