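I was scraping pages from the Sina website and let the crawler run continuously for two days over the weekend. It got noticed, and my IP was banned.
So I had to look into fetching through a proxy instead, and it worked.
Here is the approach: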
private $proxy_host='119.70.40.101';
private $proxy_port='8080';
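// Process the fetched content: strip the header block
private function trimHeader($content){
    // split() was removed in PHP 7, so use explode() on the blank line
    // ("\r\n\r\n") that separates the headers from the body
    $array = explode("\r\n\r\n", $content, 2);
    return isset($array[1]) ? trim($array[1]) : '';
}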
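The header-stripping step can also be tried on its own, outside the class. Below is a minimal sketch, assuming PHP 7.1 or later; the function name `splitHttpResponse` and the sample response string are made up for illustration and are not part of the original code.

```php
<?php
// Hypothetical helper (not from the original post): split a raw HTTP
// response into its header block and its body.
function splitHttpResponse(string $raw): array
{
    // The headers end at the first blank line, i.e. the first CRLF CRLF.
    // The limit of 2 keeps any later blank lines inside the body.
    $parts = explode("\r\n\r\n", $raw, 2);
    $header = $parts[0];
    // If no blank line was found, treat the body as empty.
    $body = $parts[1] ?? '';
    return [$header, $body];
}

// Made-up sample response, for illustration only.
$raw = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>";
[$header, $body] = splitHttpResponse($raw);
echo $body, "\n"; // prints "<html>hello</html>"
```

Returning both parts rather than only the body keeps the status line available, which helps when the proxy answers with an error page instead of the requested content.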
// Fetch through the proxy
private function http_fopen($url){
    // Connect to the proxy, not to the target site
    $conn_host = $this->proxy_host;
    $conn_port = $this->proxy_port;
    // A proxy request carries the absolute URL in the request line
    $abs_url = $url;
    $query = "GET $abs_url HTTP/1.0\r\n".
        "User-agent:PHP/class http 0.1\r\n".
        "\r\n";
    $fp = fsockopen($conn_host,$conn_port);
    if(!$fp){
        return false;
    }else{
        fputs($fp,$
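
The listing above breaks off partway through. As a rough, self-contained sketch of the same idea (open a socket to the proxy, send a GET with the absolute URL, read the reply, drop the headers), something like the following could work, assuming PHP 7 or later. The function names `buildProxyRequest` and `fetchViaProxy`, the `Host` and `Connection` headers, and the timeout are my own additions and are not taken from the original code.

```php
<?php
// Hypothetical helper: build the request text sent to a forward HTTP proxy.
// The request line carries the absolute URL, which is how the proxy
// learns which site to contact.
function buildProxyRequest(string $url, string $userAgent = 'PHP/class http 0.1'): string
{
    $host = parse_url($url, PHP_URL_HOST);
    return "GET $url HTTP/1.0\r\n"
         . "Host: $host\r\n"              // assumption: not in the original request
         . "User-Agent: $userAgent\r\n"
         . "Connection: close\r\n"        // assumption: ask for the socket to be closed
         . "\r\n";
}

// Hypothetical fetch routine: returns the response body as a string,
// or false if the proxy cannot be reached or nothing can be read.
function fetchViaProxy(string $url, string $proxyHost, int $proxyPort, float $timeout = 10.0)
{
    $fp = @fsockopen($proxyHost, $proxyPort, $errno, $errstr, $timeout);
    if (!$fp) {
        return false;
    }
    stream_set_timeout($fp, (int) $timeout);
    fwrite($fp, buildProxyRequest($url));

    // Read until the proxy closes the connection or the timeout expires.
    $response = stream_get_contents($fp);
    fclose($fp);
    if ($response === false) {
        return false;
    }

    // Drop the header block: everything up to the first blank line.
    $parts = explode("\r\n\r\n", $response, 2);
    return $parts[1] ?? '';
}

// Usage, with the proxy address from the post (it may no longer be reachable):
// $html = fetchViaProxy('http://www.sina.com.cn/', '119.70.40.101', 8080);
```

Keeping the request builder separate from the socket code makes it possible to check the request text without touching the network.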