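Beginners tend to crawl every site they can find, and sooner or later they notice that some sites, CSDN for example, answer with a 403 error. The reason is that a request sent by a Java program does not look quite like one sent by a browser (the User-Agent header, for one, gives it away). All we have to do is disguise the program as a browser.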
import java.io.IOException;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;

public class PostMethodTest {

    public static void main(String[] args) {
        test();
    }

    public static void test() {
        HttpClient httpClient = new HttpClient();
        PostMethod postMethod = new PostMethod("http://blog.csdn.net/hjgzj");
        try {
            // Send a browser-like User-Agent instead of the library default,
            // which some sites reject with 403.
            httpClient.getParams().setParameter(HttpMethodParams.USER_AGENT,
                    "Mozilla/5.0 (X11; U; Linux i686; zh-CN; rv:1.9.1.2) Gecko/20090803");
            int statusCode = httpClient.executeMethod(postMethod);
            System.out.println(statusCode);
            System.out.println(postMethod.getResponseBodyAsString());
        } catch (HttpException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Always release the connection once the response has been read.
            postMethod.releaseConnection();
        }
    }
}
So compared with the earlier code, we added just one line: httpClient.getParams().setParameter(HttpMethodParams.USER_AGENT, "Mozilla/5.0 (X11; U; Linux i686; zh-CN; rv:1.9.1.2) Gecko/20090803"); and that is enough to get past the 403.
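For readers who do not have Commons HttpClient on the classpath, here is a rough sketch of the same idea using only the JDK's HttpURLConnection. The class name UserAgentSketch and the helper open are made up for this illustration, and it sends a GET rather than a POST. Treat it as one possible way to set the header, not a guaranteed cure for every 403.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper class, named only for this illustration.
public class UserAgentSketch {

    // Same browser-like User-Agent string as in the example above.
    public static final String BROWSER_UA =
            "Mozilla/5.0 (X11; U; Linux i686; zh-CN; rv:1.9.1.2) Gecko/20090803";

    // Prepares (but does not yet send) a GET request carrying the given User-Agent.
    public static HttpURLConnection open(String url, String userAgent) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            conn.setRequestProperty("User-Agent", userAgent);
            return conn;
        } catch (IOException e) {
            // Wrap the checked exception to keep the call site short.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = open("http://blog.csdn.net/hjgzj", BROWSER_UA);
        // No network traffic so far; this only shows the header that will be sent.
        System.out.println(conn.getRequestProperty("User-Agent"));
        // Calling conn.getResponseCode() here would actually send the request.
    }
}
```

To check the real status code, call getResponseCode() on the connection returned by open; the request goes out at that point with the header already in place.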
If you run into other sites that still return 403, feel free to leave a comment and I will look into it.