Target site to crawl: https://you.ctrip.com/members/htlask/qa
As shown in the screenshot below:
Implementation steps:
1. Add the HtmlUnit dependency to the pom file
<dependency>
<groupId>net.sourceforge.htmlunit</groupId>
<artifactId>htmlunit</artifactId>
<version>2.32</version>
</dependency>
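Before moving on, it can help to confirm that Maven actually resolved the dependency. The following is a minimal sketch, not part of the original setup: the class name HtmlUnitClasspathCheck and the method isOnClasspath are hypothetical names chosen for illustration. It relies on the fact that HtmlUnit 2.x ships WebClient in the com.gargoylesoftware.htmlunit package.

```java
/**
 * Hypothetical helper (not from the original post): checks whether a class
 * can be found on the current classpath without initializing it.
 */
final class HtmlUnitClasspathCheck {

    // Fully qualified name of WebClient in HtmlUnit 2.x
    static final String WEB_CLIENT_CLASS = "com.gargoylesoftware.htmlunit.WebClient";

    /** Returns true if the named class can be loaded, false otherwise. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, HtmlUnitClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("HtmlUnit on classpath: " + isOnClasspath(WEB_CLIENT_CLASS));
    }
}
```

If this reports false, the usual suspects are a dependency that was never downloaded or a pom change that the IDE has not yet re-imported.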
2. Define the client and configure its options
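WebClient webClient = new WebClient();
// Accept any SSL certificate, including self-signed or invalid ones (insecure; use with care)
webClient.getOptions().setUseInsecureSSL(true);
// Enable JavaScript execution
webClient.getOptions().setJavaScriptEnabled(true);
// Disable CSS processing, which is not needed for data extraction
webClient.getOptions().setCssEnabled(false);
// Do not throw an exception when the server returns a failing HTTP status code
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
// Re-synchronize AJAX calls where possible so that asynchronously loaded content is present in the page
webClient.setAjaxController(new NicelyResynchronizingAjaxController());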
webClient.waitForB