HtmlUnit fails to retrieve page content after login

lms008001

Tags: web crawler, data

Article link: https://blog.csdn.net/lms008001/article/details/78055121

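I have recently been learning to scrape data with a web crawler. With HtmlUnit I can fetch data that a page loads dynamically. Some sites, however, require a login first, and retrieving the data behind the login is where things go wrong. The attempt below fills in the Tianya login form, clicks the login button, then prints the resulting page and the cookies.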

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlButton;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlPasswordInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;
import com.gargoylesoftware.htmlunit.util.Cookie;

public static void TianyaTestByHtmlUnit() {
    // try-with-resources closes the client even if an exception is thrown
    try (WebClient webClient = new WebClient(BrowserVersion.FIREFOX_45)) {
        webClient.getOptions().setUseInsecureSSL(true);
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setCssEnabled(false);
        webClient.getOptions().setRedirectEnabled(true);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
        // A ScriptException is raised when the page's JavaScript contains errors.
        // Most browsers tolerate such errors, but HtmlUnit is stricter,
        // so do not abort on them.
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());
        webClient.setJavaScriptTimeout(20000);
        webClient.getCookieManager().setCookiesEnabled(true);

        // Load the login page
        HtmlPage page = webClient.getPage("http://passport.tianya.cn/login.jsp");
        System.out.println("Original page data = " + page.asXml());

        // Fill in the credentials (masked here)
        HtmlTextInput username = (HtmlTextInput) page.getElementById("userName");
        username.type("lms_test_****");
        HtmlPasswordInput password = (HtmlPasswordInput) page.getElementById("password");
        password.click();
        password.type("liu*****");

        // Let the page's scripts settle, then submit the form
        // HtmlAnchor submit = page.getAnchorByName("loginBtn");
        HtmlButton submit = (HtmlButton) page.getElementById("loginBtn");
        webClient.waitForBackgroundJavaScript(4000);
        HtmlPage nextPage = (HtmlPage) submit.click();

        // Wait for JavaScript to load the data
        webClient.waitForBackgroundJavaScript(10000);
        Thread.sleep(20000);
        System.out.println("After clicking the login button = " + nextPage.asXml());

        // Collect the cookies held by the client after the login attempt
        Set<Cookie> cookies = webClient.getCookieManager().getCookies();
        Map<String, String> responseCookies = new HashMap<String, String>();
        for (Cookie c : cookies) {
            responseCookies.put(c.getName(), c.getValue());
            System.out.println("cookie name: " + c.getName() + " value: " + c.getValue());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
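
The method above waits a fixed 20 seconds with Thread.sleep before reading the page. One possible alternative is to poll for a sign that the login has finished and stop as soon as it appears. The following is only a sketch using the Java standard library: the class LoginWait and the method waitUntil are hypothetical names of my own, not part of HtmlUnit, and the condition you pass in is an assumption about what "logged in" means for the target site.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of HtmlUnit): polls a condition until it holds or a timeout expires. */
final class LoginWait {

    private LoginWait() {
    }

    /**
     * Checks the condition immediately, then every pollMillis, until it is true
     * or timeoutMillis has elapsed. Returns true if the condition was met,
     * false on timeout or if the waiting thread was interrupted.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition for the demo; a real check would inspect the page or the cookies
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 200, 2_000, 50);
        System.out.println("condition met: " + ready);
    }
}
```

In the login method, the condition could for example check whether webClient.getCookieManager().getCookies() already contains the cookie that the site sets on a successful login. The post does not say what that cookie is called, so it has to be found by inspecting a real login in a browser.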