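HtmlUnit is a headless (GUI-less) browser implemented in Java. It can be used for automated testing and web scraping.
It executes JavaScript automatically and supports CSS, which makes it convenient to use.
Below is an auto-login example. It routes traffic through a Fiddler proxy for debugging: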
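import java.io.IOException;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlDivision;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlInput;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

// The class name is arbitrary
public class LoginExample {
    public static void main(String[] args) throws IOException {
        String loginUrl = "http://mmcloud.com/login/sso/login";
        // 127.0.0.1:8888 is the local Fiddler proxy
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME, "127.0.0.1", 8888)) {
            HtmlPage page = webClient.getPage(loginUrl);
            // Wait up to 60 s for background JavaScript; the call only has an effect once a page is loaded
            webClient.waitForBackgroundJavaScript(60000);
            // Get the small icon in the top-right corner; the XPath compares the whole class attribute, so every class must be listed
            HtmlElement div = page.getFirstByXPath("//div[@class='changeMethod changeMethod1']");
            if (!(div instanceof HtmlDivision)) {
                System.out.println("div[@class='changeMethod changeMethod1'] not found");
                return;
            }
            // Click the icon to switch to the username/password login view
            HtmlPage page1 = div.click();
            // Find the login form
            HtmlForm form = page1.getHtmlElementById("fm1");
            HtmlInput name = form.getInputByName("view_username");
            HtmlInput password = form.getInputByName("password");
            HtmlInput submit = form.getInputByName("submit_btn");
            // Type the username and password
            name.type("usernamexxxx");
            password.type("passwordyyyyy");
            // Click the "Login" button
            HtmlPage page2 = submit.click();
            // Print the result so the outcome of the login is visible
            System.out.println(page2.getTitleText());
            System.out.println(page2.getWebResponse().getStatusCode());
        }
    }
}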
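The remark about listing every class comes from how XPath 1.0 compares attributes: `@class='...'` tests the whole attribute string, not individual class names. The sketch below illustrates this with the JDK's built-in XPath engine on a hypothetical one-line document standing in for the login page. The class `XPathClassMatch`, the sample markup, and the helper `hasClass` are assumptions made for illustration and are not part of HtmlUnit.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathClassMatch {
    // Hypothetical markup standing in for the icon on the login page.
    static final String SAMPLE =
        "<html><body><div class='changeMethod changeMethod1'>icon</div></body></html>";

    // Counts the nodes in SAMPLE that match the given XPath 1.0 expression.
    static int count(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(SAMPLE)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    // Builds a predicate that matches a single class token, whatever other classes are present.
    static String hasClass(String token) {
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + token + " ')";
    }

    public static void main(String[] args) {
        System.out.println(count("//div[@class='changeMethod']"));               // 0: partial class list
        System.out.println(count("//div[@class='changeMethod changeMethod1']")); // 1: full attribute value
        System.out.println(count("//div[" + hasClass("changeMethod") + "]"));    // 1: token match
    }
}
```

Since the predicate is plain XPath 1.0, the same string should also work in `page.getFirstByXPath(...)`, which would make the lookup less brittle if the page adds or reorders classes.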
The pom file needs the following dependency:
<dependency>
    <groupId>net.sourceforge.htmlunit</groupId>
    <artifactId>htmlunit</artifactId>
    <version>2.55.0</version>
</dependency>
The console reports some CSS errors, which need no special handling, for example:
Dec 01, 2021 2:27:12 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://mmcloud.com/statics/js/layer1.9.1/skin/layer.css' [7:1134] Error in declaration. '*' is not allowed as first char of a property.
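These warnings are harmless, but if they drown out other output they can be muted. The timestamp format of the message suggests it is emitted through java.util.logging, so one option is to turn off the logger named in the first line of the warning. This is a minimal sketch under that assumption; the class `QuietCssLog` and the method `silenceCssWarnings` are hypothetical names, and a different logging backend on the classpath would need its own configuration instead.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class QuietCssLog {
    // Logger name copied from the warning above. It is kept in a static field because
    // java.util.logging references loggers weakly and could otherwise discard the setting.
    static final Logger CSS_LOGGER =
        Logger.getLogger("com.gargoylesoftware.htmlunit.DefaultCssErrorHandler");

    // Turns off every message from the CSS error handler's logger.
    static void silenceCssWarnings() {
        CSS_LOGGER.setLevel(Level.OFF);
    }

    public static void main(String[] args) {
        silenceCssWarnings();
        System.out.println(CSS_LOGGER.isLoggable(Level.WARNING)); // prints "false"
    }
}
```

Calling `silenceCssWarnings()` before creating the `WebClient` should be enough. Within HtmlUnit itself, `webClient.setCssErrorHandler(new SilentCssErrorHandler())` is another way to get the same effect for a single client.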
I have not run into the HtmlUnit compatibility problems that people talk about. If I do, I will try Chrome's headless browser instead.