Because I needed it, here you are. Thanks to the expert who introduced me to you, arigato~
1. Project address
HtmlUnit – Welcome to HtmlUnit, or HtmlUnit on SourceForge: http://sourceforge.net/projects/htmlunit/
2. Introduction
3. Worked example
final WebClient webClient = new WebClient();
final HtmlPage page = webClient.getPage("http://www.baidu.com/");
System.out.println(page.asText());
webClient.closeAllWindows();
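Run the four lines of code above and you get the full text of the Baidu home page. The code emits many warnings while it runs, mainly because HtmlUnit's support for JS and CSS is limited. The following settings control whether JS and CSS are enabled: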
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setJavaScriptEnabled(false);
4. A few small features to get started quickly
// Emulate the Chrome browser; for another browser, change the constant after BrowserVersion.
WebClient webClient = new WebClient(BrowserVersion.CHROME);
Method 1: look up the element with a get method
HtmlPage page = webClient.getPage("http://www.baidu.com/");
// Get the div whose id is "myId" from the Baidu page
HtmlDivision div = (HtmlDivision) page.getElementById("myId");
Method 2: look up elements with XPath. XPath is typically used when an element cannot be found by id, or when a more complex search is needed.
List<?> tableList = page.getByXPath("//table[@id='tracing_by_booking_f:hl55']");
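To see what an expression like the one above selects, here is a small sketch that runs the same XPath against a hand-written snippet. It uses the JDK's built-in javax.xml.xpath instead of HtmlUnit so that it runs without any dependency; the class name XPathSketch and the SAMPLE markup are made up for illustration, and real pages are rarely well-formed XML like this.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSketch {
    // Hypothetical, well-formed markup standing in for a fetched page.
    static final String SAMPLE =
        "<html><body>"
        + "<table id='other'><tr><td>skip</td></tr></table>"
        + "<table id='tracing_by_booking_f:hl55'><tr><td>match</td></tr></table>"
        + "</body></html>";

    // Returns the text content of every node matched by the XPath expression.
    static List<String> select(String markup, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression,
                new InputSource(new StringReader(markup)),
                XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or markup", e);
        }
    }

    public static void main(String[] args) {
        // Only the table whose id attribute equals the quoted value is selected.
        System.out.println(select(SAMPLE, "//table[@id='tracing_by_booking_f:hl55']"));
    }
}
```

The `//table` part searches the whole document for table elements, and the `[@id='...']` predicate keeps only those whose id attribute matches exactly.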
3) Proxy server configuration. Configuring a proxy is simple: set the address, port, username, and password.
final DefaultCredentialsProvider credentialsProvider = (DefaultCredentialsProvider) webClient.getCredentialsProvider();
credentialsProvider.addCredentials("username", "password");
// Get the form
final HtmlForm form = page.getFormByName("form");
// Get the submit button
final HtmlSubmitInput button = form.getInputByName("submit");
// The text field to fill in shortly
final HtmlTextInput textField = form.getInputByName("userid");
textField.setValueAttribute("test");
// Click the button to submit the form (named resultPage so it does not clash with the existing page variable)
final HtmlPage resultPage = button.click();
java.util.List<HtmlAnchor> achList = page.getAnchors();
for (HtmlAnchor ach : achList) {
    System.out.println(ach.getHrefAttribute());
}
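The href values printed by the loop above are the raw attribute text, so they may well be relative links. A minimal sketch of turning them into absolute URLs with the JDK's java.net.URI follows; the class name HrefSketch and the sample links are hypothetical, and this is only one way to do it.

```java
import java.net.URI;

class HrefSketch {
    // Resolves an href (relative or absolute) against the URL of the page it came from.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String pageUrl = "http://www.baidu.com/a/b.html"; // hypothetical page URL
        System.out.println(resolve(pageUrl, "../c.html"));            // relative link
        System.out.println(resolve(pageUrl, "http://example.com/x")); // already absolute
    }
}
```

An href that is already absolute comes back unchanged, so the same call can be applied to every anchor in the list.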
page.getBody().getTextContent();
page.getTitleText();
page.asXml();
page.asText();
// Get the default value of the input
htmlInput.getDefaultValue();
// Set the value of the text field
htmlInput.setValueAttribute("values");
// Simulate a click
htmlInput.click();
// Simulate typing into the text field (the string is Chinese for "username")
htmlInput.type("用户名");
// Get an embedded iframe
HtmlPage framePage = (HtmlPage) page.getFrameByName("frmTree1").getEnclosedPage();
// Get the select box
HtmlSelect seriesSelect = (HtmlSelect) page.getElementById("countrylst");
// Get all options of the select box
List<HtmlOption> optionList = seriesSelect.getOptions();
// Mark the chosen option as selected
optionList.get(1).setSelected(true);
// request (a WebRequest), client (a WebClient), hostSinaUrl, and nickname are declared elsewhere
String sinaSetUrl = hostSinaUrl + "basic/setting_account";
request = new WebRequest(new URL(sinaSetUrl), HttpMethod.POST);
request.setCharset("utf-8");
request.setRequestParameters(Arrays.asList(
    new NameValuePair("nickname", nickname),
    new NameValuePair("pop3", "on"),
    new NameValuePair("imap", "on")));
client.getPage(request);
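For a rough idea of what such a POST carries, the sketch below builds a form-encoded body from the same three parameter names using only the JDK. It assumes the parameters are sent as application/x-www-form-urlencoded in UTF-8, which I have not verified against HtmlUnit itself; the class name FormBodySketch and the sample nickname are made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class FormBodySketch {
    // Joins name=value pairs with '&', URL-encoding both sides as UTF-8.
    static String encode(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        try {
            for (Map.Entry<String, String> p : params.entrySet()) {
                body.add(URLEncoder.encode(p.getKey(), "UTF-8")
                    + "=" + URLEncoder.encode(p.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
        return body.toString();
    }

    // Builds the body for the three parameters used in the request above.
    static String sample(String nickname) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("nickname", nickname);
        params.put("pop3", "on");
        params.put("imap", "on");
        return encode(params);
    }

    public static void main(String[] args) {
        System.out.println(sample("test user")); // hypothetical nickname
    }
}
```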
Return value of the getResponseHeaders method:
0:Date=Tue, 07 Jul 2015 13:05:30 GMT
1:X-Powered-By=Servlet/3.0
2:Pragma=no-cache
3:Cache-Control=no-cache, no-store, must-revalidate
4:Expires=Thu, 01 Jan 1970 00:00:00 GMT
5:Vary=Accept-Encoding
6:Keep-Alive=timeout=60, max=185
7:Connection=Keep-Alive
8:Content-Type=text/html;charset=utf-8
9:Content-Language=en
10:Set-Cookie=TS01a3c52a=013b7b09e098d85f84bac23c1d294c3e11048d7ed19229b7ed7e16e91de8749726c9a6bf; Path=/
11:Transfer-Encoding=chunked
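A listing in the index:Name=Value shape shown above can be produced with a simple loop. The sketch below uses plain JDK map entries in place of HtmlUnit's name/value pairs so that it runs on its own; in HtmlUnit the list would come from page.getWebResponse().getResponseHeaders(). The class name HeaderDumpSketch and the sample() data (copied from the first two lines above) are only for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class HeaderDumpSketch {
    // Formats each header as "index:Name=Value", one string per header.
    static List<String> format(List<Map.Entry<String, String>> headers) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < headers.size(); i++) {
            Map.Entry<String, String> header = headers.get(i);
            lines.add(i + ":" + header.getKey() + "=" + header.getValue());
        }
        return lines;
    }

    // Stand-in data: the first two headers from the listing above.
    static List<Map.Entry<String, String>> sample() {
        List<Map.Entry<String, String>> headers = new ArrayList<>();
        headers.add(new SimpleEntry<>("Date", "Tue, 07 Jul 2015 13:05:30 GMT"));
        headers.add(new SimpleEntry<>("X-Powered-By", "Servlet/3.0"));
        return headers;
    }

    public static void main(String[] args) {
        for (String line : format(sample())) {
            System.out.println(line);
        }
    }
}
```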
9) Executing JavaScript on an HtmlPage
ScriptResult sr = htmlPage.executeJavaScript("javascript:document.getElementById('tracing_by_booking_f').submit();");
HtmlPage resultPage = (HtmlPage) sr.getNewPage();