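Selenium can crawl dynamic pages, such as pages behind a login or content rendered by JavaScript.
I. Preparation:
1. Download Selenium (e.g. selenium-java-2.37.0.zip, about 24 MB) and chromedriver.exe (chromedriver_win32.zip, about 2.4 MB):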
https://sites.google.com/a/chromium.org/chromedriver/downloads
2. Set up the driver; for reference, see:
http://download.csdn.net/detail/qianaier/7966945
II. Development:
Connection example
System.setProperty("webdriver.chrome.driver",
"C:/Program Files (x86)/Google/Chrome/Application/chromedriver.exe");
WebDriver driver = new ChromeDriver();
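A wrong chromedriver path usually only shows up when the ChromeDriver is constructed, so it can help to check the file first. The following is a minimal sketch using only the JDK; the class name DriverPathCheck and the method configureChromeDriver are hypothetical helpers for illustration and are not part of Selenium.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DriverPathCheck {

    /**
     * Hypothetical helper: sets the webdriver.chrome.driver system property
     * only if the given file exists. Returns true when the property was set.
     */
    public static boolean configureChromeDriver(String driverPath) {
        Path path = Paths.get(driverPath);
        if (!Files.isRegularFile(path)) {
            System.err.println("chromedriver not found at: " + path.toAbsolutePath());
            return false;
        }
        System.setProperty("webdriver.chrome.driver", path.toString());
        return true;
    }

    public static void main(String[] args) {
        // Same path as in the connection example above; adjust to the local install.
        String driverPath =
                "C:/Program Files (x86)/Google/Chrome/Application/chromedriver.exe";
        if (configureChromeDriver(driverPath)) {
            System.out.println("webdriver.chrome.driver = "
                    + System.getProperty("webdriver.chrome.driver"));
        }
    }
}
```

When the helper returns true, new ChromeDriver() can be created as in the connection example.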
Login example
driver.get("https://xx.xxx.xx/admin/login.jsp");
driver.findElement(By.id("input_username")).clear();
driver.findElement(By.id("input_username")).sendKeys("username");
driver.findElement(By.id("input_password")).clear();
driver.findElement(By.id("input_password")).sendKeys("password");
// log on
driver.findElement(By.className("btn")).click();
System.out.println(driver.getTitle());
Child element access example (By.id/className/tagName)
WebElement sub = driver.findElement(By.id("accordion-element"));
List<WebElement> subList = sub.findElement(By.className("accordion-inner"))
        .findElement(By.className("nav-list")).findElements(By.tagName("li"));
for (WebElement li : subList) {
    WebElement href = li.findElement(By.tagName("a"));
    // use the link element here
}
Attributes and text content
// div1 is a WebElement located earlier
if (div1.getAttribute("style").contains("margin-bottom: 10px;")) {
    // detail
    System.out.println(div1.findElement(By.tagName("div")).getText());
}
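Switching into an inner page (iframe)
// switchTo().frame() moves the driver's focus into the iframe whose name or id is "iframe_1"
WebDriver frameDriver = driver.switchTo().frame("iframe_1");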
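Hidden elements
-- Example that does not work: getText() returns an empty string for an element that is not displayed
if (closeButton.getText().equals("关闭")) { // "关闭" ("Close") is the button label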
closeButton.click();
}
-- Example that works: read the innerText attribute and click through JavaScript
if (closeButton.getAttribute("innerText").equals("关闭")) {
((JavascriptExecutor) frameDriver).executeScript(""
+ "document.getElementsByClassName('modal')[0]."
+ "getElementsByClassName('btn')[0].click()");
}
Elements not found because the automation runs too fast
// add a sleep
try {
Thread.sleep(500);
} catch (InterruptedException e) {
e.printStackTrace();
}
closeButton.click();
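A fixed sleep either waits too long or not long enough. One alternative is to poll a condition until it holds or a timeout expires. The following is a minimal sketch using only the JDK (Java 8 or later); the class PollingWait and the method waitUntil are hypothetical names for illustration and are not part of Selenium.

```java
import java.util.function.BooleanSupplier;

public class PollingWait {

    /**
     * Hypothetical helper: checks the condition every pollMillis milliseconds
     * until it returns true or timeoutMillis has elapsed.
     * Returns true if the condition held, false on timeout or interruption.
     */
    public static boolean waitUntil(BooleanSupplier condition,
                                    long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true roughly 300 ms after start.
        boolean ready = waitUntil(
                () -> System.currentTimeMillis() - start >= 300, 2000, 50);
        System.out.println("ready = " + ready);
    }
}
```

With Selenium, the condition could be something like () -> !driver.findElements(By.className("btn")).isEmpty(), since findElements returns an empty list instead of throwing when nothing matches. Selenium also ships WebDriverWait and ExpectedConditions in org.openqa.selenium.support.ui for the same purpose.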
Quit
// frameDriver is the same driver (same browser session) focused on the frame, so one quit() is enough
driver.quit();
References
Fixing elements that Selenium cannot locate: an iframe is in the way http://www.51testing.com/html/02/n-855802.html
Selenium WebDriver study notes: ways to start the browser http://blog.csdn.net/pugongying1988/article/details/14525013
Using Selenium to scrape dynamically loaded pages http://my.oschina.net/flashsword/blog/147334
Selenium Test automated testing: beginner's study notes http://www.renren.com
Hidden elements http://my.oschina.net/longtutengfei/blog/166773, http://www.open-open.com/lib/view/open1402750704931.html