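For everyday web scraping, if Python is not your main language, Selenium is the heavy-duty tool on the Java side. I took the chance to learn it.
In the IDEA project, add these dependencies to pom.xml: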
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-java</artifactId>
<version>3.141.59</version>
</dependency>
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-chrome-driver</artifactId>
<version>3.141.59</version>
</dependency>
First, set up the environment.
The official instructions say:
brew tap homebrew/cask && brew cask install chromedriver
Whether this works depends on your network. For me it failed:
curl: (7) Failed to connect to chromedriver.storage.googleapis.com port 443: Operation timed out
Error: Download failed on Cask 'chromedriver' with message: Download failed: https://chromedriver.storage.googleapis.com/79.0.3945.36/chromedriver_mac64.zip
An alternative is to download it manually from:
http://chromedriver.storage.googleapis.com/index.html
Find the chromedriver version that matches your Chrome version. I used this one:
http://chromedriver.storage.googleapis.com/index.html?path=84.0.4147.30/
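A quick way to check that a driver release lines up with the installed browser is to compare version numbers. The sketch below compares only the major version, which is a simplifying assumption. The class and method names (VersionMatch, majorVersion, sameMajor) are hypothetical helpers, and the Chrome version in main is a made-up example.

```java
public class VersionMatch {
    // Returns the leading (major) component of a dotted version such as "84.0.4147.30".
    static int majorVersion(String version) {
        String trimmed = version.trim();
        int dot = trimmed.indexOf('.');
        String major = dot < 0 ? trimmed : trimmed.substring(0, dot);
        return Integer.parseInt(major);
    }

    // True when the browser and the driver share the same major version.
    static boolean sameMajor(String chromeVersion, String driverVersion) {
        return majorVersion(chromeVersion) == majorVersion(driverVersion);
    }

    public static void main(String[] args) {
        // The Chrome version is illustrative; the driver versions are the ones mentioned in this post.
        System.out.println(sameMajor("84.0.4147.89", "84.0.4147.30"));
        System.out.println(sameMajor("84.0.4147.89", "79.0.3945.36"));
    }
}
```

You can read your own Chrome version from chrome://version and substitute it for the example string.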
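After downloading, unzip it into /usr/local/bin.
Or unzip it in the download directory and then run: sudo cp -r chromedriver /usr/local/bin/
to copy it over manually.
Otherwise Selenium cannot find the driver and the program fails with: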
Exception in thread "main" java.lang.IllegalStateException: The path to the driver executable must be set by the webdriver.chrome.driver system property; for more information, see http://code.google.com/p/selenium/wiki/ChromeDriver. The latest version can be downloaded from http://code.google.com/p/chromedriver/downloads/list
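Putting chromedriver on the PATH is one fix. The error message names another: the webdriver.chrome.driver system property, which can be set before the driver is created. A minimal sketch, assuming the binary sits at /usr/local/bin/chromedriver as described above. The class and method names (DriverPathSetup, configure) are hypothetical helpers and not part of Selenium.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DriverPathSetup {
    // Property name taken from the error message above.
    static final String DRIVER_PROPERTY = "webdriver.chrome.driver";

    // Sets the property only if the file exists and is executable; returns true on success.
    static boolean configure(String driverPath) {
        Path path = Paths.get(driverPath);
        if (!Files.isRegularFile(path) || !Files.isExecutable(path)) {
            return false;
        }
        System.setProperty(DRIVER_PROPERTY, path.toAbsolutePath().toString());
        return true;
    }

    public static void main(String[] args) {
        // Assumed location; adjust it to wherever you unpacked chromedriver.
        boolean ok = configure("/usr/local/bin/chromedriver");
        System.out.println(ok ? "driver path set" : "chromedriver not found or not executable");
    }
}
```

Call configure(...) before new ChromeDriver() so the property is already in place when Selenium looks for the executable.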
**************
demo:
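import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class Demo {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        driver.get("http://www.baidu.com");
        String title = driver.getTitle();
        // println instead of printf: the page title is data, not a format string
        System.out.println(title);
        // quit() closes every window and stops the chromedriver process
        driver.quit();
    }
}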
A browser window pops up while the program runs.
Get it running first, then dig into how the APIs are used.