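1. Runtime environment
To use Selenium you first need to download ChromeDriver. Each ChromeDriver release supports only a specific range of Chrome versions, as listed below: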
Driver version | Supported Chrome versions |
---|---|
ChromeDriver v2.43 (2018-10-16) | Supports Chrome v69-71 |
ChromeDriver v2.42 (2018-09-13) | Supports Chrome v68-70 |
Older versions are unlikely to still be in use. ChromeDriver can be downloaded from this mirror:
http://npm.taobao.org/mirrors/chromedriver/
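Because each driver works with only a few Chrome releases, choosing a driver amounts to a range lookup on the installed Chrome major version. A minimal sketch, using a hypothetical `DriverPicker.pickDriver` helper that encodes only the two rows of the table above:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DriverPicker {
    // Hypothetical lookup table holding only the two rows listed above:
    // driver version -> {lowest, highest} supported Chrome major version.
    private static final Map<String, int[]> SUPPORTED = new LinkedHashMap<>();

    static {
        // Newest driver first, so the first match is the most recent driver.
        SUPPORTED.put("2.43", new int[] {69, 71});
        SUPPORTED.put("2.42", new int[] {68, 70});
    }

    /** Returns the newest listed ChromeDriver that supports the given Chrome major version, or null. */
    static String pickDriver(int chromeMajor) {
        for (Map.Entry<String, int[]> entry : SUPPORTED.entrySet()) {
            int[] range = entry.getValue();
            if (chromeMajor >= range[0] && chromeMajor <= range[1]) {
                return entry.getKey();
            }
        }
        return null; // not covered by the table: check the download mirror
    }
}
```

`pickDriver(70)` returns "2.43": both rows cover Chrome 70, and the newer driver is listed first. A Chrome version outside the table returns null.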
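2. What the program does
The program types a given keyword into Twitch's search box and returns the URL of the matching live stream.
It was written in IntelliJ IDEA and is meant to be deployed as a WAR, so the project was created with war packaging instead of jar (see the project-creation settings figure):
The pom.xml is as follows: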
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.1.1.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>learn</groupId>
    <artifactId>twlive</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>war</packaging>
    <name>twlive</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>1.8</java.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-tomcat</artifactId>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-java</artifactId>
            <version>3.141.59</version>
        </dependency>
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.10.3</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>
The corresponding application.properties is below; web.driver holds the absolute path of chromedriver on this machine:
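# Unique JMX domain, avoids MBean name clashes when several Spring Boot WARs run in one Tomcat
spring.jmx.default-domain=twitch-live
# Path to the Chrome webdriver (chromedriver)
web.driver = C:\\bin\\chromedriver.exe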
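Backslashes are escape characters in .properties files, which is why the Windows path above is written with doubled backslashes. Spring Boot reads application.properties with essentially the same escaping rules as java.util.Properties, so a small check with Properties shows what value the application actually receives (the class name `DriverPathCheck` is hypothetical):

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class DriverPathCheck {
    /** Parses one "key = value" line the way java.util.Properties does and returns the web.driver value. */
    static String parseDriverPath(String line) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(line));
        } catch (IOException e) {
            // StringReader never actually fails, but load() declares IOException
            throw new UncheckedIOException(e);
        }
        return props.getProperty("web.driver");
    }
}
```

With doubled backslashes the value is C:\bin\chromedriver.exe. With single backslashes, the unrecognized escapes \b and \c are silently dropped and the value becomes C:binchromedriver.exe, so chromedriver would not be found.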
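Spring Boot generates two startup classes by default. Class 1, with @EnableScheduling added so that @Scheduled methods actually run: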
package learn.twlive;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.scheduling.annotation.EnableScheduling;

@SpringBootApplication
@EnableScheduling
public class TwliveApplication {
    public static void main(String[] args) {
        SpringApplication.run(TwliveApplication.class, args);
    }
}
Class 2, the ServletInitializer needed when the application is deployed as a WAR:
package learn.twlive;

import org.springframework.boot.builder.SpringApplicationBuilder;
import org.springframework.boot.web.servlet.support.SpringBootServletInitializer;

public class ServletInitializer extends SpringBootServletInitializer {
    @Override
    protected SpringApplicationBuilder configure(SpringApplicationBuilder application) {
        return application.sources(TwliveApplication.class);
    }
}
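The scheduler class is below; the task is meant to run once every 50 s. Cron expressions can be built with the online generator at http://cron.qqe2.com/ (it used to be ad-free, but now shows ads).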
package learn.twlive.web;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class Web {
    // Absolute chromedriver path, injected from application.properties
    @Value("${web.driver}")
    private String webDriverPath;

    // Re-run the search 50 s after the previous run finishes
    // (a cron expression cannot express an even 50 s interval)
    @Scheduled(fixedDelay = 50000)
    public void getTwurl() {
        TwitchSearch twitchSearch = new TwitchSearch();
        String searchKey = "dota2 psg.lgd";
        String url = twitchSearch.downLoadSeliunm(searchKey, webDriverPath);
        // downLoadSeliunm returns "" (not null) when nothing is found
        if (url != null && !url.isEmpty()) {
            System.out.println(url);
        }
    }
}
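In Spring's six-field cron syntax the fields are second, minute, hour, day of month, month and day of week. So `0/50 0 * * * ?` fires only at seconds 0 and 50 of minute 0, i.e. twice an hour, and even `0/50 * * * * ?` fires at seconds 0 and 50 of every minute, giving alternating 50 s and 10 s gaps. The hypothetical `CronFieldDemo` below is a toy matcher for the second and minute fields only (not Spring's parser) that lists the firing times within one hour:

```java
import java.util.ArrayList;
import java.util.List;

class CronFieldDemo {
    /**
     * Toy matcher for a single cron field. Supports only "*", a plain number
     * such as "0", and "start/step" such as "0/50".
     */
    static boolean matches(String field, int value) {
        if (field.equals("*")) {
            return true;
        }
        if (field.contains("/")) {
            String[] parts = field.split("/");
            int start = Integer.parseInt(parts[0]);
            int step = Integer.parseInt(parts[1]);
            return value >= start && (value - start) % step == 0;
        }
        return value == Integer.parseInt(field);
    }

    /** Lists the offsets in seconds from the top of the hour (0..3599) at which "sec min * * * ?" fires. */
    static List<Integer> firingsInOneHour(String secField, String minField) {
        List<Integer> fires = new ArrayList<>();
        for (int minute = 0; minute < 60; minute++) {
            for (int second = 0; second < 60; second++) {
                if (matches(minField, minute) && matches(secField, second)) {
                    fires.add(minute * 60 + second);
                }
            }
        }
        return fires;
    }
}
```

`firingsInOneHour("0/50", "0")` yields only [0, 50]. For an evenly spaced 50 s interval, `@Scheduled(fixedDelay = 50000)` or `@Scheduled(fixedRate = 50000)` is a better fit than cron.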
The class that performs the search and extracts the URL is below. It parses the page with jsoup; Selenium's own locators would work as well:
package learn.twlive.web;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import org.openqa.selenium.By;
import org.openqa.selenium.Dimension;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.List;

public class TwitchSearch {
    private final Logger logger = LoggerFactory.getLogger(this.getClass());

    /** Searches Twitch for the keyword; returns the first live-stream URL, or "" if none is found. */
    public String downLoadSeliunm(String keyword, String webDriverPath) {
        String url = "https://www.twitch.tv/";
        // Tell Selenium where chromedriver is installed
        System.setProperty("webdriver.chrome.driver", webDriverPath);
        // Chrome command-line arguments; uncomment --headless to run without a browser window
        List<String> command = new ArrayList<>();
        //command.add("--headless");
        ChromeOptions options = new ChromeOptions();
        options.addArguments(command);
        WebDriver driver = new ChromeDriver(options);
        try {
            driver.manage().window().setSize(new Dimension(1920, 1080));
            driver.get(url);
            driver.findElement(By.id("nav-search-input")).sendKeys(keyword);
            // The result drop-down is rendered asynchronously: wait up to 10 s for it,
            // otherwise the page source may be read before the results exist
            new WebDriverWait(driver, 10).until(
                    ExpectedConditions.presenceOfElementLocated(By.className("search-result-section__block")));
            Document doc = Jsoup.parse(driver.getPageSource());
            return parserTwitch(doc);
        } catch (Exception ex) {
            logger.warn("Twitch search failed for keyword '{}'", keyword, ex);
        } finally {
            driver.quit();
        }
        return "";
    }

    /** Extracts the first link from the first search-result block. */
    public String parserTwitch(Document doc) {
        Elements searchResultSectionBlock = doc.getElementsByClass("search-result-section__block");
        if (searchResultSectionBlock.size() > 0) {
            // getElementsByClass() expects a single class name, so match the
            // multi-class link with a CSS selector instead
            Elements urlA = searchResultSectionBlock.get(0)
                    .select(".tw-interactive.tw-block.tw-full-width.tw-interactable.tw-interactable--inverted");
            if (!urlA.isEmpty()) {
                String href = urlA.get(0).attr("href");
                return "https://www.twitch.tv" + href;
            }
        }
        return "";
    }
}
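parserTwitch builds the result by string concatenation, which assumes the href is always a path starting with "/"; an absolute href would produce a broken URL. A slightly more defensive variant resolves the href against the site root with java.net.URI. `TwitchUrls.toAbsolute` is a hypothetical helper, not part of the original program:

```java
import java.net.URI;

class TwitchUrls {
    // Site root, taken from TwitchSearch above
    private static final URI BASE = URI.create("https://www.twitch.tv/");

    /** Turns an href from the search results into an absolute URL; returns "" for a missing href. */
    static String toAbsolute(String href) {
        if (href == null || href.trim().isEmpty()) {
            return ""; // same "not found" convention as parserTwitch
        }
        // resolve() returns absolute hrefs unchanged and joins relative ones onto BASE
        return BASE.resolve(href.trim()).toString();
    }
}
```

It could replace the `"https://www.twitch.tv" + href` concatenation in parserTwitch. Note that URI.resolve throws IllegalArgumentException for an href that is not a valid URI, which the surrounding try/catch in downLoadSeliunm would log.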