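Background: calling the YARN API to fetch a task's log returns an HTML page rather than plain text, so the log lines have to be extracted from the HTML.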
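Add the dependencies (jsoup does the HTML parsing; JUnit 5 runs the test classes below):
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <!-- example version; any recent jsoup release should work -->
    <version>1.15.3</version>
</dependency>
<dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter</artifactId>
    <version>5.7.0</version>
    <scope>test</scope>
</dependency>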
Method 1: request the page from its URL and parse the response
package demo.com.test;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.junit.jupiter.api.Test;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
public class JsoupHMTest {
    @Test
    public void t() throws IOException {
        // Container log page served by the NodeManager web UI
        String url = "http://hdp02.bonc.com:23999/node/containerlogs/container_e08_1684747720686_0127_01_000001/hadoop/stdout/?start=0&start=0";
        Document document = Jsoup.connect(url).get();
        // The log text is inside the first <pre> element of the page
        Element pre = document.getElementsByTag("pre").first();
        if (pre == null) {
            throw new IllegalStateException("no <pre> element found at " + url);
        }
        String content = pre.html();
        // One list entry per log line
        List<String> logList = Arrays.asList(content.split("\n"));
        System.out.println(logList);
        for (int i = 0; i < logList.size(); i++) {
            System.out.print(i + "--------");
            System.out.println(logList.get(i));
        }
        // Other elements can be selected the same way, e.g. by id
        String logoHtml = document.select("#logo").html();
        System.out.println("Content of the element with id \"logo\": " + logoHtml);
    }
}
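A note on the code above: html() returns the inner HTML of the <pre> element, so characters such as < and & in the log come back as entities (&lt;, &amp;), and first() returns null when the page has no <pre> at all. The following is a minimal sketch of a variant, assuming (as the code above does) that the whole log sits in the first <pre>. The class name PreLogExtractor and the method extractLogLines are hypothetical; selectFirst() and wholeText() are jsoup Element methods that return the first match (or null) and the unescaped text with newlines preserved. Splitting on the regex \R also handles \r\n line endings.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: pulls the raw log lines out of a container-log page.
public class PreLogExtractor {

    /** Returns the lines of the first <pre> element, or an empty list if there is none. */
    public static List<String> extractLogLines(Document doc) {
        Element pre = doc.selectFirst("pre");   // null when the page has no <pre>
        if (pre == null) {
            return Collections.emptyList();
        }
        // wholeText() keeps newlines and returns decoded text ("<" rather than "&lt;")
        String raw = pre.wholeText();
        return Arrays.asList(raw.split("\\R")); // \R matches \n, \r\n and other line breaks
    }

    public static void main(String[] args) {
        // Inline HTML standing in for the downloaded page (assumed shape: log inside <pre>)
        String html = "<html><body><pre>INFO start &lt;init&gt;\nWARN a &amp; b\nINFO done</pre></body></html>";
        Document doc = Jsoup.parse(html);
        List<String> lines = extractLogLines(doc);
        for (int i = 0; i < lines.size(); i++) {
            System.out.println(i + "--------" + lines.get(i));
        }
    }
}
```

Returning an empty list instead of throwing lets a caller treat "no log yet" and "empty log" the same way; throw instead if a missing <pre> should fail the test.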
Method 2: parse a copy of the page saved as a local file
package demo.com.test;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.junit.jupiter.api.Test;
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
public class JsoupHM2Test {
    @Test
    public void t() throws IOException {
        // Parse a log page saved to disk, decoding it as UTF-8
        Document root = Jsoup.parse(new File("C:\\Users\\wysghmbb\\Desktop\\stdout2.htm"), "utf-8");
        // The log text is inside the first <pre> element
        Element pre = root.getElementsByTag("pre").first();
        if (pre == null) {
            throw new IllegalStateException("no <pre> element found in the file");
        }
        String content = pre.html();
        List<String> logList = Arrays.asList(content.split("\n"));
        for (int i = 0; i < logList.size(); i++) {
            System.out.print(i + "--------");
            System.out.println(logList.get(i));
        }
    }
}
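The two methods differ only in how the Document is obtained; everything after that is the same. Below is a sketch of a hypothetical loader (LogPageLoader is not part of jsoup) that uses Jsoup.connect(...).get() for http/https sources and Jsoup.parse(File, charset) for local paths. The 10-second timeout is an assumed value, not a YARN requirement.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.File;
import java.io.IOException;

// Hypothetical helper: get a jsoup Document from either a URL (method 1) or a local file (method 2).
public class LogPageLoader {

    public static Document load(String source) throws IOException {
        if (source.startsWith("http://") || source.startsWith("https://")) {
            // Method 1: fetch over HTTP; timeout is in milliseconds (assumed value)
            return Jsoup.connect(source).timeout(10_000).get();
        }
        // Method 2: parse a saved page from disk, decoding it as UTF-8
        return Jsoup.parse(new File(source), "UTF-8");
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: LogPageLoader <url-or-path>");
            return;
        }
        Document doc = load(args[0]);
        System.out.println(doc.getElementsByTag("pre").size() + " <pre> element(s) found");
    }
}
```

Saving a page once and parsing it with method 2 is handy for developing the extraction logic without hitting the cluster on every run.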
Further reading
Once you have the Document, you can also locate elements by id, name, tag and so on; see https://blog.csdn.net/qq_26786441/article/details/106207828
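For quick reference, here is a sketch of those lookups on a small made-up page (the ids, names and contents are invented for illustration). getElementById, getElementsByTag, getElementsByAttributeValue, select and selectFirst are jsoup Document/Element methods; the name attribute is matched here with the CSS attribute selector [name=...].

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Illustrative lookups by id, name attribute, tag and class on a made-up page.
public class JsoupLookupDemo {

    public static final String HTML =
            "<html><body>"
          + "<div id='logo'>YARN</div>"
          + "<input name='user' value='hadoop'>"
          + "<pre class='log'>line 1\nline 2</pre>"
          + "</body></html>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);

        Element logo = doc.getElementById("logo");              // by id (null if absent)
        Element user = doc.selectFirst("input[name=user]");     // by name attribute (CSS selector)
        Elements pres = doc.getElementsByTag("pre");            // by tag name
        Elements logs = doc.select("pre.log");                  // by tag + class

        System.out.println(logo.text());          // YARN
        System.out.println(user.attr("value"));   // hadoop
        System.out.println(pres.size());          // 1
        System.out.println(logs.first().wholeText());
    }
}
```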