Parsing JS-driven HTML in Java: parsing HTML elements generated by JavaScript

I'm very new to HTML parsing with Java. I previously used Jsoup to parse simple, static HTML, but I now need to parse a web page that has dynamic elements. Below is the code I tried; it could not find the elements because they are added after the page has loaded. The page in question uses Google Maps with markers on it, and I'm attempting to scrape the images of these markers.

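import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// The class name is arbitrary; the original snippet showed only main().
public class MarkerImageScraper {
    public static void main(String[] args) {
        Document doc;
        try {
            // Fetches only the static HTML; scripts on the page are not executed.
            doc = Jsoup.connect("https://pokevision.com")
                    .userAgent("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36")
                    .get();
        } catch (IOException e) {
            e.printStackTrace();
            return; // without a document there is nothing to select from
        }

        Elements images = doc.select("img[src~=(?i)\\.(png|jpe?g|gif)]");
        for (Element image : images) {
            System.out.println("src : " + image.attr("src"));
        }
    }
}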

Since this apparently cannot be done with Jsoup alone, what other libraries can I use to find the image sources?

Solution

The problem is that Jsoup retrieves the static source code, exactly as it would be delivered to a browser. What you want is the DOM after the JavaScript has run. For this, you can use HtmlUnit to get the rendered page and then pass its content to Jsoup for parsing.

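// WebClient and HtmlPage come from HtmlUnit (package com.gargoylesoftware.htmlunit
// in 2.x, org.htmlunit in 3.x and later). getPage() throws IOException, so the
// enclosing method must declare or handle it.

// capture rendered page; try-with-resources closes the client even on failure
try (WebClient webClient = new WebClient()) {
    HtmlPage myPage = webClient.getPage("https://pokevision.com");

    // convert to jsoup dom
    Document doc = Jsoup.parse(myPage.asXml());

    // extract data using jsoup selectors
    Elements images = doc.select("img[src~=(?i)\\.(png|jpe?g|gif)]");
    for (Element image : images) {
        System.out.println("src : " + image.attr("src"));
    }
}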
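
To see which `src` values the selector's regular expression would accept, the pattern can be tried on its own with `java.util.regex`. This is only a sketch: the class name `ImageSrcFilter`, the method `looksLikeImage`, and the sample paths are made up for illustration, and it assumes that the `[attr~=regex]` selector looks for the pattern anywhere in the attribute value rather than requiring a full match, which is my understanding of Jsoup's behavior.

```java
import java.util.regex.Pattern;

public class ImageSrcFilter {
    // Same expression as in the selector img[src~=(?i)\.(png|jpe?g|gif)]
    static final Pattern IMAGE_EXT = Pattern.compile("(?i)\\.(png|jpe?g|gif)");

    // find() succeeds if the pattern occurs anywhere in the value
    // (assumed to mirror how the selector applies the regex).
    static boolean looksLikeImage(String src) {
        return IMAGE_EXT.matcher(src).find();
    }

    public static void main(String[] args) {
        // Hypothetical src values, not taken from the real page.
        String[] samples = {
            "/img/marker.PNG",
            "/img/marker.svg",
            "photo.jpeg?x=1",
            "a.png.html"
        };
        for (String src : samples) {
            System.out.println(src + " -> " + looksLikeImage(src));
        }
    }
}
```

Because the pattern is not anchored, a value such as `a.png.html` is accepted as well. If that matters, appending `$` or `(\?.*)?$` to the expression would restrict it to values that end in the extension, optionally followed by a query string.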
