Rendering Pages with Selenium and Extracting Data

Mr_Tank_

Category: java. Tags: selenium, web crawler

Article link: https://blog.csdn.net/Mr_Tank_/article/details/17042547

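I have recently been collecting data with a Java crawler. When the data on a page is generated dynamically by JavaScript, an HTML parser such as jsoup cannot see it, because it only reads the markup the server returns and never runs the scripts. The page therefore has to be rendered first.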

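The example below uses product listing data from JD.com (京东商城). To use Selenium you first need to download the driver that matches your browser; I use Chrome. You also need to add the commons-exec package (Apache Commons Exec) to the project.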
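Because the driver is downloaded separately, a wrong or missing path is an easy mistake to make. As a minimal sketch, the check below verifies that the file exists before setting the `webdriver.chrome.driver` property. The class `DriverPathCheck` and the method `configureChromeDriver` are hypothetical names made up for illustration and are not part of Selenium; only the property name and the sample path come from the example in this post.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper, not part of Selenium: validates the chromedriver path before use. */
public class DriverPathCheck {

    /**
     * Sets the webdriver.chrome.driver system property if the file exists;
     * otherwise throws with a message that names the missing path.
     */
    public static Path configureChromeDriver(String driverPath) {
        Path path = Paths.get(driverPath).toAbsolutePath();
        if (!Files.isRegularFile(path)) {
            throw new IllegalStateException("chromedriver not found at " + path
                    + "; download the driver that matches your Chrome version");
        }
        System.setProperty("webdriver.chrome.driver", path.toString());
        return path;
    }

    public static void main(String[] args) {
        // Default path is the one used in this post; pass another path as the first argument
        String driverPath = args.length > 0 ? args[0] : "E:\\driver\\chromedriver.exe";
        try {
            System.out.println("Using driver: " + configureChromeDriver(driverPath));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Calling `DriverPathCheck.configureChromeDriver(...)` in place of the bare `setProperty` call would surface a bad path as a readable message before the browser is started.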

import org.openqa.selenium.By;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;


/**
 * Created with IntelliJ IDEA.
 * User: Mr_Tank_
 * Date: 13-11-29
 * Time: 9:52 PM
 * To change this template use File | Settings | File Templates.
 */
public class seleniumTest {


    public static void main(String[] args) {
        // Tell Selenium where the chromedriver executable is located
        System.setProperty("webdriver.chrome.driver", "E:\\driver\\chromedriver.exe");
        WebDriver webDriver = new ChromeDriver();
        try {
            // Load the page; Chrome runs the JavaScript that fills in the product list
            webDriver.get("http://list.jd.com/9987-653-655-0-0-0-0-0-0-0-1-1-1-1-1-72-4137-33.html");

            // Container element that holds the product list
            WebElement webElement = webDriver.findElement(By.xpath("//div[@id='plist']"));

            System.out.println(webElement.getAttribute("outerHTML"));

            // A leading "." makes the XPath relative to the context element;
            // a bare "//" would search the whole document again
            WebElement li = webElement.findElement(By.xpath(".//li[@index='1']"));

            String name = li.findElement(By.xpath(".//div[@class='p-name']/a")).getText();
            System.out.println("Product name: " + name);

            String price = li.findElement(By.xpath(".//div[@class='p-price']/strong")).getText();
            System.out.println("Price: " + price);

            String eva = li.findElement(By.xpath(".//span[@class='evaluate']/a[@target='_blank']")).getText();
            System.out.println("Reviews: " + eva);
        } finally {
            // quit() closes every window and also stops the chromedriver process
            webDriver.quit();
        }
    }
}
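One thing to watch with JavaScript-rendered pages is timing: the element may not exist yet at the moment `findElement` is called. Selenium ships its own explicit-wait support (`WebDriverWait` in `org.openqa.selenium.support.ui`) for this. The sketch below is not that class; it is a small library-independent helper, with the hypothetical names `PollingWait` and `waitFor`, that only illustrates the underlying idea of polling until a value appears or a timeout expires.

```java
import java.time.Duration;
import java.util.function.Supplier;

/** Hypothetical helper illustrating the polling idea behind explicit waits. */
public class PollingWait {

    /**
     * Calls the supplier repeatedly until it returns a non-null value or the
     * timeout elapses. A RuntimeException from the supplier counts as "not ready yet".
     */
    public static <T> T waitFor(Supplier<T> condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        RuntimeException lastError = null;
        while (true) {
            try {
                T value = condition.get();
                if (value != null) {
                    return value;
                }
            } catch (RuntimeException e) {
                lastError = e; // e.g. the element is not in the DOM yet
            }
            if (System.nanoTime() >= deadline) {
                throw new IllegalStateException("condition not met within " + timeout, lastError);
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a page whose content only appears on the third poll
        int[] calls = {0};
        String text = waitFor(() -> ++calls[0] < 3 ? null : "rendered",
                Duration.ofSeconds(2), Duration.ofMillis(50));
        System.out.println(text + " after " + calls[0] + " polls");
    }
}
```

With the code in this post, the lookup of the product list could be wrapped as `PollingWait.waitFor(() -> webDriver.findElement(By.xpath("//div[@id='plist']")), Duration.ofSeconds(10), Duration.ofMillis(500))`, since `findElement` throws a runtime exception while the element is missing. The timeout and interval values here are arbitrary assumptions.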

Result of running seleniumTest: