Parsing HTML with XPath in Java: parsing HTML table data with XPath and Selenium in Java

I want to take the data from an HTML table and organize it without the tags. It looks something like this:

Optical Zoom:15x
Digital Zoom:6x
Battery Type:Alkaline
Resolution Megapixels:14 MP

and I want to be able to extract all the strings of information so that I can store them in a plain-text file with just this:

Optical Zoom: 15x Digital Zoom: 6x Battery Type: Alkaline Resolution Megapixels: 14 MP

public static void main(String[] args) {
    FirefoxProfile profile = new FirefoxProfile();
    profile.setPreference("general.useragent.override", "some UA string");
    WebDriver driver = new FirefoxDriver(profile);
    String Url = "http://www.walmart.com/ip/Generic-14-MP-X400-BK/19863348";
    driver.get(Url);
    List<WebElement> resultsDiv = driver.findElements(By.xpath("//table[contains (@class,'SpecTable')//td"));
    System.out.println(resultsDiv.size());
    for (int i=0; i<resultsDiv.size(); i++) {
        System.out.println(i+1 + ". " + resultsDiv.get(i).getText());
    }
}

I am programming in Java with Selenium and I cannot figure out the correct XPath expression for it.

Can someone figure out why I am getting an error here, and maybe give me some pointers on how to parse this data correctly? I'm very new to Selenium and XPath, but I need this for work.

Also if anyone has any good sources for me to learn Selenium and XPath fast, those would also be greatly appreciated!

Solution

This will probably suit your needs:

String text = driver.findElement(By.cssSelector("table.SpecTable")).getText();

The string text will contain the text of every node inside the table with class SpecTable.
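To get from that string to the single-line plain-text file the question asks for, something like the following may work. It is a minimal sketch using only the JDK: the class name SpecTextWriter, the file name specs.txt, and the sample input are made up for illustration, and the exact whitespace that getText() returns for a table is an assumption, so the helper simply collapses every run of whitespace into one space.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Selenium.
class SpecTextWriter {

    // Collapse line breaks, tabs and repeated spaces into single spaces.
    static String toSingleLine(String tableText) {
        return tableText.trim().replaceAll("\\s+", " ");
    }

    // Write the flattened text to a UTF-8 plain-text file.
    static void save(String tableText, Path target) throws IOException {
        Files.write(target, toSingleLine(tableText).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for driver.findElement(By.cssSelector("table.SpecTable")).getText();
        // the real text depends on the page (assumption).
        String tableText = "Optical Zoom: 15x\nDigital Zoom: 6x\n";
        save(tableText, Paths.get("specs.txt"));
        System.out.println(toSingleLine(tableText)); // prints "Optical Zoom: 15x Digital Zoom: 6x"
    }
}
```

In the real program, the stand-in string would be replaced by the value returned from getText().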

I prefer CSS selectors because they are supported by IE and are faster than XPath. As for XPath tutorials, try this and this.
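If you would rather stay with XPath, note that the expression in the question opens a predicate with [ and never closes it. With the bracket closed it reads //table[contains(@class,'SpecTable')]//td, and the same string can be passed to By.xpath. The sketch below checks that expression with the JDK's built-in javax.xml.xpath package instead of a browser, so that it runs without Selenium. The class name and the sample markup are assumptions made for illustration: the real product page is not shown in the question, and this JDK approach only works on well-formed markup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SpecTableXPathDemo {

    // Hypothetical, well-formed stand-in for the spec table on the page.
    static final String SAMPLE =
        "<html><body><table class='SpecTable'>"
        + "<tr><td>Optical Zoom:</td><td>15x</td></tr>"
        + "<tr><td>Digital Zoom:</td><td>6x</td></tr>"
        + "</table></body></html>";

    // Corrected expression: the predicate is closed with ']' before '//td'.
    static final String CELLS = "//table[contains(@class,'SpecTable')]//td";

    // Return the trimmed text of every td matched by CELLS.
    static List<String> cellTexts(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList cells = (NodeList) xpath.evaluate(
                CELLS, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < cells.getLength(); i++) {
                texts.add(cells.item(i).getTextContent().trim());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // prints "Optical Zoom: 15x Digital Zoom: 6x"
        System.out.println(String.join(" ", cellTexts(SAMPLE)));
    }
}
```

With Selenium, the loop in the question should work unchanged once the corrected expression is passed to By.xpath, since each matched td is then returned as a WebElement.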

