When a crawled URL returned JSON, I tried to parse it like this:
JsonPathSelector json = new JsonPathSelector(page.getRawText());
List<String> name = json.selectList("$.data.itemList[*].brand.name");
List<String> uri = json.selectList("$.data.itemList[*].brand.uri");
Then, stepping in with F5 in the debugger, I landed in the source code of this class. What was going on?
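Only after reading us.codecraft.webmagic.selector.JsonPathSelectorTest did I realize that I had passed the wrong argument.
The constructor JsonPathSelector(String jsonPathStr) takes jsonPathStr, that is, the extraction rule (the JSONPath expression) as a string.
The methods String select(String text) and List<String> selectList(String text) both take text, that is, the JSON string itself.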
package us.codecraft.webmagic.selector;
import org.junit.Test;
import java.util.List;
import static org.assertj.core.api.Assertions.assertThat;
/**
 * @author code4crafter@gmail.com
*/
public class JsonPathSelectorTest {

    private String text = "{ \"store\": {\n" +
            " \"book\": [ \n" +
            " { \"category\": \"reference\",\n" +
            " \"author\": \"Nigel Rees\",\n" +
            " \"title\": \"Sayings of the Century\",\n" +
            " \"price\": 8.95\n" +
            " },\n" +
            " { \"category\": \"fiction\",\n" +
            " \"author\": \"Evelyn Waugh\",\n" +
            " \"title\": \"Sword of Honour\",\n" +
            " \"price\": 12.99,\n" +
            " \"isbn\": \"0-553-21311-3\"\n" +
            " }\n" +
            " ],\n" +
            " \"bicycle\": {\n" +
            " \"color\": \"red\",\n" +
            " \"price\": 19.95\n" +
            " }\n" +
            " }\n" +
            "}";

    @Test
    public void testJsonPath() {
        System.out.println(text);
        // The rule goes into the constructor; the JSON text goes into select/selectList.
        JsonPathSelector jsonPathSelector = new JsonPathSelector("$.store.book[*].author");
        String select = jsonPathSelector.select(text);
        List<String> list = jsonPathSelector.selectList(text);
        assertThat(select).isEqualTo("Nigel Rees");
        assertThat(list).contains("Nigel Rees","Evelyn Waugh");
        // A different rule needs a new selector instance.
        jsonPathSelector = new JsonPathSelector("$.store.book[?(@.category == 'reference')]");
        list = jsonPathSelector.selectList(text);
        select = jsonPathSelector.select(text);
        System.out.println("select:\t"+select);
        System.out.println("list:\t"+list);
        assertThat(select).isEqualTo("{\"author\":\"Nigel Rees\",\"price\":8.95,\"category\":\"reference\",\"title\":\"Sayings of the Century\"}");
        assertThat(list).contains("{\"author\":\"Nigel Rees\",\"price\":8.95,\"category\":\"reference\",\"title\":\"Sayings of the Century\"}");
    }
}
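I don't think this design is ideal. Within a single page the JSON string stays the same while the extraction rules differ. If every extraction rule needs its own new JsonPathSelector, how many objects end up being created? It is also inconsistent with the way extraction rules are written in the style below: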
String brand_price = html.xpath("//span[@id=\"item-sellprice\"]/text()").toString();
String brand_img = html.xpath("//img[@id=\"brand-img\"]/@src").toString();
String brand_describe = html.xpath("//p[@id=\"brand-describe\"]/text()").toString();
String location_text = html.xpath("//span[@id=\"location-text\"]/text()").toString();
I doubt I'm the only one who has run into this. Hehe.
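On the object-count worry: since the rule is fixed in the constructor and the text is passed per call, one selector per rule can probably be built once and reused for every page. Below is a minimal sketch of that idea, not anything WebMagic ships. SelectorCache and SelectorCacheDemo are hypothetical names, and Pattern::compile only stands in for JsonPathSelector::new so the example runs with the JDK alone (both take the rule as a String). It assumes the selector keeps no per-text state, which seems consistent with the test above reusing one instance for both select and selectList.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.regex.Pattern;

class SelectorCacheDemo {

    /** Hypothetical helper (not part of WebMagic): one selector object per rule string. */
    static final class SelectorCache<S> {
        private final Map<String, S> byRule = new ConcurrentHashMap<>();
        private final Function<String, S> factory;

        SelectorCache(Function<String, S> factory) {
            this.factory = factory;
        }

        /** Returns the selector for this rule, creating it only on first use. */
        S get(String rule) {
            return byRule.computeIfAbsent(rule, factory);
        }

        /** Number of distinct rules seen so far. */
        int size() {
            return byRule.size();
        }
    }

    public static void main(String[] args) {
        // Stand-in factory: Pattern.compile(rule) plays the role of new JsonPathSelector(rule).
        SelectorCache<Pattern> cache = new SelectorCache<>(Pattern::compile);

        String rule = "\"name\":\"([^\"]*)\"";
        Pattern first = cache.get(rule);
        Pattern second = cache.get(rule);

        System.out.println(first == second); // prints "true": the same object is reused
        System.out.println(cache.size());    // prints "1": only one rule was cached
    }
}
```

With WebMagic on the classpath, new SelectorCache<>(JsonPathSelector::new) should fit the same shape, and cache.get(rule).selectList(page.getRawText()) would then apply a cached rule to each page's JSON. As for the style mismatch with html.xpath(...): if I recall correctly, newer WebMagic versions also offer page.getJson().jsonPath(rule), which reads much like the xpath chain, but that is worth checking against the version you are using.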