As the title says, crawlers often need to convert data, for example turning fetched JSON into a String or into an entity.
Suppose we have the following URL:
https://www.hko.gov.hk/cis/aws/aws.xml
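To fetch its content, I use a method like the one below.
// Takes the URL and some header values (some sites have anti-crawler measures). Which headers are needed can be worked out one by one with the Chrome DevTools console and Postman.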
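import java.util.Map;
import javax.net.ssl.SSLContext;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.conn.ssl.NoopHostnameVerifier;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.ssl.SSLContextBuilder;
import org.apache.http.util.EntityUtils;

public static String doGetOnther(String urls, Map<String, String> head) throws Exception {
    // Trust every certificate and skip hostname verification.
    // Convenient for crawling, but unsafe for anything sensitive.
    SSLContext sslContext = new SSLContextBuilder()
            .loadTrustMaterial(null, (certificate, authType) -> true).build();
    RequestConfig requestConfig = RequestConfig.custom()
            .setConnectionRequestTimeout(20000)
            .setSocketTimeout(20000)
            .setConnectTimeout(50000).build();
    HttpGet httpGet = new HttpGet(urls);
    httpGet.setConfig(requestConfig);
    // Set the defaults first so that caller-supplied headers can override them.
    httpGet.setHeader("accept", "*/*");
    httpGet.setHeader("connection", "Keep-Alive");
    httpGet.setHeader("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1)");
    if (head != null) {
        for (Map.Entry<String, String> entry : head.entrySet()) {
            httpGet.setHeader(entry.getKey(), entry.getValue());
        }
    }
    // try-with-resources closes both the client and the response.
    try (CloseableHttpClient client = HttpClients.custom()
                 .setSslcontext(sslContext)
                 .setSSLHostnameVerifier(new NoopHostnameVerifier())
                 .build();
         CloseableHttpResponse response = client.execute(httpGet)) {
        return EntityUtils.toString(response.getEntity(), "UTF-8");
    }
}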
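The `head` argument is simply a map from header name to header value. Below is a minimal sketch of building one. The class name `CrawlerHeaders`, the helper `build`, and the choice of `Referer` and `Cookie` are assumptions for illustration only; which headers a given site actually checks has to be found with DevTools or Postman, as noted above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CrawlerHeaders {

    // Hypothetical helper: collects the extra headers to pass as `head`.
    // A null or empty cookie is simply left out.
    public static Map<String, String> build(String referer, String cookie) {
        Map<String, String> head = new LinkedHashMap<>();
        head.put("Referer", referer);
        if (cookie != null && !cookie.isEmpty()) {
            head.put("Cookie", cookie);
        }
        return head;
    }

    public static void main(String[] args) {
        // The Referer value here is an assumed example, not a known requirement of the site.
        Map<String, String> head = build("https://www.hko.gov.hk/", null);
        System.out.println(head);
        // The map would then be passed on, e.g.:
        // String json = doGetOnther("https://www.hko.gov.hk/cis/aws/aws.xml", head);
    }
}
```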
Calling doGetOnther returns a batch of data, as a String of course (I removed part of the content to keep it short):
{"aws":[
{"code":"HKA","eng":"Hong Kong International Airport^","chi":"香港國際機場^"},
{"code":"R18","eng":"Sai Kung Sam Yuk Middle School","chi":"西貢三育中學"},
{"code":"R27","eng":"Yuen Long","chi":"元朗"},
{"code":"R28","eng":"Au Tau","chi":"凹頭"},
{"code":"R29","eng":"Lok Ma Chau","chi":"落馬洲"}
]}
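Formatting the response makes its structure clear.
The structure is Object → Array → Object.
In general, the outer Object can be parsed as key-value pairs; from it we take the array, and finally each Object inside the array.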
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;

// jsonString is the String returned by doGetOnther
JSONObject resultObject = JSON.parseObject(jsonString);
JSONArray resultArray = resultObject.getJSONArray("aws");
for (int i = 0; i < resultArray.size(); i++) {
    JSONObject partDaily = resultArray.getJSONObject(i);
    // getString returns null when the key is absent from the element
    String codeString = partDaily.getString("code");
    String startYearString = partDaily.getString("startYear");
    String startMonthString = partDaily.getString("startMonth");
    String endYearString = partDaily.getString("endYear");
    String endMonthString = partDaily.getString("endMonth");
    System.out.println(codeString);
    System.out.println(startYearString);
    System.out.println(startMonthString);
    System.out.println(endYearString);
    System.out.println(endMonthString);
}
The corresponding values then show up in the debugger,
and in the console output.
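When the element fields are known up front, the same Object → Array → Object walk can also map straight onto a typed class instead of reading each key by hand. The sketch below assumes fastjson 1.x is on the classpath; the `StationParser` and `Station` names are hypothetical, and `Station` only covers the three fields visible in the trimmed sample above.

```java
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import java.util.List;

public class StationParser {

    // Hypothetical entity for one element of the "aws" array.
    public static class Station {
        private String code;
        private String eng;
        private String chi;

        public String getCode() { return code; }
        public void setCode(String code) { this.code = code; }
        public String getEng() { return eng; }
        public void setEng(String eng) { this.eng = eng; }
        public String getChi() { return chi; }
        public void setChi(String chi) { this.chi = chi; }
    }

    // Outer Object -> "aws" array -> one Station per inner Object.
    public static List<Station> parse(String jsonString) {
        JSONObject root = JSON.parseObject(jsonString);
        return JSON.parseArray(root.getString("aws"), Station.class);
    }

    public static void main(String[] args) {
        String json = "{\"aws\":[{\"code\":\"R27\",\"eng\":\"Yuen Long\",\"chi\":\"元朗\"}]}";
        for (Station s : parse(json)) {
            System.out.println(s.getCode() + " " + s.getEng() + " " + s.getChi());
        }
    }
}
```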
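That is all it takes to read the values. In addition, if the entity class has many fields that follow a regular pattern, you can first put the data into a Map and then convert the Map into the entity: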
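Map<String, Object> entity = new HashMap<>();
// inside a loop over i; each key must match a field name of the entity
entity.put("number" + i, "value"); // you get the idea (wink~)
............
RunTide runTide = JSON.parseObject(JSON.toJSONString(entity), RunTide.class);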
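A self-contained sketch of that Map-to-entity round trip follows, assuming fastjson 1.x is on the classpath. The real fields of RunTide are not shown in this post, so the nested `RunTide` class here is a hypothetical stand-in with three regularly named fields (`number0` to `number2`), and `fromValues` is an illustrative helper name.

```java
import com.alibaba.fastjson.JSON;
import java.util.HashMap;
import java.util.Map;

public class MapToEntityDemo {

    // Hypothetical stand-in for the real RunTide entity.
    public static class RunTide {
        private String number0;
        private String number1;
        private String number2;

        public String getNumber0() { return number0; }
        public void setNumber0(String number0) { this.number0 = number0; }
        public String getNumber1() { return number1; }
        public void setNumber1(String number1) { this.number1 = number1; }
        public String getNumber2() { return number2; }
        public void setNumber2(String number2) { this.number2 = number2; }
    }

    // Builds keys "number0", "number1", ... in a loop, serializes the Map
    // to JSON, then parses that JSON back into the entity.
    public static RunTide fromValues(String[] values) {
        Map<String, Object> entity = new HashMap<>();
        for (int i = 0; i < values.length; i++) {
            entity.put("number" + i, values[i]);
        }
        return JSON.parseObject(JSON.toJSONString(entity), RunTide.class);
    }

    public static void main(String[] args) {
        RunTide runTide = fromValues(new String[] {"a", "b", "c"});
        System.out.println(runTide.getNumber0() + " " + runTide.getNumber2());
    }
}
```

The round trip through a JSON string costs an extra serialization, but it saves writing one setter call per field when the names follow a pattern.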