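Parsing an XML message that contains CDATA in Java with dom4j
Background
A couple of days ago at work I called an external web service and found that its response message was formatted differently from what I usually see. Parsing it kept failing, so I am writing the solution down here.
Message format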
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<ns1:result xmlns:ns1="http://baidu.com/">
<return><![CDATA[<?xml version="1.0" encoding="UTF-8"?><Response>
<responsebody>
<respInfo>
<resultCode>1</resultCode>
<resultMsg>success</resultMsg>
<users>
<user>
<name>张三</name>
<sex>男</sex>
<age>18</age>
<stuends>
<stuend>
<score>80</score>
<height>20</height>
</stuend>
</stuends>
</user>
</users>
</respInfo>
</responsebody>
</Response>]]></return>
</ns1:result>
</soap:Body>
</soap:Envelope>
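Approach
In the middle of the XML message, part of the data is wrapped in <![CDATA[wrapped data]]>, so the usual parsing approach does not work. We first need to get past the outer XML header, navigate to the return node, and then process the data inside the CDATA section as a separate XML document.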
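The two-stage idea can be sketched without any third-party library. The example below is only an illustration under stated assumptions: it uses the JDK's built-in DOM parser rather than dom4j, the message is a shortened version of the one above, and the class and method names (CdataTwoStageSketch, innerXml, resultCode) are made up for this sketch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustration only: JDK DOM parser, not dom4j. All names here are hypothetical.
final class CdataTwoStageSketch {

    // Shortened version of the SOAP message shown above.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
        + "<soap:Body><ns1:result xmlns:ns1=\"http://baidu.com/\">"
        + "<return><![CDATA[<?xml version=\"1.0\" encoding=\"UTF-8\"?><Response>"
        + "<responsebody><respInfo><resultCode>1</resultCode>"
        + "<resultMsg>success</resultMsg></respInfo></responsebody>"
        + "</Response>]]></return>"
        + "</ns1:result></soap:Body></soap:Envelope>";

    // Parses a string into a DOM document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Stage 1: to the outer parser, the CDATA content is just the text of <return>.
    static String innerXml(String soapXml) {
        return parse(soapXml).getElementsByTagName("return").item(0).getTextContent();
    }

    // Stage 2: that text is parsed again as a document of its own.
    static String resultCode(String soapXml) {
        Document inner = parse(innerXml(soapXml));
        return inner.getElementsByTagName("resultCode").item(0).getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(innerXml(SAMPLE));
        System.out.println("resultCode: " + resultCode(SAMPLE));
    }
}
```

The point of the sketch is that the outer parser never sees Response as an element. It only sees a string, which is why a second parse is needed.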
Solution
Generate entity classes that match the node structure
First create the entity classes. Lombok is required.
Response.java
import lombok.*;
/**
* @Author: xs
* @Description: Parse an XML message that contains CDATA in Java with dom4j
* @Date:Create:in 2020/6/28 15:48
* @Modified By:
*/
@Builder
@Data
@AllArgsConstructor(access = AccessLevel.PRIVATE)
@NoArgsConstructor
public class Response {
private ResponseBody responsebody;
}
ResponseBody.java
import lombok.*;
/**
* @Author: xs
* @Description:
* @Date:Create:in 2020/6/28 15:49
* @Modified By:
*/
@Builder
@Data
@AllArgsConstructor(access = AccessLevel.PRIVATE)
@NoArgsConstructor
public class ResponseBody {
private RespInfo respInfo;
}
RespInfo.java
import lombok.*;
import java.util.List;
/**
* @Author: xs
* @Description:
* @Date:Create:in 2020/6/28 15:52
* @Modified By:
*/
@Builder
@Data
@AllArgsConstructor(access = AccessLevel.PRIVATE)
@NoArgsConstructor
public class RespInfo {
private String resultCode;
private String resultMsg;
private List<Users> users;
}
Users.java
import lombok.*;
import java.util.List;
/**
* @Author: xs
* @Description:
* @Date:Create:in 2020/6/28 15:53
* @Modified By:
*/
@Builder
@Data
@AllArgsConstructor(access = AccessLevel.PRIVATE)
@NoArgsConstructor
public class Users {
private String name;
private String sex;
private String age;
private List<Stuends> stuends;
}
Stuends.java
import lombok.*;
/**
* @Author: xs
* @Description:
* @Date:Create:in 2020/6/28 15:55
* @Modified By:
*/
@Builder
@Data
@AllArgsConstructor(access = AccessLevel.PRIVATE)
@NoArgsConstructor
public class Stuends {
private String score;
private String height;
}
Parser class
testXml.java
import com.thoughtworks.xstream.XStream;
import com.thoughtworks.xstream.io.xml.DomDriver;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
/**
* @Author: xs
* @Description:
* @Date:Create:in 2020/6/28 15:56
* @Modified By:
*/
public class testXml {
public static void main(String[] args) {
String str = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
"<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">\n" +
" <soap:Body>\n" +
" <ns1:result xmlns:ns1=\"http://baidu.com/\">\n" +
" <return><![CDATA[<?xml version=\"1.0\" encoding=\"UTF-8\"?><Response>\n" +
" <responsebody>\n" +
" <respInfo>\n" +
" <resultCode>1</resultCode>\n" +
" <resultMsg>success</resultMsg>\n" +
" <users>\n" +
" <user>\n" +
" <name>张三</name>\n" +
" <sex>男</sex>\n" +
" <age>18</age>\n" +
" <stuends>\n" +
" <stuend>\n" +
" <score>80</score>\n" +
" <height>20</height>\n" +
" </stuend>\n" +
" </stuends>\n" +
" </user>\n" +
" </users>\n" +
" </respInfo>\n" +
" </responsebody>\n" +
"</Response>]]></return>\n" +
" </ns1:result>\n" +
" </soap:Body>\n" +
"</soap:Envelope>";
Response response = test(str);
String name = response.getResponsebody().getRespInfo().getUsers().get(0).getName();
System.out.println("name: "+name);
}
public static Response test(String xml){
Response response = new Response();
try{
Document document = DocumentHelper.parseText(xml);
String returnStr = document.getRootElement().element("Body").element("result").element("return").getText();
// XStream is initialized with new DomDriver() here because the xpp3_min jar is missing; if that jar is on the classpath, new DomDriver() can be omitted
XStream xStream = new XStream(new DomDriver());
xStream.alias("Response",Response.class);
xStream.alias("responsebody",ResponseBody.class);
xStream.alias("respInfo",RespInfo.class);
xStream.alias("user",Users.class);
xStream.alias("stuend",Stuends.class);
Document document1 = DocumentHelper.parseText(returnStr);
String returnXml = document1.getRootElement().asXML();
response = (Response) xStream.fromXML(returnXml);
}catch (Exception e){
e.printStackTrace();
}
return response;
}
}
Output
name: 张三
Process finished with exit code 0
Notes
Using XStream requires three jars:
<dependency>
<groupId>com.thoughtworks.xstream</groupId>
<artifactId>xstream</artifactId>
<version>1.4</version>
</dependency>
<dependency>
<groupId>xmlpull</groupId>
<artifactId>xmlpull</artifactId>
<version>1.1.3.1</version>
</dependency>
<dependency>
<groupId>xpp3</groupId>
<artifactId>xpp3_min</artifactId>
<version>1.1.4c</version>
</dependency>
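If xpp3_min is missing, initializing XStream fails with an error. As a workaround, pass new DomDriver() when creating the XStream instance.
Using asXML() at the return level
String returnStr = document.getRootElement().element("Body").element("result").element("return").asXML();
This causes an error.
The reason is that getText() returns the text content of the current node itself, CDATA included. It does not return the markup of child elements, so for a node that contains only child elements the result is empty. For the message above, getText() on the return node yields the following: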
<?xml version="1.0" encoding="UTF-8"?><Response>
<responsebody>
<respInfo>
<resultCode>1</resultCode>
<resultMsg>success</resultMsg>
<users>
<user>
<name>张三</name>
<sex>男</sex>
<age>18</age>
<stuends>
<stuend>
<score>80</score>
<height>20</height>
</stuend>
</stuends>
</user>
</users>
</respInfo>
</responsebody>
</Response>
asXML(), on the other hand, returns the whole node (element) serialized as a String, from its start tag to its end tag:
<return><![CDATA[<?xml version="1.0" encoding="UTF-8"?><Response>
<responsebody>
<respInfo>
<resultCode>1</resultCode>
<resultMsg>success</resultMsg>
<users>
<user>
<name>张三</name>
<sex>男</sex>
<age>18</age>
<stuends>
<stuend>
<score>80</score>
<height>20</height>
</stuend>
</stuends>
</user>
</users>
</respInfo>
</responsebody>
</Response>]]></return>
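So asXML() cannot be used here: the resulting string still has return as its root element, with the payload wrapped in CDATA.
Using getText() at the Response level
String returnXml = document1.getRootElement().getText();
This also causes an error.
The reason is that getText() returns no markup here, because the Response node has only child elements and no text of its own apart from whitespace,
whereas asXML() returns: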
<Response>
<responsebody>
<respInfo>
<resultCode>1</resultCode>
<resultMsg>success</resultMsg>
<users>
<user>
<name>张三</name>
<sex>男</sex>
<age>18</age>
<stuends>
<stuend>
<score>80</score>
<height>20</height>
</stuend>
</stuends>
</user>
</users>
</respInfo>
</responsebody>
</Response>
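The difference between the two calls at each level can be mimicked with the JDK's DOM API. This is a rough sketch, not dom4j's implementation: directText and markup are hypothetical helpers that only approximate getText() and asXML(), and the input is a cut-down message.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: JDK DOM, not dom4j. Helper names are hypothetical.
final class TextVersusMarkupSketch {

    // Parses a string and returns its root element.
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Approximates getText(): only the element's own text and CDATA children.
    static String directText(Element element) {
        StringBuilder text = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                text.append(child.getNodeValue());
            }
        }
        return text.toString();
    }

    // Approximates asXML(): the element serialized together with its own tags.
    static String markup(Element element) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(element), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize element", e);
        }
    }

    public static void main(String[] args) {
        Element ret = root("<return><![CDATA[<Response><a>1</a></Response>]]></return>");
        // At the return level the text is the inner document; the markup still starts with <return>.
        System.out.println("text:   " + directText(ret));
        System.out.println("markup: " + markup(ret));

        Element response = root(directText(ret));
        // At the Response level there is no direct text; the markup is the document we want.
        System.out.println("text:   [" + directText(response) + "]");
        System.out.println("markup: " + markup(response));
    }
}
```

In short, take the text at the level that wraps the CDATA and the serialized markup at the level of the real payload.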
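Summary
Going from the input xml to returnStr, we drop the outer XML header and envelope and reach the CDATA level. Going from returnStr to returnXml, we drop the XML header inside the CDATA.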