Reference: http://kodejava.org/how-do-i-get-mixed-content-of-an-xml-element/
The code below demonstrates how to read CDATA data from an XML document with JDOM.
The XML to be parsed is:
<root><data><!-- This element contains application data -->User Information<![CDATA[<table><tr><td>-data-</td></tr></table>]]><field name="username">alice</field></data></root>
The code:
package org.kodejava.example.jdom;

import org.jdom.CDATA;
import org.jdom.Comment;
import org.jdom.DocType;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.EntityRef;
import org.jdom.JDOMException;
import org.jdom.ProcessingInstruction;
import org.jdom.Text;
import org.jdom.input.SAXBuilder;

import java.io.IOException;
import java.io.StringReader;
import java.util.List;

public class JDOMMixedContent {
    public static void main(String[] args) {
        String xml =
                "<root>" +
                "<data>" +
                "<!-- This element contains application data -->" +
                "User Information" +
                "<![CDATA[<table><tr><td>-data-</td></tr></table>]]>" +
                "<field name=\"username\">alice</field>" +
                "</data>" +
                "</root>";

        SAXBuilder builder = new SAXBuilder();
        try {
            Document document = builder.build(new StringReader(xml));
            Element root = document.getRootElement();
            Element data = root.getChild("data");

            //
            // Read the mixed content of the <data> element and iterate over
            // the resulting list. For an element, this list can contain any of
            // the following objects: Comment, Element, CDATA,
            // ProcessingInstruction, EntityRef and Text. (DocType only occurs
            // in a Document's content; its branch is kept for completeness.)
            // CDATA is a subclass of Text, so it must be tested before Text.
            //
            List content = data.getContent();
            for (Object o : content) {
                if (o instanceof Comment) {
                    Comment comment = (Comment) o;
                    System.out.println("Comment = " + comment);
                } else if (o instanceof Element) {
                    Element element = (Element) o;
                    System.out.println("Element = " + element);
                } else if (o instanceof CDATA) {
                    CDATA cdata = (CDATA) o;
                    System.out.println("CDATA = " + cdata);
                } else if (o instanceof DocType) {
                    DocType docType = (DocType) o;
                    System.out.println("DocType = " + docType);
                } else if (o instanceof ProcessingInstruction) {
                    ProcessingInstruction pi = (ProcessingInstruction) o;
                    System.out.println("PI = " + pi);
                } else if (o instanceof EntityRef) {
                    EntityRef entityRef = (EntityRef) o;
                    System.out.println("EntityRef = " + entityRef);
                } else if (o instanceof Text) {
                    Text text = (Text) o;
                    System.out.println("Text = " + text);
                }
            }

            //
            // Remove the third item of the mixed content (index 2), which is
            // the CDATA section. The list returned by getContent() is live,
            // so this also removes the CDATA from the <data> element.
            //
            content.remove(2);
        } catch (JDOMException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Output:
Comment = [Comment: <!-- This element contains application data -->]
Text = [Text: User Information]
CDATA = [CDATA: <table><tr><td>-data-</td></tr></table>]
Element = [Element: <field/>]
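Explanation:
The for loop over data.getContent() walks through the content of the data element. As the XML above shows, data actually holds four pieces of content: a comment, a text node ("User Information"), a CDATA section and a field element. The code checks the runtime type of each item (Comment, Element, CDATA and so on), casts it to that type and then reads its actual content. This way, no matter how mixed the content under data is, every piece is handled.
Note:
In the following XML, the data element appears to contain nothing but a single CDATA section: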
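For comparison, the same mixed content can be inspected with the JDK's built-in org.w3c.dom parser, which needs no extra jar. This is a swapped-in illustration, not part of the original JDOM example: the class DomMixedContent and its method describe are hypothetical names, and the sketch switches on Node.getNodeType() where the JDOM code uses an instanceof chain.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: lists the mixed content of the first <data> element
// using the JDK's DOM API (org.w3c.dom), not JDOM.
public class DomMixedContent {

    public static List<String> describe(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Keep CDATA sections as separate nodes instead of merging them
            // into neighbouring text (false is also the default).
            factory.setCoalescing(false);
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
        Node data = doc.getElementsByTagName("data").item(0);
        List<String> lines = new ArrayList<>();
        NodeList children = data.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            // Dispatch on the DOM node type, the counterpart of the instanceof chain.
            switch (n.getNodeType()) {
                case Node.COMMENT_NODE:
                    lines.add("Comment = " + n.getNodeValue().trim());
                    break;
                case Node.CDATA_SECTION_NODE:
                    lines.add("CDATA = " + n.getNodeValue());
                    break;
                case Node.TEXT_NODE:
                    lines.add("Text = " + n.getNodeValue());
                    break;
                case Node.ELEMENT_NODE:
                    lines.add("Element = " + n.getNodeName() + ": " + n.getTextContent());
                    break;
                default:
                    lines.add("Other = " + n.getNodeName());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String xml = "<root><data>"
                + "<!-- This element contains application data -->"
                + "User Information"
                + "<![CDATA[<table><tr><td>-data-</td></tr></table>]]>"
                + "<field name=\"username\">alice</field>"
                + "</data></root>";
        for (String line : describe(xml)) {
            System.out.println(line);
        }
    }
}
```

Running main should list the four children in the same order as the JDOM output: Comment, Text, CDATA, Element.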
<data>
<![CDATA[@Ho>40D9\/%\")\/Q}16pG:5;:!fUqM+TnH6'@6R(/Vp;^:{KOj3?[.AvFGP\\"yFk>C@]]>
</data>
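It does not. The data element actually holds three pieces of content, in order:
- a whitespace-only string of type Text (the line break and indentation before the CDATA section, such as "\n\t\t\t"; the exact characters depend on how the file is indented)
- <![CDATA[@Ho>40D9\/%\")\/Q}16pG:5;:!fUqM+TnH6'@6R(/Vp;^:{KOj3?[.AvFGP\\"yFk>C@]]>, of type CDATA
- another whitespace-only string of type Text (the line break and indentation before </data>, such as "\n\t\t\t")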
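The extra whitespace nodes are easy to confirm. The sketch below again uses the JDK's org.w3c.dom parser rather than JDOM, and the class WhitespaceAroundCdata and its method childTypes are hypothetical names. It parses an indented data element (with a shortened CDATA payload) using a default, non-validating parser and lists the type of each child node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper built on the JDK's DOM API, not JDOM.
public class WhitespaceAroundCdata {

    // Returns "Text", "CDATA" or the node name for each child of the root element.
    public static List<String> childTypes(String xml) {
        Node root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
        List<String> types = new ArrayList<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.CDATA_SECTION_NODE) {
                types.add("CDATA");
            } else if (n.getNodeType() == Node.TEXT_NODE) {
                types.add("Text");
            } else {
                types.add(n.getNodeName());
            }
        }
        return types;
    }

    public static void main(String[] args) {
        String xml = "<data>\n\t\t\t<![CDATA[@Ho>40D9]]>\n\t\t</data>";
        System.out.println(childTypes(xml)); // [Text, CDATA, Text]
    }
}
```

Without a DTD and validation, the parser has no way to know that this whitespace is insignificant, so it keeps it as ordinary text nodes.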
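Also note: in each branch of the if/else chain in the example above, it is best to call .getValue().trim() on the object cast from o before using it. This keeps surrounding whitespace, such as the line breaks shown above, out of the result.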
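A sketch of that advice, once more with the JDK's DOM API standing in for JDOM: here Node.getTextContent() plays the role of JDOM's getValue(), and the class TrimmedContent and its method trimmedValues are hypothetical names. Each child's value is trimmed, and values that end up empty (the whitespace-only Text nodes) are skipped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper built on the JDK's DOM API, not JDOM.
public class TrimmedContent {

    // Trimmed value of every child of the root element, skipping whitespace-only ones.
    public static List<String> trimmedValues(String xml) {
        Node root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
        List<String> values = new ArrayList<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            // getTextContent() works for Text, CDATA, Comment and Element nodes alike.
            String value = children.item(i).getTextContent().trim();
            if (!value.isEmpty()) {
                values.add(value);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String xml = "<data>\n\t\t\t<![CDATA[@Ho>40D9]]>\n\t\t</data>";
        System.out.println(trimmedValues(xml)); // [@Ho>40D9]
    }
}
```

Trimming also strips whitespace that genuinely belongs to a CDATA section, so leave out the trim where leading or trailing spaces in the data matter.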