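There are two common techniques for parsing XML: DOM and SAX. DOM reads the whole document into an in-memory tree that can then be navigated freely. SAX streams through the document and reports each element to callbacks as it goes, without keeping a tree.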
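For comparison, here is a minimal sketch that uses only the JDK's built-in javax.xml.parsers API (no Dom4j). Both versions run against a small in-memory copy of the sample document and return the student ids. The class and method names (DomVsSax, idsWithDom, idsWithSax) are hypothetical. The DOM version can only query the data after the full tree has been built. The SAX version gets one startElement callback per tag and keeps nothing but the ids. This plain JDK SAX parser is presumably what the conclusion means by "native SAX".

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class DomVsSax {
    // DOM: the parser builds the entire tree in memory before we can query it.
    static List<String> idsWithDom(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        NodeList students = doc.getElementsByTagName("student");
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < students.getLength(); i++) {
            ids.add(((Element) students.item(i)).getAttribute("id"));
        }
        return ids;
    }

    // SAX: the parser streams the input and calls back on each start tag; no tree is kept.
    static List<String> idsWithSax(InputStream in) throws Exception {
        List<String> ids = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("student".equals(qName)) {
                    ids.add(attrs.getValue("id"));
                }
            }
        });
        return ids;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?><students>"
                + "<student id=\"01\"><name>Zhang San</name><age>18</age></student>"
                + "<student id=\"02\"><name>Li Si</name><age>28</age></student>"
                + "</students>").getBytes(StandardCharsets.UTF_8);
        System.out.println(idsWithDom(new ByteArrayInputStream(xml))); // [01, 02]
        System.out.println(idsWithSax(new ByteArrayInputStream(xml))); // [01, 02]
    }
}
```

Both calls print [01, 02]. The difference shows up only in memory use and speed on large inputs.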
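In practice we often use the Dom4j framework to parse XML. Dom4j can work in two ways, referred to below as Dom4j_dom and Dom4j_sax. Both read the file through SAXReader. The difference is that Dom4j_dom builds the full document tree and then queries it, while Dom4j_sax handles each element as soon as it has been read and then discards it.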
Performance experiment: both approaches parse the same 4 MB XML file.
The XML file looks like this:
<?xml version="1.0" encoding="gb2312"?>
<students>
    <student id="01">
        <name>张三</name>
        <age>18</age>
    </student>
    <student id="02">
        <name>李四</name>
        <age>28</age>
    </student>
    <student id="01">
        <name>张三</name>
        <age>18</age>
    </student>
    <student id="02">
        <name>李四</name>
        <age>28</age>
    </student>
    <!-- ... n student nodes in total -->
</students>
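The post does not say how many student nodes the 4 MB file contains. To reproduce a file of similar size, here is a hypothetical generator (StudentXmlGenerator and generate are illustrative names, not from the original). It repeats the two sample records n times. It writes UTF-8 rather than the sample's gb2312 so that it runs on any JDK, and it uses ASCII names for the same reason. By my count each record pair is about 152 bytes, so the arbitrary default of n = 50,000 gives roughly 3.8 MB. Check the printed size and adjust n as needed.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StudentXmlGenerator {
    // Writes a <students> document with n <student> records, alternating the two sample records.
    // Assumption: UTF-8 output instead of the original gb2312.
    static void generate(Path target, int n) throws IOException {
        try (Writer w = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<students>\n");
            for (int i = 0; i < n; i++) {
                boolean first = i % 2 == 0;
                w.write("  <student id=\"" + (first ? "01" : "02") + "\">\n");
                w.write("    <name>" + (first ? "Zhang San" : "Li Si") + "</name>\n");
                w.write("    <age>" + (first ? "18" : "28") + "</age>\n");
                w.write("  </student>\n");
            }
            w.write("</students>\n");
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get(args.length > 0 ? args[0] : "student.xml");
        int n = args.length > 1 ? Integer.parseInt(args[1]) : 50_000; // arbitrary default
        generate(out, n);
        System.out.println(out + ": " + Files.size(out) + " bytes");
    }
}
```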
The test code:
package Dom4jSample;

import java.io.File;
import java.io.UnsupportedEncodingException;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.ElementHandler;
import org.dom4j.ElementPath;
import org.dom4j.io.SAXReader;

public class Dom4jTest {

    public static void main(String[] args) throws DocumentException, UnsupportedEncodingException {
        // Each method is timed once in the same JVM, so whichever runs first also pays
        // for class loading and JIT warm-up; swap the order to check how much that matters.
        long startTime = System.currentTimeMillis();
        Dom4j_sax();
        System.out.println("Dom4j_sax cost time = " + (System.currentTimeMillis() - startTime));

        startTime = System.currentTimeMillis();
        Dom4j_dom();
        System.out.println("Dom4j_dom cost time = " + (System.currentTimeMillis() - startTime));
    }

    /**
     * dom4j can parse XML in two ways: DOM style and SAX style.
     * Steps for the SAX style:
     * 1. Add the jars dom4j-1.6.1.jar and jaxen-1.1.6.jar (jaxen is needed by selectNodes() in Dom4j_dom)
     * 2. SAXReader saxReader = new SAXReader()
     * 3. saxReader.addHandler(element path, an ElementHandler implementation; an anonymous class here)
     * 4. saxReader.read()
     */
    public static void Dom4j_sax() throws UnsupportedEncodingException, DocumentException {
        SAXReader saxReader = new SAXReader();
        // First argument: the path of the elements to handle; second: the handler (an anonymous class)
        saxReader.addHandler("/students/student", new ElementHandler() {
            @Override
            public void onStart(ElementPath path) {
            }

            @Override
            public void onEnd(ElementPath path) {
                Element studentNode = path.getCurrent(); // the <student> element just completed
                String id = studentNode.attributeValue("id"); // its id attribute
                // System.out.println("----" + id);
                // Detach the processed node so it can be garbage-collected; otherwise the whole
                // tree accumulates in memory and a large file can cause an OutOfMemoryError.
                studentNode.detach();
            }
        });
        saxReader.read(new File("D:\\qinshipeng\\dom4j\\student.xml"));

        /* Parsing from a String instead (needs import java.io.ByteArrayInputStream).
           The declared encoding must match the bytes produced by getBytes().
        String text = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><students><student id=\"22\"><name>张三</name>"
                + "<age>18</age></student></students>";
        saxReader.read(new ByteArrayInputStream(text.getBytes("UTF-8")));
        // or: DocumentHelper.parseText(text);
        */
    }

    public static void Dom4j_dom() throws UnsupportedEncodingException, DocumentException {
        SAXReader saxReader = new SAXReader();
        Document doc = saxReader.read(new File("D:\\qinshipeng\\dom4j\\student.xml"));
        // selectNodes returns all <student> nodes (time-consuming). It returns a raw List
        // in dom4j 1.6.1 and List<Node> in 2.x, so each item is cast to Element.
        for (Object node : doc.selectNodes("/students/student")) {
            Element e = (Element) node;
            String id = e.attributeValue("id");
        }
    }
}
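Timing each method once with currentTimeMillis() mixes parse cost with JVM warm-up. Here is a minimal sketch of a fairer harness: a few untimed warm-up runs followed by the best of several timed runs. ParseTimer and bestOfMillis are hypothetical names. The placeholder workload in main stands in for a call such as Dom4jTest.Dom4j_sax(). I chose Callable over Runnable because the Dom4j methods throw checked exceptions.

```java
import java.util.concurrent.Callable;

public class ParseTimer {
    // Runs the task `warmups` times untimed (class loading, JIT, OS file cache),
    // then returns the fastest of `runs` timed executions in milliseconds.
    static long bestOfMillis(Callable<?> task, int warmups, int runs) throws Exception {
        for (int i = 0; i < warmups; i++) {
            task.call();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.call();
            best = Math.min(best, (System.nanoTime() - start) / 1_000_000);
        }
        return best;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder workload; in the article this would be e.g.
        // () -> { Dom4jTest.Dom4j_sax(); return null; }
        long ms = bestOfMillis(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
            return sb.length();
        }, 3, 5);
        System.out.println("best run: " + ms + " ms");
    }
}
```

Taking the minimum rather than the mean filters out runs disturbed by garbage collection or other processes. The downside is that it hides GC cost, and that cost is part of the difference between the DOM and SAX styles. Reporting both values is a reasonable compromise.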
Experimental results:
Conclusion: for parsing large files, performance ranks native SAX > Dom4j_sax > Dom4j_dom (fastest first).