Performance comparison of Dom4j's SAX and DOM approaches on large XML files

Copyright notice: this is an original article by the author and may not be reproduced without the author's permission. https://blog.csdn.net/u014385013/article/details/79190774

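There are two common techniques for parsing XML: DOM and SAX. DOM loads the whole document into an in-memory tree that can then be navigated freely, while SAX reads the document sequentially and reports each element to callbacks as it goes, without keeping the document in memory.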
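
To make the difference concrete, here is a minimal sketch that counts elements both ways using only the parsers bundled with the JDK (not Dom4j). The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class DomVsSaxSketch {

    // DOM: the whole document becomes a tree first, then the tree is queried.
    static int countWithDom(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName(tag).getLength();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }

    // SAX: no tree is built; the parser calls startElement once per start tag.
    static int countWithSax(String xml, String tag) {
        final int[] count = {0};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes attrs) {
                            if (tag.equals(qName)) {
                                count[0]++;
                            }
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<students><student id=\"01\"/><student id=\"02\"/></students>";
        System.out.println("DOM count = " + countWithDom(xml, "student"));
        System.out.println("SAX count = " + countWithSax(xml, "student"));
    }
}
```

Both methods return the same count; what differs is how much of the document is held in memory while counting.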


We commonly use the Dom4j framework to parse XML. Dom4j can parse XML in two ways: the Dom4j_dom approach and the Dom4j_sax approach.

Performance experiment: both approaches parse the same 4 MB XML file.
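
If you need a test file of that size, a generator along the following lines could produce one. This is only a sketch under assumptions: the output file name, the UTF-8 encoding and the ASCII placeholder names are my choices, not necessarily how the original file was made.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical generator, for illustration only.
class StudentXmlGenerator {

    // Writes alternating student records until at least targetBytes of
    // header plus records have been written; returns the record count.
    static int generate(Path out, long targetBytes) {
        int count = 0;
        long written = 0;
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String header = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<students>\n";
            w.write(header);
            written += header.length(); // ASCII only, so chars == bytes
            while (written < targetBytes) {
                int k = count % 2; // alternate between the two sample students
                String record = "    <student id=\"0" + (k + 1) + "\">\n"
                        + "        <name>" + (k == 0 ? "Zhang San" : "Li Si") + "</name>\n"
                        + "        <age>" + (k == 0 ? 18 : 28) + "</age>\n"
                        + "    </student>\n";
                w.write(record);
                written += record.length();
                count++;
            }
            w.write("</students>\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed output location: student.xml in the working directory.
        int n = generate(Paths.get("student.xml"), 4L * 1024 * 1024);
        System.out.println("wrote " + n + " student records");
    }
}
```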

The XML file looks like this:

<?xml version="1.0" encoding="gb2312"?>

<students>  
    <student id="01">
        <name>张三</name>     
        <age>18</age>   
    </student>    
    <student id="02">       
        <name>李四</name>     
        <age>28</age>   
    </student>
    <student id="01">       
        <name>张三</name>     
        <age>18</age>   
    </student>    
    <student id="02">       
        <name>李四</name>     
        <age>28</age>   
    </student>
    <!-- ... n more student nodes -->
</students>


The code is as follows:

package Dom4jSample;

import java.io.File;
import java.io.UnsupportedEncodingException;
import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.ElementHandler;
import org.dom4j.ElementPath;
import org.dom4j.io.SAXReader;

public class Dom4jTest {

	public static void main(String[] args) throws DocumentException, UnsupportedEncodingException {
		long startTime = System.currentTimeMillis();
		Dom4j_sax();
		System.out.println("Dom4j_sax cost time = " + (System.currentTimeMillis() - startTime));
		
		startTime = System.currentTimeMillis();
		Dom4j_dom();
		System.out.println("Dom4j_dom cost time = " + (System.currentTimeMillis() - startTime));
	}

	/**
	 * Dom4j can parse XML in two ways: DOM style and SAX style.
	 * Steps for the Dom4j SAX style:
	 * 1. Add the jars: Dom4j-1.6.1.jar and jaxen-1.1.6.jar
	 * 2. SAXReader saxReader = new SAXReader()
	 * 3. saxReader.addHandler(path expression, a class implementing ElementHandler; this example uses an anonymous class)
	 * 4. saxReader.read()
	 */
	public static void Dom4j_sax() throws UnsupportedEncodingException, DocumentException{
		SAXReader saxReader = new SAXReader();
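		saxReader.addHandler("/students/student", new ElementHandler(){  // first argument: XPath-style path of the elements to handle; second argument: an anonymous ElementHandler
			public void onEnd(ElementPath elem) {
				Element studentNode = elem.getCurrent();  // the student element that has just ended
				String id = studentNode.attributeValue("id");  // read the id attribute of the student element
//				System.out.println("----" + studentNode.attributeValue("id"));
				studentNode.detach();  // must detach here to release memory; otherwise a large XML file causes an out-of-memory error
			}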
			}

			public void onStart(ElementPath arg0) {

			}});
		saxReader.read(new File("D:\\qinshipeng\\dom4j\\student.xml"));

		/*String text = "<?xml version=\"1.0\" encoding=\"gb2312\"?><students><student sn=\"22\"><name>张三</name>" +
		"<age>18</age></student></students>";
		//ByteArrayInputStream turns the String into an input stream; the bytes must match the declared encoding
		saxReader.read(new ByteArrayInputStream(text.getBytes("gb2312")));
//		DocumentHelper.parseText(text);
		 */
	}

	public static void Dom4j_dom() throws UnsupportedEncodingException, DocumentException{
		SAXReader saxReader = new SAXReader();
		Document doc = saxReader.read(new File("D:\\qinshipeng\\dom4j\\student.xml"));
		List<Element> studentNodes = doc.selectNodes("/students/student"); // returns the list of all student nodes; this is slow
		for(Element e : studentNodes){
			String id = e.attributeValue("id");
		}
	}
}
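
One caveat about the timing in main: whichever method runs first also pays one-time costs such as class loading, so a single run of each is only a rough measure. A small helper along these lines (a sketch, with hypothetical names, not part of the original experiment) would run each task a few times untimed before measuring:

```java
// Hypothetical timing helper, for illustration only.
class TimingSketch {

    // Runs the task warmupRuns times without timing, then returns the
    // average wall-clock time in milliseconds over measuredRuns runs.
    static double averageMillis(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000.0 / measuredRuns;
    }

    public static void main(String[] args) {
        // Stand-in workload; in the experiment this would wrap a parse call.
        Runnable work = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        };
        System.out.println("average ms = " + averageMillis(work, 3, 5));
    }
}
```

Since Dom4j_sax() and Dom4j_dom() throw checked exceptions, they would need a try/catch inside the lambda before being passed as a Runnable.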

Experimental results:


Conclusion: for parsing large files, performance ranks as native SAX > Dom4j_sax > Dom4j_dom
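
The code above does not show the native SAX case. A minimal sketch of what it could look like with the JDK's built-in SAXParser follows. This is not the code used in the experiment, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical native SAX reader, for illustration only.
class NativeSaxSketch {

    // Collects the id attribute of every student element that sits
    // directly under the root element, without building any tree.
    static List<String> readStudentIds(InputStream in) {
        List<String> ids = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
                private int depth = 0;

                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attrs) {
                    depth++;
                    // Depth 2 means "child of the root", roughly what the
                    // /students/student path selects in the Dom4j version.
                    if (depth == 2 && "student".equals(qName)) {
                        ids.add(attrs.getValue("id"));
                    }
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    depth--;
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return ids;
    }

    public static void main(String[] args) {
        String xml = "<students><student id=\"01\"><age>18</age></student>"
                + "<student id=\"02\"><age>28</age></student></students>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readStudentIds(in));
    }
}
```

For a real file, pass a FileInputStream instead of the in-memory stream. No Element objects are created per student, which is one plausible reason native SAX comes out ahead of Dom4j_sax.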
