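1. Document (DOM)
The Document interface belongs to the DOM, an official W3C standard. The parser loads the entire HTML or XML document into memory as a tree of objects (the document object), and the data is then extracted by looping over that tree.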
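To make the "load everything, then loop" idea concrete, here is a minimal sketch. The class name DomSketch, the method provinceNames and the sample XML are hypothetical; the sample only assumes a root element with province children that carry a name attribute.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class DomSketch {
    // Hypothetical sample data, made up for this sketch.
    static final String SAMPLE =
        "<root><province name=\"ProvinceA\" postcode=\"1\">"
      + "<city name=\"CityA1\" postcode=\"11\"/></province>"
      + "<province name=\"ProvinceB\" postcode=\"2\"/></root>";

    // Builds the full in-memory tree first, then loops over the root's children.
    static List<String> provinceNames(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> names = new ArrayList<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Children may also be text or comment nodes, so keep elements only.
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && "province".equals(child.getNodeName())) {
                    names.add(((Element) child).getAttribute("name"));
                }
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

Calling DomSketch.provinceNames(DomSketch.SAMPLE) returns the two province names in document order. Because the whole tree is built before the loop starts, memory use grows with the size of the document.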
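2. SAXParser
SAX is an event-driven "push" model for processing XML. It is not a W3C standard, but it is a widely accepted API, and most SAX parser implementations conform to it.
Unlike DOM, a SAX parser does not build a tree representation of the whole document. It reads the document as a stream and reports an event for each piece of markup it encounters. These events are pushed to event handlers, and the handlers in turn provide access to the document content.
There are three basic types of event handler:
DTDHandler, for accessing the content of the XML DTD;
ErrorHandler, for low-level access to parsing errors;
ContentHandler, the most commonly used type, for accessing the document content.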
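The handler types listed above can be sketched with DefaultHandler, the convenience class in org.xml.sax.helpers that provides empty implementations of all of these interfaces. The class name SaxHandlerSketch and the method elementNames are hypothetical; the sketch overrides one ContentHandler callback and one ErrorHandler callback.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SaxHandlerSketch {
    // Collects element names in the order the parser pushes startElement events.
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // ContentHandler callback: called once for every opening tag.
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                names.add(qName);
            }

            // ErrorHandler callback: recoverable errors are ignored by default,
            // so rethrow them here to stop the parse.
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return names;
    }
}
```

The application never asks for the next event here. It registers the handler once, and the parser drives the callbacks until the document ends.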
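3. XMLStreamReader (StAX)
XMLStreamReader is also a form of stream parsing: it reads the file linearly from beginning to end and, like SAXParser, reports what it finds as events. The difference is that XMLStreamReader uses a "pull" model instead of SAXParser's push model. It has no callback mechanism; it returns events when the application asks for them. StAX also provides a user-friendly API for both reading and writing.
Whereas SAXParser delivers its various events to a ContentHandler, XMLStreamReader returns its events to the application directly, and StAX can even provide them as objects.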
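When the application asks for an event, the parser reads as much of the XML document as it needs and returns that event. StAX also provides factories for creating readers and writers (XMLInputFactory and XMLOutputFactory), so an application can program against the StAX interfaces without depending on the details of a particular implementation.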
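One practical consequence of the pull model is that the application decides when to stop asking. The following sketch illustrates this under assumptions: the class name StaxPullSketch and the method firstAttribute are hypothetical, and the reader comes from the standard XMLInputFactory.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxPullSketch {
    // Returns the given attribute of the first matching element, or null if
    // no such element exists. Parsing stops as soon as a match is found.
    static String firstAttribute(String xml, String element, String attribute) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // The application pulls the next event; nothing is called back.
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT
                            && element.equals(reader.getLocalName())) {
                        return reader.getAttributeValue(null, attribute);
                    }
                }
                return null;
            } finally {
                // XMLStreamReader is not AutoCloseable, so close it explicitly.
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

With a push parser, stopping early usually means throwing an exception out of a callback. Here the loop simply returns.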
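Unlike Document and SAXParser, StAX specifies two parsing models: the cursor model (XMLStreamReader), which, like SAXParser, simply returns events; and the iterator model (XMLEventReader), which returns events as objects. (A small gripe here: I personally prefer SAXParser's handler-based event processing, because the code is more intuitive.) StAX can in fact be used in much the same style, but the iterator model comes with the extra cost of creating an object for each event.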
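For comparison with the cursor model, here is a minimal sketch of the iterator model using XMLEventReader, where each event arrives as an XMLEvent object. The class name StaxIteratorSketch and the method startElementNames are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class StaxIteratorSketch {
    // Collects the local names of all start elements, in document order.
    static List<String> startElementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLEventReader events =
                XMLInputFactory.newFactory().createXMLEventReader(new StringReader(xml));
            try {
                while (events.hasNext()) {
                    // Each call hands back an event object, which is where the
                    // extra allocation cost of this model comes from.
                    XMLEvent event = events.nextEvent();
                    if (event.isStartElement()) {
                        StartElement start = event.asStartElement();
                        names.add(start.getName().getLocalPart());
                    }
                }
            } finally {
                events.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return names;
    }
}
```

Because the events are ordinary objects, they can be stored or passed to other methods, which is not possible with the cursor's current-position state.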
Now let's look at some sample code:
1. Basic code for parsing XML with Document:
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(path);
Element element = document.getDocumentElement();
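These three lines (assuming factory is a DocumentBuilderFactory obtained from DocumentBuilderFactory.newInstance()) are enough to get the root Element. From there you only need to traverse the Element to assemble the data.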
2. Basic code for parsing XML with SAXParser:
SAXParserFactory factory = SAXParserFactory.newInstance();
try {
SAXParser parser = factory.newSAXParser();
parser.parse(path, handler);
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
This also takes only three lines. The important part is the event callbacks on handler, which here is a DefaultHandler.
3. XMLStreamReader (StAX)
InputStream in = new FileInputStream(path);
XMLInputFactory factory = XMLInputFactory.newFactory();
XMLStreamReader reader = factory.createXMLStreamReader(in);
while (reader.hasNext()) {
int event = reader.next();
if (event == XMLStreamConstants.START_ELEMENT) {
} else if (event == XMLStreamConstants.END_ELEMENT) {
} else if (event == XMLStreamConstants.END_DOCUMENT) {
out("Use StAXParser object,and use time is " + (System.currentTimeMillis() - t) + "ms");
}
}
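Here an InputStream reads the file and the stream is handed to the XMLStreamReader, which is then iterated in a loop. Inside the loop you must call next(), which advances the parser and returns the type of the next event.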
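The snippet above never closes the stream or the reader. A minimal sketch of the same loop with explicit cleanup might look like this; the class name StaxLoopSketch and the method countStartElements are hypothetical, and counting elements merely stands in for real processing.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxLoopSketch {
    // Opens the file with try-with-resources so the stream is always closed.
    static int countStartElements(String path) {
        try (InputStream in = new FileInputStream(path)) {
            return countStartElements(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs the hasNext()/next() loop and counts START_ELEMENT events.
    static int countStartElements(InputStream in) {
        try {
            XMLStreamReader reader = XMLInputFactory.newFactory().createXMLStreamReader(in);
            try {
                int count = 0;
                while (reader.hasNext()) {
                    // next() both advances the cursor and returns the event type.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        count++;
                    }
                }
                return count;
            } finally {
                // Closing the reader does not close the underlying stream,
                // which is why the caller manages the stream separately.
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```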
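Here are the timings from my test, which reads the nationwide region data (down to county level):
Document took 103 ms. SAXParser was the fastest, consistently between 10 and 16 ms. The numbers depend on the machine; mine is a pretty poor laptop.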
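A single measurement like this also includes one-time costs such as class loading and JIT warm-up, and whichever parser runs first may pay more of them. As a rough sketch of a fairer comparison (not the method used for the numbers above), each parser could be run a few times before timing and a middle value reported. The class name TimingSketch and the method medianMillis are hypothetical.

```java
import java.util.Arrays;

class TimingSketch {
    // Runs the task `warmups` times untimed, then `runs` times timed, and
    // returns a middle (median-like) elapsed time in milliseconds.
    // Assumes runs > 0.
    static double medianMillis(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos[runs / 2] / 1_000_000.0;
    }
}
```

A call such as TimingSketch.medianMillis(() -> new XmlParserBySAX(path).parser(), 5, 20) would time the SAX version; the warm-up and run counts here are arbitrary.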
Below is the Java code that reads the nationwide XML region data, in all three ways:
I. Document
import model.AreaModel;
import model.AreaNode;
import model.CityModel;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
/**
* Parsing with Document (DOM)
* Created by alan on 2018/12/16.
*/
public class XmlParserByDocument extends OutPut {
private String path;
List<AreaModel> areaModels = new ArrayList<>();
public XmlParserByDocument() {
}
public XmlParserByDocument(String path) {
this.path = path;
}
public List<AreaModel> getAreaModels() {
return areaModels;
}
public void parser() {
long t = System.currentTimeMillis();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(path);
Element element = document.getDocumentElement();
out("document v" + document.getXmlVersion() + " encode " + document.getInputEncoding());
if ("root".equals(element.getTagName())) {
NodeList nodeList = element.getChildNodes();
AreaModel area = null;
CityModel city;
for (int i = 0; i < nodeList.getLength(); i++) {
String nodeName = nodeList.item(i).getNodeName();
if ("province".equals(nodeName)) {
area = new AreaModel(parserNode(nodeList.item(i)), parserNodeList(nodeList.item(i).getChildNodes()));
areaModels.add(area);
}
}
out("Use Document object and use time is " + (System.currentTimeMillis() - t) + "ms.");
} else {
throw new Exception("invalid xml file.");
}
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (Exception e) {
e.printStackTrace();
} finally {
}
}
public void test(){
String str = "";
for (AreaModel a : areaModels) {
str += a.getProvince() + "\n";
for (AreaNode n : a.getCitys()) {
str += "\t" + n + "\n";
for (AreaNode j : n.getChild()) {
str += "\t\t" + j + "\n";
}
}
}
out(str);
}
private List<AreaNode> parserNodeList(NodeList list) {
List<AreaNode> nodes = new ArrayList<>();
int l = list.getLength();
for (int i = 0; i < list.getLength(); i++) {
if (list.item(i).hasChildNodes()) {
AreaNode node = parserNode(list.item(i));
node.setChild(parserNodeList(list.item(i).getChildNodes()));
nodes.add(node);
} else {
AreaNode node = parserNode(list.item(i));
if (node != null) {
nodes.add(node);
}
}
}
return nodes;
}
private AreaNode parserNode(Node node) {
AreaNode areaNode = null;
NamedNodeMap attrs = node.getAttributes();
if (attrs != null) {
areaNode = new AreaNode(attrs.getNamedItem("name").getTextContent(), Integer.valueOf(attrs.getNamedItem("postcode").getTextContent()));
}
return areaNode;
}
}
II. SAXParser
import model.AreaModel;
import model.AreaNode;
import org.xml.sax.*;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
/**
* Stream parsing with SAX
* Created by alan on 2018/12/16.
*/
public class XmlParserBySAX extends OutPut {
private String path = "d:/test/area.xml";
private List<AreaModel> areaModels;
public XmlParserBySAX() {
}
public XmlParserBySAX(String path) {
this.path = path;
}
public List<AreaModel> getAreaModels() {
return areaModels;
}
public void parser() {
SAXParserFactory factory = SAXParserFactory.newInstance();
try {
SAXParser parser = factory.newSAXParser();
parser.parse(path, handler);
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
public void test(){
String str = "";
for (AreaModel a : areaModels) {
str += a.getProvince() + "\n";
for (AreaNode n : a.getCitys()) {
str += "\t" + n + "\n";
for (AreaNode j : n.getChild()) {
str += "\t\t" + j + "\n";
}
}
}
out(str);
}
private long t = 0;
private DefaultHandler handler = new DefaultHandler() {
private AreaModel province;
private List<AreaNode> citys;
private List<AreaNode> areas;
private AreaNode city;
@Override
public void startDocument() throws SAXException {
areaModels = new ArrayList<>();
t = System.currentTimeMillis();
// out("start....");
}
@Override
public void endDocument() throws SAXException {
out("Use SAXParser object,and use time is " + (System.currentTimeMillis() - t) + "ms");
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
switch (qName) {
case "province":
province = new AreaModel();
province.setProvince(new AreaNode(attributes.getValue("name"), Integer.valueOf(attributes.getValue("postcode"))));
citys = new ArrayList<>();
break;
case "city":
city = new AreaNode(attributes.getValue("name"), Integer.valueOf(attributes.getValue("postcode")));
areas = new ArrayList<>();
break;
case "area":
areas.add(new AreaNode(attributes.getValue("name"), Integer.valueOf(attributes.getValue("postcode"))));
break;
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
switch (qName) {
case "province":
province.setCitys(citys);
areaModels.add(province);
break;
case "city":
city.setChild(areas);
citys.add(city);
break;
case "area":
break;
}
}
};
}
III. XMLStreamReader (StAX)
import model.AreaModel;
import model.AreaNode;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
/**
* Parsing with a pull parser (StAX)
* Created by alan on 2018/12/16.
*/
public class XmlParserByStAX extends OutPut {
private String path;
private List<AreaModel> areaModels = new ArrayList<>();
public XmlParserByStAX() {
}
public XmlParserByStAX(String path) {
this.path = path;
}
public List<AreaModel> getAreaModels() {
return areaModels;
}
public void parser() {
try {
InputStream in = new FileInputStream(path);
XMLInputFactory factory = XMLInputFactory.newFactory();
XMLStreamReader reader = factory.createXMLStreamReader(in);
AreaModel province = null;
List<AreaNode> citys = null;
List<AreaNode> areas = null;
AreaNode city = null;
long t = System.currentTimeMillis();
areaModels = new ArrayList<>();
while (reader.hasNext()) {
int event = reader.next();
if (event == XMLStreamConstants.START_ELEMENT) {
switch (reader.getLocalName()) {
case "province":
province = new AreaModel();
province.setProvince(new AreaNode(reader.getAttributeValue(null,"name"),
Integer.valueOf(reader.getAttributeValue(null,"postcode"))));
citys = new ArrayList<>();
break;
case "city":
city = new AreaNode(reader.getAttributeValue(null,"name"),
Integer.valueOf(reader.getAttributeValue(null,"postcode")));
areas = new ArrayList<>();
break;
case "area":
areas.add(new AreaNode(reader.getAttributeValue(null,"name"),
Integer.valueOf(reader.getAttributeValue(null,"postcode"))));
break;
}
} else if (event == XMLStreamConstants.END_ELEMENT) {
switch (reader.getLocalName()) {
case "province":
province.setCitys(citys);
areaModels.add(province);
break;
case "city":
city.setChild(areas);
citys.add(city);
break;
case "area":
break;
}
} else if (event == XMLStreamConstants.END_DOCUMENT) {
out("Use StAXParser object,and use time is " + (System.currentTimeMillis() - t) + "ms");
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (XMLStreamException e) {
e.printStackTrace();
}
}
public void test() {
String str = "";
for (AreaModel a : areaModels) {
str += a.getProvince() + "\n";
for (AreaNode n : a.getCitys()) {
str += "\t" + n + "\n";
for (AreaNode j : n.getChild()) {
str += "\t\t" + j + "\n";
}
}
}
out(str);
}
}
IV. Source code of the AreaModel model class
package model;
import java.util.List;
/**
* Created by alan on 2018/12/15.
*/
public class AreaModel {
private AreaNode province;
private List<AreaNode> citys;
public AreaModel(){}
public AreaModel(AreaNode province, List<AreaNode> citys) {
this.province = province;
this.citys = citys;
}
public AreaNode getProvince() {
return province;
}
public void setProvince(AreaNode province) {
this.province = province;
}
public List<AreaNode> getCitys() {
return citys;
}
public void setCitys(List<AreaNode> citys) {
this.citys = citys;
}
}
V. Source code of the AreaNode model class
package model;
import java.util.List;
/**
* Created by alan on 2018/12/15.
*/
public class AreaNode {
private String name;
private Integer postCode;
private List<AreaNode> child;
public AreaNode() {
}
public AreaNode(String name, Integer postCode) {
this.name = name;
this.postCode = postCode;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public Integer getPostCode() {
return postCode;
}
public void setPostCode(Integer postCode) {
this.postCode = postCode;
}
public List<AreaNode> getChild() {
return child;
}
public void setChild(List<AreaNode> child) {
this.child = child;
}
@Override
public String toString() {
String r = "{name:\"%s\",postCode:\"%s\"}";
String str = String.format(r, this.getName(), this.getPostCode());
return str;
}
}
That is all of the code. Now we need a main() method to test it:
private static String path = "d:/test/area.xml";
public static void main(String[] args) {
EventQueue.invokeLater(() -> {
out("...");
XmlParserByDocument document = new XmlParserByDocument(path);
document.parser();
//the 2.
XmlParserBySAX sax = new XmlParserBySAX(path);
sax.parser();
//the 3.
XmlParserByStAX stAX = new XmlParserByStAX(path);
stAX.parser();
out(document.getAreaModels().size());
out(sax.getAreaModels().size());
out(stAX.getAreaModels().size());
// document.test();
// stAX.test();
// sax.test();
});
}
By the way, I am also sharing the area.xml file: local download