Parsing XML in Java: Document, SAXParser, and XMLStreamReader Explained

1.Document

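The Document interface is the official W3C standard (DOM). The parser loads the whole HTML or XML file into memory as a document object, and you then loop over its nodes to extract the data.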
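As a minimal sketch of that load-then-loop approach: the class name DomSketch, the inline XML string, and its attribute values are illustrative assumptions, chosen to match the shape of the area data used later in this post.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomSketch {
    /** Loads the whole document into memory, then loops over the root's children. */
    static List<String> provinceNames(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            List<String> names = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Child nodes also include whitespace text nodes, so keep elements only.
                if (child.getNodeType() == Node.ELEMENT_NODE && "province".equals(child.getNodeName())) {
                    names.add(((Element) child).getAttribute("name"));
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><province name=\"A\" postcode=\"100000\"/>"
                   + "<province name=\"B\" postcode=\"200000\"/></root>";
        System.out.println(provinceNames(xml));
    }
}
```

The whole tree stays in memory for as long as the Document is reachable, which is convenient for random access but costly for large files.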

2.SAXParser

SAXParser is an event-driven "push" model for processing XML. It is not a W3C standard, but it is a widely accepted API, and most SAX parser implementations follow it.

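Unlike DOM, a SAX parser does not build a tree representation of the whole document. It reads the input as a stream and fires an event for each kind of construct it encounters. These events are pushed to event handlers, which give the application access to the document content.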

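There are three basic types of event handler:

DTDHandler, for DTD-related events (notation and unparsed-entity declarations);

ErrorHandler, for low-level access to parse errors;

ContentHandler, the most commonly used type, for accessing the document content.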
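To make the handler roles concrete, here is a small sketch that logs the events a SAX parser pushes. It relies on DefaultHandler, which implements ContentHandler, ErrorHandler, DTDHandler, and EntityResolver in one class. The class name SaxHandlerSketch and the log format are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SaxHandlerSketch {
    /** Records the events pushed by the parser as readable strings. */
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override // ContentHandler callback
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                log.add("start:" + qName);
            }

            @Override // ContentHandler callback
            public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }

            @Override // ErrorHandler callback for errors the parser cannot recover from
            public void fatalError(SAXParseException e) throws SAXException {
                log.add("fatal:line " + e.getLineNumber());
                throw e; // stop parsing
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (SAXException e) {
            // The fatalError callback above has already logged the problem.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(events("<root><a/></root>"));
        System.out.println(events("<root><a></root>")); // malformed: <a> is never closed
    }
}
```

The application never asks for an event here. It only registers callbacks, and the parser decides when to call them.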

3.XMLStreamReader(StAX)

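XMLStreamReader is also a streaming parser: it reads the file linearly from beginning to end and, like SAXParser, reports what it finds as events. The difference is that it uses a "pull" model instead of SAXParser's push model. The parser does not call back into the application; it returns the next event only when the application asks for one. StAX also offers a user-friendly API for both reading and writing.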

Whereas SAXParser delivers its events to a ContentHandler, XMLStreamReader returns them directly to the application, and StAX can even provide events as objects.

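When the application requests an event, the parser reads as much of the XML document as it needs and returns that event. StAX also provides factories for creating readers and writers, so the application can program against the StAX interfaces without depending on the details of a particular implementation.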

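Unlike Document and SAXParser, StAX specifies two parsing models: a cursor model (XMLStreamReader), which, like SAXParser, simply reports events; and an iterator model (XMLEventReader), which returns events as objects. (A small gripe here: I personally prefer SAXParser's handler-based event processing, because the code is more intuitive.) StAX can in fact be used in a similar style, but the iterator model incurs extra object-creation overhead.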
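For comparison with the cursor-model code later in this post, the following sketch uses the iterator model through XMLEventReader, where every event arrives as an XMLEvent object. The class name StaxIteratorSketch and the sample element and attribute names are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class StaxIteratorSketch {
    /** Collects the name attribute of every city element using the iterator model. */
    static List<String> cityNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLEventReader reader = XMLInputFactory.newFactory()
                    .createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent(); // one object is allocated per event
                if (event.isStartElement()) {
                    StartElement start = event.asStartElement();
                    if ("city".equals(start.getName().getLocalPart())) {
                        Attribute name = start.getAttributeByName(new QName("name"));
                        if (name != null) {
                            names.add(name.getValue());
                        }
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(cityNames(
                "<root><province name=\"A\"><city name=\"X\"/><city name=\"Y\"/></province></root>"));
    }
}
```

Because events are objects, they can be stored or passed to other methods, which the cursor model does not allow without copying the data out of the reader first.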

Let's look at some sample code:

1. Basic code for parsing XML with Document:

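DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(path);
Element element = document.getDocumentElement();

A few lines are enough to obtain the root Element. From there you only need to traverse the Element to assemble the data.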
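The traversal step can be sketched as a short recursive walk over the element tree. This is only an illustration: DomWalkSketch, the outline format, and the use of a name attribute are assumptions, not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomWalkSketch {
    /** Appends one indented line per element, depth-first. */
    static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // skip whitespace text nodes, comments, and so on
        }
        Element element = (Element) node;
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(element.getTagName());
        if (element.hasAttribute("name")) {
            out.append(' ').append(element.getAttribute("name"));
        }
        out.append('\n');
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    /** Parses the XML string and returns an indented outline of its elements. */
    static String outline(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            StringBuilder out = new StringBuilder();
            walk(root, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline("<root><province name=\"A\"><city name=\"X\"/></province></root>"));
    }
}
```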

2. Basic code for parsing XML with SAXParser:

SAXParserFactory factory = SAXParserFactory.newInstance();
try {
    SAXParser parser = factory.newSAXParser();
    parser.parse(path, handler);
} catch (ParserConfigurationException | SAXException | IOException e) {
    e.printStackTrace();
}

Again, only three lines do the real work. The important part is the handler's event callbacks; a DefaultHandler is used here.
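A minimal sketch of what such a handler can look like, assuming province elements that carry name and postcode attributes as in the area data. The class name SaxCallbackSketch and the returned map are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxCallbackSketch {
    /** Maps each province name to its postcode, filled in from startElement callbacks. */
    static Map<String, Integer> provincePostcodes(String xml) {
        Map<String, Integer> result = new LinkedHashMap<>();
        // DefaultHandler supplies no-op content callbacks, so only the ones you need are overridden.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if ("province".equals(qName)) {
                    result.put(attributes.getValue("name"),
                            Integer.valueOf(attributes.getValue("postcode")));
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(provincePostcodes(
                "<root><province name=\"A\" postcode=\"100000\"/><province name=\"B\" postcode=\"200000\"/></root>"));
    }
}
```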

3.XMLStreamReader(StAX)

InputStream in = new FileInputStream(path);
XMLInputFactory factory = XMLInputFactory.newFactory();
XMLStreamReader reader = factory.createXMLStreamReader(in);
long t = System.currentTimeMillis(); // start time for the timing message below
while (reader.hasNext()) {
    int event = reader.next();
    if (event == XMLStreamConstants.START_ELEMENT) {
        // an element has started: read its name and attributes here
    } else if (event == XMLStreamConstants.END_ELEMENT) {
        // an element has ended: finish the object being built here
    } else if (event == XMLStreamConstants.END_DOCUMENT) {
        out("Use StAXParser object,and use time is " + (System.currentTimeMillis() - t) + "ms");
    }
}

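The file is read through an InputStream, which is passed to the XMLStreamReader. The code then loops, and inside the loop it must call next() to advance the parser and obtain the event type.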
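To show how the two empty branches are typically filled in, here is a self-contained cursor-model sketch that keeps a stack of the elements currently open. StaxCursorSketch, the path format, and the sample data are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxCursorSketch {
    /** Returns a slash-separated path (by name attribute) for every area element. */
    static List<String> areaPaths(String xml) {
        List<String> paths = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>(); // names of the elements currently open
        try {
            XMLStreamReader reader = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next(); // the application pulls the next event
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getAttributeValue(null, "name");
                    // Fall back to the tag name when there is no name attribute (e.g. the root).
                    open.addLast(name == null ? reader.getLocalName() : name);
                    if ("area".equals(reader.getLocalName())) {
                        paths.add(String.join("/", open));
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    open.removeLast();
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
        return paths;
    }

    public static void main(String[] args) {
        System.out.println(areaPaths("<root><province name=\"A\"><city name=\"X\">"
                + "<area name=\"a1\"/><area name=\"a2\"/></city></province></root>"));
    }
}
```

The reader itself holds only the current event, so any state that must survive across events (here, the stack) has to be kept by the application.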

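Below are the timings I measured when reading the nationwide area data (down to county level):

[Screenshot: timing output of the three parsers]

Document took 103 ms. SAXParser was the fastest, consistently between 10 and 16 ms. The exact numbers depend on the machine, and mine is a rather poor laptop. Note also that the three classes below start their timers at different points (before creating the factory for Document, in startDocument() for SAXParser, and after creating the reader for StAX), so the figures are only roughly comparable.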
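If you want to reproduce such a comparison, a slightly more careful measurement could look like the sketch below. It is not the code that produced the numbers above; TimingSketch and bestOfNanos are hypothetical names, and a simple loop like this is still far less rigorous than a dedicated benchmarking harness.

```java
class TimingSketch {
    /**
     * Runs the task a few times untimed to let the JVM warm up, then returns
     * the shortest of the timed runs in nanoseconds.
     * Returns Long.MAX_VALUE when runs is 0.
     */
    static long bestOfNanos(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace the lambda with a call that performs one complete parse.
        long nanos = bestOfNanos(() -> String.valueOf(Math.sqrt(12345.678)), 5, 10);
        System.out.println("best run: " + nanos + " ns");
    }
}
```

Passing each parser's complete parse as the task means all three are timed over the same span of work.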

Below is the Java code that reads the nationwide XML area data, in all three ways:

I. Document

import model.AreaModel;
import model.AreaNode;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Document (DOM) parsing.
 * Created by alan on 2018/12/16.
 */
// OutPut is the author's helper base class (not shown in this post); it provides out(...).
public class XmlParserByDocument extends OutPut {

    private String path;
    private final List<AreaModel> areaModels = new ArrayList<>();

    public XmlParserByDocument() {
    }

    public XmlParserByDocument(String path) {
        this.path = path;
    }

    public List<AreaModel> getAreaModels() {
        return areaModels;
    }

    public void parser() {
        long t = System.currentTimeMillis();
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(path);
            Element element = document.getDocumentElement();
            out("document v" + document.getXmlVersion() + " encode " + document.getInputEncoding());
            if ("root".equals(element.getTagName())) {
                NodeList nodeList = element.getChildNodes();
                for (int i = 0; i < nodeList.getLength(); i++) {
                    Node item = nodeList.item(i);
                    // Whitespace between elements shows up as text nodes; only provinces are kept.
                    if ("province".equals(item.getNodeName())) {
                        areaModels.add(new AreaModel(parserNode(item), parserNodeList(item.getChildNodes())));
                    }
                }
                out("Use Document object and use time is " + (System.currentTimeMillis() - t) + "ms.");
            } else {
                out("invalid xml file.");
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }

    public void test() {
        StringBuilder str = new StringBuilder();
        for (AreaModel a : areaModels) {
            str.append(a.getProvince()).append("\n");
            for (AreaNode n : a.getCitys()) {
                str.append("\t").append(n).append("\n");
                for (AreaNode j : n.getChild()) {
                    str.append("\t\t").append(j).append("\n");
                }
            }
        }
        out(str.toString());
    }

    /** Converts a list of DOM nodes into AreaNode objects, recursing into child elements. */
    private List<AreaNode> parserNodeList(NodeList list) {
        List<AreaNode> nodes = new ArrayList<>();
        int length = list.getLength();
        for (int i = 0; i < length; i++) {
            Node item = list.item(i);
            AreaNode node = parserNode(item);
            if (node == null) {
                continue; // not an element (for example a whitespace text node)
            }
            if (item.hasChildNodes()) {
                node.setChild(parserNodeList(item.getChildNodes()));
            }
            nodes.add(node);
        }
        return nodes;
    }

    /** Returns null for nodes without attributes, i.e. anything that is not an element. */
    private AreaNode parserNode(Node node) {
        AreaNode areaNode = null;
        NamedNodeMap attrs = node.getAttributes();
        if (attrs != null) {
            areaNode = new AreaNode(attrs.getNamedItem("name").getTextContent(),
                    Integer.valueOf(attrs.getNamedItem("postcode").getTextContent()));
        }
        return areaNode;
    }
}

II. SAXParser

import model.AreaModel;
import model.AreaNode;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Stream parsing with SAX.
 * Created by alan on 2018/12/16.
 */
public class XmlParserBySAX extends OutPut {

    private String path = "d:/test/area.xml";
    // Initialized up front so getAreaModels() never returns null, even if parsing fails.
    private List<AreaModel> areaModels = new ArrayList<>();
    private long t = 0;

    public XmlParserBySAX() {
    }

    public XmlParserBySAX(String path) {
        this.path = path;
    }

    public List<AreaModel> getAreaModels() {
        return areaModels;
    }

    public void parser() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            SAXParser parser = factory.newSAXParser();
            parser.parse(path, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }

    public void test() {
        StringBuilder str = new StringBuilder();
        for (AreaModel a : areaModels) {
            str.append(a.getProvince()).append("\n");
            for (AreaNode n : a.getCitys()) {
                str.append("\t").append(n).append("\n");
                for (AreaNode j : n.getChild()) {
                    str.append("\t\t").append(j).append("\n");
                }
            }
        }
        out(str.toString());
    }

    private DefaultHandler handler = new DefaultHandler() {

        // State carried between callbacks: the province and city currently being built.
        private AreaModel province;
        private List<AreaNode> citys;
        private List<AreaNode> areas;
        private AreaNode city;

        @Override
        public void startDocument() throws SAXException {
            areaModels = new ArrayList<>();
            t = System.currentTimeMillis();
        }

        @Override
        public void endDocument() throws SAXException {
            out("Use SAXParser object,and use time is " + (System.currentTimeMillis() - t) + "ms");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            switch (qName) {
                case "province":
                    province = new AreaModel();
                    province.setProvince(new AreaNode(attributes.getValue("name"), Integer.valueOf(attributes.getValue("postcode"))));
                    citys = new ArrayList<>();
                    break;
                case "city":
                    city = new AreaNode(attributes.getValue("name"), Integer.valueOf(attributes.getValue("postcode")));
                    areas = new ArrayList<>();
                    break;
                case "area":
                    areas.add(new AreaNode(attributes.getValue("name"), Integer.valueOf(attributes.getValue("postcode"))));
                    break;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            switch (qName) {
                case "province":
                    province.setCitys(citys);
                    areaModels.add(province);
                    break;
                case "city":
                    city.setChild(areas);
                    citys.add(city);
                    break;
                case "area":
                    break;
            }
        }
    };
}

III. XMLStreamReader (StAX)

import model.AreaModel;
import model.AreaNode;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Parsing with the pull parser (StAX).
 * Created by alan on 2018/12/16.
 */
public class XmlParserByStAX extends OutPut {

    private String path;
    private List<AreaModel> areaModels = new ArrayList<>();

    public XmlParserByStAX() {
    }

    public XmlParserByStAX(String path) {
        this.path = path;
    }

    public List<AreaModel> getAreaModels() {
        return areaModels;
    }

    public void parser() {
        // try-with-resources closes the file even if parsing fails
        try (InputStream in = new FileInputStream(path)) {
            XMLInputFactory factory = XMLInputFactory.newFactory();
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            AreaModel province = null;
            List<AreaNode> citys = null;
            List<AreaNode> areas = null;
            AreaNode city = null;
            long t = System.currentTimeMillis();
            areaModels = new ArrayList<>();
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // getLocalName() is the plain tag name; getName().toString() would
                    // include the namespace URI for namespaced documents.
                    switch (reader.getLocalName()) {
                        case "province":
                            province = new AreaModel();
                            province.setProvince(new AreaNode(reader.getAttributeValue(null, "name"),
                                    Integer.valueOf(reader.getAttributeValue(null, "postcode"))));
                            citys = new ArrayList<>();
                            break;
                        case "city":
                            city = new AreaNode(reader.getAttributeValue(null, "name"),
                                    Integer.valueOf(reader.getAttributeValue(null, "postcode")));
                            areas = new ArrayList<>();
                            break;
                        case "area":
                            areas.add(new AreaNode(reader.getAttributeValue(null, "name"),
                                    Integer.valueOf(reader.getAttributeValue(null, "postcode"))));
                            break;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    switch (reader.getLocalName()) {
                        case "province":
                            province.setCitys(citys);
                            areaModels.add(province);
                            break;
                        case "city":
                            city.setChild(areas);
                            citys.add(city);
                            break;
                        case "area":
                            break;
                    }
                } else if (event == XMLStreamConstants.END_DOCUMENT) {
                    out("Use StAXParser object,and use time is " + (System.currentTimeMillis() - t) + "ms");
                }
            }
            reader.close();
        } catch (IOException | XMLStreamException e) {
            e.printStackTrace();
        }
    }

    public void test() {
        StringBuilder str = new StringBuilder();
        for (AreaModel a : areaModels) {
            str.append(a.getProvince()).append("\n");
            for (AreaNode n : a.getCitys()) {
                str.append("\t").append(n).append("\n");
                for (AreaNode j : n.getChild()) {
                    str.append("\t\t").append(j).append("\n");
                }
            }
        }
        out(str.toString());
    }
}

IV. Source of the AreaModel model class

package model;

import java.util.List;

/**
 * A province together with its cities.
 * Created by alan on 2018/12/15.
 */
public class AreaModel {

    private AreaNode province;
    private List<AreaNode> citys;

    public AreaModel() {
    }

    public AreaModel(AreaNode province, List<AreaNode> citys) {
        this.province = province;
        this.citys = citys;
    }

    public AreaNode getProvince() {
        return province;
    }

    public void setProvince(AreaNode province) {
        this.province = province;
    }

    public List<AreaNode> getCitys() {
        return citys;
    }

    public void setCitys(List<AreaNode> citys) {
        this.citys = citys;
    }
}

V. Source of the AreaNode model class

package model;

import java.util.List;

/**
 * One node of the area tree (province, city, or area) with its optional children.
 * Created by alan on 2018/12/15.
 */
public class AreaNode {

    private String name;
    private Integer postCode;
    private List<AreaNode> child;

    public AreaNode() {
    }

    public AreaNode(String name, Integer postCode) {
        this.name = name;
        this.postCode = postCode;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public Integer getPostCode() {
        return postCode;
    }

    public void setPostCode(Integer postCode) {
        this.postCode = postCode;
    }

    public List<AreaNode> getChild() {
        return child;
    }

    public void setChild(List<AreaNode> child) {
        this.child = child;
    }

    @Override
    public String toString() {
        return String.format("{name:\"%s\",postCode:\"%s\"}", getName(), getPostCode());
    }
}

All the code has now been posted. What remains is a main() method to test it:

private static String path = "d:/test/area.xml";

public static void main(String[] args) {
    // EventQueue is java.awt.EventQueue; the parsers run on the AWT event thread here.
    EventQueue.invokeLater(() -> {
        out("...");
        // 1. Document
        XmlParserByDocument document = new XmlParserByDocument(path);
        document.parser();
        // 2. SAXParser
        XmlParserBySAX sax = new XmlParserBySAX(path);
        sax.parser();
        // 3. StAX
        XmlParserByStAX stAX = new XmlParserByStAX(path);
        stAX.parser();
        out(document.getAreaModels().size());
        out(sax.getAreaModels().size());
        out(stAX.getAreaModels().size());
        // document.test();
        // stAX.test();
        // sax.test();
    });
}

By the way, here is the area.xml file as well: local download
