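There are generally three ways to read XML in Java (DOM, SAX, StAX); JAXB is a special case and is covered afterwards. Each approach is introduced below through code.
1. Reading with the org.w3c.dom.Document interface
With the W3C DOM approach, calling parse builds the entire document tree in memory, so it is mainly suited to relatively small documents. The code is as follows: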
public void testDomXml() throws Exception {
/**
* For background on DTDs, see the tutorial:
* http://www.xmlfiles.com/dtd/
*
* Minimal example:
* DTD source: http://cs.au.dk/~amoeller/XML/schemas/dtd-example.html
* XML source: http://cs.au.dk/~amoeller/XML/xml/example.html
*/
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
// factory.setIgnoringElementContentWhitespace(true);
DocumentBuilder builder = factory.newDocumentBuilder();
builder.setErrorHandler(new ErrorHandler() {
@Override
public void warning(SAXParseException exception) throws SAXException {
exception.printStackTrace();
}
@Override
public void fatalError(SAXParseException exception) throws SAXException {
exception.printStackTrace();
}
@Override
public void error(SAXParseException exception) throws SAXException {
exception.printStackTrace();
}
});
Document doc = builder.parse(getClass().getResourceAsStream("1.xml"));
Element root = doc.getDocumentElement();
System.out.println(root.getTagName());
System.out.println(root.getAttribute("id"));
NodeList children = root.getChildNodes();
for (int i = 0; i < children.getLength(); i ++) {
Node node = children.item(i);
if (node instanceof Element) {
System.out.println(node.getNodeName());
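// The first child may be an element or comment rather than text, so check before casting
Node first = node.getFirstChild();
if (first instanceof Text) {
System.out.println(((Text) first).getData());
}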
NamedNodeMap attrs = node.getAttributes();
for (int j = 0; j < attrs.getLength(); j++) {
Attr attr = (Attr)attrs.item(j);
System.out.println(attr.getName());
System.out.println(attr.getValue());
}
}
}
}
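Since 1.xml is not reproduced in this post, here is a minimal self-contained sketch of the same DOM traversal against an inline document. The sample XML, the class name DomSketch and the method childSummaries are all made up for illustration. It also uses getTextContent() instead of casting the first child to Text, which is one way to avoid a ClassCastException when an element's first child is not a text node.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class DomSketch {
    // Hypothetical sample document; it stands in for the 1.xml used above.
    static final String XML =
        "<collection id=\"c1\"><title>Recipes</title><count>2</count></collection>";

    /** Returns "name=text" for every element that is a direct child of the root. */
    static List<String> childSummaries(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // parse() reads the whole input and builds the complete tree in memory
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();

        List<String> out = new ArrayList<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            // Skip whitespace text nodes, comments, etc.
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                out.add(node.getNodeName() + "=" + node.getTextContent());
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(childSummaries(XML)); // prints [title=Recipes, count=2]
    }
}
```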
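- Supplement: when working with the W3C DOM, you can use XPath to locate the DOM nodes at a given path. Code first:
public void testXPath() throws Exception{
/**
* Minimal XPath example
* Going deeper: how QName is used
*
* [XPath Java usage summary and code samples] (https://my.oschina.net/cloudcoder/blog/223359)
*
*
*/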
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
dbf.setIgnoringElementContentWhitespace(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(getClass().getResourceAsStream("xpath.xml"));
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xpath = xPathFactory.newXPath();
long calories = ((Double)xpath.evaluate("//collection/recipe/nutrition/@calories", doc, XPathConstants.NUMBER)).longValue();
System.out.println(calories);
}
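The snippet above depends on xpath.xml, which is not shown. As a rough sketch, assuming a document shaped like the path in the expression (collection/recipe/nutrition with a calories attribute), the following shows the difference between asking for XPathConstants.NUMBER, which converts only the first matching node, and XPathConstants.NODESET, which returns every match. The sample data and the names XPathSketch, firstCalories and totalCalories are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathSketch {
    // Hypothetical sample data standing in for xpath.xml.
    static final String XML =
        "<collection>"
      + "<recipe><title>Soup</title><nutrition calories=\"120\"/></recipe>"
      + "<recipe><title>Cake</title><nutrition calories=\"450\"/></recipe>"
      + "</collection>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    /** NUMBER conversion of a node-set uses the first node in document order. */
    static long firstCalories(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Double n = (Double) xpath.evaluate(
            "//collection/recipe/nutrition/@calories", doc, XPathConstants.NUMBER);
        return n.longValue();
    }

    /** NODESET returns all matching attribute nodes, which we sum by hand. */
    static long totalCalories(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList attrs = (NodeList) xpath.evaluate(
            "//collection/recipe/nutrition/@calories", doc, XPathConstants.NODESET);
        long total = 0;
        for (int i = 0; i < attrs.getLength(); i++) {
            total += Long.parseLong(attrs.item(i).getNodeValue());
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(firstCalories(doc)); // prints 120
        System.out.println(totalCalories(doc)); // prints 570
    }
}
```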
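2. Reading with SAX
SAX reads the XML file as a stream and does not build a tree of the whole content. Instead, you implement your own DefaultHandler and handle the parsing events as you see fit (search online for more details).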
public void testSax() throws Exception {
/**
* A good SAX example: http://blog.csdn.net/linminqin/article/details/6456476
*/
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser sp = spf.newSAXParser();
List<Book> books = new ArrayList<Book>();
sp.parse(new InputSource(new InputStreamReader(getClass().getResourceAsStream("2.xml"), "UTF-8")), new DefaultHandler(){
private String currentQname;
private Book book;
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes)
throws SAXException {
if (qName.equals("书")) {
book = new Book();
}
currentQname = qName;
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
if ("书名".equals(currentQname)) {
book.name = new String(ch, start, length);
}
if ("作者".equals(currentQname)) {
book.author = new String(ch, start, length);
}
if ("售价".equals(currentQname)) {
book.price = new String(ch, start, length);
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if (qName.equals("书")) {
books.add(book);
book = null;
}
currentQname = null;
}
});
for (Book b : books) {
System.out.println(b);
}
}
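One caveat with the handler above: a SAX parser is allowed to deliver the text of a single element through several characters() calls, so assigning the field directly can keep only the last chunk. A common way around this is to accumulate text in a StringBuilder and read it in endElement. The sketch below illustrates that pattern; it uses ASCII element names (book, name, author, price) as hypothetical stand-ins for the Chinese tag names in 2.xml, and returns plain strings instead of the Book class, which is not shown in this post.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxTextSketch {
    // Hypothetical sample; the entity reference is there because parsers
    // may split the surrounding text into more than one characters() call.
    static final String XML =
        "<books><book><name>Java</name><author>A &amp; B</author>"
      + "<price>39</price></book></books>";

    static List<String> readBooks(String xml) throws Exception {
        final List<String> books = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private String name, author, price;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                text.setLength(0); // start collecting text for this element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // append, never overwrite
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String value = text.toString().trim();
                switch (qName) {
                    case "name":   name = value; break;
                    case "author": author = value; break;
                    case "price":  price = value; break;
                    case "book":   books.add(name + "|" + author + "|" + price); break;
                    default: break;
                }
                text.setLength(0);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return books;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readBooks(XML)); // prints [Java|A & B|39]
    }
}
```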
- Supplement 1. When a SAX parser is handed a raw byte stream, it works out the encoding from the byte order mark or the XML declaration and falls back to UTF-8 when neither is present. If that guess is wrong (for example, the file is stored in an encoding it does not declare), the text comes out garbled or parsing fails. The fix is to decode the bytes yourself and pass the parser an InputSource that wraps a Reader. This is exactly what testSax above already does; only the charset name needs to change. Here handler is a placeholder for the DefaultHandler shown in testSax:
sp.parse(new InputSource(new InputStreamReader(getClass().getResourceAsStream("2.xml"), "UTF-8")), handler);
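To make the encoding point concrete, here is a small sketch that parses bytes stored as ISO-8859-1 without any XML declaration. Handed to the parser as a raw stream, such input would typically be rejected or misread because it is not valid UTF-8; decoding it through an InputStreamReader with the right charset works. The class name SaxEncodingSketch, the method readText and the sample document are assumptions for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxEncodingSketch {
    /** Parses the bytes using the given charset and returns all character data. */
    static String readText(byte[] xmlBytes, Charset charset) throws Exception {
        final StringBuilder text = new StringBuilder();
        // Decoding happens here, so the parser receives characters, not bytes,
        // and does not have to guess the encoding.
        Reader reader = new InputStreamReader(new ByteArrayInputStream(xmlBytes), charset);
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(reader), new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // "caf\u00e9" encoded as ISO-8859-1, with no XML declaration
        byte[] latin1 = "<name>caf\u00e9</name>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(readText(latin1, StandardCharsets.ISO_8859_1));
    }
}
```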
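- Supplement 2. Writing XML is simply the reverse of reading and is just as easy to implement. Note that the XMLStreamWriter used here belongs to the StAX API (javax.xml.stream), not to SAX: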
public void testWriteXml() throws Exception {
XMLOutputFactory xof = XMLOutputFactory.newFactory();
ByteArrayOutputStream bos = new ByteArrayOutputStream();
XMLStreamWriter xsr = xof.createXMLStreamWriter(bos);
xsr.writeStartDocument("UTF-8", "1.0");
xsr.writeStartElement("A");
xsr.writeCharacters("http://localhost:8080/");
xsr.writeEndElement();
xsr.writeEndDocument();
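// Flush and close the writer so any buffered output reaches bos before it is read
xsr.flush();
xsr.close();
// Decode with the same encoding that was declared in writeStartDocument
System.out.println(new String(bos.toByteArray(), "UTF-8"));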
}
<?xml version="1.0" encoding="UTF-8"?><A>http://localhost:8080/</A>
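3. Parsing with StAX
The author of the book I was reading says this approach is nicer to use than SAX, but I did not get that impression. Here is code that does the same job as the SAX example (it looks noticeably uglier to me):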
public void testStAx() throws Exception {
XMLInputFactory xif = XMLInputFactory.newFactory();
XMLStreamReader xsr = xif.createXMLStreamReader(getClass().getResourceAsStream("2.xml"));
List<Book> books = new ArrayList<Book>();
String cLocalName = null;
for (Book book = null; xsr.hasNext(); ) {
int status = xsr.next();
if (status == XMLStreamConstants.START_ELEMENT) {
String lName = xsr.getName().getLocalPart();
if ("书".equals(lName)) {
book = new Book();
}
cLocalName = lName;
} else if (status == XMLStreamConstants.CHARACTERS) {
if ("书名".equals(cLocalName)) {
book.name = xsr.getText();
}
if ("作者".equals(cLocalName)) {
book.author = xsr.getText();
}
if ("售价".equals(cLocalName)) {
book.price = xsr.getText();
}
} else if (status == XMLStreamConstants.END_ELEMENT) {
if ("书".equals(xsr.getName().getLocalPart())) {
books.add(book);
book = null;
}
cLocalName = null;
}
}
for (Book b : books) {
System.out.println(b);
}
}
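For comparison, part of the bookkeeping in the loop above (tracking the current element name and handling CHARACTERS events) can be avoided with XMLStreamReader.getElementText(), which reads the full text of a text-only element and leaves the reader on its end tag. The sketch below is one possible variant, not the book's version; it uses hypothetical ASCII element names (book, name, author, price) in place of the Chinese tags in 2.xml and returns plain strings rather than Book objects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

final class StaxPullSketch {
    // Hypothetical sample data shaped like the book list used in this post.
    static final String XML =
        "<books>"
      + "<book><name>Java</name><author>Ann</author><price>39</price></book>"
      + "<book><name>XML</name><author>Bob</author><price>25</price></book>"
      + "</books>";

    static List<String> readBooks(String xml) throws Exception {
        XMLStreamReader reader =
            XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
        List<String> books = new ArrayList<>();
        String name = null, author = null, price = null;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                // getElementText() consumes the text and stops at the end tag
                switch (reader.getLocalName()) {
                    case "name":   name = reader.getElementText(); break;
                    case "author": author = reader.getElementText(); break;
                    case "price":  price = reader.getElementText(); break;
                    default: break;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && "book".equals(reader.getLocalName())) {
                books.add(name + "|" + author + "|" + price);
                name = author = price = null;
            }
        }
        reader.close();
        return books;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readBooks(XML)); // prints [Java|Ann|39, XML|Bob|25]
    }
}
```

Because the caller pulls events instead of receiving callbacks, the parsing state lives in ordinary local variables, which is usually the advantage people have in mind when they prefer StAX over SAX.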
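4. Reading and writing XML with JAXB
If JAXB fits your scenario, consider yourself lucky: the code is very simple and self-explanatory. Note that JAXB was removed from the JDK in Java 11, so newer projects need to add it as a separate dependency.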
public class JaxbTest {
@XmlRootElement
private static class Customer {
String name;
int age;
int id;
@XmlElement
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
@XmlElement
public int getAge() {
return age;
}
public void setAge(int age) {
this.age = age;
}
@XmlAttribute
public int getId() {
return id;
}
public void setId(int id) {
this.id = id;
}
@Override
public String toString() {
return "Customer[id=" + id + ",name=" + name + ",age=" + age + "]";
}
}
@Test
public void marshal() throws Exception {
Customer c = new Customer();
c.name = "kk";
c.id = 100;
c.age = 10;
Customer c2 = new Customer();
c2.name = "kk";
c2.id = 100;
c2.age = 10;
File file = new File("temp\\jaxb.xml");
JAXBContext context = JAXBContext.newInstance(Customer.class);
Marshaller marshaller = context.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
marshaller.marshal(c, file);
marshaller.marshal(c, System.out);
}
@Test
public void unmarshal() throws Exception {
JAXBContext jaxbContext = JAXBContext.newInstance(Customer.class);
Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
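// Unmarshal straight from the File: no reader is left open, and the parser picks the encoding from the XML declaration
Customer cc = (Customer) unmarshaller.unmarshal(new File("temp\\jaxb.xml"));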
System.out.println(cc);
}
}
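Summary
- W3C DOM and SAX both feel quite good. Each has its own focus and both are convenient to use. StAX did not show any advantage over SAX (at least none that I have noticed so far);
- XPath has a clear advantage when locating DOM nodes; for more, search for jsoupxpath
- For scenarios that both read and write XML (especially when the XML is generated from objects), JAXB is the recommended choice: a few annotations get the job done.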