Java: Four Ways to Parse XML (DOM Parsing)

Latest recommended article published 2024-08-26 16:17:28

Re__CODE

Category: Java Web. Tags: java

Article link: https://blog.csdn.net/qq_44668555/article/details/107458328

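XML Concepts

XML is typically used to store data and configuration; parsing an XML file reads that data back out.
The parsing approach is the same across programming languages; only the syntax of the implementation differs.
HTML, by contrast, is used to display pages.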

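<?xml version="1.0" encoding="utf-8"?>
version: the XML version number. encoding: the character encoding. Note that the declaration must be written in lowercase (<?xml, not <?XML).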

The examples below query the following department.xml file:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<department>
    <student id="110">
        <name>lili</name>
        <age>100</age>
    </student>
    <student id="220">
        <name>Mary</name>
        <age>20</age>
    </student>
    <student id="330">
        <name>Terry</name>
        <age>20</age>
    </student>
</department>

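This document contains the following kinds of nodes:

department --------- root node --------- rootElement
student --------- element (tag) node --------- Element
id="110" --------- attribute node --------- Attribute, abbreviated Attr (structured like a Map of name to value)
lili --------- text node --------- Text
department.xml --------- document --------- Document
[In the Java API, all of these node types share the same parent interface, Node.]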
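
The mapping above can be checked with a small sketch. The class name NodeTypeDemo, the parse helper, and the inline XML string are assumptions made for illustration (the XML is a whitespace-free copy of one student from department.xml); only the org.w3c.dom and javax.xml.parsers calls are standard API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class NodeTypeDemo {
    // Hypothetical inline sample: one student, written without whitespace between tags
    public static final String XML =
        "<department><student id=\"110\"><name>lili</name><age>100</age></student></department>";

    // Parses an XML string into a DOM Document (checked exceptions wrapped for brevity)
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Formats a node as "nodeType nodeName nodeValue"
    public static String describe(Node n) {
        return n.getNodeType() + " " + n.getNodeName() + " " + n.getNodeValue();
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        Element root = doc.getDocumentElement();
        // No whitespace between tags here, so the first child really is <student>
        Element student = (Element) root.getFirstChild();
        Attr id = student.getAttributeNode("id");
        Node text = student.getFirstChild().getFirstChild();

        System.out.println(describe(doc));      // Document node
        System.out.println(describe(root));     // Element node (root)
        System.out.println(describe(student));  // Element node
        System.out.println(describe(id));       // Attr node
        System.out.println(describe(text));     // Text node
    }
}
```

Every object passed to describe is a Node, which is why one method can handle the document, the elements, the attribute, and the text.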

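XML can be parsed in the following four ways:

DOM parsing
SAX parsing
JDOM parsing
DOM4J parsing

In addition, XPath can be used to query nodes (it is used mostly inside frameworks and in Android development).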

1. How DOM Parsing Works

DOM stands for Document Object Model.

1.1 DOM random access: exploring the internal structure

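    // Assumption: url is a String field of the test class that holds the location of department.xml.
    @Test
    public void test2() {
        Document doc = DomUtils.getDocumentInstance(url);
        Element root = doc.getDocumentElement();

        // Idea: jump straight to the nodes of interest by tag name (random access),
        // then loop over the matches.
        NodeList students = root.getElementsByTagName("student");
        System.out.println(students.getLength());

        for (int i = 0; i < students.getLength(); i++) {
            Node student = students.item(i);

            // Read the student's id attribute
            Element stu1 = (Element) student;
            String result1 = stu1.getAttribute("id");
            System.out.println("id       " + result1);

            // The node value of an element is null
            System.out.println(student.getNodeName() + "     " + student.getNodeValue());
            // Name: the first child of <student> is a whitespace #text node, so <name> is its next sibling.
            // name.getNodeValue() is also null: "lili" is a child node of <name>, not its value.
            Node name = student.getFirstChild().getNextSibling();
            System.out.println(name.getNodeName() + "       " + name.getNodeValue());
            // Option 1: read the child text node; prints lili for the first student
            Node value = name.getFirstChild();
            System.out.println(value.getNodeName() + "       " + value.getNodeValue());
            // Option 2: skip the child node and read the element's text content directly
            System.out.println(name.getNodeName() + "       " + name.getTextContent());

            // Age: skip the whitespace #text node between </name> and <age>
            Node age = name.getNextSibling().getNextSibling();
            System.out.println(age.getNodeName() + "       " + age.getTextContent());
        }
    }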
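
The getFirstChild().getNextSibling() chains above work only while every tag in the file is separated by whitespace, because each run of whitespace becomes its own #text node. A minimal sketch of a lookup that does not depend on the layout is shown below; the class StudentLookup, its helpers childText and summarize, and the inline XML (a shortened copy of department.xml) are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class StudentLookup {
    // Hypothetical inline sample: first two students of department.xml, indentation kept
    public static final String XML =
          "<department>\n"
        + "    <student id=\"110\">\n"
        + "        <name>lili</name>\n"
        + "        <age>100</age>\n"
        + "    </student>\n"
        + "    <student id=\"220\">\n"
        + "        <name>Mary</name>\n"
        + "        <age>20</age>\n"
        + "    </student>\n"
        + "</department>";

    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Returns the text of the first descendant element with the given tag, or null if none exists
    public static String childText(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }

    // Builds one "id name age" line per <student>
    public static List<String> summarize(Document doc) {
        List<String> lines = new ArrayList<>();
        NodeList students = doc.getDocumentElement().getElementsByTagName("student");
        for (int i = 0; i < students.getLength(); i++) {
            Element student = (Element) students.item(i);
            lines.add(student.getAttribute("id") + " "
                    + childText(student, "name") + " "
                    + childText(student, "age"));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : summarize(parse(XML))) {
            System.out.println(line);
        }
    }
}
```

Looking elements up by tag name costs a search per field, but the result stays the same whether or not the file is pretty-printed.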

If a student element has two attributes, for example id and sex, all of them can be read through its NamedNodeMap:

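    // Assumption: url is a String field of the test class that holds the location of department.xml.
    @Test
    public void test3() throws ParserConfigurationException, IOException, SAXException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

        DocumentBuilder builder = factory.newDocumentBuilder();
        // After parse returns, the whole document is held in memory
        Document doc = builder.parse(url);
        Element root = doc.getDocumentElement();

        // The first child of the root is a whitespace #text node; its next sibling is the first <student>
        Node stuu = root.getFirstChild().getNextSibling();

        // NamedNodeMap holds the element's attribute nodes
        NamedNodeMap nnm = stuu.getAttributes();
        System.out.println(nnm.getLength());
        for (int i = 0; i < nnm.getLength(); i++) {
            Node node = nnm.item(i);
            System.out.println(node.getNodeName() + "      " + node.getNodeValue());
        }
    }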
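
Since department.xml as shown has only the id attribute, here is a self-contained sketch of the two-attribute case. The sex="female" value, the class AttributeDemo, and its helpers are assumptions made for illustration. A NamedNodeMap does not guarantee any particular order, so the sketch copies the attributes into a sorted map before printing.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class AttributeDemo {
    // Hypothetical sample: a single student carrying both id and sex
    public static final String XML =
        "<student id=\"110\" sex=\"female\"><name>lili</name></student>";

    public static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Copies all attributes of an element into a map sorted by attribute name
    public static Map<String, String> attributes(Element element) {
        Map<String, String> result = new TreeMap<>();
        NamedNodeMap nnm = element.getAttributes();
        for (int i = 0; i < nnm.getLength(); i++) {
            Node attr = nnm.item(i);
            result.put(attr.getNodeName(), attr.getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Element student = parseRoot(XML);
        System.out.println(attributes(student));
        // A single attribute can also be fetched by name from the NamedNodeMap
        System.out.println(student.getAttributes().getNamedItem("sex").getNodeValue());
    }
}
```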

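The nodes of the document can be viewed as an inverted tree, with the root node at the top.

File ------ the file is read into memory
org.w3c.dom ------ the package that describes the DOM node types
The Document interface represents the entire HTML or XML document. Conceptually, it is the root of the document tree, and it provides the primary access to the document's data.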

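1.2 Connecting to the DOM

Steps to connect:
[Goal: obtain the Document object]

Create a DocumentBuilderFactory object (the parser factory).
Use the DocumentBuilderFactory to create a DocumentBuilder (the document parser).
Call the DocumentBuilder's parse method to parse the file into a Document.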

    import java.io.IOException;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import org.w3c.dom.Document;
    import org.xml.sax.SAXException;

    public class DomUtils {
        public static Document getDocumentInstance(String filename) {
            Document doc = null;
            try {
                // Step 1: create the DocumentBuilderFactory (the parser factory)
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                // Step 2: use the factory to create a DocumentBuilder (the document parser)
                DocumentBuilder builder = factory.newDocumentBuilder();
                // Step 3: parse the file; parse(String) interprets its argument as a URI.
                // After this call the whole document is held in memory.
                doc = builder.parse(filename);
            } catch (ParserConfigurationException | SAXException | IOException e) {
                e.printStackTrace();
            }
            // null if parsing failed
            return doc;
        }
    }

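1.3 Traversing nodes

Steps to traverse the nodes:

Print the current node and its value first.
Check the node type.
For an element, traverse in two passes: first its attributes, then its child Element and Text nodes.
Filter out the #text nodes.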

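    public static void visit(Node start) {
        // Print the current node first
        System.out.println(start.getNodeName() + "..." + start.getNodeValue());
        if (start.getNodeType() == Node.ELEMENT_NODE) {
            // Pass 1: attributes
            NamedNodeMap nnm = start.getAttributes();
            for (int i = 0; i < nnm.getLength(); i++) {
                Node attr = nnm.item(i);
                System.out.println(attr.getNodeName() + "..." + attr.getNodeValue());
            }
            // Pass 2: child nodes, recursing into everything except #text nodes.
            // Compare node types instead of using != on strings, which compares references.
            // Note: this also skips real text such as "lili"; use getTextContent() to read it.
            for (Node sub = start.getFirstChild(); sub != null; sub = sub.getNextSibling()) {
                if (sub.getNodeType() != Node.TEXT_NODE) {
                    visit(sub);
                }
            }
        }
    }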
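
Filtering out every #text node also hides real content such as lili. As a variation, the sketch below skips only the whitespace-only text nodes and indents each line by depth, which makes the inverted tree visible. The class TreePrinter, its render helper, and the inline sample are hypothetical names introduced for illustration, not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class TreePrinter {
    // Hypothetical inline sample: one student, indentation kept
    public static final String XML =
          "<department>\n"
        + "    <student id=\"110\">\n"
        + "        <name>lili</name>\n"
        + "    </student>\n"
        + "</department>";

    // Appends one line per element or non-blank text node, indented two spaces per level
    public static void visit(Node node, int depth, StringBuilder out) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            indent.append("  ");
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {           // drop whitespace-only text nodes
                out.append(indent).append("#text: ").append(text).append('\n');
            }
            return;
        }
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return;                          // comments, processing instructions, etc. are ignored
        }
        out.append(indent).append(node.getNodeName());
        NamedNodeMap attrs = node.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            out.append(" [").append(attr.getNodeName()).append('=')
               .append(attr.getNodeValue()).append(']');
        }
        out.append('\n');
        for (Node sub = node.getFirstChild(); sub != null; sub = sub.getNextSibling()) {
            visit(sub, depth + 1, out);
        }
    }

    // Parses the XML string and returns the rendered tree
    public static String render(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            StringBuilder out = new StringBuilder();
            visit(root, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(render(XML));
    }
}
```

For the sample above, the rendered tree has four lines: department, then student [id=110], then name, then #text: lili, each one level deeper than the previous.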