DOM Parsing Made Easy (3) -- DOM Level 2 Traversal and Range

mynameisshine

Category: XML. Tags: module, import, validation, string, traversal, iterator

Article link: https://blog.csdn.net/mynameisshine/article/details/2109004

/* Java and XML, 3rd Edition study notes
DOM Parsing Made Easy (3) -- DOM Level 2 Modules
author: shine
*/
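Since the title mentions DOM modules, let's start with a quick overview of DOM levels and modules. A rough impression is enough for now; a few commonly used modules are covered with examples later.
1. DOM has three levels. Put simply, each level is a stage in the gradual refinement of the DOM specification. (The name in parentheses is the module's feature name.)
1) Level 1: "details the functionality and navigation of content within a document". In other words, it defines a basic core set of node types and the details of navigating document content.
2) Level 2: builds on Level 1 and adds several modules aimed at specific kinds of content:
a. DOM Level 2 Core (XML): extends the Level 1 specification and covers the basic DOM structures such as Element, Attr and Document.
b. DOM Level 2 Views (Views): browsers generally do not support the Views module (skipped here).
c. DOM Level 2 Events (Events): defines a standardized set of events for browser interaction with HTML pages and for XML document node trees.
d. DOM Level 2 CSS (CSS): provides a model for CSS built on the DOM Core and DOM Views specifications.
e. DOM Level 2 Traversal and Range (Traversal/Range): defines a set of interfaces for traversing nodes and for working with ranges in XML or HTML documents.
f. DOM Level 2 HTML (HTML): extends the DOM with interfaces for handling HTML document structure as XML.
3) Level 3: builds on Level 2, extends Core and adds two new modules:
a. DOM Level 3 Core (XML): adds the bootstrapping and InfoSet mechanisms. Bootstrapping is covered later; InfoSet here refers to the methods added to the Document interface for XML information, such as getXmlEncoding(), getXmlStandalone() and getXmlVersion().
b. DOM Level 3 Load & Save (LS): as the name says, loading and saving documents.
c. DOM Level 3 Validation (Validation): defines interfaces for validating against a DTD or Schema.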
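
To make the InfoSet methods of DOM Level 3 Core more concrete, here is a minimal sketch that reads them from a parsed document. It assumes the JDK's built-in JAXP parser rather than a standalone Xerces jar; the class name XmlInfoDemo and the sample XML string are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of the original notes.
public class XmlInfoDemo {
    // Parses an XML string with the JDK's default JAXP parser (an assumption of this sketch).
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><books/>");
        // These three getters were added to Document in DOM Level 3 Core.
        System.out.println("version:    " + doc.getXmlVersion());
        System.out.println("encoding:   " + doc.getXmlEncoding());
        System.out.println("standalone: " + doc.getXmlStandalone());
    }
}
```

The values come straight from the XML declaration, so changing the declaration in the sample string changes what is printed.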

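2. Today's browsers do not fully support the DOM levels. They generally support the Level 1 modules plus parts of Level 2 and Level 3. Unfortunately, IE has the weakest support of them all: only Level 1 and part of Level 2. Likewise, the XML parsers we use do not implement every level listed above. So before relying on an XML parser, shouldn't we verify which modules it actually supports? The first example today does exactly that.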
package test;

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.DOMImplementation;
public class DOMModuleChecker {
private static Map<String, String> module2Map; // DOM Level 2 modules
private static Map<String, String> module3Map; // DOM Level 3 modules

// Put each DOM module's feature name into the map for its level
private static void loadModule() {
module2Map = new HashMap<>();
module3Map = new HashMap<>();

     // DOM Level 2
     module2Map.put("XML", "DOM Level 2 Core");
     module2Map.put("Views", "DOM Level 2 Views");
     module2Map.put("Events", "DOM Level 2 Events");
     module2Map.put("CSS", "DOM Level 2 CSS");
     module2Map.put("Traversal", "DOM Level 2 Traversal");
     module2Map.put("Range", "DOM Level 2 Range");
     module2Map.put("HTML", "DOM Level 2 HTML");

     // DOM Level 3
     module3Map.put("XML", "DOM Level 3 Core");
     module3Map.put("LS", "DOM Level 3 Load & Save");
     module3Map.put("Validation", "DOM Level 3 Validation");
}

// Check which modules a DOMImplementation supports; the second argument is the class name shown in the report
public void checkModule(DOMImplementation domImpl,String vendorImplementationClass) {
  System.out.println("For the DOM implementation class " +vendorImplementationClass + "：");
     // Check Level 2 module support
     Iterator iteratorLevel2 = module2Map.keySet().iterator();
     while(iteratorLevel2.hasNext()) {
     String name = (String)iteratorLevel2.next();
     String description = (String)module2Map.get(name);
         System.out.print("The " + description + " module is ");
         if(domImpl.hasFeature(name, "2.0")) {
         System.out.println("supported");
         }
         else {
         System.out.println("not supported");
         }
     }

     // Check Level 3 module support
     Iterator iteratorLevel3 = module3Map.keySet().iterator();
     while(iteratorLevel3.hasNext()) {
     String name = (String)iteratorLevel3.next();
     String description = (String)module3Map.get(name);
         System.out.print("The " + description + " module is ");
         if(domImpl.hasFeature(name, "3.0")) {
         System.out.println("supported");
         }
         else {
         System.out.println("not supported");
         }
     }
}

public static void main(String[] args) {

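  // Fully qualified name of a class that implements DOMImplementation
  String vendorImplementationClass1 = "org.apache.xerces.dom.DOMImplementationImpl";
  // JAXP factory used to obtain the second DOMImplementation (the factory itself does not implement DOMImplementation)
  String vendorImplementationClass2 = "javax.xml.parsers.DocumentBuilderFactory";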

  loadModule();
  try {
   // Instantiate the first implementation, org.apache.xerces.dom.DOMImplementationImpl, by reflection
   DOMImplementation domImpl1 = (DOMImplementation) Class.forName(vendorImplementationClass1).newInstance();

   // Obtain the second DOMImplementation through javax.xml.parsers.DocumentBuilderFactory
   DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
   DocumentBuilder db = dbf.newDocumentBuilder();
   DOMImplementation domImpl2 = db.getDOMImplementation();

   // Run the checks
   DOMModuleChecker checker = new DOMModuleChecker();
   checker.checkModule(domImpl1, vendorImplementationClass1);
   System.out.println("-------------------------------------------------------");
   checker.checkModule(domImpl2, vendorImplementationClass2);
  } catch (InstantiationException e) {
   // TODO Auto-generated catch block
   e.printStackTrace();
  } catch (IllegalAccessException e) {
   // TODO Auto-generated catch block
   e.printStackTrace();
  } catch (ClassNotFoundException e) {
   // TODO Auto-generated catch block
   e.printStackTrace();
  } catch (ParserConfigurationException e) {
   // TODO Auto-generated catch block
   e.printStackTrace();
  }
}
}

This example uses the hasFeature method of the DOMImplementation interface to check which modules each parser's implementation supports.
Running it gives:
For the DOM implementation class org.apache.xerces.dom.DOMImplementationImpl：
The DOM Level 2 Traversal module is supported
The DOM Level 2 Range module is supported
The DOM Level 2 HTML module is not supported
The DOM Level 2 Views module is not supported
The DOM Level 2 CSS module is not supported
The DOM Level 2 Events module is supported
The DOM Level 2 Core module is supported
The DOM Level 3 Core module is supported
The DOM Level 3 Validation module is not supported
The DOM Level 3 Load & Save module is supported
-------------------------------------------------------
For the DOM implementation class javax.xml.parsers.DocumentBuilderFactory：
The DOM Level 2 Traversal module is supported
The DOM Level 2 Range module is supported
The DOM Level 2 HTML module is not supported
The DOM Level 2 Views module is not supported
The DOM Level 2 CSS module is not supported
The DOM Level 2 Events module is supported
The DOM Level 2 Core module is supported
The DOM Level 3 Core module is supported
The DOM Level 3 Validation module is not supported
The DOM Level 3 Load & Save module is supported

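As the results show, both implementations support the DOM modules fairly well.

3. At this point another question came to mind: can we request specific support ourselves? The answer is yes. This is the well-known bootstrapping mechanism of DOM Level 3.
The precondition is that your parser supports the features you ask for, so you can only choose from the results above.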

.....
DOMImplementationRegistry register = DOMImplementationRegistry.newInstance();
DOMImplementation domImpl = register.getDOMImplementation("XML 3.0"); // request an implementation that supports XML 3.0
domImpl.createDocument(null, "books", null);
......
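
The fragment above can be fleshed out into a small runnable sketch. It assumes the JDK's bundled DOM implementation is the one the registry finds; the class name BootstrapDemo and the helper find are hypothetical. Note that getDOMImplementation returns null when no registered implementation supports the requested features, so the result should be checked.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

// Hypothetical demo class, not part of the original notes.
public class BootstrapDemo {
    // Asks the registry for an implementation that supports the given features.
    // Returns null when no registered implementation qualifies.
    public static DOMImplementation find(String features) throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        return registry.getDOMImplementation(features);
    }

    public static void main(String[] args) throws Exception {
        DOMImplementation impl = find("XML 3.0");
        if (impl == null) {
            System.out.println("No DOM implementation supports the requested features");
            return;
        }
        // Create an empty document whose root element is <books>.
        Document doc = impl.createDocument(null, "books", null);
        System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
    }
}
```

The feature string is a space-separated list of feature names and versions, so the same call can be used to ask for the modules listed in the report above.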

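4. Commonly used DOM modules:
1) DOM Level 2 Traversal: as the name suggests, this module is about traversing the DOM. Its interfaces live in the org.w3c.dom.traversal package, which has four of them: NodeIterator, DocumentTraversal, NodeFilter and TreeWalker. Before the examples, a quick overview:

a. DocumentTraversal is the entry point of the package (its role feels a bit like Document's). A class that implements Document usually implements DocumentTraversal too, so you can write: DocumentTraversal docTra = (DocumentTraversal)doc; Its two methods are createNodeIterator and createTreeWalker.
b. NodeIterator and TreeWalker both traverse the DOM, just in different ways: NodeIterator presents the nodes as a flat list in document order, while TreeWalker keeps the tree structure and can move to parents, children and siblings.
c. NodeFilter: a filter that decides which nodes the traversal returns.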

First, prepare an HTML file:
<html>
<head>
</head>
<body>
Hi ! yaoyao <h1>how are you </h1>
<p>that's fine. do you?</p>
<h2>I'm fine too.</h2>
</body>
</html>

Example 1: NodeIterator (traversing the HTML above)
package test;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.NodeIterator;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ItaratorTest {
public void iteratorNode(String xmlPath) throws SAXException, IOException {
  // Parse the XML document at the given path and get the Document
  BufferedReader br = new BufferedReader(new FileReader(xmlPath));
  InputSource inputSource = new InputSource(br);
  DOMParser parser = new DOMParser();
  parser.parse(inputSource);
  Document doc = parser.getDocument();

  // Find the body element
  Element root = doc.getDocumentElement();
  Node body = root.getElementsByTagName("body").item(0);

  // Create the NodeIterator
  NodeIterator iterator = ((DocumentTraversal)doc).createNodeIterator(body, NodeFilter.SHOW_ALL, null, true);
  Node node;
  while((node = iterator.nextNode()) != null) {
   if(node.getNodeType() == Node.ELEMENT_NODE) {
    System.out.println("ElementName："+node.getNodeName());
   }
   else if(node.getNodeType() == Node.TEXT_NODE) {
    System.out.println("TextContent："+node.getNodeValue());
   }
  }
}

public static void main(String[] args) {
  String xmlURI = "D://workplace//a.html"; // path to your own HTML file
  ItaratorTest test = new ItaratorTest();
  try {
   test.iteratorNode(xmlURI);
  } catch (SAXException e) {
   // TODO Auto-generated catch block
   e.printStackTrace();
  } catch (IOException e) {
   // TODO Auto-generated catch block
   e.printStackTrace();
  }
}
}

output：
ElementName：body
TextContent：
Hi ! yaoyao
ElementName：h1
TextContent：how are you
TextContent：

ElementName：p
TextContent：that's fine. do you?
TextContent：

ElementName：h2
TextContent：I'm fine too.
TextContent：

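Notes on the example:
NodeIterator iterator = ((DocumentTraversal)doc).createNodeIterator(body, NodeFilter.SHOW_ALL, null, true);
createNodeIterator takes four arguments:
a. The root of the subtree to traverse. The root itself is included, which is why body appears in the output.
b. The node types to show. Here it is NodeFilter.SHOW_ALL; other options include NodeFilter.SHOW_ELEMENT and NodeFilter.SHOW_TEXT.
c. The filter to apply (none here, so null).
d. Whether to expand entity references (usually true).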
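
For comparison, the other traversal interface, TreeWalker, takes the same four arguments but lets you move through the tree structurally. The following is a minimal sketch that lists the element children of body. It assumes the JDK's built-in parser (whose Document also implements DocumentTraversal) and inlines the sample HTML as a string; the class name TreeWalkerDemo and the method bodyChildElements are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original notes.
public class TreeWalkerDemo {
    static final String HTML = "<html><head/><body>Hi ! yaoyao <h1>how are you </h1>"
            + "<p>that's fine. do you?</p><h2>I'm fine too.</h2></body></html>";

    // Returns the names of body's element children, visited with firstChild()/nextSibling().
    public static List<String> bodyChildElements(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node body = doc.getElementsByTagName("body").item(0);
        // SHOW_ELEMENT hides text nodes, so the walker steps over them.
        TreeWalker walker = ((DocumentTraversal) doc)
                .createTreeWalker(body, NodeFilter.SHOW_ELEMENT, null, true);
        List<String> names = new ArrayList<>();
        for (Node n = walker.firstChild(); n != null; n = walker.nextSibling()) {
            names.add(n.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(bodyChildElements(HTML)); // prints [h1, p, h2]
    }
}
```

Unlike NodeIterator, the walker also offers parentNode(), lastChild() and previousSibling(), which is handy when the position in the tree matters.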

Example 2: NodeFilter
First write a filter:
package test;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.NodeFilter;
public class IteratorFilter implements NodeFilter{

public short acceptNode(Node node) {
  Node parent;
  if(node.getNodeType() == Node.TEXT_NODE) { // only text nodes can be accepted
   parent = node.getParentNode();
   if((parent.getNodeName().equalsIgnoreCase("p")) ||
   (parent.getNodeName().equalsIgnoreCase("h2"))) {
    return NodeFilter.FILTER_ACCEPT; // accept: text inside p or h2 is returned by the traversal
   }
  }
  return NodeFilter.FILTER_SKIP; // skip: every other node is left out of the results
}
}

Then change one line of Example 1:
NodeIterator iterator = ((DocumentTraversal)doc).createNodeIterator(body, NodeFilter.SHOW_ALL, new IteratorFilter(), true);

Run it again:
output:
TextContent：that's fine. do you?
TextContent：I'm fine too.
(<h1>how are you </h1> does not appear)

Note the scope of the filter: a NodeFilter only sees the nodes selected by the second argument of createNodeIterator (here NodeFilter.SHOW_ALL). If you change it to NodeFilter.SHOW_ELEMENT, running the program outputs nothing, because text nodes never reach the filter.
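
The interplay between the whatToShow argument and the filter can be checked with a small sketch that counts the nodes the iterator returns for different masks. It assumes the JDK's built-in parser and an inlined copy of the sample HTML; the class name WhatToShowDemo is hypothetical, and the lambda simply restates the rule from IteratorFilter.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original notes.
public class WhatToShowDemo {
    static final String HTML = "<html><head/><body>Hi ! yaoyao <h1>how are you </h1>"
            + "<p>that's fine. do you?</p><h2>I'm fine too.</h2></body></html>";

    // Same rule as IteratorFilter: accept only text nodes whose parent is p or h2.
    static final NodeFilter P_H2_TEXT = node -> {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String parent = node.getParentNode().getNodeName();
            if (parent.equalsIgnoreCase("p") || parent.equalsIgnoreCase("h2")) {
                return NodeFilter.FILTER_ACCEPT;
            }
        }
        return NodeFilter.FILTER_SKIP;
    };

    // Counts the nodes returned under body for the given whatToShow mask.
    public static int count(int whatToShow) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(HTML)));
        Node body = doc.getElementsByTagName("body").item(0);
        NodeIterator it = ((DocumentTraversal) doc)
                .createNodeIterator(body, whatToShow, P_H2_TEXT, true);
        int n = 0;
        while (it.nextNode() != null) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("SHOW_ALL:     " + count(NodeFilter.SHOW_ALL));     // 2
        System.out.println("SHOW_TEXT:    " + count(NodeFilter.SHOW_TEXT));    // 2
        System.out.println("SHOW_ELEMENT: " + count(NodeFilter.SHOW_ELEMENT)); // 0
    }
}
```

SHOW_TEXT gives the same result as SHOW_ALL here because the filter only ever accepts text nodes; SHOW_ELEMENT gives zero because the mask removes text nodes before the filter is consulted.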

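2) DOM Level 2 Range: as the name suggests, this module deals with ranges within a DOM document. Its types live in the org.w3c.dom.ranges package, which has three of them:
a. DocumentRange (its role is similar to DocumentTraversal's)
b. Range
c. RangeException
It is fairly simple; the examples make it clear: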

Example 3: Range (still based on the HTML file above; shows how to define a Range)
package test;

import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.xerces.parsers.DOMParser;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ranges.DocumentRange;
import org.w3c.dom.ranges.Range;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class RangeTest {
public static void main(String[] args) {
  String xmlURI = "D://workplace//a.html";
  DOMParser parser = new DOMParser();
  InputSource inputSource = new InputSource(xmlURI);
  try {
   parser.parse(inputSource);
   // Create the Range
   Document doc = parser.getDocument();
   Range range = ((DocumentRange)doc).createRange();
   // Set the boundaries
   Node body = doc.getElementsByTagName("body").item(0);
   range.setStartBefore(body.getFirstChild());
   range.setEndAfter(body.getLastChild());
   // Delete the contents of the range
   range.deleteContents();
   // Release the range
   range.detach();
   // Write the in-memory changes back to the file
   RangeTest test = new RangeTest();
   test.save(doc, xmlURI);
  } catch (SAXException e) {
   // TODO Auto-generated catch block
   e.printStackTrace();
  } catch (IOException e) {
   // TODO Auto-generated catch block
   e.printStackTrace();
  }
}
// Write the in-memory changes to a file; this is not the focus here, so it is fine if it is not fully clear
public void save(Document doc,String path) {
  OutputFormat format = new OutputFormat(doc);
  // try-with-resources closes the stream even if serialization fails
  try (OutputStream os = new FileOutputStream(path)) {
   XMLSerializer serializer = new XMLSerializer(os,format);
   serializer.asDOMSerializer();
   serializer.serialize(doc.getDocumentElement());
  } catch (FileNotFoundException e) {
   // TODO Auto-generated catch block
   e.printStackTrace();
  } catch (IOException e) {
   // TODO Auto-generated catch block
   e.printStackTrace();
  }

}
}
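
The save method above relies on the Xerces-specific org.apache.xml.serialize classes. If those are not available, one possible alternative is the standard JAXP identity transform, sketched below. This swaps in a different technique from the one used in the notes; the class name TransformerSaveDemo and the helpers sample and toXml are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not part of the original notes.
public class TransformerSaveDemo {
    // Builds a tiny document: an html root with an empty body.
    public static Document sample() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        html.appendChild(doc.createElement("body"));
        doc.appendChild(html);
        return doc;
    }

    // Serializes a DOM document to a string with an identity transform.
    public static String toXml(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Force XML output; a root element named html could otherwise select the HTML method.
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(sample()));
    }
}
```

To write to a file instead of a string, pass new StreamResult(new java.io.File(path)) to transform.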

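// Example 4: a "cut"-style example showing how to get the contents of a range
......
Node body = doc.getElementsByTagName("body").item(0);
range.setStartBefore(body.getFirstChild());
range.setEndAfter(body.getLastChild());
DocumentFragment list = range.extractContents(); // moves the contents of body into a fragment; walk a DocumentFragment through its child nodes, much like a NodeList
range.deleteContents(); // redundant here: extractContents() has already removed the contents from the document
......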
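
The "cut" behaviour of extractContents can be verified with a self-contained sketch. It assumes the JDK's built-in parser (whose Document also implements DocumentRange) and an inlined copy of the sample HTML; the class name ExtractDemo and the method cutBody are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.w3c.dom.ranges.DocumentRange;
import org.w3c.dom.ranges.Range;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original notes.
public class ExtractDemo {
    static final String HTML = "<html><head/><body>Hi ! yaoyao <h1>how are you </h1>"
            + "<p>that's fine. do you?</p><h2>I'm fine too.</h2></body></html>";

    // Cuts every child of body into a DocumentFragment and returns
    // { children now in the fragment, children left in body }.
    public static int[] cutBody(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node body = doc.getElementsByTagName("body").item(0);
        Range range = ((DocumentRange) doc).createRange();
        range.setStartBefore(body.getFirstChild());
        range.setEndAfter(body.getLastChild());
        // extractContents() removes the nodes from the document and hands them back.
        DocumentFragment fragment = range.extractContents();
        range.detach();
        return new int[] {
            fragment.getChildNodes().getLength(),
            body.getChildNodes().getLength()
        };
    }

    public static void main(String[] args) throws Exception {
        int[] counts = cutBody(HTML);
        System.out.println("in fragment: " + counts[0]); // 4 (one text node, h1, p, h2)
        System.out.println("left in body: " + counts[1]); // 0
    }
}
```

To "paste" the fragment elsewhere, append it to another node with appendChild; its children are moved to the new parent and the fragment itself is left empty. For a copy instead of a cut, Range.cloneContents() leaves the document untouched.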

/*
DOM Parsing Made Easy (4) -- DOM Level 3 Load and Save 2008-2-21
*/