Parsing XML Files with DOM

aaabc1112345

Tags: java

Original link: http://www.cnblogs.com/dengyuanqi/p/7641200.html


DOM stands for Document Object Model.

DOM mainly involves five kinds of objects: Document, Node, NodeList, Element, and Attr. Each is described in turn below.

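1. A Document object represents the entire XML document. All nodes of the XML are arranged in a tree under the Document, in document order, and you get the content you want by traversing that tree. The parse(File file) method of a DocumentBuilder parses an XML file into a Document. The DocumentBuilder is obtained from DocumentBuilderFactory's newDocumentBuilder() method, and the DocumentBuilderFactory itself from the static newInstance() method.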

	DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
	DocumentBuilder db = dbf.newDocumentBuilder();
	Document doc = db.parse(new File("NewFile.xml"));   // path to the XML file; NewFile.xml is the name used in the example at the end
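
The three calls above can be tried without a file on disk. The following is a minimal sketch that feeds the parser an in-memory string through parse(InputStream) instead of parse(File); the class name DomParseSketch, the helper rootName, and the sample XML are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;

// Hypothetical class name, used only for this sketch.
class DomParseSketch {
    // Parses XML held in a string and returns the tag name of the root element.
    static String rootName(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            // parse(InputStream) is an overload of the parse(File) call shown above.
            Document doc = db.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTagName();
        } catch (Exception e) {
            // Wraps ParserConfigurationException, SAXException and IOException.
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootName("<links><link/></links>")); // prints "links"
    }
}
```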

Besides this, Document provides methods for creating other nodes, for example createAttribute(), which creates an Attr object.

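	createAttribute(String)   // creates an Attr with the given attribute name; attach it to an Element with setAttributeNode()
	createElement(String)   // creates an Element with the given tag name
	createTextNode(String)   // creates a Text node with the given string
	getElementsByTagName(String)   // returns a NodeList of all elements with the given tag name
	getDocumentElement()   // returns the Element that is the root of this DOM tree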
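
As a rough illustration of the creation methods listed above, the sketch below builds a one-element document from scratch. It assumes an empty Document obtained from DocumentBuilder.newDocument(), which the text above does not mention; the class name DomCreateSketch and the helper describe are hypothetical, and the element and attribute names are borrowed from the example XML later in the article.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name, used only for this sketch.
class DomCreateSketch {
    // Builds <url newWindow="no">http://java.sun.com</url> and summarises it.
    static String describe() {
        try {
            // newDocument() returns an empty Document to build on.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();

            Element url = doc.createElement("url");
            Attr newWindow = doc.createAttribute("newWindow");
            newWindow.setValue("no");
            url.setAttributeNode(newWindow);              // attach the Attr to the Element
            url.appendChild(doc.createTextNode("http://java.sun.com"));
            doc.appendChild(url);                         // url becomes the root element

            Element root = doc.getDocumentElement();
            return root.getTagName()
                    + " newWindow=" + root.getAttribute("newWindow")
                    + " text=" + root.getFirstChild().getNodeValue()
                    + " count=" + doc.getElementsByTagName("url").getLength();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM parser available", e);
        }
    }

    public static void main(String[] args) {
        // prints "url newWindow=no text=http://java.sun.com count=1"
        System.out.println(describe());
    }
}
```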

2. Node is the most basic object in the DOM structure; it represents an abstract node in the document tree. Its main methods are:

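	appendChild(Node newChild)   // appends newChild as the last child of this node
	getFirstChild()   // returns the first child, or null if there is none
	getLastChild()   // returns the last child, or null if there is none
	getNextSibling()   // returns the next sibling of this node in the DOM tree, or null
	getNodeName()   // returns the node's name
	getNodeType()   // returns the node's type as a constant such as Node.ELEMENT_NODE
	getNodeValue()   // returns the node's value; null for Element nodes
	hasChildNodes()
	hasAttributes()
	getOwnerDocument()
	removeChild(Node oldChild)
	replaceChild(Node newChild, Node oldChild)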
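
A small sketch of how the navigation methods fit together: it walks the direct children of the root with getFirstChild() and getNextSibling() and records each child's type and name. The class name DomNodeWalkSketch, the helper childSummary, and the sample XML are assumptions made for illustration. Note that the space between the two child elements shows up as a Text node of its own.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical class name, used only for this sketch.
class DomNodeWalkSketch {
    // Lists "kind:name" for every direct child of the root element.
    static List<String> childSummary(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Node root = doc.getDocumentElement();
            List<String> out = new ArrayList<>();
            // getNextSibling() returns null after the last child, which ends the loop.
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                String kind;
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    kind = "element";
                } else if (n.getNodeType() == Node.TEXT_NODE) {
                    kind = "text";
                } else {
                    kind = "other";
                }
                out.add(kind + ":" + n.getNodeName());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // prints "[element:day, text:#text, element:month]"
        System.out.println(childSummary("<date><day>2</day> <month>1</month></date>"));
    }
}
```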

3. NodeList is an ordered list of Node objects. It has two methods:

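	getLength()   // returns the number of nodes in the list
	item(int)   // returns the node at the given index, or null if the index is out of range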
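
Since NodeList does not implement java.util.List or Iterable, it is usually traversed with an index loop over getLength() and item(int). The sketch below shows that loop; the class name DomNodeListSketch and the helper texts are hypothetical, and getTextContent() is used here as a shortcut for reading an element's text that the article itself does not cover.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical class name, used only for this sketch.
class DomNodeListSketch {
    // Collects the text content of every element with the given tag name.
    static List<String> texts(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tag);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<links><link><text>A</text></link><link><text>B</text></link></links>";
        System.out.println(texts(xml, "text")); // prints "[A, B]"
    }
}
```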

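4. An Element object represents a tag element in the XML document. Element extends Node and is its most important subtype. Because a tag can carry attributes, Element adds methods for reading and writing them, and every method defined on Node can also be called on an Element.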

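	getChildNodes()   // returns a NodeList of this element's children (inherited from Node)
	getElementsByTagName(String)   // returns a NodeList of descendant elements with the given tag name
	getTagName()   // returns the element's tag name
	getAttribute(String)   // returns the attribute's value as a String
	getAttributeNode(String)   // returns the attribute as an Attr object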
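
The difference between getAttribute() and getAttributeNode() matters most for attributes that are absent: the first returns an empty string, the second returns null. A minimal sketch, with a hypothetical class DomElementSketch and helper parseRoot, using the url element from the example XML later in the article:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Element;

// Hypothetical class name, used only for this sketch.
class DomElementSketch {
    // Parses the XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element url = parseRoot("<url newWindow=\"no\">http://java.sun.com</url>");
        System.out.println(url.getTagName());                         // prints "url"
        System.out.println(url.getAttribute("newWindow"));            // prints "no"
        // A missing attribute yields "" from getAttribute and null from getAttributeNode.
        System.out.println(url.getAttribute("missing").isEmpty());    // prints "true"
        System.out.println(url.getAttributeNode("missing") == null);  // prints "true"
    }
}
```

To tell a missing attribute apart from one that is present but empty, hasAttribute(String) can be called first.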

5. An Attr object represents a single attribute of a tag.
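
One way to see Attr objects in practice is to enumerate all attributes of an element through getAttributes(), which returns a NamedNodeMap (an interface not covered above). This is only a sketch: the class name DomAttrSketch, the helper attributes, and the lang attribute in the sample are made up, and a TreeMap is used because the order of a NamedNodeMap is not guaranteed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;

// Hypothetical class name, used only for this sketch.
class DomAttrSketch {
    // Returns the root element's attributes as a name -> value map, sorted by name.
    static Map<String, String> attributes(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            NamedNodeMap attrs = root.getAttributes();
            Map<String, String> out = new TreeMap<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr attr = (Attr) attrs.item(i);   // every entry of an element's map is an Attr
                out.put(attr.getName(), attr.getValue());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // prints "{lang=en, newWindow=no}"
        System.out.println(attributes("<url newWindow=\"no\" lang=\"en\">x</url>"));
    }
}
```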

Example:

<?xml version="1.0" standalone="yes"?>
<links>
	<link>
		<text>JSP Insider</text>
		<url newWindow="no">http://www.jspinsider.com</url>
		<author>JSP Insider</author>
		<date>
			<day>2</day>
			<month>1</month>
			<year>2001</year>
		</date>
		<description>A JSP information site.</description>
	</link>
	<link>
		<text>The makers of Java</text>
		<url newWindow="no">http://java.sun.com</url>
		<author>Sun Microsystems</author>
		<date>
			<day>3</day>
			<month>1</month>
			<year>2001</year>
		</date>
		<description>Sun Microsystem's website.</description>
	</link>
	<link>
		<text>The standard JSP container</text>
		<url newWindow="no">http://jakarta.apache.org</url>
		<author>Apache Group</author>
		<date>
			<day>4</day>
			<month>1</month>
			<year>2001</year>
		</date>
		<description>Some great software.</description>
	</link>
</links>

package com.zhy.spider.bean;

import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class xmldisplay {
	public static void main(String[] args) throws SAXException, IOException, ParserConfigurationException {
		// 1. Build the Document object
		// Create the DocumentBuilderFactory
		DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
		// Create the DocumentBuilder
		DocumentBuilder builder = factory.newDocumentBuilder();
		// Parse the XML into a Document (the String argument is treated as a URI)
		Document doc = builder.parse("NewFile.xml");
		/*
		 * normalize() merges adjacent Text nodes and removes empty ones, so the tree has a predictable shape.
		 * It does not strip the whitespace-only Text nodes that come from indentation in the file;
		 * the code below sidesteps those by looking elements up by tag name.
		 */
		doc.normalize();
		// 2. Walk every <link> element and print its fields
		NodeList links = doc.getElementsByTagName("link");
		for (int i = 0; i < links.getLength(); i++) {
			Element link = (Element) links.item(i);
			System.out.print("Content: ");
			System.out.println(link.getElementsByTagName("text").item(0).getFirstChild().getNodeValue());
			System.out.print("URL: ");
			System.out.println(link.getElementsByTagName("url").item(0).getFirstChild().getNodeValue());
			System.out.print("Author: ");
			System.out.println(link.getElementsByTagName("author").item(0).getFirstChild().getNodeValue());
			System.out.print("Date: ");
			Element linkdate = (Element) link.getElementsByTagName("date").item(0);
			String day = linkdate.getElementsByTagName("day").item(0).getFirstChild().getNodeValue();
			String month = linkdate.getElementsByTagName("month").item(0).getFirstChild().getNodeValue();
			String year = linkdate.getElementsByTagName("year").item(0).getFirstChild().getNodeValue();
			System.out.println(day + "-" + month + "-" + year);
			System.out.print("Description: ");
			System.out.println(link.getElementsByTagName("description").item(0).getFirstChild().getNodeValue());
			System.out.println();
		}
	}
}
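
What normalize() does and does not do can be checked with a short experiment. The sketch below, with a hypothetical class DomNormalizeSketch and made-up sample data, first shows two adjacent Text nodes being merged into one, then shows that the whitespace-only Text nodes produced by indentation are still present after normalize().

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical class name, used only for this sketch.
class DomNormalizeSketch {
    private static DocumentBuilder builder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException("no DOM parser available", e);
        }
    }

    // Returns {child count before normalize(), child count after normalize()}.
    static int[] mergeCounts() {
        Document doc = builder().newDocument();
        Element text = doc.createElement("text");
        text.appendChild(doc.createTextNode("JSP "));
        text.appendChild(doc.createTextNode("Insider"));   // second, adjacent Text node
        doc.appendChild(text);
        int before = text.getChildNodes().getLength();
        doc.normalize();                                   // merges the two Text nodes
        int after = text.getChildNodes().getLength();
        return new int[] { before, after };
    }

    // Counts whitespace-only Text children of the root after normalize().
    static int whitespaceNodesAfterNormalize(String xml) {
        try {
            Document doc = builder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            doc.normalize();
            int count = 0;
            Node root = doc.getDocumentElement();
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.TEXT_NODE && n.getNodeValue().trim().isEmpty()) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        int[] counts = mergeCounts();
        System.out.println(counts[0] + " -> " + counts[1]);   // prints "2 -> 1"
        // The line breaks around <day> remain as Text nodes: prints "2"
        System.out.println(whitespaceNodesAfterNormalize("<date>\n  <day>2</day>\n</date>"));
    }
}
```

Code that iterates over getChildNodes() therefore still has to skip whitespace Text nodes itself, for example by checking getNodeType() as above.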

Reposted from: https://www.cnblogs.com/dengyuanqi/p/7641200.html
