XML Basics (Part 2): Parsing XML Documents

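  In the previous installment, XML Basics (Part 1): An Introduction to XML Syntax, we covered the basic syntax of XML. This installment covers how to parse XML documents. There are two main parsing approaches, DOM and SAX; this article covers DOM only.
  DOM stands for Document Object Model. A DOM-based XML parser converts an XML document into a collection of objects (usually called the DOM tree), and the application reads and modifies the document's data by operating on that object model. Through the DOM interfaces an application can reach any part of the document at any time, which is why this mechanism is also described as random access.
  Microsoft provides an XML parser that follows the DOM interface specification: a dynamic-link library named MSXML.DLL. It is a COM (Component Object Model) object library that packages the objects needed for XML parsing. Because COM objects are reusable binary components that are independent of the programming language, they can be called from any COM-capable language (for example VB, VC, Delphi, C++ Builder, or scripting languages) to parse XML documents in your application.
  Because querying nodes with MSXML relies on XPath, this article introduces XPath first and then shows how to read and write XML documents with the raw COM interfaces and with the smart pointer wrapper classes.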

1 XPath

1.1 Concepts

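  XPath is used to navigate through the elements and attributes of an XML document. It uses path expressions to select a node or a set of nodes, and these expressions look much like the paths of a traditional file system. XPath expressions can be used in JavaScript, Java, XML Schema, PHP, Python, C and C++, and many other languages.
  Take the following XML document as an example: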

<?xml version="1.0" encoding="UTF-8"?>

<bookstore>

<book category="cooking">
  <title lang="en">Everyday Italian</title>
  <author>Giada De Laurentiis</author>
  <year>2005</year>
  <price>30.00</price>
</book>

<book category="children">
  <title lang="en">Harry Potter</title>
  <author>J K. Rowling</author>
  <year>2005</year>
  <price>29.99</price>
</book>

<book category="web">
  <title lang="en">XQuery Kick Start</title>
  <author>James McGovern</author>
  <author>Per Bothner</author>
  <author>Kurt Cagle</author>
  <author>James Linn</author>
  <author>Vaidyanathan Nagarajan</author>
  <year>2003</year>
  <price>49.99</price>
</book>

<book category="web">
  <title lang="en">Learning XML</title>
  <author>Erik T. Ray</author>
  <year>2003</year>
  <price>39.95</price>
</book>

</bookstore>

  The table below lists some XPath expressions and what they select:

XPath Expression                    Result
/bookstore/book[1]                  Selects the first book element that is a child of the bookstore element
/bookstore/book[last()]             Selects the last book element that is a child of the bookstore element
/bookstore/book[last()-1]           Selects the last but one book element that is a child of the bookstore element
/bookstore/book[position() < 3]     Selects the first two book elements that are children of the bookstore element
//title[@lang]                      Selects all the title elements that have an attribute named lang
//title[@lang='en']                 Selects all the title elements that have a "lang" attribute with a value of "en"
/bookstore/book[price>35.00]        Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00
/bookstore/book[price>35.00]/title  Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00

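1.2 XPath Node Types and Relationships

  XPath has seven kinds of nodes: element, attribute, text, namespace, processing-instruction, comment, and root (document) nodes. An XML document is treated as a tree of nodes, and the topmost element of that tree is called the root element. Consider the following XML document: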

<?xml version="1.0" encoding="UTF-8"?>

<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>

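  Examples of nodes in the XML document above:

<bookstore> (root element node)
<author>J K. Rowling</author> (element node)
lang="en" (attribute node)

  The relationships between nodes are as follows; the examples refer to the book fragment shown after the list.

  1. Parent: every element and attribute has exactly one parent. The book element is the parent of title, author, year, and price.
  2. Children: an element node can have zero, one, or more children. The title, author, year, and price elements are all children of the book element.
  3. Siblings: nodes that share the same parent. The title, author, year, and price elements are siblings.
  4. Ancestors: a node's parent, its parent's parent, and so on. In the full document above, the ancestors of title are the book element and the bookstore element.
  5. Descendants: a node's children, its children's children, and so on. In the full document above, the descendants of bookstore are the book, title, author, year, and price elements.

<book>
  <title>Harry Potter</title>
  <author>J K. Rowling</author>
  <year>2005</year>
  <price>29.99</price>
</book>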
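
  The relationships above can be sketched in portable C++. This is only an illustration: the `Node` struct and the helpers `ancestors`, `descendants`, and `buildBookstore` are hypothetical names, not part of MSXML or of any XPath library, and the sketch models element nodes only.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal element node; real DOM nodes carry far more state.
struct Node {
    std::string name;
    Node* parent = nullptr;                       // non-owning back pointer
    std::vector<std::unique_ptr<Node>> children;  // owned child elements

    explicit Node(std::string n) : name(std::move(n)) {}

    // Appends a new child element and returns a pointer to it.
    Node* addChild(const std::string& childName) {
        assert(!childName.empty());  // element names are never empty
        children.push_back(std::make_unique<Node>(childName));
        children.back()->parent = this;
        return children.back().get();
    }
};

// Names of all ancestors, nearest first (parent, grandparent, ...).
std::vector<std::string> ancestors(const Node& n) {
    std::vector<std::string> out;
    for (const Node* p = n.parent; p != nullptr; p = p->parent)
        out.push_back(p->name);
    return out;
}

// Names of all descendants in document order (depth-first, pre-order).
std::vector<std::string> descendants(const Node& n) {
    std::vector<std::string> out;
    for (const auto& child : n.children) {
        out.push_back(child->name);
        const auto below = descendants(*child);
        out.insert(out.end(), below.begin(), below.end());
    }
    return out;
}

// Builds bookstore -> book -> {title, author, year, price}.
std::unique_ptr<Node> buildBookstore() {
    auto root = std::make_unique<Node>("bookstore");
    Node* book = root->addChild("book");
    for (const char* name : {"title", "author", "year", "price"})
        book->addChild(name);
    return root;
}
```

  Calling `ancestors` on the title node of the tree returned by `buildBookstore()` walks the parent pointers upward, while `descendants` on the root walks the child lists downward; siblings are simply the entries of one node's `children` vector.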

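1.3 XPath Path Expression Syntax

  XPath uses path expressions to select nodes or node-sets in an XML document. A node is selected by following a path, that is, a sequence of steps. The examples in this section use the following XML document.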

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book>
  <title lang="en">Harry Potter</title>
  <price>29.99</price>
</book>
<book>
  <title lang="en">Learning XML</title>
  <price>39.95</price>
</book>
</bookstore>

(1) Selecting Nodes

  The most useful path expressions are listed below.

Expression  Description
nodename    Selects all nodes with the name "nodename"
/           Selects from the root node
//          Selects nodes in the document from the current node that match the selection, no matter where they are
.           Selects the current node
..          Selects the parent of the current node
@           Selects attributes

  The table below lists some path expressions and their results.

Path Expression  Result
bookstore        Selects all nodes with the name "bookstore"
/bookstore       Selects the root element bookstore (note: a path that starts with a slash ( / ) always represents an absolute path to an element)
bookstore/book   Selects all book elements that are children of bookstore
//book           Selects all book elements no matter where they are in the document
bookstore//book  Selects all book elements that are descendants of the bookstore element, no matter where they are under the bookstore element
//@lang          Selects all attributes that are named lang
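
  The difference between the child step (`bookstore/book`) and the descendant step (`bookstore//book`) can be sketched as two tree walks. The `Elem` struct, the helper names, and the extra `section` element are assumptions made for this illustration; the sample document contains no `section` element, and a real XPath engine does considerably more work.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal element node, used only for this illustration.
struct Elem {
    std::string name;
    std::vector<std::unique_ptr<Elem>> children;

    explicit Elem(std::string n) : name(std::move(n)) {}

    // Appends a child element and returns a pointer to it.
    Elem* add(const std::string& childName) {
        assert(!childName.empty());
        children.push_back(std::make_unique<Elem>(childName));
        return children.back().get();
    }
};

// context/name: matches only direct children of the context node.
std::vector<const Elem*> selectChildren(const Elem& context, const std::string& name) {
    std::vector<const Elem*> out;
    for (const auto& c : context.children)
        if (c->name == name) out.push_back(c.get());
    return out;
}

// context//name: matches descendants at any depth, in document order.
std::vector<const Elem*> selectDescendants(const Elem& context, const std::string& name) {
    std::vector<const Elem*> out;
    for (const auto& c : context.children) {
        if (c->name == name) out.push_back(c.get());
        const auto deeper = selectDescendants(*c, name);
        out.insert(out.end(), deeper.begin(), deeper.end());
    }
    return out;
}

// A bookstore with two book children plus one book nested inside a
// hypothetical <section> element.
std::unique_ptr<Elem> buildStore() {
    auto store = std::make_unique<Elem>("bookstore");
    store->add("book")->add("title");
    store->add("book")->add("title");
    store->add("section")->add("book")->add("title");
    return store;
}
```

  On the tree from `buildStore()`, `selectChildren(*store, "book")` finds the two direct children, while `selectDescendants(*store, "book")` also finds the book nested under `section`.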

(2) Predicates

  Predicates are used to find a specific node or a node that contains a specific value. A predicate is always written inside square brackets. The table below lists path expressions with predicates and what they select.

Path Expression                     Result
/bookstore/book[1]                  Selects the first book element that is a child of the bookstore element
/bookstore/book[last()]             Selects the last book element that is a child of the bookstore element
/bookstore/book[last()-1]           Selects the last but one book element that is a child of the bookstore element
/bookstore/book[position() < 3]     Selects the first two book elements that are children of the bookstore element
//title[@lang]                      Selects all the title elements that have an attribute named lang
//title[@lang='en']                 Selects all the title elements that have a "lang" attribute with a value of "en"
/bookstore/book[price>35.00]        Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00
/bookstore/book[price>35.00]/title  Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00

  Note: in IE 5, 6, 7, 8, and 9 the first node is [0], but according to the W3C it is [1]. To solve this problem in IE, set the SelectionLanguage to XPath. In JavaScript: xml.setProperty("SelectionLanguage", "XPath");
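
  The position and value predicates above can be mimicked on a plain vector to make the 1-based indexing explicit. This is a sketch, not how an XPath engine is implemented: the `Book` struct and all function names are hypothetical, and the data is the four-book document from section 1.1.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical flat model of one <book> element from the sample document.
struct Book {
    std::string title;
    double price;
};

// The four <book> children of <bookstore>, in document order.
std::vector<Book> sampleBooks() {
    return {{"Everyday Italian", 30.00},
            {"Harry Potter", 29.99},
            {"XQuery Kick Start", 49.99},
            {"Learning XML", 39.95}};
}

// book[n]: XPath positions are 1-based, so position n is vector index n - 1.
const Book& bookAt(const std::vector<Book>& books, std::size_t position) {
    assert(position >= 1 && position <= books.size());
    return books[position - 1];
}

// book[last()]: last() evaluates to the size of the context node-set.
const Book& lastBook(const std::vector<Book>& books) {
    return bookAt(books, books.size());
}

// book[position() < limit]: keeps nodes whose 1-based position is below limit.
std::vector<Book> positionLessThan(const std::vector<Book>& books, std::size_t limit) {
    std::vector<Book> out;
    for (std::size_t i = 0; i < books.size(); ++i)
        if (i + 1 < limit) out.push_back(books[i]);
    return out;
}

// book[price > threshold]/title: filters on a child value, then steps to title.
std::vector<std::string> titlesPricedAbove(const std::vector<Book>& books, double threshold) {
    std::vector<std::string> out;
    for (const Book& b : books)
        if (b.price > threshold) out.push_back(b.title);
    return out;
}
```

  With this data, `bookAt(books, 1)` corresponds to `/bookstore/book[1]`, `bookAt(books, books.size() - 1)` to `/bookstore/book[last()-1]`, and `titlesPricedAbove(books, 35.00)` to `/bookstore/book[price>35.00]/title`.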

(3) Selecting Unknown Nodes

  XPath wildcards can be used to select nodes whose names are not known in advance.

Wildcard  Description
*         Matches any element node
@*        Matches any attribute node
node()    Matches any node of any kind

  Some examples:

Path Expression  Result
/bookstore/*     Selects all the child element nodes of the bookstore element
//*              Selects all elements in the document
//title[@*]      Selects all title elements which have at least one attribute of any kind

(4) Selecting Several Paths

  The | operator in an XPath expression selects several paths at once. In XPath, | is the node-set union operator, not a bitwise OR.

Path Expression                  Result
//book/title | //book/price      Selects all the title AND price elements of all book elements
//title | //price                Selects all the title AND price elements in the document
/bookstore/book/title | //price  Selects all the title elements of the book element of the bookstore element AND all the price elements in the document
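
  Because | combines node-sets rather than bits, a node matched by both operands appears only once in the result. The sketch below models that behavior with sorted integer vectors. Identifying each node by its document-order index and returning the result in document order are assumptions of this illustration; `NodeSet` and `unionOf` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// A node-set modeled as document-order indices, sorted ascending, no duplicates.
using NodeSet = std::vector<int>;

// XPath "a | b": the union of two node-sets, kept in document order, with
// nodes that occur in both operands listed once.
NodeSet unionOf(const NodeSet& a, const NodeSet& b) {
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    NodeSet out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}
```

  Numbering the elements of the two-book document in section 1.3 as bookstore = 0, book = 1, title = 2, price = 3, book = 4, title = 5, price = 6, the expression `//book/title | //book/price` corresponds to `unionOf({2, 5}, {3, 6})`.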

  This concludes the introduction to XPath. For more detail, see https://www.w3schools.com/XML/xpath_syntax.asp

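2 Reading and Writing XML with the Raw COM Interfaces

  The following code shows how to open, save, and query an XML document through the raw COM interfaces. Note that every COM interface pointer obtained in the code must be released once it is no longer needed.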

#include <stdio.h>
#include <windows.h>
#include <objbase.h>
#include <msxml6.h>

#include <iostream>

// Macro that calls a COM method returning HRESULT value.
#define CHK_HR(stmt)        do { hr=(stmt); if (FAILED(hr)) goto CleanUp; } while(0)

// Macro to verify memory allocation.
#define CHK_ALLOC(p)        do { if (!(p)) { hr = E_OUTOFMEMORY; goto CleanUp; } } while(0)

// Macro that releases a COM object if not NULL.
#define SAFE_RELEASE(p)     do { if ((p)) { (p)->Release(); (p) = NULL; } } while(0)

// Helper function to create a VT_BSTR variant from a null terminated string. 
HRESULT VariantFromString(PCWSTR wszValue, VARIANT &Variant)
{
	HRESULT hr = S_OK;
	BSTR bstr = SysAllocString(wszValue);
	CHK_ALLOC(bstr);

	V_VT(&Variant) = VT_BSTR;
	V_BSTR(&Variant) = bstr;

CleanUp:
	return hr;
}

// Helper function to create a DOM instance. 
HRESULT CreateAndInitDOM(IXMLDOMDocument **ppDoc)
{
	HRESULT hr = CoCreateInstance(__uuidof(DOMDocument60), NULL, CLSCTX_INPROC_SERVER, IID_PPV_ARGS(ppDoc));
	if (SUCCEEDED(hr))
	{
		// these methods should not fail so don't inspect result
		(*ppDoc)->put_async(VARIANT_FALSE);
		(*ppDoc)->put_validateOnParse(VARIANT_FALSE);
		(*ppDoc)->put_resolveExternals(VARIANT_FALSE);
	}
	return hr;
}

// Load an XML document through the DOM
void loadDOMRaw()
{
	HRESULT hr = S_OK;
	IXMLDOMDocument *pXMLDom = NULL;
	IXMLDOMParseError *pXMLErr = NULL;

	BSTR bstrXML = NULL;
	BSTR bstrErr = NULL;
	VARIANT_BOOL varStatus;
	VARIANT varFileName;
	VariantInit(&varFileName);

	CHK_HR(CreateAndInitDOM(&pXMLDom));

	// XML file name to load
	CHK_HR(VariantFromString(L"stocks.xml", varFileName));
	CHK_HR(pXMLDom->load(varFileName, &varStatus));
	if (varStatus == VARIANT_TRUE)
	{
		CHK_HR(pXMLDom->get_xml(&bstrXML));
		printf("XML DOM loaded from stocks.xml:\n%S\n", bstrXML);
	}
	else
	{
		// Failed to load xml, get last parsing error
		CHK_HR(pXMLDom->get_parseError(&pXMLErr));
		CHK_HR(pXMLErr->get_reason(&bstrErr));
		printf("Failed to load DOM from stocks.xml. %S\n", bstrErr);
	}

CleanUp:
	SAFE_RELEASE(pXMLDom);
	SAFE_RELEASE(pXMLErr);
	SysFreeString(bstrXML);
	SysFreeString(bstrErr);
	VariantClear(&varFileName);
}

// Save an XML document through the DOM
void saveDOM()
{
	HRESULT hr = S_OK;
	IXMLDOMDocument *pXMLDom = NULL;
	IXMLDOMParseError *pXMLErr = NULL;
	BSTR bstrXML = NULL;
	BSTR bstrErr = NULL;
	VARIANT_BOOL varStatus;
	VARIANT varFileName;

	VariantInit(&varFileName);

	CHK_HR(CreateAndInitDOM(&pXMLDom));

	bstrXML = SysAllocString(L"<r>\n<t>top</t>\n<b>bottom</b>\n</r>");
	CHK_ALLOC(bstrXML);
	CHK_HR(pXMLDom->loadXML(bstrXML, &varStatus));

	if (varStatus == VARIANT_TRUE)
	{
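		// get_xml allocates a new BSTR, so free the input string first to avoid a leak.
		SysFreeString(bstrXML);
		bstrXML = NULL;
		CHK_HR(pXMLDom->get_xml(&bstrXML));
		printf("XML DOM loaded from app:\n%S\n", bstrXML);

		CHK_HR(VariantFromString(L"myData.xml", varFileName));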
		CHK_HR(pXMLDom->save(varFileName));
		printf("XML DOM saved to myData.xml\n");
	}
	else
	{
		// Failed to load xml, get last parsing error
		CHK_HR(pXMLDom->get_parseError(&pXMLErr));
		CHK_HR(pXMLErr->get_reason(&bstrErr));
		printf("Failed to load DOM from xml string. %S\n", bstrErr);
	}

CleanUp:
	SAFE_RELEASE(pXMLDom);
	SAFE_RELEASE(pXMLErr);
	SysFreeString(bstrXML);
	SysFreeString(bstrErr);
	VariantClear(&varFileName);
}

// Helper function to display parse error.
// It returns error code of the parse error.
HRESULT ReportParseError(IXMLDOMDocument *pDoc, const char *szDesc)
{
	HRESULT hr = S_OK;
	HRESULT hrRet = E_FAIL; // Default error code if failed to get from parse error.
	IXMLDOMParseError *pXMLErr = NULL;
	BSTR bstrReason = NULL;

	CHK_HR(pDoc->get_parseError(&pXMLErr));
	CHK_HR(pXMLErr->get_errorCode(&hrRet));
	CHK_HR(pXMLErr->get_reason(&bstrReason));
	printf("%s\n%S\n", szDesc, bstrReason);

CleanUp:
	SAFE_RELEASE(pXMLErr);
	SysFreeString(bstrReason);
	return hrRet;
}

// Query nodes of an XML document
void queryNodes()
{
	HRESULT hr = S_OK;
	IXMLDOMDocument *pXMLDom = NULL;
	IXMLDOMNodeList *pNodes = NULL;
	IXMLDOMNode *pNode = NULL;

	BSTR bstrQuery1 = NULL;
	BSTR bstrQuery2 = NULL;
	BSTR bstrNodeName = NULL;
	BSTR bstrNodeValue = NULL;
	VARIANT_BOOL varStatus;
	VARIANT varFileName;
	VariantInit(&varFileName);

	CHK_HR(CreateAndInitDOM(&pXMLDom));

	CHK_HR(VariantFromString(L"stocks.xml", varFileName));
	CHK_HR(pXMLDom->load(varFileName, &varStatus));
	if (varStatus != VARIANT_TRUE)
	{
		CHK_HR(ReportParseError(pXMLDom,(char*) "Failed to load DOM from stocks.xml."));
	}

	// Query a single node.
	//The selectSingleNode method is similar to the selectNodes method, but returns only the first matching node rather than the list of all matching nodes.
	bstrQuery1 = SysAllocString(L"//stock[1]/*");
	CHK_ALLOC(bstrQuery1);
	CHK_HR(pXMLDom->selectSingleNode(bstrQuery1, &pNode));
	if (pNode)
	{
		printf("Result from selectSingleNode:\n");
		CHK_HR(pNode->get_nodeName(&bstrNodeName));
		printf("Node, <%S>:\n", bstrNodeName);
		SysFreeString(bstrNodeName);
		bstrNodeName = NULL;	// CleanUp frees this again, so clear it

		CHK_HR(pNode->get_xml(&bstrNodeValue));
		printf("\t%S\n\n", bstrNodeValue);
		SysFreeString(bstrNodeValue);
		bstrNodeValue = NULL;	// same reason as above
		SAFE_RELEASE(pNode);
	}
	else
	{
		CHK_HR(ReportParseError(pXMLDom, (char*)"Error while calling selectSingleNode."));
	}

	// Query a node-set.
	bstrQuery2 = SysAllocString(L"//stock[1]/*");
	CHK_ALLOC(bstrQuery2);
	CHK_HR(pXMLDom->selectNodes(bstrQuery2, &pNodes));
	if (pNodes)
	{
		printf("Results from selectNodes:\n");
		//get the length of node-set
		long length;
		CHK_HR(pNodes->get_length(&length));
		for (long i = 0; i < length; i++)
		{
			CHK_HR(pNodes->get_item(i, &pNode));
			CHK_HR(pNode->get_nodeName(&bstrNodeName));
			printf("Node (%ld), <%S>:\n", i, bstrNodeName);
			SysFreeString(bstrNodeName);
			bstrNodeName = NULL;	// CleanUp frees this again, so clear it

			CHK_HR(pNode->get_xml(&bstrNodeValue));
			printf("\t%S\n", bstrNodeValue);
			SysFreeString(bstrNodeValue);
			bstrNodeValue = NULL;	// same reason as above
			SAFE_RELEASE(pNode);
		}
	}
	else
	{
		CHK_HR(ReportParseError(pXMLDom,(char *) "Error while calling selectNodes."));
	}

CleanUp:
	SAFE_RELEASE(pXMLDom);
	SAFE_RELEASE(pNodes);
	SAFE_RELEASE(pNode);
	SysFreeString(bstrQuery1);
	SysFreeString(bstrQuery2);
	SysFreeString(bstrNodeName);
	SysFreeString(bstrNodeValue);
	VariantClear(&varFileName);
}

int main()
{
	HRESULT hr = CoInitialize(NULL);
	if (SUCCEEDED(hr))
	{
		loadDOMRaw();
		saveDOM();
		queryNodes();
		CoUninitialize();
	}

	return 0;
}

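3 Reading and Writing XML with the COM Smart Pointer Wrapper Classes

3.1 Importing the Headers and Library

  After MSXML is installed, the project must be configured so that calls to the MSXML API resolve correctly at build time. In Microsoft Visual C++, the MSXML header and library information has to be brought into the project.
  The following directive does this:

#import <msxml6.dll>

  This directive makes Visual C++ generate, at compile time, the type library information contained in msxml6.dll: two header files named msxml6.tlh and msxml6.tli are created in the build output directory. These files contain the required type library information, with the interfaces wrapped as smart pointers.
  Smart pointer classes have several advantages. Besides automating object management chores such as calling AddRef or Release on interface pointers, they make the calling conventions of the API in C/C++ more consistent with those in script or Visual Basic, which helps programmers who regularly use those languages.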
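
  How a smart pointer automates the AddRef and Release calls can be sketched without the Windows SDK. Everything here is a hypothetical stand-in: `FakeComObject` only imitates COM reference counting, and `ComPtrSketch` is a deliberately simplified illustration, not the `_com_ptr_t` template that `#import` actually generates code against. In particular, adopting the caller's reference in the constructor is a simplification chosen for this sketch.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a COM object: it counts references the way
// AddRef and Release do, but it is not a real COM class.
class FakeComObject {
public:
    explicit FakeComObject(bool* destroyedFlag) : destroyed_(destroyedFlag) {
        assert(destroyedFlag != nullptr);
    }
    unsigned long AddRef() { return ++refs_; }
    unsigned long Release() {
        const unsigned long left = --refs_;
        if (left == 0) {
            *destroyed_ = true;  // lets a caller observe the destruction
            delete this;
        }
        return left;
    }
    unsigned long refCount() const { return refs_; }

private:
    ~FakeComObject() = default;  // destroyed only through Release()
    unsigned long refs_ = 1;     // the creator holds the first reference
    bool* destroyed_;
};

// Hypothetical minimal smart pointer that owns one reference to T.
template <typename T>
class ComPtrSketch {
public:
    ComPtrSketch() = default;
    // Adopts the reference the caller already owns (no AddRef here).
    explicit ComPtrSketch(T* owned) : p_(owned) {}
    // Copying shares the object, so it adds a reference.
    ComPtrSketch(const ComPtrSketch& other) : p_(other.p_) {
        if (p_) p_->AddRef();
    }
    // Copy-and-swap: the by-value parameter releases the old reference.
    ComPtrSketch& operator=(ComPtrSketch other) {
        std::swap(p_, other.p_);
        return *this;
    }
    // The destructor is the automatic counterpart of SAFE_RELEASE.
    ~ComPtrSketch() {
        if (p_) p_->Release();
    }
    T* operator->() const { return p_; }
    explicit operator bool() const { return p_ != nullptr; }

private:
    T* p_ = nullptr;
};
```

  With such a wrapper the CleanUp label and the SAFE_RELEASE calls of section 2 become unnecessary, because every exit path, including an exception, runs the destructor.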

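3.2 Differences Between the Raw Interfaces and the Smart Pointer Wrapper Classes

(1) Methods with an [out, retval] Parameter

  When a DOM method has an [out, retval] parameter, the smart pointer wrapper and the raw interface are called differently. The load method serves as an example.
  Prototype in the raw interface:

HRESULT load(
[in] VARIANT xmlSource,
[out, retval] VARIANT_BOOL *isSuccessful
);

  It is called like this:

hr = pXMLDom->load(varFileName, &vbStatus);

  Here hr, pXMLDom, varFileName, and vbStatus have the types HRESULT, IXMLDOMDocument*, VARIANT, and VARIANT_BOOL; varFileName is a VT_BSTR variant that holds "myData.xml".

  Syntax in the smart pointer wrapper class:

VARIANT_BOOL load( [in] VARIANT xmlSource );

  It is called like this, where pXMLDom is an MSXML2::IXMLDOMDocumentPtr:

vbStatus = pXMLDom->load("myData.xml");

  With the smart pointer wrapper, the return value replaces the [out, retval] parameter of the DOM method, so the syntax resembles script or VB.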
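
  The transformation from an out parameter to a return value can be sketched in portable C++. All names below (`HResultSketch`, `rawLoad`, `rawLoadBroken`, `wrappedLoad`) are hypothetical stand-ins so that the example builds without the Windows SDK; the wrappers generated by `#import` report a failing HRESULT by throwing `_com_error`, for which `std::runtime_error` stands in here.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for HRESULT, S_OK, and E_FAIL.
using HResultSketch = std::int32_t;
constexpr HResultSketch kOk = 0;
constexpr HResultSketch kFail = -2147467259;  // 0x80004005 as a signed 32-bit value

// Raw style: the status is the return value, the result is an out parameter.
HResultSketch rawLoad(const std::string& source, bool* isSuccessful) {
    if (isSuccessful == nullptr) return kFail;
    *isSuccessful = !source.empty();  // pretend that an empty name cannot be loaded
    return kOk;
}

// A raw call that always fails, used to exercise the error path.
HResultSketch rawLoadBroken(const std::string&, bool*) {
    return kFail;
}

using RawLoadFn = HResultSketch (*)(const std::string&, bool*);

// Wrapper style: the [out, retval] parameter becomes the return value and a
// failing status becomes an exception.
bool wrappedLoad(RawLoadFn raw, const std::string& source) {
    assert(raw != nullptr);
    bool isSuccessful = false;
    const HResultSketch hr = raw(source, &isSuccessful);
    if (hr < 0) throw std::runtime_error("raw call returned a failing status");
    return isSuccessful;
}
```

  The caller of `wrappedLoad` no longer checks a status code after every call; it either receives the result or handles one exception, which is why the smart pointer listings in this article wrap their work in try/catch.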

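(2) Accessing DOM Properties

  The raw interfaces and the smart pointer classes also access DOM properties differently. The async property serves as an example.
  In the raw interface, async is exposed through two accessor methods:

HRESULT get_async( [out, retval] VARIANT_BOOL *isAsync );
HRESULT put_async( [in] VARIANT_BOOL isAsync );

  Example call:

hr = pXMLDom->get_async(&vbAsync);
if (vbAsync == VARIANT_TRUE)
{
	hr = pXMLDom->put_async(VARIANT_FALSE);
}

  Here hr, pXMLDom, and vbAsync have the types HRESULT, IXMLDOMDocument*, and VARIANT_BOOL.
  In the smart pointer wrapper class, the property is used as if it were a data member of the class:

VARIANT_BOOL async

  Example call:

if (pXMLDom->async == VARIANT_TRUE)
{
	pXMLDom->async = VARIANT_FALSE;
}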

3.3 Code Example

  The code below shows how to load an XML document, save an XML document, query a node, and add a node. Compared with the raw DOM interfaces, the smart pointers release the COM interfaces for us, so the code is more concise.

// XMLSmartPointerDemo.cpp : This file contains the "main" function. Program execution begins and ends there.
//

#include <stdio.h>
#include <tchar.h>
#import <msxml6.dll> // generates and includes the MSXML wrapper headers

// This function does two things:
// 1. Creates an XML DOM object (pXMLDom) and sets it to synchronous mode.
// 2. Calls the load method on pXMLDom, specifying the path to stocks.xml.
void loadDOMsmart()
{
	MSXML2::IXMLDOMDocumentPtr pXMLDom;
	HRESULT hr = pXMLDom.CreateInstance(__uuidof(MSXML2::DOMDocument60), NULL, CLSCTX_INPROC_SERVER);
	if (FAILED(hr))
	{
		printf("Failed to instantiate an XML DOM.\n");
		return;
	}

	try
	{
		pXMLDom->async = VARIANT_FALSE;
		pXMLDom->validateOnParse = VARIANT_FALSE;
		pXMLDom->resolveExternals = VARIANT_FALSE;

		if (pXMLDom->load("stocks.xml") == VARIANT_TRUE)
		{
			printf("XML DOM loaded from stocks.xml:\n%s\n", (LPCSTR)pXMLDom->xml);
		}
		else
		{
			// Failed to load xml
			printf("Failed to load DOM from stocks.xml. %s\n",
				(LPCSTR)pXMLDom->parseError->Getreason());
		}
	}
	catch (const _com_error &errorObject)
	{
		printf("Exception thrown, HRESULT: 0x%08lx\n", (unsigned long)errorObject.Error());
	}
}


// Macro that calls a COM method returning HRESULT value.
#define CHK_HR(stmt)        do { hr=(stmt); if (FAILED(hr)) goto CleanUp; } while(0)

// Save an XML document
void saveDOMsmart()
{
	MSXML2::IXMLDOMDocumentPtr pXMLDom = NULL;
	HRESULT hr = pXMLDom.CreateInstance(__uuidof(MSXML2::DOMDocument60), NULL, CLSCTX_INPROC_SERVER);

	if (FAILED(hr))
	{
		printf("Failed to instantiate an XML DOM.\n");
		return;
	}

	try
	{
		pXMLDom->async = VARIANT_FALSE;
		pXMLDom->validateOnParse = VARIANT_FALSE;
		pXMLDom->resolveExternals = VARIANT_FALSE;

		if (pXMLDom->loadXML(L"<r>\n<t>top</t>\n<b>bottom</b>\n</r>") == VARIANT_TRUE)
		{
			printf("XML DOM loaded from app:\n%s\n", (LPCSTR)pXMLDom->xml);

			CHK_HR(pXMLDom->save(L"myData.xml"));
			printf("XML DOM saved to myData.xml.\n");
		}
		else
		{
			printf("Failed to load DOM from xml string. %s\n", (LPCSTR)pXMLDom->parseError->Getreason());
		}
	}
	catch (const _com_error &errorObject)
	{
		printf("Exception thrown, HRESULT: 0x%08lx\n", (unsigned long)errorObject.Error());
	}

CleanUp:
	return;
}

// Query nodes
void queryNodesSmart()
{
	MSXML2::IXMLDOMDocumentPtr pXMLDom;
	HRESULT hr = pXMLDom.CreateInstance(__uuidof(MSXML2::DOMDocument60), NULL, CLSCTX_INPROC_SERVER);
	if (FAILED(hr))
	{
		printf("Failed to instantiate an XML DOM.\n");
		return;
	}

	try
	{
		pXMLDom->async = VARIANT_FALSE;
		pXMLDom->validateOnParse = VARIANT_FALSE;
		pXMLDom->resolveExternals = VARIANT_FALSE;

		if (pXMLDom->load(L"stocks.xml") != VARIANT_TRUE)
		{
			// Report the reason before bailing out; there is nothing to query.
			printf("Failed to load DOM from stocks.xml.\n%s\n",
				(LPCSTR)pXMLDom->parseError->Getreason());
			return;
		}

		//This expression specifies all the child elements of the first <stock> element in the XML document.
		//In MSXML, the selectSingleNode method returns the first element of the resultant node-set, 
		//and the selectNodes method returns all the elements in the node-set.
		MSXML2::IXMLDOMNodePtr pNode = pXMLDom->selectSingleNode(L"//stock[1]/*");
		if (pNode)
		{
			printf("Result from selectSingleNode:\nNode, <%s>:\n\t%s\n\n",
				(LPCSTR)pNode->nodeName, (LPCSTR)pNode->xml);
		}
		else
		{
			printf("No node is fetched.\n");
		}

		// Query a node-set.
		MSXML2::IXMLDOMNodeListPtr pnl = pXMLDom->selectNodes(L"//stock[1]/*");
		if (pnl)
		{
			printf("Results from selectNodes:\n");
			for (long i = 0; i < pnl->length; i++)
			{
				pNode = pnl->item[i];
				printf("Node (%ld), <%s>:\n\t%s\n",
					i, (LPCSTR)pNode->nodeName, (LPCSTR)pNode->xml);
			}
		}
		else
		{
			printf("No node set is fetched.\n");
		}
	}
	catch (const _com_error &errorObject)
	{
		printf("Exception thrown, HRESULT: 0x%08lx\n", (unsigned long)errorObject.Error());
	}

CleanUp:
	return;
}

// Add a node
void addNodeSmart()
{
	MSXML2::IXMLDOMDocumentPtr pXMLDom;
	HRESULT hr = pXMLDom.CreateInstance(__uuidof(MSXML2::DOMDocument60), NULL, CLSCTX_INPROC_SERVER);
	if (FAILED(hr))
	{
		printf("Failed to instantiate an XML DOM.\n");
		return;
	}

	try
	{
		pXMLDom->async = VARIANT_FALSE;
		pXMLDom->validateOnParse = VARIANT_FALSE;
		pXMLDom->resolveExternals = VARIANT_FALSE;

		if (pXMLDom->load(L"stocks.xml") != VARIANT_TRUE)
		{
			// Report the reason before bailing out; there is nothing to modify.
			printf("Failed to load DOM from stocks.xml.\n%s\n",
				(LPCSTR)pXMLDom->parseError->Getreason());
			return;
		}

		// Get the root element
		MSXML2::IXMLDOMElementPtr pNode = pXMLDom->GetdocumentElement();

		// Alternative, kept for reference: select the first child of the first stock element
		//MSXML2::IXMLDOMNodePtr pNode = pXMLDom->selectSingleNode(L"//stock[1]/*");	
		if (pNode)
		{
			MSXML2::IXMLDOMElementPtr pNewElement = NULL;
			pNewElement = pXMLDom->createElement(L"NewNode");
			pNewElement->text = L"First child node added by the program";
			pNode->appendChild(pNewElement);
			pXMLDom->save(L"stocksNew.xml");	// the change reaches disk only after save
			printf("XML DOM saved to stocksNew.xml:\n%s\n", (LPCSTR)pXMLDom->xml);
		}
		else
		{
			printf("No node is fetched.\n");
		}
	}
	catch (const _com_error &errorObject)
	{
		printf("Exception thrown, HRESULT: 0x%08lx\n", (unsigned long)errorObject.Error());
	}

CleanUp:
	return;
}

int _tmain(int argc, _TCHAR* argv[])
{
	HRESULT hr = CoInitialize(NULL);
	if (SUCCEEDED(hr))
	{
		loadDOMsmart();
		saveDOMsmart();
		queryNodesSmart();
		addNodeSmart();
		CoUninitialize();
	}

	return 0;
}

References:
https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms756005(v=vs.85)
https://blog.csdn.net/thanklife/article/details/119726472
