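XML is a metalanguage defined by the World Wide Web Consortium (W3C). It has become a general-purpose data interchange format. Because it is independent of platform, programming language, and operating system, it greatly simplifies data integration and exchange. XML is parsed in the same way in every language; only the syntax of the implementation differs.
XML itself is only a format for encoding data as plain text. To use the data encoded in an XML file, that data must first be parsed out of the text, which requires a parser that recognizes the information in an XML document, interprets the document, and extracts its data. Depending on how the data needs to be extracted, several parsing approaches exist, each with its own strengths, weaknesses, and suitable scenarios. Choosing an appropriate XML parsing technique can improve the overall performance of an application.
All XML processing starts with parsing. Whether you use XSLT or Java, the first step is to read the XML file, decode its structure, and retrieve information. This is parsing: converting the unstructured character sequence that represents an XML document into structured components that conform to XML syntax.
There are two basic approaches to parsing XML: SAX (Simple API for XML) and DOM (Document Object Model).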
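SAX parses a document as a stream of events, and its advantages closely resemble those of streaming media. Processing can begin immediately instead of waiting until all the data has been read. Because the application only examines data as it is read, the data does not need to be held in memory, which is a major advantage for large documents. The application does not even have to parse the whole document; it can stop as soon as some condition is met. In general, SAX is also considerably faster than its alternative, DOM. A SAX parser uses an event-based model: while parsing an XML document it fires a series of events, and when it finds a given tag it invokes a callback method to report that the specified tag has been found. SAX usually needs little memory because the developer decides which tags to process, and this shows most clearly when only part of the data in a document is needed. On the other hand, coding against a SAX parser is harder, and it is difficult to access several different parts of the same document at once.
Advantages:
(1) processing can start immediately, without waiting for all the data to be read;
(2) data is examined only as it is read and does not need to be kept in memory;
(3) parsing can stop as soon as some condition is met, so the whole document need not be parsed;
(4) efficiency and performance are high, and documents larger than system memory can be parsed.
Disadvantages:
(1) the application itself is responsible for the tag-handling logic (for example, maintaining parent/child relationships), so the more complex the document, the more complex the program;
(2) navigation is one-way: the parser cannot locate a position in the document hierarchy, it is hard to access different parts of the same document at once, and XPath is not supported.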
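The event-driven model described above can be sketched in plain C++ without any XML library. This is only an illustration of the callback idea, not how Xerces-C++ or any real SAX parser is implemented: `SaxCallbacks`, `scanEvents`, and `firstText` are hypothetical names, and the scanner assumes a simplified XML subset with plain `<tag>`, `</tag>`, and text only (no attributes, comments, or entities).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical callback set, loosely modeled on SAX document handlers.
// Each callback returns false to ask the scanner to stop early.
struct SaxCallbacks {
    std::function<bool(const std::string&)> startElement;
    std::function<bool(const std::string&)> endElement;
    std::function<bool(const std::string&)> characters;
};

// Scans the simplified subset and fires one callback per tag or text run.
// Nothing is stored: each piece is handed to a callback and then forgotten.
// Returns true if the whole input was scanned, false if a callback stopped
// the scan or a '<' had no matching '>'.
bool scanEvents(const std::string& xml, const SaxCallbacks& cb) {
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            const std::size_t close = xml.find('>', pos);
            if (close == std::string::npos) return false;  // malformed input
            const bool isEnd = (close > pos + 1 && xml[pos + 1] == '/');
            const std::size_t nameStart = pos + (isEnd ? 2 : 1);
            const std::string name = xml.substr(nameStart, close - nameStart);
            const auto& fn = isEnd ? cb.endElement : cb.startElement;
            if (fn && !fn(name)) return false;  // handler asked to stop
            pos = close + 1;
        } else {
            std::size_t next = xml.find('<', pos);
            if (next == std::string::npos) next = xml.size();
            if (cb.characters && !cb.characters(xml.substr(pos, next - pos))) {
                return false;  // handler asked to stop
            }
            pos = next;
        }
    }
    return true;
}

// Example use: collect the text of the first element named `tag` and stop
// scanning as soon as that element closes.
std::string firstText(const std::string& xml, const std::string& tag) {
    assert(!tag.empty());
    std::string result;
    bool inside = false;
    SaxCallbacks cb;
    cb.startElement = [&](const std::string& name) -> bool {
        if (name == tag) inside = true;
        return true;
    };
    cb.characters = [&](const std::string& text) -> bool {
        if (inside) result += text;
        return true;
    };
    cb.endElement = [&](const std::string& name) -> bool {
        return !(inside && name == tag);  // false stops the scan
    };
    scanEvents(xml, cb);
    return result;
}
```

Note that `firstText` has to track the `inside` flag itself. That is the bookkeeping burden listed among the disadvantages, and it grows with the complexity of the document.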
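DOM is the official W3C standard for representing XML documents in a platform- and language-independent way. A DOM is a collection of nodes, or pieces of information, organized in a hierarchy that lets developers search the tree for specific information. Analyzing this structure usually requires loading the entire document and building the hierarchy before any work can be done. Because it is based on a hierarchy of information, DOM is described as tree-based or object-based.
Advantages:
(1) the application can modify both the data and the structure;
(2) access is bidirectional: you can move up and down the tree at any time, and read or manipulate any part of the data.
Disadvantage: the whole XML document usually has to be loaded to build the hierarchy, which consumes a lot of resources.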
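The tree-based model can be illustrated with a small in-memory structure. This is a minimal sketch under stated assumptions, not the W3C DOM API or the Xerces-C++ classes: `Node`, `countElements`, and `buildCompanyTree` are hypothetical names, and only element and text nodes are modeled. The tree mirrors the `<company>` document that `CreateDOMDocument` builds in the test program.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical in-memory node; a real DOM distinguishes many more node types.
struct Node {
    std::string name;                             // element name; empty for text nodes
    std::string text;                             // content; used by text nodes only
    Node* parent = nullptr;                       // allows upward navigation
    std::vector<std::unique_ptr<Node>> children;  // allows downward navigation

    bool isElement() const { return !name.empty(); }

    // Appends a child element and returns a non-owning pointer to it.
    Node* appendElement(const std::string& childName) {
        assert(!childName.empty());
        children.push_back(std::unique_ptr<Node>(new Node()));
        Node* child = children.back().get();
        child->name = childName;
        child->parent = this;
        return child;
    }

    // Appends a text node holding `value` and returns a non-owning pointer.
    Node* appendText(const std::string& value) {
        children.push_back(std::unique_ptr<Node>(new Node()));
        Node* child = children.back().get();
        child->text = value;
        child->parent = this;
        return child;
    }
};

// Counts the element nodes in the subtree rooted at `node`, including `node`.
std::size_t countElements(const Node& node) {
    std::size_t total = node.isElement() ? 1 : 0;
    for (const auto& child : node.children) {
        total += countElements(*child);
    }
    return total;
}

// Builds the whole tree in memory before anything can be queried, which is
// the cost that the DOM approach pays for random access.
std::unique_ptr<Node> buildCompanyTree() {
    std::unique_ptr<Node> root(new Node());
    root->name = "company";
    root->appendElement("product")->appendText("Xerces-C");
    root->appendElement("category")->appendText("XML Parsing Tools");
    root->appendElement("developedBy")->appendText("Apache Software Foundation");
    return root;
}
```

Once the tree exists, any node can be reached in either direction, for example `root->children[1]->parent` leads back to the root. This is the bidirectional access listed among the advantages.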
XML parsing libraries for C/C++ include:
(1) Expat: http://www.libexpat.org/ ;
(2) die-xml: https://code.google.com/p/die-xml/ ;
(3) Xerces-C++: http://xerces.apache.org/xerces-c/index.html ;
(4) TinyXML: http://www.grinninglizard.com/tinyxml/ .
Building and using Xerces-C++:
1. Download the xerces-c-3.1.1.zip source archive from http://xerces.apache.org/xerces-c/download.cgi#verify and extract it;
2. Open xerces-all.sln in the xerces-c-3.1.1\projects\Win32\VC10\xerces-all directory with VS2010;
3. Choose the desired entries under Solution Configurations and Solution Platforms, then right-click Solution 'xerces-all' and choose Rebuild Solution. The corresponding dynamic and static libraries are generated under /Build/Win32/VC10. This walkthrough tests with Static Debug/xerces-c_static_3D.lib and Static Release/xerces-c_static_3.lib;
4. Add a new project named TestXerces to the 'xerces-all' solution. Select it and, for both Debug and Release, set the following project properties: (1) Configuration Properties --> Character Set: Use Unicode Character Set; (2) C/C++ --> General --> Additional Include Directories: ../../../../../src. Then add the following definitions under C/C++ --> Preprocessor:
- _CRT_SECURE_NO_DEPRECATE
- _WINDOWS
- XERCES_STATIC_LIBRARY
- XERCES_BUILDING_LIBRARY
- XERCES_USE_TRANSCODER_WINDOWS
- XERCES_USE_MSGLOADER_INMEMORY
- XERCES_USE_NETACCESSOR_WINSOCK
- XERCES_USE_FILEMGR_WINDOWS
- XERCES_USE_MUTEXMGR_WINDOWS
- XERCES_PATH_DELIMITER_BACKSLASH
- HAVE_STRICMP
- HAVE_STRNICMP
- HAVE_LIMITS_H
- HAVE_SYS_TIMEB_H
- HAVE_FTIME
- HAVE_WCSUPR
- HAVE_WCSLWR
- HAVE_WCSICMP
- HAVE_WCSNICMP
stdafx.h:
- #pragma once
- #include "targetver.h"
- #include <stdio.h>
- #include "xercesc/util/PlatformUtils.hpp"
- #include "xercesc/util/XMLString.hpp"
- #include "xercesc/dom/DOM.hpp"
- #include "xercesc/util/OutOfMemoryException.hpp"
- #include "xercesc/util/TransService.hpp"
- #include "xercesc/parsers/SAXParser.hpp"
- #include "xercesc/sax/HandlerBase.hpp"
- #include "xercesc/framework/XMLFormatter.hpp"
stdafx.cpp:
- #include "stdafx.h"
- // TODO: reference any additional headers you need in STDAFX.H
- // and not in this file
- #ifdef _DEBUG
- #pragma comment(lib, "../../../../../Build/Win32/VC10/Static Debug/xerces-c_static_3D.lib")
- #else
- #pragma comment(lib, "../../../../../Build/Win32/VC10/Static Release/xerces-c_static_3.lib")
- #endif
TestXerces.cpp:
- #include "stdafx.h"
- #include <iostream>
- using namespace std;
- XERCES_CPP_NAMESPACE_USE
- class XStr
- {
- public :
- // -----------------------------------------------------------------------
- // Constructors and Destructor
- // -----------------------------------------------------------------------
- XStr(const char* const toTranscode)
- {
- // Transcode the local code page string to XMLCh; released in the destructor
- fUnicodeForm = XMLString::transcode(toTranscode);
- }
- ~XStr()
- {
- XMLString::release(&fUnicodeForm);
- }
- // -----------------------------------------------------------------------
- // Getter methods
- // -----------------------------------------------------------------------
- const XMLCh* unicodeForm() const
- {
- return fUnicodeForm;
- }
- private :
- // -----------------------------------------------------------------------
- // Private data members
- //
- // fUnicodeForm
- // This is the Unicode XMLCh format of the string.
- // -----------------------------------------------------------------------
- XMLCh* fUnicodeForm;
- };
- #define X(str) XStr(str).unicodeForm()
- /*
- * This sample illustrates how you can create a DOM tree in memory.
- * It then prints the count of elements in the tree.
- */
- int CreateDOMDocument()
- {
- // Initialize the Xerces-C++ runtime.
- try {
- XMLPlatformUtils::Initialize();
- } catch(const XMLException& toCatch) {
- char *pMsg = XMLString::transcode(toCatch.getMessage());
- XERCES_STD_QUALIFIER cerr << "Error during Xerces-c Initialization.\n"
- << " Exception message:"
- << pMsg;
- XMLString::release(&pMsg);
- return 1;
- }
- int errorCode = 0;
- {
- // Nest entire test in an inner block.
- // The tree we create below is the same that the XercesDOMParser would
- // have created, except that no whitespace text nodes would be created.
- // <company>
- // <product>Xerces-C</product>
- // <category idea='great'>XML Parsing Tools</category>
- // <developedBy>Apache Software Foundation</developedBy>
- // </company>
- DOMImplementation* impl = DOMImplementationRegistry::getDOMImplementation(X("Core"));
- if (impl != NULL) {
- try {
- DOMDocument* doc = impl->createDocument(
- 0, // root element namespace URI.
- X("company"), // root element name
- 0); // document type object (DTD).
- DOMElement* rootElem = doc->getDocumentElement();
- DOMElement* prodElem = doc->createElement(X("product"));
- rootElem->appendChild(prodElem);
- DOMText* prodDataVal = doc->createTextNode(X("Xerces-C"));
- prodElem->appendChild(prodDataVal);
- DOMElement* catElem = doc->createElement(X("category"));
- rootElem->appendChild(catElem);
- catElem->setAttribute(X("idea"), X("great"));
- DOMText* catDataVal = doc->createTextNode(X("XML Parsing Tools"));
- catElem->appendChild(catDataVal);
- DOMElement* devByElem = doc->createElement(X("developedBy"));
- rootElem->appendChild(devByElem);
- DOMText* devByDataVal = doc->createTextNode(X("Apache Software Foundation"));
- devByElem->appendChild(devByDataVal);
- //
- // Now count the number of elements in the above DOM tree.
- //
- const XMLSize_t elementCount = doc->getElementsByTagName(X("*"))->getLength();
- XERCES_STD_QUALIFIER cout << "The tree just created contains: " << elementCount
- << " elements." << XERCES_STD_QUALIFIER endl;
- doc->release();
- } catch (const OutOfMemoryException&) {
- XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
- errorCode = 5;
- } catch (const DOMException& e) {
- XERCES_STD_QUALIFIER cerr << "DOMException code is: " << e.code << XERCES_STD_QUALIFIER endl;
- errorCode = 2;
- } catch (...) {
- XERCES_STD_QUALIFIER cerr << "An error occurred creating the document" << XERCES_STD_QUALIFIER endl;
- errorCode = 3;
- }
- } else { // impl == NULL
- XERCES_STD_QUALIFIER cerr << "Requested implementation is not supported" << XERCES_STD_QUALIFIER endl;
- errorCode = 4;
- }
- }
- XMLPlatformUtils::Terminate();
- return errorCode;
- }
- // ---------------------------------------------------------------------------
- // This is a simple class that lets us do easy (though not terribly efficient)
- // transcoding of XMLCh data to local code page for display.
- // ---------------------------------------------------------------------------
- class StrX
- {
- public :
- // -----------------------------------------------------------------------
- // Constructors and Destructor
- // -----------------------------------------------------------------------
- StrX(const XMLCh* const toTranscode)
- {
- // Transcode the XMLCh string to the local code page; released in the destructor
- fLocalForm = XMLString::transcode(toTranscode);
- }
- ~StrX()
- {
- XMLString::release(&fLocalForm);
- }
- // -----------------------------------------------------------------------
- // Getter methods
- // -----------------------------------------------------------------------
- const char* localForm() const
- {
- return fLocalForm;
- }
- private :
- // -----------------------------------------------------------------------
- // Private data members
- //
- // fLocalForm
- // This is the local code page form of the string.
- // -----------------------------------------------------------------------
- char* fLocalForm;
- };
- inline XERCES_STD_QUALIFIER ostream& operator<<(XERCES_STD_QUALIFIER ostream& target, const StrX& toDump)
- {
- target << toDump.localForm();
- return target;
- }
- int SAXPrint()
- {
- // ---------------------------------------------------------------------------
- // Local data
- //
- // doNamespaces
- // Indicates whether namespace processing should be enabled or not.
- // Defaults to disabled.
- //
- // doSchema
- // Indicates whether schema processing should be enabled or not.
- // Defaults to disabled.
- //
- // schemaFullChecking
- // Indicates whether full schema constraint checking should be enabled or not.
- // Defaults to disabled.
- //
- // encodingName
- // The encoding to output in. Defaults to LATIN1.
- //
- // xmlFile
- // The path to the file to parse. Hard-coded below.
- //
- // valScheme
- // Indicates what validation scheme to use. Defaults to 'auto'.
- // ---------------------------------------------------------------------------
- static bool doNamespaces = false;
- static bool doSchema = false;
- static bool schemaFullChecking = false;
- static const char* encodingName = "LATIN1";
- static XMLFormatter::UnRepFlags unRepFlags = XMLFormatter::UnRep_CharRef;
- static const char* xmlFile = 0; // const: it is assigned a string literal below
- static SAXParser::ValSchemes valScheme = SAXParser::Val_Auto;
- // Initialize the Xerces-C++ runtime
- try {
- XMLPlatformUtils::Initialize();
- } catch (const XMLException& toCatch) {
- XERCES_STD_QUALIFIER cerr << "Error during initialization! :\n"
- << StrX(toCatch.getMessage()) << XERCES_STD_QUALIFIER endl;
- return 1;
- }
- xmlFile = "../../../../../samples/data/personal-schema.xml";
- int errorCount = 0;
- //
- // Create a SAX parser object and configure validation, namespace, and
- // schema processing from the settings above.
- //
- SAXParser* parser = new SAXParser;
- parser->setValidationScheme(valScheme);
- parser->setDoNamespaces(doNamespaces);
- parser->setDoSchema(doSchema);
- parser->setHandleMultipleImports (true);
- parser->setValidationSchemaFullChecking(schemaFullChecking);
- //
- // Parse the file and catch any exceptions that propagate out. The handler
- // installation from the original sample is commented out below.
- //
- int errorCode = 0;
- try {
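- // SAXPrintHandlers belongs to the Xerces-C++ SAXPrint sample and is not part
- // of this project, so no document or error handler is installed here.
- //SAXPrintHandlers handler(encodingName, unRepFlags);
- //parser->setDocumentHandler(&handler);
- //parser->setErrorHandler(&handler);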
- parser->parse(xmlFile);
- errorCount = parser->getErrorCount();
- } catch (const OutOfMemoryException&) {
- XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
- errorCode = 5;
- } catch (const XMLException& toCatch) {
- XERCES_STD_QUALIFIER cerr << "\nAn error occurred\n Error: "
- << StrX(toCatch.getMessage())
- << "\n" << XERCES_STD_QUALIFIER endl;
- errorCode = 4;
- }
- if(errorCode) {
- // Delete the parser before terminating so it is not leaked on this path.
- delete parser;
- XMLPlatformUtils::Terminate();
- return errorCode;
- }
- //
- // Delete the parser itself. Must be done prior to calling Terminate, below.
- //
- delete parser;
- // And call the termination method
- XMLPlatformUtils::Terminate();
- // Report any parse errors counted by the parser as a failure.
- return (errorCount > 0) ? 4 : 0;
- }
- int main(int argc, char* argv[])
- {
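- // Propagate failures instead of printing "ok!" unconditionally.
- const int domResult = CreateDOMDocument();
- const int saxResult = SAXPrint();
- if (domResult != 0 || saxResult != 0) {
- cerr << "failed: DOM=" << domResult << ", SAX=" << saxResult << endl;
- return 1;
- }
- cout<<"ok!"<<endl;
- return 0;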
- }