Make the most of Xerces-C++, Part 1


This two-part article offers an introduction to the Xerces-C++ XML library. Part 1 explains how to link the library into applications written in Linux and Windows. Ample code demonstrates parsing with the SAX API, and a sample application shows you how to create a bar graph in ASCII art. In Part 2, I'll demonstrate how to load, manipulate, or synthesize a DOM document, and you'll see how to create the same bar graph using Scalable Vector Graphics (SVG). C++ programmers who read these articles should be able to easily add XML parsing and processing capabilities to their applications.

Xerces-C++ is a very robust XML parser that offers validation, plus SAX and DOM APIs. XML validation is well supported for a Document Type Definition (DTD), and essentially complete open-standards support for W3C XML Schema was added in December 2001.

Xerces-C++: a capsule bio

Xerces-C++ originated as the XML4C project at IBM. XML4C was a companion project to XML4J, which likewise was the origin of Xerces-J -- the Java implementation. IBM released the source for both projects to the Apache Software Foundation, where they were renamed Xerces-C++ and Xerces-J, respectively. These two are core projects of the Apache XML group. (If you see "Xerces-C" instead of "Xerces-C++", it's the same thing; the project was written in C++ from the start.)

The XML4C project continues at IBM, based on Xerces-C++. XML4C's distinguishing merit relative to Xerces-C++ is better out-of-the-box support for a huge number of international character encodings in the version that I explored (see Resources).

Validation

The two principal means of specifying the structure of an XML document are the DTD and W3C XML Schema, with DTD being the much older of the two. XML Schema serves the same purpose as a DTD but is itself written in XML, and it adds data typing and namespace awareness. Xerces-C++ offers solid out-of-the-box validation for ensuring that an XML document conforms to a DTD.

Licensing

Xerces-C++ is made available under the terms of the Apache Software License (see Resources), which happens to be one of the more readable open-source licenses around. It compares very well to the BSD license. Essentially, you can use Xerces-C++ in your (or your company's) software royalty free at the mere expense of disclosing to your customers and users that your software includes Apache code, and including the proper copyright notice. Check the Web page for the exact text of the license.





SAX: the event API model

SAX, as you may know, is an event-oriented programming API for parsing XML documents. A parsing engine consumes XML data sequentially and makes callbacks into the application as it discovers the structure of the incoming XML data. These callbacks are referred to as event handlers. SAX is actually two APIs: SAX 1.0 is the original, and SAX 2.0 is the current revised specification. The two are similar, but different enough that most applications based on SAX 1.0 break when they are moved to the newer specification.

The SAX API specification was moved to SourceForge as a project of its own (see Resources). The SAX examples I give later in this article make use of SAX 2.0.
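
To make the callback idea concrete, here is a minimal sketch of an event-driven parser in plain C++. It is not the Xerces-C++ SAX2 API: the EventHandler interface, the parseToy function, and the RecordingHandler class are hypothetical names invented for illustration, and the toy parser understands only bare start tags, end tags, empty-element tags, and text.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical handler interface: the parser calls these as it finds structure.
struct EventHandler {
    virtual ~EventHandler() = default;
    virtual void startElement(const std::string& name) = 0;
    virtual void endElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
};

// Toy push parser (an assumption for illustration, not a real XML parser).
// It handles only <tag>, </tag>, <tag/> and text: no attributes, comments,
// entities, or well-formedness checks.
void parseToy(const std::string& xml, EventHandler& handler) {
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            std::size_t close = xml.find('>', pos);
            if (close == std::string::npos) return;  // unterminated tag: stop
            std::string tag = xml.substr(pos + 1, close - pos - 1);
            if (!tag.empty() && tag.front() == '/') {
                handler.endElement(tag.substr(1));   // </tag>
            } else if (!tag.empty() && tag.back() == '/') {
                tag.pop_back();                      // <tag/> is start plus end
                handler.startElement(tag);
                handler.endElement(tag);
            } else {
                handler.startElement(tag);           // <tag>
            }
            pos = close + 1;
        } else {
            std::size_t next = xml.find('<', pos);
            if (next == std::string::npos) next = xml.size();
            handler.characters(xml.substr(pos, next - pos));
            pos = next;
        }
    }
}

// Example handler that records the events it receives, in order.
struct RecordingHandler : EventHandler {
    std::vector<std::string> events;
    void startElement(const std::string& name) override { events.push_back("start:" + name); }
    void endElement(const std::string& name) override { events.push_back("end:" + name); }
    void characters(const std::string& text) override { events.push_back("text:" + text); }
};
```

Feeding the string <a><b/>hi</a> to parseToy with a RecordingHandler yields five events in document order: start:a, start:b, end:b, text:hi, end:a. A real SAX2 handler receives richer arguments (namespace URI, local name, qualified name, attributes), but the control flow is the same: the parser drives, and the application reacts.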





DOM: the Document Object Model

Unlike SAX, the DOM API permits editing and saving an XML document back to a file or stream. It also permits programmatically constructing a new XML document from scratch. The reason for this is that DOM provides an in-memory model for the document. You can traverse the document tree, prune nodes, or graft on new ones.
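
The traverse, prune, and graft operations can be pictured with a small in-memory tree. The sketch below is plain C++ and not the Xerces-C++ DOM API; Node, graft, prune, and serialize are hypothetical names chosen for illustration, and the tree holds element names only.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element-only tree node; each node owns its children.
struct Node {
    std::string name;
    std::vector<std::unique_ptr<Node>> children;
    explicit Node(std::string n) : name(std::move(n)) {}
};

// Graft: append a new child element and return a pointer to it.
Node* graft(Node& parent, const std::string& name) {
    parent.children.push_back(std::make_unique<Node>(name));
    return parent.children.back().get();
}

// Prune: remove every direct child with the given name (and its subtree).
// Returns the number of children removed.
std::size_t prune(Node& parent, const std::string& name) {
    auto& kids = parent.children;
    const std::size_t before = kids.size();
    kids.erase(std::remove_if(kids.begin(), kids.end(),
                              [&name](const std::unique_ptr<Node>& child) {
                                  return child->name == name;
                              }),
               kids.end());
    return before - kids.size();
}

// Traverse: walk the tree depth-first and write it back out as markup.
std::string serialize(const Node& node) {
    if (node.children.empty()) return "<" + node.name + "/>";
    std::string out = "<" + node.name + ">";
    for (const auto& child : node.children) out += serialize(*child);
    return out + "</" + node.name + ">";
}
```

Because the whole document lives in memory, edits are ordinary container operations followed by a fresh serialization. That is the trade-off against SAX: more memory, but random access and the ability to write the document back out.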

The tech wrecks

DOM is a family of W3C technical recommendations affectionately called tech wrecks. DOM has three levels, with Levels 1 and 2 at full technical recommendation status and Level 3 at working draft status.

The DOM Level 1 Core defines most of what is needed for basic XML functionality: the ability to construct a representation of an XML document. The DOMString type is explicitly specified to consist of wide UTF-16 characters. Level 1 goes on to define the interfaces for programmatically interacting with the various pieces of a DOM tree. Serialization of XML is intentionally omitted from Level 1. Just beyond the Level 1 core is the DOM Level 1 HTML definition. This area attempts to resolve DOM Level 1 core with the earlier Dynamic HTML object model (loosely referred to as Level 0).

DOM Level 2 adds namespaces, events, and iterators, plus view and stylesheet support. You need DOM Level 2 for some applications: For instance, assigning an XML Schema to a namespace is essential for applications like RDF, where XML tags come from different schemas and the chance for a name collision is high. Level 2 adds the createDocumentType and createDocument methods to the DOMImplementation interface. One of the examples will show why this is important. Just when you thought you were safe from the callbacks and event handlers found in SAX, here they are again in the Event interface. Unlike the SAX events, which are for parsing, DOM events can reflect user interactions with a document as well as changes to a live document. DOM events that reflect the change in the structure of a document are called mutation events. TreeWalkers and NodeIterators enhance DOM tree traversal. Programs can inspect style information through the StyleSheet interface. Finally, view support allows an XML application to examine a document in both original and stylesheet rendered forms. These before and after views are called the document and abstract views.

DOM Level 3 Core adds the getInterface method to the DOMImplementation interface. In a Level 3 document, you can specify the document's character encoding or set some of its basic XML declarations like version and standalone. Level 2 doesn't permit moving DOM nodes from one document to another. Level 3 drops this limitation. Level 3 adds user data -- extra application data that can be optionally attached to any node. Level 3 has a number of other advanced features, but the W3C committee is still working on the Level 3 drafts. Check Resources for a link to read up on the committee's progress.





Download and install

You can download Xerces-C++ as a zipped tarball or a precompiled binary (see Resources). Script users accessing the library through Perl, Python, VBScript, or JavaScript can download the binary for their platform to get a jumpstart on installation. C++ programmers will most likely prefer to go with building their own binaries from the source tarball. The building instructions on the Apache XML group Web site are well written; a little farther on in this article I discuss a couple of subtle issues that I have discovered -- a pthreads linking problem and a fix for potential memory leaks on Windows platforms. Part 2 will include a tip for specifying a DOCTYPE in the SVG example. If you want to build the library as you read this, look at the Xerces build documentation found on the Apache site (see Resources) first and then come back here to read about linking Xerces to your own applications.

You can download the tarball and work offline (with a laptop, for example). The full HTML documentation is included in the tarball, so you don't need to keep referring back to the Web site for the instructions.

Building for Win32

The steps for installing the software on Visual Studio dot-NET or Win64 are nearly identical to these steps for building on Win32.

  1. Unzip and untar the Xerces source tarball to a working directory. Xerces-C++ has its own directory structure, so you should make sure you preserve relative path names during this step.
  2. Using Windows Explorer or your favorite file manager, drill down to the /xerces-c-src_2_3_0/Projects/Win32/VC6/xerces-all/ folder and click the xerces-all.dsw workspace file to launch Microsoft Developer Studio.
    Note: These instructions assume that you're building Win32 applications in Visual Studio 6. For Visual Studio dot-NET or Win64 applications, repeat steps 1 and 2 in the Win64 or VC7 variants of the directory.
  3. From Developer Studio, make XercesLib the current active project and press F7 to build the DLL. On last year's hardware this takes a minute or two.
  4. Add a path to the Xerces header files into your project. (Applications wanting to link against Xerces-C++ need to either include the XercesLib DSP project file in their workspace or add the LIB file in their project file to permit linking.) Select Project>Settings to bring up the project settings dialog box. Select All Configurations from the Settings combo box, click the C++ tab, select the Preprocessor category, and add the Xerces include path (something like /xerces-c-src_2_3_0/src) to the Additional include directories text box.
  5. If you have added the XercesLib DSP to your workspace, remember to mark your own project as dependent upon the XercesLib project; otherwise, you will be greeted with link errors.
  6. Create a stub C++ source file that does nothing but contain a line that reads #include <xercesc/sax/HandlerBase.hpp>. If you can compile this one-line C++ file, your include paths are probably right. Save your workspace after doing that. To run and debug your application, place a copy of the Xerces DLL in the working directory.

Building for Linux

Build the Xerces-C++ shared library by following the thorough instructions in the doc/html folder. The commands below illustrate how to build the Xerces-C++ library from the zipped source. This assumes that the xerces-c-src_2_3_0.tar.gz file is present in a directory like /home/user. Whatever directory you choose should match the XERCESCROOT variable; the configure script requires it.

# cd /home/user
# gunzip xerces-c-src_2_3_0.tar.gz
# tar -xvf xerces-c-src_2_3_0.tar
# export XERCESCROOT=/home/user/xerces-c-src_2_3_0
# cd $XERCESCROOT/src/xercesc
# ./configure
# make all

For the rest of this example, I'll assume the source tree is under the /home/user/xerces-c-src_2_3_0 directory. If all goes well, the shared library should appear in the lib folder. If you have problems, review the build instructions in the /doc/html folder. At this point, you can either copy the library (and symlinks) to /usr/lib or define the appropriate environment variable so that the loader can locate your newly-compiled library.

The easy way to test out your new library is to build and run one of the samples:

# export XERCESCROOT=/home/user/xerces-c-src_2_3_0
# cd $XERCESCROOT/samples
# ./configure
# make all

I tripped over a small problem building one of the samples on a fresh installation of Slackware Linux 9.0. The linker complained of some missing pthreads-related exports. I edited the Makefile.in file to include a reference to -lpthread and ran configure again. The second time around, typing make all worked.

Once you know the library works, you can start your own Xerces-C++ project. Use the -I compiler option to help the compiler locate the Xerces header files. Use the -L and -l linker options to help the linker locate the Xerces-C++ library. Listing 1 gives you a working minimal makefile to get started.


Listing 1. A minimal makefile
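APP = example
XERCES = /home/user/xerces-c-src_2_3_0
INCS = ${XERCES}/src
LIBS = ${XERCES}/lib

# Compile with the C++ compiler. -L names the library directory and
# -lxerces-c links libxerces-c. Add -lpthread if the linker reports
# missing pthreads symbols.
${APP} :: ${APP}.cpp
	${CXX} -I${INCS} ${APP}.cpp -o ${APP} -L${LIBS} -lxerces-c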

The command to kick off Listing 1 is make or gmake. You can change the APP variable to whatever source file suits you. The examples in this article use similar makefiles.

Xerces-C++ added C++ namespace support (not to be confused with XML namespaces) as of Version 2.2.0. If you have code that works on 2.1.0 and you'd like to take advantage of the newer version, add the following three lines to your code, just after including the Xerces-C++ headers.


Listing 2. Xerces-C++ namespace support
#ifdef XERCES_CPP_NAMESPACE_USE
XERCES_CPP_NAMESPACE_USE
#endif

You could, of course, just prefix all of your Xerces-C++ objects with the XERCES_CPP_NAMESPACE:: namespace.
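
The mechanics behind Listing 2 are easier to see in a self-contained sketch. The DEMO_ macros and the demo_2_3 namespace below are hypothetical stand-ins, not the real Xerces-C++ definitions; they only mimic the pattern of a versioned namespace hidden behind macros, which I assume is roughly how the library arranges it.

```cpp
#include <string>

// Hypothetical stand-ins for the pattern. These are NOT the Xerces-C++ macros.
#define DEMO_CPP_NAMESPACE demo_2_3
#define DEMO_CPP_NAMESPACE_USE using namespace DEMO_CPP_NAMESPACE;

// The "library" places its names inside the versioned namespace.
namespace DEMO_CPP_NAMESPACE {
std::string libraryVersion() { return "2.3"; }
}

// Option 1: qualify each name explicitly through the namespace macro.
std::string qualifiedCall() {
    return DEMO_CPP_NAMESPACE::libraryVersion();
}

// Option 2: the guarded using-directive, in the same shape as Listing 2.
// With an older release that never defines the macro, the preprocessor
// skips the directive and unqualified names resolve at global scope.
#ifdef DEMO_CPP_NAMESPACE_USE
DEMO_CPP_NAMESPACE_USE
#endif

std::string unqualifiedCall() {
    return libraryVersion();
}
```

The #ifdef guard is what lets one source file compile against both the pre-namespace and the namespace-aware releases. Explicit qualification avoids pulling every library name into your own scope, at the cost of more typing.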





The sample application

To keep things interesting as I explain the basics of using Xerces-C++, I'm going to create a simple bar graph using XML as the data format. To dodge the cross-platform bullet of platform GUI specifics, I'm doing the bar graph using ASCII art. This is, after all, an article on XML and not GTK, OpenGL, or Direct-X. If you are interested in using an XML representation of graphical data, look at SVG and SMIL (see Resources). The DOM example that I describe in Part 2 outputs SVG. I'll start with the simple text-only app.

Listing 3 is the DTD for the data. Next I'll construct a program to load the data, determine what scale to use, and then actually plot the data to the screen.


Listing 3. DTD for sample application data
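<?xml version="1.0" encoding="UTF-8"?>
<!-- Attribute defaults are assumed to be #REQUIRED; use #IMPLIED to make an attribute optional. -->
<!ELEMENT figures (#PCDATA) >
<!ATTLIST figures type (sales | inventory | labor) #REQUIRED >
<!ATTLIST figures value CDATA #REQUIRED >
<!ELEMENT department (figures*) >
<!ATTLIST department name CDATA #REQUIRED >
<!ELEMENT corporate (department*) >
<!ATTLIST corporate name CDATA #REQUIRED >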

Listing 4 shows a sampling of what the data might look like.


Listing 4. Sample input XML data
<?xml version="1.0" ?>
<corporate name="Big Biz">
  <department name="North">
    <figures type="sales" value="125000.00"/>
    <figures type="inventory" value="90000.00"/>
    <figures type="labor" value="110000.00">estimated</figures>
  </department>
  <department name="South">
    <figures type="sales" value="980000.00"/>
    <figures type="inventory" value="110000.00"/>
    <figures type="labor" value="115000.00">estimated</figures>
  </department>
  <department name="East">
    <figures type="sales" value="210000.00"/>
    <figures type="inventory" value="80000.00"/>
    <figures type="labor" value="95000.00">estimated</figures>
  </department>
  <department name="West">
    <figures type="sales" value="160000.00"/>
    <figures type="inventory" value="75000.00"/>
    <figures type="labor" value="130000.00">estimated</figures>
  </department>
  <department name="Central">
    <figures type="sales" value="723000.00"/>
    <figures type="inventory" value="11000.00"/>
    <figures type="labor" value="221000.00">estimated</figures>
  </department>
</corporate>
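
The scale-and-plot step can be sketched independently of the XML parsing. The code below is a guess at one reasonable approach, not the article's own listing: Figure, chooseScale, and renderBar are hypothetical names, and the 50-column width used in the note that follows is an arbitrary assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// One bar of the chart: a label plus the numeric value of a figures element.
struct Figure {
    std::string label;
    double value;
};

// Pick how many value units one '*' stands for, so that the largest bar
// is exactly maxWidth characters wide. Falls back to 1.0 when there is
// nothing positive to plot or no room to plot it.
double chooseScale(const std::vector<Figure>& figures, std::size_t maxWidth) {
    double largest = 0.0;
    for (const Figure& f : figures) largest = std::max(largest, f.value);
    if (largest <= 0.0 || maxWidth == 0) return 1.0;
    return largest / static_cast<double>(maxWidth);
}

// Render one bar as "label |****", rounding to the nearest whole asterisk.
// Non-positive values produce an empty bar.
std::string renderBar(const Figure& f, double scale) {
    std::size_t stars = 0;
    if (f.value > 0.0 && scale > 0.0) {
        stars = static_cast<std::size_t>(f.value / scale + 0.5);
    }
    return f.label + " |" + std::string(stars, '*');
}
```

With the North department figures from Listing 4 and a 50-column limit, chooseScale returns 2500, so sales (125000.00) renders as 50 asterisks, inventory (90000.00) as 36, and labor (110000.00) as 44. In the SAX version, the values would be collected in the element callbacks first, because the scale cannot be chosen until the largest value is known.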





SAX2 implementation

Listing 5 is a baseline SAX implementation. This isn't a complete program because it is missing the handler implementation, but it does show exactly what is needed to put the framework into place. The calls to XMLPlatformUtils::Initialize() and XMLPlatformUtils::Terminate() are very important: Initialize() must run before any other Xerces-C++ call, and Terminate() after the last one. The library guards against applications that fail to initialize the library properly by throwing an exception.

To make the program in Listing 5 a complete application, you need to add the event-handler class in Listing 6. SAX2 comes with a default event-handler class called DefaultHandler, defined in the C++ header file of the same name. The default handler does nothing -- it is just a stub implementation -- but it is complete, and so I'm using it here as a base class for the graphing event-handler class.

The file in Listing 7 is the actual implementation of the event-handler class in Listing 6. While the rest of the program is pretty much just boilerplate code to get the SAX2 parser running, the part in Listing 7 defines the application's personality.

Xerces-C++ uses XMLCh as a typedef'd character representation. On some platforms it is compatible with the C type wchar_t, which is usually two -- but sometimes four -- bytes wide. Because of that possibility, the docs discourage the practice of interchanging wchar_t and XMLCh. You can get away with it on some platforms, but it will break on others. Xerces-C++ uses this larger character representation to exchange text as UTF-16 as opposed to UTF-8 or ISO-8859. To debug this program, I'm using the XMLString::transcode function to convert the wide character strings for display on a console, as shown in Figure 1.


Figure 1. Screen shot of SAX parser output
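
Why a conversion step is needed before printing can be shown without Xerces-C++ at all. In the sketch below, CodeUnit16 is a hypothetical stand-in for a 16-bit XMLCh, and toConsoleText is an invented helper that keeps ASCII and substitutes '?' for everything else. The real XMLString::transcode converts to the local code page, which this toy does not attempt.

```cpp
#include <string>

// Hypothetical stand-in for a 16-bit character type such as XMLCh.
// Unlike wchar_t, its width does not vary between 2 and 4 bytes.
using CodeUnit16 = char16_t;
static_assert(sizeof(CodeUnit16) == 2, "sketch assumes 16-bit code units");

// Toy narrowing conversion for console output (an assumption for
// illustration): ASCII code units pass through unchanged, and every
// other code unit becomes '?'.
std::string toConsoleText(const std::basic_string<CodeUnit16>& wide) {
    std::string narrow;
    narrow.reserve(wide.size());
    for (CodeUnit16 unit : wide) {
        narrow.push_back(unit < 0x80 ? static_cast<char>(unit) : '?');
    }
    return narrow;
}
```

A UTF-16 string cannot simply be cast to char* and printed: every ASCII character carries an extra zero byte, which a narrow-character console routine treats as a terminator. Converting explicitly, and keeping the 16-bit type distinct from wchar_t, sidesteps the platform differences described above.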

I discovered a problem using the Xerces internal string class on Microsoft Windows. The comments in XMLString.hpp require the caller of replicate and other similar functions to release the memory returned. The problem comes from linking your application against the Xerces-C++ library as a DLL. The strings are allocated from the DLL's local heap. If both your application and the XercesLib DLL use the exact same C runtime (CRT) library DLL, then all is well. If, however, your program uses the single-threaded CRT and XercesLib uses the multithreaded CRT, each module has its own heap and the mismatch causes trouble. When your program attempts to release the string memory, the C runtime notices that the memory did not come from your application's local heap. In debug builds this triggers a debug assertion, but in release builds it may silently leak memory. The sample programs found in earlier versions of Xerces (like 1_5_1) avoided this by simply not releasing the memory.

My fix for this was to add a pair of static discard functions to the XMLString class. Because the string memory is released by code executing inside the DLL, the correct local heap is used, and no debug assertion results. I was pleased to see that Xerces developer Tinny Ng added this to the XMLString class and went a step further to null the string pointer (see Resources). The other nice feature of this is that programmers don't need to worry about how the implementation of XMLString allocates memory. Instead of guessing whether they should be using delete[] or free, they can just call XMLString::release. You can, of course, just make sure the CRT that your application expects is the same as the CRT used by the XercesLib DLL.
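
The shape of that fix can be sketched in a few lines. This is an illustration of the pattern only, not the Xerces-C++ source: the demo namespace and its replicate and release functions are hypothetical, and in a real DLL both functions would have to be compiled into the library (not inlined into the caller) so that allocation and deallocation use the same heap.

```cpp
#include <cstddef>
#include <cstring>

namespace demo {

// Allocates a copy of text with new[]. In the DLL scenario this code runs
// inside the library, so the memory comes from the library's heap.
char* replicate(const char* text) {
    if (text == nullptr) return nullptr;
    const std::size_t length = std::strlen(text) + 1;  // include terminator
    char* copy = new char[length];
    std::memcpy(copy, text, length);
    return copy;
}

// Frees memory obtained from replicate and nulls the caller's pointer.
// Because this also runs inside the library, the matching heap and the
// matching deallocation form (delete[]) are always used.
void release(char** text) {
    if (text == nullptr) return;
    delete[] *text;   // delete[] on a null pointer is a harmless no-op
    *text = nullptr;  // guards against double release and dangling use
}

}  // namespace demo
```

The caller never sees how the memory was obtained, so the library is free to change its allocator later. Nulling the pointer also makes a second release call harmless.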





