Inside MSXML Performance (3)

MSXML Features

Next, let's examine some important scenarios associated with the Document Object Model (DOM), including loading, saving, walking a DOM tree, and creating a new DOM tree in memory.

DOM

The MSXML Document Object Model ("Microsoft.XMLDOM," CLSID_DOMDocument, IID_IXMLDOMDocument) is the starting point for all XML processing within the MSXML parser. The fastest way to load an XML document is to use the default "rental" threading model (which means the DOM document can be used by only one thread at a time; it doesn't matter which thread) with validateOnParse, resolveExternals, and preserveWhiteSpace all disabled:

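    var doc = new ActiveXObject("Microsoft.XMLDOM");
    doc.async = false;               // async defaults to true; load synchronously so load() returns a parsed document
    doc.validateOnParse = false;     // no DTD/schema validation while parsing
    doc.resolveExternals = false;    // don't fetch external DTDs or entities
    doc.preserveWhiteSpace = false;  // discard white-space-only text between elements
    if (!doc.load("test.xml")) {
        // load() returns false on failure; WScript.Echo assumes Windows Script Host
        WScript.Echo(doc.parseError.reason);
    }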

Working Set

When using the DOM, the first metric to consider is the working set. Memory is used to load Msxml.dll and the other .dll files on which it depends. Some of these other .dll files are "delay loaded," which means they don't add to the working set until they are first used. MSXML is a COM DLL, so you typically use the standard COM APIs (CoInitialize and CoCreateInstance) to create a new XML document object. The minimum working set for a simple Visual C++ 6.0 command-line application that uses COM is about one megabyte. (This includes the following .dll files: Ntdll.dll, Kernel32.dll, Ole32.dll, Rpcrt4.dll, Advapi32.dll, Gdi32.dll, User32.dll, and Oleaut32.dll.) The first call to CoCreateInstance for an IXMLDOMDocument object loads Msxml.dll and Shlwapi.dll, which adds another 745 KB on top of this. Once all the .dll files are loaded, each new IXMLDOMDocument object costs only about 8 KB.
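These figures can be combined into a back-of-the-envelope estimate of the footprint before any XML is loaded. This is a minimal sketch, assuming "about one megabyte" means 1,024 KB and that every additional empty document costs the quoted 8 KB; the names are hypothetical, and the memory taken by the XML data itself is deliberately left out.

```javascript
// Back-of-the-envelope working set (KB) for a minimal COM console app that
// creates `documents` empty IXMLDOMDocument objects. Constants are the rough
// figures quoted in the text; "about one megabyte" is assumed to be 1,024 KB.
var BASE_COM_APP_KB = 1024;  // Ntdll.dll, Kernel32.dll, Ole32.dll, ... already loaded
var MSXML_LOAD_KB = 745;     // Msxml.dll + Shlwapi.dll, paid on the first CoCreateInstance
var EMPTY_DOCUMENT_KB = 8;   // each new IXMLDOMDocument, before any XML is loaded

function estimateStartupKB(documents) {
    if (documents <= 0) {
        return BASE_COM_APP_KB; // no document created yet, so MSXML is not loaded
    }
    return BASE_COM_APP_KB + MSXML_LOAD_KB + EMPTY_DOCUMENT_KB * documents;
}

var oneDoc = estimateStartupKB(1);   // 1024 + 745 + 8  = 1777 KB
var tenDocs = estimateStartupKB(10); // 1024 + 745 + 80 = 1849 KB
```

In this model, going from one document to ten adds only 72 KB: the one-time cost of loading the DLLs dominates until the documents actually hold data.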

The memory used by the XML data loaded into an XML document is usually anywhere from one to four times the size of the XML file on disk, depending on the "tagginess" of the data being loaded and whether the file was already in a Unicode format on disk. The following is a very rough formula for estimating the memory required for a given XML document:

ws = 32(n+t) + 12t + 50u + 2w;

The following table describes the parts of the formula:

Part   Description
ws     The working set, in bytes.
n      The number of nodes in the tree, counting one node for each element, attribute, attribute value, and piece of text content (for example, <element attribute = "value">text</element> is four nodes).
t      The number of text nodes.
u      The number of unique element and attribute names.
w      The number of Unicode characters in text content (including attribute values). Single-byte ANSI text takes twice as much space once loaded, because all text is stored as two-byte Unicode characters.

This assumes you do not set the preserveWhiteSpace flag; when you do, more nodes are created to preserve the white space between elements, using more memory.
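The formula translates directly into a function. This is a minimal sketch in plain JavaScript, kept to ES3-style syntax like the JScript sample above; the function name is hypothetical, and the counts in the usage line are an assumed reading of the definitions (the text does not say whether an attribute value also counts toward t).

```javascript
// Very rough memory estimate (bytes) for a loaded document, using the formula
// ws = 32(n+t) + 12t + 50u + 2w. Only meaningful with preserveWhiteSpace off;
// the fixed DLL/startup working set is not included.
function estimateWorkingSet(n, t, u, w) {
    var nodeCost = 32 * (n + t);  // 32 bytes for each of the n + t nodes
    var textNodeCost = 12 * t;    // 12 more bytes per text node
    var nameCost = 50 * u;        // 50 bytes per unique element/attribute name
    var textCost = 2 * w;         // 2 bytes per Unicode character of text
    return nodeCost + textNodeCost + nameCost + textCost;
}

// <element attribute = "value">text</element>, with assumed counts:
// n = 4 (as in the table), t = 1, u = 2, w = 9 ("value" + "text").
var tiny = estimateWorkingSet(4, 1, 2, 9); // 160 + 12 + 100 + 18 = 290 bytes
```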

For the sample data above, we see the following working set numbers (not including the initial startup working set):

Sample          Working set (bytes)   Ratio to file size
Ado.xml         4,689,920             2.16
Hamlet.xml      704,512               1.25
Ot.xml          10,720,000            1.39
Northwind.xml   249,856               0.51

An element-heavy XML document that contains a lot of white space between elements and is stored in Unicode can actually be smaller in memory than on disk (Northwind.xml, at 0.51, is the only sample below 1.0). Files with a more balanced ratio of elements to text content, such as Hamlet.xml and Ot.xml, end up at about 1.25 to 1.5 times the UCS-2 file size when in memory. Very data-dense files, such as Ado.xml, end up at more than twice the disk-file size when loaded into memory.
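Because the table gives both the working set and its ratio to the file size, the file size each row implies can be recovered by division. A minimal sketch, assuming the ratio is working set divided by on-disk file size in bytes; the derived sizes are approximate because the ratios are rounded to two decimals, and the `samples` and `impliedFileSize` names are hypothetical.

```javascript
// Working-set measurements from the table (bytes) and their ratio to file size.
var samples = [
    { name: "Ado.xml",       workingSet: 4689920,  ratio: 2.16 },
    { name: "Hamlet.xml",    workingSet: 704512,   ratio: 1.25 },
    { name: "Ot.xml",        workingSet: 10720000, ratio: 1.39 },
    { name: "Northwind.xml", workingSet: 249856,   ratio: 0.51 }
];

// Implied on-disk size = working set / ratio, rounded to whole bytes.
function impliedFileSize(sample) {
    return Math.round(sample.workingSet / sample.ratio);
}

for (var i = 0; i < samples.length; i++) {
    samples[i].fileSize = impliedFileSize(samples[i]);
}
```

This gives about 2,171,259 bytes for Ado.xml and about 489,914 bytes for Northwind.xml.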

Megabytes Per Second

For the megabytes-per-second metric, I loaded each sample file 10 times in a loop on a Pentium II 450-MHz dual-processor computer running Windows 2000, measured the load times, and averaged the results.

Sample          Load time (ms)   MB/second   Nodes/second
Ado.xml         677              3.2         184,909
Hamlet.xml      104              5.3         116,432
Ot.xml          1,063            7.2         111,682
Northwind.xml   62               7.8         103,887
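The measurement procedure (average several loads, then divide file size by time) can be sketched as follows. This is an assumed reconstruction, not the author's harness: `averageLoadMs` and `megabytesPerSecond` are hypothetical names, 1 MB is taken as 1,000,000 bytes, and `loadOnce` stands for whatever performs one load.

```javascript
// Average wall-clock time (ms) of `runs` calls to loadOnce.
// new Date().getTime() is used instead of Date.now() to stay JScript-friendly.
function averageLoadMs(loadOnce, runs) {
    var total = 0;
    for (var i = 0; i < runs; i++) {
        var start = new Date().getTime();
        loadOnce();
        total += new Date().getTime() - start;
    }
    return total / runs;
}

// Throughput in MB/s, assuming 1 MB = 1,000,000 bytes.
function megabytesPerSecond(fileBytes, loadMs) {
    return (fileBytes / 1000000) / (loadMs / 1000);
}

// Hypothetical usage under Windows Script Host, with `doc` set up as in the
// DOM sample (async = false):
//   var ms = averageLoadMs(function () { doc.load("Ado.xml"); }, 10);
```

For example, a 2,166,400-byte file averaging 677 ms per load works out to 3.2 MB/s. Averaging several runs smooths out run-to-run noise and the coarse resolution of Date-based timing.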

Also shown in this table is a measure of nodes per second. Notice that it is inversely correlated with megabytes per second: the more nodes there are per buffer of input data, the lower the throughput in megabytes per second. Conversely, the more compact the nodes are (as in Ado.xml), the higher the number of nodes per second.
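Dividing MB/s by nodes/s gives the average number of input bytes per node, which makes this inverse relationship concrete. A small sketch using the table's own figures, assuming 1 MB = 1,000,000 bytes; the results are approximate because MB/s is rounded to one decimal, and the names are hypothetical.

```javascript
// MB/s and nodes/s from the load-time table.
var throughput = [
    { name: "Ado.xml",       mbPerSec: 3.2, nodesPerSec: 184909 },
    { name: "Hamlet.xml",    mbPerSec: 5.3, nodesPerSec: 116432 },
    { name: "Ot.xml",        mbPerSec: 7.2, nodesPerSec: 111682 },
    { name: "Northwind.xml", mbPerSec: 7.8, nodesPerSec: 103887 }
];

// Average input bytes per node = (bytes per second) / (nodes per second).
function bytesPerNode(row) {
    return (row.mbPerSec * 1000000) / row.nodesPerSec;
}
```

This gives roughly 17 bytes per node for Ado.xml, 46 for Hamlet.xml, 64 for Ot.xml, and 75 for Northwind.xml: the smaller the average node, the more nodes per second and the fewer megabytes per second, consistent with a parsing cost that is largely per node.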

Attributes vs. Elements

You could conclude from this that attribute-heavy formats (such as that of Ado.xml) deliver more data items per second than element-heavy formats. But that alone is no reason to switch everything to attributes; there are many other factors to weigh when deciding between attributes and elements.
