Inside MSXML Performance

Chris Lovett
Microsoft Corporation

February 21, 2000

Download the source code for this article (1.17MB)

Contents

Metrics
MSXML Features
Working Set
Megabytes Per Second
Attributes vs. Elements
First DOM Walk Working Set Delta
createNode Overhead
Walk vs. selectSingleNode
Save
Namespaces
Free-Threaded Documents
Delayed Memory Cleanup
Virtual Memory
IDispatch
Scripting
The Dreaded "//" Operator
Prune the Search Tree
Cross-Threading Models
Conclusion

I definitely got the message from your online comments that we need more "novice-level" material and some real XML applications. However, this article was already in the pipeline and is intended for the advanced XML developer. (After all, this column is called "Extreme XML"!) Accordingly, this article assumes you are familiar with XML and with the Microsoft XML Parser (MSXML) in particular. See the MSDN XML Developer's Center for more information.

So, you're designing your XML-based Web application and you need to know what kind of performance to expect from your XML server. Obviously, this depends a lot on what processing you plan to do. It is hard to generalize, because there are so many variables—such as the size of the XML documents, the amount of script code required to process the documents, the amount of output generated, and so on.

For example, major variables that can affect the performance of MSXML include:

· The kind of XML data
· The ratio of tags to text
· The ratio of attributes to elements
· The amount of discarded white space

To illustrate some of these variables, I'll use four sample data files. A snippet from each file is shown below.

Ado.xml

This sample file is a persistently saved ADO Recordset object—and is extremely attribute heavy. Each attribute value is short, with little wasted white space, making it a data-dense document.

<rsSchema:row au_id='267-41-2394' au_lname='O&apos;Leary' au_fname='Michael'
    phone='408 286-2428' address='22 Cleveland Av. #14' city='San Jose' state='CA'
    zip='95128' contract='True' name='systypes' id='4' uid='1' type='S ' userstat='0'
    sysstat='113' indexdel='0' schema_ver='1' refdate='1900-01-01T00:00:00'
    crdate='1996-04-03T03:38:57.387000000' version='0' deltrig='0' instrig='0'
    updtrig='0' seltrig='0' category='0' cache='0'/>

Hamlet.xml

This sample file consists of Shakespeare's play "Hamlet." The file is a well-balanced combination of text and element markup, with no attributes.

<SCENE><TITLE>SCENE I.  Elsinore. A platform before the castle.</TITLE>
<STAGEDIR>FRANCISCO at his post. Enter to him BERNARDO</STAGEDIR>
<SPEECH>
<SPEAKER>BERNARDO</SPEAKER>
<LINE>Who's there?</LINE>
</SPEECH>

Ot.xml

This sample file consists of the entire Old Testament. Each tag is only one or two characters, which reduces the tag-to-text ratio.

<book>
<bktlong>The First Book of Moses, Called GENESIS.</bktlong>
<bktshort>Genesis</bktshort>
<chapter><chtitle>Chapter 1</chtitle>
<v><vn>1</vn><p>In the beginning God created the heaven and the earth.</p></v>
...

Northwind.xml

This sample file contains a portion of the Northwind database that ships with Microsoft Access. It uses elements instead of attributes, has a high tag-to-text ratio, and contains a lot of extra white space.

<OrderIDs>
    <Item>
        <OrderID> 10326</OrderID>
        <OrderDate> 11/10/94</OrderDate>
        <ShipAddress> C/ Araquil, 67</ShipAddress>
    </Item>
...

Another major factor is whether the original file is stored as UCS-2. For most XML documents in English, UTF-8 is half the size of UCS-2 because the Latin characters compress down to a single byte in UTF-8. But this is not true for all languages. For some Asian languages, UTF-8 is actually larger than UCS-2, because it can expand to three bytes per character in the worst case. To be fair, the best format to use for measuring performance is UCS-2 on disk so that the numbers are more globally meaningful.
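
The size difference is easy to check. The following Python sketch compares byte counts for one line from the Hamlet sample and for a short Chinese string (an illustrative assumption, not taken from the sample files). Python's `utf-16-le` codec stands in for UCS-2 here; the two agree for characters in the Basic Multilingual Plane.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, ucs2_bytes) for a string.

    Assumption: 'utf-16-le' stands in for UCS-2, which is equivalent for
    characters in the Basic Multilingual Plane. No byte-order mark is counted.
    """
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


latin = "Who's there?"   # line from the Hamlet.xml snippet
cjk = "性能分析"          # hypothetical Asian-language text

for label, text in (("latin", latin), ("cjk", cjk)):
    utf8, ucs2 = encoded_sizes(text)
    print(f"{label}: {len(text)} chars, UTF-8 {utf8} bytes, UCS-2 {ucs2} bytes")
```

The Latin line halves in size under UTF-8, while the Chinese string grows from two bytes per character to three.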

The following table shows the UCS-2 file sizes, number of unique names, number of elements and attributes, number of text nodes, and amount of text content (in Unicode characters) for each of our sample files. It also shows a "tagginess factor," which is the percentage of the file size taken up by element and attribute name characters.

Sample         File size (bytes)  Unique names  Elements and attributes  Text nodes  Text content (characters)  Tagginess (%)
Ado.xml        2,171,812          53            63,722                   61,462      3,890                      18.7
Hamlet.xml     559,260            17            6,637                    5,472       170,545                    5.9
Ot.xml         7,663,624          12            71,417                   47,302      3,236,900                  1.4
Northwind.xml  488,140            12            3,680                    2,761       31,155                     6.0
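
Metrics of this kind can be approximated with a short script. The sketch below uses Python's `xml.dom.minidom` rather than MSXML, and the function name `xml_metrics` is hypothetical. The exact counting rules behind the table are not spelled out in the article, so totals from this sketch may differ from the published figures. Whitespace-only text nodes are skipped, on the assumption that this roughly mirrors a parser that discards insignificant white space.

```python
from xml.dom import minidom


def xml_metrics(xml_text):
    """Count names, elements, attributes, and text in an XML string."""
    names = set()
    counts = {"elements_and_attributes": 0, "text_nodes": 0, "text_chars": 0}

    def walk(node):
        if node.nodeType == node.ELEMENT_NODE:
            names.add(node.tagName)
            counts["elements_and_attributes"] += 1
            # Each attribute counts once, and its name joins the name set.
            for i in range(node.attributes.length):
                names.add(node.attributes.item(i).name)
                counts["elements_and_attributes"] += 1
        elif node.nodeType == node.TEXT_NODE and node.data.strip():
            # Assumption: whitespace-only text nodes are ignored.
            counts["text_nodes"] += 1
            counts["text_chars"] += len(node.data)
        for child in node.childNodes:
            walk(child)

    walk(minidom.parseString(xml_text).documentElement)
    counts["unique_names"] = len(names)
    return counts


sample = (
    "<SPEECH>\n"
    "  <SPEAKER>BERNARDO</SPEAKER>\n"
    "  <LINE>Who's there?</LINE>\n"
    "</SPEECH>"
)
print(xml_metrics(sample))
```

For the Hamlet-style fragment this reports three elements, three unique names, and two text nodes holding 20 characters in total.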

The number of unique names is interesting because MSXML "atomizes" element and attribute names, meaning it creates only one string object for each unique name and points to that object from each element or attribute that shares the same name. This is important because the names of elements and attributes are typically highly repetitive. For example, the Ado.xml sample actually contains 63,722 element and attribute names, which consume a total of 407,148 bytes of the overall file size. This is a tag-to-file size ratio of over 18 percent! But among all these names there are only 53 unique ones. So instead of using 407 KB of memory to store them, MSXML can store them in just a few kilobytes.
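
The idea behind atomization can be sketched in a few lines of Python. This is only an illustration of the technique, not MSXML's actual implementation; the `NameTable` class and its `atomize` method are hypothetical names. The last line reproduces the tag-to-file size ratio from the Ado.xml figures above.

```python
class NameTable:
    """Hypothetical name table: one shared string object per unique name."""

    def __init__(self):
        self._atoms = {}

    def atomize(self, name):
        # setdefault stores the first string seen for a name and returns
        # that same object for every later, equal string.
        return self._atoms.setdefault(name, name)

    def __len__(self):
        return len(self._atoms)


table = NameTable()
# Build equal names at run time, as a parser would while reading tags.
raw_names = ["".join(["au_", "id"]) for _ in range(1000)]
atoms = [table.atomize(n) for n in raw_names]

print(len(table))                            # unique names stored: 1
print(all(a is atoms[0] for a in atoms))     # all nodes share one object: True
print(round(407_148 / 2_171_812 * 100, 1))   # name share of Ado.xml: 18.7
```

Each node then holds only a reference to the shared name, so memory for names grows with the number of unique names rather than with the number of elements and attributes.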
