XML and the NameTable

最新推荐文章于 2024-09-15 19:46:31 发布

cunfuxiao7305

Original link: https://www.hanselman.com/blog/xml-and-the-nametable

I got a number (about a dozen) of emails about my use of the NameTable in my recent XmlReader post. Charles Cook tried it out and noticed about a 10% speedup. I also received a number of poo-poo emails that said "use XPath" or "don't bother" and "the performance is good enough."

Sure, if that works for you, that's great. Of course, always measure before you make broad statements. That said, here's a broad statement. Using an XmlReader will always be faster than the DOM and/or XmlSerializer. Always.

Why? Because what do you think is underneath the DOM and inside of XmlSerialization? An XmlReader of course.

For documents larger than about 50k, you're looking at at least one order of magnitude faster when plucking a single value out. When grabbing dozens, the difference increases.

Moshe is correct in his pointing out that a nice middle-place perf-wise is the XPathReader (for a certain subset of XPath). There's a number of nice XmlReader implementations that fill the space between XmlTextReader and XPathDocument by providing more-than-XmlReader functionality:

BTW, I would also point out that an XmlReader is what I call a "cursor-based pull implementation." While it's similar to the SAX parsers in that it exposes the infoset rather than the angle brackets, it's not SAX.

Now, all that said, what was the deal with my Nametable usage? Charles explains it well, but I will expand. You can do this if you like:

    XmlTextReader tr =
        new XmlTextReader("http://feeds.feedburner.com/ScottHanselman");
    while (tr.Read())
    {
        // String comparison against "enclosure" on every element node.
        if (tr.NodeType == XmlNodeType.Element && tr.LocalName == "enclosure")
        {
            // Print each attribute of the enclosure element as name:value.
            while (tr.MoveToNextAttribute())
            {
                Console.WriteLine(String.Format("{0}:{1}",
                    tr.LocalName, tr.Value));
            }
        }
    }

The tr.LocalName == "enclosure" comparison (the line shown in red in the original post) does a string compare for each element you look at. Not a big deal, but it adds up over hundreds or thousands of executions when spinning through a large document.

The NameTable is used by XmlDocument, XmlReader(s), XPathNavigator, and XmlSchemaCollection. It's a table that maps a string to an object reference. This is called "atomization" - meaning we want to think about atoms (think small). If these classes see "enclosure" more than once, they use the object reference rather than keeping n copies of the "enclosure" string internally.

It's not exactly like a Hashtable, as the NameTable will return the object reference if the string has already been atomized.
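
To make atomization concrete, here is a minimal sketch using the NameTable class directly. The class and method names (AtomizationDemo, SameAtom, UnknownNameIsNull) are made up for illustration; the strings are built at run time so that they are distinct objects rather than compiler-interned literals.

```csharp
using System;
using System.Xml;

public static class AtomizationDemo
{
    // Two distinct string objects with equal contents map to one atom.
    public static bool SameAtom()
    {
        var table = new NameTable();

        // Built from char arrays so the two strings are separate objects.
        string first = new string("enclosure".ToCharArray());
        string second = new string("enclosure".ToCharArray());

        // Add returns the atomized string, adding it if it is not there yet.
        string atomA = table.Add(first);
        string atomB = table.Add(second);

        return !Object.ReferenceEquals(first, second)
            && Object.ReferenceEquals(atomA, atomB);
    }

    // Get only looks a name up; it returns null for a name never added.
    public static bool UnknownNameIsNull()
    {
        var table = new NameTable();
        table.Add("enclosure");
        return table.Get("title") == null;
    }
}
```

Calling AtomizationDemo.SameAtom() should report that both Add calls hand back the same object, which is what makes a reference comparison safe later on.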

    XmlTextReader tr =
        new XmlTextReader("http://feeds.feedburner.com/ScottHanselman");
    // Atomize the name up front and keep the reference.
    object enclosure = tr.NameTable.Add("enclosure");
    while (tr.Read())
    {
        // Reference comparison instead of a string comparison.
        if (tr.NodeType == XmlNodeType.Element &&
            Object.ReferenceEquals(tr.LocalName, enclosure))
        {
            while (tr.MoveToNextAttribute())
            {
                Console.WriteLine(String.Format("{0}:{1}",
                    tr.LocalName, tr.Value));
            }
        }
    }

The easiest way, IMHO, to think about it is this:

- If you know that you're going to look for an element or attribute with a specific name within any System.Xml class that has an XmlNameTable, preload the name, in effect warning the parser that you'll be watching for it.
- When you do a comparison between the current element or attribute and your target, use Object.ReferenceEquals. Instead of a string comparison, you'll just be asking "are these the same object" - which is about the fastest thing that the CLR can do.
  - Yes, you can use == rather than Object.ReferenceEquals, but the latter makes it totally clear what your intent is, while the former is more vague.
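
The two steps can be sketched against an in-memory document so the example runs without a network call. This is an illustration under assumptions, not the code from the post: it uses XmlReader.Create with XmlReaderSettings.NameTable instead of XmlTextReader, and the EnclosureScan class, its method name, and the sample XML in the usage note are made up.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class EnclosureScan
{
    // Returns "name:value" for every attribute of every <enclosure> element.
    public static List<string> ReadEnclosureAttributes(string xml)
    {
        // Step 1: preload the name we will be watching for.
        var names = new NameTable();
        object enclosure = names.Add("enclosure");

        // Hand the same table to the reader so it atomizes names into it.
        var settings = new XmlReaderSettings { NameTable = names };
        var found = new List<string>();

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                // Step 2: compare references, not string contents.
                if (reader.NodeType == XmlNodeType.Element &&
                    Object.ReferenceEquals(reader.LocalName, enclosure))
                {
                    while (reader.MoveToNextAttribute())
                    {
                        found.Add(reader.LocalName + ":" + reader.Value);
                    }
                }
            }
        }
        return found;
    }
}
```

For example, passing "<rss><item><enclosure url='a.mp3' length='10'/></item></rss>" should collect the url and length attributes of the single enclosure element.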

This kind of optimization makes a big perf difference (around 10%, depending on the document) when using an XmlReader. It makes less of a difference when using an XPathDocument because you are using Select(ing)Nodes in a loop.

Stealing Charles' words: "...because it involves very little extra code it is perhaps an optimization worth making prematurely."

Even the designers agree: "...using the XmlNameTable gives you enough of a performance benefit to make it worthwhile especially if your processing starts to spans multiple XML components in a piplelining scenario and the XmlNameTable is shared across them i.e. XmlTextReader->XmlDocument->XslTransform."
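
As a rough sketch of the sharing scenario in that quote, one NameTable can be handed to both a reader and an XmlDocument, so a name atomized once is the same object in both components. The SharedTableDemo class and its method name are hypothetical, and XmlReader.Create stands in for the XmlTextReader mentioned in the quote.

```csharp
using System;
using System.IO;
using System.Xml;

public static class SharedTableDemo
{
    // Loads xml into an XmlDocument through a reader that shares its NameTable,
    // then checks that the root element's name is the atom we preloaded.
    public static bool RootIsPreloadedAtom(string xml, string rootName)
    {
        var names = new NameTable();
        object rootAtom = names.Add(rootName);

        // Both components are constructed over the same table.
        var settings = new XmlReaderSettings { NameTable = names };
        var doc = new XmlDocument(names);

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            doc.Load(reader);
        }

        return Object.ReferenceEquals(doc.NameTable, names)
            && Object.ReferenceEquals(doc.DocumentElement.LocalName, rootAtom);
    }
}
```

The design point is that names atomized in one stage of the pipeline do not have to be looked up by content again in the next stage.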

Oleg laments: "...that something needs to be done to fix this particular usage pattern of XmlReader to not ignore great NameTable idea."

Conclusion: The NameTable is there for a reason, no matter what System.Xml solution you use. This is a correct and useful pattern, and not using it is just silly. If you're going to develop a habit, why not make it a best-practice habit?