What XML parser should I use in C++? [closed]

I have XML documents that I need to parse, and/or I need to build XML documents and write them to text (either files or memory). Since the C++ standard library does not have a library for this, what should I use?

Note: This is intended to be a definitive, C++-FAQ-style question on this topic. So yes, it is a duplicate of others. I did not simply appropriate those other questions because they tended to ask for something slightly more specific. This question is more generic.




#2

Adding mine as well.

http://www.codeproject.com/Articles/998388/XMLplusplus-version-The-Cplusplus-update-of-my-XML

No XML validation features, but fast.


#3

At Secured Globe, Inc., we use rapidxml. We tried all the others, but rapidxml seems to be the best choice for us.

Here is an example:

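    // xmlData must be a writable, NUL-terminated buffer: rapidxml parses it in place.
    rapidxml::xml_document<char> doc;
    doc.parse<0>(xmlData);
    rapidxml::xml_node<char>* root = doc.first_node();

    // GetNodeByElementName is not part of rapidxml; it is a helper whose definition is not shown.
    rapidxml::xml_node<char>* node_account = 0;
    if (GetNodeByElementName(root, "Account", &node_account) == true)
    {
        rapidxml::xml_node<char>* node_default = 0;
        if (GetNodeByElementName(node_account, "default", &node_default) == true)
        {
            // Copy the narrow node value into the wide result buffer (%hs is a Microsoft extension).
            swprintf(result, 100, L"%hs", node_default->value());
            free(xmlData);
            return true;
        }
    }
    free(xmlData);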

#4

One other note about Expat: it's worth looking at for embedded systems work. However, the documentation you are likely to find on the web is ancient and wrong. The source code actually has fairly thorough function-level comments, but it will take some perusing for them to make sense.


#5

OK then. I've created a new one, since none of the libraries in the list satisfied my needs.

Benefits:

  1. Pull-parser streaming API at the low level (similar to Java StAX)
  2. Exception and RTTI modes supported
  3. Memory usage limit and support for large files (tested with a 100 MiB XMark file; speed depends on hardware)
  4. Unicode support, with auto-detection of the input source encoding
  5. High-level API for reading into structures/POCO
  6. Meta-programming API for writing, and for generating XSD from structures/POCO, with support for XML structure (attributes and nested tags). XSD generation needs RTTI, but it can be run once in a debug build only.
  7. C++11: GCC and VC++ 15+

Disadvantages:

  1. DTD and XSD validation not yet provided
  2. Obtaining XML/XSD over HTTP/HTTPS is in progress, not yet done
  3. New library

Project home


#6

Just like with standard library containers, what library you should use depends on your needs. Here's a convenient flowchart:

[Image: flowchart for choosing an XML library]

So the first question is this: What do you need?

I Need Full XML Compliance

OK, so you need to process XML. Not toy XML, real XML. You need to be able to read and write all of the XML specification, not just the low-lying, easy-to-parse bits. You need Namespaces, DocTypes, entity substitution, the works. The W3C XML Specification, in its entirety.

The next question is: Does your API need to conform to DOM or SAX?

I Need Exact DOM and/or SAX Conformance

OK, so you really need the API to be DOM and/or SAX. It can't just be a SAX-style push parser, or a DOM-style retained parser. It must be the actual DOM or the actual SAX, to the extent that C++ allows.

You have chosen:

Xerces

That's your choice. It's pretty much the only C++ XML parser/writer that has full (or as near as C++ allows) DOM and SAX conformance. It also has XInclude support, XML Schema support, and a plethora of other features.

It has no real dependencies. It uses the Apache license.

I Don't Care About DOM and/or SAX Conformance

You have chosen:

LibXML2

LibXML2 offers a C-style interface (if that really bothers you, go use Xerces), though the interface is at least somewhat object-based and easily wrapped. It provides a lot of features, like XInclude support (with callbacks so that you can tell it where it gets the file from), an XPath 1.0 recognizer, RelaxNG and Schematron support (though the error messages leave a lot to be desired), and so forth.

It does have a dependency on iconv, but it can be configured without that dependency, though that does mean it will be able to parse a more limited set of text encodings.

It uses the MIT license.

I Do Not Need Full XML Compliance

OK, so full XML compliance doesn't matter to you. Your XML documents are either fully under your control or are guaranteed to use the "basic subset" of XML: no namespaces, entities, etc.

So what does matter to you? The next question is: What is the most important thing to you in your XML work?

Maximum XML Parsing Performance

Your application needs to take XML and turn it into C++ data structures as fast as this conversion can possibly happen.

You have chosen:

RapidXML

This XML parser is exactly what it says on the tin: rapid XML. It doesn't even deal with pulling the file into memory; how that happens is up to you. What it does deal with is parsing that into a series of C++ data structures that you can access. And it does this about as fast as it takes to scan the file byte by byte.

Of course, there's no such thing as a free lunch. Like most XML parsers that don't care about the XML specification, RapidXML doesn't touch namespaces, DocTypes, entities (with the exception of character entities and the 5 predefined XML ones), and so forth. So basically nodes, elements, attributes, and such.
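
To make the "predefined entities" point concrete, here is a minimal sketch of what expanding them involves. It assumes C++17 and uses only the standard library; `expand_predefined` is a hypothetical name for illustration, not RapidXML's code or API. The five names (`lt`, `gt`, `amp`, `apos`, `quot`) are the ones the XML specification predefines; numeric character references and DTD-declared entities are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper for illustration only (not part of any XML library).
// Replaces the five entities predefined by the XML specification and leaves
// every other "&...;" sequence untouched.
std::string expand_predefined(std::string_view in) {
    struct Entity {
        std::string_view name;
        char ch;
    };
    static const Entity table[] = {
        {"&lt;", '<'}, {"&gt;", '>'}, {"&amp;", '&'},
        {"&apos;", '\''}, {"&quot;", '"'}};

    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        bool matched = false;
        if (in[i] == '&') {
            for (const Entity& e : table) {
                if (in.substr(i, e.name.size()) == e.name) {
                    out.push_back(e.ch);  // emit the replacement character
                    i += e.name.size();   // skip the whole entity reference
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) out.push_back(in[i++]);  // ordinary character: copy as-is
    }
    return out;
}

// Usage sketch.
inline void example_expand() {
    assert(expand_predefined("a &lt; b") == "a < b");
}
```

A parser that stops here can stay very simple, which is part of the trade-off described above: anything declared in a DTD would need a real entity table and a DocType parser.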

Also, it is a DOM-style parser. So it does require that you read all of the text in. However, what it doesn't do is copy any of that text (usually). The way RapidXML gets most of its speed is by referring to strings in-place. This requires more memory management on your part (you must keep that string alive while RapidXML is looking at it).
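
The lifetime requirement is easier to see in a toy example. The sketch below is not RapidXML; it is a deliberately tiny scanner, written under the assumption of C++17, that shows the same in-place idea using `std::string_view`. The names `ElementView` and `parse_flat` are hypothetical, and the scanner only understands flat `<name>text</name>` runs with no attributes, nesting, or entities.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical toy type: both members are views into the caller's buffer,
// so nothing is copied while parsing.
struct ElementView {
    std::string_view name;
    std::string_view text;
};

// Scans `buffer` for flat elements of the form <name>text</name>.
// Stops quietly at the first malformed piece; there is no error reporting.
std::vector<ElementView> parse_flat(std::string_view buffer) {
    std::vector<ElementView> out;
    std::size_t pos = 0;
    while (true) {
        std::size_t open = buffer.find('<', pos);
        if (open == std::string_view::npos) break;
        std::size_t close = buffer.find('>', open);
        if (close == std::string_view::npos) break;
        std::string_view name = buffer.substr(open + 1, close - open - 1);
        if (name.empty() || name.front() == '/') {  // stray end tag: skip it
            pos = close + 1;
            continue;
        }
        std::size_t end = buffer.find("</", close);
        if (end == std::string_view::npos) break;
        out.push_back({name, buffer.substr(close + 1, end - close - 1)});
        std::size_t end_close = buffer.find('>', end);
        if (end_close == std::string_view::npos) break;
        pos = end_close + 1;
    }
    return out;
}

// Usage sketch: the views are only valid while `xml` is alive and unmodified.
inline void example_parse_flat() {
    std::string xml = "<a>1</a><b>two</b>";
    std::vector<ElementView> elems = parse_flat(xml);
    assert(elems.size() == 2);
    assert(elems[1].text == "two");
}
```

If `xml` were destroyed or reallocated while `elems` was still in use, every view would dangle. That is the same obligation RapidXML places on you for the buffer you hand to it.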

RapidXML's DOM is bare-bones. You can get string values for things. You can search for attributes by name. That's about it. There are no convenience functions to turn attributes into other values (numbers, dates, etc). You just get strings.

One other downside with RapidXML is that it is painful for writing XML. It requires you to do a lot of explicit memory allocation of string names in order to build its DOM. It does provide a kind of string buffer, but that still requires a lot of explicit work on your end. It's certainly functional, but it's a pain to use.

It uses the MIT license. It is a header-only library with no dependencies.

I Care About Performance But Not Quite That Much

Yes, performance matters to you. But maybe you need something a bit less bare-bones. Maybe something that can handle more Unicode, or doesn't require so much user-controlled memory management. Performance is still important, but you want something a little less direct.

You have chosen:

PugiXML

Historically, this served as inspiration for RapidXML. But the two projects have diverged, with Pugi offering more features, while RapidXML is focused entirely on speed.

PugiXML offers Unicode conversion support, so if you have some UTF-16 docs around and want to read them as UTF-8, Pugi will provide. It even has an XPath 1.0 implementation, if you need that sort of thing.

But Pugi is still quite fast. Like RapidXML, it has no dependencies and is distributed under the MIT License.

Reading Huge Documents

You need to read documents that are measured in gigabytes. Maybe you're getting them from stdin, being fed by some other process. Or you're reading them from massive files. Or whatever. The point is, what you need is to not have to read the entire file into memory all at once in order to process it.

You have chosen:

LibXML2

Xerces's SAX-style API will work in this capacity, but LibXML2 is here because it's a bit easier to work with. A SAX-style API is a push-API: it starts parsing a stream and just fires off events that you have to catch. You are forced to manage context, state, and so forth. Code that reads a SAX-style API is a lot more spread out than one might hope.

LibXML2's xmlReader object is a pull-API. You ask to go to the next XML node or element; you aren't told. This allows you to store context as you see fit, to handle different entities in a way that's much more readable in code than a bunch of callbacks.
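
The push/pull difference can be sketched without any XML library at all. The toy tokenizer below assumes C++17 and is not LibXML2's or Xerces's API; `PullReader`, `push_parse`, and `collect_items` are hypothetical names, and the tokenizer only splits input into start tags, end tags, and text.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

enum class EventKind { StartTag, EndTag, Text };

struct Event {
    EventKind kind;
    std::string_view data;  // tag name or text, viewing the input buffer
};

// Pull style: the caller asks for the next event when it is ready for one.
class PullReader {
public:
    explicit PullReader(std::string_view input) : rest_(input) {}

    std::optional<Event> next() {
        if (rest_.empty()) return std::nullopt;
        if (rest_.front() == '<') {
            std::size_t close = rest_.find('>');
            if (close == std::string_view::npos) {  // malformed input: stop
                rest_ = {};
                return std::nullopt;
            }
            std::string_view tag = rest_.substr(1, close - 1);
            rest_.remove_prefix(close + 1);
            if (!tag.empty() && tag.front() == '/')
                return Event{EventKind::EndTag, tag.substr(1)};
            return Event{EventKind::StartTag, tag};
        }
        std::string_view text = rest_.substr(0, rest_.find('<'));
        rest_.remove_prefix(text.size());
        return Event{EventKind::Text, text};
    }

private:
    std::string_view rest_;
};

// Push style: the parser drives the loop and calls back into your code.
void push_parse(std::string_view input,
                const std::function<void(const Event&)>& on_event) {
    PullReader reader(input);
    while (std::optional<Event> ev = reader.next()) on_event(*ev);
}

// A pull-style consumer: its state lives in ordinary local variables.
std::vector<std::string> collect_items(std::string_view input) {
    PullReader reader(input);
    std::vector<std::string> items;
    bool in_item = false;
    while (std::optional<Event> ev = reader.next()) {
        if (ev->kind == EventKind::StartTag) in_item = (ev->data == "item");
        else if (ev->kind == EventKind::EndTag) in_item = false;
        else if (in_item) items.emplace_back(ev->data);
    }
    return items;
}

// Usage sketch.
inline void example_pull() {
    std::vector<std::string> items = collect_items("<item>a</item>");
    assert(items.size() == 1 && items[0] == "a");
}
```

With `push_parse`, the `in_item` flag would have to live in a captured variable or a handler object that outlasts each callback. With the pull loop it is just a local, which is the readability gain described above.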

Alternatives

Expat

Expat is a well-known stream-oriented parser. It is written in C, not C++, and it reports content through callback handlers that you register (a push-style API, not a pull-parser). It was written by James Clark.

Its current status is active. The most recent version is 2.2.9, which was released on 2019-09-25.

LlamaXML

It is an implementation of a StAX-style API. It is a pull-parser, similar to LibXML2's xmlReader parser.

But it hasn't been updated since 2005. So, caveat emptor.

XPath Support

XPath is a system for querying elements within an XML tree. It's a handy way of effectively naming an element or collection of elements by common properties, using a standardized syntax. Many XML libraries offer XPath support.

There are effectively three choices here:

  • LibXML2: It provides full XPath 1.0 support. Again, it is a C API, so if that bothers you, there are alternatives.
  • PugiXML: It comes with XPath 1.0 support as well. As above, it's more of a C++ API than LibXML2, so you may be more comfortable with it.
  • TinyXML: It does not come with XPath support, but there is the TinyXPath library that provides it. TinyXML is undergoing a conversion to version 2.0, which significantly changes the API, so TinyXPath may not work with the new API. Like TinyXML itself, TinyXPath is distributed under the zLib license.

Just Get The Job Done

So, you don't care about XML correctness. Performance isn't an issue for you. Streaming is irrelevant. All you want is something that gets XML into memory and allows you to stick it back onto disk again. What you care about is API.

You want an XML parser that's going to be small, easy to install, trivial to use, and small enough to be irrelevant to your eventual executable's size.

You have chosen:

TinyXML

I put TinyXML in this slot because it is about as braindead simple to use as XML parsers get. Yes, it's slow, but it's simple and obvious. It has a lot of convenience functions for converting attributes and so forth.

Writing XML is no problem in TinyXML. You just new up some objects, attach them together, send the document to a std::ostream, and everyone's happy.

There is also something of an ecosystem built around TinyXML, with a more iterator-friendly API, and even an XPath 1.0 implementation layered on top of it.

TinyXML uses the zLib license, which is more or less the MIT License with a different name.
