CMarkup: fast simple C++ XML parser study notes


zhaodongdong2012


Article link: https://blog.csdn.net/zhaodongdong2012/article/details/47860021


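CMarkup is, in my view, the easiest-to-use open-source tool for creating and parsing XML in C++, and it covers most everyday development needs.

Official site: www.firstobject.com. The CMarkup source code can be downloaded there, and the package includes a demo that is useful for learning.

Note: for some reason the site is slow to load and sometimes cannot be reached at all, so the CMarkup source code and the parsing tool are also provided for download here.

The following is excerpted from www.firstobject.com; pay attention to the text in red on a yellow background: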

Create new XML documents, parse and modify existing XML documents from the methods of one simple C++ XML parser class.

Quick Start

Open the zip file and copy Markup.cpp and Markup.h into your C++ project folder
Add Markup.cpp and Markup.h to your project (makefile or IDE)
#include "Markup.h" where you use the CMarkup class
Visual C++ specific:
In Visual C++ projects that use precompiled headers you will need to turn them off for Markup.cpp (see Pre-compiled Header Issue)
In Visual C++ to use STL string instead of MFC CString add MARKUP_STL to your C++ Preprocessor Definitions

CMarkup Methods

This is the master list of CMarkup class methods. The CMarkup methods are based on the original EDOM design. The shaded methods are only available in the Developer Version of CMarkup.

Initialization

Load	Populates the CMarkup object from a file and parses it
SetDoc	Populates the CMarkup object from a string and parses it

Output

Save	Writes the document to file
GetDoc	Returns the whole document as a markup string
GetDocFormatted	Returns the formatted markup string of the whole document

File mode

Open	Opens file, initiating file mode for read or write (and append is a special case of write mode)
Close	Closes file and ends file mode
Flush	For file write mode, this flushes any partial document in memory (up to the closing tags) and the file stream itself

Changing the current position

FindElem	Locates next element, optionally matching tag name or path
FindChildElem	Locates next child element matching tag name or path
FindPrevElem	Locates previous element, optionally matching tag name
FindPrevChildElem	Locates previous child element, optionally matching tag name
FindNode	Locates next node, optionally matching node type(s)
IntoElem	Go "into" current main position element such that it becomes the current parent position
OutOfElem	Makes the current parent position into the current main position
ResetPos	Resets the current position to the start of the document
ResetMainPos	Resets the current main position to before the first sibling
ResetChildPos	Resets the current child position to before the first child

Adding to the Document

AddElem	Adds an element after the current main position element or last sibling
InsertElem	Inserts an element before the current main position element or first sibling
AddChildElem	Adds an element after the current child position element or last child
InsertChildElem	Inserts an element before the current child position element or first child
AddSubDoc	Adds a subdocument after the current main position element or last sibling
InsertSubDoc	Inserts a subdocument before the current main position element or first sibling
AddChildSubDoc	Adds a subdocument after the current child position element or last child
InsertChildSubDoc	Inserts a subdocument before the current child position element or first child
AddNode	Adds a node after the current node or at the end of the parent element content
InsertNode	Inserts a node before the current node or at the beginning of the parent element content

Removing from the Document

RemoveElem	Removes the current main position element including child elements
RemoveChildElem	Removes the current child position element including its child elements
RemoveNode	Removes the current node
RemoveAttrib	Removes the specified attribute from the current main position element
RemoveChildAttrib	Removes the specified attribute from the current child position element

Getting Values

GetData	Returns the string value of the current main position element or node
GetChildData	Returns the string value of the current child position element
GetElemContent	Returns the string markup content of the current main position element including child elements
GetSubDoc	Returns the subdocument markup string of the current main position element including child elements
GetChildSubDoc	Returns the subdocument markup string of the current child position element including child elements
GetAttrib	Returns the string value of the specified attribute of the main position element (or processing instruction)
GetChildAttrib	Returns the string value of the specified attribute of the child position element
HasAttrib	Returns true if the specified attribute exists in the main position element (or processing instruction)
HasChildAttrib	Returns true if the specified attribute exists in the child position element
GetTagName	Returns the tag name of the main position element (or processing instruction)
GetChildTagName	Returns the tag name of the child position element
FindGetData	Locates the next element matching the specified path and returns the string value

Setting Values

SetData	Sets the value of the current main position element or node
SetChildData	Sets the value of the current child position element
SetElemContent	Sets the markup content of the current main position element
SetAttrib	Sets the value of the specified attribute of the current main position element (or processing instruction)
SetChildAttrib	Sets the value of the specified attribute of the current child position element
FindSetData	Locates the next element matching the specified path and sets the value

Other Info

GetNthAttrib	Returns the name and value of attribute specified by number for the current main position element
GetAttribName	Returns the name of attribute specified by number for the current main position element
GetNodeType	Returns the node type of the current node
GetElemLevel	Returns the level of the current main position
GetElemFlags	Returns the current main position element's flags
SetElemFlags	Sets the current main position element's flags
GetOffsets	Obtains the document text offsets of the current main position
GetAttribOffsets	Obtains the document text offsets of the specified attribute in the current main position

Remembering positions

SavePos	Saves the current position with an optional string name using a hash map
RestorePos	Goes to the position saved with `SavePos`
SetMapSize	Sets the size of a map for use with the `SavePos` and `RestorePos` methods
GetElemIndex	Returns the integer index of the current main position element
GotoElemIndex	Sets the current main position element to that of the given integer index
GetChildElemIndex	Returns the integer index of the current child position element
GotoChildElemIndex	Sets the current child position element to that of the given integer index
GetParentElemIndex	Returns the integer index of the current parent position element
GotoParentElemIndex	Sets the current parent position element to that of the given integer index
GetElemPath	Returns a string representing the absolute path of the main position element
GetChildElemPath	Returns a string representing the absolute path of the child position element
GetParentElemPath	Returns a string representing the absolute path of the parent position element

Document Status

IsWellFormed	Determines if document has a single root element and properly contained elements
GetResult	Returns result markup from last parse or file operation
GetError	Returns English error/result synopsis string from last parse or file operation
GetDocFlags	Returns the document flags
SetDocFlags	Sets the document flags
GetDocElemCount	Returns the number of elements in the document

Static Utility Functions

ReadTextFile	Reads a text file into a string
WriteTextFile	Writes a string to a text file
GetDeclaredEncoding	Returns the encoding name as a string from the XML declaration
EscapeText	Returns the string with special characters encoded for markup
UnescapeText	Returns the string with special characters unencoded for a string value
UTF8ToA	Converts a UTF-8 string to a non-Unicode ("ANSI") string
AToUTF8	Converts a non-Unicode ("ANSI") string to UTF-8
UTF16To8	Converts a UTF-16 string to UTF-8
UTF8To16	Converts a UTF-8 string to UTF-16
EncodeBase64	Encodes a binary data buffer to a Base64 string
DecodeBase64	Decodes a Base64 string to a binary data buffer
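
To make "special characters encoded for markup" concrete, here is a minimal standalone sketch of the idea behind EscapeText. It is not CMarkup's implementation: escapeXmlText and wrapInElement are hypothetical helper names, and the sketch only handles the five predefined XML entities.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for illustration; NOT CMarkup's EscapeText.
// Replaces the five characters that have predefined XML entities.
std::string escapeXmlText(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Hypothetical helper: builds <tag>text</tag> with the text escaped,
// so that characters such as '&' cannot break the markup.
std::string wrapInElement(const std::string& tag, const std::string& text) {
    assert(!tag.empty());  // an element needs a tag name
    return "<" + tag + ">" + escapeXmlText(text) + "</" + tag + ">";
}
```

With these helpers, wrapInElement("NAME", "nuts & bolts") yields <NAME>nuts &amp;amp; bolts</NAME> written as markup, that is, the ampersand is stored as the entity &amp;.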

Fast start to XML in C++

Enough bull. You want to create XML or read and find things in XML. All you need to know about CMarkup is that it is just one object per XML document (for the API design concept see EDOM). And by the way the free firstobject XML Editor generates C++ source code for creating and navigating your own XML documents with CMarkup.

Creating an XML Document

To create an XML document, instantiate a CMarkup object and call AddElem to create the root element. At this point your document would simply contain the empty root element e.g. <ORDER/>. Then call IntoElem to go "inside" the ORDER element so that you can create child elements under the root element (i.e. the root element will be the "container" of the child elements).

The following example code creates an XML document.

CMarkup xml;
xml.AddElem( "ORDER" );
xml.IntoElem();
xml.AddElem( "ITEM" );
xml.IntoElem();
xml.AddElem( "SN", "132487A-J" );
xml.AddElem( "NAME", "crank casing" );
xml.AddElem( "QTY", "1" );

This code generates the following XML. The root is the ORDER element; notice that its start tag <ORDER> is at the beginning and end tag </ORDER> is at the bottom. When an element is under (i.e. inside or contained by) a parent element, the parent's start tag is before it and the parent's end tag is after it. The ORDER element contains one ITEM element. That ITEM element contains 3 child elements: SN, NAME, and QTY.

<ORDER>
<ITEM>
<SN>132487A-J</SN>
<NAME>crank casing</NAME>
<QTY>1</QTY>
</ITEM>
</ORDER>

As shown in the example, you create elements under an element by calling IntoElem to make your current main position (or "place holder") into your current parent position so you can begin adding child elements. CMarkup maintains a current position in order to keep your source code shorter and simpler. This same position logic is used when navigating a document.

You can write the above document to file with Save:

xml.Save( "C:\\Sample.xml" );

And you can retrieve the XML into a string with GetDoc:

MCD_STR strXML = xml.GetDoc();

Markup.h defines MCD_STR to the string type you compile CMarkup for, so we use MCD_STR in these examples, but you can use your own string type explicitly (e.g. std::string or CString).
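
The current-position idea used by AddElem, IntoElem and OutOfElem in this section can be modelled with a small toy class. This is only a sketch to show how a "parent position" and a "main position" interact; ToyNode and ToyMarkup are hypothetical names, not CMarkup code, and the toy simplifies AddElem to always append as the last child of the current parent. It also does no escaping and no formatting.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for illustration only; NOT CMarkup and not its implementation.
struct ToyNode {
    std::string tag;
    std::string data;
    ToyNode* parent = nullptr;
    std::vector<std::unique_ptr<ToyNode>> children;
};

class ToyMarkup {
public:
    // Appends an element as the last child of the current parent position
    // and makes it the current main position (simplified behaviour).
    void AddElem(const std::string& tag, const std::string& data = "") {
        assert(parent_ != nullptr);
        std::unique_ptr<ToyNode> node(new ToyNode());
        node->tag = tag;
        node->data = data;
        node->parent = parent_;
        main_ = node.get();
        parent_->children.push_back(std::move(node));
    }

    // The main position element becomes the parent position.
    bool IntoElem() {
        if (main_ == nullptr) return false;
        parent_ = main_;
        main_ = nullptr;
        return true;
    }

    // The parent position element becomes the main position again.
    bool OutOfElem() {
        if (parent_ == &root_) return false;  // already at document level
        main_ = parent_;
        parent_ = parent_->parent;
        return true;
    }

    // Serializes the whole tree on one line, without escaping.
    std::string GetDoc() const {
        std::string out;
        for (const auto& child : root_.children) write(*child, out);
        return out;
    }

private:
    static void write(const ToyNode& n, std::string& out) {
        if (n.data.empty() && n.children.empty()) {
            out += "<" + n.tag + "/>";
            return;
        }
        out += "<" + n.tag + ">" + n.data;
        for (const auto& child : n.children) write(*child, out);
        out += "</" + n.tag + ">";
    }

    ToyNode root_;              // invisible document node
    ToyNode* parent_ = &root_;  // current parent position
    ToyNode* main_ = nullptr;   // current main position
};
```

In this toy, calling AddElem("ORDER") on a fresh object gives <ORDER/>, and only after IntoElem do further AddElem calls land inside ORDER, which mirrors the sequence of calls in the example above.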

Navigating an XML Document

You can navigate the data right inside the same CMarkup object you created in the example above; just call ResetPos if you want to go back to the beginning of the document. Or you can populate a new CMarkup object:

CMarkup xml;

From a file with Load:

xml.Load( "C:\\Sample.xml" );

Or from an XML string with SetDoc:

xml.SetDoc( strXML );

In the following example, we go inside the root ORDER element and loop through all ITEM elements with FindElem to get the serial number and quantity of each with GetData. The serial number is treated as a string and the quantity is converted to an integer using atoi (MCD_2PCSZ is defined in Markup.h to return the string's const pointer).

xml.FindElem(); // root ORDER element
xml.IntoElem(); // inside ORDER
while ( xml.FindElem("ITEM") )
{
    xml.IntoElem();
    xml.FindElem( "SN" );
    MCD_STR strSN = xml.GetData();
    xml.FindElem( "QTY" );
    int nQty = atoi( MCD_2PCSZ(xml.GetData()) );
    xml.OutOfElem();
}

For each item we find, we call IntoElem to interrogate its child elements, and then OutOfElem afterwards. As you get accustomed to this type of navigation you will know to check in your loops to make sure there is a corresponding OutOfElem call for every IntoElem call.
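
One caveat about the atoi call in the loop above: atoi returns 0 for text that is not a number, so a missing or malformed QTY cannot be told apart from a real quantity of 0. A possible alternative is sketched below using std::strtol. The function name parseQty is hypothetical and not part of CMarkup; it takes the string you would get from GetData.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper, not part of CMarkup: converts element text such as
// "15" to an int and reports failure instead of silently returning 0.
// Leading whitespace is accepted by strtol; trailing characters are rejected.
bool parseQty(const std::string& text, int& qtyOut) {
    if (text.empty()) return false;
    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(text.c_str(), &end, 10);
    if (end == text.c_str() || *end != '\0') return false;  // no digits, or junk after them
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) return false;  // out of range
    qtyOut = static_cast<int>(value);
    return true;
}

// Small usage illustration.
void parseQtyUsageExample() {
    int qty = 0;
    assert(parseQty("15", qty) && qty == 15);
    assert(!parseQty("fifteen", qty));  // atoi would have returned 0 here
}
```

In the loop, this would replace the atoi line with a call such as parseQty( xml.GetData(), nQty ) when CMarkup is compiled with MARKUP_STL, so that a bad value can be skipped or reported.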

Adding Elements and Attributes

The above example for creating a document only created one ITEM element. Here is an example that creates multiple items loaded from a previously populated data source, plus a SHIPMENT information element in which one of the elements has an attribute we set with SetAttrib.

CMarkup xml;
xml.AddElem( "ORDER" );
xml.IntoElem(); // inside ORDER
for ( int nItem=0; nItem<aItems.GetSize(); ++nItem )
{
    xml.AddElem( "ITEM" );
    xml.IntoElem(); // inside ITEM
    xml.AddElem( "SN", aItems[nItem].strSN );
    xml.AddElem( "NAME", aItems[nItem].strName );
    xml.AddElem( "QTY", aItems[nItem].nQty );
    xml.OutOfElem(); // back out to ITEM level
}
xml.AddElem( "SHIPMENT" );
xml.IntoElem(); // inside SHIPMENT
xml.AddElem( "POC" );
xml.SetAttrib( "type", strPOCType );
xml.IntoElem(); // inside POC
xml.AddElem( "NAME", strPOCName );
xml.AddElem( "TEL", strPOCTel );

This code generates the following XML. The root ORDER element contains 2 ITEM elements and a SHIPMENT element. The ITEM elements both contain SN, NAME and QTY elements. The SHIPMENT element contains a POC element which has a type attribute, and NAME and TEL child elements.

<ORDER>
<ITEM>
<SN>132487A-J</SN>
<NAME>crank casing</NAME>
<QTY>1</QTY>
</ITEM>
<ITEM>
<SN>4238764-A</SN>
<NAME>bearing</NAME>
<QTY>15</QTY>
</ITEM>
<SHIPMENT>
<POC type="non-emergency">
<NAME>John Smith</NAME>
<TEL>555-1234</TEL>
</POC>
</SHIPMENT>
</ORDER>

Finding Elements

The FindElem method goes to the next sibling element. If the optional tag name argument is specified, then it goes to the next element with a matching tag name. The element that is found becomes the current element, and the next call to FindElem will go to the next sibling or matching sibling after that current position.

When you cannot assume the order of the elements, you must move the position back before the first sibling with ResetMainPos in between your calls to the FindElem method. Looking at the ITEM element in the above example, if someone else is creating the XML and you cannot assume the SN element is before the QTY element, then call ResetMainPos before finding the QTY element.

while ( xml.FindElem("ITEM") )
{
    xml.IntoElem();
    xml.FindElem( "SN" );
    MCD_STR strSN = xml.GetData();
    xml.ResetMainPos();
    xml.FindElem( "QTY" );
    int nQty = atoi( MCD_2PCSZ(xml.GetData()) );
    xml.OutOfElem();
}
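
Why the reset matters can be shown with a toy model of a forward-only sibling search. This is only an illustration under stated assumptions: SiblingCursor is a hypothetical class, not CMarkup code, and it assumes that a failed search leaves the position where it was.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy model for illustration only; NOT CMarkup's implementation.
// It mimics one idea: searching for a sibling only moves forward.
class SiblingCursor {
public:
    typedef std::pair<std::string, std::string> Elem;  // (tag name, data)

    explicit SiblingCursor(std::vector<Elem> elems) : elems_(std::move(elems)) {}

    // Moves to the next sibling with the given tag, looking only at
    // siblings after the current position. On failure the position is kept.
    bool FindElem(const std::string& tag) {
        for (int i = pos_ + 1; i < static_cast<int>(elems_.size()); ++i) {
            if (elems_[i].first == tag) {
                pos_ = i;
                return true;
            }
        }
        return false;
    }

    // Moves the position back to before the first sibling.
    void ResetMainPos() { pos_ = -1; }

    // Data of the current element, or an empty string if there is none.
    std::string GetData() const {
        return pos_ >= 0 ? elems_[pos_].second : std::string();
    }

private:
    std::vector<Elem> elems_;
    int pos_ = -1;  // -1 means "before the first sibling"
};

// Small usage illustration: QTY comes before SN, unlike the sample document.
void siblingCursorUsageExample() {
    std::vector<SiblingCursor::Elem> children = {{"QTY", "15"}, {"SN", "4238764-A"}};
    SiblingCursor item(children);
    assert(item.FindElem("SN"));
    assert(!item.FindElem("QTY"));  // QTY is behind the current position
    item.ResetMainPos();
    assert(item.FindElem("QTY"));
    assert(item.GetData() == "15");
}
```

In the toy, looking for QTY after SN fails when QTY happens to come first, and succeeds once the position has been reset, which is the situation the ResetMainPos call guards against.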

To find the item with a particular serial number, you can loop through the ITEM elements and compare the SN element data to the serial number you are searching for. By specifying the "ITEM" element tag name in the FindElem method we ignore all other sibling elements such as the SHIPMENT element. Also, instead of going into and out of the ITEM element to look for the SN child element, we use the FindChildElem and GetChildData methods for convenience.

xml.ResetPos(); // top of document
xml.FindElem(); // ORDER element is root
xml.IntoElem(); // inside ORDER
while ( xml.FindElem("ITEM") )
{
    xml.FindChildElem( "SN" );
    if ( xml.GetChildData() == strFindSN )
        break; // found
}

You are NOT on your own

This site has all kinds of examples of doing various XML operations. CMarkup has been widely used for many years. Of course it doesn't do everything, but almost every purpose has at least been discussed. Don't hesitate to ask if you have questions. A good place to go next is the CMarkup Methods.
