最近的项目需要使用pugixml来替换libxml。
由于项目中使用的字符串编码为utf16le,在解析上面遇到了比较大的问题。直接使用load()或者load_string()都会出现无法解析出正确的结果的问题。
后来想到utf16的字符串,其本身的编码会有类似\0的字符出现,导致一些函数会误判字符串结束的位置。于是自然而然想到,如果有一个load接口是可以传递其字符串长度的话,应该就能解决这个问题。
于是从pugixml.hpp中找到了带size_t size参数的接口load_buffer*:
#ifndef PUGIXML_NO_STL
// Load document from stream.
xml_parse_result load(std::basic_istream<char, std::char_traits<char> >& stream, unsigned int options = parse_default, xml_encoding encoding = encoding_auto
);
xml_parse_result load(std::basic_istream<wchar_t, std::char_traits<wchar_t> >& stream, unsigned int options = parse_default);
#endif
// (deprecated: use load_string instead) Load document from zero-terminated string. No encoding conversions are applied.
PUGIXML_DEPRECATED xml_parse_result load(const char_t* contents, unsigned int options = parse_default);
// Load document from zero-terminated string. No encoding conversions are applied.
xml_parse_result load_string(const char_t* contents, unsigned int options = parse_default);
// Load document from file
xml_parse_result load_file(const char* path, unsigned int options = parse_default, xml_encoding encoding = encoding_auto);
xml_parse_result load_file(const wchar_t* path, unsigned int options = parse_default, xml_encoding encoding = encoding_auto);
// Load document from buffer. Copies/converts the buffer, so it may be deleted or changed after the function returns.
xml_parse_result load_buffer(const void* contents, size_t size, unsigned int options = parse_default, xml_encoding encoding = encoding_auto);
// Load document from buffer, using the buffer for in-place parsing (the buffer is modified and used for storage of document data).
// You should ensure that buffer data will persist throughout the document's lifetime, and free the buffer memory manually once document is destroyed.
xml_parse_result load_buffer_inplace(void* contents, size_t size, unsigned int options = parse_default, xml_encoding encoding = encoding_auto);
// Load document from buffer, using the buffer for in-place parsing (the buffer is modified and used for storage of document data).
// You should allocate the buffer with pugixml allocation function; document will free the buffer when it is no longer needed (you can't use it anymore).
xml_parse_result load_buffer_inplace_own(void* contents, size_t size, unsigned int options = parse_default, xml_encoding encoding = encoding_auto);
使用load_buffer即可解决问题:
pugi::xml_document pugidoc;
pugidoc.load_buffer((const void *)xml_in,xml_len,pugi::parse_full,pugi::encoding_utf16);