C++ jsoncpp: fixing garbled Chinese and \uXXXX text in strings generated by toStyledString

Latest recommended article published on 2023-05-12 22:36:25

wxplol

Article link: https://blog.csdn.net/wxplol/article/details/109505854

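1. Fixing garbled Chinese output

1.1 What the garbling looks like

When jsoncpp parses a string that contains Chinese and the value is then serialized with toStyledString(), each Chinese character in the generated string is replaced by \u followed by four hexadecimal digits, so the text looks garbled when it is displayed.

For example: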
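The escaping described above can be reproduced without jsoncpp. The following is a minimal sketch, not jsoncpp's own code: `decodeUtf8` and `escapeNonAscii` are hypothetical helper names, the input is assumed to be well-formed UTF-8, and lowercase hex digits are an arbitrary choice of this sketch.

```cpp
#include <cstdio>
#include <string>

// Decode one UTF-8 sequence starting at s[i] and advance i past it.
// Hypothetical helper for illustration; assumes well-formed UTF-8 input.
static unsigned int decodeUtf8(const std::string& s, std::size_t& i) {
  unsigned char b0 = static_cast<unsigned char>(s[i++]);
  if (b0 < 0x80)
    return b0; // plain ASCII
  int extra = (b0 >= 0xF0) ? 3 : (b0 >= 0xE0) ? 2 : 1; // continuation bytes
  unsigned int cp = b0 & (0x3F >> extra);              // payload bits of the lead byte
  while (extra-- > 0 && i < s.size())
    cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
  return cp;
}

// Replace every non-ASCII code point with \uXXXX
// (a surrogate pair is used for code points above U+FFFF).
std::string escapeNonAscii(const std::string& utf8) {
  std::string out;
  char buf[16];
  for (std::size_t i = 0; i < utf8.size();) {
    unsigned int cp = decodeUtf8(utf8, i);
    if (cp < 0x80) {
      out += static_cast<char>(cp);
    } else if (cp < 0x10000) {
      std::snprintf(buf, sizeof buf, "\\u%04x", cp);
      out += buf;
    } else {
      cp -= 0x10000;
      std::snprintf(buf, sizeof buf, "\\u%04x\\u%04x",
                    (cp >> 10) + 0xD800, (cp & 0x3FF) + 0xDC00);
      out += buf;
    }
  }
  return out;
}
```

Given the UTF-8 bytes of 中文 (`"\xE4\xB8\xAD\xE6\x96\x87"`), `escapeNonAscii` returns the text `\u4e2d\u6587`. The escaped form is valid JSON, but it is unreadable when printed directly.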

1.2 Cause and fix

The cause can be found in the jsoncpp source (official download: http://sourceforge.net/projects/jsoncpp/files/ ). In StyledWriter's writeValue function, strings are passed through valueToQuotedStringN, which escapes them:

static String valueToQuotedStringN(const char* value, unsigned length) {
  if (value == nullptr)
    return "";

  if (!isAnyCharRequiredQuoting(value, length))
    return String("\"") + value + "\"";
  // We have to walk value and escape any special characters.
  // Appending to String is not efficient, but this should be rare.
  // (Note: forward slashes are *not* rare, but I am not escaping them.)
  String::size_type maxsize = length * 2 + 3; // allescaped+quotes+NULL
  String result;
  result.reserve(maxsize); // to avoid lots of mallocs
  result += "\"";
  char const* end = value + length;
  for (const char* c = value; c != end; ++c) {
    switch (*c) {
    case '\"':
      result += "\\\"";
      break;
    case '\\':
      result += "\\\\";
      break;
    case '\b':
      result += "\\b";
      break;
    case '\f':
      result += "\\f";
      break;
    case '\n':
      result += "\\n";
      break;
    case '\r':
      result += "\\r";
      break;
    case '\t':
      result += "\\t";
      break;
    // case '/':
    // Even though \/ is considered a legal escape in JSON, a bare
    // slash is also legal, so I see no reason to escape it.
    // (I hope I am not misunderstanding something.)
    // blep notes: actually escaping \/ may be useful in javascript to avoid </
    // sequence.
    // Should add a flag to allow this compatibility mode and prevent this
    // sequence from occurring.
    default: {
      unsigned int cp = utf8ToCodepoint(c, end);
      // don't escape non-control characters
      // (short escape sequence are applied above)
      if (cp < 0x80 && cp >= 0x20)
        result += static_cast<char>(cp);
      else if (cp < 0x10000) { // codepoint is in Basic Multilingual Plane
        result += "\\u";
        result += toHex16Bit(cp);
      } else { // codepoint is not in Basic Multilingual Plane
               // convert to surrogate pair first
        cp -= 0x10000;
        result += "\\u";
        result += toHex16Bit((cp >> 10) + 0xD800);
        result += "\\u";
        result += toHex16Bit((cp & 0x3FF) + 0xDC00);
      }
    } break;
    }
  }
  result += "\"";
  return result;
}

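The code shows clearly that the default: branch is where all remaining characters, including Chinese, are handled: any code point outside printable ASCII is written as a \uXXXX escape. One fix is therefore to modify the source and rebuild the library. Change: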

    default: {
      unsigned int cp = utf8ToCodepoint(c, end);
      // don't escape non-control characters
      // (short escape sequence are applied above)
      if (cp < 0x80 && cp >= 0x20)
        result += static_cast<char>(cp);
      else if (cp < 0x10000) { // codepoint is in Basic Multilingual Plane
        result += "\\u";
        result += toHex16Bit(cp);
      } else { // codepoint is not in Basic Multilingual Plane
               // convert to surrogate pair first
        cp -= 0x10000;
        result += "\\u";
        result += toHex16Bit((cp >> 10) + 0xD800);
        result += "\\u";
        result += toHex16Bit((cp & 0x3FF) + 0xDC00);
      }
    } break;

to:

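    default: {
      // Copy the byte unchanged so UTF-8 text (including Chinese) is not escaped.
      // Caveat: control characters below 0x20 are no longer escaped either.
      result += *c;
    } break;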
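Copying every byte through also stops escaping control characters below 0x20, which JSON requires to be escaped. The following is a minimal stand-alone sketch of a stricter variant. It is an illustration, not jsoncpp code: `quoteKeepingUtf8` is a hypothetical name, and the input is assumed to be valid UTF-8.

```cpp
#include <cstdio>
#include <string>

// Quote a string for JSON while leaving bytes >= 0x80 (UTF-8 sequences) untouched.
// Hypothetical stand-alone function for illustration, not part of jsoncpp.
std::string quoteKeepingUtf8(const std::string& value) {
  std::string result = "\"";
  for (unsigned char uc : value) {
    switch (uc) {
    case '"':  result += "\\\""; break;
    case '\\': result += "\\\\"; break;
    case '\b': result += "\\b"; break;
    case '\f': result += "\\f"; break;
    case '\n': result += "\\n"; break;
    case '\r': result += "\\r"; break;
    case '\t': result += "\\t"; break;
    default:
      if (uc < 0x20) {
        // Remaining control characters still need a \u00XX escape.
        char buf[8];
        std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(uc));
        result += buf;
      } else {
        // Printable ASCII and UTF-8 bytes (e.g. Chinese) pass through as is.
        result += static_cast<char>(uc);
      }
    }
  }
  result += '"';
  return result;
}
```

The same `uc < 0x20` check could be placed in the patched default: branch. If the installed jsoncpp version offers the `emitUTF8` setting of `Json::StreamWriterBuilder` (an assumption to verify against your headers), using it may avoid patching the library at all.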

The final result:

Reference:

C++ jsoncpp: fix for garbled Chinese in strings generated by toStyledString

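2. Fixing garbled output when parsing \uXXXX

2.1 What the garbling looks like

The JSON file is as follows:

Parse result:

2.2 Cause

Section 1 modified valueToQuotedStringN, the function that used to convert non-ASCII characters into \uXXXX Unicode escapes on output. When jsoncpp reads a string written in \uXXXX form, the value it stores is actually a UTF-8 string, and the patched writer now emits those UTF-8 bytes unchanged. (Chinese text written literally in the file is not converted by jsoncpp and stays in whatever encoding the file uses.) Printing UTF-8 bytes in an environment that expects a different encoding, such as a Simplified Chinese Windows console, produces garbage, so the string needs an extra conversion: first from UTF-8 to a Unicode wide string, then to the local multi-byte encoding.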
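To show why a \uXXXX escape ends up as UTF-8 bytes after parsing, here is a minimal sketch of the decoding step. It only mimics what a JSON reader does and is not jsoncpp's implementation: `codePointToUtf8` and `unescapeToUtf8` are hypothetical names, only code points up to U+FFFF are handled (no surrogate pairs), and every `\u` is assumed to be followed by four valid hex digits.

```cpp
#include <string>

// Encode one code point (up to U+FFFF in this sketch) as UTF-8.
std::string codePointToUtf8(unsigned int cp) {
  std::string out;
  if (cp <= 0x7F) {
    out += static_cast<char>(cp);
  } else if (cp <= 0x7FF) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}

// Replace each \uXXXX escape in s with the UTF-8 bytes of that code point.
// Assumes well-formed input; std::stoul throws on non-hex digits.
std::string unescapeToUtf8(const std::string& s) {
  std::string out;
  for (std::size_t i = 0; i < s.size();) {
    if (s.compare(i, 2, "\\u") == 0 && i + 6 <= s.size()) {
      unsigned int cp =
          static_cast<unsigned int>(std::stoul(s.substr(i + 2, 4), nullptr, 16));
      out += codePointToUtf8(cp);
      i += 6; // skip the backslash, 'u' and four hex digits
    } else {
      out += s[i++];
    }
  }
  return out;
}
```

For the escaped text `\u4e2d\u6587`, `unescapeToUtf8` returns the six bytes E4 B8 AD E6 96 87, the UTF-8 encoding of 中文. A console that interprets those bytes in another code page displays them as garbage.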

2.3 Solution

UTF-8 to Unicode (wide string), using the Windows API:

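#include <windows.h>
#include <string>

// Convert a UTF-8 encoded std::string to a wide (UTF-16) string. Windows only.
std::wstring UTF8ToUnicode(const std::string& str)
{
	// First call: ask how many wchar_t are needed, including the terminating null.
	int unicodeLen = ::MultiByteToWideChar(CP_UTF8, 0, str.c_str(), -1, NULL, 0);
	if (unicodeLen <= 0)
		return std::wstring(); // conversion failed
	// Let std::wstring own the buffer, so no manual new/delete[] is needed
	// (the original freed an array with delete instead of delete[]).
	std::wstring rt(unicodeLen, L'\0');
	// Second call: perform the conversion into the buffer.
	::MultiByteToWideChar(CP_UTF8, 0, str.c_str(), -1, &rt[0], unicodeLen);
	rt.resize(unicodeLen - 1); // drop the terminating null
	return rt;
}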

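Add this function to the program, together with a helper, ws2s, that converts the wide string to the local multi-byte encoding, and call them: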

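#include <clocale>
#include <cstdlib>
#include <string>

// Convert a wide string to a multi-byte string using the "chs" locale.
std::string ws2s(const std::wstring& ws)
{
	std::string curLocale = setlocale(LC_ALL, NULL); // remember the current locale
	setlocale(LC_ALL, "chs");                        // Windows name for Simplified Chinese
	size_t destSize = 2 * ws.size() + 1;             // assumes at most 2 bytes per character
	std::string result(destSize, '\0');
	size_t written = wcstombs(&result[0], ws.c_str(), destSize);
	setlocale(LC_ALL, curLocale.c_str());            // restore the previous locale
	if (written == static_cast<size_t>(-1))
		return std::string();                        // a character could not be converted
	result.resize(written);
	return result;
}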


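// Usage
// toStyledString() returns the JSON text of the value (a string keeps its quotes);
// asString() would return the raw string value instead.
std::string content = root["Cnki"][i]["content"].toStyledString();
std::wstring wstr = UTF8ToUnicode(content); // convert UTF-8 to a Unicode wide string
std::cout << ws2s(wstr) << std::endl;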

Result:

Reference:

Conversion functions between C++ string and wstring
