Some Common Character Conversions

Latest recommended article published 2022-05-15 16:34:03

C__Allen

Category: C++ Tags: string c basic windows dos path

Article link: https://blog.csdn.net/C__Allen/article/details/7566155


I have been working on FTP-related code lately and ran into all kinds of conversions:

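1. Converting between UTF-8 and Unicode:

"Unicode" here means the code point value stored in a fixed-width unit (for example a 16-bit wide character), while UTF-8 is a variable-length encoding of that value. The mapping: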

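U-00000000 - U-0000007F: 0xxxxxxx //a leading 0 marks an ASCII character (<= 0x7F)

U-00000080 - U-000007FF: 110xxxxx 10xxxxxx //two leading 1s mean the sequence has two bytes (16 bits)

U-00000800 - U-0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx //I think of the "10" that starts each later byte as a separator

U-00010000 - U-001FFFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx //the x's are the bits that actually carry the value

U-00200000 - U-03FFFFFF: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

U-04000000 - U-7FFFFFFF: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

The five- and six-byte forms come from the original UTF-8 definition. RFC 3629 restricts UTF-8 to U+10FFFF, so at most four bytes are used today.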

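An example: Unicode 100111101100000 (4F60) converts to UTF-8 11100100,10111101,10100000 (E4BDA0).

First, 4F60 falls in the U-00000800 - U-0000FFFF row of the table above, so the converted form is 1110xxxx 10xxxxxx 10xxxxxx.

The key is to determine the x's. 4F60 in binary is 0100 1111 0110 0000. Fill these 16 bits into the x's from left to right, splitting them as 0100, 111101, 100000, and regroup into bytes: 11100100 10111101 10100000, which is E4 BD A0.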
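The encoding direction described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the FTP project: the function name `encode_utf8` is hypothetical, and the sketch follows RFC 3629 by stopping at four bytes (U+10FFFF) rather than covering the six-byte rows of the table.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Returns an empty string for surrogates and values above U+10FFFF.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp <= 0x7F) {
        // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogates are not characters
        // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling `encode_utf8(0x4F60)` yields the three bytes E4 BD A0 from the worked example.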

I won't paste the code for that direction. Converting UTF-8 to Unicode is simply the reverse process; here is the code:

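	// Fragment: utf is the NUL-terminated UTF-8 input and out is a buffer of
	// len 16-bit units; index, out_index and size are assumed to start at 0.
	// Only 1- to 3-byte sequences (U+0000 - U+FFFF) are handled, and the
	// input is assumed to be valid UTF-8.

	// Pass 1: count the characters in utf.
	c = utf[index++];
	while(c)
	{
		if((c & 0x80) == 0)
		{
			// 0xxxxxxx: single byte, nothing more to skip
		}
		else if((c & 0xf0) == 0xe0)
		{
			index += 2;	// 1110xxxx: skip two continuation bytes
		}
		else
		{
			index += 1;	// 110xxxxx: skip one continuation byte
		}
		size += 1;
		c = utf[index++];
	}

	// Pass 2: decode into out, leaving room for the terminator.
	index = 0;
	c = utf[index++];
	while(('\0' != c) && (out_index + 1 < len))
	{
		if((c & 0x80) == 0)
		{
			//0xxxxxxx
			out[out_index++] = c;
		}
		else if((c & 0xf0) == 0xe0)
		{
			//1110xxxx 10xxxxxx 10xxxxxx: 4 + 6 + 6 payload bits
			out[out_index] = (c & 0x0F) << 12;
			c = utf[index++];
			out[out_index] |= (c & 0x3F) << 6;
			c = utf[index++];
			out[out_index++] |= (c & 0x3F);
		}
		else
		{
			//110xxxxx 10xxxxxx: 5 + 6 payload bits
			out[out_index] = (c & 0x1F) << 6;
			c = utf[index++];
			out[out_index++] |= (c & 0x3F);
		}
		c = utf[index++];
	}
	out[out_index] = 0;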
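For comparison, here is a self-contained sketch of the same decoding idea that also accepts four-byte sequences and rejects truncated or mis-structured input. The name `decode_utf8` and the use of `std::u32string` (C++11) are my own choices for illustration. It does not check for overlong encodings or surrogates, so treat it as a starting point rather than a full validator.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 into code points.
// Returns false on a stray continuation byte, an invalid lead byte,
// or a sequence that ends too early.
bool decode_utf8(const std::string& in, std::u32string& out) {
    out.clear();
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char c = static_cast<unsigned char>(in[i++]);
        char32_t cp;
        int extra;  // number of continuation bytes that must follow
        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else return false;
        for (; extra > 0; --extra) {
            if (i >= in.size()) return false;  // truncated sequence
            const unsigned char cc = static_cast<unsigned char>(in[i++]);
            if ((cc & 0xC0) != 0x80) return false;  // not 10xxxxxx
            cp = (cp << 6) | (cc & 0x3F);
        }
        out.push_back(cp);
    }
    return true;
}
```

Feeding it the bytes E4 BD A0 produces the single code point 0x4F60.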

2. Converting between CString and char * with ATL (UNICODE and ANSI strings):

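USES_CONVERSION;//required before using the conversion macros
char *Ansi = W2A(Unicode);//CString (wide, in a UNICODE build) to char*; the result lives on the stack of the current function
CString Unicode = A2W(Ansi);//char* to CString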
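Where ATL is not available, the standard library offers a comparable narrow/wide conversion through `std::mbstowcs` and `std::wcstombs`. The sketch below is only an assumption-laden illustration: the helper names `to_wide` and `to_narrow` are hypothetical, and the narrow encoding is whatever the current C locale uses, so non-ASCII text needs a suitable `setlocale` call first.

```cpp
#include <climits>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: narrow (current C locale) to wide.
std::wstring to_wide(const char* narrow) {
    // A wide string never has more characters than the narrow one has bytes.
    std::vector<wchar_t> buf(std::strlen(narrow) + 1);
    const std::size_t n = std::mbstowcs(buf.data(), narrow, buf.size());
    if (n == static_cast<std::size_t>(-1)) return std::wstring();  // invalid input
    return std::wstring(buf.data(), n);
}

// Hypothetical helper: wide to narrow (current C locale).
std::string to_narrow(const wchar_t* wide) {
    // MB_LEN_MAX bytes per wide character is always enough.
    std::vector<char> buf(std::wcslen(wide) * MB_LEN_MAX + 1);
    const std::size_t n = std::wcstombs(buf.data(), wide, buf.size());
    if (n == static_cast<std::size_t>(-1)) return std::string();  // unrepresentable
    return std::string(buf.data(), n);
}
```

Unlike W2A and A2W, these helpers return owning strings, so the result can safely outlive the calling function.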

3. The string class

cppreference defines string like this:

template<
  class CharT,
  class Traits = std::char_traits<CharT>,
  class Allocator = std::allocator<CharT>
> class basic_string;

Type	Definition

`string`	basic_string<char>

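basic_string is a class template, and string is a typedef for its char instantiation, basic_string<char>.

I won't go through its member functions. The constructors are convenient too; just look them up on MSDN. One thing worth mentioning is c_str(), which gives an easy string to const char * conversion.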
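A small sketch of how c_str() is typically used. The function names here are hypothetical. Note that c_str() returns a `const char *` that is only valid while the string is alive and unmodified; for an API that needs a writable `char *`, copying into a buffer is one option.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Pass a std::string to a C function that takes const char*.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}

// Hypothetical helper: copy into a writable, NUL-terminated buffer
// for C APIs that take a non-const char*.
std::vector<char> to_mutable_buffer(const std::string& s) {
    // size() + 1 includes the terminating '\0' that c_str() guarantees.
    return std::vector<char>(s.c_str(), s.c_str() + s.size() + 1);
}
```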

To get the parent of a path, I do this (for example, getting C:/ from C:/Python32/):

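	// Drop a trailing '/', if any, then cut just after the last remaining '/'.
	if(!PathName.empty() && PathName[PathName.size() - 1] == '/')
		PathName.erase(PathName.size() - 1);
	std::string::size_type locate = PathName.rfind('/');
	if(locate != std::string::npos)
		PathName = PathName.substr(0, locate + 1);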
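The same idea wrapped in a function, as a rough sketch. The name `parent_path` is hypothetical, and returning an empty string when there is no parent is a design choice of this example, not something the original code specifies.

```cpp
#include <string>

// Hypothetical helper: parent directory of a '/'-separated path,
// keeping the trailing '/'. Returns "" when there is no parent.
std::string parent_path(std::string path) {
    // Ignore one trailing separator so "C:/Python32/" and "C:/Python32" agree.
    if (!path.empty() && path.back() == '/') path.pop_back();
    const std::string::size_type pos = path.rfind('/');
    if (pos == std::string::npos) return std::string();
    return path.substr(0, pos + 1);
}
```

Comparing against `std::string::npos` instead of storing the result of rfind in an `int` avoids relying on the conversion of npos to -1.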

A side note: Windows path names accept both forms, "C:\" and "C:/", but DOS commands only accept the "C:\" form.
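When a path has to be handed to a DOS command, one simple approach is to swap the separators first. A minimal sketch, with the hypothetical helper name `to_backslashes`:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: replace every '/' with '\' so the path can be
// used on a DOS command line.
std::string to_backslashes(std::string path) {
    std::replace(path.begin(), path.end(), '/', '\\');
    return path;
}
```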