A Generic Base Codec in C++ (Hex (Base16) / Base32 / Base64)

元子丰

Article link: https://blog.csdn.net/yaoyuanyylyy/article/details/89221792

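Hex (hexadecimal), Base32, and Base64 encoding turn raw data into printable strings. All three work on the same principle: each fixed-size group of bits in the raw data is mapped to one character from a specific alphabet.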

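Hex: also known as Base16. Every 4 bits are encoded as one character from the alphabet "0123456789abcdef" or "0123456789ABCDEF". Hex is case-insensitive: an encoder may emit either lowercase or uppercase letters, and a decoder should accept both.
Base32: every 5 bits are encoded as one character from the alphabet "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"; treated as case-sensitive here.
Base64: every 6 bits are encoded as one character from the alphabet "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; case-sensitive.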

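Padding: for Base32 and Base64, when the number of bits to encode is not a multiple of 5 or 6 respectively, the last character is filled out with zero bits and the output is padded to a whole block (8 characters for Base32, 4 for Base64). The standard padding character is '='.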
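
The character and padding counts described above can be worked out with a few lines of arithmetic. The following is a minimal sketch, not part of the article's BaseCodec class; the names `EncodedSize` and `encodedSize` are made up for illustration.

```cpp
#include <cassert>

// Hypothetical helper (not from the article): output sizes for `byteCount`
// input bytes when every `bitsPerChar` bits become one character.
struct EncodedSize {
    int dataChars;  // characters that carry data bits
    int padChars;   // padding characters needed to complete the last block
};

EncodedSize encodedSize(int byteCount, int bitsPerChar) {
    assert(byteCount >= 0 && bitsPerChar >= 1 && bitsPerChar <= 8);

    // A block is the smallest run of characters that covers whole bytes:
    // 4 characters for Base64 (24 bits), 8 for Base32 (40 bits), 2 for Hex.
    int blockChars = 1;
    while ((blockChars * bitsPerChar) % 8 != 0) ++blockChars;

    const int totalBits = byteCount * 8;
    const int dataChars = (totalBits + bitsPerChar - 1) / bitsPerChar;  // round up
    const int padChars = (blockChars - dataChars % blockChars) % blockChars;
    return {dataChars, padChars};
}
```

For instance, one input byte in Base64 gives 2 data characters plus 2 padding characters, and one input byte in Base32 gives 2 data characters plus 6 padding characters.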

This article implements a generic Base codec in C++, in a class named BaseCodec. The complete code is attached at the end of the article.

First, identify and define the parameters that determine a Base encoding:

		char mPaddingChar;
		bool mIgnoreCase;
		int mPartialBits;
		std::string mBase;

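They mean:

mPaddingChar: the padding character
mIgnoreCase: whether case is ignored when decoding
mPartialBits: the group size, i.e. how many bits are encoded as one character
mBase: the alphabet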

These four parameters are passed to the BaseCodec constructor, which stores them:

	BaseCodec::BaseCodec(const std::string& pBase, char paddingChar, int partialBits, bool ignoreCase)
	{
		mBase = pBase;
		mPaddingChar = paddingChar;
		mPartialBits = partialBits;
		mIgnoreCase = ignoreCase;
	}

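With these four parameters in place, encoding and decoding can begin.
Next, define the three public interfaces of the class: encoding, decoding, and character format checking.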

	public:
		/**
		 * Encoder
		 */
		std::string Encode(const std::string& input);

		/**
		 * Decoder.
		 *
		 * Throws std::exception if the input length is wrong or the input contains an invalid character.
		 */
		std::string Decode(const std::string& input);

		/**
		* Format check.
		*
		* A character is valid if it is in the pBase passed to the constructor (case-insensitive).
		*/
		bool Check(const std::string& data);

With the constructor and the three public interfaces above, the codecs we need can now be defined. The following are provided:

	public:
		/** Binary Codec*/
		static BaseCodec Binary;
		
		/** Hex Codec(lowercase)*/
		static BaseCodec Hex;
		
		/** Hex Codec(uppercase)*/
		static BaseCodec Hexu;
		
		/** Base32 Codec*/
		static BaseCodec Base32;
		
		/** Base64 Codec*/
		static BaseCodec Base64;

Each codec is declared as a public static member of BaseCodec so that it is easy to reference. Their definitions are:

	static const char CHAR_BASE_2[] = "01";
	static const char CHAR_BASE_16[] = "0123456789abcdef";
	static const char CHAR_BASE_16u[] = "0123456789ABCDEF";
	static const char CHAR_BASE_32[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
	static const char CHAR_BASE_64[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
	static const char CHAR_PADDING = '=';
	
	BaseCodec BaseCodec::Binary(CHAR_BASE_2, CHAR_PADDING, 1, false);
	BaseCodec BaseCodec::Hex(CHAR_BASE_16, CHAR_PADDING, 4, true);
	BaseCodec BaseCodec::Hexu(CHAR_BASE_16u, CHAR_PADDING, 4, true);
	BaseCodec BaseCodec::Base32(CHAR_BASE_32, CHAR_PADDING, 5, false);
	BaseCodec BaseCodec::Base64(CHAR_BASE_64, CHAR_PADDING, 6, false);

The codecs can then be used in the following form:

BaseCodec::Base64.Encode
BaseCodec::Base64.Decode
BaseCodec::Base64.Check

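Of the codecs defined above, BaseCodec::Hex and BaseCodec::Hexu are the hexadecimal codecs: the former encodes to lowercase letters and the latter to uppercase. Both should decode lowercase and uppercase input alike, because they are declared with ignoreCase set to true.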

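BaseCodec::Binary is a binary codec: every bit becomes one '0' or '1' character. For example, binary-encoding the string "1234" (bytes 0x31 0x32 0x33 0x34) gives 00110001001100100011001100110100.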
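
The bit string for "1234" can be reproduced without BaseCodec. The sketch below uses std::bitset from the standard library; the function name `toBinaryString` is an assumption made for this illustration and does not appear in the article's code.

```cpp
#include <bitset>
#include <cassert>
#include <string>

// Hypothetical helper (not from the article): writes every input byte as
// eight '0'/'1' characters, most significant bit first.
std::string toBinaryString(const std::string& input) {
    std::string out;
    out.reserve(input.size() * 8);
    for (unsigned char byte : input) {
        out += std::bitset<8>(byte).to_string();
    }
    assert(out.size() == input.size() * 8);  // one character per input bit
    return out;
}
```

Because 1 divides 8 evenly, binary encoding never needs padding: every byte always produces exactly eight characters.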

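With the codecs above as examples, it is easy to define other similar codecs, or to define a private codec by changing the alphabet.
For example, we can define a Base8 codec, in which every 3 bits are encoded as one character from the alphabet "01234567".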

	static const char CHAR_BASE_8[] = "01234567";	
	BaseCodec Octal(CHAR_BASE_8, CHAR_PADDING, 3, false);
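
To try out an alphabet such as Base8 without the full class, a compact stand-alone encoder is enough. The sketch below is an illustration only: it uses a single integer as a bit accumulator rather than the mask-based approach of BaseCodec, and the name `encodeWithAlphabet` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-alone encoder (not the article's implementation).
// `alphabet` must contain exactly 2^bitsPerChar characters.
std::string encodeWithAlphabet(const std::string& input, const std::string& alphabet,
                               int bitsPerChar, char padChar = '=') {
    assert(bitsPerChar >= 1 && bitsPerChar <= 8);
    assert(alphabet.size() == (1u << bitsPerChar));

    const unsigned mask = (1u << bitsPerChar) - 1;
    std::string out;
    unsigned acc = 0;  // bit accumulator
    int accBits = 0;   // number of not-yet-encoded bits held in acc

    for (unsigned char byte : input) {
        acc = (acc << 8) | byte;
        accBits += 8;
        while (accBits >= bitsPerChar) {  // emit every complete group
            accBits -= bitsPerChar;
            out += alphabet[(acc >> accBits) & mask];
        }
        acc &= (1u << accBits) - 1;  // keep only the leftover bits
    }
    if (accBits > 0) {  // last group: fill the missing low bits with zeros
        out += alphabet[(acc << (bitsPerChar - accBits)) & mask];
    }

    // Pad to a whole block, i.e. the smallest character count covering whole bytes.
    std::size_t blockChars = 1;
    while ((blockChars * bitsPerChar) % 8 != 0) ++blockChars;
    while (out.size() % blockChars != 0) out += padChar;
    return out;
}
```

With the alphabet "01234567" and 3 bits per character, the single byte "A" (01000001) becomes "202" followed by five '=' characters, since a Base8 block is 8 characters (24 bits).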

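Encoding and decoding: the most important part is of course the implementation of the encode and decode functions. Their parameters are C-style, which makes the code easy to port to C when needed; the public interfaces shown earlier wrap them with C++ and std::string. The functions are as follows: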
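
The std::string wrappers themselves are not listed in this part of the article. One plausible shape for them is the "query the size, allocate, then fill" pattern, since the C-style functions return the required length when output is null. The sketch below shows that pattern with a stand-in hex encoder; `HexEncodeC` and `HexEncode` are hypothetical names, not the article's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a C-style encoder (hypothetical; lowercase hex only).
// Returns the encoded length; writes to `output` only when it is non-null.
int HexEncodeC(const unsigned char* input, char* output, int length) {
    static const char kDigits[] = "0123456789abcdef";
    if (input == nullptr || length <= 0) return 0;
    if (output == nullptr) return length * 2;  // size query only
    for (int i = 0; i < length; ++i) {
        output[2 * i] = kDigits[input[i] >> 4];
        output[2 * i + 1] = kDigits[input[i] & 0x0F];
    }
    output[length * 2] = '\0';
    return length * 2;
}

// std::string wrapper: ask for the size, allocate, then encode.
std::string HexEncode(const std::string& input) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(input.data());
    const int inputLength = static_cast<int>(input.size());
    const int encodedLength = HexEncodeC(bytes, nullptr, inputLength);
    if (encodedLength == 0) return std::string();

    std::vector<char> buffer(encodedLength + 1);  // +1 for the trailing '\0'
    const int written = HexEncodeC(bytes, buffer.data(), inputLength);
    assert(written == encodedLength);
    return std::string(buffer.data(), written);
}
```

The extra byte in the buffer matters: the C-style function writes a terminating '\0' after the encoded characters.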

	int BaseCodec::Encode(const unsigned char *input, char *output, int length)
	{
		if (length <= 0 || input == nullptr)
			return 0;

		int partialBits = mPartialBits; // number of bits encoded as one character
		const char* pBase = mBase.c_str(); // alphabet

		int nlcm = lcm(8, partialBits); // block size in bits: least common multiple of 8 and partialBits
		int nn = length * 8 / nlcm; // number of complete blocks
		int retLength = nn * (nlcm / partialBits); // encoded length: complete blocks first
		int padCount = 0; // number of padding characters

		// leftover bits mean the last block has to be padded
		nn = length * 8 % nlcm; // number of leftover bits
		if (nn != 0) {
			padCount = (nlcm - nn) / partialBits; // number of padding characters
			retLength += nlcm / partialBits; // encoded length: then the leftover-plus-padding block
		}

		if (output == nullptr) return retLength; // size query only; output must hold retLength + 1 chars

		unsigned char current = 0; // scratch value for the group being encoded
		unsigned char mask = (1 << partialBits) - 1; // mask for one group

		int codedOffset = 0; // number of characters written so far
		int leftBits = 0; // number of bits carried over from the previous byte
		unsigned char left_temp = 0; // the carried-over bits
		for (int i = 0; i < length; i++)
		{
			int currentBits; // bits of the current byte not yet processed

			if (leftBits == 0) {
				currentBits = 8;
			} else {
				currentBits = 8 + leftBits - partialBits; // bits left after completing the carried-over group
				unsigned char mask_need = (1 << (partialBits - leftBits)) - 1; // mask for the bits the carried-over group still needs
				mask_need <<= currentBits; // shift left to select the high bits

				// finish the group carried over from the previous byte
				current = (left_temp << (partialBits - leftBits)) | ((input[i] & mask_need) >> currentBits);
				current &= mask;
				output[codedOffset++] = pBase[(int)current];
			}

			int times = currentBits / partialBits; // complete groups available in the remaining bits

			unsigned char mask_can_be_coded = (1 << (partialBits*times)) - 1; // mask for the bits that can be encoded now
			mask_can_be_coded <<= (currentBits - partialBits * times);
			unsigned char temp_to_coded = (input[i] & mask_can_be_coded); // bits that can be encoded now

			leftBits = currentBits - partialBits * times;
			unsigned char mask_temp_left = (1 << (currentBits - partialBits * times)) - 1; // mask for the bits carried to the next byte
			left_temp = (input[i] & mask_temp_left);

			// encode the complete groups of this byte
			for (int m = 0; m < times; m++) {
				current = temp_to_coded >> (currentBits - (m + 1) * partialBits);
				current &= mask;
				output[codedOffset++] = pBase[(int)current];
			}
		}

		// leftover bits: fill the missing low bits with zeros
		if (leftBits != 0) {
			current = (left_temp << (partialBits - leftBits));
			current &= mask;
			output[codedOffset++] = pBase[(int)current];
		}

		// padding
		for (int i = 0; i < padCount; i++)
			output[codedOffset++] = mPaddingChar;

		output[codedOffset] = '\0';

		return retLength;
	}


	int BaseCodec::Decode(const char *input, unsigned char *output)
	{
		if (input == nullptr) // check before strlen: strlen(nullptr) is undefined behavior
			return 0;
		int length = static_cast<int>(strlen(input)); // needs <cstring>
		if (length == 0)
			return 0;

		int partialBits = mPartialBits; // number of bits encoded as one character
		const char* pBase = mBase.c_str(); // alphabet

		int nlcm = lcm(8, partialBits); // block size in bits: least common multiple of 8 and partialBits
		if ((length % (nlcm / partialBits)) != 0)
			throw std::exception("input length error"); // message constructor is an MSVC extension

		int padCount = 0; // number of padding characters
		for (int i = length - 1; i > length - nlcm / partialBits; i--) {
			if (input[i] == mPaddingChar)
				padCount++;
			else
				break;
		}

		int retLength; // decoded length
		int leftPadChar = 0; // number of non-padding characters in the last block
		if (padCount != 0) {
			leftPadChar = nlcm / partialBits - padCount;
			retLength = (length - nlcm / partialBits)*partialBits/8; // complete blocks first
			// whole bytes in the last block; the original "leftPadChar - 1" is only right for Base64
			retLength += leftPadChar * partialBits / 8;
		} else {
			retLength = length * partialBits / 8;
		}

		if (output == nullptr) return retLength; // size query only; output must hold retLength + 1 bytes

		unsigned char mask = (1 << partialBits) - 1; // mask for one group
		int codedOffset = 0; // number of bytes written so far
		int leftBits = 0; // number of bits not yet written out
		unsigned char left_temp = 0; // the bits not yet written out
		int index = 0;
		for (int i = 0; i < (length - padCount); i++)
		{
			index = GetCharPos(pBase, input[i], mIgnoreCase);
			if (index == -1)
				throw std::exception("has invalid char");

			unsigned char temp = index & mask;

			if (leftBits != 0)
			{
				leftBits += partialBits;
				if (leftBits >= 8)
				{
					// a full byte is available: carried-over bits plus the high bits of this group
					leftBits -= 8;
					output[codedOffset++] = (left_temp << (partialBits - leftBits)) | (temp >> leftBits);

					unsigned char mask_left = (1 << leftBits) - 1;
					left_temp = temp & mask_left;
				} else {
					left_temp = (left_temp << partialBits) | temp;
				}

			} else {
				leftBits = partialBits;
				left_temp = temp;
			}
		}

		output[codedOffset] = '\0';

		return retLength;
	}

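Two things matter most in encoding and decoding: first, computing the output length and the number of padding characters; second, when processing the data byte by byte, handling the bits left over from the previous step together with the current byte.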
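
The second point, carrying leftover bits from one character to the next, can also be expressed with a single integer accumulator. The following is a simplified sketch under stated assumptions, not the article's decoder: it is case-sensitive, does not validate the input length, strips every trailing padding character, and throws std::invalid_argument; the name `decodeWithAlphabet` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-alone decoder (simplified; not the article's implementation).
std::string decodeWithAlphabet(const std::string& input, const std::string& alphabet,
                               int bitsPerChar, char padChar = '=') {
    assert(bitsPerChar >= 1 && bitsPerChar <= 8);

    std::size_t end = input.size();
    while (end > 0 && input[end - 1] == padChar) --end;  // ignore trailing padding

    std::string out;
    unsigned acc = 0;  // bit accumulator
    int accBits = 0;   // number of decoded bits not yet written out (always < 8)

    for (std::size_t i = 0; i < end; ++i) {
        const std::size_t pos = alphabet.find(input[i]);
        if (pos == std::string::npos) {
            throw std::invalid_argument("invalid character");
        }
        acc = (acc << bitsPerChar) | static_cast<unsigned>(pos);
        accBits += bitsPerChar;
        if (accBits >= 8) {  // a full byte is available
            accBits -= 8;
            out += static_cast<char>((acc >> accBits) & 0xFF);
            acc &= (1u << accBits) - 1;  // keep only the leftover bits
        }
    }
    return out;  // any leftover bits are encoder filler and are dropped
}
```

The decoded length falls out of the loop here: each non-padding character contributes bitsPerChar bits, and only whole bytes are written.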
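The encode and decode functions also rely on two helper functions: one computes the least common multiple, and the other finds the position of a character in a string.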
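
The two helpers are not listed in this part of the article. A minimal sketch of how they might look is given below; the signatures are inferred from the calls `lcm(8, partialBits)` and `GetCharPos(pBase, input[i], mIgnoreCase)`, and the bodies are assumptions, not the author's code.

```cpp
#include <cassert>
#include <cctype>

// Greatest common divisor (Euclid's algorithm); used only by lcm below.
int gcd(int a, int b) {
    while (b != 0) {
        const int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Least common multiple of two positive integers.
int lcm(int a, int b) {
    assert(a > 0 && b > 0);
    return a / gcd(a, b) * b;  // divide first to keep the intermediate value small
}

// Returns the index of `ch` in the null-terminated string `base`, or -1 if it
// is not found. With ignoreCase set, letters match regardless of case.
int GetCharPos(const char* base, char ch, bool ignoreCase) {
    for (int i = 0; base[i] != '\0'; ++i) {
        if (base[i] == ch) return i;
        if (ignoreCase &&
            std::tolower(static_cast<unsigned char>(base[i])) ==
                std::tolower(static_cast<unsigned char>(ch))) {
            return i;
        }
    }
    return -1;
}
```

The casts to unsigned char before std::tolower avoid undefined behavior for characters with negative values on platforms where char is signed.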
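Two issues still need attention: exceptions and endianness. This example was written in Visual Studio on Windows, where std::exception has a constructor that accepts a message. That constructor is a Microsoft extension, so on other platforms a standard exception type such as std::runtime_error may be needed instead. Endianness was not considered either. Both issues may need to be addressed on other platforms.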
