URL Encoding and Decoding

Latest recommended article published 2022-03-05 19:17:07

hbu_dcf

Category: Web Development. Tags: string, null, buffer, encoding, delete, parameters

Article link: https://blog.csdn.net/hbu_dcf/article/details/3703372

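C++ has no ready-made urlencode and urldecode functions, so I wrote two of my own.

Much of the code found online only handles Chinese text in GB2312. Mine comes in two variants: one for Chinese text in UTF-8 and one for GB.

First, the difference between the two:

Take the word 中国 as an example. Encoded from UTF-8 it becomes "%e4%b8%ad%e5%9b%bd",

while encoded from GB2312 it becomes "%d6%d0%b9%fa".

Chinese text is in GB by default, so to encode it as UTF-8 the string must first be converted: MultiByteToWideChar turns it into wide characters, and WideCharToMultiByte with CP_UTF8 turns those into UTF-8 bytes.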
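
To make the difference concrete, the sketch below percent-encodes the raw bytes of 中国 in each encoding. The helper name percentEncodeAllBytes and the two constants are assumptions made for illustration and are not part of the original code; the byte values are taken from the two encoded strings above.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: writes every byte of `bytes` as %xx (lowercase hex).
std::string percentEncodeAllBytes(const std::string &bytes)
{
    std::string out;
    char buf[4]; // '%' + two hex digits + terminating null
    for (std::string::size_type i = 0; i < bytes.size(); ++i)
    {
        std::snprintf(buf, sizeof(buf), "%%%02x",
                      static_cast<unsigned>(static_cast<unsigned char>(bytes[i])));
        out += buf;
    }
    return out;
}

// The same two characters as raw bytes in each encoding (illustrative constants).
const std::string kZhongguoUtf8 = "\xe4\xb8\xad\xe5\x9b\xbd"; // 3 bytes per character
const std::string kZhongguoGb2312 = "\xd6\xd0\xb9\xfa";       // 2 bytes per character
```

Calling percentEncodeAllBytes on kZhongguoUtf8 gives the six-escape UTF-8 form, and on kZhongguoGb2312 the four-escape GB2312 form, which is why a server must know which encoding the client used before it decodes.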

My build environment:

C++ under VS.NET 2005.

Reference:

http://www.codeguru.com/Cpp/Cpp/cpp_mfc/article.php/c4029

URL Encoding

Chandrasekhar Vuppalapati (view profile)
May 22, 2003

Environment: VC++, MFC

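Introduction

The purpose of this article is to design a C++ class that implements URL encoding. In a previous project, a VC++ 6.0 program I wrote had to submit data, and the data needed to be encoded before submission. I searched MSDN for a class or API function that does this and found none, so I decided to write my own.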

The URLEncoder.exe is a MFC dialog-based application that uses the URLEncode class.

Process

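URL encoding ensures that the characters you pass to a web page are safe. Some characters have a special meaning while data travels over the network. For example, a program sending data treats ASCII character 13 (carriage return) as the end of a line.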

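Web applications generally transfer data between client and server over HTTP or HTTPS. The server can receive data from the client in two ways:

Through the HTTP headers (via cookies or a submitted form), or
As part of the request URL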

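When data is part of the URL, it must be encoded so that it conforms to URL syntax. On the server side the data is decoded automatically. Consider the following URL, in which the data is passed as a request parameter string.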

Example: http://WebSite/ResourceName?Data=Data

WebSite: the domain name.
ResourceName: the name of the ASP page or servlet.
Data: the data sent to the server. If the MIME type is Content-Type: application/x-www-form-urlencoded, you need to encode the data.
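
As a minimal sketch of what encoding the Data part involves, the code below builds a request URL from a parameter name and value. The function names formEncode and buildRequestUrl are hypothetical, and the rule used (letters, digits, and - _ . stay literal, a space becomes +, every other byte becomes %XX) is one common reading of application/x-www-form-urlencoded, not something the referenced article spells out.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper: encodes one name or value in the form-urlencoded style.
std::string formEncode(const std::string &text)
{
    std::string out;
    char buf[4];
    for (std::string::size_type i = 0; i < text.size(); ++i)
    {
        const unsigned char ch = static_cast<unsigned char>(text[i]);
        if ((ch < 0x80 && std::isalnum(ch)) || ch == '-' || ch == '_' || ch == '.')
        {
            out += static_cast<char>(ch); // safe: keep as is
        }
        else if (ch == ' ')
        {
            out += '+';                   // form encoding writes a space as '+'
        }
        else
        {
            std::snprintf(buf, sizeof(buf), "%%%02X", static_cast<unsigned>(ch));
            out += buf;
        }
    }
    return out;
}

// Hypothetical helper: appends a single name=value pair to a base URL.
std::string buildRequestUrl(const std::string &base,
                            const std::string &name,
                            const std::string &value)
{
    return base + "?" + formEncode(name) + "=" + formEncode(value);
}
```

Without the encoding step, a value containing & or = would be read by the server as an extra parameter or a second separator.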

RFC 1738

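The RFC 1738 specification restricts the characters allowed in Uniform Resource Locators (URLs) to a subset of US-ASCII. HTML, on the other hand, allows the whole ISO-8859-1 (ISO-Latin) character set in documents. As a result, a string taken from an HTML page must be encoded before it is submitted in a URL.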

ISO-8859-1 (ISO-Latin) Character Set

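The table below covers the full ISO-8859-1 character set, 256 characters in all. For each range it gives the decimal values, a description, the characters themselves, and whether they are safe. The characters fall broadly into two classes: safe and unsafe.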

Character range (decimal)	Type	Values	Safe/Unsafe
0-31	ASCII Control Characters	These characters are not printable	Unsafe
32-47	Reserved Characters	' '!"#$%&'()*+,-./	Unsafe
48-57	ASCII Digits	0-9	Safe
58-64	Reserved Characters	:;<=>?@	Unsafe
65-90	ASCII Characters	A-Z	Safe
91-96	Reserved Characters	[\]^_`	Unsafe
97-122	ASCII Characters	a-z	Safe
123-126	Reserved Characters	{|}~	Unsafe
127	Control Characters	DEL (not printable)	Unsafe
128-255	Non-ASCII Characters	Characters outside US-ASCII	Unsafe

Every unsafe ASCII character must be encoded, for example those whose values fall in the ranges 32-47, 58-64, 91-96, and 123-126.
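
The classification in the table can be sketched as a single range check. The function name isUnsafeByte is hypothetical; it simply treats the three ranges the table marks Safe (48-57, 65-90, 97-122) as safe and everything else as unsafe.

```cpp
// Hypothetical helper: returns true when the byte value falls in a range
// that the table above marks Unsafe.
bool isUnsafeByte(unsigned char ch)
{
    const bool isDigit = ch >= 48 && ch <= 57;  // 0-9
    const bool isUpper = ch >= 65 && ch <= 90;  // A-Z
    const bool isLower = ch >= 97 && ch <= 122; // a-z
    return !(isDigit || isUpper || isLower);
}
```

Taking unsigned char as the parameter type matters here: with a plain signed char, bytes in the 128-255 range would arrive as negative numbers and could slip past naive comparisons.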

The following table explains why these characters are unsafe.

Character	Unsafe Reason	Encoded Form
<	Delimiters around URLs in free text	%3C
>	Delimiters around URLs in free text	%3E
"	Delimits URLs in some systems	%22
#	It is used in the World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it.	%23
{	Gateways and other transport agents are known to sometimes modify such characters	%7B
}	Gateways and other transport agents are known to sometimes modify such characters	%7D
|	Gateways and other transport agents are known to sometimes modify such characters	%7C
\	Gateways and other transport agents are known to sometimes modify such characters	%5C
^	Gateways and other transport agents are known to sometimes modify such characters	%5E
~	Gateways and other transport agents are known to sometimes modify such characters	%7E
[	Gateways and other transport agents are known to sometimes modify such characters	%5B
]	Gateways and other transport agents are known to sometimes modify such characters	%5D
`	Gateways and other transport agents are known to sometimes modify such characters	%60
+	Indicates a space (spaces cannot be used in a URL)	%2B
/	Separates directories and subdirectories	%2F
?	Separates the actual URL and the parameters	%3F
&	Separator between parameters specified in the URL	%26

How It Is Done

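URL-encoding a character means replacing it with % followed by its ASCII code written as two hexadecimal digits. For example, the space character in US-ASCII is hexadecimal 20, so it encodes to %20.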
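
The conversion of a single character can be sketched as follows. The function name percentEncodeChar is hypothetical; it splits the byte into its high and low four bits and looks each one up in a table of hexadecimal digits.

```cpp
#include <string>

// Hypothetical helper: returns '%' followed by the two-digit uppercase
// hexadecimal code of `ch`.
std::string percentEncodeChar(unsigned char ch)
{
    static const char kHexDigits[] = "0123456789ABCDEF";
    std::string out = "%";
    out += kHexDigits[ch >> 4];   // high four bits
    out += kHexDigits[ch & 0x0F]; // low four bits
    return out;
}
```

Writing both digits even when the high one is zero keeps every escape exactly three characters long, so a carriage return (13) becomes %0D rather than %D.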

URLEncode: URLEncode is a C++ class used to encode strings. The CURLEncode class has the following members.

isUnsafeString
decToHex
convert
URLEncode

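The URLEncode() method carries out the encoding. It checks each character, and if the character is unsafe, converts it to % followed by its ASCII code in hexadecimal.

End of the reference article.

Below are the two functions I wrote for Chinese text in UTF-8: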
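
The loop described above might look roughly like the sketch below. It is not the CURLEncode implementation, whose source is not reproduced here; the function name urlEncodeSketch is hypothetical, and treating only ASCII letters and digits as safe is an assumption that follows the Safe rows of the character table.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical sketch: copies safe characters through and replaces every
// other byte with %XX.
std::string urlEncodeSketch(const std::string &input)
{
    std::string out;
    char buf[4]; // '%' + two hex digits + terminating null
    for (std::string::size_type i = 0; i < input.size(); ++i)
    {
        const unsigned char ch = static_cast<unsigned char>(input[i]);
        if (ch < 0x80 && std::isalnum(ch))
        {
            out += static_cast<char>(ch);
        }
        else
        {
            std::snprintf(buf, sizeof(buf), "%%%02X", static_cast<unsigned>(ch));
            out += buf;
        }
    }
    return out;
}
```

With this rule a space comes out as %20, which differs from form encoding, where a space is written as +.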

#include <windows.h>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>
using std::string;

// Decodes a percent-encoded UTF-8 string and converts the result to the
// local ANSI code page (GB on Chinese Windows). strSrc is overwritten with
// the result, which is also returned.
string URLDecode(string &strSrc)
{
    // Step 1: undo the percent-encoding to recover the raw UTF-8 bytes.
    string buffer;
    size_t srcLen = strSrc.length();
    for (size_t i = 0; i < srcLen; i++)
    {
        char ch = strSrc[i];
        if (ch == '%' && i + 2 < srcLen)
        {
            // "%e4" -> byte 0xe4. A '%' without two characters after it
            // is copied through unchanged instead of reading out of range.
            char hex[3] = { strSrc[i + 1], strSrc[i + 2], '\0' };
            buffer += (char)strtol(hex, NULL, 16);
            i += 2;
        }
        else if (ch == '+')
        {
            buffer += ' ';
        }
        else
        {
            buffer += ch;
        }
    }

    // Step 2: UTF-8 -> UTF-16. With -1 as the input length, the size
    // returned already includes the terminating null.
    int len = MultiByteToWideChar(CP_UTF8, 0, buffer.c_str(), -1, NULL, 0);
    if (len <= 0)
    {
        strSrc = "";
        return strSrc;
    }
    std::vector<WCHAR> wide(len);
    MultiByteToWideChar(CP_UTF8, 0, buffer.c_str(), -1, &wide[0], len);

    // Step 3: UTF-16 -> ANSI code page.
    len = WideCharToMultiByte(CP_ACP, 0, &wide[0], -1, NULL, 0, NULL, NULL);
    if (len <= 0)
    {
        strSrc = "";
        return strSrc;
    }
    std::vector<char> ansi(len);
    WideCharToMultiByte(CP_ACP, 0, &wide[0], -1, &ansi[0], len, NULL, NULL);

    strSrc = &ansi[0];
    return strSrc;
}

// Converts a string in the local ANSI code page to UTF-8 and percent-encodes
// it. strSrc is overwritten with the result, which is also returned.
string URLEncode(string &strSrc)
{
    // Step 1: ANSI code page -> UTF-16.
    int len = MultiByteToWideChar(CP_ACP, 0, strSrc.c_str(), -1, NULL, 0);
    if (len <= 0)
    {
        strSrc = "";
        return strSrc;
    }
    std::vector<WCHAR> wide(len);
    MultiByteToWideChar(CP_ACP, 0, strSrc.c_str(), -1, &wide[0], len);

    // Step 2: UTF-16 -> UTF-8.
    len = WideCharToMultiByte(CP_UTF8, 0, &wide[0], -1, NULL, 0, NULL, NULL);
    if (len <= 0)
    {
        strSrc = "";
        return strSrc;
    }
    std::vector<char> utf8(len);
    WideCharToMultiByte(CP_UTF8, 0, &wide[0], -1, &utf8[0], len, NULL, NULL);
    string strResult = &utf8[0];

    // Step 3: write ' ' as '+', and write the special characters, control
    // characters, and every byte >= 123 (which covers all bytes of
    // multi-byte UTF-8 sequences) as %xx.
    const string special = "\"<>%\\^[]`+$,@:;/!#?=&";
    string encoded;
    char num[4];
    for (size_t i = 0; i < strResult.length(); i++)
    {
        unsigned char ch = (unsigned char)strResult[i];
        if (ch == ' ')
        {
            encoded += '+';
        }
        else if (ch <= 32 || ch >= 123 || special.find((char)ch) != string::npos)
        {
            // %02x pads with a zero, so byte 0x0a becomes "%0a", not "% a".
            sprintf_s(num, "%%%02x", ch);
            encoded += num;
        }
        else
        {
            encoded += (char)ch;
        }
    }

    strSrc = encoded;
    return strSrc;
}

Below are the two functions for GB:

string URLDecode(string &strSrc)
{
    string buffer = "";
    int len = strSrc.length();

    return buffer;
}

string URLEncode(string &strSrc)
{
    string strResult = strSrc;

    char num[4];

    return strSrc;
}
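
The two GB functions above appear only as skeletons. As a minimal sketch, assuming the GB variants differ from the UTF-8 ones only in skipping the code page conversion, the byte-level work might look like the code below. The names percentDecodeBytes and percentEncodeBytes are hypothetical, and keeping only ASCII letters and digits literal is an assumption rather than the author's exact character set.

```cpp
#include <cctype>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: decodes %xx escapes and '+' with no charset
// conversion, so the result holds whatever bytes the sender encoded.
std::string percentDecodeBytes(const std::string &src)
{
    std::string out;
    for (std::string::size_type i = 0; i < src.size(); ++i)
    {
        if (src[i] == '%' && i + 2 < src.size()
            && std::isxdigit(static_cast<unsigned char>(src[i + 1]))
            && std::isxdigit(static_cast<unsigned char>(src[i + 2])))
        {
            const std::string hex = src.substr(i + 1, 2);
            out += static_cast<char>(std::strtol(hex.c_str(), nullptr, 16));
            i += 2; // skip the two hex digits just consumed
        }
        else if (src[i] == '+')
        {
            out += ' ';
        }
        else
        {
            out += src[i];
        }
    }
    return out;
}

// Hypothetical helper: encodes the bytes as they are, without converting
// them to another charset first.
std::string percentEncodeBytes(const std::string &src)
{
    std::string out;
    char buf[4];
    for (std::string::size_type i = 0; i < src.size(); ++i)
    {
        const unsigned char ch = static_cast<unsigned char>(src[i]);
        if (ch < 0x80 && std::isalnum(ch))
        {
            out += static_cast<char>(ch);
        }
        else if (ch == ' ')
        {
            out += '+';
        }
        else
        {
            std::snprintf(buf, sizeof(buf), "%%%02x", static_cast<unsigned>(ch));
            out += buf;
        }
    }
    return out;
}
```

Because neither function touches the code page, a GB2312 string passed through percentEncodeBytes and then percentDecodeBytes comes back byte for byte identical.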
