HTML %3C Escaping, [Repost] URL Encoding

URL Encoding

Author: Chandrasekhar Vuppalapati

Translator: eastvc


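This article designs a C++ class that performs URL encoding. In one of my earlier projects I needed to POST data from a VC++ 6.0 application, and that data had to be URL-encoded. I searched MSDN for a class or API that would produce the URL encoding of a given string, but found none, so I had to write my own URLEncode C++ class.

URLEncoder.exe is an MFC dialog-based program that uses the URLEncode class.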

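The Problem

Some special characters are tricky to transmit over the Internet. URL encoding replaces them with escape sequences so that every character can be transmitted safely.

For example, the carriage return has ASCII value 13; when FORM data is sent, it is interpreted as the end of a line of data.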
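To make the carriage-return case concrete, here is a minimal sketch that escapes only control characters (0-31) and DEL (127). `encodeControlBytes` is a hypothetical helper for illustration, not part of the article's class; it shows that a CR/LF pair inside a value becomes %0D%0A and therefore no longer ends the line.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: percent-encodes only ASCII control characters (0-31)
// and DEL (127); every other byte is copied unchanged.
std::string encodeControlBytes(const std::string& in)
{
    std::string out;
    for (unsigned char c : in) {
        if (c < 32 || c == 127) {
            char buf[4];  // '%', two hex digits, terminating NUL
            std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(c));
            out += buf;   // e.g. 13 (CR) -> "%0D"
        } else {
            out += static_cast<char>(c);
        }
    }
    return out;
}
```

For instance, `encodeControlBytes("line1\r\nline2")` yields `line1%0D%0Aline2`.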

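Applications generally use HTTP or HTTPS to transfer data between client and server. A server can receive data from a client in two basic ways:

1. In the HTTP request headers (as cookies) or as POSTed FORM data

2. In the query part of the URL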
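The two channels can be seen side by side in the raw HTTP/1.1 requests. The sketch below is illustrative only: `buildGetRequest` and `buildPostRequest` are hypothetical helpers, cookies are omitted, and the query string or body is assumed to be URL-encoded already.

```cpp
#include <string>

// Channel 2: the encoded data travels in the query part of the request URL.
std::string buildGetRequest(const std::string& host, const std::string& path,
                            const std::string& encodedQuery)
{
    return "GET " + path + "?" + encodedQuery + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n\r\n";
}

// Channel 1 (FORM data): the same encoded name=value pairs travel in the
// request body, labelled with the application/x-www-form-urlencoded type.
std::string buildPostRequest(const std::string& host, const std::string& path,
                             const std::string& encodedBody)
{
    return "POST " + path + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "Content-Type: application/x-www-form-urlencoded\r\n"
           "Content-Length: " + std::to_string(encodedBody.size()) + "\r\n\r\n" +
           encodedBody;
}
```

Either way the data itself is encoded identically; only its position in the request differs.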

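When data is included in a URL, it must be encoded according to URL syntax. On the web server, the data is decoded automatically. Consider the following URL, in which the data is passed as a query parameter: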

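For example: http://WebSite/ResourceName?Data=Data

WebSite is the name of the web site (the host part of the URL).

ResourceName can be an ASP page or a Servlet.

Data is the data to be sent. If the MIME type is Content-Type: application/x-www-form-urlencoded, the data must be encoded.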
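The server-side decoding mentioned above reverses the process. Here is a minimal sketch of such a decoder; `urlDecode` and `hexValue` are hypothetical names, and treating '+' as a space follows the application/x-www-form-urlencoded convention. Malformed escapes are passed through unchanged, which is one possible policy among several.

```cpp
#include <cstddef>
#include <string>

// Returns the value of one hex digit, or -1 if c is not a hex digit.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Turns "%XX" back into the byte 0xXX and '+' into a space.
// Malformed escapes such as "%G1" or a trailing "%" are copied as-is.
std::string urlDecode(const std::string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out += (in[i] == '+') ? ' ' : in[i];
    }
    return out;
}
```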

RFC 1738

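RFC 1738 specifies that Uniform Resource Locators (URLs) may contain only a subset of the US-ASCII character set. HTML, on the other hand, allows the entire ISO-8859-1 (ISO-Latin) character set to be used in documents. This means that data POSTed from an HTML FORM (or sent as part of a query string) must be encoded whenever it contains characters outside that subset.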
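A Latin-1 character such as 'é' (byte 0xE9 in ISO-8859-1) lies outside US-ASCII and must be sent as %E9. The sketch below, with the hypothetical name `encodeNonAscii`, escapes only bytes 128-255 and assumes the text is already ISO-8859-1 bytes; in UTF-8 the same 'é' is two bytes and would become %C3%A9. The cast to unsigned char matters because char is signed by default in Visual C++, so such bytes would otherwise compare as negative.

```cpp
#include <cstdio>
#include <string>

// Escapes every byte in the range 128-255 as "%XX"; ASCII bytes are copied.
std::string encodeNonAscii(const std::string& latin1)
{
    std::string out;
    for (char ch : latin1) {
        unsigned char c = static_cast<unsigned char>(ch);  // avoid negative values
        if (c >= 128) {
            char buf[4];
            std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(c));
            out += buf;
        } else {
            out += ch;
        }
    }
    return out;
}
```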

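The ISO-8859-1 (ISO-Latin) Character Set

The table below covers the complete ISO-8859-1 (ISO-Latin) character set. For each range of characters (in decimal) it gives the type of character, the values in that range, and whether those characters are safe.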

Range (decimal)  Type                       Values                      Safe/Unsafe
0-31             ASCII control characters   Not printable               Unsafe
32-47            Reserved characters        (space)!"#$%&'()*+,-./      Unsafe
48-57            ASCII digits               0-9                         Safe
58-64            Reserved characters        :;<=>?@                     Unsafe
65-90            ASCII letters              A-Z                         Safe
91-96            Reserved characters        [\]^_`                      Unsafe
97-122           ASCII letters              a-z                         Safe
123-126          Reserved characters        {|}~                        Unsafe
127              Control character          DEL                         Unsafe
128-255          Non-ASCII characters       -                           Unsafe

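All unsafe characters must be encoded: the reserved ranges 32-47, 58-64, 91-96 and 123-126, as well as the control characters (0-31 and 127) and all non-ASCII characters (128-255).

The following table explains why each of these characters is unsafe.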
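The table can be expressed directly as a classification function. This is a sketch that follows the table's rows exactly; `tableCategory` and `isSafeByTable` are hypothetical names, and the category strings are chosen for illustration.

```cpp
#include <string>

// Maps a byte (0-255) to the row of the ISO-8859-1 table it falls in.
std::string tableCategory(unsigned char c)
{
    if (c <= 31 || c == 127) return "control";
    if (c >= 128)            return "non-ASCII";
    if (c >= '0' && c <= '9') return "digit";
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) return "letter";
    return "reserved";  // 32-47, 58-64, 91-96, 123-126
}

// Only the digit and letter rows are marked Safe in the table.
bool isSafeByTable(unsigned char c)
{
    std::string cat = tableCategory(c);
    return cat == "digit" || cat == "letter";
}
```

Under this table exactly 62 of the 256 byte values (10 digits and 52 letters) are safe.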

Character  Why it is unsafe                                            Encoding
<          Delimiters around URLs in free text                         %3C
>          Delimiters around URLs in free text                         %3E
"          Delimits URLs in some systems                               %22
#          Used on the World Wide Web and in other systems to          %23
           separate a URL from a fragment/anchor identifier that
           may follow it
{          Gateways and other transport agents sometimes modify it     %7B
}          Gateways and other transport agents sometimes modify it     %7D
|          Gateways and other transport agents sometimes modify it     %7C
\          Gateways and other transport agents sometimes modify it     %5C
^          Gateways and other transport agents sometimes modify it     %5E
~          Gateways and other transport agents sometimes modify it     %7E
[          Gateways and other transport agents sometimes modify it     %5B
]          Gateways and other transport agents sometimes modify it     %5D
`          Gateways and other transport agents sometimes modify it     %60
+          Indicates a space (spaces cannot be used in a URL)          %2B
/          Separates directories and subdirectories                    %2F
?          Separates the actual URL from its parameters                %3F
&          Separates the parameters specified in the URL               %26
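Each encoding in the table is simply '%' followed by the character's ASCII code in hex. The sketch below, `encodeTableChars` (a hypothetical name), escapes only the characters listed in the table plus space, a deliberately narrow assumption for illustration. It shows why a literal '+' must be sent as %2B: left as-is, the server would read it as a space.

```cpp
#include <cstdio>
#include <string>

// Escapes only the characters from the unsafe-reasons table (and space).
std::string encodeTableChars(const std::string& in)
{
    static const std::string special = "<>\"#{}|\\^~[]`+/?& ";
    std::string out;
    for (char ch : in) {
        if (special.find(ch) != std::string::npos) {
            char buf[4];
            std::snprintf(buf, sizeof buf, "%%%02X",
                          static_cast<unsigned>(static_cast<unsigned char>(ch)));
            out += buf;
        } else {
            out += ch;
        }
    }
    return out;
}
```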

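Implementation

URL-encoding a character means converting its 8-bit value to two hexadecimal digits prefixed with '%'. For example, the space character in US-ASCII is 32 in decimal, or 20 in hexadecimal, so its URL encoding is %20.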
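The arithmetic behind "%20" can be written out by hand: the high hex digit is the value divided by 16 and the low digit is the remainder. A minimal sketch, with the hypothetical name `percentByte`:

```cpp
#include <string>

// Converts one byte to "%XX": high digit = value / 16, low digit = value % 16.
// For a space: 32 / 16 = 2 and 32 % 16 = 0, giving "%20".
std::string percentByte(unsigned char value)
{
    static const char hexDigits[] = "0123456789ABCDEF";
    std::string out = "%";
    out += hexDigits[value / 16];
    out += hexDigits[value % 16];
    return out;
}
```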

URLEncode: CURLEncode is a C++ class that URL-encodes strings. It contains the following functions:

isUnsafe

decToHex

convert

URLEncode

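URLEncode() performs the encoding. It examines each character of the input to see whether it is safe: safe characters are copied unchanged, while unsafe ones are converted to '%' followed by their hexadecimal value and appended to the result string.

Code snippet: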
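For readers without MFC, here is a portable std::string sketch of the loop URLEncode() performs. It is not the author's code. It mirrors the rule in isUnsafe (a byte is unsafe if it is in a fixed list or lies outside 33..122), but the list `kUnsafe` is an assumption: the table's characters in that range, plus '%' (so it is not mistaken for an escape) and '=' (which separates names from values).

```cpp
#include <cstdio>
#include <string>

// Walks the input, copies safe characters, and replaces unsafe ones with "%XX".
// kUnsafe is an assumed list; bytes outside 33..122 (space, control
// characters, {|}~, DEL, non-ASCII) are always escaped.
std::string urlEncode(const std::string& data)
{
    static const std::string kUnsafe = "<>\"#\\^[]`+/?&%=";
    std::string encoded;
    for (char ch : data) {
        unsigned char c = static_cast<unsigned char>(ch);
        bool unsafe = kUnsafe.find(ch) != std::string::npos || c <= 32 || c >= 123;
        if (unsafe) {
            char buf[4];
            std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(c));
            encoded += buf;
        } else {
            encoded += ch;
        }
    }
    return encoded;
}
```

For example, `urlEncode("a b&c=d")` gives `a%20b%26c%3Dd`, which can safely follow `Data=` in a query string.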

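#include <afx.h>    // CString (MFC)

// Hex digit table used by decToHex (the original snippet uses hexVals
// without showing its definition)
static const char hexVals[] = "0123456789ABCDEF";

class CURLEncode
{
private:
    static CString csUnsafeString;
    CString decToHex(char num, int radix);
    bool isUnsafe(char compareChar);
    CString convert(char val);

public:
    CURLEncode() { }
    virtual ~CURLEncode() { }
    CString URLEncode(CString vData);
};

// Characters that are always escaped. Assumed contents: the snippet does not
// show how csUnsafeString is initialized.
CString CURLEncode::csUnsafeString = "\"<>%\\^[]`+$,@:;/!#?=&";

bool CURLEncode::isUnsafe(char compareChar)
{
    // Characters in the unsafe list must always be escaped
    if (csUnsafeString.Find(compareChar) != -1)
        return true;

    // Otherwise only printable ASCII 33..122 is safe; space, control
    // characters, {|}~, DEL and every byte >= 128 are unsafe
    int char_ascii_value = (unsigned char) compareChar;
    return !(char_ascii_value > 32 && char_ascii_value < 123);
}

CString CURLEncode::decToHex(char num, int radix)
{
    // Treat the char as an unsigned byte (0..255), so Latin-1 bytes are not negative
    int num_char = (unsigned char) num;
    CString csTmp;

    // Collect digits least-significant first (append; the original
    // assignment here discarded earlier digits)
    while (num_char >= radix)
    {
        csTmp += hexVals[num_char % radix];
        num_char /= radix;
    }
    csTmp += hexVals[num_char];

    // Pad to two digits, then reverse so the most significant digit comes first
    if (csTmp.GetLength() < 2)
        csTmp += '0';
    csTmp.MakeReverse();
    return csTmp;
}

CString CURLEncode::convert(char val)
{
    // "%" followed by the two-digit hexadecimal value
    CString csRet = "%";
    csRet += decToHex(val, 16);
    return csRet;
}

// URLEncode is declared above but its body is not part of the original
// snippet; this is a sketch following the description of URLEncode() above.
CString CURLEncode::URLEncode(CString vData)
{
    CString csEncoded;
    for (int i = 0; i < vData.GetLength(); i++)
    {
        char ch = vData.GetAt(i);
        if (isUnsafe(ch))
            csEncoded += convert(ch);   // unsafe: append "%XX"
        else
            csEncoded += ch;            // safe: append unchanged
    }
    return csEncoded;
}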
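The digit-collection fix in decToHex can be checked outside MFC with a std::string port. This is a sketch, not the author's code; the function keeps the name decToHex but takes an unsigned value so it works for any radix from 2 to 16.

```cpp
#include <algorithm>
#include <string>

// std::string port of the corrected decToHex: digits are appended
// least-significant first, padded to two digits, then reversed.
std::string decToHex(unsigned int value, unsigned int radix)
{
    static const char hexVals[] = "0123456789ABCDEF";
    std::string digits;
    while (value >= radix) {
        digits += hexVals[value % radix];  // append, do not overwrite
        value /= radix;
    }
    digits += hexVals[value];
    if (digits.size() < 2)
        digits += '0';                     // pad to at least two digits
    std::reverse(digits.begin(), digits.end());
    return digits;
}
```

For radix 16 and a single byte the result never needs more than two digits, so the original assignment bug did not affect URL encoding. It shows up with longer results: `decToHex(100, 8)` correctly returns "144", whereas the original `csTmp = hexVals[temp]` would have produced "14".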

References:

URL encoding: http://www.blooberry.com/indexdot/html/topics/urlencoding.htm

RFC 1866: The HTML 2.0 specification (plain text). The appendix contains the character tables: http://www.rfc-editor.org/rfc/rfc1866.txt

Web version of HTML 2.0 (RFC 1866): http://www.w3.org/MarkUp/html-spec/html-spec_13.html

The HTML 3.2 (Wilbur) Recommendation: http://www.w3.org/MarkUp/Wilbur/

The HTML 4.0 Recommendation: http://www.w3.org/TR/REC-html40/

W3C HTML internationalization area: http://www.w3.org/International/O-HTML.html

Posted on 2007-01-04 13:46 by 永遇乐, category: Networking
