1. Encoding conversion (to Unicode)
(The code below was collected from the web.)
JavaScript version
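<script>
// Convert each UTF-16 code unit of the string to a \uXXXX escape.
var test = "你好abc";
var str = "";
for (var i = 0; i < test.length; i++) {
    var temp = test.charCodeAt(i).toString(16);
    // Left-pad the hex value to four digits.
    str += "\\u" + new Array(5 - temp.length).join("0") + temp;
}
document.write(str);
</script>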
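The same conversion can be wrapped in a reusable function, together with the reverse direction. This is a minimal sketch, not part of the original snippet: the names `toUnicodeEscapes` and `fromUnicodeEscapes` are hypothetical, and the decoder only handles four-digit `\uXXXX` escapes.

```javascript
// Encode every UTF-16 code unit as a \uXXXX escape (hypothetical helper name).
function toUnicodeEscapes(text) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    var hex = text.charCodeAt(i).toString(16);
    out += "\\u" + ("0000" + hex).slice(-4); // left-pad to four hex digits
  }
  return out;
}

// Reverse direction: turn each four-digit \uXXXX escape back into its character.
function fromUnicodeEscapes(escaped) {
  return escaped.replace(/\\u([0-9a-fA-F]{4})/g, function (match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

console.log(toUnicodeEscapes("你好abc"));             // \u4f60\u597d\u0061\u0062\u0063
console.log(fromUnicodeEscapes("\\u4f60\\u597dabc")); // 你好abc
```

Because both functions work on UTF-16 code units, characters outside the Basic Multilingual Plane come out as two escapes (a surrogate pair) and are reassembled correctly on decoding.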
VBScript version
' Convert each character of str1 to a \uXXXX escape.
Function Unicode(str1)
    Dim str, temp, i
    str = ""
    For i = 1 To Len(str1)
        temp = Hex(AscW(Mid(str1, i, 1)))
        ' Left-pad the hex value to four digits.
        If Len(temp) < 5 Then temp = Right("0000" & temp, 4)
        str = str & "\u" & temp
    Next
    Unicode = str
End Function
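' Replace every non-ASCII character with a decimal numeric entity.
Function htmlentities(str)
    Dim i, char, code
    For i = 1 To Len(str)
        char = Mid(str, i, 1)
        code = AscW(char)
        ' AscW returns a signed 16-bit value; map negatives back to 32768-65535.
        If code < 0 Then code = code + 65536
        If code > 127 Then
            htmlentities = htmlentities & "&#" & code & ";"
        Else
            htmlentities = htmlentities & char
        End If
    Next
End Function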
ColdFusion version
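// Replace every non-ASCII character with a decimal numeric entity.
function nochaoscode(str)
{
    var new_str = "";
    var i = 1;
    for (i = 1; i lte len(str); i = i + 1) {
        if (asc(mid(str, i, 1)) lt 128) {
            new_str = new_str & mid(str, i, 1);
        } else {
            // "##" is an escaped "#" inside a CFML string.
            new_str = new_str & "&##" & asc(mid(str, i, 1)) & ";";
        }
    }
    return new_str;
}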
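Appendix:
In PHP, the mb_convert_encoding function from the mbstring extension performs both the forward and the reverse conversion. For example:
mb_convert_encoding("你好", "HTML-ENTITIES", "gb2312"); // output: &#20320;&#22909;
mb_convert_encoding("&#20320;&#22909;", "gb2312", "HTML-ENTITIES"); // output: 你好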
To convert an entire page, add these three lines at the top of the PHP file:
mb_internal_encoding("gb2312"); // gb2312 here is your site's original encoding
mb_http_output("HTML-ENTITIES");
ob_start('mb_output_handler');
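If the mbstring extension is not enabled, see these two articles on coolcode.cn: "在任意字符集下正常显示网页的方法" (A method for displaying web pages correctly under any character set) and its sequel "在任意字符集下正常显示网页的方法(续)".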
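For comparison, the forward and reverse numeric-entity conversion that mb_convert_encoding performs can be sketched in JavaScript. This is only an illustration under stated assumptions: the names `toNumericEntities` and `fromNumericEntities` are hypothetical, the input is an ordinary JavaScript string (so there is no gb2312 step), and only decimal references of the form `&#NNNN;` are decoded.

```javascript
// Forward: replace every non-ASCII code point with a decimal numeric reference.
function toNumericEntities(text) {
  let out = "";
  for (const ch of text) {            // for...of iterates by code point, not code unit
    const code = ch.codePointAt(0);
    out += code > 127 ? "&#" + code + ";" : ch;
  }
  return out;
}

// Reverse: turn decimal numeric references back into characters.
function fromNumericEntities(html) {
  return html.replace(/&#(\d+);/g, function (match, digits) {
    const code = parseInt(digits, 10);
    // Leave out-of-range references untouched instead of throwing a RangeError.
    return code <= 0x10FFFF ? String.fromCodePoint(code) : match;
  });
}

console.log(toNumericEntities("你好abc"));                // &#20320;&#22909;abc
console.log(fromNumericEntities("&#20320;&#22909;abc")); // 你好abc
```

Iterating by code point means a character outside the Basic Multilingual Plane becomes a single reference rather than two surrogate values.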
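2. HTML entities
HTML 4.01 supports the ISO 8859-1 (Latin-1) character set.
Tip: entity names are case-sensitive.
Note: the same symbol can be referenced either by entity name or by entity number. Entity names are easier to remember, but not every browser is guaranteed to recognize them; entity numbers do not have that problem, but they are hard to remember.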
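Entity names for some ASCII characters
Display | Description | Entity name | Entity number |
" | quotation mark | &quot; | &#34; |
' | apostrophe | &apos; (not valid in IE) | &#39; |
& | ampersand | &amp; | &#38; |
< | less-than | &lt; | &#60; |
> | greater-than | &gt; | &#62; |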
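These five characters are the ones that usually need escaping before text is placed into HTML. A minimal sketch, assuming a hypothetical helper named `escapeHtml`; it uses the numeric form `&#39;` for the apostrophe because, as the table notes, `&apos;` is not valid in IE.

```javascript
// Map each special character to its entity (names and numbers from the table above).
var HTML_ESCAPES = {
  '"': "&quot;",
  "'": "&#39;",
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;"
};

// Replace every special character in one pass, so "&" is never escaped twice.
function escapeHtml(text) {
  return text.replace(/["'&<>]/g, function (ch) {
    return HTML_ESCAPES[ch];
  });
}

console.log(escapeHtml('<a href="x">Tom & Jerry\'s</a>'));
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;
```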
ISO 8859-1 symbol entities
Display | Description | Entity name | Entity number |
 | non-breaking space | &nbsp; | &#160; |
¡ | inverted exclamation mark | &iexcl; | &#161; |
¤ | currency | &curren; | &#164; |
¢ | cent | &cent; | &#162; |
£ | pound | &pound; | &#163; |
¥ | yen | &yen; | &#165; |
¦ | broken vertical bar | &brvbar; | &#166; |
§ | section | &sect; | &#167; |
¨ | spacing diaeresis | &uml; | &#168; |
© | copyright | &copy; | &#169; |
ª | feminine ordinal indicator | &ordf; | &#170; |
« | angle quotation mark (left) | &laquo; | &#171; |
¬ | negation | &not; | &#172; |
 | soft hyphen | &shy; | &#173; |
® | registered trademark | &reg; | &#174; |
™ | trademark | &trade; | &#8482; |
¯ | spacing macron | &macr; | &#175; |
° | degree | &deg; | &#176; |
± | plus-or-minus | &plusmn; | &#177; |
² | superscript 2 | &sup2; | &#178; |
³ | superscript 3 | &sup3; | &#179; |
´ | spacing acute | &acute; | &#180; |
µ | micro | &micro; | &#181; |
¶ | paragraph | &para; | &#182; |
· | middle dot | &middot; | &#183; |
¸ | spacing cedilla | &cedil; | &#184; |
¹ | superscript 1 | &sup1; | &#185; |
º | masculine ordinal indicator | &ordm; | &#186; |
» | angle quotation mark (right) | &raquo; | &#187; |
¼ | fraction 1/4 | &frac14; | &#188; |
½ | fraction 1/2 | &frac12; | &#189; |
¾ | fraction 3/4 | &frac34; | &#190; |
¿ | inverted question mark | &iquest; | &#191; |
× | multiplication | &times; | &#215; |
÷ | division | &divide; | &#247; |
ISO 8859-1 character entities
Display | Description | Entity name | Entity number |
À | capital a, grave accent | &Agrave; | &#192; |
Á | capital a, acute accent | &Aacute; | &#193; |
Â | capital a, circumflex accent | &Acirc; | &#194; |
Ã | capital a, tilde | &Atilde; | &#195; |
Ä | capital a, umlaut mark | &Auml; | &#196; |
Å | capital a, ring | &Aring; | &#197; |
Æ | capital ae | &AElig; | &#198; |
Ç | capital c, cedilla | &Ccedil; | &#199; |
È | capital e, grave accent | &Egrave; | &#200; |
É | capital e, acute accent | &Eacute; | &#201; |
Ê | capital e, circumflex accent | &Ecirc; | &#202; |
Ë | capital e, umlaut mark | &Euml; | &#203; |
Ì | capital i, grave accent | &Igrave; | &#204; |
Í | capital i, acute accent | &Iacute; | &#205; |
Î | capital i, circumflex accent | &Icirc; | &#206; |
Ï | capital i, umlaut mark | &Iuml; | &#207; |
Ð | capital eth, Icelandic | &ETH; | &#208; |
Ñ | capital n, tilde | &Ntilde; | &#209; |
Ò | capital o, grave accent | &Ograve; | &#210; |
Ó | capital o, acute accent | &Oacute; | &#211; |
Ô | capital o, circumflex accent | &Ocirc; | &#212; |
Õ | capital o, tilde | &Otilde; | &#213; |
Ö | capital o, umlaut mark | &Ouml; | &#214; |
Ø | capital o, slash | &Oslash; | &#216; |
Ù | capital u, grave accent | &Ugrave; | &#217; |
Ú | capital u, acute accent | &Uacute; | &#218; |
Û | capital u, circumflex accent | &Ucirc; | &#219; |
Ü | capital u, umlaut mark | &Uuml; | &#220; |
Ý | capital y, acute accent | &Yacute; | &#221; |
Þ | capital THORN, Icelandic | &THORN; | &#222; |
ß | small sharp s, German | &szlig; | &#223; |
à | small a, grave accent | &agrave; | &#224; |
á | small a, acute accent | &aacute; | &#225; |
â | small a, circumflex accent | &acirc; | &#226; |
ã | small a, tilde | &atilde; | &#227; |
ä | small a, umlaut mark | &auml; | &#228; |
å | small a, ring | &aring; | &#229; |
æ | small ae | &aelig; | &#230; |
ç | small c, cedilla | &ccedil; | &#231; |
è | small e, grave accent | &egrave; | &#232; |
é | small e, acute accent | &eacute; | &#233; |
ê | small e, circumflex accent | &ecirc; | &#234; |
ë | small e, umlaut mark | &euml; | &#235; |
ì | small i, grave accent | &igrave; | &#236; |
í | small i, acute accent | &iacute; | &#237; |
î | small i, circumflex accent | &icirc; | &#238; |
ï | small i, umlaut mark | &iuml; | &#239; |
ð | small eth, Icelandic | &eth; | &#240; |
ñ | small n, tilde | &ntilde; | &#241; |
ò | small o, grave accent | &ograve; | &#242; |
ó | small o, acute accent | &oacute; | &#243; |
ô | small o, circumflex accent | &ocirc; | &#244; |
õ | small o, tilde | &otilde; | &#245; |
ö | small o, umlaut mark | &ouml; | &#246; |
ø | small o, slash | &oslash; | &#248; |
ù | small u, grave accent | &ugrave; | &#249; |
ú | small u, acute accent | &uacute; | &#250; |
û | small u, circumflex accent | &ucirc; | &#251; |
ü | small u, umlaut mark | &uuml; | &#252; |
ý | small y, acute accent | &yacute; | &#253; |
þ | small thorn, Icelandic | &thorn; | &#254; |
ÿ | small y, umlaut mark | &yuml; | &#255; |
Other entities supported by HTML
Display | Description | Entity name | Entity number |
Œ | capital ligature OE | &OElig; | &#338; |
œ | small ligature oe | &oelig; | &#339; |
Š | capital S with caron | &Scaron; | &#352; |
š | small s with caron | &scaron; | &#353; |
Ÿ | capital Y with diaeresis | &Yuml; | &#376; |
ˆ | modifier letter circumflex accent | &circ; | &#710; |
˜ | small tilde | &tilde; | &#732; |
 | en space | &ensp; | &#8194; |
 | em space | &emsp; | &#8195; |
 | thin space | &thinsp; | &#8201; |
 | zero width non-joiner | &zwnj; | &#8204; |
 | zero width joiner | &zwj; | &#8205; |
 | left-to-right mark | &lrm; | &#8206; |
 | right-to-left mark | &rlm; | &#8207; |
– | en dash | &ndash; | &#8211; |
— | em dash | &mdash; | &#8212; |
‘ | left single quotation mark | &lsquo; | &#8216; |
’ | right single quotation mark | &rsquo; | &#8217; |
‚ | single low-9 quotation mark | &sbquo; | &#8218; |
“ | left double quotation mark | &ldquo; | &#8220; |
” | right double quotation mark | &rdquo; | &#8221; |
„ | double low-9 quotation mark | &bdquo; | &#8222; |
† | dagger | &dagger; | &#8224; |
‡ | double dagger | &Dagger; | &#8225; |
… | horizontal ellipsis | &hellip; | &#8230; |
‰ | per mille | &permil; | &#8240; |
‹ | single left-pointing angle quotation | &lsaquo; | &#8249; |
› | single right-pointing angle quotation | &rsaquo; | &#8250; |
€ | euro | &euro; | &#8364; |
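Since entity numbers are the more widely recognized form, named entities can be rewritten as numeric references before output. The sketch below is only an illustration: `ENTITY_NUMBERS` and `namedToNumeric` are hypothetical names, and the lookup holds just a handful of entries from the tables above and would need to be extended for real use.

```javascript
// Small excerpt of the entity tables; keys are case-sensitive entity names.
var ENTITY_NUMBERS = {
  nbsp: 160, copy: 169, reg: 174, trade: 8482,
  dagger: 8224, Dagger: 8225, hellip: 8230, euro: 8364
};

// Rewrite known named entities as decimal numeric references.
function namedToNumeric(html) {
  return html.replace(/&([A-Za-z][A-Za-z0-9]*);/g, function (match, name) {
    return Object.prototype.hasOwnProperty.call(ENTITY_NUMBERS, name)
      ? "&#" + ENTITY_NUMBERS[name] + ";"
      : match; // names missing from the lookup are left untouched
  });
}

console.log(namedToNumeric("&copy; 2010 &euro;5 &dagger; &Dagger; &unknown;"));
// &#169; 2010 &#8364;5 &#8224; &#8225; &unknown;
```

The `&dagger;` and `&Dagger;` pair shows why the lookup must be case-sensitive: the two names map to different characters.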