4 answers:
Answer 0 (score: 8)
Your string looks like UCS-4 encoding, so you can try:
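$string = 'Français'; // sample input
// Replace every non-ASCII character (U+0080 to U+10FFFF) with a hex entity.
$first = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($m) {
    $char = current($m);                    // the matched UTF-8 character
    $utf = iconv('UTF-8', 'UCS-4', $char);  // the code point as 4 bytes
    // bin2hex() gives e.g. "000000e7"; strip the leading zeros to get "E7".
    return sprintf("&#x%s;", ltrim(strtoupper(bin2hex($utf)), "0"));
}, $string);
var_dump($first);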
Output:
string 'Fran&#xE7;ais' (length=13)
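A built-in alternative, not part of this answer: mbstring's mb_encode_numericentity() performs the same conversion without a regex callback. A minimal sketch, assuming PHP 7.2 or later with the mbstring extension (the fourth `$hex` argument was added in 7.2); the wrapper name `to_numeric_entities()` is hypothetical.

```php
<?php
// Hypothetical wrapper around mb_encode_numericentity().
// Assumes PHP >= 7.2 (for the $hex argument) and the mbstring extension.
function to_numeric_entities(string $string, bool $hex = true): string
{
    // Conversion map [start, end, offset, mask]: every code point from 0x80
    // to 0x10FFFF becomes an entity; ASCII is left untouched.
    $convmap = [0x80, 0x10FFFF, 0, 0xFFFFFF];
    return mb_encode_numericentity($string, $convmap, 'UTF-8', $hex);
}

var_dump(to_numeric_entities('Français', false)); // string(13) "Fran&#231;ais"
var_dump(to_numeric_entities('Français'));        // hex form, e.g. Fran&#xE7;ais
```

Passing `false` gives decimal entities, which every HTML parser also accepts.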
Answer 1 (score: 8)
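$output = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($match) {
    list($utf8) = $match;
    // UTF-32BE stores each code point as one 4-byte big-endian integer.
    $binary = mb_convert_encoding($utf8, 'UTF-32BE', 'UTF-8');
    // unpack('N') reads that unsigned 32-bit big-endian integer.
    $entity = vsprintf('&#x%X;', unpack('N', $binary));
    return $entity;
}, $input);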
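This is similar to @Baba's answer, but it converts to UTF-32BE and then uses unpack() and vsprintf() to produce the required format. The same callback with iconv() instead of mb_convert_encoding():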
$output = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($match) {
    list($utf8) = $match;
    $binary = iconv('UTF-8', 'UTF-32BE', $utf8);
    $entity = vsprintf('&#x%X;', unpack('N', $binary));
    return $entity;
}, $input);
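To see what each step of the callbacks above does, here is a single character traced through the same calls. A small sketch; the sample character `ç` (U+00E7) is an arbitrary choice.

```php
<?php
// Trace one character through the UTF-32BE -> unpack -> vsprintf pipeline.
$char = 'ç';                                                // U+00E7, two bytes in UTF-8
$binary = mb_convert_encoding($char, 'UTF-32BE', 'UTF-8');  // four bytes: 00 00 00 E7
var_dump(bin2hex($binary));                                 // string(8) "000000e7"
$parts = unpack('N', $binary);                              // array keyed from 1: [1 => 231]
var_dump($parts);
var_dump(vsprintf('&#x%X;', $parts));                       // string(6) "&#xE7;"
```

vsprintf() takes the array's values in order, so the 1-based key that unpack() produces does not matter.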
Answer 2 (score: 4)
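First, when I ran into this problem recently, I solved it by making sure that my code files, the database connection and the database tables were all UTF-8; after that, simply echoing the text works. If you have to escape output from the database, use htmlspecialchars() instead of htmlentities(): the UTF-8 characters are then left as they are and no attempt is made to escape them.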
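A small sketch of the difference on UTF-8 input (the sample string is made up): htmlentities() turns `ç` into a named entity, while htmlspecialchars() escapes only the HTML-special characters and leaves `ç` as UTF-8.

```php
<?php
// Compare the two escaping functions on the same UTF-8 string.
$text = 'Français <b>';
var_dump(htmlentities($text, ENT_QUOTES, 'UTF-8'));     // "Fran&ccedil;ais &lt;b&gt;"
var_dump(htmlspecialchars($text, ENT_QUOTES, 'UTF-8')); // "Français &lt;b&gt;"
```

Both calls pass the charset explicitly so the result does not depend on the `default_charset` setting.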
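I wanted to document an alternative solution, since it solved a similar problem for me.
I was using PHP's utf8_encode() to escape special characters.
I wanted to convert them into HTML entities for display. I wrote this code because I wanted to avoid iconv or similar functions as far as possible, since not every environment necessarily has them (correct me if that's not the case!)
$foo = 'This is my test string \u03b50';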
echo unicode2html($foo);
function unicode2html($string) {
    // Turn literal "\uXXXX" escapes into hex HTML entities (&#xXXXX;).
    return preg_replace('/\\\\u([0-9a-fA-F]{4})/', '&#x$1;', $string);
}
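If you want the characters themselves rather than entities, the same regex can feed a callback instead. A minimal sketch, assuming PHP 7.2 or later: mb_chr() needs the mbstring extension, which, like iconv, is optional, so this does not avoid the dependency concern above. The helper name `unicode_escapes_to_utf8()` is hypothetical, and surrogate pairs (escapes such as `\uD83D\uDE00` for characters outside the BMP) are not handled.

```php
<?php
// Hypothetical helper: decode literal "\uXXXX" escapes into UTF-8 characters.
// Assumes PHP >= 7.2 with mbstring; surrogate pairs are NOT handled.
function unicode_escapes_to_utf8(string $string): string
{
    return preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function (array $m): string {
        // hexdec('03b5') is 949, and mb_chr(949) is "ε" (U+03B5).
        return mb_chr(hexdec($m[1]), 'UTF-8');
    }, $string);
}

echo unicode_escapes_to_utf8('This is my test string \u03b50'), "\n"; // This is my test string ε0
```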
Hope this helps someone who needs it :-)
Answer 3 (score: 0)
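Usage example. mb_chr() and mb_ord() are built into PHP 7.2 and later (mbstring extension). mb_htmlentities(), mb_html_entity_decode(), codepoint_encode() and codepoint_decode() are not PHP built-ins; they have to be defined elsewhere, and their definitions are not shown here: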
echo "Get string from numeric DEC value\n";
var_dump(mb_chr(50319, 'UCS-4BE'));
var_dump(mb_chr(271));
echo "\nGet string from numeric HEX value\n";
var_dump(mb_chr(0xC48F, 'UCS-4BE'));
var_dump(mb_chr(0x010F));
echo "\nGet numeric value of character as DEC int\n";
var_dump(mb_ord('ď', 'UCS-4BE'));
var_dump(mb_ord('ď'));
echo "\nGet numeric value of character as HEX string\n";
var_dump(dechex(mb_ord('ď', 'UCS-4BE')));
var_dump(dechex(mb_ord('ď')));
echo "\nEncode / decode to DEC based HTML entities\n";
var_dump(mb_htmlentities('tchüß', false));
var_dump(mb_html_entity_decode('tch&#252;&#223;'));
echo "\nEncode / decode to HEX based HTML entities\n";
var_dump(mb_htmlentities('tchüß'));
var_dump(mb_html_entity_decode('tch&#xFC;&#xDF;'));
echo "\nUse JSON encoding / decoding\n";
var_dump(codepoint_encode("tchüß"));
var_dump(codepoint_decode('tch\u00fc\u00df'));
Output:
Get string from numeric DEC value
string(4) "ď"
string(2) "ď"
Get string from numeric HEX value
string(4) "ď"
string(2) "ď"
Get numeric value of character as DEC int
int(50319)
int(271)
Get numeric value of character as HEX string
string(4) "c48f"
string(3) "10f"
Encode / decode to DEC based HTML entities
string(15) "tch&#252;&#223;"
string(7) "tchüß"
Encode / decode to HEX based HTML entities
string(15) "tch&#xFC;&#xDF;"
string(7) "tchüß"
Use JSON encoding / decoding
string(15) "tch\u00fc\u00df"
string(7) "tchüß"
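The helpers used in this example are not shown. Below is a minimal sketch of hypothetical stand-ins, assuming PHP 7.2 or later with mbstring. The names `entities_encode()`, `entities_decode()`, `json_codepoint_encode()` and `json_codepoint_decode()` are illustrative and are not the answer's own code, although they reproduce the entity and JSON lines of the output above. The JSON pair relies on json_encode() escaping non-ASCII characters as `\uXXXX` by default, and the decoder assumes its input is a valid JSON string body.

```php
<?php
// Hypothetical stand-ins for the helpers used in the example above.
// Assumes PHP >= 7.2 with the mbstring extension (for mb_ord()).

// Encode every non-ASCII character as a numeric entity: hex (&#xFC;) or decimal (&#252;).
function entities_encode(string $string, bool $hex = true): string
{
    return preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function (array $m) use ($hex): string {
        $codepoint = mb_ord($m[0], 'UTF-8');
        return sprintf($hex ? '&#x%X;' : '&#%d;', $codepoint);
    }, $string);
}

// Decode numeric (and named) entities back to UTF-8.
function entities_decode(string $string): string
{
    return html_entity_decode($string, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// json_encode() escapes non-ASCII as \uXXXX by default; strip the surrounding quotes it adds.
function json_codepoint_encode(string $string): string
{
    return substr(json_encode($string), 1, -1);
}

// Reverse: wrap in quotes and let json_decode() resolve the escapes.
// Assumes the input is a valid JSON string body (any " already escaped as \").
function json_codepoint_decode(string $string): string
{
    return json_decode('"' . $string . '"');
}

var_dump(entities_encode('tchüß', false)); // string(15) "tch&#252;&#223;"
var_dump(entities_encode('tchüß'));        // string(15) "tch&#xFC;&#xDF;"
var_dump(json_codepoint_encode('tchüß'));  // string(15) "tch\u00fc\u00df"
```

Reusing html_entity_decode() and json_decode() for the reverse direction keeps the sketch short; both already understand decimal and hex entities, or `\uXXXX` escapes, respectively.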