Decoding HTML in PHP: decoding numeric HTML entities with PHP

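I have this code that decodes numeric HTML entities into their UTF-8 equivalent characters.

I am trying to convert this character reference:

&#146;

which should output:

’

However, it just disappears (there is no output). I have checked the page source, and the page has the correct UTF-8 charset header and meta tag.

Does anyone know what is wrong with the code?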

function entity_decode($string, $quote_style = ENT_COMPAT, $charset = "UTF-8") {
    $string = html_entity_decode($string, $quote_style, $charset);
    // hexadecimal references such as &#x92;
    $string = preg_replace_callback('~&#x([0-9a-fA-F]+);~i', "chr_utf8_callback", $string);
    // decimal references such as &#146; (the /e modifier was removed in PHP 7, so a callback is used)
    $string = preg_replace_callback('~&#([0-9]+);~', function ($m) { return chr_utf8((int) $m[1]); }, $string);
    //this is another method, which also doesn't work..
    //$string = preg_replace_callback("/(&#[0-9]+;)/", "entity_decode_callback", $string);
    return $string;
}

function chr_utf8_callback($matches) {
    return chr_utf8(hexdec($matches[1]));
}

// Encode a Unicode code point as a UTF-8 byte sequence (1 to 4 bytes).
function chr_utf8($num) {
    if ($num < 128) return chr($num);
    if ($num < 2048) return chr(($num >> 6) + 192) . chr(($num & 63) + 128);
    if ($num < 65536) return chr(($num >> 12) + 224) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    if ($num < 2097152) return chr(($num >> 18) + 240) . chr((($num >> 12) & 63) + 128) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    return '';
}

function entity_decode_callback($m) {
    return mb_convert_encoding($m[1], "UTF-8", "HTML-ENTITIES");
}

echo '=' . entity_decode('&#146;');

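Solution:

html_entity_decode already does what you want:

$string = '&#146;';
echo html_entity_decode($string, ENT_COMPAT, 'UTF-8');

It will return the character with these UTF-8 bytes (hex): c2 92

This is PRIVATE USE TWO (U+0092), a C1 control character with no visible glyph. Because of that it can appear to vanish, and depending on your PHP configuration/version/build it may not be returned at all.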
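To check which bytes a string really contains, it can help to dump them in hex rather than rely on how a browser renders them. The following is a minimal sketch, assuming PHP 7 or later (for the "\u{...}" escape syntax); hex_bytes is a hypothetical helper name introduced here, not something from the original post.

```php
<?php
// Hypothetical helper (not from the original post): space-separated hex dump of a string's bytes.
function hex_bytes(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

// U+0092 (PRIVATE USE TWO), written with PHP 7's Unicode escape syntax.
echo hex_bytes("\u{92}"), "\n";    // c2 92

// U+2019 (RIGHT SINGLE QUOTATION MARK), the character the asker expected.
echo hex_bytes("\u{2019}"), "\n";  // e2 80 99

// The same two bytes by hand, following the 2-byte branch of chr_utf8():
// 146 >> 6 = 2,  2 + 192 = 194 (0xC2)
// 146 & 63 = 18, 18 + 128 = 146 (0x92)
echo hex_bytes(chr((146 >> 6) + 192) . chr((146 & 63) + 128)), "\n"; // c2 92
```

Comparing the two dumps shows that U+0092 and the curly apostrophe are different characters with different byte sequences, even though a browser may display the same glyph for both references.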

There are some more quirks:

But in HTML (other than XHTML, which uses XML rules), it’s a long-standing browser quirk that character references in the range &#128; to &#159; are misinterpreted to mean the characters associated with bytes 128 to 159 in the Windows Western code page (cp1252) instead of the Unicode characters with those code points. The HTML5 standard finally documents this behaviour.
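The quirk described above can be reproduced in PHP by remapping code points 128 to 159 through cp1252 before encoding to UTF-8. This is only a sketch under stated assumptions: decode_numeric_entities, utf8_from_codepoint and CP1252_C1_MAP are hypothetical names introduced for illustration, the table covers the 27 bytes that cp1252 defines in that range, and references that do not name a valid code point are left untouched.

```php
<?php
// Hypothetical table (not from the original post): cp1252 bytes 0x80-0x9F
// mapped to Unicode code points. Bytes cp1252 leaves undefined are omitted.
const CP1252_C1_MAP = [
    0x80 => 0x20AC, 0x82 => 0x201A, 0x83 => 0x0192, 0x84 => 0x201E,
    0x85 => 0x2026, 0x86 => 0x2020, 0x87 => 0x2021, 0x88 => 0x02C6,
    0x89 => 0x2030, 0x8A => 0x0160, 0x8B => 0x2039, 0x8C => 0x0152,
    0x8E => 0x017D, 0x91 => 0x2018, 0x92 => 0x2019, 0x93 => 0x201C,
    0x94 => 0x201D, 0x95 => 0x2022, 0x96 => 0x2013, 0x97 => 0x2014,
    0x98 => 0x02DC, 0x99 => 0x2122, 0x9A => 0x0161, 0x9B => 0x203A,
    0x9C => 0x0153, 0x9E => 0x017E, 0x9F => 0x0178,
];

// Hypothetical helper: encode one code point (1..0x10FFFF) as UTF-8.
function utf8_from_codepoint(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: decode decimal (&#146;) and hexadecimal (&#x92;)
// references, treating 128-159 as cp1252 bytes the way browsers do.
function decode_numeric_entities(string $html): string
{
    return preg_replace_callback(
        '~&#(?:x([0-9a-f]+)|([0-9]+));~i',
        function (array $m): string {
            // Group 1 is the hex form; group 2 (decimal) only exists when group 1 is empty.
            $code = ($m[1] !== '') ? hexdec($m[1]) : (int) $m[2];
            // Leave NUL, surrogates and out-of-range values untouched.
            if ($code < 1 || $code > 0x10FFFF || ($code >= 0xD800 && $code <= 0xDFFF)) {
                return $m[0];
            }
            $code = (int) $code;
            $code = CP1252_C1_MAP[$code] ?? $code;
            return utf8_from_codepoint($code);
        },
        $html
    );
}

echo bin2hex(decode_numeric_entities('&#146;')), "\n"; // e28099
```

With this remapping, '&#146;' yields the curly apostrophe (U+2019) that the question expected. Whether to apply the remapping is a design choice: it matches browser behaviour for HTML, but it would be wrong for XML or XHTML input, where the reference means U+0092 itself.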

Tags: html, php, character-encoding, utf-8

Source: https://codeday.me/bug/20190902/1790609.html
