DOMDocument and UTF-8 Problem

A few weeks back I shared how I used PHP's DOMDocument to reliably update all image URLs from standard HTTP to HTTPS. DOMDocument made a difficult problem seem incredibly easy, but with one side effect that took me a while to spot: UTF-8 characters were being mutated into another set of characters. I was seeing a bunch of odd characters like "ãç³" and "»ã®é" all over each blog post.

I knew the problem was happening during the DOMDocument parsing and that I needed to find a fix quickly. The solution was just a tiny bit of code:


// Create a DOMDocument instance 
$doc = new DOMDocument();

// The fix: mb_convert_encoding conversion
$doc->loadHTML(mb_convert_encoding($content, 'HTML-ENTITIES', 'UTF-8'));


After running the content through mb_convert_encoding, which turns the non-ASCII UTF-8 characters into HTML entities before DOMDocument parses them, the odd characters vanished and the desired characters were back in place. Phew!
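One caveat worth knowing: as of PHP 8.2, passing 'HTML-ENTITIES' to mb_convert_encoding is deprecated. The following is a minimal sketch of the same idea using mb_encode_numericentity instead. It is not from the original post; the sample string and the variable names ($encoded, $text) are made up for illustration, and it assumes the file itself is saved as UTF-8.

```php
<?php
// Hypothetical sample input standing in for a blog post body.
$content = '<p>日本語 café</p>';

// Convert every code point from U+0080 upward into a numeric entity
// (for example é becomes &#233;), so the parser only ever sees ASCII bytes.
// Map format: [start, end, offset, mask].
$encoded = mb_encode_numericentity($content, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

// Create a DOMDocument instance and parse the entity-encoded markup.
$doc = new DOMDocument();
$doc->loadHTML($encoded);

// DOM strings come back as UTF-8, so the original characters are intact.
$text = $doc->getElementsByTagName('p')->item(0)->textContent;
echo $text, PHP_EOL;
```

The design choice is the same as in the fix above: keep the bytes handed to loadHTML pure ASCII, so the parser never has to guess the encoding of the input.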

Translated from: https://davidwalsh.name/domdocument-utf8-problem
