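While scraping web pages with fabpot/goutte ([url]https://github.com/FriendsOfPHP/Goutte[/url]), I noticed that no matter what encoding the target page uses (gb2312, ...), the text I end up with is always Unicode.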
After some digging, it turns out that Symfony's crawler converts the content to the HTML-ENTITIES encoding:
mb_convert_encoding($content, 'HTML-ENTITIES', $charset);
Wikipedia then filled me in on the basics: the character references produced by the HTML-ENTITIES encoding are expressed in Unicode ([url]http://en.wikipedia.org/wiki/Character_encodings_in_HTML[/url]).
[quote]A numeric character reference in HTML refers to a character by its Universal Character Set/Unicode code point[/quote]
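To make the quoted rule concrete, here is a minimal sketch of why the source charset stops mattering once text is turned into numeric character references. It is not Symfony's actual code: the sample character and the variable names are my own assumptions, and it builds the reference by hand with mb_ord() rather than calling the 'HTML-ENTITIES' pseudo-encoding, which mbstring deprecated in PHP 8.2.

```php
<?php
// The character "中" stored as GB2312 is the two bytes 0xD6 0xD0.
// (Sample input chosen for illustration only.)
$gb2312Bytes = "\xD6\xD0";

// Decode the source bytes to UTF-8. 'EUC-CN' is the mbstring name
// for the encoding that carries GB2312.
$utf8 = mb_convert_encoding($gb2312Bytes, 'UTF-8', 'EUC-CN');

// A numeric character reference is built from the Unicode code point,
// not from the original GB2312 byte values.
$codePoint = mb_ord($utf8, 'UTF-8');   // U+4E2D
$reference = '&#' . $codePoint . ';';

// Decoding the reference gives back the character as UTF-8,
// whatever charset the page started in.
$decoded = html_entity_decode($reference, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $reference, "\n";        // &#20013;
echo bin2hex($decoded), "\n"; // e4b8ad
```

The GB2312 bytes d6d0 never show up in the reference. Only the code point 20013 does, which is why the crawler's output looks like Unicode regardless of the page's declared charset.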
Noting it here for the record.