PHP: How to remove multiple UTF-8 BOM sequences

Latest recommended article published 2021-04-12 11:41:50

神经现实


Tags: php utf8 bom


I'm using PHP5 (CGI) to output template files from the filesystem, and I'm having a problem with it spitting out raw HTML.

    private function fetch($name) {
        $path = $this->j->config['template_path'] . $name . '.html';
        if (!file_exists($path)) {
            dbgerror('Could not find the template "' . $name . '" in ' . $path);
        }
        $f = fopen($path, 'r');
        $t = fread($f, filesize($path));
        fclose($f);
        if (substr($t, 0, 3) == b'\xef\xbb\xbf') {
            $t = substr($t, 3);
        }
        return $t;
    }

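Even though I added the BOM fix, Firefox still has trouble accepting the output. You can see a live copy here: http://ircb.in/jisti/ (and the template file, which I put up for you to check out, is at http://ircb.in/jisti/home.html).

Any ideas how to fix this? o_o

11 Answers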

129 votes

You can use the following code to remove the UTF-8 BOM:

    // Remove UTF-8 BOM
    function remove_utf8_bom($text)
    {
        $bom = pack('H*','EFBBBF');
        $text = preg_replace("/^$bom/", '', $text);
        return $text;
    }

jasonhao answered 2020-02-14T06:40:40Z
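The function above removes a single leading BOM. Since the question involves several BOMs in a row, one possible variant repeats the group so that a whole run of leading BOMs is stripped. This is only a sketch: the name `remove_leading_utf8_boms` is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): strips every BOM in a
// run at the very start of the string and leaves later bytes untouched.
function remove_leading_utf8_boms(string $text): string
{
    // Without the /u modifier PCRE matches raw bytes, so \xEF\xBB\xBF in the
    // pattern stands for the three-byte UTF-8 BOM.
    $result = preg_replace('/^(?:\xEF\xBB\xBF)+/', '', $text);

    // preg_replace() returns null on failure; fall back to the input then.
    return $result ?? $text;
}

$withBoms = "\xEF\xBB\xBF\xEF\xBB\xBF<!DOCTYPE html>";
echo remove_leading_utf8_boms($withBoms), "\n";
```

A BOM sequence that appears later in the string is left alone, because the pattern is anchored at the start.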

34 votes

Try:

    // -------- read the file content ----
    $str = file_get_contents($source_file);
    // -------- remove the UTF-8 BOM ----
    $str = str_replace("\xEF\xBB\xBF",'',$str);
    // -------- get the object from JSON ----
    $obj = json_decode($str);

o1max answered 2020-02-14T06:41:05Z
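Note that `str_replace` removes the three bytes wherever they occur, not only at the start of the string. The sketch below illustrates why the BOM matters for `json_decode`; the sample payload is made up for illustration.

```php
<?php
// Made-up sample payload: a UTF-8 BOM in front of otherwise valid JSON.
$raw = "\xEF\xBB\xBF" . '{"name":"example"}';

// With the BOM still in place the decoder rejects the input.
$before = json_decode($raw);
$errorBefore = json_last_error();

// Strip the BOM bytes, then decode again.
$clean = str_replace("\xEF\xBB\xBF", '', $raw);
$after = json_decode($clean);

var_dump($before === null);                    // did the first decode fail?
var_dump($errorBefore === JSON_ERROR_SYNTAX);  // was it reported as a syntax error?
var_dump($after->name);                        // value decoded from the cleaned string
```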

13 votes

Another way to remove the BOM is to match its Unicode code point, U+FEFF:

    $str = preg_replace('/\x{FEFF}/u', '', $file);

Dean Or answered 2020-02-14T06:41:25Z
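With the `/u` modifier the pattern matches the code point U+FEFF anywhere in the string, and `preg_replace` returns `null` when the subject is not valid UTF-8. A hedged sketch that only strips leading BOMs and keeps the input on failure could look like this (the function name `strip_leading_bom_u` is hypothetical):

```php
<?php
// Hypothetical helper: removes a run of U+FEFF at the start of a UTF-8 string.
function strip_leading_bom_u(string $text): string
{
    // \A anchors at the start of the subject; + covers several BOMs in a row.
    $result = preg_replace('/\A\x{FEFF}+/u', '', $text);

    // null signals a PCRE error, for example malformed UTF-8 input,
    // in which case the original string is returned unchanged.
    return $result === null ? $text : $result;
}

echo strip_leading_bom_u("\xEF\xBB\xBF\xEF\xBB\xBFhello"), "\n";
```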

7 votes

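b'\xef\xbb\xbf' stands for the literal string "\xef\xbb\xbf", backslashes included. If you want to check for a BOM, you need to use double quotes so that the \x sequences are actually interpreted as bytes:

    "\xef\xbb\xbf"

Your file also seems to contain a lot more garbage than just a single leading BOM:

    $ curl http://ircb.in/jisti/ | xxd
    0000000: efbb bfef bbbf efbb bfef bbbf efbb bfef  ................
    0000010: bbbf efbb bf3c 2144 4f43 5459 5045 2068  .....<!DOCTYPE h
    0000020: 746d 6c3e 0a3c 6874 6d6c 3e0a 3c68 6561  tml>.<html>.<hea
    ...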

deceze answered 2020-02-14T06:41:52Z
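To make the quoting difference concrete, the sketch below compares the two literals and then strips a run of BOMs like the one in the dump. The input string is constructed for illustration rather than fetched from the URL above.

```php
<?php
// Single quotes keep the backslashes, so this is 12 literal characters.
var_dump(strlen('\xef\xbb\xbf'));
// Double quotes turn each \x escape into one byte, so this is 3 bytes.
var_dump(strlen("\xef\xbb\xbf"));

// Illustrative input: seven BOMs followed by markup, matching the dump.
$bom = "\xef\xbb\xbf";
$t = str_repeat($bom, 7) . "<!DOCTYPE html>";

// Drop BOMs one at a time while the string still starts with one.
while (strncmp($t, $bom, 3) === 0) {
    $t = substr($t, 3);
}
echo $t, "\n";
```

The loop avoids regular expressions entirely and works on raw bytes, so it does not depend on the rest of the string being valid UTF-8.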

4 votes

This global function normalizes the base character set for a UTF-8 system. Thanks!

    function prepareCharset($str) {
        // set default encoding
        mb_internal_encoding('UTF-8');
        // pre filter
        if (empty($str)) {
            return $str;
        }
        // get charset
        $charset = mb_detect_encoding($str, array('ISO-8859-1', 'UTF-8', 'ASCII'));
        if (stristr($charset, 'utf') || stristr($charset, 'iso')) {
            // note: utf8_decode() is deprecated as of PHP 8.2
            $str = iconv('ISO-8859-1', 'UTF-8//TRANSLIT', utf8_decode($str));
        } else {
            $str = mb_convert_encoding($str, 'UTF-8', 'UTF-8');
        }
        // remove BOM
        // (as written this strips "%C2%81", the URL-encoded form of U+0081;
        // a UTF-8 BOM would URL-encode as "%EF%BB%BF")
        $str = urldecode(str_replace("%C2%81", '', urlencode($str)));
        // prepare string
        return $str;
    }

Patrick Otto answered 2020-02-14T06:42:14Z

3 votes

If anyone is doing a CSV import, the following code is useful:

    $header = fgetcsv($handle);
    foreach ($header as $key => $val) {
        $bom = pack('H*','EFBBBF');
        $val = preg_replace("/^$bom/", '', $val);
        $header[$key] = $val;
    }

phvish answered 2020-02-14T06:42:35Z
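A BOM can only precede the first cell of the file, so an alternative is to consume it from the stream before parsing. This is a minimal sketch that assumes a seekable handle; the in-memory CSV data and the `skip_utf8_bom` name are made up for illustration.

```php
<?php
// Hypothetical helper: advance past a leading UTF-8 BOM, if there is one.
function skip_utf8_bom($handle): void
{
    if (fread($handle, 3) !== "\xEF\xBB\xBF") {
        // No BOM: go back to the start (this requires a seekable stream).
        rewind($handle);
    }
}

// Illustrative data written to an in-memory stream instead of a real file.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "\xEF\xBB\xBFid,name\n1,Alice\n");
rewind($handle);

skip_utf8_bom($handle);
// All parsing arguments are passed explicitly: unlimited length, comma
// separator, double-quote enclosure, backslash escape.
$header = fgetcsv($handle, 0, ',', '"', '\\');
fclose($handle);

var_dump($header);
```

Skipping the BOM before the first `fgetcsv` call also covers files whose first cell is quoted, where a BOM in front of the opening quote would otherwise end up inside the parsed value.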

1 votes

Another way to do the same job:

    function remove_utf8_bom_head($text) {
        if (substr(bin2hex($text), 0, 6) === 'efbbbf') {
            $text = substr($text, 3);
        }
        return $text;
    }

The other methods I found did not work in my case.

I hope this helps in some special cases.

Alfred Huang answered 2020-02-14T06:43:04Z

1 votes

If you are reading from some API with file_get_contents and get an unexplained NULL from json_decode, check the value of json_last_error(). Sometimes the value returned by file_get_contents carries an extraneous BOM that is almost invisible when you inspect the string, but it makes json_last_error() return JSON_ERROR_SYNTAX (4).

    >>> $json = file_get_contents("http://api-guiaserv.seade.gov.br/v1/orgao/all");
    => "\t{"orgao":[{"Nome":"Tribunal de Justi\u00e7a","ID_Orgao":"59","Condicao":"1"}, ...]}"
    >>> json_decode($json);
    => null
    >>>

In that case, check the first 3 bytes. Echoing them is not very useful, because the BOM is invisible in most setups:

    >>> substr($json, 0, 3)
    => " "
    >>> substr($json, 0, 3) == pack('H*','EFBBBF');
    => true
    >>>

If the line above returns TRUE for you, a simple test may fix the problem: