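php - How to remove HTML special chars?
I am creating an RSS feed file for my application, and I want to remove HTML tags from it, which strip_tags does. But strip_tags does not remove HTML special code characters such as:
&amp; &copy;
etc.
Please tell me of any function I can use to remove these special code characters from my string.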
Prashant asked 2019-12-27T18:16:33Z
14 solutions
106 votes
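Either decode them using html_entity_decode or remove them using preg_replace:
$Content = preg_replace("/&#?[a-z0-9]+;/i","",$Content);
(From here)
Edit: Alternative based on Jacco's comment
It might be a good idea to replace the '+' with {2,8} or similar. This limits the chance of removing entire sentences when an unencoded '&' is present.
$Content = preg_replace("/&#?[a-z0-9]{2,8};/i","",$Content);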
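A minimal sketch of the regex approach for the RSS case, assuming the markup is removed with strip_tags() first. The helper name stripEntities() is hypothetical, and the {2,8} bound is only the length heuristic suggested in the edit, not a complete entity grammar.

```php
<?php
// Hypothetical helper: drop named (&amp;), decimal (&#169;) and
// hex (&#xA9;) entities. {2,8} is a heuristic length limit.
function stripEntities(string $text): string
{
    return preg_replace('/&#?[a-z0-9]{2,8};/i', '', $text);
}

$raw = '<p>Fish &amp; Chips &copy; 2019</p>';
// strip_tags() removes the <p> markup, stripEntities() removes the entities
echo stripEntities(strip_tags($raw)); // Fish  Chips  2019 (double spaces remain)
```

Removing an entity leaves the spaces around it in place, so you may want to collapse runs of whitespace afterwards, for example with preg_replace('/\s+/', ' ', $text).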
schnaader answered 2019-12-27T18:16:56Z
20 votes
Use html_entity_decode to convert HTML entities.
You need to set the charset for it to work correctly.
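A minimal sketch of decoding with an explicit charset. UTF-8 and the ENT_QUOTES | ENT_HTML5 flags are assumptions for a UTF-8 feed, and decodeEntities() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: turn entities back into the characters they encode.
// Passing the charset explicitly avoids depending on the default_charset ini setting.
function decodeEntities(string $text): string
{
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo decodeEntities('Fish &amp; Chips &copy; 2019'); // Fish & Chips © 2019
```

Note that &nbsp; decodes to the non-breaking space U+00A0, not to an ordinary space.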
andi answered 2019-12-27T18:17:21Z
16 votes
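In addition to the good answers above, PHP also has a very useful built-in filter function: filter_var.
To strip HTML tags (this filter also encodes quotes, but it does not remove entities such as &amp;), use:
$cleanString = filter_var($dirtyString, FILTER_SANITIZE_STRING);
Note that FILTER_SANITIZE_STRING is deprecated as of PHP 8.1.
More info: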
function.filter-var
filter_sanitize_string
gpkamp answered 2019-12-27T18:18:03Z
8 votes
You may want to take a look at htmlentities() and html_entity_decode() here:
$orig = "I'll \"walk\" the <b>dog</b> now";
$a = htmlentities($orig);
$b = html_entity_decode($a);
echo $a; // I'll &quot;walk&quot; the &lt;b&gt;dog&lt;/b&gt; now (PHP 8.1+ also encodes ' as &#039;)
echo $b; // I'll "walk" the <b>dog</b> now
0xFF answered 2019-12-27T18:18:23Z
4 votes
This may work well for removing special characters (note that it also removes every non-ASCII character):
$modifiedString = preg_replace("/[^a-zA-Z0-9_.\s-]/", "", $content);
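A whitelist on its own deletes the & and ; of an entity but keeps its name, so &amp; turns into the stray word "amp". A minimal sketch that removes entities first and then applies the whitelist; keepSafeChars() is a hypothetical name. The hyphen sits last in the character class so it is a literal rather than a range.

```php
<?php
// Hypothetical helper: remove entities, then drop everything outside
// the allowed ASCII set (letters, digits, _, ., whitespace, -).
function keepSafeChars(string $text): string
{
    $noEntities = preg_replace('/&#?[a-z0-9]+;/i', '', $text);
    return preg_replace('/[^a-zA-Z0-9_.\s-]/', '', $noEntities);
}

echo keepSafeChars('Fish &amp; Chips (2019)!'); // Fish  Chips 2019
```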
Vinit Kadkol answered 2019-12-27T18:18:42Z
2 votes
A plain string-function way to do this without using the preg regex engine:
function remEntities($str) {
    // Find the first ampersand
    $amp_pos = strpos($str, '&');
    if ($amp_pos === false) {
        return $str;
    }
    // Find the first ; that comes after the ampersand
    $semi_pos = strpos($str, ';', $amp_pos);
    if ($semi_pos === false) {
        return $str;
    }
    // Treat everything from & to ; as an HTML entity and cut it out
    $str = substr($str, 0, $amp_pos) . substr($str, $semi_pos + 1);
    // Recurse in case the string contains another entity
    return remEntities($str);
}
karim79 answered 2019-12-27T18:19:02Z
2 votes
What I did was use html_entity_decode, then strip_tags to remove them.
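A minimal sketch of "decode first, then strip_tags", assuming UTF-8 input; toFeedText() is a hypothetical helper name. Decoding first matters because encoded markup such as &lt;b&gt; only becomes a real tag that strip_tags() can remove after it has been decoded.

```php
<?php
// Hypothetical helper: decode entities, then strip any tags that appear.
function toFeedText(string $html): string
{
    $decoded = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return strip_tags($decoded);
}

echo toFeedText('&lt;b&gt;Bold&lt;/b&gt; &amp; <i>plain</i>'); // Bold & plain
```

The result is plain text, so when you write it into the feed's XML it still has to be escaped again, for example with htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8').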
Gwapz Juan answered 2019-12-27T18:19:22Z
2 votes
Try this:
$str = "\x8F!!!";
// Outputs an empty string
echo htmlentities($str, ENT_QUOTES, "UTF-8");
// Outputs "!!!"
echo htmlentities($str, ENT_QUOTES | ENT_IGNORE, "UTF-8");
RaGu answered 2019-12-27T18:19:42Z
1 votes
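It looks like what you really want is:
function xmlEntities($string) {
    $from = [];
    $to = [];
    $translationTable = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8');
    foreach ($translationTable as $char => $entity) {
        $from[] = $entity;
        // mb_ord() (PHP 7.2+) returns the full code point; ord() would only return the first byte of a UTF-8 character
        $to[] = '&#' . mb_ord($char, 'UTF-8') . ';';
    }
    return str_replace($from, $to, $string);
}
It replaces named entities with their numeric equivalents, which are valid in XML (and therefore in RSS), whereas XML itself only predefines &amp;, &lt;, &gt;, &quot; and &apos;.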
Jacco answered 2019-12-27T18:20:06Z
1 votes
function strip_only($str, $tags, $stripContent = false) {
    $content = '';
    if (!is_array($tags)) {
        // Accept either 'font' or a '<font><b>' style list of tags
        $tags = (strpos($tags, '>') !== false
            ? explode('>', str_replace('<', '', $tags))
            : array($tags));
        if (end($tags) == '') array_pop($tags);
    }
    foreach ($tags as $tag) {
        if ($stripContent)
            $content = '(.+</'.$tag.'[^>]*>|)';
        // Remove the opening/closing tags (and, optionally, what they enclose)
        $str = preg_replace('#</?'.$tag.'[^>]*>'.$content.'#is', '', $str);
    }
    return $str;
}
$str = '<font color="red">red</font> text';
$tags = 'font';
$a = strip_only($str, $tags); // red text
$b = strip_only($str, $tags, true); // text
jahanzaib answered 2019-12-27T18:20:22Z
1 votes
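The code I use for this task, incorporating schnaader's improvement:
mysql_real_escape_string(
    preg_replace_callback("/(&#?[a-z0-9]+;)/i", function($m) {
        return mb_convert_encoding($m[1], "UTF-8", "HTML-ENTITIES");
    }, strip_tags($row['cuerpo'])))
This strips all HTML tags, converts HTML entities into UTF-8 characters, and escapes the result so it is ready to be stored in MySQL. (The mysql_* extension was removed in PHP 7.0, and the HTML-ENTITIES mbstring encoding is deprecated as of PHP 8.2.)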
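A minimal sketch of the same strip-then-decode pipeline that avoids both mysql_real_escape_string() (removed in PHP 7.0) and the HTML-ENTITIES mbstring encoding (deprecated in PHP 8.2) by decoding each match with html_entity_decode(). cleanBody() is a hypothetical name, the arrow function needs PHP 7.4+, and escaping for the database is assumed to happen through a prepared statement (PDO or mysqli) rather than in this function.

```php
<?php
// Hypothetical helper: strip tags, then decode each entity to UTF-8.
function cleanBody(string $html): string
{
    return preg_replace_callback(
        '/&#?[a-z0-9]+;/i',
        fn(array $m): string => html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8'),
        strip_tags($html)
    );
}

echo cleanBody('<p>Caf&eacute; &amp; bar</p>'); // Café & bar
```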
Lalala answered 2019-12-27T18:20:46Z
1 votes
If you want to convert HTML special characters rather than just remove them, and also strip the markup to get plain text, this is the solution that worked for me...
function htmlToPlainText($str){
    $str = str_replace('&nbsp;', ' ', $str);
    $str = html_entity_decode($str, ENT_QUOTES | ENT_COMPAT , 'UTF-8');
    $str = html_entity_decode($str, ENT_HTML5, 'UTF-8');
    $str = html_entity_decode($str);
    $str = htmlspecialchars_decode($str);
    $str = strip_tags($str);
    return $str;
}
$string = '<p>this is (&nbsp;) a test</p>
<div>Yes this is! &amp; does it get processed?</div>';
echo htmlToPlainText($string);
// this is ( ) a test
// Yes this is! & does it get processed?
html_entity_decode() with ENT_QUOTES converts things like &#39;, htmlspecialchars_decode() converts things like &amp;, html_entity_decode() converts things like &lt;, and strip_tags() removes any HTML tags that are left over.
Edit: added str_replace('&nbsp;', ' ', $str); and several more html_entity_decode() calls (continued testing showed they were needed).
Jay answered 2019-12-27T18:21:16Z
0 votes
You can try htmlspecialchars_decode($string). It worked for me.
[http://www.w3schools.com/php/func_string_htmlspecialchars_decode.asp]
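Keep in mind that htmlspecialchars_decode() only reverses the handful of characters that htmlspecialchars() encodes (&amp;, &quot;, &#039;, &lt;, &gt;), so entities such as &copy; pass through unchanged. A minimal sketch; the flags are passed explicitly so the result does not depend on PHP-version defaults.

```php
<?php
// Only the "special" entities are decoded; &copy; is left as-is.
$input = '&lt;b&gt;Fish &amp; Chips&lt;/b&gt; &copy; 2019';
$out = htmlspecialchars_decode($input, ENT_QUOTES | ENT_HTML401);
echo $out; // <b>Fish & Chips</b> &copy; 2019
```

For the remaining named entities, html_entity_decode() is the function that handles the full table.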
surabhivin answered 2019-12-27T18:21:40Z
-1 votes
$string = "äáčé";
$convert = Array(
'ä'=>'a',
'Ä'=>'A',
'á'=>'a',
'Á'=>'A',
'à'=>'a',
'À'=>'A',
'ã'=>'a',
'Ã'=>'A',
'â'=>'a',
'Â'=>'A',
'č'=>'c',
'Č'=>'C',
'ć'=>'c',
'Ć'=>'C',
'ď'=>'d',
'Ď'=>'D',
'ě'=>'e',
'Ě'=>'E',
'é'=>'e',
'É'=>'E',
'ë'=>'e',
);
$string = strtr($string , $convert );
echo $string; //aace
Zombyii answered 2019-12-27T18:21:56Z