I'm trying to write a regular expression that strips all tag attributes except the src attribute. For example:
<p id="paragraph" class="green">This is a paragraph with an image <img src="/path/to/image.jpg" width="50" height="75"/></p>
would be returned as:
<p>This is a paragraph with an image <img src="/path/to/image.jpg" /></p>
I have a regular expression that strips all attributes, but I'm trying to tweak it so that it leaves src in place. Here's what I have so far:
<?php preg_replace('/<([A-Z][A-Z0-9]*)(\b[^>]*)>/i', '<$1>', '<html><goes><here>');
I'm using PHP's preg_replace() for this.
Thanks!
Ian
Best answer
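This might work for your needs:
$text = '<p id="paragraph" class="green">This is a paragraph with an image <img src="/path/to/image.jpg" width="50" height="75"/></p>';
echo preg_replace("/<([a-z][a-z0-9]*)(?:[^>]*(\ssrc=['\"][^'\"]*['\"]))?[^>]*?(\/?)>/i", '<$1$2$3>', $text);
// <p>This is a paragraph with an image <img src="/path/to/image.jpg"/></p>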
The regexp broken down:
/ # Start Pattern
< # Match '<'
( # Start Capture Group $1 - Tag Name
[a-z] # Match 'a' through 'z'
[a-z0-9]* # Match 'a' through 'z' or '0' through '9' zero or more times
) # End Capture Group
(?: # Start Non-Capture Group
[^>]* # Match anything other than '>', Zero or More Times
( # Start Capture Group $2 - ' src="...."'
\s # Match one whitespace
src= # Match 'src='
['"] # Match ' or "
[^'"]* # Match anything other than ' or "
['"] # Match ' or "
) # End Capture Group 2
)? # End Non-Capture Group, match group zero or one time
[^>]*? # Match anything other than '>', Zero or More times, not-greedy (won't eat the /)
(\/?) # Capture Group $3 - '/' if it is there
> # Match '>'
/i # End Pattern - Case Insensitive
Add some quoting and use the replacement text <$1$2$3>; it should strip any non-src= properties from well-formed HTML tags.
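As a rough sketch of how the commented breakdown can be kept inside the code itself, the pattern below uses PCRE's extended (`x`) modifier so that whitespace and `#` comments in the pattern are ignored. The constant name `SRC_ONLY_PATTERN` and the function name `strip_attributes_except_src` are made up for illustration, and the `~` delimiter is an assumption chosen so the slash needs no escaping.

```php
<?php
// Hypothetical names; the pattern mirrors the breakdown above.
const SRC_ONLY_PATTERN = <<<'REGEX'
~
<                            # opening angle bracket
([a-z][a-z0-9]*)             # $1: tag name
(?:
    [^>]*                    # skip attributes that come before src
    (\ssrc=['"][^'"]*['"])   # $2: the src attribute with its leading whitespace
)?
[^>]*?                       # skip the remaining attributes, lazily
(/?)                         # $3: optional slash of a self-closing tag
>
~ix
REGEX;

function strip_attributes_except_src(string $html): string
{
    // Groups that did not take part in the match are replaced by an empty string.
    // preg_replace() returns null on error; fall back to the input in that case.
    return preg_replace(SRC_ONLY_PATTERN, '<$1$2$3>', $html) ?? $html;
}

echo strip_attributes_except_src(
    '<p id="paragraph" class="green">Text <img src="/path/to/image.jpg" width="50" height="75"/></p>'
);
// <p>Text <img src="/path/to/image.jpg"/></p>
```

Closing tags such as `</p>` are left alone because the character after `<` must be a letter.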
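Please note that this isn't necessarily going to work on all input, as the anti-HTML + RegExp people so cleverly point out below. There are a few failure cases, most notably
<p style=">"> would end up as
<p>">, along with a few other breakage issues... I would recommend looking at Zend_Filter_StripTags as a more foolproof tag/attribute filter in PHP.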
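和其他一些">
For input where a `>` can appear inside an attribute value, a parser-based filter avoids the problem. The sketch below is not Zend_Filter_StripTags; it swaps in PHP's bundled DOM extension instead. The function name `strip_attributes_with_dom` and its `$keep` parameter are hypothetical, and it assumes ASCII input, since `loadHTML()` does not treat input as UTF-8 unless told to.

```php
<?php
// Hypothetical helper, not part of the original answer: removes every
// attribute whose name is not listed in $keep.
function strip_attributes_with_dom(string $html, array $keep = ['src']): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect markup instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so it has a single, known root element.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper is the first div in document order.
    $root = $doc->getElementsByTagName('div')->item(0);
    foreach ($root->getElementsByTagName('*') as $element) {
        // Copy the attributes first: removing from the live map while
        // iterating over it can skip entries.
        foreach (iterator_to_array($element->attributes) as $attribute) {
            if (!in_array(strtolower($attribute->nodeName), $keep, true)) {
                $element->removeAttribute($attribute->nodeName);
            }
        }
    }

    // Serialize the wrapper's children back into an HTML string.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo strip_attributes_with_dom('<p style=">">text <img src="a.png" width="5"/></p>');
```

The output is re-serialized by libxml, so details such as the self-closing slash may differ from the input. The style and width attributes are removed and no stray `">` is left behind.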