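How sure are you that your strings look exactly like that? What about input like this:
What programming language is this? Is there some reason you aren't using a standard HTML-parsing class to handle this? Regexes are a good approach only when you have an extremely well-known set of inputs. They don't work for real HTML, only for rigged demos.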
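For comparison, here is a minimal sketch of the parser-based route, assuming Perl and the CPAN module HTML::TokeParser (part of the HTML-Parser distribution, not core Perl). The helper name `img_attributes` and the sample markup are made up for illustration.

```perl
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
use HTML::TokeParser;   # CPAN: HTML-Parser distribution (assumed installed)

# Hypothetical helper: return one hash ref of attributes
# for each <img> start tag found in $html.
sub img_attributes {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "cannot create parser";
    my @found;
    # get_tag("img") skips ahead to the next <img> start tag;
    # element [1] of the returned array ref is the attribute hash,
    # with attribute names folded to lower case.
    while (my $tag = $parser->get_tag("img")) {
        push @found, $tag->[1];
    }
    return @found;
}

# Made-up sample: a ">" inside a quoted value, a commented-out tag,
# and an upper-case tag with an unquoted value.
my $sample = <<'HTML';
<p><img alt=">" src="abc.jpg" /></p>
<!-- <img src="hidden.jpg"> -->
<IMG SRC=def.png ALT='two words'>
HTML

for my $attr (img_attributes($sample)) {
    printf "src=%s alt=%s\n", $attr->{src} // "", $attr->{alt} // "";
}
```

The parser takes care of quoting, case folding, and comments, so none of that has to be spelled out by hand.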
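Even if you must use a regex, you should use a proper grammatical one. This is quite easy. I've tested the following programacita on zillions of web pages. It takes care of the cases I outlined above, and one or two others, too.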
#!/usr/bin/perl
use 5.10.0;
use strict;
use warnings;

my $img_rx = qr{

    # save capture in $+{TAG} variable
    (?<TAG> (?&image_tag) )

    # remainder is pure declaration
    (?(DEFINE)

        (?<image_tag>
            (?&start_tag)
            (?&might_white)
            (?&attributes)
            (?&might_white)
            (?&end_tag)
        )

        (?<attributes>
            (?:
                (?&might_white)
                (?&one_attribute)
            ) *
        )

        (?<one_attribute>
            \b
            (?&legal_attribute)
            (?&might_white) = (?&might_white)
            (?:
                (?&quoted_value)
              | (?&unquoted_value)
            )
        )

        (?<legal_attribute>
            (?: (?&required_attribute)
              | (?&optional_attribute)
              | (?&standard_attribute)
              | (?&event_attribute)
              # for LEGAL parse only, comment out next line
              | (?&illegal_attribute)
            )
        )

        (?<illegal_attribute> \b \w+ \b )

        (?<required_attribute>
            alt
          | src
        )

        (?<optional_attribute>
            (?&permitted_attribute)
          | (?&deprecated_attribute)
        )

        # NB: The white space in string literals
        #     below DOES NOT COUNT!  It's just
        #     there for legibility.

        (?<permitted_attribute>
            height
          | is map
          | long desc
          | use map
          | width
        )

        (?<deprecated_attribute>
            align
          | border
          | hspace
          | vspace
        )

        (?<standard_attribute>
            class
          | dir
          | id
          | style
          | title
          | xml:lang
        )

        (?<event_attribute>
            on abort
          | on click
          | on dbl click
          | on mouse down
          | on mouse out
          | on key down
          | on key press
          | on key up
        )

        (?<unquoted_value>
            (?&unwhite_chunk)
        )

        (?<quoted_value>
            (?<quote> ["'] )
            (?: (?! \k<quote> ) . ) *
            \k<quote>
        )

        (?<unwhite_chunk>
            (?:
                # (?! [<>'"] )
                (?! > )
                \S
            ) +
        )

        (?<might_white> \s * )

        (?<start_tag>
            < (?&might_white)
            img
            \b
        )

        (?<end_tag>
            (?&html_end_tag)
          | (?&xhtml_end_tag)
        )

        (?<html_end_tag>    > )
        (?<xhtml_end_tag> / > )

    )

}six;

$/ = undef;
$_ = <>;   # read all input

# strip stuff we aren't supposed to look at
s{ <!    DOCTYPE  .*?      > }{}sx;
s{ <! \[ CDATA \[ .*? \]\] > }{}gsx;
s{ <script> .*? </script>  }{}gsix;
s{ <!--     .*?       -->  }{}gsx;

my $count = 0;

while (/$img_rx/g) {
    printf "Match %d at %d: %s\n",
        ++$count, pos(), $+{TAG};
}
Gee, why would you ever want to use an HTML-parsing class, given how easily HTML can be dealt with in a regex. ☺